Choosing a format for archiving and compressing files can be a hurdle for some users because there are so many options. This article presents some of the top choices that both ordinary end users and specialists can use to archive and/or compress their data. It covers both the most popular solutions and those used in specific cases, giving a thorough perspective on the topic.
What Is the Difference Between Archiving and Compression?
Computer users often confuse the two terms archiving and compression and use them interchangeably. They are distinct processes that take user data and transform it with the required algorithms. In both cases the input and output are ordinary computer files.
Archiving is the process of placing multiple files into a single file together with their metadata. This is particularly useful with large directory structures, as archives can correctly preserve a tree of interlinked data. Many archive formats include error detection data such as checksums, and some also offer error recovery records. Encryption is also available in many of them.
Compression, on the other hand, is the process of reducing the file size of the chosen targets. This is done via a special algorithm that can be of two types: lossy or lossless. Lossless compression methods reduce the file size by identifying and eliminating statistical redundancy, so the original data can be restored exactly, while lossy compression removes information that is considered unnecessary and cannot be recovered afterwards.
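To make the lossless case concrete, here is a minimal sketch in Python using the standard zlib module (a DEFLATE implementation). The sample input and the helper name compress_roundtrip are assumptions made up for this illustration. The point is only that redundant data shrinks and is restored byte for byte.

```python
import zlib


def compress_roundtrip(data, level=6):
    """Compress data losslessly, then decompress it again."""
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Highly redundant sample input (hypothetical data chosen for illustration).
sample = b"archive and compress " * 200

compressed, restored = compress_roundtrip(sample)

print(len(sample), "bytes before,", len(compressed), "bytes after")
print("identical after round trip:", restored == sample)
```

Data with little redundancy, such as files that are already compressed, will typically shrink far less than this repetitive sample.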
Most of the file formats discussed in this article have combined capabilities: they can archive, compress, or do both. They are ranked according to their popularity and usage. We recommend that computer users read the whole guide before choosing which format to use for their files, as some types of data have specific requirements.
1. The ZIP File Format
The ZIP format is probably the most popular one among desktop users, as it is widely supported out of the box by most modern operating systems (including most versions of Microsoft Windows and Mac OS X). It is both an archive and a compression format featuring a lossless algorithm. The format was originally developed and released into the public domain in 1989. Its intended purpose was to be a replacement for an earlier format called ARC.
A restricted profile of the format has been standardized by ISO/IEC (ISO/IEC 21320-1), and archives that conform to this standard are subject to the following restrictions:
- Files in ZIP archives may only be stored uncompressed, or using the “deflate” compression
- The encryption features are prohibited
- The digital signature features are prohibited
- The “patched data” features are prohibited
- Archives may not span multiple volumes or be segmented
Files compressed and/or archived in this format conventionally carry the .zip extension.
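As an illustration of the restrictions listed above, the following Python sketch builds a small ZIP archive in memory with the standard zipfile module and then checks two of those restrictions: that every entry is either stored or deflated, and that no entry is flagged as encrypted. The file names and the helper names build_zip and passes_basic_checks are hypothetical. The check is deliberately partial and is not a conformance test for the ISO profile.

```python
import io
import zipfile


def build_zip(files):
    """Write the given name -> bytes mapping into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def passes_basic_checks(zip_bytes):
    """Check only two restrictions: stored/deflate entries, no encryption flag."""
    allowed = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.compress_type not in allowed:
                return False
            if info.flag_bits & 0x1:  # bit 0 marks an encrypted entry
                return False
    return True


# Hypothetical example content.
files = {
    "docs/readme.txt": b"hello " * 100,
    "data/values.csv": b"a,b\n1,2\n",
}
zip_bytes = build_zip(files)

with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    names = archive.namelist()
    readme = archive.read("docs/readme.txt")

print(names)
print("basic checks passed:", passes_basic_checks(zip_bytes))
```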
2. The RAR File Format
The RAR format is one of the most popular formats for compressing and archiving data, and it also supports error recovery and file spanning. It is a proprietary format; however, the WinRAR application for Windows, which provides support for it, is one of the programs most often downloaded by end users. One of the reasons why it has become widespread among both end users and administrators is that the end-user software supports many options that may not be available with other formats. RAR files can be encrypted with a very strong cipher, which makes them very difficult to break.
Compared to the ZIP file format, RAR offers several additional features:
- Multi-volume splitting of the archive in standard file sizes according to the user’s requests.
- Solid Mode Compression
- Strong AES-256 encryption
- Recovery records
- Unicode Support
3. The Tar.GZ File Format
Upon first seeing this extension, many users might be confused into thinking that it is a single format. However, this is one of the cases where a combination of two technologies is used: the tar archive format and gzip compression. This is a very popular technique in the UNIX and UNIX-like operating system “world”, where such files are referred to as “tarballs”. One reason they are so widely used is that the common tar and gzip tools are developed as part of the GNU Project, the collection of utilities on which the GNU/Linux operating system is based. They are open-source creations and as such are adopted by various projects. Another advantage of this combination is that it can be freely opened by most free utilities.
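The two-stage nature of a tarball can be sketched with Python's standard tarfile and gzip modules. This is a minimal in-memory illustration: the file names and contents are hypothetical, and make_tar is a helper invented for this example. Step 1 only bundles the files, and step 2 only compresses the resulting single stream.

```python
import gzip
import io
import tarfile


def make_tar(files):
    """Step 1: bundle a name -> bytes mapping into an uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Hypothetical directory tree used only for illustration.
files = {
    "project/README": b"A small example project.\n",
    "project/src/main.c": b"int main(void) { return 0; }\n",
}

tar_bytes = make_tar(files)          # archive only, no compression
tarball = gzip.compress(tar_bytes)   # Step 2: compress the whole archive

# Reading back: mode "r:gz" undoes the gzip layer and parses the tar layer.
with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as archive:
    names = archive.getnames()
    first = archive.extractfile(names[0]).read()

print(names)
print(len(tar_bytes), "bytes as tar,", len(tarball), "bytes as tar.gz")
```

Because gzip sees the archive as one continuous stream, it has no notion of the individual files inside; that knowledge lives entirely in the tar layer.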
4. The 7Z File Format
The 7z format is a newer archive format designed to provide a high compression ratio. It is another option based on an “open architecture”, and it features advanced AES-256 encryption. The output files can use any combination of compression, conversion, and encryption methods, and the format supports very large file sizes. The latest version of the format supports the following compression methods:
- LZMA — Improved and optimized version of LZ77 algorithm
- LZMA2 — Improved version of LZMA
- PPMD — Dmitry Shkarin’s PPMdH with small changes
- BCJ — Converter for 32-bit x86 executables
- BCJ2 — Converter for 32-bit x86 executables
- BZip2 — Standard BWT algorithm
- Deflate — Standard LZ77-based algorithm
The 7Z format is popular with end users in part because it supports multi-threaded operation, which means that compression and decompression can take advantage of the available hardware resources, such as multiple CPU cores.
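The idea of chaining a converter with a compressor can be sketched with Python's standard lzma module. Note the assumption: this module writes the .xz container, not the 7z container, so the snippet only illustrates the LZMA2 and BCJ (x86) methods named in the list above, not the 7z file format itself. The names pack, unpack and the sample text are invented for the example.

```python
import lzma

# Two filter chains. The BCJ (x86) converter does not compress anything by
# itself; it rewrites x86 branch addresses so that the LZMA2 stage that
# follows can often compress executables better.
LZMA2_ONLY = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
BCJ_THEN_LZMA2 = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def pack(data, filters):
    """Compress data into an .xz stream using the given filter chain."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def unpack(blob):
    """Decompress an .xz stream; the filter chain is read from its header."""
    return lzma.decompress(blob)


# Hypothetical sample input (plain text, so the BCJ stage brings no benefit).
sample = b"The 7z format is designed for a high compression ratio. " * 300

packed = pack(sample, LZMA2_ONLY)
packed_bcj = pack(sample, BCJ_THEN_LZMA2)

print(len(sample), "bytes before,", len(packed), "bytes with LZMA2")
print("round trip ok:", unpack(packed) == sample and unpack(packed_bcj) == sample)
```

Reading or writing actual .7z archives from Python requires a third-party library or the 7-Zip command-line tool, which this sketch does not cover.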
5. The JAR File Format
JAR stands for Java Archive, and it is the main package format used to contain Java class files and related data. It is based on the aforementioned ZIP file format, and its main purpose is to allow the deployment of entire applications in a single file. Java is one of the most popular programming languages, and as such these files are widely used by end users. The contents of JAR files can be extracted using almost all standard decompression utilities, which makes the format a popular choice for other uses as well. JAR files can be digitally signed and, if desired, their contents can be obfuscated, which can make reverse engineering very difficult.
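Because a JAR is a ZIP archive underneath, a generic ZIP library can produce and inspect one. The Python sketch below builds a JAR-like archive in memory and reads it back with the standard zipfile module. The manifest text, the class name example.Main, and the placeholder bytes standing in for a class file are all hypothetical, so the result is not a runnable Java application; it only illustrates the container layout.

```python
import io
import zipfile

# Hypothetical manifest; by convention it is stored at META-INF/MANIFEST.MF.
MANIFEST = "Manifest-Version: 1.0\r\nMain-Class: example.Main\r\n\r\n"


def build_jar_like(class_files):
    """Write a manifest plus the given name -> bytes entries into a ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("META-INF/MANIFEST.MF", MANIFEST)
        for name, payload in class_files.items():
            jar.writestr(name, payload)
    return buffer.getvalue()


# Placeholder bytes only; this is NOT a valid compiled class file.
jar_bytes = build_jar_like({"example/Main.class": b"\xca\xfe\xba\xbe placeholder"})

# A generic ZIP reader recognizes and lists the archive.
is_zip = zipfile.is_zipfile(io.BytesIO(jar_bytes))
with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
    entries = jar.namelist()
    manifest_text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")

print("recognized as ZIP:", is_zip)
print(entries)
```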
6. The APK File Format
APK is the standard Android application package format, which is used to distribute and install software on the mobile operating system. These files are archives by nature and are based on the ZIP file format, so a form of compression is available. As a result, the data contained in an APK is useful mostly to the mobile operating system. However, as the files conform to a standard file format, they can be opened by most end-user decompression utilities. Each file contains a manifest, a certificate, and a SHA-1 digest of the manifest, which is used to verify the contents.
7. The CAB File Format
CAB stands for Cabinet, a well-known format used by Microsoft Windows applications. It supports lossless data compression and embedded digital certificates that are used to verify the integrity of the archive. CAB files can employ several compression algorithms:
- DEFLATE — by the author of the ZIP file format
- Quantum compression — licensed from the author of the Quantum archiver
- LZX — used by Microsoft
These archives have the option of reserving empty space for individual files, which is used for some special purposes, for example writing signatures and temporary data. Several installation technologies in the Microsoft Windows ecosystem rely on CAB files, including the Windows Installer, the Setup API, and others. Most popular end-user software can freely access the contents of CAB files.
8. The DMG File Format
The DMG file format is a special disk image format made by Apple specifically for macOS; it is also known as Apple Disk Image. It is primarily used as an archive format, and it can optionally be configured to compress the data. Unlike other common formats, it does not open like a regular file but is mounted as a volume and appears like a real disk. Data stored in this format can be protected with a password. As it is a proprietary format made by Apple, several third-party tools have been programmed to support it by reverse engineering it. Such support is available in both free and trial/paid programs.
9. The KGB File Format
The .KGB file format is used for both archiving and compression and is primarily supported by a program called KGB Archiver for Microsoft Windows. Support for the format was subsequently added to other applications as well. Its primary advantage is a high compression ratio. However, to achieve this the program places a heavy load on the computer during archiving operations: both CPU and memory usage will be higher than with other compression methods.
Since its creation, the main program has been made open source, which has allowed the .KGB file format to be supported by many other programs. For added security, the compressed files can be encrypted with an AES-256 cipher.
10. The PEA File Format
PEA stands for Pack Encrypt Authenticate, a general-purpose format that allows users to compress their data. A convenient feature it supports is the creation of multiple volumes, which can be split across storage media. According to its developers, it aims to give computer users the ability to easily carry their files on discs, flash drives, and similar media. The privacy and security of the files are guarded by the inclusion of encryption options and checksums.