Zip file tester

Taking the derivative and finding the zero gives us N_OPT, the optimal number of files; H·N_OPT gives the optimal amount of space to allocate for file headers. From this we see that the output size grows quadratically in the input size. As we make the zip file larger, eventually we run into the limits of the zip format. It happens that the first limit we hit is the one on uncompressed file size. Accepting that we cannot increase N nor the size of the kernel without bound, we would like to find the maximum compression ratio achievable while remaining within the limits of the zip format.
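
The model that this derivative refers to is missing from this copy; as a sketch, under assumed notation (X for the total zip file size, H for the per-file header overhead, 1032 for the maximum expansion factor of DEFLATE, and ignoring the contribution of quoted headers to each file's output):

    S(N) ≈ N · 1032 · (X − N·H)        (N files, all reusing a kernel of X − N·H compressed bytes)
    dS/dN = 1032 · (X − 2·N·H) = 0
    N_OPT = X / (2·H),   H·N_OPT = X / 2
    S(N_OPT) ≈ 1032 · X² / (4·H)

At the optimum, roughly half the zip file is spent on file headers and half on the kernel, and the output grows as the square of the input.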

The way to proceed is to make the kernel as large as possible, and have the maximum number of files. Even though we can no longer maintain the roughly even split between kernel and file headers, each added file does increase the compression ratio—just not as fast as it would if we were able to keep growing the kernel, too.

In fact, as we add files we will need to decrease the size of the kernel, because the largest file, which quotes all the headers that follow it, gets slightly larger with each added file and must stay within the uncompressed file size limit. Any major improvements to the compression ratio can only come from reducing the input size, not increasing the output size.

Among the metadata in the central directory header and local file header is a CRC checksum of the uncompressed file data. This poses a problem, because directly calculating the CRC of each file requires doing work proportional to the total unzipped size, which is large by design.

It's a zip bomb, after all. We would prefer to do work that in the worst case is proportional to the zipped size. Two factors work to our advantage: all files share a common suffix (the kernel), and the uncompressed kernel is a string of repeated bytes.

We will represent CRC as a matrix product—this will allow us not only to compute the checksum of the kernel quickly, but also to reuse computation across files. You can model CRC as a state machine that updates a 32-bit state register for each incoming bit. The basic update operations for a 0 bit and a 1 bit are sketched in the code below. The 0-bit update is a linear transformation of the state bits; the 1-bit update additionally XORs in a constant, making it affine rather than linear, but an affine transformation can still be written as a single matrix multiplication if we append a constant 1 to the state vector. To see why, observe that multiplying a matrix by a vector is just summing the columns of the matrix, after multiplying each column by the corresponding element of the vector.
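
The code that originally illustrated these two updates does not survive in this copy. The following is a minimal Python sketch, assuming the standard reflected CRC-32 polynomial constant 0xEDB88320 used by zip; the function names are illustrative, not taken from the original source.

    # Per-bit CRC-32 state updates (reflected polynomial, as used by zip).
    POLY = 0xEDB88320

    def crc32_update_0(state):
        """Advance the 32-bit CRC state by one 0 bit."""
        low = state & 1          # bit shifted out of the register
        state >>= 1
        if low:
            state ^= POLY        # conditional XOR with the CRC-32 constant
        return state

    def crc32_update_1(state):
        """Advance the 32-bit CRC state by one 1 bit."""
        # Same as a 0 bit, followed by an unconditional XOR with the constant.
        return crc32_update_0(state) ^ POLY

    def crc32(data):
        """Bit-at-a-time CRC-32, only here to check the updates against zlib."""
        state = 0xFFFFFFFF                         # pre-conditioning
        for byte in data:
            for i in range(8):                     # bits are consumed LSB first
                bit = (byte >> i) & 1
                state = crc32_update_1(state) if bit else crc32_update_0(state)
        return state ^ 0xFFFFFFFF                  # post-conditioning

    if __name__ == "__main__":
        import zlib
        assert crc32(b"hello, world") == zlib.crc32(b"hello, world")

Feeding data through these updates bit by bit is slow, which is exactly what the matrix representation lets us avoid for the kernel.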

This representation is called homogeneous coordinates. The update matrices M_0 and M_1 are therefore 33 by 33: one dimension for each of the 32 state bits, plus one for the constant 1. The benefit of a matrix representation is that matrices compose.

Suppose we want to represent the state change effected by processing the ASCII character 'a', whose binary representation is 01100001₂.

We can represent the cumulative CRC state change of those 8 bits in a single transformation matrix M_a, the product of the eight per-bit matrices M_0 and M_1 taken in processing order. And we can represent the state change of a string of repeated 'a's by multiplying many copies of M_a together—matrix exponentiation. For example, the matrix representing the state change of a string of 9 'a's is M_a^9, which square-and-multiply computes in four matrix multiplications rather than eight.

The square-and-multiply algorithm is useful for computing M_kernel, the matrix for the uncompressed kernel, because the kernel is a string of repeated bytes.

To produce a CRC checksum value from a matrix, multiply the matrix by the zero vector (the zero vector in homogeneous coordinates, that is: 32 0's followed by a 1). Here we omit the minor complication of pre- and post-conditioning the checksum. To compute the checksum for every file, we work backwards: start with M set to M_kernel, which yields the checksum of the final file, the one consisting of the kernel alone; then, moving from the last file toward the first, compose into M the state change matrix of each quoted local file header, reading off one file's checksum at each step. Continue the procedure, accumulating state change matrices into M, until all the files have been processed.
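
None of the original implementation survives in this copy, so the following Python sketch is my own rendering of the technique just described: 33 by 33 matrices over GF(2), stored as 33 column bitmasks, with the pre- and post-conditioning (which the text above sets aside) included so the result can be checked against zlib. The names are illustrative.

    # CRC-32 as matrix products over GF(2), in homogeneous coordinates.
    # A matrix is a list of 33 integers; integer j is column j as a 33-bit mask.
    POLY = 0xEDB88320
    N = 33  # 32 state bits plus the constant homogeneous coordinate

    def identity():
        return [1 << i for i in range(N)]

    def mat_vec(m, v):
        """Apply matrix m to bit-vector v (an N-bit integer)."""
        r = 0
        for j in range(N):
            if (v >> j) & 1:
                r ^= m[j]        # summing selected columns, with XOR as addition
        return r

    def mat_mul(a, b):
        """Compose: (a @ b) applied to v equals a applied to (b applied to v)."""
        return [mat_vec(a, col) for col in b]

    def bit_matrices():
        """Build M_0 and M_1, the 0-bit and 1-bit update matrices."""
        m0 = [0] * N
        m0[0] = POLY                      # a shifted-out 1 XORs in the polynomial
        for j in range(1, 32):
            m0[j] = 1 << (j - 1)          # right shift: state bit j moves to bit j-1
        m0[32] = 1 << 32                  # the constant 1 maps to itself
        m1 = list(m0)
        m1[32] ^= POLY                    # 1-bit update: unconditional extra XOR
        return m0, m1

    def byte_matrix(byte):
        """Cumulative state change of one byte (bits consumed LSB first)."""
        m0, m1 = bit_matrices()
        m = identity()
        for i in range(8):
            m = mat_mul(m1 if (byte >> i) & 1 else m0, m)
        return m

    def mat_pow(m, n):
        """Square-and-multiply: m**n in O(log n) matrix multiplications."""
        result = identity()
        while n:
            if n & 1:
                result = mat_mul(m, result)
            m = mat_mul(m, m)
            n >>= 1
        return result

    def crc_of_repeated_byte(byte, count):
        """CRC-32 of `count` copies of `byte`, without touching that much data."""
        m = mat_pow(byte_matrix(byte), count)
        v = 0xFFFFFFFF | (1 << 32)        # pre-conditioned state, constant 1 on top
        out = mat_vec(m, v)
        return (out & 0xFFFFFFFF) ^ 0xFFFFFFFF   # strip the constant, post-condition

    if __name__ == "__main__":
        import zlib
        assert crc_of_repeated_byte(ord("a"), 9) == zlib.crc32(b"a" * 9)
        assert crc_of_repeated_byte(0x42, 10**6) == zlib.crc32(b"\x42" * 10**6)

Computing the checksum of every file in the bomb then amounts to composing the accumulated matrix with a byte_matrix-style matrix for each quoted local file header, walking the files from last to first as described above.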

Earlier we hit a wall on expansion due to limits of the zip format—it was impossible to produce more than about 281 TB of output, no matter how cleverly packed the zip file. It is possible to surpass those limits using Zip64, an extension to the zip format that increases the size of certain header fields to 64 bits. Support for Zip64 is by no means universal, but it is one of the more commonly implemented extensions.

As regards the compression ratio, the effect of Zip64 is to increase the size of a central directory header from 46 bytes to 58 bytes, and the size of a local file header from 30 bytes to 50 bytes. Referring to the formula for optimal expansion in the simplified model, we see that a zip bomb in Zip64 format still grows quadratically, but more slowly because of the larger denominator—this is visible in the figure in the Zip64 line's slightly lower vertical placement.
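
To put rough numbers on that denominator (an illustration only, ignoring filename lengths, which count toward both headers): the per-file overhead in the simplified model is about 46 + 30 = 76 bytes without Zip64 and about 58 + 50 = 108 bytes with it, and since the optimal output in the sketch above is proportional to X²/H, the Zip64 curve is scaled down by roughly 76/108, about 0.7, while keeping its quadratic shape.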

In exchange for the loss of compatibility and slower growth, we get the removal of all practical file size limits. Suppose we want a zip bomb that expands to 4.5 PB, roughly the reported size of 42.zip when fully unzipped. How big must the zip file be? Using binary search, we find the smallest zip file whose unzipped size exceeds that target. With Zip64, it's no longer practically interesting to consider the maximum compression ratio, because we can just keep increasing the zip file size, and the compression ratio along with it, until even the compressed zip file is prohibitively large.
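
The binary search itself is routine; here is a sketch, where unzipped_size_of is a hypothetical callback (not part of the original tooling) that reports the unzipped size of the best bomb fitting in a given zipped size and is assumed to be monotonically non-decreasing:

    def smallest_zipped_size(target_unzipped, unzipped_size_of, lo=1, hi=1 << 40):
        """Least zipped size whose bomb output reaches target_unzipped.

        unzipped_size_of(zipped_size) must be monotonically non-decreasing,
        and lo..hi must bracket the answer.
        """
        while lo < hi:
            mid = (lo + hi) // 2
            if unzipped_size_of(mid) >= target_unzipped:
                hi = mid          # mid is big enough; the answer is at or below it
            else:
                lo = mid + 1      # mid is too small; the answer is above it
        return lo

In practice the callback would run the bomb constructor for the given size budget and report how much output it would produce.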

An interesting threshold, though, is 2^64 bytes (18 EB or 16 EiB)—that much data will not fit on most filesystems. Binary search finds the smallest zip bomb that produces at least that much output: it contains 12 million files and has a compressed kernel of 1.5 GB.

The total size of the zip file is 2.9 GB. I didn't make this one downloadable, but you can generate it yourself using the source code. DEFLATE is not the only compression algorithm the zip format supports: bzip2, while not as compatible as DEFLATE, is probably the second most commonly supported compression algorithm.

Empirically, bzip2 has a maximum compression ratio of about 1.4 million to one. Ignoring the loss of compatibility, does bzip2 enable a more efficient zip bomb? Yes—but only for small files. The problem is quoting: bzip2 has nothing like DEFLATE's non-compressed blocks, so the header-quoting trick does not carry over. So it is not possible to overlap files and reuse the kernel—each file must have its own copy, and therefore the overall compression ratio is no better than the ratio of any single file. There is still hope for using bzip2—an alternative means of local file header quoting, discussed in the next section.

Additionally, if you happen to know that a certain zip parser supports bzip2 and tolerates mismatched filenames, then you can use the full-overlap construction, which has no need for quoting.

So far we have used a feature of DEFLATE to quote local file headers, and we have just seen that the same trick does not work with bzip2. There is an alternative means of quoting, somewhat more limited, that only uses features of the zip format and does not depend on the compression algorithm. At the end of the local file header structure there is a variable-length extra field whose purpose is to store information that doesn't fit into the ordinary fields of the header (APPNOTE.TXT section 4).

The extra field is a length-value structure: if we increase the length field without adding to the value, then it will grow to include whatever comes after it in the zip file—namely the next local file header. Each local file header "quotes" the local file headers that follow it by enclosing them within its own extra field.

This quoting does not chain: each local file header must enclose not only the immediately next header but all the headers that follow it, so the extra fields increase in length as they get closer to the beginning of the zip file. We want a header ID for the extra field that will make parsers ignore the quoted data, not try to interpret it as meaningful metadata. Zip parsers are supposed to ignore unknown header IDs, so we could choose one at random, but there is the risk that the ID may be allocated in the future, breaking compatibility.
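
To make the length trick described above concrete, here is a sketch of building such an extra-field entry with Python's struct module. The header ID 0x999a is only a placeholder, not the ID actually chosen for the construction, and the size arithmetic is illustrative: a real generator must account for the exact length of every quoted header, filename, and nested extra field.

    import struct

    def quoting_extra_field(header_id, quoted_len):
        """One extra-field entry: 16-bit ID, 16-bit data size, and NO data bytes.

        A parser that honors the declared size will treat the next quoted_len
        bytes of the zip file -- the local file headers that follow -- as this
        entry's payload and skip over them."""
        if not 0 <= quoted_len <= 0xFFFF:
            raise ValueError("extra-field data size is a 16-bit field")
        return struct.pack("<HH", header_id, quoted_len)   # little-endian per APPNOTE.TXT

    # Hypothetical example: swallow the next two 30-byte local file headers,
    # each with a 4-byte filename.  The enclosing header's own "extra field
    # length" field must then be 4 + quoted_len.
    field = quoting_extra_field(0x999a, 2 * (30 + 4))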

The figure illustrates the possibility of combining extra-field quoting with bzip2, with and without Zip64. Both "extra-field-quoted bzip2" lines have a knee at which the growth transitions from quadratic to linear. Without Zip64, the line stops completely when the number of files exhausts the room in the extra field. In the Zip64 case, the knee occurs once the number of files reaches its maximum, after which the size of the files can still be increased, but not their number.

This quoting technique increases the compression ratio of zbsm.zip. The idea of overlapping files is not new: Gynvael Coldwind has previously suggested it (slide 47), as have Pellegrino et al. We have designed the quoted-overlap zip bomb construction for compatibility, taking into consideration a number of differences among zip parser implementations.

The resulting construction is compatible with zip parsers that work in the usual back-to-front way, first consulting the central directory and using it as an index of files. Among these is the example zip parser included in Nail, which is automatically generated from a formal grammar.

The construction is not compatible, however, with "streaming" parsers, those that parse the zip file from beginning to end in one pass without first reading the central directory. By their nature, streaming parsers do not permit any kind of file overlapping. The most likely outcome is that they will extract only the first file.

They may even raise an error, as is the case with sunzip, which parses the central directory at the end and checks it for consistency with the local file headers it has already seen.

If you need the extracted files to start with a certain prefix (so that they will be identified as a certain file type, for example), you can insert a data-carrying DEFLATE block just before the block that quotes the next header.

Not every file has to participate in the bomb construction: you can include ordinary files alongside the bomb files if you need the zip file to conform to some higher-level format. The source code has a --template option to facilitate this use case. PDF, for example, is in many ways similar to zip, and it is conceivable that similar tricks could be applied there.

The accompanying program is not concerned with displaying any details of the compressed data stored in the zip file, only the zip metadata. Error handling is still a work in progress.

If the program encounters a problem reading a zip file, it is likely to terminate with an unhelpful error message.

Test integrity of ZIP file? One commenter asks: what about unzip -t? (Same behavior as zip.) Another notes that there are two CRCs per file in a zip archive: one in the local file header and one in the central directory.
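
For completeness, a small Python equivalent of unzip -t using the standard library's zipfile module (note that this reads and CRC-checks every member, so pointing it at an overlapped zip bomb makes it do exactly the enormous amount of work the bomb is designed to cause):

    import sys
    import zipfile

    def test_zip(path):
        """Check an archive: parse the headers and verify every member's CRC."""
        try:
            with zipfile.ZipFile(path) as zf:
                bad = zf.testzip()      # name of the first corrupt member, or None
        except zipfile.BadZipFile as exc:
            return f"{path}: not a valid zip file ({exc})"
        if bad is not None:
            return f"{path}: failed CRC or header check on member {bad!r}"
        return f"{path}: OK"

    if __name__ == "__main__":
        for p in sys.argv[1:]:
            print(test_zip(p))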

With Zipware, you can create simple and self-extracting archives with public and encrypted data, split them into multiple linked volumes of different sizes, and convert them to the most common file formats such as ZIP and 7z.

There are five degrees of file compression to choose from, and each level determines the size of the resulting archives. Zipware has a very simple interface. Moreover, all the data you work with is protected by the AES and ZipCrypto encryption algorithms.

Verdict: I included Haozip on my list of the best free unzip programs because of its powerful features.

This software can run on 32-bit and 64-bit Windows operating systems and allows you to compress, unpack and extract almost all popular archive types.

Due to its simple and multilingual interface, this archiving tool is a perfect option for both amateurs and professionals.

Verdict: Zipeg is a free program with an intuitive interface designed specifically for extracting files from compressed archives. With Zipeg, you can view the content of the archive and extract necessary files. Thus, it is possible to save hard drive space. Zipeg allows you to control both the process of extracting the content of an archive and the process of data decompression.

The program is great for professional use as it works even with rare file types.

Verdict: Xarchiver is an open-source archive manager with a simple GUI, developed specifically for Linux and BSD-based systems. This free zip software comes with impressive functionality. It allows you to combine multiple files into one archive as well as compress, read, and unpack files and create multi-volume archives. Using this program, you can perform multi-volume data archiving as well as simple tasks like viewing and extracting files.

KGB has a slightly higher compression ratio compared to popular formats like 7Z and RAR, but it requires more time to archive and extract files. This program can create self-extracting archives and uses AES encryption. Currently, AES offers one of the safest encryption methods for data protection available on the market. KGB Archiver is a very user-friendly program.


