How to recover data from .7z files

Suppose you have a .7z file, and the archive is reported as “corrupt”. Even if the data itself is not corrupt and the file is only missing the end of the archive, you will get an error. Here is how you can recover at least some of the data from such a file.

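The first step is to take the bad file and work out how long the archive is supposed to be. The 32-byte header at the start of the file contains a pointer to the end header. At offset 0x0C from the start of the file is the offset of the end header, stored in 8 bytes (little-endian, counted from the end of the 32-byte header). The next 8 bytes, at offset 0x14, hold the length of the end header. The end header is the last thing in the file, so the expected file size is 32 + offset + length.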
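The size calculation can be sketched in Python. This is a minimal sketch, assuming the usual 7z layout in which the file starts with a 32-byte signature header and the end-header offset is counted from the end of those 32 bytes. The function names are hypothetical and chosen only for this example.

```python
import struct

SIGNATURE = b"7z\xbc\xaf\x27\x1c"  # the six magic bytes at the start of a .7z file
HEADER_SIZE = 32                   # size of the fixed header at the start of the file


def expected_archive_size(header: bytes) -> int:
    """Return the size the archive should have, based on its first 32 bytes."""
    if len(header) < HEADER_SIZE:
        raise ValueError("need at least the first 32 bytes of the archive")
    if header[:6] != SIGNATURE:
        raise ValueError("this does not look like a .7z file")
    # Two little-endian 64-bit integers at offset 0x0C:
    # the end-header offset, then the end-header length.
    end_offset, end_length = struct.unpack_from("<QQ", header, 0x0C)
    return HEADER_SIZE + end_offset + end_length


def expected_size_of_file(path: str) -> int:
    """Convenience wrapper (hypothetical name): read the header from disk."""
    with open(path, "rb") as f:
        return expected_archive_size(f.read(HEADER_SIZE))
```

Comparing the result with the actual size of the bad file shows how many bytes are missing from the end.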

Now that you know how big the file was supposed to be, you can recover all of the data that you have, but all of the files will be concatenated together. You don’t know the length of any of them, or even how many separate files there are. The important thing, though, is that you can extract the data.

The next step is to create a new archive that, once compressed, is larger than the original archive. You must use a single file as the source. Random data makes this easy: it will not compress, so you won’t have to guess at how big to make it. The important part is to use a dictionary size that is larger than the original archive. Be careful about simply using the maximum dictionary size, because the larger the dictionary, the longer compression will take. The compression type needs to match the original as well (LZMA, LZMA2, etc.).
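Producing the random source file can be sketched as follows. This is only an illustration: the function name, the chunk size, and the file name in the usage note are assumptions, and the target size should be chosen so that the resulting archive ends up larger than the original one was supposed to be.

```python
import os


def write_random_file(path: str, size: int, chunk_size: int = 1 << 20) -> None:
    """Write `size` bytes of incompressible random data to `path`."""
    remaining = size
    with open(path, "wb") as out:
        while remaining > 0:
            n = min(chunk_size, remaining)
            out.write(os.urandom(n))  # random bytes will not compress
            remaining -= n
```

For example, `write_random_file("filler.bin", 200 * 1024 * 1024)` would create a 200 MiB source file. Compressing it into the good archive is still done with your usual 7z tool, using the matching compression type and a suitable dictionary size.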

Now you will have two archives, the bad one and a larger good one. To extract the data you need to take the header from the good file, the compressed stream from the bad file, and the remaining data from the good file, and put each of them into its own file.

To do this, first take the first 32 bytes of the good file and save them as their own file. Then remove the first 32 bytes of the bad file and save the remaining bytes as the compressed data stream. Finally, add the size of the header (32 bytes) to the size of the compressed data stream, skip that far into the good file, and save everything after that point into a third file.
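The three-way split described above can be sketched in Python. This is a minimal sketch under a few assumptions: the function name and the `base` parameter are hypothetical, and the parts are written directly with the .7z.001, .7z.002 and .7z.003 extensions used in the next step.

```python
import shutil

HEADER_SIZE = 32  # size of the fixed header at the start of a .7z file


def build_parts(good_path: str, bad_path: str, base: str):
    """Write base.7z.001/.002/.003 from the good and bad archives."""
    part1, part2, part3 = (f"{base}.7z.00{i}" for i in (1, 2, 3))
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        # Part 1: the 32-byte header from the good archive.
        with open(part1, "wb") as out:
            out.write(good.read(HEADER_SIZE))
        # Part 2: everything in the bad archive after its own header.
        bad.seek(HEADER_SIZE)
        with open(part2, "wb") as out:
            shutil.copyfileobj(bad, out)
            stream_size = out.tell()
        # Part 3: the good archive from (header + stream size) to the end.
        good.seek(HEADER_SIZE + stream_size)
        with open(part3, "wb") as out:
            shutil.copyfileobj(good, out)
    return part1, part2, part3
```

The three parts together have exactly the same length as the good archive, which is why its header and end header still line up once the pieces are joined.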

You now have a valid header, a valid data stream, and the remaining data, which has a broken data stream but a valid dictionary and end header. Give all three files the same name, with the extensions .7z.001, .7z.002, and .7z.003. Use p7zip (or your favorite tool) to open the .001 file. It will extract a file with the filename from the good archive, but the data from the bad file. When it reaches the broken compression stream in the .003 file you will get a CRC error, but you will be left with all of your missing data in a single file.