How-To Geek Forums / Geek Stuff
File compression
Think of it like a piece of furniture you buy in a box about 36" x 10" x 4". When you open it and follow the instructions, you get a bookcase that is 3' x 1' x 5'. It's the instructions that take files apart and put them back together again. These instructions are called algorithms. Different compression programs use different algorithms, though some formats share one (ZIP and gzip both commonly use DEFLATE).
Compression is not difficult to do or understand. Basically, you look for repeated patterns of bits or characters. In a simple scheme, each entry in the compressed file starts with two numbers, where the next 'pattern' starts and the count of bytes in the pattern, followed by the pattern itself. A self-extracting compressed file has a lead program that does the unpacking; ZIP and 7-Zip files instead carry headers that describe the format and where the compressed data starts.
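To make the "count plus pattern" idea concrete, here is a minimal sketch of run-length encoding in Python. It is not what ZIP or 7-Zip actually do (they use more elaborate algorithms), and the function names `rle_encode` and `rle_decode` are made up for this illustration. Each run of identical bytes is stored as a one-byte count followed by the byte itself.

```python
def rle_encode(data: bytes) -> bytes:
    """Store each run of identical bytes as a (count, value) pair; count is 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the next byte matches; cap at 255 so the count fits in one byte.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Rebuild the original bytes by repeating each value 'count' times."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        count, value = encoded[j], encoded[j + 1]
        out += bytes([value]) * count
    return bytes(out)


sample = b"A" * 40 + b"BC" + b"\x00" * 200
packed = rle_encode(sample)
print(len(sample), len(packed))  # 242 8
```

Note the flip side: data with no repeats gets bigger under this scheme, because every single byte still costs a count byte. `rle_encode(b"abc")` is 6 bytes long.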
Not all files benefit from compression. Pictures in compressed formats such as JPEG, and other already compressed files, usually shrink very little and can even grow slightly (picture formats that reduce 'resolution' or detail use a different, lossy type of compression).
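You can see this for yourself with Python's standard `zlib` module. In the sketch below, random bytes stand in for an already compressed file (an assumption made for the demo, since well-compressed data looks close to random), and a repetitive piece of text stands in for an ordinary document.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so every run uses the same bytes
# Random bytes: a stand-in (assumption) for already-compressed data.
noise = bytes(rng.getrandbits(8) for _ in range(4096))
# Highly repetitive text: the kind of data that compresses well.
text = b"the quick brown fox jumps over the lazy dog\n" * 100

noise_packed = zlib.compress(noise)
text_packed = zlib.compress(text)

print("text :", len(text), "->", len(text_packed))
print("noise:", len(noise), "->", len(noise_packed))
```

The text shrinks to a small fraction of its size, while the random block comes out a few bytes larger than it went in, because the compressor still has to add its own bookkeeping.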
An easy way to think of this is to consider a document. Generally speaking, you will not repeat many characters in a row. However, some documents fill out every line to its full width, blank lines included. In those cases you'd see many blanks in a row, and that is where the savings come from.
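As a rough illustration of the padded-document case, the sketch below pads a few short lines out to 80 columns (the width and the sample lines are assumptions chosen for the demo) and then compresses the result with Python's `zlib` module.

```python
import zlib

lines = ["Name: Sample", "Item: bookcase", "", "Qty: 1"]

# The same document stored two ways: trimmed, and with every line filled to 80 columns.
trimmed = "\n".join(lines).encode()
padded = "\n".join(line.ljust(80) for line in lines).encode()

padded_packed = zlib.compress(padded)
print(len(trimmed), len(padded), len(padded_packed))
```

The padded version is roughly ten times the size of the trimmed one, but almost all of that extra is blanks, so the compressed size lands far below the padded size.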
Install files contain many files once 'decompressed', and those files, like any file on your computer, sit on the disk in 'clusters'. 4096 bytes is a 'normal' cluster size. Great, but a 10-byte file also takes up a whole cluster, with the last 4086 bytes acting as 'zero padding'. (Strictly speaking, that slack is disk allocation rather than file content, so an archiver only ever reads the 10 real bytes.) In the simple scheme described earlier, that file would turn into a header of maybe 10 bytes, 2 bytes for the data header (assuming no dups), the actual data for the file (10 bytes), another 2 bytes of counters, one indicating the last field, then a count of 4086 followed by a single 0. So the size in the compressed file for this file would be about 25 bytes (10 + 2 + 10 + 2 + 1) instead of 4096. Of course, if there were text data such as HELP or even screen text, there could be significant savings there.
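The arithmetic for the 10-byte file can be checked in a few lines of Python. The byte counts are the rough figures from the post, and the split of the last three bytes (2 bytes of counters plus 1 byte for the repeated 0) is one reading of it, not a real archive format.

```python
CLUSTER = 4096        # typical cluster size in bytes
data_len = 10         # real bytes in the tiny file

file_header = 10      # "maybe 10 bytes" of per-file header (rough figure)
data_header = 2       # counters in front of the real data
padding_counters = 2  # counters describing the padding run (assumed reading)
padding_value = 1     # the single 0 byte that gets repeated

padding_len = CLUSTER - data_len
compressed = file_header + data_header + data_len + padding_counters + padding_value

print(padding_len)                      # 4086
print(compressed)                       # 25
print(f"{compressed / CLUSTER:.1%}")    # 0.6%
```

So under those assumptions the entry takes well under 1% of the space the file occupies on disk.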
You can probably find more details by Googling 'Disk Compression' or 'File Compression'.
Irv S.
