
How Much Faster Is Making A Tar Archive Without Gzip?


aum


About Gzip And Tar


Everybody on Linux and BSD seems to use a program called gzip, frequently in conjunction with another program called tar. Tar, named from Tape ARchive, is a program which copies files and folders (“directories”) into a single archive in a format originally designed for magnetic tape. But tar archives are not limited to tape: they can be saved to ordinary hard drives, solid state drives, NVMe drives, and more.


When making an archive, people frequently want to minimize the archive’s size. That’s where gzip comes into play. Gzip reduces the size of the archive so it takes up less storage space. Later, the gzipped tar archive can be “unzipped,” which restores it to its original size. While unzipping, the tar program can be used again to “extract” or “untar” the archive. Extraction hopefully restores the archived files exactly as they were when the archive was created.
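A minimal round trip looks something like this (a sketch; “myproject” is just a stand-in directory name, not anything from Darkstar):

# create a gzipped tar archive of the directory "myproject"
tar czvf myproject.tgz myproject

# later: decompress and extract in one step, restoring the files
tar xzvf myproject.tgz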


Besides archiving for long-term storage, many people use tar and gzip for short-term backups. For example, on my server, Darkstar, I compile and install many programs. Before compiling, I use tar to make a short-term backup of how things were before the compile and install.

Three Good Reasons To Compile

First, compiling gets us the most current source code for the programs. Second, once we have compiled a few times, building a program from its latest sources can be easier than figuring out how to install an often older version with our distribution’s package manager. Third, compiling ourselves means the program sources are readily available.


The programs that I compile on Darkstar usually live in /usr/local. Before I put a new program into /usr/local, I like (in addition to my regular backups of Darkstar) to make an archive of /usr/local as it exists just before the new software addition. With a handy /usr/local archive, if something goes crazy wrong during my new install, it’s easy to revert.
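One way that revert might look (a sketch, using the same archive name that appears in the transcript below):

# snapshot /usr/local before installing the new software
cd /usr
tar czf local-revert.tgz local

# if the install goes wrong, move the broken tree aside
# and restore the snapshot
mv local local.broken
tar xzf local-revert.tgz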

Creating Pre-Compile Backups Can Take Too Long

Lately, as more software has been added to /usr/local, it’s been taking too long to make the pre-compile archive, about half an hour.

Recently, I used the top(1) command to watch while an archive was being created. I noticed that gzip was reported as using 100% of one CPU for the entire time.

How Much Faster And Bigger Are Plain Tar Archives Made Without Gzip?

I wondered how the overall time required to make my pre-compile archive would change if I did not use gzip. I also wondered how much bigger the archive would be. Below are the data and the analysis. The creation time difference I found is surprisingly large; the archive size difference is also substantial, but nowhere near as dramatic.

Creation Time Data

I created the pre-compile archive twice, once with gzip and once without, and made a line-numbered transcript of both runs.


000023 root@darkstar:/usr# time tar cvzf local-revert.tgz local
000024 local/
[ . . . ]
401625 local/include/gforth/0.7.3/config.h
401626
401627 real 28m11.063s
401628 user 27m1.436s
401629 sys 1m21.425s
401630 root@darkstar:/usr# time tar cvf local-revert.tar local
401631 local/
[ . . . ]
803232 local/include/gforth/0.7.3/config.h
803233
803234 real 1m14.494s
803235 user 0m4.409s
803236 sys 0m46.376s
803237 root@darkstar:/usr#


This Stack Overflow post explains the differences between the real, user, and sys times reported by the time(1) command. The “real” time is wall clock time, so it shows how long our command took to finish.

Gzip Took 22 Times Longer!

Here we can see that making the archive with gzip took approximately 28 minutes, while making the archive without gzip took only about 1.25 minutes. The gzipped archive took roughly 22 times as long to create as the plain, uncompressed tar archive!
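The ratio comes straight from the two “real” times in the transcript:

with gzip:    28m 11.063s ≈ 28 × 60 + 11 = 1691 seconds
without gzip:  1m 14.494s ≈ 60 + 14.5    = 74.5 seconds

1691 / 74.5 ≈ 22.7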

Archive Size Data

Now let’s check the archive sizes.

root@darkstar:/usr# ls -lh local-revert.t*
-rw-r--r-- 1 root root 22G Oct 4 05:22 local-revert.tar
-rw-r--r-- 1 root root 10G Oct 4 05:20 local-revert.tgz
root@darkstar:/usr#


The gzipped archive is 10 gigabytes and the plain, uncompressed tar archive is 22 gigabytes.

Gzip’s Compression Was 55%

The gzipped archive is about 55% smaller than the plain tar archive. That’s a lot of compression!
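Using the sizes reported by ls (which are rounded to whole gigabytes), the reduction works out roughly as:

(22 GB - 10 GB) / 22 GB ≈ 0.55, or about a 55% reduction in size.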

Conclusion

On Darkstar, there is abundant extra disk space. So having an archive that is twice as big but created 22 times faster might be the best choice. Going forward, I will skip compression entirely when making the pre-compile backup of /usr/local for reverting. Now I won’t have to wait that half an hour any more!

Additional Reflections

Creation time and archive size results would be expected to differ according to the types of files involved. For example, unlike the files in Darkstar’s /usr/local, many image file formats already are compressed, so additional compression doesn’t reduce their size very much.


As I was preparing this article, I found out about pigz. Pigz (pronounced “pig-zee”) is a parallel implementation of gzip that can take advantage of multicore processors. Maybe pigz soon will be a new neighbor in Darkstar’s /usr/local.
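If pigz were installed, GNU tar could be told to use it in place of gzip (a sketch; the output is still an ordinary gzip-format .tgz):

# compress with pigz, which spreads the work across all cores
cd /usr
tar --use-compress-program=pigz -cvf local-revert.tgz local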


Another approach to changing the speed and size trade-off is to use a different compression program in place of gzip. There are quite a few popular ones, such as bzip2 and xz, and some of them can run multi-threaded. These other compression programs can be called with tar’s -I option.
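For example (a sketch, assuming a reasonably recent GNU tar; the -T0 flag lets xz use every core):

# call xz through tar's -I option
tar -I 'xz -T0' -cvf local-revert.tar.xz local

# GNU tar also has shorthand flags: -j for bzip2 and -J for xz
tar -cjvf local-revert.tar.bz2 local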


Of course it is one thing to change the compression program with tar’s -I option and another thing to make tar itself work in parallel. Here is a Stack Exchange post about tarring in parallel. I will have to try that.
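One possible shape for that, not necessarily what the linked post recommends (a sketch, assuming GNU parallel is installed):

# make one gzipped archive per top-level directory of /usr/local,
# running as many tar+gzip jobs at once as there are cores
# (note: files sitting directly in local/ itself are not captured here)
cd /usr
find local -mindepth 1 -maxdepth 1 -type d -print0 \
  | parallel -0 tar czf {/}.tgz {}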


Finally, unlike when we obtain our sources and our compiled programs separately, it seems clear that the sources we compile ourselves are the sources of the programs we’re actually running. However, way back in 1984, Ken Thompson pointed out that the programs we compile ourselves can sometimes be very different from what we expect.


Source



The simple reason gzip + tar took longer is this: tar does not compress the files, it just stores them (that’s the .tar).

Gzip + tar simply works like this: tar archives the files into a single .tar file, then gzip compresses that archive, producing a .tgz.


Funny thing: if you take that tar file and compress it with, say, 7-Zip or RAR, you would get a higher compression ratio than 55%, closer to 75%.
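If anyone wants to try that comparison (a sketch, assuming the p7zip command line tool is installed):

# recompress the existing plain tar archive with 7-Zip's LZMA
7z a local-revert.tar.7z local-revert.tar

# compare the sizes
ls -lh local-revert.tar local-revert.tar.7z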


Edited by andy2004


As mentioned, tar does not compress.  The compression utilities cannot handle folders (aka directories).  They can only compress a single file.  So, you combine the folders into a single file by passing them through tar, then you compress the tar file using one of the compression utilities.
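That pipeline can also be written out explicitly, which makes the division of labour visible (equivalent in effect to tar’s z flag):

# tar bundles the directory into one stream, gzip compresses that stream
tar cf - local | gzip > local-revert.tgz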


Modern zip and 7-zip have been updated to handle folders without a tar being created.


