pv: Progress Bar for md5sum et al.

Tools that know they will take a long time often come with a built-in progress indicator, but there are other utilities on Linux that often leave the user frustratedly tapping their fingers, wondering how much longer they will have to wait.

Luckily, there is a nifty little tool called pv that will lend a progress bar to any program that can read from standard input or a pipe. pv stands for Pipe Viewer.

1. Simple example: figure out how long an md5sum will take:

pv eternal.avi | md5sum

will display something like

96.5MB 0:00:05 [25.3MB/s] [=======>                                    ]  9% ETA 0:00:48

Notes:

  • pv reads from the file and writes its contents to stdout.
  • md5sum reads from stdin.
  • pv outputs the progress bar to stderr so as not to interfere with the piped data. See the man page for ways to customize pv’s output.
  • since the bottleneck of such an operation is usually the media you’re reading from, not the CPU, pv adds no noticeable overhead.
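
If you want to convince yourself that the progress bar really stays out of the data, here is a minimal sketch that hashes a file directly and through pv, then compares the results. The temporary test file and the variable names are made up for illustration, and the script falls back to cat when pv is not installed so the comparison still runs.

```shell
#!/bin/sh
# Sketch: check that pv passes data through unchanged on stdout
# while its progress bar goes to stderr.
# Assumption: the 1 MiB throwaway file below stands in for eternal.avi.
sample=$(mktemp)
head -c 1048576 /dev/urandom > "$sample"

# Use pv when available; otherwise fall back to cat (hypothetical stand-in).
if command -v pv >/dev/null 2>&1; then viewer=pv; else viewer=cat; fi

# Hash the file directly from stdin.
sum_direct=$(md5sum < "$sample")

# Hash it through the viewer; 2>/dev/null hides the progress bar only.
sum_piped=$("$viewer" "$sample" 2>/dev/null | md5sum)

if [ "$sum_direct" = "$sum_piped" ]; then
    echo "checksums match"
else
    echo "checksums differ"
fi

rm -f "$sample"
```

Discarding stderr removes the bar but leaves the hash untouched, which is what you would expect if the data only ever travels on stdout.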

2. Complex example: add a progress bar to tar/bzip2 compression/decompression:

tar cf - mydir | pv -n -s $(du -sb mydir | awk '{print $1}') | bzip2 >mydir.tar.bz2

Notes:

  • this example is adapted from the pv man page.
  • the -n switch makes pv print plain integer percentages, one per line, instead of drawing a bar.
  • no file is passed to pv, so it reads from stdin, which is connected to the output of tar.
  • on a system with good cache and enough memory, doing the extra du -sb mydir shouldn’t hurt much, since tar will go through the entire directory anyway.
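
The size handed to -s is only an estimate of what tar will actually send. The sketch below builds a small demo directory, asks du for its byte count, and then counts the bytes that really come out of tar. The directory layout, file names and the percent.log file are assumptions for illustration, du -sb is the GNU form, and the script falls back to cat when pv is missing.

```shell
#!/bin/sh
# Sketch: compare du's byte count with the size of the tar stream pv sees.
# Assumption: the demo directory and file names below are made up.
work=$(mktemp -d)
mkdir "$work/mydir"
head -c 2097152 /dev/urandom > "$work/mydir/data.bin"   # 2 MiB of test data

# GNU du -sb prints "<bytes><TAB><path>"; awk keeps only the byte count.
size=$(du -sb "$work/mydir" | awk '{print $1}')

# Use pv when available; otherwise fall back to cat so the script still runs.
if command -v pv >/dev/null 2>&1; then
    # -n writes integer percentages to stderr; keep them in a log file.
    progress() { pv -n -s "$size" 2>"$work/percent.log"; }
else
    progress() { cat; }
fi

# Count the bytes tar really emits after they pass through the viewer.
stream=$(tar cf - -C "$work" mydir | progress | wc -c | tr -d ' ')

echo "du estimate: $size bytes, tar stream: $stream bytes"
rm -rf "$work"
```

The two numbers rarely agree exactly: tar adds headers and padding, and du also counts the directory entry itself. That is why the percentage is a close estimate rather than an exact figure.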

Now let’s decompress it:

pv mydir.tar.bz2 | tar xjf -

By now you realize how awesome this is.

3. Fun example: measure /dev/null throughput:

pv /dev/zero >/dev/null

is close to 3.3GB/s on my 3-year-old system.

Notes:

  • this is not a benchmark ™.
  • pv can’t know the size of its input in this case (infinity), so it obviously can’t display an ETA.
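
If you do want a percentage and an ETA for a stream like this, one option is to cap it and tell pv how long it will be. This is a rough sketch: the 100 MiB figure is arbitrary, the variable names are made up, and the script falls back to cat when pv is not installed.

```shell
#!/bin/sh
# Sketch: bound the endless /dev/zero stream and give pv its length via -s.
bytes=104857600   # 100 MiB, an arbitrary size chosen for illustration

# Use pv when available; otherwise fall back to cat so the script still runs.
if command -v pv >/dev/null 2>&1; then
    progress() { pv -s "$bytes"; }
else
    progress() { cat; }
fi

# head stops after $bytes bytes; wc -c confirms how much passed through.
passed=$(head -c "$bytes" /dev/zero | progress | wc -c | tr -d ' ')

echo "passed $passed bytes"
```

With the total known up front, pv has something to divide by, so the percentage and ETA can be shown.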

pv is a brilliant example of the UNIX philosophy: simple puzzle pieces combining to create useful results. A couple of last-word remarks:

  • There is apparently a very similar tool called cpipe.
  • There are, unfortunately, programs for which you will not be able to use pv. One example is dpkg, which apparently tries seeking in its input, thus failing to work with pipes.
  • Thanks to my boss for pointing me to this awesome tool.

4 Responses to pv: Progress Bar for md5sum et al.

  1. Alex says:

    Interesting tool, I ask myself how it can tell how much work there is left to do.

    I can understand how it can determine the performance (the MB/s part); + you answered my question partially here: “the size of its input in this case (infinity), so it obviously can’t display an ETA.”

    But even if the input is known, is there a special convention that says that “my program’s progress will be the last number printed on the screen” (so that another program can parse that and use it)? How does it know the ETA?

    What if my program is doing something else? An actual example: a program compresses a file and, once it is compressed, encrypts it. Encryption and compression happen at different speeds. If the file is large, it will take a while until the data are compressed (in the meantime pv’s estimation will be based only on samples taken during compression), and then, when encryption kicks in, the input will not only be smaller (because the file is now compressed) but also different.

    So, how does pv deal with the cases in which it is not that easy to estimate progress?

  2. Constantin says:

    It can only do two things:
    1) get the size of the data that is being piped through (e.g. if it’s a file, it knows its size);
    2) divide the amount of data piped through by the argument to -s.
    (it’s only a pipe throughput measurement tool, no magic here.)

    This might seem limiting at first sight, and if you have a program that compresses + encrypts, pv will probably make inaccurate predictions. But if you follow the Unix philosophy, you would have separate tools for compression and encryption, and pv supports multiple progress bars: http://ivarch.com/programs/pv.shtml

    A side-note about inaccurate progress bars: when installing Linux packages, download speed and ETA can easily be determined, but actually installing a 200MB package accounts for the same amount of pixels on the progress bar as a 200KB package, even if the times vary widely. I guess a smart program could assume that the time necessary to install a package is approximately proportional to its size, but there are a lot of things for which we will never get accurate progress bars…

  3. Daniel Serodio says:

    How could pv work with folders instead of files, e.g. “tar cvf /”?

  4. Constantin says:

    Daniel,

    in the example above, pv is given the size of the directory explicitly (by calling du).
