Feature: System Administration

Using Bonnie++ for filesystem performance benchmarking

By Ben Martin on July 01, 2008 (9:00:00 AM)

Bonnie++ allows you to benchmark how your filesystems perform various tasks, which makes it a valuable tool when you are making changes to how your RAID is set up, how your filesystems are created, or how your network filesystems perform.

Bonnie++ is available as a 1-Click install for openSUSE 10.3, as a package for Ubuntu Hardy, and in the standard Fedora 9 repositories. I installed Bonnie++ from the 64-bit Fedora 9 repositories.

The packages for Ubuntu and Fedora both install Bonnie++ into /usr/sbin, while openSUSE installs it into /usr/bin. Bonnie++ complains and refuses to run when invoked as the root user, and because the Ubuntu and Fedora packages place it in /usr/sbin rather than /usr/bin, a regular user will probably have to invoke it by its full path. Bonnie++ uses autoconf to generate its Makefile, and the install-bin target is hardwired to install the binary into sbin, so even when building from source you have to move the binary into /usr/bin after installation if that is where you want it.
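
If you do build from source, the sequence looks roughly like the following sketch. The tarball name and version are only examples, and the exact install paths depend on the prefix you pass to configure:

$ tar xzf bonnie++-1.03e.tgz        # example tarball name; use whatever version you downloaded
$ cd bonnie++-1.03e
$ ./configure
$ make
$ su -c 'make install-bin'          # install-bin is hardwired to put the binary under .../sbin
$ su -c 'mv /usr/local/sbin/bonnie++ /usr/local/bin/'   # move it yourself if you want it in bin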

Bonnie++ benchmarks three things: data read and write speed, the number of seeks that can be performed per second, and the number of file metadata operations that can be performed per second. Metadata operations include file creation and deletion as well as getting metadata such as the file size or owner (the result of an fstat(2) call).

If you are deciding which filesystem to use for /tmp, then metadata throughput might be the most important benchmark. On a filesystem where you expand 20MB tarballs, write performance may be less important than the number of files you can create per second. The file metadata benchmarks are also important if you are planning to run a Squid web proxy or a mail server that uses the maildir format to store each email in a single file. Such applications perform many metadata-intensive operations, and the individual files are often fairly small, so bulk transfer speeds do not play as large a role as metadata updates.

The metadata benchmarks are also important if you are creating a new filesystem on top of a RAID device. Journaling filesystems can use write barriers to protect their journaled metadata. If you are using a hardware card to provide RAID functionality, these barriers might force the entire cache on the RAID card and all disks assembled into the RAID on that card to be synced, which can lead to extremely poor performance. As a real-world example, an Adaptec 31205 12-port card with a RAID-6 and XFS using barriers can support less than 100 file creates per second in tests I recently performed. Explicitly disabling barriers in XFS when mounting the same filesystem gives closer to 6,000 file creates per second. Though I'm not advocating disabling barriers in XFS, in this particular hardware configuration it could be done without data loss risk.
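
If you want to measure the effect of barriers on your own hardware, you can benchmark the same XFS filesystem mounted with and without them. The device and mount point below are examples only; nobarrier is the XFS mount option that disables write barriers:

$ mount -o nobarrier /dev/md0 /mnt/raid      # /dev/md0 and /mnt/raid are example names
$ /usr/sbin/bonnie++ -d /mnt/raid -n 256

Remount without the option (barriers are on by default for XFS) and run the same benchmark again to compare the file create figures.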

The number of seeks per second should be largely bound by your hardware. If you test a single disk and then a RAID, you should expect a filesystem on the RAID to deliver an increase in the number of seeks in proportion to how the RAID is configured. For example, I configured a single-disk volume and RAID-0 stripe sets with two and with six disks. The single disk managed about 200 seeks per second, the two-disk stripe 340 seeks per second, and the six-disk stripe 533 seeks per second.
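
A sketch of how such a comparison might be set up with Linux software RAID follows; the device names, filesystem choice, and mount point are examples only:

$ mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1   # two-disk stripe
$ mkfs.xfs /dev/md0                                                        # any filesystem will do
$ mount /dev/md0 /mnt/test
$ /usr/sbin/bonnie++ -d /mnt/test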

The raw data read and write benchmarks include both a per-character and a per-block figure. The former uses standard library calls that perform the read or write operations a single character at a time, while the latter uses calls that transfer larger blocks at once.

The rewrite test is important if you are running applications that modify data in place, and doubly important if you are running such applications on a parity RAID (such as RAID-5 or RAID-6). The rewrite test reads a block of data, changes it slightly, and writes it back. The blocks are documented to be BUFSIZ bytes in size, which on my 64-bit Fedora 9 installation is 8,192 bytes.
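
If you want to check what BUFSIZ is on your own system, a trivial C program compiled from the shell will print it; this is just a sketch, and the 8,192 figure is what my Fedora 9 installation reports:

$ cat > bufsiz.c <<'EOF'
#include <stdio.h>
int main(void) { printf("%d\n", BUFSIZ); return 0; }
EOF
$ gcc bufsiz.c -o bufsiz && ./bufsiz
8192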

To see why the rewrite test is important on a parity RAID, imagine that you are creating a RAID-5 using Linux software RAID on four disks. The RAID will be created by default with a 64 kilobyte (KB) chunk size, which means that across the four disks there will be three 64KB data chunks and one 64KB parity chunk, as shown in the diagram above, where disks 1-3 hold data and disk 4 stores the parity. Of course, the disk that stores the parity changes depending on which band in the diagram you are accessing. If you modify only 8KB, as the rewrite test does, shown as the red rectangle in the figure, then you force the 64KB parity chunk to be recalculated for this modification and the 64KB data chunk to be written back along with the new 64KB parity chunk. In the case of the diagram, both the dark gray band on disk 1 and the pink parity band on disk 4 have to be written after the parity is recalculated.
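
A sketch of creating such an array with Linux software RAID; the device names are examples only, and --chunk=64 simply makes the default 64KB chunk size explicit:

$ mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1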

You might see some of the metadata benchmarks reported by Bonnie++ as +++++ instead of a real number per second. This happens when that particular benchmark completes too quickly. To overcome this when benchmarking a particular setup, use the -n option to specify that more files should be used for the metadata tests. With the -n option you can specify up to four parameters. The first is the number of files to create per directory, specified in multiples of 1,024; the next two numbers are the maximum and minimum size of each file used for testing; and the last is how many directories to create, each containing the number of files you nominated with the first parameter. The defaults are to create 16,384 files of 0 bytes in a single directory, which is equivalent to passing -n 16:0:0:1 to Bonnie++.
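
For example, a run like the following (the directory and the particular numbers are illustrative only) uses more and slightly larger files for the metadata tests:

$ /usr/sbin/bonnie++ -d /mnt/test -n 128:4096:0:8   # 128 (x1,024) files, sizes 0-4,096 bytes, 8 directories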

You can process the comma-separated output at the bottom of a Bonnie++ run with the bon_csv2html command to format your benchmark results for presentation on the Web. You might also like to use the -q option to Bonnie++ to redirect everything but the comma-separated data to stderr so that you can pipe the stdout of Bonnie++ directly into bon_csv2html to generate HTML output.
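
Combining the two, you can run Bonnie++ quietly and generate the HTML report in one step; the directory and output file name here are just examples, and paths may vary by distribution:

$ /usr/sbin/bonnie++ -q -d /tmp/foo | bon_csv2html > bonnie-results.html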

By default the name of the machine you're testing is reported for the benchmark run. You can override this with the -m option to record not only the machine name but also information about the filesystem configuration itself.
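
For example, you might encode the RAID level, chunk size, and filesystem into the name so that the results table identifies the configuration; the label below is made up:

$ /usr/sbin/bonnie++ -d /mnt/test -m "chunklog-raid5-64k-xfs"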

As each test is performed, Bonnie++ prints a fresh line telling you which test in the benchmark process it is currently running. The results are shown at the end both as a text table and as a comma-separated list. For the results shown below, the number of files used for metadata testing had to be increased with -n 256 so that the metadata read figures could be reported. Without the -n 256 parameter, both the sequential create read and random create read operations completed too quickly, and Bonnie++ reported those numbers only as "+++++".

$ rm -rf /tmp/foo
$ mkdir /tmp/foo
$ /usr/sbin/bonnie++ -d /tmp/foo -n 256
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
chunklog         4G 59330  97 269236  64 109173  31 54711  97 290233  38 509.6   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                256  3862  96 342257  98 10759  69  3704  94 339523  98  1771  11
chunklog,4G,59330,97,269236,64,109173,31,54711,97,290233,38,509.6,1,256,3862,96,342257,98,10759,69,3704,94,339523,98,1771,11

As most applications that perform heavy IO do not read or write data one character at a time, the block read, write, and rewrite figures are the most interesting data transfer figures. The %CP column reports the percentage of CPU used to perform the IO for each test. The file metadata tests are shown in the second row of results; in them, zero-byte files are created, read (with stat(2)), and finally deleted. The file names are based on random numbers; the sequential create, read, and delete tests access those names in sorted numeric order, while the random tests access them in effectively random order. Some filesystems perform much better if an application creates and accesses files in a specific order. Because Bonnie++ performs the metadata tests both ways, you can see whether a filesystem optimizes accesses made in sorted file name order by comparing the two sets of figures.

The final line printed by Bonnie++ is the same data the table contains, formatted as comma-separated values. The Bonnie++ distribution includes the bon_csv2html Perl script, which takes the comma-separated values reported by Bonnie++ and generates an HTML page displaying them. The table below was generated by piping the comma-separated value line from the above test into bon_csv2html.

Machine: chunklog    Size: 4G    Num Files: 256

Sequential Output:  Per Char 59330 K/sec (97% CPU)   Block 269236 K/sec (64% CPU)   Rewrite 109173 K/sec (31% CPU)
Sequential Input:   Per Char 54711 K/sec (97% CPU)   Block 290233 K/sec (38% CPU)
Random Seeks:       509.6 /sec (1% CPU)
Sequential Create:  Create 3862 /sec (96% CPU)   Read 342257 /sec (98% CPU)   Delete 10759 /sec (69% CPU)
Random Create:      Create 3704 /sec (94% CPU)   Read 339523 /sec (98% CPU)   Delete 1771 /sec (11% CPU)

The Bonnie++ benchmark is a great yardstick to see if you are getting the performance from your hardware that you think you should. You can search around for Bonnie++ results that other people have produced and published using similar hardware. If you are making changes to how your RAID or filesystem is created, Bonnie++ is invaluable for testing whether the changes you think should improve performance actually have a noticeable and positive effect.

Tomorrow, I'll show you how to take the results of multiple Bonnie++ runs and generate a graph showing the relative changes between your benchmarks, so you can instantly see whether your modifications are positive and by how much.

Ben Martin has been working on filesystems for more than 10 years. He completed his Ph.D. and now offers consulting services focused on libferris, filesystems, and search solutions.

Comments

Using Bonnie++ for filesystem performance benchmarking

Posted by: Anonymous [ip: 76.171.185.195] on July 01, 2008 03:19 PM
Ben,

Thanks for the great article. I have not tried Bonnie++, but I have tried iozone for benchmarking different RAIDs and storage setups. It works in a similar way to what is explained in this article.

Ramesh
http://www.thegeekstuff.com

Good article

Posted by: TK on July 01, 2008 04:48 PM
Good stuff! I've seen benchmark numbers run on new filesystems and didn't have a good foundation on how they culled those numbers. Glad to run across this one!

I think dbench takes a better approach

Posted by: Anonymous [ip: 96.14.250.133] on July 02, 2008 07:17 AM
Unfortunately, Bonnie++ is single-threaded for all but the random-search test. It shows how fast files can be read and written, but doesn't scale well for SMP or CPU speed. On my 2-way AMD64 system with 2G RAM and SATA drives, the default number of files for create/stat/delete was way too low; stat() almost always came from cache, and that test would end within 1/2 second, giving no speed results. In order to overcome this, I had to specify a number of files so high as to make the test almost unbearably long for a personal research project.

Contrast this with the approach dbench takes: start N threads, then each performs some interesting mix of create, stat, delete, rename, read, write, and other things, and count the operations through the test duration. Not only were filesystem differences greatly magnified, I was able to determine how long a test should run, rather than fiddle with an iteration count.
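
A typical dbench run along those lines might look something like this; the runtime, client count, and directory are illustrative, not taken from my tests:

$ dbench -D /mnt/test -t 60 32     # 32 clients running the load mix against /mnt/test for 60 seconds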

(Long story short: use an external journal, preferably on a different controller; SMP helps a lot in this regard. With all other factors being the same, XFS won hands-down, with >400M/sec throughput using the noop I/O scheduler. ext3 won with an internal journal, but was still not as fast as XFS with an external journal. JFS lost every test. NB: XFS with an external journal *cannot* be the very first mount Linux does during boot; initrd hacking is necessary to make it the root mount. YMMV, caveat emptor.)

mkfs and mount options

Posted by: Anonymous [ip: 96.14.250.133] on July 02, 2008 07:25 AM
"Journaling filesystems can use write barriers to protect their journaled metadata. If you are using a hardware card to provide RAID functionality, these barriers might force the entire cache on the RAID card and all disks assembled into the RAID on that card to be synced, which can lead to extremely poor performance. As a real-world example, an Adaptec 31205 12-port card with a RAID-6 and XFS using barriers can support less than 100 file creates per second in tests I recently performed. Explicitly disabling barriers in XFS when mounting the same filesystem gives closer to 6,000 file creates per second. Though I'm not advocating disabling barriers in XFS, in this particular hardware configuration it could be done without data loss risk."

ext3 has not used write barriers; IIRC the devs are just now exploring the utility of write barriers, and determining how to schedule them.

As for XFS, you may also want to check out the "-l lazy-count=1" option to mkfs.xfs. From the man page:

"This changes the method of logging various persistent counters in the superblock. Under metadata intensive workloads, these counters are updated and logged frequently enough that the superblock updates become a serialisation point in the filesystem. The value can be either 0 or 1.

"With lazy-count=1, the superblock is not modified or logged on every change of the persistent counters. Instead, enough information is kept in other parts of the filesystem to be able to maintain the persistent counter values without needed to keep them in the superblock. This gives significant improvements in performance on some configurations. The default value is 0 (off) so you must specify lazy-count=1 if you want to make use of this feature."
