Yesterday I presented Bonnie++ and IOzone benchmarks for a solid state drive in a client machine and discussed the relative merits of purchasing an SSD over a set of hard disks costing the same money. Today I'll look at deploying and taking advantage of the extremely fast seek time of the SSD on a server.
While most SSD applications are in mobile computing, replacing the hard disk in a laptop, many of these uses do not exploit the SSD's single biggest advantage: speedy seek time. One application on server machines that is extremely seek-hungry is a relational database. Because my test SSD is very limited in size, it probably can't store the database tuples themselves, but it should be large enough to store the index. Typical index access involves reading a single index block, working out what the next index block is, and reading that one. Moreover, a single query may use multiple indexes during evaluation, making the speed improvement from storing indexes on SSD more pronounced.
For empirical testing to show the gain, if any, of using an SSD to store a relational database index, I used PostgreSQL 8.3.3 on a 64-bit Fedora 9 installation on a 2.2GHz AMD X2 with six 750GB Samsung SATA drives attached to a hardware RAID card for comparison.
PostgreSQL uses tablespaces to allow you to store a table or index on a different disk or filesystem from other database objects. When creating an index, you can use the tablespace clause to explicitly nominate where the index should live. If at a later point you update your primary index SSD, you might like to relegate your older SSD to servicing temporary tables, which are used for sorting large datasets.
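As a sketch of relegating an older SSD to temporary tables, PostgreSQL 8.3's temp_tablespaces setting directs temporary objects and on-disk sorts to a nominated tablespace. The tablespace name, mount point, and database name below are hypothetical:

```sql
-- Register the older SSD's filesystem as a tablespace (hypothetical path).
CREATE TABLESPACE ssd_old LOCATION '/mnt/ssd-old/postgresql-data';

-- Route temporary tables and sort spill files for this database to it.
ALTER DATABASE mydb SET temp_tablespaces = 'ssd_old';
```

New sessions connecting to that database will then create their temporary files on the older SSD without any change to queries.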
Finding a dataset for testing is always a difficult task. I wanted a data set that is freely available, contains millions of tuples, and has data that is likely to be understood by a wide audience. Attempting to test the performance of storing an index on an SSD as opposed to a RAID of conventional hard disks calls for a data set that is sufficient in size to make an index on one of the columns large enough to measure the performance difference that depends on where that index is stored. You are fighting against one of the properties that makes indexing itself efficient: a B-tree can index an extremely large number of tuples and only require three or four seeks of the disk head for a complete index lookup.
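The seek count for a B-tree lookup can be estimated with a back-of-the-envelope calculation. This sketch assumes an 8KB index page and roughly 16 bytes per (key, pointer) entry; both figures are illustrative round numbers, not PostgreSQL's exact on-disk layout:

```python
import math

def btree_levels(n_tuples, page_size=8192, entry_bytes=16):
    # Rough fan-out: how many (key, pointer) entries fit in one index page.
    fanout = page_size // entry_bytes
    # Number of levels needed so that fanout ** levels >= n_tuples,
    # i.e. the number of page reads for one root-to-leaf traversal.
    return math.ceil(math.log(n_tuples, fanout))

print(btree_levels(2_458_285))  # prints 3
```

With a fan-out in the hundreds, even the 2.4-million-tuple census table needs only about three page reads per lookup, which is why a large dataset is required before the index's storage location becomes measurable.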
The UCI datasets are designed to allow researchers to test supervised machine learning algorithms. The datasets include a few with a high tuple count that, when imported into a database, will make a main table of moderate size. Importing the raw 1990 USA Census sample produces a single table that is about 1GB in size, with an index on the place of birth column occupying 50MB. For this article I tested the performance of some queries against the main table with the index stored on hard disk and on SSD.
The UCI Web site distributes the census data as a raw tab-separated text file. The database schema used is shown below, along with the commands to import the text file, analyse the database, and create an index on hard disk. Because I'm creating a unique ID field as the primary key, the columns must be explicitly listed for the copy command to work. The USCensus1990raw.data.txt file contains a trailing blank line which must be removed for the copy command to work properly.
create database ucicensus1990;
create table census (
AAGE boolean, AANCSTR1 boolean, AANCSTR2 boolean, AAUGMENT boolean, ABIRTHPL boolean, ACITIZEN boolean,
ACLASS boolean, ADEPART boolean, ADISABL1 boolean, ADISABL2 boolean, AENGLISH boolean, AFERTIL boolean,
AGE int, AHISPAN boolean, AHOUR89 boolean, AHOURS boolean, AIMMIGR boolean, AINCOME1 int,
AINCOME2 int, AINCOME3 int, AINCOME4 int, AINCOME5 int, AINCOME6 int, AINCOME7 int, AINCOME8 int,
AINDUSTR boolean, ALABOR boolean, ALANG1 boolean, ALANG2 boolean, ALSTWRK boolean,
AMARITAL boolean, AMEANS boolean, AMIGSTAT boolean, AMOBLLIM boolean, AMOBLTY boolean,
ANCSTRY1 int, ANCSTRY2 int, AOCCUP boolean, APERCARE boolean, APOWST boolean, ARACE boolean,
ARELAT1 boolean, ARIDERS boolean, ASCHOOL boolean, ASERVPER boolean, ASEX boolean, ATRAVTME boolean,
AVAIL int, AVETS1 boolean, AWKS89 boolean, AWORK89 boolean, AYEARSCH boolean, AYRSSERV boolean,
CITIZEN int, CLASS int, DEPART int, DISABL1 int, DISABL2 int, ENGLISH int, FEB55 boolean, FERTIL int,
HISPANIC int, HOUR89 int, HOURS int, IMMIGR int, INCOME1 int, INCOME2 int, INCOME3 int, INCOME4 int,
INCOME5 int, INCOME6 int, INCOME7 int, INCOME8 int, INDUSTRY int, KOREAN boolean, LANG1 int, LANG2 int,
LOOKING int, MARITAL int, MAY75880 boolean, MEANS int, MIGPUMA int, MIGSTATE int, MILITARY int,
MOBILITY int, MOBILLIM int, OCCUP int, OTHRSERV boolean, PERSCARE int, POB int, POVERTY int, POWPUMA int,
POWSTATE int, PWGT1 int, RACE int, RAGECHLD int, REARNING int, RECTYPE text, RELAT1 int, RELAT2 int,
REMPLPAR int, RIDERS int, RLABOR int, ROWNCHLD boolean, RPINCOME int, RPOB int, RRELCHLD boolean,
RSPOUSE int, RVETSERV int, SCHOOL int, SEPT80 boolean, SERIALNO text, SEX int, SUBFAM1 int,
SUBFAM2 int, TMPABSNT int, TRAVTIME int, VIETNAM boolean, WEEK89 int, WORK89 int, WORKLWK int,
WWII boolean, YEARSCH int, YEARWRK int, YRSSERV int,
id serial primary key
);
copy census ( AAGE, AANCSTR1, AANCSTR2, AAUGMENT, ABIRTHPL, ACITIZEN, ACLASS,
ADEPART, ADISABL1, ADISABL2, AENGLISH, AFERTIL, AGE, AHISPAN, AHOUR89, AHOURS, AIMMIGR,
AINCOME1, AINCOME2, AINCOME3, AINCOME4, AINCOME5, AINCOME6, AINCOME7, AINCOME8, AINDUSTR,
ALABOR, ALANG1, ALANG2, ALSTWRK, AMARITAL, AMEANS, AMIGSTAT, AMOBLLIM, AMOBLTY, ANCSTRY1,
ANCSTRY2, AOCCUP, APERCARE, APOWST, ARACE, ARELAT1, ARIDERS, ASCHOOL, ASERVPER, ASEX,
ATRAVTME, AVAIL, AVETS1, AWKS89, AWORK89, AYEARSCH, AYRSSERV, CITIZEN, CLASS, DEPART,
DISABL1, DISABL2, ENGLISH, FEB55, FERTIL, HISPANIC, HOUR89, HOURS, IMMIGR, INCOME1, INCOME2,
INCOME3, INCOME4, INCOME5, INCOME6, INCOME7, INCOME8, INDUSTRY, KOREAN, LANG1, LANG2, LOOKING,
MARITAL, MAY75880, MEANS, MIGPUMA, MIGSTATE, MILITARY, MOBILITY, MOBILLIM, OCCUP, OTHRSERV,
PERSCARE, POB, POVERTY, POWPUMA, POWSTATE, PWGT1, RACE, RAGECHLD, REARNING, RECTYPE, RELAT1,
RELAT2, REMPLPAR, RIDERS, RLABOR, ROWNCHLD, RPINCOME, RPOB, RRELCHLD, RSPOUSE, RVETSERV, SCHOOL,
SEPT80, SERIALNO, SEX, SUBFAM1, SUBFAM2, TMPABSNT, TRAVTIME, VIETNAM, WEEK89, WORK89, WORKLWK,
WWII, YEARSCH, YEARWRK, YRSSERV )
from '/path/to/USCensus1990raw.data.txt';  -- adjust the path to the downloaded file
analyze;
create index pob on census ( pob );
SELECT relname, reltuples, relpages * 8 / 1024 AS "MB" FROM pg_class ORDER BY relpages DESC;
   relname   |  reltuples  |  MB
-------------+-------------+------
 census      | 2.45828e+06 | 1010
 census_pkey | 2.4583e+06  |   52
 pob         | 2.45828e+06 |   52
explain select count(*) from census where pob = 33;
Aggregate (cost=33994.42..33994.43 rows=1 width=0)
-> Index Scan using pob on census (cost=0.00..33965.20 rows=11686 width=0)
Index Cond: (pob = 33)
When I measured a particular index, it was the only index present: when testing the index on the SSD, I dropped the index on the hard disk beforehand, and vice versa. For each index I performed three selects: one with a cold cache, one with a hot cache, and one with a "cross" cache. For the cold cache, I stopped the PostgreSQL database and remounted the disks containing the database and the SSD tablespace before running the test. For the hot test, I ran the same query twice and took the time of the second run. The cross test first performs a cold and a hot query for one state, then issues the query for a different state. The cold and hot queries for the first state let PostgreSQL cache that state's index pages in RAM, so the query for the second state can take advantage of some cached index pages, but not all of the pages it needs.
For a database that you access constantly, and whose performance you care enough about to add an SSD for its index, it is quite rare that PostgreSQL would have no pages of the index cached in RAM, and such caching lessens the advantage of using an SSD. On the other hand, the hot test is unrealistic because there is a very high chance that all of the needed index pages are cached. The cross test is designed as a compromise: some index pages are likely to be in the cache, but not all of the ones the query needs.
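A cold-cache run as described above might be scripted as follows. The init-script name, the RAID device, and the database mount point are assumptions for a Fedora 9 era system; the SSD device id is the one used later in this article:

```shell
#!/bin/sh
# Stop PostgreSQL so no backend holds cached index pages.
service postgresql stop
# Remounting the filesystems drops the kernel's page cache for them.
umount /mnt/ssd
mount -o nobarrier /dev/disk/by-id/ata-MTRON_MSD-SATA3025_0HB3331303546-part1 /mnt/ssd
umount /var/lib/pgsql          # database volume on the hardware RAID (path assumed)
mount /dev/sdb1 /var/lib/pgsql # RAID volume device name assumed
service postgresql start
```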
As the number of tuples expected to be returned by a query affects how a database resolves the query, the queries I used returned few enough results that PostgreSQL would always consult the index. To be exact, from the list of places of birth, I selected Florida (pob=12), Puerto Rico (pob=72), and Nevada (pob=32) because the SQL explain indicated that an index on pob would be used for these queries. These queries find 54,445, 10,134, and 4,722 tuples respectively out of a total of 2,458,285 tuples in the table. The order for the cross test was Florida followed by Puerto Rico, then Puerto Rico followed by Nevada, then Nevada followed by Florida. The SQL queries are extremely simple, as shown below. For those unfamiliar with PostgreSQL, when you issue a query with the explain keyword prepended, PostgreSQL tells you how it would have gone about executing that query. In this case, I used explain to verify that PostgreSQL would use the index that I wanted it to, as shown by the Index Scan line. The Aggregate that is performed after the index scan simply counts the number of tuples returned.
# explain select count(*) from census where pob = 12;
Aggregate (cost=36740.79..36740.80 rows=1 width=0)
-> Index Scan using pobssd on census (cost=0.00..36708.78 rows=12800 width=0)
Index Cond: (pob = 12)
select count(*) from census where pob = 12;
select count(*) from census where pob = 72;
select count(*) from census where pob = 32;
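To time queries like these, psql's \timing meta-command reports wall-clock time per statement, and explain analyze runs the query and reports actual execution times. This is standard psql/PostgreSQL usage rather than the exact harness used for the benchmarks:

```sql
\timing
select count(*) from census where pob = 12;
-- explain analyze executes the query and shows actual times and row counts:
explain analyze select count(*) from census where pob = 12;
```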
I formatted the SSD with XFS and mounted it with nobarrier for maximal performance. Note that with nobarrier there is a risk of losing a transaction in the journal if power is unexpectedly cut, but in a server environment one assumes the machine is protected from abrupt power failure by a UPS, which removes the need for write barriers. I used the SSD tablespace for the pobssd PostgreSQL index:
mkfs.xfs -f -i size=512 -l lazy-count=1 /dev/disk/by-id/ata-MTRON_MSD-SATA3025_0HB3331303546-part1
mount -o "nobarrier" /dev/disk/by-id/ata-MTRON_MSD-SATA3025_0HB3331303546-part1 /mnt/ssd
mkdir /mnt/ssd/postgresql-data
chown postgres:postgres /mnt/ssd/postgresql-data
CREATE TABLESPACE ssd LOCATION '/mnt/ssd/postgresql-data';
drop index pob;
create index pobssd on census ( pob ) tablespace ssd;
On average, hot cache performance was similar for the SSD and hard-disk-based indexes, with the SSD needing 70 to 85 percent of the time that the same query took when the index was stored on hard disk. The anomaly was Nevada, where the SSD-based index took only 5ms, as opposed to about 200ms for the hard-disk-based index with a hot cache. It is hard to say why Nevada performed so differently from the other states.
Shown below is a graph comparing SSD and hard-disk-based indexes with cold and cross caches. With a cold cache, the SSD-based index could resolve a query in just 3 to 10 percent of the time it took when the index was stored on hard disk. The worst-performing cold-cache query was Florida, because it required more tuples to be fetched; Florida is also the case where the SSD needed 10 percent of the time the hard-disk-based index did, perhaps because more of the work was directed at the base table rather than the index itself. The cross queries are interesting because they show that the SSD can execute the same query in 20 percent or less of the time that hard-disk-based queries needed, even when some pages of the index were already cached in RAM by PostgreSQL from a previous query.
Although the maximum capacity of SSDs is still much smaller than that of conventional hard disks, and they command an extreme premium in dollars per gigabyte, the seek time offered by a single SSD dominates what can be obtained from a hardware RAID of six hard disks. If you are running a database server and have already increased the RAM dedicated to caching index pages as far as you can but still need better performance, consider adding a 32GB or 64GB SSD to the machine to boost your database index performance.
Ben Martin has been working on filesystems for more than 10 years. He completed his Ph.D. and now offers consulting services focused on libferris, filesystems, and search solutions.