This is a read-only archive. Find the latest Linux articles, documentation, and answers at the new Linux.com!

Linux.com

Feature: Databases

Using MySQL as a filesystem

By Ben Martin on February 15, 2008 (4:00:00 PM)

Share    Print    Comments   

With MySQLfs you can store a filesystem inside a MySQL relational database. MySQLfs breaks up the byte content of files that you store in its filesystem into tuples in the database, which allows you to store large files in the filesystem without requiring the database to support extremely large BLOB fields. With MySQLfs you can throw a filesystem into a MySQL database and take advantage of whatever database backup, clustering, and replication setup you have to protect your MySQLfs filesystem.

MySQLfs does not appear to be offered in binary packages for Fedora, openSUSE, or Ubuntu, so you must build it from source. To build MySQLfs you'll need to install mysql-devel and fuse-devel packages first. (FUSE allows a normal program to expose a filesystem through the Linux kernel. This means that you can immediately use a FUSE filesystem with any application.) If you are missing some devel packages then the errors generated by configure might be a little bit cryptic. For example, on my Fedora 8 64-bit machine, I was at first unable to link with mysqlclient_r because FUSE development packages were not found; this error generates a single -L in the linker flags, which makes linking a mysqlclient test program fail. The build also defaults to using -Wall -Werror, which will cause the build to fail at the first warning generated. You can fix these issues by compiling without -Werror and making sure that you have all the necessary development packages installed. Once you have built MySQLfs from sources, the make install command simply places MySQLfs into /usr/local/bin.

The commands below set up a MySQL database to use with MySQLfs. Note that make install does not install the schema.sql file anywhere on your filesystem by default, so you'll have to copy that schema someplace yourself so you can set up more MySQLfs databases later.

# mysql -p mysql> CREATE DATABASE mysqlfs; mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON mysqlfs.* TO mysqlfs@"%" IDENTIFIED BY 'foobar'; mysql> FLUSH PRIVILEGES; mysql> Bye # mysql -uroot -p mysqlfs < /tmp/mysqlfs-0.4.0-rc1/schema.sql

The database schema consists of three tables: tree, inodes, and data_blocks. The tree table maps an inode to its parent inode. Part of the table is shown below.

In the schema shown below I have imported a file with the path guten/alice13a.txt. The filesystem naming hierarchy is represented in the tree table, the alice13a.txt file is stored in the guten directory, which is created under the root of the MySQLfs filesystem. The inode table contains all the information that one would expect from a stat(2) call. The data_blocks table is used to store the actual file byte content.

mysql> select * from tree; +-------+--------+----------------------+ | inode | parent | name | +-------+--------+----------------------+ | 1 | NULL | / | | 2 | 1 | guten | | 3 | 2 | alice13a.txt | ... mysql> select * from inodes; +-------+-------+---------+-------+-----+-----+------------+------------+------------+----------+ | inode | inuse | deleted | mode | uid | gid | atime | mtime | ctime | size | +-------+-------+---------+-------+-----+-----+------------+------------+------------+----------+ | 1 | 0 | 0 | 16877 | 0 | 0 | 1201155861 | 1201155861 | 1201155861 | 0 | | 2 | 0 | 0 | 16888 | 500 | 500 | 1200108244 | 1200108244 | 1201156234 | 0 | | 3 | 0 | 0 | 33184 | 500 | 500 | 1200108239 | 1200108239 | 1201156234 | 153477 | ... mysql> describe data_blocks; +-------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------------------+------+-----+---------+-------+ | inode | bigint(20) | NO | PRI | | | | seq | int(10) unsigned | NO | PRI | | | | data | blob | YES | | NULL | | +-------+------------------+------+-----+---------+-------+

Because FUSE allows you to mount filesystems that can behave in strange ways, the ability to mount custom FUSE filesystems is normally limited to users who have been explicitly added to the fuse group. The below session shows an initial denied attempt to mount a FUSE filesystem and the commands to remedy the situation. I had to log in again after adding myself to the fuse group. The initial MySQLfs filesystem and database is empty, but the df command shows that the kernel knows that the filesystem is mounted. The fusermount command unmounts the user-created FUSE filesystem at the end of the session.

$ mkdir ~/mysqlfs $ mysqlfs -ohost=localhost -odatabase=mysqlfs ~/mysqlfs * Opening logfile 'mysqlfs.log': OK fuse: failed to open /dev/fuse: Permission denied $ su -l root# usermod -a -G fuse ben root# exit $ exit ... $ id uid=500(ben) gid=500(ben) groups=492(fuse), ... $ mysqlfs -ohost=localhost -odatabase=mysqlfs ~/mysqlfs * Opening logfile 'mysqlfs.log': OK $ ls -ld mysqlfs drwxr-xr-x 1 root root 0 2008-01-24 16:24 mysqlfs $ df -h mysqlfs . Filesystem Size Used Avail Use% Mounted on mysqlfs 0 0 0 - /home/ben/mysqlfs /dev/sdc3 16G 4.4G 11G 31% /home $ ls -l mysqlfs total 0 $ fusermount -u mysqlfs

Now that we can have set up the ability to mount MySQL as a filesystem, let's copy some files into the MySQL database and interact with the filesystem. In the session below, I copy some text files from Project Gutenberg into MySQLfs and verify that when read back they have the same MD5 checksum as the originals. I then copy the Linux source tarball into MySQLfs to get an idea of how well MySQLfs handles a 44MB file.

Copying the kernel from /tmp to a backup file in /tmp took about 2 seconds on a cold disk cache. Copying the kernel into a MySQLfs took about 20 seconds. Performing the same test again right away, in order to have the source file in disk cache, the /tmp backup took about 0.3 seconds, while the MySQLfs copy still took 20 seconds. This indicates that write performance is the major bottleneck in MySQLfs. After I read back the kernel, MySQL seemed to cache some data, which made response time faster as I performed more tests. This could make MySQLfs an interesting option if you have a filesystem that is read often, as you could take advantage of MySQL caching.

~]$ cp -av /.../guten ./mysqlfs/ `/.../guten' -> `./mysqlfs/guten' `/.../guten/alice13a.txt' -> `./mysqlfs/guten/alice13a.txt' `/.../guten/boysw10.txt' -> `./mysqlfs/guten/boysw10.txt' `/.../guten/dmoro11.txt' -> `./mysqlfs/guten/dmoro11.txt' ~]$ cd ~/mysqlfs/guten guten]$ ls -l -rw-r----- 1 ben ben 153477 2008-01-12 13:23 alice13a.txt -rw-rw---- 1 ben ben 48923 2008-01-12 13:23 boysw10.txt -rw-rw---- 1 ben ben 259214 2008-01-12 13:23 dmoro11.txt guten]$ md5sum * 135e06ad31b169065bccbf03ec7236f2 alice13a.txt 7dd30f1b37e32cdb5d21fe992bbf248d boysw10.txt 87c05f11193c0e05b3d0dec0808a0450 dmoro11.txt guten]$ md5sum /.../guten/* 135e06ad31b169065bccbf03ec7236f2 /.../guten/alice13a.txt 7dd30f1b37e32cdb5d21fe992bbf248d /.../guten/boysw10.txt 87c05f11193c0e05b3d0dec0808a0450 /.../guten/dmoro11.txt guten]$ cd .. mysqlfs]$ time cp /tmp/linux-2.6.23.tar.bz2 . real 0m16.278s user 0m0.006s sys 0m0.531s mysqlfs]$ time cat linux-2.6.23.tar.bz2 >/dev/null real 0m0.502s user 0m0.004s sys 0m0.035s mysqlfs]$ time dd if=linux-2.6.23.tar.bz2 of=/tmp/junk bs=1024 count=1024 1048576 bytes (1.0 MB) copied, 0.0200973 s, 52.2 MB/s real 0m0.058s user 0m0.003s sys 0m0.013s mysqlfs]$ time dd if=linux-2.6.23.tar.bz2 of=/tmp/junk bs=1024 count=1024 skip=9000 1048576 bytes (1.0 MB) copied, 0.0214207 s, 49.0 MB/s real 0m0.031s user 0m0.001s sys 0m0.011s

Although it is one of the nastiest things that one can do to a FUSE filesystem, I decided to run the Bonnie++ filesystem benchmark suite against MySQLfs. I conducted these tests inside a virtual machine, which might introduce a slight linear slowdown due to the virtualization, but both the MySQL database and /tmp are stored on the same virtual disk device so relative performance should be valid.

As you can see from the results below, MySQLfs was about one-tenth as fast as /tmp/bonnie (ext3) at running Bonnie++ (version 1.03a-7). That indicates you might not want to use MySQLfs to directly store a maildir. Bonnie++ is designed to work against native kernel filesystems and not FUSE, so these figures should be taken with a rather large grain of salt.

$ /usr/sbin/bonnie++ -d /tmp/bonnie Version 1.03 ------Sequential Output------ --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP v8tsrv 2G 18155 31 16726 5 13338 6 26207 46 74527 24 9840 144 ------Sequential Create------ --------Random Create-------- -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 24878 94 +++++ +++ +++++ +++ 29018 99 +++++ +++ +++++ +++ $ /usr/sbin/bonnie++ -d ~/mysqlfs/bonnie Version 1.03 ------Sequential Output------ --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP v8tsrv 2G 2615 5 1207 1 1323 1 2143 2 2363 0 138.1 0 ------Sequential Create------ --------Random Create-------- -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 186 2 505 1 296 2 209 2 441 1 282 2

Ben Martin has been working on filesystems for more than 10 years. He completed his Ph.D. and now offers consulting services focused on libferris, filesystems, and search solutions.

Share    Print    Comments   

Comments

on Using MySQL as a filesystem

Note: Comments are owned by the poster. We are not responsible for their content.

Using MySQL as a filesystem

Posted by: Anonymous [ip: 209.120.245.184] on February 15, 2008 07:10 PM
I think MySQLfx doesn't have any journaling system, but ext3 has! and it is so affective in speed!

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 84.169.185.239] on February 15, 2008 09:18 PM
Thanks, you totally made my day! This is not as stupid as that graphical version of less (trowser, I think) someone wrote about some time ago, but you're damn close! There definetly should be a "Useless Use of Fuse" award.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 72.188.239.230] on February 16, 2008 12:41 AM
This seems like a fun experiment but is it practical?

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 127.0.0.1] on February 16, 2008 04:15 AM
Using MySQL as a filesystem? Frankly I am bewildered as to why anyone would want to.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 124.148.113.169] on February 16, 2008 08:21 AM
Maybe you want to link data from an existing database with some files that you can access and update through a filesystem interface.
For example, storing digital pictures or other content that like to be seen as a filesystem and being able to select * from mysqlfs, existingtable... where JOIN-Condition-Here.

IIRC there was a large company playing up how wonderful life could be when filesystem queries could be performed using SQL. Can't recall the name of the company off the top... microsomething?

#

Re: Using MySQL as a filesystem

Posted by: Anonymous [ip: 72.188.239.230] on February 16, 2008 03:37 PM
I agree that being able to query the filesystem using SQL would be useful when you want to add a layer of file attributes to all your files. But that's just it - make an extra layer that parallels the existing file structure that you want attributed. There's no reason why you have to roll your filesystem into MySQL just to enjoy the benefits of using SQL to attribute, browse, and search your files. There are tons of applications that already do this and it makes the best of both worlds.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 24.19.16.16] on February 16, 2008 10:40 AM
This is a neat "because I can" proof of concept. That said, I can think of nothing that it accomplishes that can't be done better another way, with the one exception of transferring files over your database connection if all you've got is that database connection. I won't get into all the reasons why a filesystem excels at things that this fails miserably at, and why the "big company" mentioned in the post above couldn't bring said product to market.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 192.168.29.210] on February 16, 2008 01:28 PM

Using MySQL as a filesystem

Posted by: Anonymous [ip: 210.1.201.138] on February 16, 2008 02:18 PM
Ok.... WHY the hell would you want to use MySQL as FS.
Clearly the author is a fool. As stated above the proof of concept is ok... but other than that there is 0 pratical value.

GFS, lustre, ext3, xfs etc all would do a much better job...

Anyone reading this and thinking hey I could use this for my site. Stop. Don't pass go and do not collect your next pay check.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 86.43.88.90] on February 16, 2008 06:23 PM
I read these comments hoping to find a good reason to actually use this. and found nothing

/is sad

#

Re: Using MySQL as a filesystem

Posted by: Anonymous [ip: 59.183.14.148] on February 17, 2008 03:21 PM
never mind. The good work will show up in a couple of years.
Disclaimer: I'm not connected with those guys coding it.

I advise you to keep monitoring the project.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 63.200.65.107] on February 16, 2008 06:53 PM
What about cluster replication? This seems viable for that -- set up a cluster using this fs and mysql's native replication technology.

#

Re: Using MySQL as a filesystem

Posted by: Mark Slade on February 17, 2008 01:39 AM
If it's replication you want why not just use network shares + shell scripts? Granted it's not an out-of-the-box solution but I don't think it would be very difficult to create replication of file systems across machines, and I think the benefits of doing that on ANY file system would make it worth the trouble.

Also don't forget the magic of scp.
[Modified by: Mark Slade on February 17, 2008 01:39 AM]

#

Re(1): Using MySQL as a filesystem

Posted by: Anonymous [ip: 63.200.65.107] on February 19, 2008 04:28 AM
I've been working on an rsync solution, but what I really need is two to ten way replication:

Adding a file to server 1 gets replicated to server 2 and 3
adding a file to server 2 gets replicated to server 1 and 3
adding a file to server 3 gets replicated to server 1 and 2

I can use mysql's built in replication technology to do that, whilst writing to the mysqlfs.

I've researched rsync, enbd, and drbd, all are master -> slave solutions, and I'm looking for a round robin replication solution.

#

Using MySQL as a filesystem

Posted by: Dummy00001 on February 16, 2008 07:00 PM

That's interesting.

What about putting MySQL server on real (powerful) server and using fs on a client system, connecting it to the server? That'd be fun.

The idea isn't new. But it is nice to see that now with Linux it can be done so easily. If nothing else, it shows that Linux VFS layers and infrastructure around it allows to create really specialized things on cheap.

P.S. For better performance, one would have to simulate FS B+-trees: split data in chunks and organize them in balanced tree. Making trees of arbitrary height in SQL.... But probably by limiting size of file, one can make N tables to simulate tree of height N with one table per level.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 172.30.6.12] on February 17, 2008 02:46 AM
What do you do when your a company of 1000 people with a render farm thats generating thousands of files everyday and you want to know, how much data a certain person or team outputted to a specific directory in the last 24 hours?

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 76.125.64.66] on February 18, 2008 07:47 AM
Isn't this starting to be what Microsoft is (was?) trying to do with WinFS, at least just a little bit? And I don't think they were trying to do straight SQL queries; they were trying to put Exchange as the file system. We have people at work who use it that way anyway. (Rich file system with verbose comments, helps, and hints all floating around the actual kernel of information.)

See: http://en.wikipedia.org/wiki/WinFS -- data storage and management system based on relational databases, ... designed for persistence and management of structured, semi-structured as well as unstructured data.



Sounds like he's done *NO* tuning of it, either from FUSE or MySQL. Ext3 has been tuned and beat on for years (notice that _3_?) as has similar file systems. It's what you DO with the data not (always) how fast you can just get to it.



Now for instance I wonder if you could run GREP across it, where MySQL itself did the search comparison of the contents? And if the filename table is indexed it might produce a name faster than 'find / -name xyz' would.



Don't postulate the guy is stupid *just* because he did something you didn't like -- if nothing else it's an easy way to introduce MySQL people into the guts of a file system, and at absolute worst it gave him some experience producing a working FUSE enabled application. (... which *I* haven't done, so he's further along than me. How much direct programming have *YOU* (the above commenters) done lately?)



It's easy to say 'No, it's stupid' -- hello_world.c is stupid, too, but you can learn from it.

#

FUSE fileystem performance

Posted by: Anonymous [ip: 194.237.142.7] on February 18, 2008 12:31 PM
NTFS-3G is also FUSE based but apparently it gives 5-20x better performance still *unoptimized* than MySQLfs: http://ntfs-3g.org/performance.html

#

FUSE fileystem performance

Posted by: Anonymous [ip: 194.237.142.7] on February 18, 2008 12:38 PM
NTFS-3G is also FUSE based but apparently it gives 5-20x better performance still *unoptimized* than MySQLfs: http://ntfs-3g.org/performance.html

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 35.9.55.67] on February 18, 2008 02:10 PM
For one application of weird file systems you might want to google starpack, which is an interesting way to distribute software that is scripted in Tcl/Tk. While the database in use there is Metakit, it does show one application of database based filesystems.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 216.206.198.2] on February 18, 2008 08:40 PM
I can see usage of this as part of a document imaging and management system. Most such systems have a web interface for manipulating folders and objects, but putting documents into the system or reading documents stored in the system through a web interface is cumbersome. Use SSH to access the MySQLfs mount across the network or internet, add some extended metadata attributes, and throw in some access controls, and you'd have a document management system that looks and acts like a filesystem, can be locked down like a filesystem, and is more intuitive than a web app.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 127.0.0.1] on February 19, 2008 05:52 AM
all you people who say this is a waste of time are just missing the point.

mysql is fast, see. really, truly fast.
so a filesystem inside mysql has got to be superfast.

and, here's the really clever bit, you could run a mysql server on that mysqlfs for the ultimate in performance.

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 67.61.191.131] on February 19, 2008 07:47 PM
My primary question is does it allow being mounted from multiple machines? If it were somehow cluster aware, this could be very useful when you had say, 30-40 nodes running a webapp all needing access to the same files. Currently, we've got one client who is utilizing ocfs2 and a fiber channel SAN for their web files, however, this could be greatly improved if it could be done with a MySQL backend, with each of the web nodes running a slave server, so the changes were replicated locally.


Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 68.106.253.157] on February 20, 2008 12:52 AM
"Clearly the author is a fool."

Read, "I don't understand this, therefore it is stupid."

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 76.94.11.23] on February 21, 2008 06:17 AM
As stated, this would actually be PERFECT for a DMS or a cluster where some daemons needed to access the same data; i.e. Web Farm, FTP farm. The nice thing is using MySQL's built-in replication features to distribute across the network or even a WAN.

Granted, the writes would be rather slow, however, I'm quite sure someone will find a way to optimize this so that the bottleneck goes away. Imagine the freakin' UPTIME on this as opposed to using something like DRBD to do filesystem replication and only one filesystem being accessible at one time!

DRBD was a pain to set up to begin with - the author here makes this look absolutely effortless!

#

Using MySQL as a filesystem

Posted by: Anonymous [ip: 204.28.140.7] on February 25, 2008 03:27 PM
I would love to see a few new commands added that are aware that the filesystems is relation to extent the functionality. For example, A BeginTranFS, EndTranFS, and CancelTranFS, that an install program could call to commit or rollback all the filesystem changes at one time would be a great addition.

#

This story has been archived. Comments can no longer be posted.



 
Tableless layout Validate XHTML 1.0 Strict Validate CSS Powered by Xaraya