Linux.com

Feature

CLI Magic: Use Extended Attributes for better file management

By Ryan Paul on June 20, 2005 (8:00:00 AM)

There are many organizational techniques that contribute to efficient file management. Thoughtful and effective directory hierarchies help users locate content with ease. Consistent and expressive file nomenclatures give users the ability to discern the nature of a file's content at a glance. Unfortunately, there are many things that even the most expressive file name can't convey. In some cases, there is just too much information, most of which doesn't warrant inclusion in a concise file name. Many file formats now include embedded meta-data mechanisms that provide users with a way to 'tag' files. With a specialized tag editor, users can easily associate a title, artist and album with a specific MP3 file, for instance. Wouldn't it be nice to be able to associate arbitrary tag data with any kind of file or directory? With extended attributes, you can.
Extended attributes are essentially name/value pairs that can be assigned to any file or directory. This powerful file system meta-data feature, when intelligently used, can facilitate tremendously efficient file management. The goal of this example-driven overview is to illustrate the power of extended attributes and demonstrate potential uses. You will need a basic understanding of awk and bash to appreciate the significance of some of the examples, but even users without a lot of command line experience will be able to understand the commands and figure out how to use them. By the time you finish reading this article, you will be able to take advantage of the coolest file system feature to be implemented since the symbolic link.

Getting Started

The first step is configuration. In days of yore, intrepid users had to patch their kernel to get support for extended attributes. Lucky for you, the feature is now a standard part of the 2.4 and 2.6 kernels, and it is supported by most major distributions and almost all the prominent file systems. In most cases, it is just a matter of enabling the feature. If you are fortunate enough to be using an XFS file system, the feature is already enabled and you can skip the rest of this section. If you are using ext2, ext3 or ReiserFS, you will have to add the user_xattr flag to the file system's entry in your /etc/fstab file and remount the partition. The updated entry should resemble this:

/dev/hda1 / ext3 defaults,noatime,user_xattr 0 1


After you have altered your fstab file, you can either reboot your computer, or remount the partition like so:

mount -o remount,user_xattr /


Now you need to install the commands and the associated library. An 'attr' package is available for many distributions, and source code is available from the SGI web site.

That's all there is to it. Now that everything is configured properly, it's time to learn some new commands.
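
If you want to confirm that the mount option took effect before going further, a quick probe can help. The following is only a minimal sketch and not part of the attr package: the function name xattr_supported and the throwaway attribute user.probe are made up for illustration. It tries to set an attribute on a temporary file in the directory you name and reports what happened.

```shell
# Hypothetical helper: report whether user.* attributes can be set
# on files in the given directory (default: current directory).
xattr_supported() {
  dir=${1:-.}
  # setfattr comes from the attr package; without it we cannot probe.
  if ! command -v setfattr >/dev/null 2>&1; then
    echo "setfattr not installed"
    return 2
  fi
  probe=$(mktemp "$dir/xattr-probe.XXXXXX") || return 2
  if setfattr -n user.probe -v 1 "$probe" 2>/dev/null; then
    echo "supported"
    status=0
  else
    echo "not supported"
    status=1
  fi
  # Clean up the temporary file whatever the outcome.
  rm -f "$probe"
  return $status
}
```

Calling it as 'xattr_supported ~/doc' probes the file system that holds that directory; a result of "not supported" usually means the user_xattr option is still missing.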

The Commands

Manipulation of extended attributes can be done with three commands: setfattr, getfattr, and attr. This article does not cover the attr command, which exists solely for the sake of IRIX compatibility. As you can guess, the setfattr command sets attributes, and the getfattr command retrieves them. To start with, we will add an attribute named 'testing', with a value of 'this is a test', to a file called 'test-1.txt':


setfattr -n user.testing -v "this is a test" test-1.txt


The '-n' parameter specifies the name of the attribute. Period-delimited attribute namespaces are used to reduce naming conflicts. All attributes explicitly added by users must be in the 'user' namespace, which is why the attribute in the example is named 'user.testing'. For serious employment of this feature, more elaborate namespaces are advisable. The '-v' parameter specifies the value of the attribute. Note that the value is enclosed in quotation marks because the string includes spaces.

Now we will use the getfattr command to retrieve the 'testing' attribute:


getfattr -n user.testing test-1.txt


This produces the following output:

  # file: test-1.txt
  user.testing="this is a test"


If you just want it to display the value of the attribute, you can use the '--only-values' parameter:

getfattr --only-values -n user.testing test-1.txt


The setfattr command is also used for removing attributes. To get rid of our 'testing' attribute, all I have to do is:

setfattr -x user.testing test-1.txt


Unfortunately, not all programs and file systems support extended attributes. If you copy the files to a different file system or manipulate the files with a utility that doesn't support the feature, the attributes will disappear. If you want to preserve the attributes, you can use the '--dump' parameter of the getfattr command to generate a complete listing of all the attributes and values associated with the target:


getfattr --dump * > data_file


When you move the files back to a file system that supports extended attributes, you can restore the attribute data to the files by using the '--restore' parameter of the setfattr command:



setfattr --restore=data_file
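
Because the dump is plain text, you can look it over before restoring anything. The sketch below assumes the dump uses the same layout as the getfattr output shown earlier (a '# file: ' line followed by name="value" lines) and that file names are simple. The name dump_summary is hypothetical, and the sample dump is invented for the demonstration.

```shell
# Hypothetical helper: print each file named in a getfattr --dump
# listing together with the number of attributes recorded for it.
dump_summary() {
  awk '
    /^# file: / { file = substr($0, 9); order[++n] = file; next }
    /=/         { count[file]++ }
    END         { for (i = 1; i <= n; i++)
                    printf "%s %d\n", order[i], count[order[i]] }
  ' "$1"
}

# Demonstration on an invented dump file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# file: test-1.txt
user.testing="this is a test"

# file: test-2.txt
user.article.author="Ryan Paul"
user.article.venue="newsforge"

EOF
dump_summary "$sample"
rm -f "$sample"
```

For the sample above, the summary lists test-1.txt with one attribute and test-2.txt with two.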


There are a few other options and parameters that are not covered in this article. For more information, you can refer to the man pages.

A Few (Relatively) Simple Examples

As a journalist, I write a lot of articles. The number of files in ~/doc/technical/articles has steadily grown, and it occurs to me that it will soon become difficult to manage. I can use extended attributes to simplify the task.

First, let's think about the kind of attributes it might be helpful to associate with articles. There are a few in particular that come to mind: title, date of publication, and publication venue. There are many other attributes that I could add, but I want to keep it simple, and I want to avoid assigning attributes for things like word count that I can easily ascertain with other simple commands. Now let's think about namespace issues. In order to prevent my attribute names from conflicting with attributes added in the future by other programs, I will put all my attributes in the 'user.article' namespace.

Now I will manually set the individual attributes for all of my articles. Here is an example:

setfattr -n user.article.title -v "Innovations in Window Management" \
  article-commentary-wm_innovations.txt
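
Setting titles one file at a time gets tedious. One possible shortcut, sketched here under my own assumptions, is to keep a tab-separated list of file names and titles and feed it to a loop. The function name tag_titles, the DRY_RUN switch, and the list format are all hypothetical; only the setfattr call itself comes from the attr package.

```shell
# Hypothetical helper: read "file<TAB>title" lines on stdin and set
# user.article.title on each file. With DRY_RUN=1 the commands are
# printed instead of executed, so the list can be checked first.
tag_titles() {
  tab=$(printf '\t')
  while IFS=$tab read -r file title; do
    [ -n "$file" ] || continue
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'setfattr -n user.article.title -v "%s" %s\n' "$title" "$file"
    else
      setfattr -n user.article.title -v "$title" "$file"
    fi
  done
}
```

Running 'DRY_RUN=1 tag_titles < titles.tsv' (where titles.tsv is whatever list you maintain) shows the commands without touching any file.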


It is also possible to add an attribute to multiple files at once. To demonstrate this, I will add an article.author tag to all my article files:


setfattr -n user.article.author -v "Ryan Paul" *.txt


Now I'll show you how to use the attributes for filtering and file management. Let's start by trying to list all of the files with articles that I have written for Newsforge. The format for the getfattr command is kind of unusual, so we have to use awk to extract the relevant data. The getfattr output looks something like this:

  # file: article-commentary-wm_innovations.txt
  user.article.venue="newsforge"

  # file: article-comparison-xml_authoring_tools.txt
  user.article.venue="newsforge"



If we treat each '# file: ' entry as an awk record, and each line of that record as an awk field, we should be able to get what we want by grabbing the first field of every record that contains '="newsforge"':

getfattr -n user.article.venue *.txt | awk 'BEGIN {RS="# file: "; FS="\n"}
  /="newsforge"/ {print $1}'
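
A multi-character RS is handled by gawk and mawk, but POSIX leaves its behavior unspecified, so not every awk is guaranteed to accept it. As an alternative sketch, the same filter can be written line by line: remember the most recent '# file: ' name and print it when a line carries the wanted value. The function name files_with_value is hypothetical, and the sample text (including notes-draft.txt and its venue) is invented so the filter can be tried without any tagged files.

```shell
# Hypothetical filter: read getfattr-style text on stdin and print
# the names of files that have an attribute whose value equals $1.
files_with_value() {
  awk -v want="$1" '
    /^# file: / { file = substr($0, 9); next }
    {
      eq = index($0, "=")
      # Compare the quoted value exactly, without regex matching.
      if (eq && substr($0, eq + 1) == "\"" want "\"") print file
    }
  '
}

# Sample text in the same layout as the getfattr output.
listing='# file: article-commentary-wm_innovations.txt
user.article.venue="newsforge"

# file: notes-draft.txt
user.article.venue="elsewhere"

# file: article-comparison-xml_authoring_tools.txt
user.article.venue="newsforge"
'

printf '%s' "$listing" | files_with_value newsforge
```

On the sample text this prints the two newsforge file names and skips notes-draft.txt. With real files, the input would come from 'getfattr -n user.article.venue *.txt' instead of the sample variable.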


To simplify matters, we can abstract the getfattr and awk pipeline into a bash function:

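  ea_query() {
    # $1 = attribute name, $2 = value to match, remaining args = files
    a=$1; v=$2
    # Drop the two leading arguments so only file names remain
    shift 2
    getfattr -n "$a" "$@" |
    awk "BEGIN  {
      RS=\"# file: \"
      FS=\"\n\"
    } /=\"$v\"/ { print \$1 }"
  }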


We can use this function to perform arbitrary queries. The following command-line call:


ea_query user.article.venue newsforge *.txt



will list all the .txt files for which 'newsforge' is the value associated with the 'user.article.venue' attribute. Now let's try using the queries for some simple file management. If I want to copy all the files containing articles written for newsforge into a 'newsforge_articles' directory, I can do this:

cp `ea_query user.article.venue newsforge *.txt` newsforge_articles
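
The backtick form splits the query output on whitespace, so it assumes file names without spaces. A hedged alternative is to read the names one line at a time. This is only a sketch, and copy_listed is a hypothetical helper name.

```shell
# Hypothetical helper: copy every file named on stdin (one name per
# line) into the directory given as the first argument.
copy_listed() {
  dest=$1
  mkdir -p "$dest" || return 1
  while IFS= read -r name; do
    # Skip blank lines in the input.
    [ -n "$name" ] || continue
    cp -- "$name" "$dest"/ || return 1
  done
}
```

It would be used as 'ea_query user.article.venue newsforge *.txt | copy_listed newsforge_articles'. File names that contain newlines are still out of reach for this approach.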


What if I want to make a tarball containing all articles I wrote for newsforge that are longer than 1200 words? To find out which articles are longer than 1200 words, I use an old trick: I filter 'wc -w' through awk and output the names of all the files that fulfill a greater-than comparison. I can use the output of that as the input for the query, and I can use the output of the query as the input for tar:


tar -czf long_newsforge_articles.tgz $(ea_query user.article.venue \
  newsforge $(wc -w *.txt | awk '/\.txt/ {if ($1 > 1200) print $2 }'))
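
The word-count half of that pipeline can be tried on its own. In this sketch, longer_than is a hypothetical wrapper around the same wc and awk filter. Like the original, it assumes file names that end in .txt and contain no spaces.

```shell
# Hypothetical wrapper: print the names of the given .txt files that
# contain more than $1 words.
longer_than() {
  limit=$1
  shift
  # wc prints "count name" per file, plus a "total" line when several
  # files are given; the /\.txt$/ pattern skips that total.
  wc -w "$@" | awk -v limit="$limit" '/\.txt$/ { if ($1 > limit) print $2 }'
}
```

With it, the inner command substitution in the tar example could be written as $(longer_than 1200 *.txt).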


A useful variation on the above example might involve compressing newsforge articles containing a specific word or phrase. You can do that by using grep rather than wc and awk. Once you have put together a few handy bash functions or shell scripts for attribute manipulation, you should be able to integrate extended attribute queries into your repertoire of file management techniques with relative ease.
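
For the grep variation, 'grep -l' prints only the names of the files that contain a match, which is the kind of file list the query function takes as arguments. A small sketch, with containing as a hypothetical helper name:

```shell
# Hypothetical helper: print the names of the given files that
# contain the fixed string $1 (-F turns off regex interpretation).
containing() {
  phrase=$1
  shift
  grep -l -F -- "$phrase" "$@"
}
```

The tar example could then use $(containing "Linux" *.txt) in place of the wc and awk filter, again assuming file names without spaces.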

An Arcane Example

For the benefit of system administrators and ambitious readers, I will now present a more sophisticated command-line example. I am going to show how I list, in order by publication date, the title and filename of every review I have written for Newsforge that has been published since November of 2004, is longer than 1000 words, and contains the word "Linux". For this example, you will need Ruby and Aredridel's excellent xattr module (http://theinternetco.net/projects/ruby/ruby-xattr). I have included line breaks to increase the readability of the example:


  ruby -r xattr -e '

    pubdate = proc {|f|
      Time.gm(*f.get_attr("article.date.published").split("-"))
    };

    Dir["*review*.txt"].map {|fn| File.open fn }.find_all {|f|

      c = f.read;
      f.get_attr("article.venue") == "newsforge" and
      c.split.length > 1000 and
      c.include? "Linux" and
      pubdate[f] > Time.gm(2004, "nov", 01)

    }.sort {|f1,f2| pubdate[f1] <=> pubdate[f2] }.each {|f|

      puts "#{f.get_attr("article.title")} #{f.path}"

    }'


I start by defining a 'pubdate' function that will convert my "date.published" attribute string into a ruby 'Time' instance. Then, I use the 'Dir[]' class method to generate a list of all files in the current directory that match the "*review*.txt" glob. I filter that list of files through a 'find_all' block that performs the necessary checks, and then I pass the results to a 'sort' block that performs publication date comparisons using the 'pubdate' function. Finally, I send the title and name of each file to stdout in a concluding 'each' block. Note that I use the '-r xattr' parameter to include Aredridel's module.

Ruby lends itself well to command line administrative work. You can use variations of the above example for a wide variety of file and system management tasks in native Ruby, or you can output file names, and pipe the result to other shell commands.

If you have some programming experience, you can use the xattr lib to make elaborate scripts and utilities that can manipulate extended attributes. For those who prefer Python for application development, the pyxattr module is also available.

Conclusion

Whenever I learn a new command or a new command line trick, I celebrate by putting it to good use. I've given you a good start. Now it's your turn. Find a creative way to use extended attributes, and demonstrate your command line prowess by leaving a comment with a few examples, or by sharing your experiences.

Comments

on CLI Magic: Use Extended Attributes for better file management

Note: Comments are owned by the poster. We are not responsible for their content.

Uhm

Posted by: Anonymous Coward on June 20, 2005 05:30 PM
Sounds pretty neat in a certain theoretical way (but just somehow), but I think that in practice it's not really anything for me.

#

Thank You

Posted by: Anonymous Coward on June 21, 2005 03:17 AM
Unlike the previous two posers, I found the article very informative. I've been thinking about extended attributes and metadata for a while now and finding such a framework already in my Linux filesystems is great.

Now if more utilities would support xattrs.

#

Beagle

Posted by: Anonymous Coward on June 21, 2005 11:24 PM
I believe the Beagle desktop search engine (www.beaglewiki.org), a Gnome project, uses these attributes to index files.

#

Wish By Default

Posted by: Anonymous Coward on June 21, 2005 11:52 PM
Being an ex-Be user, I really wish I could have this by default with Linux. It was so nice to have because it lets you add functionality and information to a file/directory with ease.

#

Does this work on JFS?

Posted by: Anonymous Coward on June 22, 2005 12:22 AM
Can I enable extended attributes on a JFS (IBM's journaling filesystem) filesystem? Is the method the same as for the filesystems mentioned in your article?

Grurp

#

Re:Does this work on JFS?

Posted by: Administrator on June 22, 2005 04:02 PM
The LWN kernel archive shows a patch added in 2002:
http://lwn.net/Articles/9488/

This devworks article briefly describes the implementation of xattrs on JFS:
http://www-106.ibm.com/developerworks/library/l-jfslayout/#h9

As far as I can tell, the same commands should work. I am not sure how to enable the feature in JFS tho. You might want to ask on the JFS mailing list.

#

Error in example

Posted by: Anonymous Coward on June 27, 2005 11:49 PM
You have error in example function ea_query():

shift 3 is wrong

OK is shift 2

Bye

#

Backing up extended attributes

Posted by: Anonymous Coward on December 16, 2005 11:23 AM
If anyone needs a convenient way to back up files with extended attributes, check out rdiff-backup.

#

Backuping

Posted by: Anonymous Coward on December 20, 2005 04:37 AM
Backing up is possible with the tool star, which supports backing up extended attributes, but don't ask me which arguments must be passed to it.
The GNU Coreutils "cp" also copies the extended attributes.

#

nice but

Posted by: Administrator on June 20, 2005 05:43 PM
Given the difficulty to make anyone agree on anything in the Linux world, I very much doubt that this feature will be implemented consistently across desktop apps.

Let's see in a year or 2 how many alternatives have been found....

That's why the focus will be more on smart searches than on user driven tagging.

#

Re:nice but

Posted by: Administrator on June 21, 2005 03:41 AM
Agreement difficulties are not an issue: extended attributes are already a standard part of the kernel and many filesystems. The problem is application support.

I agree that smart searches will be a future focus, but smart searches require a meta-data tagging system. Look at how Apple uses extended attributes to vastly improve the efficacy of their spotlight search system. FYI, the Beagle desktop search system for GNOME requires that extended attributes be enabled.

#
