Linux.com

Feature

How to script songs lyrics retrieval

By Duane Odom on April 04, 2007 (8:00:00 AM)

I recently wrote a simple bash script to incorporate a lyrics database into some of my music-handling scripts. I took advantage of one of the benefits of open source software by finding an existing application that performed this task and inspecting the code to see how the developers did it.

I started with the code for Rhythmbox, a music management application. I discovered that the developers used a couple of simple URL calls to a Web-based lyrics database called Leo's Lyrics:

http://api.leoslyrics.com/api_search.php?auth=duane&artist=cake&songtitle=comfort eagle
http://api.leoslyrics.com/api_lyrics.php?auth=duane&hid=VxwOBYpM3iY=

The first call returns the search results as an XML document like the one below. (Note: the server seems to ignore the authorization token; I use "duane". I could not find any documentation for the Web-based API on the Web site. I also signed up for an account that allows you to submit new lyrics and sent an email to the support address on the site, but I got no response.)

<?xml version="1.0" encoding="UTF-8"?>
<leoslyrics>
 <response code="0">SUCCESS</response>
 <searchResults>
   <result id="120741" hid="VxwOBYpM3iY=" exactMatch="true">
     <title>Comfort Eagle</title>
     <feat/>
     <artist>
       <name>Cake</name>
     </artist>
   </result>
 </searchResults>
</leoslyrics>

From the response element (<response code="0">SUCCESS</response>) I could see that the call had succeeded. Next I looked at the result element (<result id="120741" hid="VxwOBYpM3iY=" exactMatch="true">), and particularly at its hid attribute, which identifies the lyric entry I wanted. I passed that hid to the second URL call and got this result:

<?xml version="1.0" encoding="UTF-8"?>
<leoslyrics>
 <response code="0">SUCCESS</response>
 <lyric hid="VxwOBYpM3iY=" id="120741">
   <title>Comfort Eagle</title>
   <feat/>
   <artist>
     <name>Cake</name>
   </artist>
   <albums>
     <album>
       <name>Comfort Eagle</name>
       <imageUrl>http://images.amazon.com/images/P/B00005MCW5.01.MZZZZZZZ.jpg</imageUrl>
     </album>
   </albums>
   <writer/>
   <text>We are building a religion
We are building it bigger
.................
Pendant keychains</text>
 </lyric>
</leoslyrics>

From these results I saw that I could pull out the text element (<text>...</text>) and have my song lyrics.

To automate this process in a bash script I used wget and xmlstarlet. Wget is a simple utility for non-interactive download of files from the Internet. I used wget to call the URLs with the correct parameters and capture the XML results. Xmlstarlet is a set of command-line utilities used to query and process XML documents. I used xmlstarlet to pull out the pertinent information from the URL call results.
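
Because the script leans on external tools, it can help to check for them up front. A minimal sketch, not part of the original script; the helper name missing_tools is made up for illustration, and id3tool is included because it is used later in the article:

```shell
# Print the name of every listed command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools wget xmlstarlet id3tool)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose: printf repeats its format
    # once per word, giving one line per missing tool. A real script
    # would probably stop here.
    printf 'Missing tool: %s\n' $missing >&2
fi
```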

To make this script really useful, I wanted to supply it with a path to an MP3 file and have it pull the artist and title from the ID3 tag in the file and use this information to download the lyrics. I used id3tool, a utility that can view and edit ID3 tags within MP3 files from the command line.

After running id3tool at the start of the script, I use sed to pull the artist information from id3tool's output and store it in a shell variable called ARTIST. I retrieve the song title with a similar line.

ARTIST=`id3tool "$1" | sed -ne "s/.*Artist:[[:space:]]*\(.*\)/\1/p"`
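
The title can be pulled out the same way. The sketch below wraps the sed call in a small helper so that both fields share one pattern. The helper name id3_field is hypothetical, and the "Song Title" label is an assumption about id3tool's output that is worth checking against your own version. The helper also trims the whitespace around the value.

```shell
# Print the value of one labelled field from id3tool-style output on stdin.
# $1 is the label without the colon, for example "Artist".
id3_field() {
    sed -n "s/^$1:[[:space:]]*//p" | sed 's/[[:space:]]*$//' | head -n 1
}

# Possible use inside the script, where "$1" is the path to the MP3 file:
#   ARTIST=$(id3tool "$1" | id3_field "Artist")
#   SONGTITLE=$(id3tool "$1" | id3_field "Song Title")
```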

I then use wget to call a URL with the artist and title parameters. The call returns XML results which are stored in a shell variable called search_results:

search_results=`wget -q "http://api.leoslyrics.com/api_search.php?auth=$AUTH&artist=$ARTIST&songtitle=$SONGTITLE" -O -`
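
Artist names and titles often contain spaces, ampersands, or other characters that have a special meaning in a URL. A minimal sketch of percent-encoding the values before they go into the query string; urlencode is a hypothetical helper, not something the original script is known to contain:

```shell
# Percent-encode a string for use as a URL query value.
# The body runs in a subshell so LC_ALL and the working variables do not
# leak; LC_ALL=C is meant to make the loop walk the string byte by byte.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Possible use when building the search URL:
#   artist_q=$(urlencode "$ARTIST")
#   title_q=$(urlencode "$SONGTITLE")
```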

Next, I use xmlstarlet to parse information from the XML results. The following example executes the sel command of the xmlstarlet utility to parse the text of the response element:

result=`echo "$search_results" | xmlstarlet sel -t -v "/leoslyrics/response/text()"`
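
The same sel command can also pull the hid attribute out of the search results, which is the value the second URL needs. A sketch, assuming the response has the shape shown earlier; the function name first_hid is made up for illustration, and it simply takes the first result rather than looking for an exact match:

```shell
# Read the search XML on stdin. Print the hid of the first result, or
# return non-zero if the response element does not say SUCCESS.
first_hid() {
    local xml status
    xml=$(cat)
    status=$(printf '%s\n' "$xml" | xmlstarlet sel -t -v "/leoslyrics/response")
    [ "$status" = "SUCCESS" ] || return 1
    printf '%s\n' "$xml" |
        xmlstarlet sel -t -v "/leoslyrics/searchResults/result[1]/@hid" -n
}

# Possible use:
#   hid=$(printf '%s\n' "$search_results" | first_hid) || exit 1
```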

Finally, I combine the technique above with the unesc command of the xmlstarlet utility, which simply un-escapes all escaped characters in the text, making it easier to read:

echo "$lyrics" | xmlstarlet sel -t -v "/leoslyrics/lyric/text/text()" | xmlstarlet unesc > "$1.txt"
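
Whether the variable is quoted matters at this step: an unquoted expansion is split into words and rejoined with single spaces, so the line breaks in the lyrics would be lost. A small self-contained illustration, using a made-up two-line string in place of real lyrics:

```shell
# A stand-in for the text returned by the server (hypothetical content).
lyrics=$(printf 'line one\nline two')

# Unquoted: the shell splits the value into words, and echo joins them
# with single spaces, so everything ends up on one line.
unquoted=$(echo $lyrics)

# Quoted: the value is passed through unchanged, line breaks included.
quoted=$(printf '%s\n' "$lyrics")
```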

The completed script gets the lyrics for one song at a time. To download the lyrics for your entire song library, you can use the find command with its -exec option to run the script on every song. For example, if your song library is rooted at /share/music/ and get_lyrics.sh is on your PATH, you could run:

find /share/music/ -iname '*.mp3' -exec get_lyrics.sh {} \;
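
Running the script over a whole library repeats the download for songs that already have lyrics. A sketch of one way to list only the MP3 files that still lack a lyrics file, assuming the "<file>.txt" naming used by the script; list_missing_lyrics is a hypothetical helper name:

```shell
# Print every MP3 under directory $1 that has no "<file>.txt" next to it.
# The inner sh -c test succeeds only when the lyrics file is absent, and
# find prints the path only when that test succeeds.
list_missing_lyrics() {
    find "$1" -type f -iname '*.mp3' \
        -exec sh -c '[ ! -e "$1.txt" ]' sh {} \; -print
}

# Possible use:
#   list_missing_lyrics /share/music/ | while IFS= read -r song; do
#       get_lyrics.sh "$song"
#   done
```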

You can easily use the same URL calls and response parsing techniques in your language of choice (provided that language can retrieve Web pages and parse XML). These techniques could also be adapted to work with other lyrics databases that support Web-based APIs.

Duane Odom is a computer programmer for the US Department of Defense and a freelance writer. He has been a Linux user since 2001.

Comments

on How to script songs lyrics retrieval

good article!

Posted by: Anonymous Coward on April 09, 2007 02:20 AM
I liked your article!

I guess some people would like to add the lyrics to the ID3v2 tag of the file (at least I do).

So I changed your script a little. I don't know if it works for everyone, but it works for me at least :)

if [ ! -f "$1.txt" ]; then
    echo "$lyrics_results" | xmlstarlet sel -t -v "/leoslyrics/lyric/text/text()" | xmlstarlet unesc > "$1.txt"
    exec put_lyrics.py "$1" "$1.txt"
fi

put_lyrics.py:

#!/usr/bin/env python
import sys
import codecs
from mutagen.id3 import ID3, USLT
audio = ID3(sys.argv[1])
lyrics = codecs.open(sys.argv[2], 'r', 'utf8').read()
audio.update_to_v24()
audio.add(USLT(encoding=3, desc="tagged by python", lang="und", text=lyrics))
audio.save()

You have to have the mutagen module installed, though.

Hope it can be of help to anyone.

#

Re:good article!

Posted by: Anonymous Coward on April 16, 2007 01:14 AM
I tried this small python prog and it does work, the lyrics get appended in a USLT frame in the ID3 tag... But then, iTunes can no longer read the artwork from the mp3 file, although mutagen-inspect shows that the APIC tag is still there... any ideas?

Thx!

#

Re:good article!

Posted by: Anonymous Coward on April 16, 2007 02:52 PM
I don't use iTunes, so I don't know how it handles ID3 tags. It could be that the line below screws it up; try removing it from the .py file and see if it makes any difference. If the artwork is in the ID3 tag, nothing should have happened to it, except that iTunes might use ID3v2.3. Please go ahead and test it out.

audio.update_to_v24() -- remove that line

#

How to script songs lyrics retrieval

Posted by: Anonymous [ip: 67.149.151.14] on January 31, 2008 12:03 AM
Check out http://lyricsfly.com/api/ for much easier access.

#
