Building and installing Antiword from developer source is not
difficult, but it is a little different than normal. If you can't
find the right binary for your distribution, download the latest
source tarball from the site. Version 0.36.1 is the release I
grabbed. After decompressing the tarball, enter the subdirectory
created by tar and type
make install as a normal user. There are many platform-specific
Makefiles included in the tarball, but the default in our download
is for Linux. By the way, the
make install process creates a
bin directory in your home directory, and
puts the executable there.
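The build steps above can be sketched as a pair of small shell helpers. This is only an illustration: the function names build_antiword and in_path are made up for this example, and the sketch assumes the tarball unpacks into a directory named after itself, which may not hold for every release. It runs make before make install, which is harmless if the install target already builds everything.

```shell
#!/bin/sh
# Sketch only: function names and the unpacked directory name are assumptions.

# Unpack the source tarball ($1) and build it as a normal user.
# The body runs in a subshell, so the cd does not affect the caller.
build_antiword() (
    tarball=$1
    [ -f "$tarball" ] || { echo "missing $tarball" >&2; exit 1; }
    tar xzf "$tarball" || exit 1
    # Assumes the tarball unpacks into a directory named after itself.
    cd "$(basename "$tarball" .tar.gz)" || exit 1
    make && make install
)

# Succeed if directory $1 is listed in PATH.
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

After something like build_antiword antiword-0.36.1.tar.gz (the file name is a guess based on the version number), a check such as in_path "$HOME/bin" tells you whether the shell will find the freshly installed executable without a full path.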
No man pages are included with the developer release, but simply
running antiword without any arguments produces a
handy little cheat sheet explaining how to use it, like this:
    Name: antiword
    Purpose: Display MS-Word files
    Author: (C) 1998-2004 Adri van Os
    Version: 0.36.1 (09 Dec 2004)
    Status: GNU General Public License
    Usage: antiword [switches] wordfile1 [wordfile2 ...]
    Switches: [-f|-t|-a papersize|-p papersize|-x dtd][-m mapping][-w #][-i #][-Ls]
        -f formatted text output
        -t text output (default)
        -a <paper size name> Adobe PDF output
        -p <paper size name> PostScript output
           paper size like: a4, letter or legal
        -x <dtd> XML output
           like: db (DocBook)
        -m <mapping> character mapping file
        -w <width> in characters of text output
        -i <level> image level (PostScript only)
        -L use landscape mode (PostScript only)
        -s Show hidden (by Word) text
As you can see, we have various options for the conversion. We can create a straight text file, PDF, PostScript, or XML. That's a pretty impressive range of options for a beta. But how well does it work? That's the real question. Let's give it a whirl with some real-world documents.
I downloaded an MS Word 6 document from the Oracle web site. The first test was to convert to plain text, like this:
antiword -t Linux_DB.doc > LDB.txt
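If you have a whole directory of documents, the same command can be wrapped in a loop. The following is a minimal sketch, not part of Antiword itself: the helper names out_name and convert_dir_to_text are invented here, and it only matches files ending in lowercase .doc.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print the output name for a .doc file ($1) with a new extension ($2).
out_name() {
    printf '%s.%s\n' "${1%.doc}" "$2"
}

# Convert every .doc file in directory $1 to plain text alongside it.
convert_dir_to_text() {
    for doc in "$1"/*.doc; do
        [ -e "$doc" ] || continue   # the glob matched nothing
        antiword -t "$doc" > "$(out_name "$doc" txt)"
    done
}
```

Calling convert_dir_to_text on a directory leaves each original in place and writes a .txt file next to it.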
Paging through the resulting text document, I noticed that the
graphics were missing, but other than that, the text was well
formatted and perfectly legible. Then I tried the PDF and
PostScript options (using
antiword -a letter Linux_DB.doc
> LDB.pdf and
antiword -p letter Linux_DB.doc >
LDB.ps respectively). Again, the images were missing, but
other than that, the conversions seemed to have worked.
The usage summary lists an -i option for PostScript conversions, and sure enough, using
-i2 faithfully reproduced the images from the original as well as the text. You can see a screenshot of the PostScript data viewed with GhostView alongside.
Other attempts on other MS Word documents did not always result in the images being included in the conversion. Possibly they were created with earlier versions of MS Word, as the image feature is only supposed to work on documents created by MS Word 6 and later.
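To keep the different output formats straight, the switches tried in this article can be collected in one place. This is a rough sketch under a few assumptions: the function names antiword_switches and convert_doc are invented for illustration, the format keywords (txt, pdf, ps, xml) are my own shorthand, and the PostScript case simply reuses the image level 2 that worked for me.

```shell
#!/bin/sh
# Sketch only: function names and format keywords are hypothetical.

# Print the antiword switches for an output format ($1) and paper size ($2).
antiword_switches() {
    case $1 in
        txt) echo "-t" ;;
        pdf) echo "-a $2" ;;
        ps)  echo "-i 2 -p $2" ;;   # image level 2, as used in the article
        xml) echo "-x db" ;;        # DocBook, the DTD named in the usage text
        *)   echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# Convert one document ($1) to a format ($2); paper size $3 defaults to letter.
convert_doc() {
    switches=$(antiword_switches "$2" "${3:-letter}") || return 1
    # Left unquoted on purpose so the switches split into separate arguments.
    antiword $switches "$1" > "${1%.doc}.$2"
}
```

With this in place, convert_doc Linux_DB.doc ps would write Linux_DB.ps using the same switches as the PostScript test above.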
One last option to mention. The
-s argument tells
Antiword to show any comments hidden by MS Word in the original
document. There have been a number of embarrassing
slips by various firms who have found out too late that these
"hidden" comments can be brought back to visibility by people who
were never intended to see them.
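One way to use this before sending a document out is to compare the output with and without -s. The sketch below assumes that any difference between the two outputs means hidden text is present; the function name has_hidden_text and the ANTIWORD variable are inventions for this example, not part of Antiword.

```shell
#!/bin/sh
# Sketch only: has_hidden_text and ANTIWORD are hypothetical names.

# Command used to extract text; override it for testing or a custom install.
ANTIWORD=${ANTIWORD:-antiword}

# Exit status 0 if -s reveals text that the default output omits,
# 1 if both outputs match, 2 if the extraction itself fails.
has_hidden_text() {
    visible=$("$ANTIWORD" -t "$1") || return 2
    everything=$("$ANTIWORD" -t -s "$1") || return 2
    [ "$visible" != "$everything" ]
}
```

A check such as has_hidden_text memo.doc && echo "review before sending" (memo.doc being a placeholder name) gives a quick warning without opening the file in a word processor.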
Antiword is a valuable tool when you want to see, or to print, an MS Word document quickly, without waiting for a huge word processing app to load itself into memory. It's not quite soup yet in some ways, but I'm going to keep an eye on it. When it can handle PDFs and images without a hitch, it's good to go as far as I'm concerned.