Building and installing Antiword from developer source is not
difficult, but it is a little different from the usual routine. If you
can't find the right binary for your distribution, download the latest
source tarball from the site. Version 0.36.1 is the release I
grabbed. After decompressing the tarball, enter the subdirectory
created by tar and type make, then make install. Both steps run as a
normal user, with no root access needed, because the make install
process creates a bin directory in your home directory and puts the
executable there. There are many platform-specific Makefiles included
in the tarball, but the default in our download is for Linux.
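The build steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name build_antiword is mine, and it assumes the tarball unpacks into a directory named after the file (for example antiword-0.36.1.tar.gz into antiword-0.36.1), which I have not verified for every release.

```shell
# Sketch: unpack an Antiword source tarball and build it as a normal user.
# Assumption: the tarball's top-level directory matches its file name
# minus the .tar.gz suffix.
build_antiword() {
    tarball=$1
    srcdir=$(basename "$tarball" .tar.gz)

    tar -xzf "$tarball" || return 1

    (
        # Work in a subshell so the caller's directory is left unchanged.
        cd "$srcdir" || exit 1
        # Per the article, "make install" puts the binary in ~/bin.
        make && make install
    )
}
```

Calling build_antiword antiword-0.36.1.tar.gz from the directory holding the tarball should leave the executable in the bin directory under your home directory. If that directory is not on your PATH, run it by its full path.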
No man pages are included with the developer release, but simply
entering antiword without any arguments produces a
handy little cheat sheet explaining how to use it, like this
one:
    Name: antiword
    Purpose: Display MS-Word files
    Author: (C) 1998-2004 Adri van Os
    Version: 0.36.1 (09 Dec 2004)
    Status: GNU General Public License
    Usage: antiword [switches] wordfile1 [wordfile2 ...]
    Switches: [-f|-t|-a papersize|-p papersize|-x dtd][-m mapping][-w #][-i #][-Ls]
        -f formatted text output
        -t text output (default)
        -a <paper size name> Adobe PDF output
        -p <paper size name> PostScript output
           paper size like: a4, letter or legal
        -x <dtd> XML output
           like: db (DocBook)
        -m <mapping> character mapping file
        -w <width> in characters of text output
        -i <level> image level (PostScript only)
        -L use landscape mode (PostScript only)
        -s Show hidden (by Word) text
As you can see, we can choose among various options for the conversion. We can create a straight text file, PDF, PostScript, or XML. That's a pretty impressive range of options for a beta. But how well does it work? That's the real question. Let's give it a whirl with some real-world documents.
I downloaded an MS Word 6 document from the Oracle web site. The first test was to convert to plain text, like this:
    antiword -t Linux_DB.doc > LDB.txt
Paging through the resulting text document, I noticed that the
graphics were missing, but other than that, the text was well
formatted and perfectly legible. Then I tried the PDF and
PostScript options (using antiword -a letter Linux_DB.doc
> LDB.pdf and antiword -p letter Linux_DB.doc >
LDB.ps respectively). Again, the images were missing, but
other than that, the conversions seemed to have worked
perfectly.
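The three conversions I ran by hand can be wrapped in one helper. This is a minimal sketch, not part of Antiword: the function name convert_word_doc and the output naming scheme (same base name, new extension) are my own assumptions, while the -t, -a and -p switches come straight from the cheat sheet.

```shell
# Sketch: convert one Word file to text, PDF and PostScript in one go.
# $1 is the Word document; $2 is an optional paper size (default: letter).
convert_word_doc() {
    doc=$1
    paper=${2:-letter}
    base=${doc%.*}          # strip the extension, e.g. Linux_DB.doc -> Linux_DB

    antiword -t "$doc" > "$base.txt" &&
    antiword -a "$paper" "$doc" > "$base.pdf" &&
    antiword -p "$paper" "$doc" > "$base.ps"
}
```

For example, convert_word_doc Linux_DB.doc a4 would write Linux_DB.txt, Linux_DB.pdf and Linux_DB.ps next to the original, stopping at the first conversion that fails.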
Then I noticed the -i option for PostScript
conversions, and sure enough, using -i2 faithfully
reproduced the images from the original as well as the text. You
can see a screenshot of the PostScript data viewed with GhostView
alongside.
Other attempts on other MS Word documents did not always result in the images being included in the conversion. Possibly they were created with earlier versions of MS Word, as the image feature is only supposed to work on documents created by MS Word 6 and later.
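Since results varied from one document to the next, it can help to run a whole directory through the same PostScript conversion and then inspect the output files one by one. The loop below is a sketch under my own assumptions: the function name batch_to_postscript is invented, the paper size is fixed at letter, and it reuses the -i2 setting that worked for me above without claiming it will recover images from every file.

```shell
# Sketch: convert every .doc file in a directory to PostScript with images.
# $1 is the directory to scan (default: the current directory).
batch_to_postscript() {
    dir=${1:-.}
    for doc in "$dir"/*.doc; do
        [ -e "$doc" ] || continue       # glob matched nothing, skip
        out=${doc%.doc}.ps
        if antiword -i2 -p letter "$doc" > "$out"; then
            echo "ok: $out"
        else
            echo "failed: $doc" >&2
            rm -f "$out"                # do not leave a partial file behind
        fi
    done
}
```

A successful exit status only means Antiword produced output. Whether the images made it through still has to be checked in a viewer such as GhostView.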
One last option deserves a mention: the -s argument tells
Antiword to show any comments hidden by MS Word in the original
document. There have been a number of embarrassing
slips by various firms who have found out too late that these
"hidden" comments can be brought back to visibility by people who
were never intended to see them.
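One way to check a document before sending it out is to compare Antiword's text output with and without -s. The helper below is a sketch of that idea, not an Antiword feature: the function name show_hidden_text_diff is hypothetical, and it assumes mktemp and diff are available on your system.

```shell
# Sketch: print the lines that only appear when hidden text is shown.
# Returns diff's status: 0 if nothing hidden turned up, 1 if output differs.
show_hidden_text_diff() {
    doc=$1
    plain=$(mktemp) && with_hidden=$(mktemp) || return 2

    antiword -t "$doc" > "$plain"
    antiword -s -t "$doc" > "$with_hidden"

    # Lines prefixed with ">" exist only in the -s output.
    status=0
    diff "$plain" "$with_hidden" || status=$?

    rm -f "$plain" "$with_hidden"
    return "$status"
}
```

An empty result suggests Antiword found no hidden text, though that is only as reliable as Antiword's support for the Word version that produced the file.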
Antiword is a valuable tool when you want to see, or to print, an MS Word document quickly, without waiting for a huge word processing app to load itself into memory. It's not quite soup yet in some ways, but I'm going to keep an eye on it. When it can handle PDFs and images without a hitch, it's good to go as far as I'm concerned.
Note: Comments are owned by the poster. We are not responsible for their content.
nice of you to feature antiword. i myself have been using antiword for almost 2 years now. using midnight commander, you can take a "quick view" of the text of word documents. edit your "extensions" file (under MC's command menu), and put in the following lines:
    type/^Microsoft\ Word
        Open=(soffice %f >/dev/null 2>&1 &)
        View=%view{ascii} antiword %f
by the way, you can also check out "unrtf" which can be used in much the same way to tackle RTF files.
Antiword Fan
Posted by: Anonymous Coward on March 07, 2005 10:36 PM