For many average users, GNU/Linux support for PDF files may seem reasonably advanced. They can create PDF files in programs like OpenOffice.org, read them with programs like Kpdf, and edit them in programs like pdftk or PDFedit. But that's not the whole story, says José Marchesi, founder of the recently created GNU PDF project. "Unfortunately, there are a lot of missing features in the existing free implementations," he says. That's the main reason why the Free Software Foundation (FSF) has declared GNU PDF a high priority project, and is actively seeking donations to speed its progress.
Marchesi is a long-time supporter of the GNU Project, the umbrella organization for free software projects connected to the FSF. In 1999, he founded GNU Spain, and he later assisted in the creation of GNU Italy and GNU Mexico. He has also contributed to GNU Ghostscript, GNU gv, and GNU Ferret, the first two of which provide support for both PDF and the closely related PostScript format. In addition, Marchesi performs what he calls "random works" in the GNU Project, such as writing internal code and editing Web pages as needed.
Marchesi says he first became aware of the need for better free PDF support a few years ago in his role as maintainer of gv. In December 2005, Marchesi tried to update the Ghostscript PDF interpreter that gv uses, only to find it was technically impractical. The solution, he decided, was to attack the problem at a more basic level, and, after he discussed the problem with members of the FSF and GNU Project, GNU PDF was born.
According to Marchesi, full support for PDF is urgent for a number of reasons, both technical and political.
On the technical level, once Marchesi started investigating, he discovered a great deal of PDF functionality that is either missing or incomplete: "interactive features (forms, annotations), the management of embedded contents (sounds and movies), execution of JavaScript to perform forms validation, 3-D artwork, accessibility, Web capturing, [and] management of document collections."
Many users are unaware of these gaps, either because they never use such features or because, Marchesi says, "The PDF standard is quite careful when providing backward compatibility: When a PDF consumer application (such as a viewer) finds an unknown construct (such as 3-D artwork), it can (and should) ignore it. But in fact you may be missing information."
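The behavior Marchesi describes, in which a viewer silently skips constructs it does not understand, can be illustrated with a small sketch. This is a toy model only, not how any real viewer or the GNU PDF library works: the page is represented as a list of Python dictionaries rather than parsed from a file, and the `SUPPORTED` set, the `render_page` function, and the subtype names are all assumptions made for illustration.

```python
# Toy model only: real PDF objects are parsed from a file, not built as dicts.
# SUPPORTED, render_page, and the subtype names are illustrative assumptions.

SUPPORTED = {"Text", "Image", "Link"}


def render_page(objects, supported=SUPPORTED):
    """Render what the viewer understands and collect what it skips."""
    rendered, skipped = [], []
    for obj in objects:
        subtype = obj.get("Subtype")
        if subtype in supported:
            rendered.append(obj["Content"])
        else:
            # Unknown construct: ignore it rather than fail, but remember it
            # so the user could be told that something was left out.
            skipped.append(subtype)
    return rendered, skipped


page = [
    {"Subtype": "Text", "Content": "Quarterly results"},
    {"Subtype": "3D", "Content": "<3-D model data>"},
    {"Subtype": "Sound", "Content": "<audio stream>"},
    {"Subtype": "Image", "Content": "chart.png"},
]

rendered, skipped = render_page(page)
print("Rendered:", rendered)
print("Skipped:", skipped)
```

The page still displays without error, which is why the gap is easy to overlook. Only the `skipped` list shows that the 3-D artwork and the sound were dropped.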
The GNU Project would like to see a full implementation of the upcoming ISO 32000 standard for PDF. Despite the increasing frequency with which PDF is used for corporate and academic purposes, all software that provides the highest levels of support for the ISO standard is proprietary, which means that, without a concerted effort, free software users could be left behind.
Marchesi also says, "We want a GPLv3 implementation of PDF. Almost all of the existing alternatives are licensed under GPLv2 only." Besides the obvious credibility involved in having the new version of the license used, no doubt an important consideration is the conviction that a GPLv3 program will provide greater protection of users' freedoms.
Although Marchesi considered adding the missing functionality to existing free PDF libraries, the project quickly discovered that this idea was impractical, given GNU PDF's engineering goals.
"Our objective is to provide the same level of PDF support as Adobe [Acrobat]," Marchesi says, referring to the leading proprietary PDF program. "So we need a general and complete library that provides enough functionality to build an Acrobat-like program on top of it. This requires capabilities to both read and manipulate PDF files in an integrated library. None of the existing free implementations provides that [integration]. Some of them are designed to provide rasterization of PDF pages, such as Ghostscript, Xpdf, and Poppler, while others are designed to provide facilities for PDF manipulation, such as PoDoFo." Each is suitable for its particular purposes, but not for the integrated support envisioned by GNU PDF.
GNU PDF's first goal is to write a library in the C programming language "intended to be used by both PDF consumer and PDF producer applications," Marchesi says. "The library will be similar to the Adobe PDF Library, providing access to several layers of abstraction. In this way, the library will be useful for many kinds of applications, not just viewers."
The next step will be to write an application that has already been labelled GNU Juggler, "an Acrobat-like application on top of the library." GNU Juggler, Marchesi says, "will be a specialized PDF viewer and editor." To help with the application's creation, a member of the GNU PDF project is already performing a functional analysis of the latest edition of Acrobat Professional, Adobe's flagship PDF product, in order to reverse-engineer it.
One thing GNU PDF will not have to do is write a graphics library. Project members have already concluded that they can use libcairo. The members of the Cairo project are aware of GNU PDF, and some have already started discussing integrating the GNU PDF library with their work.
The FSF has set up a Web page for donations to GNU PDF -- a first for any of its ongoing high-priority projects, although the FSF did briefly help collect pledges for the Free Ryzom campaign last year. However, Marchesi emphasizes that "we will go ahead with the project in any case." Donations would allow the project to hire full-time developers, instead of the volunteers more usual in a new free software project.
"To write the GNU PDF library and GNU Juggler is a really big task, and we want to do it really fast," Marchesi says. "It is crucial for us to have a free, complete, and high-quality implementation of the PDF standard as soon as possible."
Note: Comments are owned by the poster. We are not responsible for their content.
1) Me too! pdfnup can be a pain
2) Evince (Gnome's document viewer) is apparently aiming to include annotation support in its next release. The annotation support is the result of a Google Summer of Code project.
I tried using PDFEdit, but its interface was crazy - I had no idea how to even start to use it!
"Proof again that the open source community just doesn't get it..."
PDF was designed by Adobe - they are responsible for the presence of embedded audio and video in the specification, not the open source community.
The GNUPDF devs are just saying that to have a *fully* compliant PDF application, these (useless in dead-tree form) features need to be implemented.
PDF is a PRINT format
Ehm, where in PDF name, description or specification does it say that PDF is a print format? PDF is a Portable Document Format. There is more to portability than printing.
...would look exactly on another machine.
Not sure what exactly you mean by "look exactly". I will assume that you have omitted the words "the same", because that is a very common misconception. In fact, it is nearly impossible for a document in any format to print or display exactly the same on two different devices: you have different resolutions, different media sizes, different printable areas on the page, different contrast and intensity settings of a monitor, different shades of paper, different physical and chemical properties of ink, even different available fonts, and so on. PDF was designed to give you as good a display or print as possible under the given circumstances. That's why it is called "portable".
How exactly am I supposed to print a PDF that has an embedded MP3?
Very easily. The PDF specification states that if a feature of a document cannot be implemented on a given device, it is to be ignored. So for example things like hyperlinks or embedded audio will be ignored when printing, but the text and graphics on the page will still print as well as possible on a given printer.
I personally find this very useful. I can create a slideshow explaining how to solve a problem, demonstrating and explaining all calculations step by step, designed to be viewed on a screen. I can include voice explanations as well as written ones, thus making it more useful to both visual and auditory learners. You can still print the slideshow and study it on a train; you just get the visual part without the audio. The PDF format is ideal for this: it contains everything in a single file that you can download for offline use, it will display in a very similar way on a variety of devices, users of nearly all common operating systems can download a viewer free of charge, and, as noted above, various features of the document will be gracefully ignored when displayed or printed on devices that cannot handle them. In addition, thanks to pdfTeX, it is very easy to create.
GNU PDF to fill missing gap in functionality
Posted by: Anonymous [ip: 129.32.8.58] on November 30, 2007 01:26 AM
1) I'd like to see a GUI for dealing with N-up printing, including edge trimming. Commands like "pdfnup --nup 2x2 --paper letter --trim "1cm 0cm 0cm 1cm" document.pdf" are not all that pleasant, and (esp. in trying to figure out the degree of appropriate trim) sometimes it takes a few times to get it right.
2) PDFedit looks very useful, but it mostly stalled and crashed on my machine. (Here's a longish journal entry on my experience with it: http://slashdot.org/~timothy/journal/183325) I would dearly appreciate a PDF reader that let me smoothly highlight and annotate PDFs; in law school, this would be hugely appreciated.
p.s. Offtopic: I wish the captcha words used to prevent spam were printed MUCH larger. It's a raster image, it needn't be quite so small!