Linux.com

Feature: Free Software

GNU PDF to fill missing gap in functionality

By Bruce Byfield on November 29, 2007 (9:00:00 PM)

For many average users, GNU/Linux support for PDF files may seem reasonably advanced. They can create PDF files in programs like OpenOffice.org, read them with programs like Kpdf, and edit them in programs like pdftk or PDFedit. But that's not the whole story, says José Marchesi, founder of the recently created GNU PDF project. "Unfortunately, there are a lot of missing features in the existing free implementations," he says. That's the main reason why the Free Software Foundation (FSF) has declared GNU PDF a high priority project, and is actively seeking donations to speed its progress.

Marchesi is a long-time supporter of the GNU Project, the umbrella organization for free software projects connected to the FSF. In 1999, he founded GNU Spain, and he later assisted in the creation of GNU Italy and GNU Mexico. He has also contributed to GNU Ghostscript, GNU gv, and GNU Ferret, the first two of which provide support for both PDF and the closely related PostScript format. In addition, Marchesi performs what he calls "random works" in the GNU Project, such as writing internal code and editing Web pages as needed.

Marchesi says he first became aware of the need for better free PDF support a few years ago in his role as maintainer of gv. In December 2005, Marchesi tried to update the Ghostscript PDF interpreter that gv uses, only to find it was technically impractical. The solution, he decided, was to attack the problem at a more basic level, and, after he discussed the problem with members of the FSF and GNU Project, GNU PDF was born.

The reasons for a new PDF project

According to Marchesi, full support for PDF is urgent for a number of reasons, both technical and political.

On the technical level, once Marchesi started investigating, he discovered a great deal of PDF functionality that is either missing or incomplete: "interactive features (forms, annotations), the management of embedded contents (sounds and movies), execution of JavaScript to perform forms validation, 3-D artwork, accessibility, Web capturing, [and] management of document collections."

Many users are unaware of these gaps, either because they never use such features or because, Marchesi says, "The PDF standard is quite careful when providing backward compatibility: When a PDF consumer application (such as a viewer) finds an unknown construct (such as 3-D artwork), it can (and should) ignore it. But in fact you may be missing information."

The GNU Project would like to see a full implementation of the upcoming ISO 32000 standard for PDF. Despite the increasing frequency with which PDF is used for corporate and academic purposes, all software that provides the highest levels of support for the ISO standard is proprietary, which means that, without a concerted effort, free software users could be left behind.

Marchesi also says, "We want a GPLv3 implementation of PDF. Almost all of the existing alternatives are licensed under GPLv2 only." Besides the credibility that comes from using the new version of the license, an important consideration is no doubt the conviction that a GPLv3 program will better protect users' freedoms.

The approach

Although Marchesi considered adding the missing functionality to existing free PDF libraries, the project quickly discovered that this idea was impractical, given GNU PDF's engineering goals.

"Our objective is to provide the same level of PDF support as Adobe [Acrobat]," Marchesi says, referring to the leading proprietary PDF program. "So we need a general and complete library that provides enough functionality to build an Acrobat-like program on top of it. This requires capabilities to both read and manipulate PDF files in an integrated library. None of the existing free implementations provides that [integration]. Some of them are designed to provide rasterization of PDF pages, such as Ghostscript, Xpdf, and Poppler, while others are designed to provide facilities for PDF manipulation, such as PoDoFo." Each is suitable for its particular purposes, but not for the integrated support envisioned by GNU PDF.

GNU PDF's first goal is to write a library in the C programming language "intended to be used by both PDF consumer and PDF producer applications," Marchesi says. "The library will be similar to the Adobe PDF Library, providing access to several layers of abstraction. In this way, the library will be useful for many kinds of applications, not just viewers."

The next step will be to write an application that has already been labelled GNU Juggler, "an Acrobat-like application on top of the library." GNU Juggler, Marchesi says, "will be a specialized PDF viewer and editor." To help with the application's creation, a member of the GNU PDF project is already performing a functional analysis of the latest edition of Acrobat Professional, Adobe's flagship PDF product, in order to reverse-engineer it.

One thing GNU PDF will not have to do is write a graphics library. Project members have already concluded that they can use libcairo. The members of the Cairo project are aware of GNU PDF, and some have already started discussing integrating the GNU PDF library with their work.

Realizing the project goals

The FSF has set up a Web page for donations to GNU PDF -- a first for any of its ongoing high-priority projects, although the FSF did briefly help collect pledges for the Free Ryzom campaign last year. However, Marchesi emphasizes that "we will go ahead with the project in any case." Donations would allow the project to hire full-time developers, instead of the volunteers more usual in a new free software project.

"To write the GNU PDF library and GNU Juggler is a really big task, and we want to do it really fast," Marchesi says. "It is crucial for us to have a free, complete, and high-quality implementation of the PDF standard as soon as possible."

Bruce Byfield is a computer journalist who writes regularly for Linux.com.

Comments

on GNU PDF to fill missing gap in functionality

Note: Comments are owned by the poster. We are not responsible for their content.

GNU PDF to fill missing gap in functionality

Posted by: Anonymous [ip: 129.32.8.58] on November 30, 2007 01:26 AM
I'd like to see a few things in the treatment of PDFs on Linux, and I'm very glad to see this project.



1) I'd like to see a GUI for dealing with N-up printing, including edge trimming. Commands like "pdfnup --nup 2x2 --paper letter --trim "1cm 0cm 0cm 1cm" document.pdf" are not all that pleasant, and (esp. in trying to figure out the degree of appropriate trim) sometimes it takes a few times to get it right.



2) PDFedit looks very useful, but it mostly stalled and crashed on my machine. (Here's a longish journal entry, http://slashdot.org/~timothy/journal/183325, on my experience with it.) I would dearly appreciate a PDF reader that let me smoothly highlight and annotate PDFs; in law school, this would be hugely appreciated.



p.s. Offtopic: I wish the captcha words used to prevent spam were printed MUCH larger. It's a raster image, it needn't be quite so small!

#

Re: GNU PDF to fill missing gap in functionality

Posted by: Anonymous [ip: 24.93.127.221] on December 01, 2007 12:42 AM
flpsed will let you overlay text and simple line drawings on top of PDFs. It won't let you edit any elements already present, but you CAN come back and edit the things you added later.

http://www.ecademix.com/JohannesHofmann/

#

Re: GNU PDF to fill missing gap in functionality

Posted by: Anonymous [ip: 59.167.167.104] on December 01, 2007 04:27 AM

1) Me too! pdfnup can be a pain



2) Evince (Gnome's document viewer) is apparently aiming to include annotation support in its next release. The annotation support is the result of a Google Summer of Code project.



I tried using PDFEdit, but its interface was crazy - I had no idea how to even start to use it!

#

Not all of the functionality is missing...

Posted by: Anonymous [ip: 141.123.223.100] on November 30, 2007 04:23 PM
...it shouldn't have been there in the first place. For example "the management of embedded contents (sounds and movies)"...PDF is a PRINT format. Its purpose was to make sure a document created on one machine with a particular version of the software, with particular fonts, margins, etc, etc would look exactly on another machine. It's not a universal document format for storing every type of info on the planet. How exactly am I supposed to print a PDF that has an embedded MP3?

Proof again that the open source community just doesn't get it...

#

Re: Not all of the functionality is missing...

Posted by: Anonymous [ip: 59.167.167.104] on December 01, 2007 04:12 AM

"Proof again that the open source community just doesn't get it..."



PDF was designed by Adobe - they are responsible for the presence of embedded audio and video in the specification, not the open source community.
The GNUPDF devs are just saying that to have a *fully* compliant PDF application, these (useless in dead-tree form) features need to be implemented.

#

Re: Not all of the functionality is missing...

Posted by: Anonymous [ip: 97.84.177.5] on December 02, 2007 04:37 AM

"PDF is a PRINT format"



Ehm, where in the PDF name, description or specification does it say that PDF is a print format? PDF is a Portable Document Format. There is more to portability than printing.



"...would look exactly on another machine."



Not sure what exactly you mean by "look exactly". I will assume that you have omitted the words "the same", because that is a very common misconception. In fact, it is nearly impossible for a document in any format to print or display exactly the same on two different devices: you have different resolutions, different media sizes, different printable areas on the page, different contrast and intensity settings of a monitor, different shades of paper, different physical and chemical properties of ink, even different available fonts, and so on. PDF was designed to give you nearly as good a display or print as possible under given circumstances. That's why it is called "portable".



"How exactly am I supposed to print a PDF that has an embedded MP3?"



Very easily. The PDF specification states that if a feature of a document cannot be implemented on a given device, it is to be ignored. So for example things like hyperlinks or embedded audio will be ignored when printing, but the text and graphics on the page will still print as well as possible on a given printer.



I personally find this very useful. I can create a slideshow explaining how to solve a problem, demonstrating and explaining all calculations step by step, designed to be viewed on a screen. I can include voice explanations as well as written ones, thus making it more useful to both visual and auditory learners. You can still print the slideshow and study it on a train; you just get the visual part without the audio. The PDF format is ideal for this: it contains everything in a single file that you can download for offline use, it will display in a very similar way on a variety of devices, users of nearly all common operating systems can download a viewer free of charge, and, as noted above, various features of the document will be gracefully ignored when displayed or printed on devices that cannot handle them. In addition, thanks to pdfTeX, it is very easy to create.

#

dont be so cheap

Posted by: Anonymous [ip: 217.153.60.83] on December 03, 2007 01:33 PM
Why do people care that much about PDF? I think that current implementations are MORE than good enough for what is needed. If someone needs advanced PDF stuff, can't he just buy it from Adobe? Don't be so cheap if you are using something that is not 100% free.

#

Re: dont be so cheap

Posted by: Anonymous [ip: 74.208.44.45] on December 03, 2007 11:52 PM
Don't believe there is an adobe acrobat version available for *NIX....that would be one reason buying from Adobe is not an option

#

Re: dont be so cheap

Posted by: Anonymous [ip: 82.169.41.246] on December 29, 2007 08:39 PM
I'm a student and need the sort of features available in the free (Windows) PDF Xchange viewer (highlighting, search, annotation etc). Hell, some of the PDF readers I used last time I had Linux installed couldn't even SEARCH. I'm not looking for fancy advanced features, just something to let me get my studying done.

Don't be so cheap? Let's see, I spend most time studying so my earnings are negligible, yet I have to pay rent, food, tuition fees etc - why would I want to pay for Adobe anything, when I get the superior PDF Xchange viewer for WinXP for free (I already had WinXP so...). This is one of the main reasons I went back to Windows (that and lack of a decent WordWeb dictionary replacement (no, an online dictionary is not a replacement - if it takes longer than to use a paper-based dictionary, it's not progress!!) as well as other little incompatibilities which just made life harder)

#

GNU PDF to fill missing gap in functionality

Posted by: Anonymous [ip: 212.20.255.162] on December 03, 2007 03:11 PM
There's a bit of GNU arrogance in here as well though, isn't there? I mean, those of us who have been using PDF since 1.0 and free tools like Xpdf, Ghostscript and other stuff in the TeX arena have been benefiting from a huge amount of fantastic work from very smart people, and suddenly GNU comes along with a PR campaign (because this is what this is) to try to get some momentum behind their completely new implementation. Fully half of the functionality they talk about reinventing already exists in mature free software, but they want to do everything again from scratch. How is this a good use of GNU funds?

#

Re: GNU PDF to fill missing gap in functionality

Posted by: Anonymous [ip: 64.6.40.50] on December 05, 2007 06:41 PM
Something I was taught long ago about software development was that when you approached 80% completion on a project, it was faster to throw out everything you had done and start over. Why? Because in your ignorance you had done things that you later proved to be very bad design choices. It is faster to re-design the whole thing from scratch, and then borrow as appropriate from the old project, than to try to fix the old project. I would be very surprised if code from these other projects didn't end up in this new one. Why should a developer strap himself to a 10 year old framework that will break when it is expanded, when he could make a new one that will work well, and then borrow a lot of code out of the other projects?

#

GNU PDF

Posted by: Anonymous [ip: 90.177.44.67] on December 03, 2007 05:25 PM
Has GNU tried to at least contact the authors of already existing tools? No? Yep why not, reinventing the wheel is much more fun and better than 2 really useful projects is to have 10 almost useless projects.... (ok, i am exaggerating but i hope you got my point!)

#

Re: GNU PDF

Posted by: Anonymous [ip: 58.28.159.83] on January 05, 2008 08:25 AM
This comment seems concerned about reinventing the wheel when so many existing tools (note the plural) fit most of the use cases.

I have been watching the GnuPDF mailing list for some time and have been impressed by how many of these authors of already existing tools have volunteered their time and code to the GnuPDF project. There are a lot of tools, but there are also a lot of gaps and redundancy. Amalgamating the libraries into one, very powerful library centralises development effort and fills all the gaps between the targeted projects that currently exist. It also leads to a project with a far-superior architecture http://gnupdf.org/Lib:Architecture that is rigorously tested at every stage http://gnupdf.org/Lib:Torture_Chamber

Like most Free software, the existing tools have largely grown out of people scratching an itch and filling gaps in existing libraries (gaps that a lot of comments here contend do not exist). The FSF, as always, takes a more professional, almost business-like approach to identifying goals and end results, organises resources and correctly prioritises development effort. The paid programmers do what is needed rather than what they feel like (thankfully these things often coincide) while still structuring development in a way that encourages and allows volunteers. The paid programmers ensure that the end goal is always getting closer and development will not stop once the library is 'good enough'. It will also, hopefully, create a more user-centric (users being, in this case, developers of PDF-related applications) library because the paid programmers can spend time on the often-ignored, less 'fun' aspects of programming like user-friendliness and documentation.

Hopefully the professional approach will encourage other programmers to join the development; snowballing the library into something that subsumes the functionality of the existing libraries. Then this comment-poster will get his wish: one, high-quality library that all developers can work on together. As soon as GnuPDF surpasses each library, I imagine that it will attract at least some of the developers of that library. That obviously increases the speed of development. To achieve that professional core of developers, working full-time on the necessary parts of the project, the FSF needs to raise funds; hence the appeal https://www.fsf.org/donate/directed-donations/gnupdf.html

With regard to a different comment, I heard a whisper that Qt is seriously considering GPLv3. It would certainly make life a lot easier if they at least made it 'GPLv2 or later' or added GPLv3 to their 'GPL Exceptions List'.

#

Hard to get excited about vaporware

Posted by: Anonymous [ip: 76.238.82.114] on December 04, 2007 03:37 PM
There are several projects that do pdfs well in linux, many have ports for windows as well. xpdf and ghostscript have been around for a while. There are many viewers, scanner to pdf applications. Now there is a new project that is going to start from scratch, and be better than all of them. They should have run this story when the application was working. Just calling it GNU pdf, gAcrobat, gAdobe, openPDF manager, - right now, it's just a word.

#

GNU PDF to fill missing gap in functionality

Posted by: Anonymous [ip: 212.84.100.116] on February 16, 2008 09:27 AM
One of the current shortcomings I find annoying is the lack of support for hyperlinks, eg tables of contents, and internal/external URLs. None of the free pdf print drivers can do this.

It's good that the FSF have raised the profile of the deficiencies of current open source PDF software, most of which can be described as "adequate for basic usage". As the PDF format is now an ISO standard, it is necessary to meet the challenge of full functionality. People shouldn't be arguing to maintain the status quo. After all, open source users' gripes about Internet Explorer are because it doesn't meet official standards - I don't hear many arguing that Dillo is an "adequate" browser - they push Firefox or Konqueror as standards-compliant solutions.

As full pdf functionality is considered basic functionality by enterprise users, its inclusion (or not) will be an important factor before many people/organisations will even consider choosing open source software.

AND WHEN IS SOMEONE GOING TO CORRECT THE ATROCIOUS FORMATTING ON THIS FORUM - IT'S REALLY NOT DOING LINUX ADOPTION ANY GOOD AT ALL!

#

PDF Studio

Posted by: Anonymous [ip: 74.224.121.85] on March 07, 2008 03:51 PM
There are already tools out there that do fill the missing gap in functionality. Check out for instance:

PDF Studio: http://www.qoppa.com/psindex.html

Great tool that supports annotations, bookmark editing, text searching, form filling, etc...

#




 