Opinion
The EPUB Conundrum

Let's face it, the EPUB ebook format has almost everything the 21st-century book publisher and reader needs. From the reader's perspective, EPUB is non-proprietary. Just like ye olde paper publication, when you download your EPUB, you own it! It sits on your digital bookshelf in Dropbox, or wherever, and nobody can decide later to take it away from you. There's also a wide variety of reading software for every device and platform that's sure to meet every reader's tastes, and like EPUB itself, if you buy it, it's yours. It doesn't belong to some huge web giant who's probably tracking your every page turn.

Functionally, EPUB does everything you would expect a digital book to do, and probably more. Unlike PDF, it's not just a digital representation of a physical object; it is, rather, a digital publication. There are no margins in which to write notes and there are no "pages" at the bottom of which footnotes can be found. Rather, footnotes and annotations pop up over the page, and margin notes are anchored in the document according to the specifications of the reader you choose. Likewise, bookmarks are also handled by the reading software. Audio, video, text-to-speech (TTS), and images are all supported, as you would expect in a digital-only publication, and of course, there can also be external links.

For publishers, there's also a lot to like. They have all the power and control they need over their books to present readers with a polished, professional publication without interfering with preferences that should be left under reader control. Font selections, multiple columns on the screen, multimedia that's embedded and playable right on the page: it's all under the publisher's control. But beyond that, control returns to you. As the IDPF puts it, "content presentation should adapt to the user." So things like relative font sizes, background styles, highlight colors, how multimedia is presented, and many other options become a collaboration between developers and readers, who are free to explore different ways publications can be presented on different platforms.

Then there are the modifications readers can make to their publications. Just like in paper publications, text can be highlighted, bookmarks can be set, notes can be written in the "margins", and so on. Put it all together and the EPUB standard is so powerful it's completely understandable that, even as far back as six years ago, it was being hailed by some as "the industry standard" for ebooks.

Except… Sadly, the reality of the EPUB user experience rarely lives up to the capability of the format. Having looked through a cross section of EPUB readers available to Android users (sorry, I don't have an Apple device), I've discovered that the actual level of support for the EPUB format varies greatly from reader to reader. In fact, in some cases, support is so minimal that features simply don't work at all! Now, some of this can be laid right at the feet of the IDPF itself, which openly states in the specification document that support for some of the more advanced features is "optional." All of which goes a long way to explain why Amazon bought Mobipocket and Barnes & Noble went to the trouble of developing their own proprietary reading software. (Though, sadly, the Nook is no better at properly displaying an EPUB than the readers examined below.) It also explains why EPUB has failed to live up to the hailed vision of being "the industry standard" for ebooks. That title now more probably belongs to Kindle, horrifying as it is to contemplate a single corporation having that much control over the world's digital literature.

To demonstrate my findings, the pictures on the right are screenshots of a lorem ipsum I created to test the compliance of EPUB readers. The lorem ipsum was hand-coded, as all complex publications must be, to the EPUB version 3.0.1 standard. It was then validated using the IDPF's own validator. EPUBs validated by the IDPF will be accepted by both Google and Barnes & Noble (but sadly not by Smashwords) and can also be converted directly to Kindle by Amazon. Unfortunately, validation does not guarantee correct decoding, or even full functionality, especially among the fleet of reading software produced by independent developers, the very readers those most interested in their privacy are likely to choose.
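To give a feel for what validation looks at, here is a rough Python sketch of the most basic container rules any valid EPUB has to satisfy: the file is a ZIP archive, its first entry is an uncompressed `mimetype` file containing `application/epub+zip`, and it carries a `META-INF/container.xml`. This is not the IDPF validator, which checks far more than this; the function names and the tiny in-memory sample are made up for illustration.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def basic_container_problems(data: bytes) -> list[str]:
    """Return basic container problems found in an EPUB (empty list if none)."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a ZIP archive"]

    problems = []
    with archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            if entries[0].compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' must be stored uncompressed")
            if archive.read("mimetype") != EPUB_MIMETYPE:
                problems.append("'mimetype' has the wrong content")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("missing META-INF/container.xml")
    return problems


def build_sample_epub() -> bytes:
    """Build a hypothetical, nearly empty container (not a readable book)."""
    buffer = io.BytesIO()
    # ZipFile defaults to ZIP_STORED, so 'mimetype' is written uncompressed.
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", EPUB_MIMETYPE)
        archive.writestr("META-INF/container.xml", "<container/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(basic_container_problems(build_sample_epub()))
    print(basic_container_problems(b"definitely not an ebook"))
```

Passing checks like these says nothing about how a given reader will render the book, which is exactly the gap described above.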

While there are a wide variety of problems, they can be reduced to three main categories. Without descending into the technicalities of the EPUB standard, the most common problem can be described as an inability to display publications the way the publisher coded them to appear. Another is proper display, but things like footnotes, multimedia, and so on don't work, or don't work properly. Then there are those that are a blend of the other two. In the example above, for instance, the page is displayed correctly, but the footnotes are inactive. This particular reader also complained that the EPUB was non-compliant and had "problems."

That an EPUB reader will charge a fully validated EPUB with being "non-compliant" is especially troubling for publishers, particularly small indie publishers. The average reader is likely to assume that the developer of the software is an expert on the format, and will blame the publisher, who has done everything right. It makes EPUB and the IDPF look bad as well.

The second example is of a reader that displays the page very poorly. Note that the frame for the chapter header is completely missing and the "drop cap" is misaligned, which in turn screws up the spacing of the first line relative to the paragraph. Though it might not be easily seen in the screencap, this reader also ignored the font commands, and instead insisted on using its own fonts. This can be a serious problem when the publisher has made font family choices for reasons of clarity or artistic style important to the context of the material. This reader also fails to pop up the footnotes. Unlike the reader above, however, at least it doesn't claim that the book is "non-compliant."

Finally, the third example. It uses the right font family (no actual font metric files were included in the EPUB) and displays the "drop cap" properly as well. But as you can see from the screenshot, it makes a hash of the chapter header: first, by not making the background transparent, and second, by misaligning the text in the frame. It is, however, the only one of the three that displays EPUB 3 annotations correctly. The last screenshot shows a note popped up.
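For the curious, popup notes in EPUB 3 hang on a small piece of markup: a link carrying `epub:type="noteref"` that points at an element carrying `epub:type="footnote"`. The Python sketch below checks that every note reference in a page has a matching footnote in the same file. The sample page is hypothetical (it is not the test file used for the screenshots), the function name is made up, and the check ignores references that point into other files.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"
EPUB_TYPE = f"{{{EPUB_NS}}}type"  # the namespaced epub:type attribute

# Hypothetical sample page, written only for this illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <p>Lorem ipsum<a epub:type="noteref" href="#fn1">1</a> dolor sit amet.</p>
    <aside epub:type="footnote" id="fn1"><p>The text of the note.</p></aside>
  </body>
</html>"""


def dangling_noterefs(xhtml: str) -> list[str]:
    """Return the targets of note references that have no footnote in this file."""
    root = ET.fromstring(xhtml)

    # Collect the ids of every <aside> marked as a footnote.
    footnote_ids = {
        element.get("id")
        for element in root.iter(f"{{{XHTML_NS}}}aside")
        if "footnote" in element.get(EPUB_TYPE, "").split()
    }

    dangling = []
    for link in root.iter(f"{{{XHTML_NS}}}a"):
        if "noteref" in link.get(EPUB_TYPE, "").split():
            target = link.get("href", "").lstrip("#")
            if target not in footnote_ids:
                dangling.append(target)
    return dangling


if __name__ == "__main__":
    print(dangling_noterefs(SAMPLE))
```

A page that passes this check gives the reading software everything it needs to pop the note up; whether it actually does so is up to the reader.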

Clearly this is a very limited test of EPUB readers. No attempt was made to gauge compliance with audio, video, or TTS. But even so, the results were appalling.

So, to return to where I began this blog post, in theory EPUB has everything readers and publishers could want, and by its very nature, like the Internet itself, is built to continue to evolve along with the technologies around it. Add its independence, the ability of publication owners to actually take custody of their digital property sans gatekeepers, and EPUB truly does deserve to be to digital publications what paper was to physical publications, and what HTML has become to the web. But for it to live up to that potential, to deserve the moniker of being the "industry standard" for ebooks, a couple of things are going to have to happen.

Certify EPUB Readers

EPUB readers need to be certified as being fully EPUB compliant, just as validation ensures that all publication coding is compliant. And to qualify, a reader should be fully, 100% compliant with the standard! While the idea of making compliance with parts of the standard optional for security reasons is laudable, it places the decision of just how secure "secure enough" is in the hands of developers, rather than readers. A reader can't enable functionality that they need when it's not there, but they can (and should be able to) disable functionality they find objectionable.

From the publisher's perspective, knowing that the publication is guaranteed to be presented as coded, unless the reader chooses otherwise, is reassuring. It puts presentation control exactly where it ought to be: in the hands of the reader. It makes possible the same kind of strategy we've become used to with our smartphones and tablets. Publishers can let readers know that they'll need to have scripting turned on, for example, to take advantage of certain features of the publication. At the moment, the mishmash of (non)compliance with the standard in the EPUB sphere makes using such a strategy a fool's errand.

Make EPUB a Standalone Container

If I give you a reference book off of my shelf, chances are it's going to have text that's highlighted or underlined, and there will be handwritten notes in the margins of the page. You know this. You expect it, because it's a used paper book. The notes and highlights can't simply be erased. You would think a 21st-century "digital book" would be the same. Sadly, the EPUB standard does not agree. Reader markup (highlighting, margin notes, and bookmarks) is all managed by, and proprietary to, the software being used. Which means all reader markup is lost if (or more likely when) the reader finds it necessary to change reading software. It also means that, if a reader marks up a publication using reader foo, that markup will not be seen when the publication is opened on another device that uses software bar. Likewise, markup inserted using software bar cannot be seen when the publication is opened back in the office, on software foo.

This shortcoming is a huge advantage to proprietary formats like Kindle and other closed ecosystems. Reader markup in an EPUB should be incorporated into, and become a part of, the publication. Doing so would give EPUB the true independence and portability the standard promises.
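To make the idea concrete, here is a minimal Python sketch of what carrying reader markup inside the publication could look like. None of this is part of the EPUB standard: the `annotations/reader-markup.json` path, the fields of each record, and the function names are all assumptions invented for illustration. Because an EPUB is a ZIP archive, the sketch simply copies the archive and adds one extra file to it.

```python
import io
import json
import zipfile

# Hypothetical location for reader markup; the standard defines no such file.
ANNOTATIONS_PATH = "annotations/reader-markup.json"


def embed_markup(epub_bytes: bytes, markup: list[dict]) -> bytes:
    """Return a copy of the EPUB with the reader's markup stored inside it."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as source, \
            zipfile.ZipFile(output, "w") as target:
        for info in source.infolist():
            if info.filename == ANNOTATIONS_PATH:
                continue  # older markup is replaced below
            # Reusing the original ZipInfo keeps entry order and compression,
            # so 'mimetype' stays first and uncompressed.
            target.writestr(info, source.read(info.filename))
        target.writestr(ANNOTATIONS_PATH, json.dumps(markup, indent=2))
    return output.getvalue()


def read_markup(epub_bytes: bytes) -> list[dict]:
    """Return the markup stored in the EPUB, or an empty list if there is none."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        if ANNOTATIONS_PATH not in archive.namelist():
            return []
        return json.loads(archive.read(ANNOTATIONS_PATH))


def make_sample_epub() -> bytes:
    """Build a hypothetical stand-in container (not a readable book)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
    return buffer.getvalue()


if __name__ == "__main__":
    note = {"type": "highlight", "target": "chapter1.xhtml#p3", "text": "Lorem ipsum"}
    marked_up = embed_markup(make_sample_epub(), [note])
    print(read_markup(marked_up))
```

With something along these lines, markup written by software foo would travel with the file and could be read back by software bar. A real proposal would also have to settle how such a file is declared in the package and how validators treat it, which this sketch does not attempt.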

Incorporating this feature into the EPUB standard would also be an entirely new way of looking at DRM. It's a very "back to the future" idea, because it would vault the very old notion of the "used book" into the digital future, with as little threat to sales as the used paper book market is to new paper book sales. Who wants a used book when you can have a new, pristine one for just a few bucks? And yet, as a DRM feature, all that would be needed is the ability of publishers to turn off any "global deletion of reader markup" functions built into the reading software. Add the kind of "branding" common to digital sales of PDFs to the standard and you would then have the makings of a DRM system that doesn't infringe on the rights of EPUB owners to do what they will with their own, but still protects copyright as thoroughly as it did in the pre-digital era.

I'm under no illusion that these changes would be easy to implement, but doing so would go a long way toward helping the EPUB format live up to its potential. It would also help level the playing field between the big monopoly players, who are able to create their own publishing and reading platforms, and small indie publishers, who are entirely dependent upon equally small indie developers to display the publications they provide directly to their readers.