For a long time I’ve wanted a painless way of converting ebooks to speech in my text to speech application, Text2Go. Up until recently I’ve thought the best way to achieve this would be to support the PDF standard. Although PDF is ubiquitous and a great way to distribute documents that will ultimately be printed, due to its internal structure, it’s actually quite difficult to reliably extract text from a PDF document. There are a number of software libraries that can be purchased to perform this task. However these are either unreliable, overly complex, exorbitantly priced or a combination of all three. This situation has been brought about by a combination of factors, such as PDF being a proprietary format, being binary rather than text based, and being designed primarily to accurately represent a printed document.
These problems have meant that I’ve deferred adding automatic ebook conversion to Text2Go.
The ePub format changes all this as it’s the complete opposite to PDF in a number of important ways.
Firstly, ePub is an open format, not controlled or owned by any one company. This means that anyone can download the ePub specification (actually a set of three standards, OPS, OPF and OCF, maintained by the International Digital Publishing Forum, or IDPF). There are no licensing costs or restrictions associated with the standard.
Secondly, ePub is built on top of existing standards. This is perhaps the most important difference between ePub and PDF and the one most likely to guarantee the ultimate success of ePub.
An ePub file is actually a zip archive containing multiple files. You can examine the contents of any ePub file simply by renaming it to a .zip file and opening it with any tool or OS that supports the zip archive format (e.g. Windows XP and above, WinZip, 7-Zip, etc).
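To make the "it's just a zip archive" point concrete, here is a minimal sketch in Python using the standard zipfile module. It is an illustration only, not code from Text2Go. The helper name list_epub_entries and the tiny in-memory stand-in book are my own assumptions, used so the example runs without a real ePub file.

```python
import io
import zipfile


def list_epub_entries(epub_file):
    """Return the names of all files stored inside an ePub (zip) archive."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


# Build a tiny stand-in ePub in memory (hypothetical contents, not a valid book).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("metadata.opf", "<package/>")
    archive.writestr("toc.ncx", "<ncx/>")
    archive.writestr("chapter1.html", "<html><body><p>Hello</p></body></html>")

entries = list_epub_entries(buffer)
print(entries)
```

With a real book you would pass the path to the .epub file instead of the in-memory buffer. No renaming is needed, as zipfile looks at the file contents rather than the extension.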
Inside a typical ePub file, you will find the following types of file.
- metadata.opf – An xml file containing information about the ebook, such as the author, publisher, title and a list of all the other files in this ePub file.
- toc.ncx – A table of contents for the ebook.
- One or more html pages, containing the ebook text.
- Any images used in the ebook, such as a cover image, and images that accompany the text. Images are stored in standard formats such as jpeg.
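As a rough illustration of how little is needed to read the .opf file described above, the following Python sketch pulls the title, author and file list out of a small sample package document using only the standard library. The sample XML, the book details in it and the function name read_opf are made up for this example. The two namespace URIs are the ones used by OPF 2.0 and Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the OPF package document and its Dublin Core metadata.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hypothetical, cut-down .opf file for illustration only.
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:creator>Jane Example</dc:creator>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="chapter1.html" media-type="application/xhtml+xml"/>
    <item id="cover" href="cover.jpg" media-type="image/jpeg"/>
  </manifest>
</package>
"""


def read_opf(opf_text):
    """Return (title, author, manifest), where manifest is a list of (href, media type)."""
    root = ET.fromstring(opf_text.strip())
    title = root.findtext("opf:metadata/dc:title", namespaces=NS)
    author = root.findtext("opf:metadata/dc:creator", namespaces=NS)
    manifest = [
        (item.get("href"), item.get("media-type"))
        for item in root.findall("opf:manifest/opf:item", NS)
    ]
    return title, author, manifest


title, author, manifest = read_opf(SAMPLE_OPF)
print(title, "by", author)
for href, media_type in manifest:
    print(href, media_type)
```

Any elements or attributes the function doesn't ask for are simply never looked at, so extra metadata in a real book should not break it.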
Notice that the standard file formats used to build the web, such as xml, html and jpeg, are used rather than any new, ePub-specific formats. The benefits of this approach are considerable. Support for all these formats is built into every modern operating system and programming language. The technology required to create an ePub reader application is the same as that required to display a web page, and just about any modern computing device, be it a PC or, more importantly, a mobile device, ships with this technology.
One of the things I love about ePub is that all text is represented in text files, be it html or xml. There is something incredibly reassuring about being able to open a file in a basic text editor and view or edit it.
Finally, the ePub format is DRM-free. This means that anyone purchasing an ePub file can rest assured that they have full access to the content, and are free to convert it to any other format, transfer and display it on any device, print it and, importantly in this case, convert it to speech. Compare this to the sad state of affairs that plagues the Amazon Kindle now that publishers are disabling the TTS feature on more and more of their titles.
The ePub format doesn’t preclude the use of DRM but fortunately to date no one has come up with a DRM scheme for ePub. Personally I hope this never happens and all titles released as ePub remain completely open. At the moment buying an ebook in ePub format is a guaranteed way of knowing the title is completely DRM-free.
Unfortunately, the above two paragraphs are just my wishful thinking. There are already at least two DRM’ed ePub formats in existence, as pointed out by Keith Fahlgren in the comments below.
So far I’ve found developing a reader for the ePub format a relatively painless process. This is primarily because ePub uses existing standards. I’ve got a complete toolbox of software libraries I can use to read images, html, xml and zip archives. This lets me concentrate on building the text to speech conversion process. The nice thing about xml and html is that my reader can easily ignore information that’s not needed by my application. This makes for a more robust reader: superfluous or unexpected information is simply skipped over.
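To show what "simply skipped over" can look like in practice, here is a small Python sketch built on the standard html.parser module. It is not how Text2Go does it, just one plausible approach. The class name TextExtractor, the choice of skipped tags and the sample page are all assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect readable text, ignoring anything inside <head>, <script> or <style>."""

    SKIPPED = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Attributes and unknown tags need no special handling; they are ignored.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside the skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # Joining with single spaces is a simplification that is fine for a sketch.
        return " ".join(self._chunks)


# A hypothetical chapter page containing markup the speech converter doesn't need.
SAMPLE_PAGE = """<html><head><title>Ignored</title><style>p { color: red; }</style></head>
<body><h1>Chapter 1</h1><p class="x" data-custom="y">It was a <em>dark</em> night.</p>
<custom-tag>Still read.</custom-tag></body></html>"""

extractor = TextExtractor()
extractor.feed(SAMPLE_PAGE)
extractor.close()
spoken_text = extractor.text()
print(spoken_text)
```

The parser never needs to know about the class attribute, the custom attribute or the unrecognised custom-tag element. It only reacts to the few things it cares about, which is what makes this style of reader tolerant of unexpected content.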
I’m very excited about adding ePub support to Text2Go and can’t wait to release it. I have my eye on the ever growing collection of ebooks published on Smashwords.com, all available in ePub format.
I believe the openness of ePub coupled with the ease of implementing an ePub reader makes for a very bright future. I can only see more and more companies, organisations and individuals embracing ePub, in the same way people embraced html on the web.
Please, please, publishers, choose DRM-free and choose ePub!
Sidenote: The ePub logo used above is not an official logo. Instead it’s one of a series of free public domain logos provided by Threepress Consulting. The sad fact is that ePub still doesn’t have an official logo. David Rothman of Teleread.org lamented this fact almost two years ago.