eBooks Just Published

Fresh, DRM-free

Jun 17

Creating an ePub Reader for Text to Speech Use

Recently I’ve been working on an ePub reader prototype. Once I’ve created a robust ePub ebook reader, I’m going to move this functionality into my text to speech application, Text2Go. My goal is to provide a system that will convert an ebook to speech and transfer it to your iPod in a single click. This will allow any ePub formatted ebook to be turned into an audio book which can then be listened to while driving, walking, working out at the gym or during any other activity where reading is impractical.

The focus of my ePub reader is quite different from the norm because the recipient is not a human reader but a machine reader, a computerized voice. A computerized voice cares nothing for fancy layouts, font selection or images. This makes my job a lot easier in many ways. However, a computerized voice lacks one important skill a human reader uses frequently, often at a subconscious level: it has no way of skimming over a section of text. For example, human readers will never read the same footer at the bottom of every page or meticulously read every page number. If this text is mixed in with the actual body of the story (usually as a result of some blind conversion process from a different ebook format to ePub), then the computerized voice will read this text in full on every page. This becomes incredibly irritating for the human listener.

The ePub standard provides direct support for structured documents that include footnotes, sidebars, annotations, page numbers, etc. This is achieved using an alternative XML document format known as DTBook. The ePub standard recommends this format be used for educational publications and publications that are highly structured - for example when it’s important that the page layout of the original printed document is maintained. DTBook actually stands for Digital Talking Book and is a standard developed by the Daisy Consortium for blind, visually-impaired, physically handicapped, learning-disabled or otherwise print-disabled readers. Although originally designed for talking books, the extra information (i.e. the metadata) the DTBook format contains about the ebook text makes it a great choice for any ebook and will greatly assist applications such as my text to speech application.

Which brings me to the fact that not all ePub titles are created equal. Those that have been lovingly hand-crafted by someone who understands the ePub format will never suffer these problems. Those that have been blindly converted using an automated tool from a source document format that doesn’t make any distinction between the various roles of text within a document will be plagued by such problems. This seems similar to differences in quality between different editions of a print title. Unfortunately you can’t heft an ebook, feel the quality of the paper between your fingers, flex the binding or quickly thumb through the pages prior to purchase. The ability to view a sample goes a long way to solving this, but it would also be worthwhile for reviewers to comment on how well the ePub book has been put together, what underlying format is used to represent the text and whether it has a table of contents to make navigation easy.

My ePub reader has three tasks to complete in decreasing order of importance.

  1. Extract the text from the document, converting it from html to plain text ready for text to speech conversion.
  2. Organize the content into chapters. Each chapter will become its own audio track. This will make it easier to navigate than a single huge audio track.
  3. Extract the cover image and use it as the album art for each audio track.
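
As a rough illustration of the first task, here is a minimal Python sketch of converting html to plain text. It is not the Text2Go implementation; the names PlainTextExtractor and html_to_text, and the choice of which tags to skip or treat as paragraph breaks, are assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumption: content inside these tags is never meant to be spoken.
SKIP_TAGS = {"script", "style", "head"}
# Assumption: closing one of these tags ends a paragraph of text.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class PlainTextExtractor(HTMLParser):
    """Collects the readable text of an html document."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # greater than zero while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return one line of plain text per paragraph-like block."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Each returned line corresponds to one block of the source, which gives a text to speech engine a natural place to pause.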

Although all ePub documents conform to a standard, there is a lot of variability possible within the standard. In order to make my reader as robust as possible, I’ve been trying to find ePub titles from a wide range of sources. I’m particularly interested in those that have been created with different tools or even hand-crafted.

ePub Reader Prototype

An edition of White Fang by Jack London illustrates this variability nicely. It had a couple of unique features I needed to handle. Firstly it didn’t have a standard ePub table of contents. All it contained was a single html file. However this html file had its own table of contents embedded at the top of the file. It was implemented as an html table, with each entry containing a link to the relevant section further down in the document. Those familiar with html markup will know that you can use the anchor tag to name a specific point in a document. You can then link to this named anchor so your browser (or reader) will position you at this exact point in the document when the link is clicked. You can use this technique when linking to an external document or within the same document.

I had noticed the ePub authoring tool, Calibre, uses named anchor points when creating links for its table of contents. However because each chapter was stored in a separate file (within the ePub file, which is actually just a zip archive), the anchor points were a bit superfluous - each chapter just had a single anchor point at the top of the page. Although this was how Calibre organised its chapters, I imagined that it was quite possible to store an entire book in a single html file and use named anchor points to link to the relevant sections from the table of contents.
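
Because an ePub file is a zip archive, the chapter files can be inspected with ordinary zip tools. The Python sketch below lists the html files in an archive using the standard zipfile module. The function names and file suffixes are assumptions for illustration, and build_sample_epub is a hypothetical helper that builds a tiny in-memory archive (not a complete, valid ePub) so the example can run without a real book.

```python
import io
import zipfile

# Assumption: chapter content lives in files with one of these suffixes.
HTML_SUFFIXES = (".html", ".htm", ".xhtml")


def list_html_files(epub):
    """Return the names of the html files in the archive, in archive order."""
    with zipfile.ZipFile(epub) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(HTML_SUFFIXES)]


def read_html_file(epub, name, encoding="utf-8"):
    """Return the decoded contents of one file inside the archive."""
    with zipfile.ZipFile(epub) as archive:
        return archive.read(name).decode(encoding)


def build_sample_epub():
    """Hypothetical helper: a tiny in-memory archive for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("OEBPS/chapter1.html", "<p>One</p>")
        archive.writestr("OEBPS/cover.jpg", b"not a real image")
        archive.writestr("OEBPS/chapter2.xhtml", "<p>Two</p>")
    return buffer
```

Archive order is not guaranteed to be reading order, so a real reader would not rely on it alone.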

Therefore I added the ability for my ePub reader to split a book into sections based on where the named anchor points lay in the document. This had an unexpected benefit when I came to read the White Fang title. Although it didn’t contain a standard ePub table of contents, my reader was able to find the named anchor points it had used to implement its own in-page table of contents and correctly split it into chapters.
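
The idea of splitting on named anchor points can be sketched in Python as follows. This is an illustration under assumptions, not the actual reader code: it makes two passes, first collecting the names that in-page links (href="#name") point to, then starting a new section wherever an element carries one of those names. All class and function names here are made up for the example.

```python
from html.parser import HTMLParser

# Assumption: closing one of these tags separates words in the output.
BLOCK_TAGS = {"p", "div", "li", "td", "th", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class LinkTargetCollector(HTMLParser):
    """First pass: collect the fragment names that in-page links point to."""

    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self.targets.add(href[1:])


class AnchorSplitter(HTMLParser):
    """Second pass: start a new section at each anchor that is a link target."""

    def __init__(self, targets):
        super().__init__()
        self.targets = targets
        self.sections = [(None, [])]  # (anchor name, text pieces)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        name = attr_map.get("id")
        if name is None and tag == "a":
            name = attr_map.get("name")
        if name in self.targets:
            self.sections.append((name, []))

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.sections[-1][1].append(" ")

    def handle_data(self, data):
        self.sections[-1][1].append(data)


def split_at_anchors(html):
    """Return a list of (anchor name, text) pairs, skipping empty sections."""
    collector = LinkTargetCollector()
    collector.feed(html)
    collector.close()
    splitter = AnchorSplitter(collector.targets)
    splitter.feed(html)
    splitter.close()
    result = []
    for name, pieces in splitter.sections:
        text = " ".join("".join(pieces).split())
        if text:
            result.append((name, text))
    return result
```

Text that appears before the first linked anchor, such as an in-page table of contents, ends up in a leading section whose name is None. Anchors that nothing links to are ignored, so stray names do not create spurious chapters.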

The other interesting feature of the White Fang title also centred around the table of contents. As I said before, this was implemented using an html table. However when my reader extracted the text from the table, all the text was run together. This would have been disastrous when it came time to convert it to speech. Unlike human readers, computerized voices don’t recognise an unusually long word as being a number of words run together.

When extracting text from a table, I needed to understand that a table cell acts as a natural boundary for text and should be punctuated accordingly.
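
One way to treat table cells as text boundaries is sketched below in Python. The punctuation rule used here (a comma between cells, a full stop at the end of each row) is an assumption chosen for the example, as are the names TableTextExtractor and table_to_speech_text. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Collects table text as rows of cells instead of one run of characters."""

    def __init__(self):
        super().__init__()
        self.rows = []    # one list of cell strings per table row
        self.cell = None  # text pieces of the open cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears without a row
                self.rows.append([])
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            text = " ".join("".join(self.cell).split())
            if text:
                self.rows[-1].append(text)
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_to_speech_text(html):
    """Join cells with commas and end each row with a full stop."""
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    sentences = []
    for row in parser.rows:
        if not row:
            continue
        line = ", ".join(row)
        if line[-1] not in ".!?":
            line += "."
        sentences.append(line)
    return " ".join(sentences)
```

With this rule, a row holding a chapter number and a chapter title is read as a short phrase with a pause between the two, rather than as one unusually long word.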

What is clear is that it’s really beneficial to gather as many ePub format ebooks from as many varied sources as possible to test my reader with. You can imagine my delight then when the new ePub ebook site, ePubBooks.com, was announced recently on Teleread.org. Here is another source of ePub books, generated using their own tool. From the sample of titles I’ve downloaded, these seem to be very well formatted. My young and naive ePub reader had no trouble loading and splitting them into chapters. For interest’s sake, I downloaded their version of White Fang. This version had an ePub table of contents. The end result was the same as with the first version I had found. My only complaint with the titles at ePubBooks.com is they have no cover image. This obviously doesn’t detract from the reading experience but it does make organising and browsing through your library of ebooks a lot less fun if they don’t have cover images.

Finally, if anyone knows of other ePub sources or has ePub format ebooks that have been hand-crafted or created with other tools, then I’d love the chance to run them through my ePub reader. Please drop me a line at markgladding at ebooks just published dot com.

5 Responses so far

Here is an EPUB ebook that uses DTBOOK instead of XHTML:

http://www.idpf.org/2007/ops/samples/hauy.epub

Here’s a cross-platform open-source reader you might be interested in:

http://code.google.com/p/emerson-reader

You may also consider the DAISY Pipeline to generate TTS books:

http://daisymfc.sourceforge.net/doc/scripts/Narrator-DtbookToDaisy.html

Two questions:

- It sounds like you’re converting the text to a speech (audio) file for use on the iPod. (Correct me if I misunderstand what you’ve written.) Are you stuck with mp3 or can you use the .amr codec, which will produce much smaller files?

- With all its drawbacks, the Kindle’s TTS enables me to interrupt my reading to clean up after dinner without having to forego continuing the story. TTS starts where I stopped in the text; and the transition back is similar. Are you forced into an either/or situation with the audiofile? Or will you have text and audiofile and the ability to switch between them?

I guess if I knew more about audio, I could tell whether the text file could point to particular spots in the audio that correspond, say, with each new paragraph.

Actually, I have more questions now, so I’ll send you a note as you suggest, to test some ePubs I have.

Roger Sperberg
myname at spress dot ws

[...] certainly be a huge blessing to a large community of challenged readers. Here is the beginning of a long article on the site. There are far more technical issues than I would have imagined. Speech or not, anyone who is [...]

Daniel,

Thanks for the link to the Daisy pipeline and open source reader. I’ll definitely check those out.

I’ve been using hauy.epub to test my DTBook reading implementation. My question is, is this the only DTBook in existence? I know it’s not the case, but I’ve had real trouble finding other examples in DTBook format. It’s a shame that DTBook doesn’t seem to be more widely used.

Roger,

Text2Go generates MP3 files natively. When working with iTunes, it can use any of the codecs iTunes produces. There is no reason why I couldn’t add .amr support. Do many players handle this format?

Your use of the Kindle to switch from reading to speech while you’re doing something else is interesting. This is a great feature for visual ePub readers such as the Kindle.

Text2Go is purely a text to speech application. It assumes you’re going to listen to the entire text as audio.

Text to speech is a very useful technology that has a variety of uses, like sending SMS messages to landline phones.
Services like text2land offer this feature.
