It was interesting to read over at TeleRead.org that the Los Angeles Public Library won’t buy e-books in a format for Adobe Digital Editions until ADE software supports text to speech, according to Library Journal.
This is a good decision by the LA Public Library as there are a number of thorny issues surrounding ePub, DRM and Text to Speech.
The first is that DRM protection and text to speech do not sit well together. Why? Because as soon as you offer text to speech you introduce a major security vulnerability into your elaborate DRM mechanism. This is why I believe that many ebooks in PDF format are shipped with text to speech disabled. Granted, in a number of cases, the publisher may not have the audio rights to the work but I suspect the majority of the time they don’t want to subject their works to this vulnerability.
To understand the security vulnerability you have to understand a little of how the text to speech process works. The following is specific to Microsoft Windows. I’m not familiar with text to speech on Linux or MacOS but I assume they have similar mechanisms.
Microsoft Windows supports text to speech using SAPI, the Speech Application Programming Interface. This interface serves two functions. First, it allows any Windows application, such as Adobe Reader, Microsoft Excel or my own Text2Go, to pass a string of text to the API and have it converted to speech. This speech can be output directly through the PC’s speakers or saved to an audio file (in .wav or mp3 format, for example) for later playback. The actual conversion is done by a computerized voice. Windows XP ships with the atrocious sounding Microsoft Sam voice. Windows Vista and Windows 7 ship with the marginally better Microsoft Anna voice. Thankfully there are a number of 3rd party voice providers who sell a huge range of high quality, natural sounding voices in multiple languages and accents. Second, it allows voices to be registered and made available to any application that wants to provide text to speech functionality. Applications can then use SAPI to discover which voices are available on their system and let the user choose a voice to use.
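To make the two roles concrete, here is a minimal sketch in Python. It is a toy model only and does not use the real SAPI interfaces: the SpeechApi class, its method names and the simulated voice are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List

# In this toy model a "voice" is just a callable that receives the text.
Voice = Callable[[str], None]


class SpeechApi:
    """Hypothetical stand-in for a system speech API (not real SAPI)."""

    def __init__(self) -> None:
        self._voices: Dict[str, Voice] = {}

    def register_voice(self, name: str, voice: Voice) -> None:
        # Role 2: voice vendors register their voices with the system.
        self._voices[name] = voice

    def list_voices(self) -> List[str]:
        # Applications discover which voices are installed.
        return sorted(self._voices)

    def speak(self, text: str, voice_name: str) -> None:
        # Role 1: an application hands text to the chosen voice.
        self._voices[voice_name](text)


spoken: List[str] = []


def simulated_voice(text: str) -> None:
    # A real voice would synthesise audio; this one only records the request.
    spoken.append(f"[audio] {text}")


api = SpeechApi()
api.register_voice("Anna (simulated)", simulated_voice)
api.speak("Hello, world.", api.list_voices()[0])
```

The point to notice is that the application only ever sees a voice name and hands over the text. What the voice does with that text is entirely up to the voice.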
This brings us to the security vulnerability introduced by text to speech. In order for an ebook to be converted to speech, the entire text must be passed through one of the installed voices. For a normal voice this is not really a problem. The text will be spoken aloud through your speakers. But suppose we created our own voice that didn’t convert the text to speech but instead saved it to a file. This would give you the means of instantly creating a plain text copy of an ebook. The only downside would be you’d lose all formatting information.
Such a voice would be very easy to develop. Microsoft even provide a sample voice as part of their documentation. Applications such as Adobe Reader or Adobe Digital Editions would have no way of knowing if your voice was a genuine text to speech voice or a text to text file voice.
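The idea can be illustrated with another toy sketch in Python. Again, this is not SAPI code: read_aloud stands in for a hypothetical reader application and make_capture_voice for a hypothetical rogue voice, and both names are my own inventions for the example.

```python
import io
from typing import Callable, Iterable, TextIO

# In this toy model a "voice" is just a callable that receives the text.
Voice = Callable[[str], None]


def read_aloud(pages: Iterable[str], voice: Voice) -> None:
    """Hypothetical reader app: hands each decrypted page to the chosen voice."""
    for page in pages:
        voice(page)


def make_capture_voice(sink: TextIO) -> Voice:
    """Build a 'voice' that writes the text it receives instead of speaking it."""

    def voice(text: str) -> None:
        sink.write(text + "\n")

    return voice


captured = io.StringIO()  # an open text file would work the same way
read_aloud(
    ["Chapter 1", "It was a dark and stormy night."],
    make_capture_voice(captured),
)
```

From the reader application's side, the call looks exactly the same whether the voice produces audio or quietly writes the text out, which is why the application cannot tell the two apart.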
The only way to guard against this would be for Microsoft to introduce a certification process for all SAPI-compliant voices. Voice vendors would be required to submit their voices to Microsoft for validation. Once verified, the voices would be digitally signed to identify them as being certified and to ensure they were not later tampered with. Applications could then choose to only use these certified voices for text to speech.
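As a rough sketch of the tamper check, the Python below tags a voice binary and later verifies it. Note the substitution: it uses an HMAC from the standard library as a stand-in for a real digital signature, which would use public-key cryptography so that applications never hold the signing key. The key, function names and sample bytes are all assumptions made up for the example.

```python
import hashlib
import hmac

# Assumption: a secret held by the certifying authority. A real scheme would
# use a public-key signature rather than a shared secret like this one.
CERTIFIER_KEY = b"demo-key-not-for-real-use"


def certify(voice_binary: bytes) -> bytes:
    """Return a tag over the voice binary, standing in for a signature."""
    return hmac.new(CERTIFIER_KEY, voice_binary, hashlib.sha256).digest()


def is_certified(voice_binary: bytes, tag: bytes) -> bool:
    """Check that the binary still matches the tag issued at certification."""
    return hmac.compare_digest(certify(voice_binary), tag)


genuine = b"voice engine v1"
tag = certify(genuine)

# Any later modification of the binary invalidates the tag.
tampered = genuine + b" plus text capture"
```

An application that wanted to protect its content would call something like is_certified before handing any text to a voice, and refuse voices that fail the check.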
If you’re thinking this is a little far fetched then you may like to know that this is precisely the process Microsoft requires Vista-compatible video drivers to go through. This was to prevent the user from installing a video driver onto their system that pipes the video output from a DRM-protected HD-DVD or Blu-Ray disc directly to an unencrypted file. Peter Gutmann of the University of Auckland has conducted an interesting analysis of the Microsoft Vista DRM.
Closed, proprietary systems that don’t allow you to install your own software, such as the Amazon Kindle, will be less vulnerable to this approach.
As ebooks gain in popularity it will be interesting to see if Microsoft introduce a validation system for text to speech voices. In the meantime I’m sure publishers will continue to demand control over whether their works support text to speech on a title by title basis.
Which brings us back to the lack of text to speech in Adobe Digital Editions. Even if Adobe do add text to speech support, it’s still not much use to readers if publishers persist in disabling text to speech for the majority of their titles. The LA Public Library need to insist that not only does ADE support text to speech but all supplied ebooks also have text to speech enabled.
This confusing state of affairs makes it difficult for the ebook purchaser to answer the simple question ‘Can I convert this ebook to speech?’ prior to purchase. One of the great benefits of the ePub format is that you know there are no hidden restrictions on what you can do with your ebook. It can be read on any device with an ePub compatible reader and text to speech will always be possible. However, now that there are ‘DRM-protected ePub ebooks’ around, the consumer is once again left needing to ask a hundred questions to determine their rights for each individual title.
The real deception here is continuing to call ‘DRM-protected ePub’ ebooks ePub. As soon as an ePub ebook is wrapped in DRM it loses all the advantages of an open standard that come with ePub. I’m sure publishers recognise that ‘DRM-protected ePub’ is quite a mouthful and the tendency will be to shorten it to ePub.
I find this muddying of terminology particularly frustrating as I near the release date for a major upgrade to Text2Go which will support converting ePub ebooks to audiobooks. I would like to be able to say in my marketing material that ‘Text2Go supports ePub ebooks’ without having to add a caveat such as ‘except those protected by DRM’. A caveat such as this means nothing to people outside the industry and all of a sudden you’re having a technical discussion on ebook formats, DRM, its restrictions and why it is necessary. By the time you’ve finished, if the customer hasn’t fallen asleep or fled, they’re going to be highly confused or suspicious of ebooks.
To my mind once an ePub ebook is wrapped in DRM it should not be allowed to use the name ePub. Perhaps instead they could be referred to as eSnub – the format publishers use when snubbing the rights of readers and the format readers should snub if they know what’s good for them.
For those interested in participating in the Text2Go ebook to audiobook beta, drop an email to markgladding at ebooksjustpublished.com and I’ll send you a prerelease copy as soon as it’s available.