Sunday, March 17, 2013

Hatter Sample Books

I was reading and found their GitHub repository for their book on making eBooks available. I thought it was a good idea, and decided to use Hatter to start making Project Gutenberg books available in ePub format. I started with the current top 100 books list, and will work my way through it. Below I'll list each book, a link to its GitHub repository, and a link to the ePub. If you'd like to fix something, fork one of the repositories, use Hatter to fix the file, and submit a pull request. Or for a new book, create your own repository, use Hatter to generate the book, email me a link to the repository, and I'll add it below.

Part of the reason for doing this is to test Hatter and see what features I need to add. I haven't spent a lot of time on each book, in most cases 15 to 20 minutes. Even so, you should find that most of these books are as good as, or better than, what other publishers are putting out for the same titles.

Time-wise, Les Misérables was the exception. It is huge, has hundreds of sections, and required an hour or two of work. I am going to skip any books that would require a ton of formatting, for example "The Survey of Cornwall". I'm also going to skip anything I don't feel comfortable converting.

1. Pride and Prejudice by Jane Austen github ePub
2. The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle github ePub
3. Les Misérables by Victor Hugo github ePub
4. Syndrome by Thomas Hoover github ePub
5. Adventures of Huckleberry Finn by Mark Twain github ePub
6. Grimm's Fairy Tales by The Brothers Grimm github ePub
7. How to Live on 24 Hours a Day by Arnold Bennett github ePub
8. Alice's Adventures in Wonderland by Lewis Carroll github ePub
9. Leaves of Grass by Walt Whitman github ePub
10. Gulliver's Travels by Jonathan Swift github ePub
11. The Importance of Being Earnest by Oscar Wilde github ePub
12. Metamorphosis by Franz Kafka github ePub
13. Wuthering Heights by Emily Brontë github ePub
14. The Prince by Nicolo Machiavelli github ePub
15. The Adventures of Tom Sawyer by Mark Twain github ePub
16. The Mysterious Affair at Styles by Agatha Christie github ePub
17. Heart of Darkness by Joseph Conrad github ePub

These are a little out of order, but still in the top 100.

13. Peter Pan by J. M. Barrie github ePub
14. Dracula by Bram Stoker github ePub

This is a bit outside of the top 100 list, but I listen to The Partially Examined Life. One of the readings was:

The Souls of Black Folk by W. E. B. Du Bois github ePub. It's really well written and a pretty good read, so I'll include it here.

In fact, I'll start including things I'm turning into ePubs because I am reading them. I'll skip over the pieces that are copyrighted. To be honest, I turn blog posts I want to read into ePubs because they are unreadable on badly designed blogs. They are easier and more enjoyable to read on my Nook on the train.

The Raven by Edgar Allan Poe github ePub

Sunday, February 03, 2013

Hatter - An App to turn text files into ePub books

I am not happy with the ePub files I download for titles at Project Gutenberg. They seem like bundled up versions of the text files that have been split arbitrarily. I want something broken on chapter boundaries the way the author wrote the book. So I wrote an application to generate decent ePubs from a text file. I got started so I'd have a tool to convert files from Project Gutenberg, but it will work on any text file.

Here you can see that I have opened "Walden" by Henry David Thoreau. I haven't done much processing on it beyond adding some <h1> tags for chapter titles and some <table>s for tabular data. (Actually, I did have to do some more work on the file. Project Gutenberg text files include randomly uppercased words and double dashes --. To make a clean file you have to convert these markers. The text file is still readable with the markers, but looks really ugly in an ePub.)
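The clean-up described in the parenthesis above can be partly automated. The following is only a rough sketch, not what Hatter does internally. It assumes that a double dash should become an em dash character, and that a word of three or more capital letters is an emphasis marker that should become lowercase italic text. The function name `clean_gutenberg_markers` is made up for this example. Real files need a manual pass afterwards, since chapter titles and acronyms are also uppercase.

```python
import re


def clean_gutenberg_markers(text: str) -> str:
    """Convert plain-text markers into something nicer for an ePub."""
    # Double dashes become a single em dash character (U+2014).
    text = text.replace("--", "\u2014")

    # Assumption: a run of 3+ capital letters marks emphasis, so it is
    # lowercased and wrapped in <i> tags. Short words like "I" are skipped.
    def to_italic(match: re.Match) -> str:
        return f"<i>{match.group(0).lower()}</i>"

    return re.sub(r"\b[A-Z]{3,}\b", to_italic, text)


if __name__ == "__main__":
    print(clean_gutenberg_markers("It was NOT so--I think."))
```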
The blue markers on the line numbers to the left designate where one section ends and a new section starts. Click a line number to add a marker. Clicking on a marker allows you to specify text to use as a Table of Contents entry:

You don't have to mark paragraphs. Any block of text is treated as a paragraph unless it starts and ends with an <h1>, <h2>, <blockquote> or <table> HTML tag. Blocks wrapped in those tags are inserted as is. Other blocks are wrapped in a <p> tag. You can insert any valid HTML tags within blocks and they will be left as is. You can make something bold by surrounding it with <b> tags or italic with <i> tags.
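To make the block rule concrete, here is a minimal sketch of how it could work. It is an illustration, not Hatter's actual code. It assumes blocks are separated by blank lines and that ordinary blocks get a paragraph tag (<p> in this sketch). The names `PASS_THROUGH`, `is_pass_through` and `blocks_to_html` are invented for the example.

```python
import re

# Tags whose blocks are inserted untouched (the list comes from the post).
PASS_THROUGH = ("h1", "h2", "blockquote", "table")


def is_pass_through(block: str) -> bool:
    """True when the block starts and ends with one of the pass-through tags."""
    text = block.strip()
    for tag in PASS_THROUGH:
        opens = re.match(rf"<{tag}[\s>]", text, re.IGNORECASE)
        closes = text.lower().endswith(f"</{tag}>")
        if opens and closes:
            return True
    return False


def blocks_to_html(source: str) -> str:
    # Assumption: one or more blank lines separate blocks.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    out = []
    for block in blocks:
        if is_pass_through(block):
            out.append(block)  # inserted as is
        else:
            out.append(f"<p>{block}</p>")  # inline tags such as <i> are left alone
    return "\n".join(out)


if __name__ == "__main__":
    sample = "<h1>Economy</h1>\n\nI lived <i>alone</i>, in the woods."
    print(blocks_to_html(sample))
```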

You need to set a Title and Author:

A publisher and identifier can be set, but they are optional. The identifier is usually the book's ISBN, but you can use whatever you'd like. If you don't set an identifier, Hatter will generate a UUID and use that as the book's identifier. You can also add a cover image by dragging an image to the Cover Image well.
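The identifier fallback is simple enough to sketch. This is an assumption-laden illustration rather than Hatter's code: the function name `book_identifier` is hypothetical, and the choice of a random (version 4) UUID in URN form is a guess, since the exact format Hatter writes isn't described here.

```python
import uuid
from typing import Optional


def book_identifier(user_identifier: Optional[str] = None) -> str:
    """Return the user's identifier (an ISBN, say) or a freshly generated UUID."""
    if user_identifier and user_identifier.strip():
        return user_identifier.strip()
    # Assumption: random UUID rendered as "urn:uuid:<36 characters>".
    return uuid.uuid4().urn


if __name__ == "__main__":
    print(book_identifier("9780140390445"))  # the supplied value is kept
    print(book_identifier())                 # a new urn:uuid:... each run
```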

Hatter generates ePubs that validate using epubcheck version 3.0. The ePub files also include an NCX entry so older readers should be able to read the files too.

I've started another post with sample ePubs here: Hatter Sample Books. As I finish new sample books I'll add them to that post.

Some things I am working on:

1) Being able to edit the book's CSS. For now you get some generic default values that look OK. - done
2) Add the ability to add other resources that are included in the book. This would allow you to reference fonts and images in the text. - done
3) Make the UI not so ugly. - ongoing.

I hope to make Hatter available soon on the Apple App Store. It will probably be $10 or so.

Oh, and in case you think you are going to create eBooks using files from Project Gutenberg and upload them to the iBooks store, read the license on the Project Gutenberg files. You can create free books, but not for sale ones. (Well, actually you can. But you have to donate 20% back to Project Gutenberg, which isn't a bad thing.)

Update Feb 16, 2013: I think everything is done for the 1.0 release. I just need to do some more testing and submit to the Apple App Store.

Update Feb 23, 2013: Version 1.0.0 is available on the App Store. Version 1.0.1 is coming soon with the ability to load and save Hatter documents to iCloud, and a menu item to show the Getting Started Guide in case users need to refer back to it.

Update Mar 1, 2013: Uploaded version 1.0.1 to the App Store. Fixed an issue creating ePubs that have only one section. (That was embarrassing.) Version 1.0.1 also has iCloud support and a menu item under the Help menu that shows the Getting Started Guide.

Update Mar 17, 2013: I rejected v1.0.1 since Apple was taking a LONG time to review it. I guess they are getting behind. I uploaded a new binary that has more features and more bug fixes. I've been working with a small Taiwanese publisher to fix some issues related to generating a Chinese ePub.

The next thing I'm working on is moving ePub generation off the main thread. Most ePubs generate in a second or two. Les Misérables takes several minutes and shows why I need a status window.

Update Mar 29, 2013: After two rejections for being able to save documents to iCloud, I gave up and turned off iCloud support. (I'm not the only one having their app rejected for this.) 1.0.1 is submitted again. It has MANY bug fixes, ePubs are now built on a separate thread, and building them is very fast: a second or two for Les Misérables.

Update Oct 6, 2013: Hatter is at version 1.0.4. It is a lot more stable and significantly better at creating ePubs than version 1.0. Most of the credit for making that happen goes to Fred Jame, who runs a publishing company in Taiwan. He pushed me to support a lot of features I wouldn't have thought of, like vertical text and epub:type attributes for asides in the text body.

Since 1.0, Hatter has gotten a live preview mode so you can see how a section will look in the final ePub. Live preview is really useful for seeing how CSS changes will look without having to build an ePub and load it onto a device.

The next big feature is the ability to add and edit items in the Tag Palette. The Tag Palette is a table of buttons you can use to wrap text in a way that makes sense for each type of tag: h tags go around a line, blockquote tags go around a block of text, and i tags go around the current selection. You can export and import a tag palette so you can share it with a group. Editing tag palettes will come in version 1.0.5, which should be out soon.