Sunday, March 17, 2013

Hatter Sample Books

I was reading unglue.it and found their github repository for their book on making eBooks available. I thought it was a good idea, and decided to use Hatter to start making Project Gutenberg books available in ePub format. I started with the current top 100 books list, and will work my way through them. Below I'll list the book, a link to the github repository, and a link to the ePub. If you'd like to fix something, fork one of the repositories, use Hatter to fix the file, and submit a pull request. Or for a new book, create your own repository, use Hatter to generate a book, email me a link to the repository, and I'll add it below.

Part of the reason for doing this is to test Hatter and see what features I need to add. I haven't spent a lot of time on each book, in most cases 15 to 20 minutes. Even so, you should find that most of these books are as good as, or better than, what other publishers are putting out for these books.

Time-wise, Les Misérables was the exception. Les Misérables is huge, has hundreds of sections, and required an hour or two of work. I am going to skip any books that will require a ton of formatting. An example is "The Survey of Cornwall". I'm also going to skip anything I don't feel comfortable converting.

1. Pride and Prejudice by Jane Austen github ePub
2. The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle github ePub
3. Les Misérables by Victor Hugo github ePub
4. Syndrome by Thomas Hoover github ePub
5. Adventures of Huckleberry Finn by Mark Twain github ePub
6. Grimm's Fairy Tales by The Brothers Grimm github ePub
7. How to Live on 24 Hours a Day by Arnold Bennett github ePub
8. Alice's Adventures in Wonderland by Lewis Carroll github ePub
9. Leaves of Grass by Walt Whitman github ePub
10. Gulliver's Travels by Jonathan Swift github ePub
11. The Importance of Being Earnest by Oscar Wilde github ePub
12. Metamorphosis by Franz Kafka github ePub
13. Wuthering Heights by Emily Bronte github ePub
14. The Prince by Nicolo Machiavelli github ePub
15. The Adventures of Tom Sawyer by Mark Twain github ePub
16. The Mysterious Affair at Styles by Agatha Christie github ePub
17. Heart of Darkness by Joseph Conrad github ePub

These are a little out of order, but still in the top 100.

13. Peter Pan by J. M. Barrie github ePub
14. Dracula by Bram Stoker github ePub


This is a bit outside of the top 100 list, but I listen to The Partially Examined Life. One of the readings was:

The Souls of Black Folk by W. E. B. Du Bois github ePub. It's really well written and a pretty good read, so I'll include it here.

In fact I'll start including things I'm turning into ePubs because I am reading them. I'll skip over the pieces that are copyrighted. To be honest, I turn blog posts I want to read into ePubs because they are unreadable on badly designed blogs. They are easier and more enjoyable to read on my Nook on the train.

The Raven by Edgar Allan Poe github ePub


Sunday, February 03, 2013

Hatter - An App to turn text files into ePub books

I am not happy with the ePub files I download for titles at Project Gutenberg. They seem like bundled up versions of the text files that have been split arbitrarily. I want something broken on chapter boundaries the way the author wrote the book. So I wrote an application to generate decent ePubs from a text file. I got started so I'd have a tool to convert files from Project Gutenberg, but it will work on any text file.


Here you can see that I have opened "Walden" by Henry David Thoreau. I haven't done much processing on it beyond adding some <h1> tags for chapter titles and some <table>s for tabular data. (Actually, I did have to do some more work on the file. Project Gutenberg text files include randomly uppercased words and double dashes --. To make a clean file you have to convert these markers. The text file is still readable with the markers, but looks really ugly in an ePub.)
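
The cleanup described above can be scripted before the text ever reaches Hatter. What follows is only a rough sketch, not anything Hatter does itself: the function name is made up, and treating every all-caps word as emphasis is an assumption that will misfire on acronyms and chapter headings, so check the result by hand.

```python
import re

# The em dash is written as an escape so the source file stays plain ASCII.
EM_DASH = "\u2014"


def clean_gutenberg_text(text: str) -> str:
    """Convert two plain-text conventions into ePub-friendly markup.

    Hypothetical helper for illustration only.
    """
    # Replace a run of exactly two hyphens with an em dash.
    # Longer runs (---, ----) are left alone since they are usually rules.
    text = re.sub(r"(?<!-)--(?!-)", EM_DASH, text)

    # Assumption: a word of three or more capital letters was meant as
    # emphasis, so lowercase it and wrap it in <i> tags.
    def emphasize(match):
        return "<i>" + match.group(0).lower() + "</i>"

    return re.sub(r"\b[A-Z]{3,}\b", emphasize, text)


if __name__ == "__main__":
    print(clean_gutenberg_text("It was VERY cold--too cold."))
```
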
The blue markers on the line numbers to the left designate where one section ends and a new section starts. Click a line number to add a marker. Clicking on a marker allows you to specify text to use as a Table of Contents entry:


You don't have to mark paragraphs. Any block of text is treated as a paragraph unless it starts and ends with an <h1>, <h2>, <blockquote> or a <table> HTML tag. Blocks wrapped in those tags are just inserted as is. Other blocks are wrapped in a <p> tag. You can insert any valid HTML tags within blocks and they will be left as is. You can make something bold by surrounding it with <b> tags or italic with <i>.
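
To make the rule concrete, here is a minimal sketch of the idea in Python. It is not Hatter's code. It assumes blocks are separated by blank lines and that ordinary blocks get a <p> wrapper, and the function names are made up for the example.

```python
import re

# Tags whose blocks are passed through untouched, per the rule above.
PASSTHROUGH_TAGS = ("h1", "h2", "blockquote", "table")


def is_passthrough(block: str) -> bool:
    """True if the block starts and ends with one of the pass-through tags."""
    stripped = block.strip()
    for tag in PASSTHROUGH_TAGS:
        opens = re.match(rf"<{tag}[\s>]", stripped) is not None
        closes = stripped.endswith(f"</{tag}>")
        if opens and closes:
            return True
    return False


def blocks_to_html(text: str) -> str:
    """Split text on blank lines and wrap ordinary blocks in <p> tags."""
    # Assumption: one or more blank lines separate blocks.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    return "\n".join(b if is_passthrough(b) else f"<p>{b}</p>" for b in blocks)


if __name__ == "__main__":
    sample = "<h1>Economy</h1>\n\nWhen I wrote the <i>following</i> pages."
    print(blocks_to_html(sample))
```

Inline tags such as <i> sit inside a block, so they pass through without any special handling.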

You need to set a Title and Author:


A publisher and identifier can be set, but they are optional. The identifier is usually the book's ISBN, but you can use whatever you'd like. If you don't set an identifier Hatter will generate a UUID and use that as the book's identifier. You can also add a cover image by dragging an image to the Cover Image well.
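
The identifier fallback is easy to picture in code. This is a sketch of the behavior described above, not Hatter's implementation. The function name is hypothetical, and the "urn:uuid:" prefix is an assumption based on how UUID identifiers commonly appear in ePub metadata.

```python
import uuid
from typing import Optional


def book_identifier(user_identifier: Optional[str] = None) -> str:
    """Return the user's identifier if one was set, otherwise a fresh UUID."""
    if user_identifier and user_identifier.strip():
        return user_identifier.strip()
    # Assumption: the generated identifier is a random (version 4) UUID
    # written as a URN.
    return f"urn:uuid:{uuid.uuid4()}"


if __name__ == "__main__":
    print(book_identifier("my-book-001"))  # user-supplied value is kept
    print(book_identifier())               # a new random identifier each call
```
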

Hatter generates ePubs that validate using epubcheck version 3.0. The ePub files also include an NCX entry so older readers should be able to read the files too.

I've started another post with sample ePubs here: Hatter Sample Books. As I finish new sample books I'll add them to that post.

Some things I am working on:

1) Being able to edit the book's css. For now you get some generic default values that look OK. - done
2) Add the ability to add other resources that are included in the book. This would allow you to reference fonts and images in the text. - done
3) Make the UI not so ugly. - ongoing.

I hope to make Hatter available soon on the Apple App store. It will probably be $10 or so.

Oh, and in case you think you are going to create eBooks using files from Project Gutenberg and upload them to the iBooks store, read the license on the Project Gutenberg files. You can create free books, but not for sale ones. (Well, actually you can. But you have to donate 20% back to Project Gutenberg, which isn't a bad thing.)

Update Feb 16, 2013: I think everything is done for the 1.0 release. I just need to do some more testing and submit to the Apple App Store.

Update Feb 23, 2013: Version 1.0.0 is available on the app store. Version 1.0.1 coming soon with the ability to load and save Hatter documents to iCloud and a menu item to show the Getting Started Guide in case users need to refer back to it.

Update Mar 1, 2013: Uploaded version 1.0.1 to the app store. Fixed an issue creating ePubs that have only one section. (That was embarrassing.) Version 1.0.1 also has iCloud support and a show the Getting Started Guide menu item under the Help menu.

Update Mar 17, 2013: I rejected the v1.0.1 since Apple was taking a LONG time to review it. I guess they are getting behind. I uploaded a new binary that has more features, and more bug fixes. I've been working with a small Taiwanese publisher to fix some issues related to generating a Chinese ePub.

The next thing I'm working on is moving ePub generation off the main thread. Most ePubs generate in a second or two. Les Misérables takes several minutes and shows why I need a status window.

Update Mar 29, 2013: After two rejections for being able to save documents to iCloud I gave up and turned off iCloud support. (I'm not the only one having their app rejected for this.) 1.0.1 is submitted again. It has MANY bug fixes, and building ePubs now happens on a separate thread and is very fast: a second or two for Les Misérables.

Update Oct 6 2013: Hatter is at version 1.0.4. It has a lot more stability and is significantly better at creating ePubs than version 1.0. Most of the credit for making that happen goes to Fred Jame who runs a publishing company in Taiwan. http://puomo.tw/ He pushed me to support a lot of features I wouldn't have thought of, like vertical text and epub:type attributes for asides in the text body.

Since 1.0 Hatter has gotten a live preview mode so you can see how the section will look in the final ePub. Live preview is really useful for seeing how css changes will look without having to build an ePub and load it onto a device.

The next big feature is the ability to add and edit items in the Tag Palette. The tag palette is a table of buttons you can use to enter blocks of text in a way that makes sense for a type of tag. h tags go around a line, blockquote tags go around a block of text, i tags go around the current selection. You can export and import a tag palette so you can share it with a group. Editing tag palettes will come in version 1.0.5, which should be out soon.


Sunday, August 07, 2011

Alice


Based on this post I wrote an eBook reader. It supports both PDFs and ePub books.

Most eBook readers are written to support selling books in an online store. This makes them awkward for reading books from places like Pragmatic Programmer and Craft & Vision. So I wrote my own eBook reader that works the way I'd like it to. Alice monitors directories you choose for eBooks it understands. There is no need to "import" or "install" books. Alice just finds them where you tell it to look. You can look up words in the dictionary by selecting the word, Ctrl-clicking and selecting "Look up in Dictionary". It has a slider at the bottom so you can flip through a book quickly. It keeps track of where you left off when you close a book. It has search for both PDFs and ePubs. (Search is particularly useful for using Alice as a programming reference.)

You can find Alice on the Apple App Store.

Update 12-26-2012: I expected a lot of support issues related to Sandboxing but that hasn't happened. Looks like the "Getting Started" guide is working.

I have a request. If you like Alice, please write a nice review on the Apple App Store. There are only three reviews for Alice. Two are one star reviews, both of which are not valid. The other is a three star review, with some valid points, but still seems harsh.

If you have a bad experience with Alice, please send me email and let me help you rather than writing a nasty review. At least one of the negative reviews claims Alice is missing a feature that does exist. The user jumped to a conclusion without asking for help.

Update 12-22-2012: Version 2.0.0 is available.

I reversed the time order of updates so you don't have to scroll to the bottom to find the latest update.

The major change in this version, and the reason for the 2.0 version number, is Alice is now Sandboxed. I worked around most of the issues and removed a couple features that won't work in a Sandboxed app. What does this mean?

I lowered the price. Alice has been $10 for a long time. It is now $5 since I had to remove features.

Going forward, any updates that are not pure bug fixes (and that is determined by Apple) must be Sandboxed. Sandboxed applications have a very limited set of things they can do. One thing they can't do is send AppleEvents. That means Alice can't do "Show in Finder" since it works by sending an AppleEvent to the Finder with the name of the file.

The biggest issue is, without permission, Alice can only look in its own container on the file system. That is: ~/Library/Containers/com.baldmountain.AliceReader/Data. Since you don't store your eBooks there, you need to give Alice permission to look elsewhere. You must open the Preferences panel (Cmd-,), select the directories tab, and add some directories using the '+' button. Otherwise Alice will not find any eBooks. The technical issue is that Alice needs a security scoped URL to access any file outside of its Sandbox Container. Using a File Open panel gives Alice permission to look for files in the opened directory.

Alice is targeted at 10.8 Mountain Lion. I tried hard, but Spotlight and security scoped URLs don't play well on 10.7 Lion. The Sandboxed version of Alice can't find books on Lion. You can still read eBooks using Alice; the Library and Storage lists won't list anything. If you are running an OS before Mountain Lion, don't update. This is another reason for lowering the price.

There is some good news in this update. I got a copy of AppCode from JetBrains. Its static analysis tools are really nice. There are many bug fixes, including lots of potential crashing bugs, that are fixed because of AppCode. Very nice tool.

For all those people who run into issues because Alice is Sandboxed, I'm sorry. I tried hard to make as much of Alice work in a Sandboxed environment as I could. It took four tries to get this version approved by the Apple App Store. The first three were rejected: once for not being a bug fix and twice for terms of service violations for talking about Sandboxing when Alice starts. Approval came on the fourth try. (The window talking about Sandboxing became the "Getting Started" window, which I prefer. I'm happy the previous versions were rejected. It made me think of a better experience for users.)


Update 11-18-2012: It's been a while since I've posted an update. I'm still trying to decide what to do about App Sandboxing.

I still use Alice every day and I have been working on the code. It took some time, but I have a version of Alice that works with the Sandbox. I haven't released it because it only works on Mountain Lion. On Lion Spotlight refuses to return any results. I am planning on releasing a Mountain Lion only version in the next day or two. Depending on how upset people get, I'll consider releasing a non-Apple App Store version that has the sandbox turned off. If anyone has suggestions for app distribution I'd be interested.

Update 4-21-2012: Version 1.1.5 is at the App store for review. Just a small bug fix for a crash I ran into. (A Spotlight result returned a null when queried for a file.) I also changed PDF view to continuous scroll rather than page at a time. Hopefully everyone prefers it.

Big question: Apple is encouraging developers to adopt App Sandboxing. Sandboxing makes the OS more secure by limiting an app's access to APIs and not allowing access to other applications' data. This is good. But it will break Alice's ability to list files automatically. I'll have to remove the Library and Storage lists. Alice will still be able to open ePub and PDF files, but you'll have to use the file open dialog or double click the file (if Alice is set as default for the file type). Supporting App Sandboxing is not mandatory for Alice so I don't have to support it.

What do you think? Support App Sandboxing and remove the Library and Storage lists or don't support App Sandboxing and keep the Library and Storage lists? Let me know.

Update 1-28-2012: Version 1.1.3 is out. It fixes a crashing bug in the French version due to a bad format string. I also connected some menu items to functionality. I'm a bit embarrassed that I needed to make those changes. Sorry...

One other thing I snuck in was the ability to open items exported from iBooks Author (.ibooks files). The files are ePub files so Alice has no trouble opening them. But, and there is usually always a but, iBooks uses a TON of CSS extensions to create neat effects for the iBooks app. Alice uses WebKit to display the xhtml files in an ePub book. Since WebKit doesn't understand the iBooks CSS extensions, iBooks files don't render well. At some point I'll submit a feature request to Apple to have them update WebKit to understand iBooks CSS extensions.

Update 1-9-2012: Sigh, I forgot that not everyone is just like me. I also knew that the Spotlight change would cause issues. I just checked in a change to allow users to monitor the directories they want rather than ONLY the home directory. The default is still the user's home directory. I need to test to make sure this works. Once I'm sure this works fine, and I update Alice's Help, I'll release it.

Update 12-22-2011: Version 1.1.0 is available. This is a major change to how Alice finds files. It no longer scans the filesystem. Alice now uses Spotlight to find files in your Home directory. The major thing you'll notice is Alice starts MUCH faster than in the past. If you run into an issue where you know Alice should be finding files and it doesn't, try Alice's Help. It's most likely that ePub files have the wrong file type setting. Alice does skip files in the Library directory in your home directory. Otherwise Alice would find documentation and temporary files as well as Mail attachments, which is probably not what you want. I hope it works for you. Happy Holidays!
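
For the curious, the same idea can be tried from the command line. The sketch below is not how Alice does it. It shells out to the macOS mdfind tool rather than the Spotlight API a Mac app would normally use, the content type strings are my assumption for ePub and PDF, and the function names are invented for the example.

```python
import subprocess
from pathlib import Path

# Assumption: these are the Spotlight content types for ePub and PDF files.
EBOOK_QUERY = (
    'kMDItemContentType == "org.idpf.epub-container" || '
    'kMDItemContentType == "com.adobe.pdf"'
)


def outside_library(paths, home):
    """Drop any path that lives under the Library directory in home."""
    library = Path(home) / "Library"
    kept = []
    for raw in paths:
        path = Path(raw)
        if library in path.parents:
            continue  # skips Mail attachments, caches, documentation, etc.
        kept.append(path)
    return kept


def find_ebooks(home=None):
    """Ask Spotlight (via mdfind, macOS only) for eBooks under home."""
    home = Path(home) if home else Path.home()
    result = subprocess.run(
        ["mdfind", "-onlyin", str(home), EBOOK_QUERY],
        capture_output=True, text=True, check=True,
    )
    return outside_library(result.stdout.splitlines(), home)
```

Calling find_ebooks() on a Mac returns a list of paths. The filtering step is kept separate so it can be checked without Spotlight.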

Update 12-4-11: Version 1.0.4 is out. Hopefully everyone got the update OK and I have resolved the threading issues.

I'm considering a new way of monitoring files. Rather than scanning user selected directories for ePub and pdf files, Alice would use Spotlight to find all ePub and pdf files in your home directory. It would skip files in the Library directory in your home directory otherwise it would display mail attachments and the like. Files you probably don't want to display. (I think.) I've implemented this on a branch and I think I like it better than the old scan directories approach. If nothing else, it is much faster. If you have a view one way or the other I'd love to hear it.

Update 11-19-2011: I was wrong. Version 1.0.3 was approved and is available. I'm still going to have to enable App Sandbox at some point which means I'll need to rethink the Library and file system monitoring. Alice may only be able to monitor your Documents directory and maybe your iTunes eBooks directory. I still think it is unlikely that Alice will ever be able to decode Apple's Fairplay encrypted eBooks. But I haven't asked yet, so we'll see.

Update 11-18-2011: There is an update to Alice in review at the Apple Store. Even though I've been quiet, I have been working. Odds are it will be rejected. I haven't turned on App Sandboxing for Alice. App Sandbox is a new feature in Lion to limit what an application can do. The benefit is that it will prevent malicious applications from accessing data they shouldn't be. The downside for Alice is that it will prevent Alice from being able to monitor any directory. I'm not sure what to do. Alice will continue, but may be limited to only monitoring your Documents directory and your iBooks eBook directory in the Music directory. I'll try for an exception to be able to monitor the default Dropbox directory if it exists.

Update 9-27-2011: Alice users may want to read this post on ePubs. It will show you how to read your iBooks books in Alice. (I think.)

Update 9-23-2011: 1.0.2 is out. Most of what I was going to put in 1.0.2 went out in 1.0.1. 1.0.1 was rejected by the app store because it found a PDF file in an application that was stored in the user's document directory. Apparently Alice shouldn't be poking around in other applications' contents even if they are in the monitored directory. Well, no problem. 1.0.2 has search in ePub files. It's still an early version, but it seems to work fine. I have been writing code to enable a demo version of Alice. I'm not sure how to implement it properly. My current solution seems a little too "feeble".

Update 9-2-2011: Sigh, 1.0.1 has been in review since 8/23 and I'm close to releasing 1.0.2. 1.0.2 has the ability to set a placeholder mark so you can jump somewhere else to check something then jump back to where you were. Kind of like when you use a finger to keep your place and flip to look back in a paper book.

Update 8-27-2011: I added the code and strings to localize Alice. I used Google translate to convert strings from English to French. They are probably really off and I'm afraid to release it. - After fighting with getting French help to display I decided to have a look in the TextEdit bundle to see what Apple does. Alice does all the same things. But I also see that TextEdit doesn't display French help either so my testing isn't valid. I'm going to assume it will work. :D I lifted some of the French strings from TextEdit so the translation should be less horrible.

Update 8-24-2011: Alice is approved and for sale on the Apple AppStore. Here's a link to Alice.

I got my first support request. I didn't think much about accessibility and one of Alice's users is blind. I'll have to fix that since it is important. The first update, coming soon, will have PDF searching enabled and at least some accessibility fixes.

Update 8-18-2011: After resolving an issue related to submitting to the AppStore I managed to get Alice uploaded. Should be available soon. Next up: Search. At least for PDF, then for ePub. I find I need it.

Update 10-6-2013: I haven't written about Alice in a long time. I have been working on Alice, and have made improvements over the last couple years. I just uploaded version 2.1.2. The last version I wrote about was 1.0.2 so things have changed a bit. 2.1.2 is a minor fix to handle the case where an ePub has a zero length file that claims a longer than zero compressed length. I found a file like this on Munsey. Kurt Vonnegut's 2BR02B to be specific. Actually, the ePub is kind of crap. I'll probably grab the text file from Project Gutenberg and make a decent one using Hatter. I have no idea why people suffer through these awful ePubs when there is a cheap tool to make a decent one.
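
If you want to check your own ePubs for this quirk, something like the following works, since an ePub is a zip archive. It is a quick sketch using Python's zipfile module, not the code in Alice, and the function names are made up. Note that a deflated empty stream can still occupy a couple of bytes, so a hit is a file to handle gracefully rather than proof of corruption.

```python
import zipfile


def suspicious_entries(infos):
    """Names of entries that are zero bytes long but claim compressed data."""
    return [
        info.filename
        for info in infos
        if info.file_size == 0 and info.compress_size > 0
    ]


def check_epub(path):
    """Open an ePub (a zip archive) and list the suspicious entries."""
    with zipfile.ZipFile(path) as archive:
        return suspicious_entries(archive.infolist())
```

check_epub takes the path to an ePub file and returns a list of entry names, empty if nothing looks odd.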

Oh, and I almost forgot. The next version of MacOS X (Mavericks) comes with iBooks. This should pretty much kill off Alice...

Saturday, June 11, 2011

iBooks and eReaders



You know what I want? One decent eBook reader. I have five marginal ones. They all work. They all display books adequately. But they all display different eBook formats. I want one reader to display all my documents. And display them intelligently. That doesn’t mean it should be “book like”. I don’t need page flip animations or ribbon bookmarks. These are OK, but are not conceptually central to reading a book.

I have an original Kindle. I like it. But I can’t view PDFs or ePubs or Microsoft Word documents on it. The Kindle is an agreeable device for buying and reading books from Amazon.com. I want more.

I have the Nook, Kindle and Kobo readers on my laptop. With some searching you can discover where the Kindle keeps its files. New books can be added to the Kindle reader by copying their files to this secret location. This works most times. Occasionally you’ll get an error and be asked to remove the book from the reader. I deleted the Nook reader since it offers me nothing beyond what the Kindle offers. The Kobo reader was a little more reticent about its document storage location. Kobo seems to store its documents in an sqlite database making it impossible to add books unless you buy them from Barnes & Noble. Gone. I want more.

Let me describe what I want.

I was in the Apple store recently trying the iPad2 and specifically iBooks. iBooks has fancy page animations and ribbon bookmarks which left me, “Meh”. I noticed the progress indicator along the bottom had what looked like a slider “thumb”. I pressed and slid it back and forth and realized, “I can flip through the book!” eBook readers are good for reading text in a linear manner. They are good for novels. Technical books want to be read in a non-linear way. You need to be able to quickly jump around the book to discover what you need. iBooks on the iPad was the first reader to make that work. I want that.

I keep documents I want to lay my hands on easily in DropBox. These DropBox documents are available at home or in the office. DropBox is an enabler and I’m fond of it. But the only documents I can read from DropBox are PDFs. I can read my word processing documents, but only at home since the office uses a different word processor. Why can’t an eReader scan my Documents directory, and any other directory I select, and find documents it understands? When an eReader shows my “Library” I’d like it to include all of my library, not just the books it has been trained to. I want that.

My original Kindle will display definitions for words on a line I choose. When I stumble across a word I don’t know I can look it up easily. Wouldn’t it be nice if it had a Thesaurus too? I want that.

All the pieces are available. They just need to be connected. Maybe if I want that, I’ll have to build it myself.

Boston has a dull skyline, but walking along the Charles river is pleasant.