Natural Language Annotation for Machine Learning by James Pustejovsky and Amber Stubbs; O’Reilly Media

Programming languages have a very strict syntax; human languages do not. When you see “I am a sentence I am another sentence,” you know that you’re really looking at two different sentences even though the period between “sentence” and “I” is missing. If you try something similar with a computer (leave off a semicolon in C or miss an indent in Python, for example), you’ll get a nasty error message. This book [1] aims to teach you how to program [2] your computer to work with the looser languages used by humans (like English) instead of the stricter counterparts used by machines.
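To make the strictness concrete, here is a minimal sketch (not taken from the book) that asks Python to compile two tiny programs, one of which is missing an indent. The helper name `check` and the two source strings are made up for illustration; `compile` is Python's built-in function.

```python
# Two versions of the same loop; the second is missing the indent.
source_ok = "for i in range(3):\n    print(i)\n"
source_bad = "for i in range(3):\nprint(i)\n"


def check(source):
    """Return 'ok' if the source compiles, else the error type and message."""
    try:
        # compile() only parses the code; nothing is executed here.
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        # IndentationError is a subclass of SyntaxError, so it lands here too.
        return f"{type(err).__name__}: {err.msg}"
    return "ok"


print(check(source_ok))
print(check(source_bad))
```

A human reader would have no trouble guessing what the second version means, but the parser rejects it outright. The exact wording of the error message varies between Python versions.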

The content available so far gives you a brief background on the relevant parts of language: grammar, pragmatics, discourse analysis, etc. The authors then go on to cover setting up an annotation project: determining your goal, creating your model/specification, and creating and storing your annotations in a format that is flexible yet easy for annotators to produce.

Though a bit dry, the writing is clear and simple. I had no previous experience in this area, but for the most part I had no trouble understanding the subject matter.

Here are some of the notes I took while reading the book:

Disclaimer: I received this book for free through the O’Reilly Blogger program.


  1. You can buy it at O’Reilly Media
  2. Actually, it’s a little difficult to determine exactly what’s going to be covered in the book as we only have four chapters available so far. The book is scheduled to be released later in 2012. I’m reviewing the Early Release version. 
