Natural Language Annotation for Machine Learning by James Pustejovsky and Amber Stubbs; O’Reilly Media
Programming languages have a very strict syntax; human languages do not. When you see “I am a sentence I am another sentence,” you know that you’re really looking at two different sentences even though the period between “sentence” and “I” is missing. If you try something similar with the computer (leave off a semicolon in C or miss an indent in Python, for example), you’ll get a nasty error message. This book aims to teach you how to program your computer to work with the looser languages used by humans (like English) instead of the stricter counterparts used by machines.
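To make the strictness point concrete, here is a minimal sketch of my own (it is not from the book) that feeds Python a snippet with a missing indent and reports what goes wrong. The helper name `check_syntax` is made up for this illustration; `compile` is the standard built-in.

```python
# Source with a missing indent after the `if` line (deliberately malformed).
source = "if True:\nprint('hello')\n"


def check_syntax(code):
    """Return None if `code` compiles, otherwise the name of the error raised."""
    try:
        compile(code, "<example>", "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return type(err).__name__
    return None


print(check_syntax(source))                            # the malformed snippet
print(check_syntax("if True:\n    print('hello')\n"))  # the same code, indented
```

A human reader would shrug off the missing indent; the interpreter refuses to run the code at all.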
The content available so far gives you a brief background on the relevant parts of language: grammar, pragmatics, discourse analysis, etc. The authors go on to talk about setting up an annotation project: determining your goal, creating your model/specification, and creating and storing your annotations in a format that is flexible yet easy for annotators to produce.
Though a bit dry, the writing is clear and simple. I had no previous experience in this area, but I had no trouble understanding the subject matter for the most part.
Here are some of the notes I took while reading the book:
Disclaimer: I received this book for free through the O’Reilly Blogger program.