Natural Language Annotation for Machine Learning: A Guide to Corpus-Building for Applications
<div><p>Create your own natural language training corpus for machine learning. Whether you’re working with English, Chinese, or any other natural language, this hands-on book guides you through a proven annotation development cycle—the process of adding metadata to your training corpus to help ML algorithms work more efficiently. You don’t need any programming or linguistics experience to get started.</p><p>Using detailed examples at every step, you’ll learn how the <i>MATTER Annotation Development Process</i> helps you <b>M</b>odel, <b>A</b>nnotate, <b>T</b>rain, <b>T</b>est, <b>E</b>valuate, and <b>R</b>evise your training corpus. You also get a complete walkthrough of a real-world annotation project.</p><ul><li>Define a clear annotation goal before collecting your dataset (corpus)</li><li>Learn tools for analyzing the linguistic content of your corpus</li><li>Build a model and specification for your annotation project</li><li>Examine the different annotation formats, from basic XML to the Linguistic Annotation Framework</li><li>Create a gold standard corpus that can be used to train and test ML algorithms</li><li>Select the ML algorithms that will process your annotated data</li><li>Evaluate the test results and revise your annotation task</li><li>Learn how to use lightweight software for annotating texts and adjudicating the annotations</li></ul><p>This book is a perfect companion to O’Reilly’s <i>Natural Language Processing with Python</i>.</p></div>