25 August 2014


Beyond active learning: Agile Text Mining

One of the most heated debates at the Strata 2012 conference was about whether domain expertise is more important than machine learning skills (see Mike Driscoll’s summary and the full video of the debate). KDnuggets even ran a poll on the subject.

For me (and for others, like James Taylor), this is the wrong question to ask. It is like asking whether milk or flour is more important when baking pancakes.

This line of thinking led to the formulation of the Agile Text Mining methodology:

The successful development of an intelligent text mining application requires the collaboration of two main stakeholders: subject matter experts and text miners. In this paper, we describe a new methodology, agile text mining, to improve that collaboration. Agile text mining is characterized by short development cycles, frequent task redefinition and continuous performance monitoring through integration tests. We introduce Sherlok, a system supporting the development of agile text mining applications, and present an application to extract mentions of neurons from a very large corpus of scientific articles. The resulting code and models are publicly available at http://sherlok.io (full article, published in IEEE)
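To make "continuous performance monitoring through integration tests" concrete, here is a minimal sketch of what such a test might look like. It is not Sherlok's API: `extract_neurons`, `NEURON_LEXICON`, `GOLD`, `evaluate` and the precision/recall thresholds are all hypothetical, standing in for a real extraction pipeline and for the gold sentences a subject matter expert would annotate.

```python
"""Minimal sketch of an integration test that monitors extraction quality.

Assumptions (hypothetical, not Sherlok's API): `extract_neurons` is a toy
dictionary-based tagger, and GOLD holds a few hand-annotated sentences
standing in for the subject matter expert's test set.
"""

# Hypothetical lexicon; a real pipeline would use a trained model or rules.
NEURON_LEXICON = {"pyramidal cell", "purkinje cell", "basket cell"}


def extract_neurons(text):
    """Return the set of lexicon entries found in `text` (case-insensitive)."""
    lowered = text.lower()
    return {term for term in NEURON_LEXICON if term in lowered}


# Hypothetical gold annotations: sentence -> expected neuron mentions.
GOLD = {
    "Purkinje cells project to the deep cerebellar nuclei.": {"purkinje cell"},
    "Basket cells inhibit pyramidal cells in CA1.": {"basket cell", "pyramidal cell"},
    "Stellate cells are found in the molecular layer.": {"stellate cell"},
}


def evaluate(extractor, gold):
    """Micro-averaged precision and recall over all gold sentences."""
    tp = fp = fn = 0
    for sentence, expected in gold.items():
        predicted = extractor(sentence)
        tp += len(predicted & expected)  # correct mentions
        fp += len(predicted - expected)  # spurious mentions
        fn += len(expected - predicted)  # missed mentions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def test_extraction_quality(min_precision=0.9, min_recall=0.7):
    """Fail the build if quality drops below the (hypothetical) agreed thresholds."""
    precision, recall = evaluate(extract_neurons, GOLD)
    assert precision >= min_precision, f"precision fell to {precision:.2f}"
    assert recall >= min_recall, f"recall fell to {recall:.2f}"


if __name__ == "__main__":
    test_extraction_quality()
    print(evaluate(extract_neurons, GOLD))  # prints (1.0, 0.75)
```

The design idea is that the gold set belongs to the subject matter expert and the extractor belongs to the text miner. When the task is redefined (here, the expert has added a stellate cell example), the test immediately shows the recall drop, and the next short development cycle can target it, for instance by extending the lexicon.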
