UW Interactive Data Lab
The iSeqL tool for interactive text sequence learning. A) Users label sampled instances; user-annotated entities are highlighted in yellow. B) Predictions from the current model are underlined to expedite annotation and convey model performance. The rightmost panel contains evaluation aids: C) the count of labels that “flipped” in the last round, D) a model quality (F1) score against held-out data, and E) an entity rank chart that shows the top predicted, labeled, and discovered entities.
Abstract
Exploratory analysis of unstructured text is a difficult task, particularly when defining and extracting domain-specific concepts. We present iSeqL, an interactive tool for the rapid construction of customized text mining models through sequence labeling. With iSeqL, analysts engage in an active learning loop, labeling text instances and iteratively assessing trained models by viewing model predictions in the context of both individual text instances and task-specific visualizations of the full dataset. To build suitable models with limited training data, iSeqL leverages transfer learning and pre-trained contextual word embeddings within a recurrent neural architecture. Through case studies and an online experiment, we demonstrate the use of iSeqL to quickly bootstrap models sufficiently accurate to perform in-depth exploratory analysis. With less than an hour of annotation effort, iSeqL users are able to generate stable outputs over custom extracted entities, including context-sensitive discovery of phrases that were never manually labeled.
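To make the iterative assessment step concrete, here is a minimal Python sketch of two of the evaluation aids named in the figure caption: a count of labels that flipped between rounds and an F1 score against held-out data. It is not iSeqL's implementation. The function names `count_flips` and `token_f1`, the per-token `"ENT"`/`"O"` tag scheme, the reading of "flipped" as a prediction that changed between two successive models, and the choice of token-level rather than entity-level F1 are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: instance id -> one tag per token.
Labels = Dict[str, List[str]]


def count_flips(previous: Labels, current: Labels) -> int:
    """Count tokens whose predicted tag changed between two training rounds."""
    flips = 0
    for instance_id, prev_tags in previous.items():
        # Instances missing from the current round are treated as unchanged.
        curr_tags = current.get(instance_id, prev_tags)
        flips += sum(p != c for p, c in zip(prev_tags, curr_tags))
    return flips


def token_f1(gold: Labels, predicted: Labels, outside: str = "O") -> float:
    """Token-level F1 of predicted tags against held-out gold tags."""
    tp = fp = fn = 0
    for instance_id, gold_tags in gold.items():
        for g, p in zip(gold_tags, predicted[instance_id]):
            if p != outside and p == g:
                tp += 1          # predicted entity tag matches gold
            else:
                if p != outside:
                    fp += 1      # predicted an entity tag that gold does not have
                if g != outside:
                    fn += 1      # missed (or mislabeled) a gold entity tag
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Made-up predictions from two successive rounds, plus held-out gold tags.
round_1 = {"a": ["O", "ENT", "O"], "b": ["ENT", "O"]}
round_2 = {"a": ["O", "ENT", "ENT"], "b": ["O", "O"]}
held_out = {"a": ["O", "ENT", "ENT"], "b": ["ENT", "O"]}

print(count_flips(round_1, round_2))
print(token_f1(held_out, round_2))
```

Under these assumptions, a falling flip count across rounds suggests the model's outputs are stabilizing, while the F1 score tracks agreement with the held-out labels.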
Materials