UW Interactive Data Lab
Jason Chuang, Christopher D. Manning, Jeffrey Heer
Tag cloud visualizations of an online biography of the pop singer Lady Gaga. (left) Single-word phrases (unigrams) weighted using G². (right) Multiword phrases, including significant places and song titles, selected using our corpus-independent model.
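As a rough illustration of the unigram weighting named in the caption, the sketch below computes the G² (log-likelihood ratio) statistic for a single term from a 2×2 contingency table of term counts in a document versus a reference corpus. This is a generic textbook formulation, not necessarily the exact procedure used in the paper; the function name `g2` and all counts are hypothetical.

```python
import math


def g2(term_in_doc, doc_total, term_in_ref, ref_total):
    """G^2 statistic for one term: document vs. reference corpus.

    Assumed inputs (hypothetical names): raw token counts of the term
    and total token counts in each of the two collections.
    """
    # 2x2 table: rows = (document, reference), cols = (term, other tokens)
    observed = [
        [term_in_doc, doc_total - term_in_doc],
        [term_in_ref, ref_total - term_in_ref],
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing (0 * ln 0 := 0)
                expected = row_sums[i] * col_sums[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


# Made-up counts: (count in document, count in reference corpus)
doc_total, ref_total = 100, 1000
counts = {"gaga": (20, 10), "the": (11, 100)}
scores = {w: g2(d, doc_total, r, ref_total) for w, (d, r) in counts.items()}
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 2))
```

G² is two-sided: a term that is unusually rare in the document also scores high, so a tag cloud would typically keep only terms whose relative frequency in the document exceeds that in the reference corpus.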
Abstract
Keyphrases aid the exploration of text collections by communicating salient aspects of documents and are often used to create effective visualizations of text. While prior work in HCI and visualization has proposed a variety of ways of presenting keyphrases, less attention has been paid to selecting the best descriptive terms. In this article, we investigate the statistical and linguistic properties of keyphrases chosen by human judges and determine which features are most predictive of high-quality descriptive phrases. Based on 5,611 responses from 69 graduate students describing a corpus of dissertation abstracts, we analyze characteristics of human-generated keyphrases, including phrase length, commonness, position, and part of speech. Next, we systematically assess the contribution of each feature within statistical models of keyphrase quality. We then introduce a method for grouping similar terms and varying the specificity of displayed phrases so that applications can select phrases dynamically based on the available screen space and current context of interaction. Precision-recall measures find that our technique generates keyphrases that match those selected by human judges. Crowdsourced ratings of tag cloud visualizations rank our approach above other automatic techniques. Finally, we discuss the role of HCI methods in developing new algorithmic techniques suitable for user-facing applications.
Citation
Jason Chuang, Christopher D. Manning, Jeffrey Heer
ACM Trans. on Computer-Human Interaction, 19(3), pp. 1-29, 2012