Hello!
I am a research scientist at Spotify. I'm broadly interested in natural language processing methods for advancing our understanding of social phenomena across media, policy, and technology.
I received my PhD in Computer Science & Engineering from the University of Washington, where I was advised by Noah Smith; my work was funded in part by an NSF Graduate Fellowship. Before grad school, I was a software engineer at Microsoft (Skype for Business Server), and before that, I graduated with a BSE in Computer Science and a Certificate in Finance from Princeton University.
Increasing access, representation, and retention of underrepresented groups is essential across academia. Invited speaker seminars are common practice in academic science departments and serve to disseminate research, establish connections and collaborations, advance faculty careers, and connect trainees to mentors outside of departmental faculty. Thus, a lack of representation among seminar speakers can affect both faculty and trainee professional development. This study characterizes the gender demographics of seminar speakers across science departments at an R1 institution for the years 2015–2019, using pronouns as a proxy for gender identity. We found that most faculty and invited speakers were male, and few were female or nonbinary. The percentage of female and nonbinary invited speakers increased from 2015 to 2019, as did the percentage of female and nonbinary host faculty. Overall, male faculty hosted fewer female and nonbinary speakers than their female and nonbinary colleagues. This study provides evidence of a correlation between faculty identity and the scientists they host in their departments, and it motivates further studies investigating this relationship at other R1 institutions and institution types.
For social scientists and other data practitioners, the abundance of available digital text data is a rich potential source for understanding social phenomena. As a result, practitioners have increasingly applied text analysis methods to relevant corpora to help answer their substantive research questions; common abstractions for these analyses include text classification, topic modeling, and fixed keyword matching. While these tools are powerful, they impose strong assumptions about the structure of human language (e.g., treating documents as bags of words) and thus limit the kinds of inferences practitioners can draw from corpora. Conversely, the richer models trained on large corpora by the natural language processing community do not necessarily transfer to the needs of practitioners' applications.
In this work, we propose semantic comparison as another lens for studying social phenomena in text data. We introduce two novel applications of semantic comparison methods for which standard abstractions are insufficient. First, we demonstrate the utility of finding semantic matches for a query sentence in a broader corpus through two case studies: community recovery after the 2010-2011 Christchurch, New Zealand earthquake sequence, as expressed in local news text; and policy attitudes in the United States Congress from 2000 to 2013, as expressed in archived websites from the .gov domain. We discuss the model selection and end-user challenges involved, and we introduce a procedure (nearest neighbor overlap) to compare sentence embedder behavior in the context of a corpus.
Second, we discuss sensationalism in medical journalism and the possible utility of NLP -- particularly semantic comparison -- in identifying sensationalized text. We survey past studies across communications, medicine, and psychology to illustrate the complexity of how and why sensationalism manifests in the health communications pipeline. In doing so, we critique the common NLP setup of attempting to label social phenomena in text with high accuracy and provide recommendations for developing user-facing NLP systems that seek to identify or reduce the occurrence of sensationalism.
Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties. This presents a challenge for language varieties unfamiliar to these models, whose labeled and unlabeled data is too limited to train a monolingual model effectively. We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings. Using dependency parsing of four diverse low-resource language varieties as a case study, we show that these methods significantly improve performance over baselines, especially in the lowest-resource cases, and demonstrate the importance of the relationship between such models' pretraining data and target language varieties.
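For concreteness, here is a minimal sketch of the general recipe of vocabulary augmentation followed by continued masked-language-model pretraining, using the Hugging Face transformers and datasets libraries. The model name, corpus path, placeholder tokens, and hyperparameters are illustrative assumptions, not the exact setup from the paper.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

# Start from a pretrained multilingual model (placeholder choice).
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Vocabulary augmentation: add subword tokens mined from target-language text.
# (How the tokens are selected is outside the scope of this sketch.)
new_tokens = ["newsubword1", "newsubword2"]  # hypothetical placeholders
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

# Additional language-specific pretraining on unlabeled target-language text.
raw = load_dataset("text", data_files={"train": "target_language_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapted-mbert", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)
trainer.train()
```

The adapted model can then be fine-tuned on whatever labeled target-language data is available, e.g., for dependency parsing.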
As distributed approaches to natural language semantics have developed and diversified, embedders for linguistic units larger than words have come to play an increasingly important role. To date, such embedders have been evaluated using benchmark tasks (e.g., GLUE) and linguistic probes. We propose a comparative approach, nearest neighbor overlap (N2O), that quantifies similarity between embedders in a task-agnostic manner. N2O requires only a collection of examples and is simple to understand: two embedders are more similar if, for the same set of inputs, there is greater overlap between the inputs' nearest neighbors. Though applicable to embedders of texts of any size, we focus on sentence embedders and use N2O to show the effects of different design choices and architectures.
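As an illustration of the computation described above, here is a minimal sketch of N2O given precomputed embedding matrices from two embedders over the same corpus. The use of cosine similarity, the value of k, and the function names are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def top_k_neighbors(embeddings, query_idx, k):
    # Cosine similarity between one query sentence and every corpus sentence.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]
    sims[query_idx] = -np.inf  # exclude the query itself
    return set(np.argpartition(-sims, k)[:k])

def n2o(emb_a, emb_b, query_indices, k=10):
    """Mean overlap (out of k) between the two embedders' nearest-neighbor sets."""
    overlaps = [
        len(top_k_neighbors(emb_a, q, k) & top_k_neighbors(emb_b, q, k))
        for q in query_indices
    ]
    return float(np.mean(overlaps)) / k  # 1.0 means identical neighborhoods
```

Given two embedders' outputs emb_a and emb_b over the same sentences, calling n2o(emb_a, emb_b, query_indices) for a sample of query indices yields a single task-agnostic similarity score between the embedders.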
We introduce a novel approach for incorporating syntax into natural language inference (NLI) models. Our method uses contextual token-level vector representations from a pretrained dependency parser. Like other contextual embedders, our method is broadly applicable to any neural model. We experiment with four strong NLI models (the decomposable attention model, ESIM, BERT, and MT-DNN) and show consistent accuracy improvements across three NLI benchmarks.
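A rough sketch of the general idea is below: each token's word embedding is concatenated with the corresponding hidden state from a pretrained dependency parser before entering the NLI model's encoder. The class and dimension names are illustrative, and the four NLI models above each consume these representations in their own way.

```python
import torch
import torch.nn as nn

class SyntaxAugmentedEmbedder(nn.Module):
    """Concatenate word embeddings with contextual states from a (frozen) pretrained parser."""

    def __init__(self, word_dim, parser_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(word_dim + parser_dim, out_dim)

    def forward(self, word_embs, parser_states):
        # word_embs:     (batch, seq_len, word_dim)   -- the NLI model's own token embeddings
        # parser_states: (batch, seq_len, parser_dim) -- hidden states from the dependency parser
        combined = torch.cat([word_embs, parser_states], dim=-1)
        return torch.relu(self.proj(combined))
```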
We are developing a new natural language processing (NLP) method to facilitate analysis of text corpora that describe long-term recovery. The aim of the method is to allow users to measure the degree to which user-specified propositions about potential issues are embodied in a corpus, serving as a proxy for the disaster recovery process. The method employs a statistical, syntax-based semantic matching model and was trained on a standard, publicly available dataset. We applied the NLP method to a news story corpus describing the recovery of Christchurch, New Zealand after the 2010-2011 Canterbury earthquake sequence. We used the model to compute semantic measurements of multiple potential recovery issues as expressed in the Christchurch news corpus, which spans 2011 to 2016. We evaluated the method's outputs through a user study involving twenty professional emergency managers. The results show that the model can be effective when applied to a disaster-related news corpus, and 85% of study participants were interested in a way to measure recovery issue propositions in news or other corpora. We are encouraged by the potential for future applications of our NLP method in after-action learning, recovery decision making, and disaster research.
We consider the case of a domain expert who wishes to explore the extent to which a particular idea is expressed in a text collection. We propose the task of semantically matching the idea, expressed as a natural language proposition, against a corpus. We create two preliminary tasks derived from existing datasets, and then introduce a more realistic one on disaster recovery designed for emergency managers, whom we engaged in a user study. On the latter, we find that a new model built from natural language entailment data produces higher-quality matches than simple word-vector averaging, both on expert-crafted queries and on ones produced by the subjects themselves. This work provides a proof-of-concept for such applications of semantic matching and illustrates key challenges.
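To make the setup concrete, here is a minimal sketch of ranking corpus sentences against a natural language proposition by embedding similarity. The off-the-shelf sentence-transformers model is only a stand-in for the entailment-trained matcher described above, and the proposition and corpus sentences are invented for illustration.

```python
from sentence_transformers import SentenceTransformer, util

# Stand-in embedder; the work above uses a model built from natural language entailment data.
model = SentenceTransformer("all-MiniLM-L6-v2")

proposition = "Residents are struggling to find affordable housing."  # example query
corpus = [
    "Rents in the central city have risen sharply since the earthquake.",
    "The new stadium proposal was debated at the council meeting.",
    "Many families report being unable to afford repairs to their homes.",
]  # example sentences standing in for a news corpus

prop_emb = model.encode(proposition, convert_to_tensor=True)
corpus_emb = model.encode(corpus, convert_to_tensor=True)

# Rank corpus sentences by cosine similarity to the proposition.
scores = util.cos_sim(prop_emb, corpus_emb)[0]
for idx in scores.argsort(descending=True):
    print(f"{scores[idx].item():.3f}  {corpus[int(idx)]}")
```

In the actual applications, a domain expert would inspect the top-ranked sentences (and their scores over time) rather than a fixed top-k list over toy data.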
We report on attempts to use currently available automated text analysis tools to identify, through language, possible biased treatment of Democratic vs. Republican speakers by PolitiFact. We begin by noting that there is no established method for detecting such differences, and indeed that "bias" is complicated and difficult to operationalize into a measurable quantity. This report includes several analyses that are representative of the tools available from natural language processing as of this writing. In each case, we offer (i) what we would expect to see in the results if the method picked up on differential treatment of Democratic vs. Republican speakers, (ii) what we actually observe, and (iii) potential problems with the analysis; in some cases we also suggest (iv) future analyses that might be more revealing.