Hello!
I am a research scientist at Spotify. I'm broadly interested in natural language processing methods for advancing our understanding of social phenomena across media, policy, and technology.
I received my PhD in Computer Science & Engineering from the University of Washington, where I was advised by Noah Smith; my work was funded in part by an NSF Graduate Fellowship. Before grad school, I was a software engineer at Microsoft (Skype for Business Server), and before that, I graduated with a BSE in Computer Science and a Certificate in Finance from Princeton University.
Increasing access, representation, and retention of underrepresented groups is essential across academia. Invited speaker seminars are common practice in academic science departments and serve to disseminate research, establish connections and collaborations, advance faculty careers, and connect trainees to mentors outside of departmental faculty. Thus, a lack of representation among seminar speakers can affect both faculty and trainee professional development. This study characterizes the gender demographics of seminar speakers across science departments at an R1 institution for the years 2015–2019, using pronouns as a proxy for gender identity. We found that most faculty and invited speakers were male, and few were female or nonbinary. The percentage of female and nonbinary invited speakers increased from 2015 to 2019, as did the percentage of female and nonbinary host faculty. Overall, male faculty hosted fewer female and nonbinary speakers than their female and nonbinary colleagues. This study provides evidence of a correlation between faculty identity and the scientists they host in their departments, and it motivates further studies investigating this relationship at other R1 institutions and institution types.
For social scientists and other data practitioners, the abundance of available digital text data is a rich potential source for understanding social phenomena. As a result, practitioners have increasingly applied text analysis methods to relevant corpora to help answer their substantive research questions; common abstractions for these analyses include text classification, topic modeling, and fixed keyword matching. While these tools are powerful, they impose strong assumptions about the structure of human language (e.g., treating documents as bags of words) and thus limit the kinds of inferences practitioners can draw from corpora. Conversely, the richer models trained on large corpora by the natural language processing community do not necessarily transfer to the needs of practitioners' applications.
In this work, we propose semantic comparison as another lens for studying social phenomena in text data. We introduce two novel applications of semantic comparison methods for which standard abstractions are insufficient. First, we demonstrate the utility of finding semantic matches for a query sentence in a broader corpus through two case studies: community recovery after the 2010-2011 Christchurch, New Zealand earthquake sequence, as expressed in local news text; and policy attitudes in the United States Congress from 2000 to 2013, as expressed in archived websites from the .gov domain. We discuss the model selection and end-user challenges involved, and we introduce a procedure (nearest neighbor overlap) to compare sentence embedder behavior in the context of a corpus.
Second, we discuss sensationalism in medical journalism and the possible utility of NLP -- particularly semantic comparison -- in identifying sensationalized text. We survey past studies across communications, medicine, and psychology to illustrate the complexity of how and why sensationalism manifests in the health communications pipeline. In doing so, we critique the common NLP setup of attempting to label social phenomena in text with high accuracy and provide recommendations for developing user-facing NLP systems that seek to identify or reduce the occurrence of sensationalism.
Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties. This presents a challenge for language varieties unfamiliar to these models, whose labeled and unlabeled data is too limited to train a monolingual model effectively. We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings. Using dependency parsing of four diverse low-resource language varieties as a case study, we show that these methods significantly improve performance over baselines, especially in the lowest-resource cases, and demonstrate the importance of the relationship between such models' pretraining data and target language varieties.
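For concreteness, here is a minimal sketch of the general recipe of vocabulary augmentation followed by continued masked-language-model pretraining, using the Hugging Face transformers and datasets libraries. The model name, corpus path, placeholder tokens, and hyperparameters are illustrative assumptions, not the exact setup from the paper.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

# Start from a pretrained multilingual model (placeholder choice).
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Vocabulary augmentation: add subword tokens mined from target-language text.
# (How the tokens are selected is outside the scope of this sketch.)
new_tokens = ["newsubword1", "newsubword2"]  # hypothetical placeholders
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

# Additional language-specific pretraining on unlabeled target-language text.
raw = load_dataset("text", data_files={"train": "target_language_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapted-mbert", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)
trainer.train()
```

The adapted model can then be fine-tuned on whatever labeled target-language data is available, e.g., for dependency parsing.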
As distributed approaches to natural language semantics have developed and diversified, embedders for linguistic units larger than words have come to play an increasingly important role. To date, such embedders have been evaluated using benchmark tasks (e.g., GLUE) and linguistic probes. We propose a comparative approach, nearest neighbor overlap (N2O), that quantifies similarity between embedders in a task-agnostic manner. N2O requires only a collection of examples and is simple to understand: two embedders are more similar if, for the same set of inputs, there is greater overlap between the inputs' nearest neighbors. Though applicable to embedders of texts of any size, we focus on sentence embedders and use N2O to show the effects of different design choices and architectures.
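As an illustration of the computation described above, here is a minimal sketch of N2O given precomputed embedding matrices from two embedders over the same corpus. The use of cosine similarity, the value of k, and the function names are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def top_k_neighbors(embeddings, query_idx, k):
    # Cosine similarity between one query sentence and every corpus sentence.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]
    sims[query_idx] = -np.inf  # exclude the query itself
    return set(np.argpartition(-sims, k)[:k])

def n2o(emb_a, emb_b, query_indices, k=10):
    """Mean overlap (out of k) between the two embedders' nearest-neighbor sets."""
    overlaps = [
        len(top_k_neighbors(emb_a, q, k) & top_k_neighbors(emb_b, q, k))
        for q in query_indices
    ]
    return float(np.mean(overlaps)) / k  # 1.0 means identical neighborhoods
```

Given two embedders' outputs emb_a and emb_b over the same sentences, calling n2o(emb_a, emb_b, query_indices) for a sample of query indices yields a single task-agnostic similarity score between the embedders.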
We introduce a novel approach for incorporating syntax into natural language inference (NLI) models. Our method uses contextual token-level vector representations from a pretrained dependency parser. Like other contextual embedders, our method is broadly applicable to any neural model. We experiment with four strong NLI models (the decomposable attention model, ESIM, BERT, and MT-DNN) and show consistent accuracy improvements across three NLI benchmarks.
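A rough sketch of the general idea is below: each token's word embedding is concatenated with the corresponding hidden state from a pretrained dependency parser before entering the NLI model's encoder. The class and dimension names are illustrative, and the four NLI models above each consume these representations in their own way.

```python
import torch
import torch.nn as nn

class SyntaxAugmentedEmbedder(nn.Module):
    """Concatenate word embeddings with contextual states from a (frozen) pretrained parser."""

    def __init__(self, word_dim, parser_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(word_dim + parser_dim, out_dim)

    def forward(self, word_embs, parser_states):
        # word_embs:     (batch, seq_len, word_dim)   -- the NLI model's own token embeddings
        # parser_states: (batch, seq_len, parser_dim) -- hidden states from the dependency parser
        combined = torch.cat([word_embs, parser_states], dim=-1)
        return torch.relu(self.proj(combined))
```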
We are developing a new natural language processing (NLP) method to facilitate analysis of text corpora that describe long-term recovery. The aim of the method is to allow users to measure the degree to which user-specified propositions about potential issues are embodied in a corpus, serving as a proxy for the disaster recovery process. The method employs a statistical, syntax-based semantic matching model and was trained on a standard, publicly available dataset. We applied the NLP method to a news story corpus describing the recovery of Christchurch, New Zealand after the 2010-2011 Canterbury earthquake sequence. We used the model to compute semantic measurements of multiple potential recovery issues as expressed in the Christchurch news corpus, which spans 2011 to 2016. We evaluated the method's outputs through a user study involving twenty professional emergency managers. The results show that the model can be effective when applied to a disaster-related news corpus, and 85% of study participants were interested in a way to measure recovery issue propositions in news or other corpora. We are encouraged by the potential for future applications of our NLP method in after-action learning, recovery decision making, and disaster research.
We consider the case of a domain expert who wishes to explore the extent to which a particular idea is expressed in a text collection. We propose the task of semantically matching the idea, expressed as a natural language proposition, against a corpus. We create two preliminary tasks derived from existing datasets, and then introduce a more realistic one on disaster recovery designed for emergency managers, whom we engaged in a user study. On the latter, we find that a new model built from natural language entailment data produces higher-quality matches than simple word-vector averaging, both on expert-crafted queries and on ones produced by the subjects themselves. This work provides a proof-of-concept for such applications of semantic matching and illustrates key challenges.
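To make the setup concrete, here is a minimal sketch of ranking corpus sentences against a natural language proposition by embedding similarity. The off-the-shelf sentence-transformers model is only a stand-in for the entailment-trained matcher described above, and the proposition and corpus sentences are invented for illustration.

```python
from sentence_transformers import SentenceTransformer, util

# Stand-in embedder; the work above uses a model built from natural language entailment data.
model = SentenceTransformer("all-MiniLM-L6-v2")

proposition = "Residents are struggling to find affordable housing."  # example query
corpus = [
    "Rents in the central city have risen sharply since the earthquake.",
    "The new stadium proposal was debated at the council meeting.",
    "Many families report being unable to afford repairs to their homes.",
]  # example sentences standing in for a news corpus

prop_emb = model.encode(proposition, convert_to_tensor=True)
corpus_emb = model.encode(corpus, convert_to_tensor=True)

# Rank corpus sentences by cosine similarity to the proposition.
scores = util.cos_sim(prop_emb, corpus_emb)[0]
for idx in scores.argsort(descending=True):
    print(f"{scores[idx].item():.3f}  {corpus[int(idx)]}")
```

In the actual applications, a domain expert would inspect the top-ranked sentences (and their scores over time) rather than a fixed top-k list over toy data.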
We report on attempts to use currently available automated text analysis tools to identify, through language, possible biased treatment of Democratic vs. Republican speakers by PolitiFact. We begin by noting that there is no established method for detecting such differences, and indeed that "bias" is complicated and difficult to operationalize into a measurable quantity. This report includes several analyses that are representative of the tools available from natural language processing as of this writing. In each case, we offer (i) what we would expect to see in the results if the method picked up on differential treatment of Democratic vs. Republican speakers, (ii) what we actually observe, and (iii) potential problems with the analysis; in some cases we also suggest (iv) future analyses that might be more revealing.