
Training computers to tease out the subtext behind the text

Computer scientists use machine learning to connect real-world events with text on social media and in news articles   

It is hard enough for humans to interpret the deeper meaning and context of social media and news articles. Asking computers to do it is a nearly impossible task. Even C-3PO, fluent in over 6 million forms of communication, misses the subtext much of the time.

Natural language processing, the subfield of artificial intelligence connecting computers with human languages, uses statistical methods to analyze language, often without incorporating the real-world context needed for understanding the shifts and currents of human society. To do that, you have to translate online communication, and the context from which it emerges, into something the computers can parse and reason over.

Dan Goldwasser, associate professor of computer science in the College of Science at Purdue University, and other members of his team strive to address that by developing new ways to model human language and allow computers to better understand us.

“The motivation of our work is to get a better understanding of public discourse, how different issues are discussed, the arguments made and the perspectives underlying these arguments,” Goldwasser said. “We would like to represent the points of view expressed by the thousands, or even more, of people describing their experiences online. Understanding the language used to discuss issues can help shed light on the different considerations behind decision-making processes, including both individual health and well-being choices and broader policy decisions.”

Goldwasser emphasizes that part of the challenge is that so much of online communication relies on readers already knowing the context – whether it’s shorthand on Twitter or the shared background needed to get a meme. To analyze the communication, a computer must treat that context as a vital part of the message.

In many of the scenarios we study, progress relies on finding new ways to conceptualize language understanding, by grounding it in a real-world context. Operationalizing it requires developing new technical solutions.

Dan Goldwasser

Goldwasser and his students use techniques distilled from the combined wisdom of computer science, artificial intelligence and computational social science.

Goldwasser’s lab studies the language used on social media, in traditional media stories and in legislative texts to understand the context and assumptions of the speakers and writers. In a world where the written word is flourishing and every person with an internet connection can act as a journalist, being able to study and analyze that writing in an unbiased manner is crucial to our understanding of our own society.

Goldwasser is an expert in using machine learning to analyze natural language and can comment on:

* The context of politics in social media and news media.

* How the framing of messages and issues in laws, news stories, and online affects real-world behavior.

* Modeling human mental states and analyzing Twitter users’ lifestyle choices.

Social media usage by U.S. politicians on two politically divisive issues: gun control and immigration.


ABSTRACT

Understanding Politics via Contextualized Discourse Processing

Rajkumar Pujari, Dan Goldwasser

Presented at the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)

Politicians often have underlying agendas when reacting to events. Arguments in contexts of various events reflect a fairly consistent set of agendas for a given entity. In spite of recent advances in Pretrained Language Models, those text representations are not designed to capture such nuanced patterns. In this paper, we propose a Compositional Reader model consisting of encoder and composer modules that captures and leverages such information to generate more effective representations for entities, issues, and events. These representations are contextualized by tweets, press releases, issues, news articles and participating entities. Our model processes several documents at once and generates composed representations for multiple entities over several issues or events. Via qualitative and quantitative empirical analysis, we show that these representations are meaningful and effective.
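To make the encoder-composer idea in the abstract more concrete, here is a minimal, hypothetical sketch in PyTorch: a document encoder turns each tweet or press release into a vector, and a composer aggregates many document vectors into one contextualized representation for an entity or issue. The module names, dimensions, mean pooling, and attention-based composition are illustrative assumptions, not the architecture of the published Compositional Reader model.

```python
# Illustrative sketch only: a toy encoder + composer, loosely inspired by the
# idea described in the abstract. All names, dimensions, and pooling choices
# here are assumptions made for illustration.

import torch
import torch.nn as nn


class DocumentEncoder(nn.Module):
    """Encodes one tokenized document (e.g., a tweet or press release) into a vector."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (num_docs, seq_len) -> (num_docs, dim) via mean pooling
        hidden = self.encoder(self.embed(token_ids))
        return hidden.mean(dim=1)


class Composer(nn.Module):
    """Composes many document vectors into one contextualized entity/issue vector."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.attn = nn.Linear(dim, 1)

    def forward(self, doc_vecs: torch.Tensor) -> torch.Tensor:
        # doc_vecs: (num_docs, dim); attention-weighted sum -> (dim,)
        weights = torch.softmax(self.attn(doc_vecs), dim=0)
        return (weights * doc_vecs).sum(dim=0)


if __name__ == "__main__":
    encoder, composer = DocumentEncoder(vocab_size=5000), Composer()
    # Three toy "documents" (random token ids) tied to one politician and one issue.
    docs = torch.randint(0, 5000, (3, 20))
    entity_repr = composer(encoder(docs))
    print(entity_repr.shape)  # torch.Size([128])
```

In this sketch the composer simply learns which documents matter most for the aggregate representation; the actual model described in the paper contextualizes entities across tweets, press releases, issues, news articles and other participating entities in a richer way.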