OSCAR Celebration of Student Scholarship and Impact
Category: College of Engineering and Computing OSCAR

Comments to Communities: Modeling Online News-Related Discourse

Author(s): Reva Hirave

Mentor(s): Antonios Anastasopoulos, Computer Science

Abstract
This project presents the development of a lightweight, extensible tool designed to collect and aggregate user-generated commentary from online news platforms, including the New York Times, Fox News, and Reddit. By unifying disparate data formats into a standardized structure, this tool facilitates downstream tasks such as toxicity analysis, network modeling, and discourse comparison. To date, over 9,000 comments have been collected, and preliminary analysis using the Perspective API reveals cross-platform trends in toxicity and engagement. Additionally, early visualizations of user interaction networks explore the extent to which individuals engage outside their ideological or topical communities. Ultimately, this project aims to offer a command-line interface that enables customizable data harvesting across platforms, with filters for topic relevance, discussion tree structure, and other discourse features. This tool lays the groundwork for deeper investigations into online news discourse and community polarization.
Audio Transcript
Hi, I’m Reva, and this is Comments to Communities—a project about modeling how people talk about the news across online platforms.

The first rule of the internet: don’t read the comments. Wreck-It Ralph wasn’t really wrong. Comment sections are chaotic, often toxic, and sometimes considered the worst parts of the internet. But they’re also where some of the most honest and unfiltered public discourse happens. So instead of ignoring the comments, this project combs through thousands of them.

News sites and social media platforms host ideologically distinct communities. Commenters on The New York Times don’t necessarily talk anything like those on Fox News. Comments can reveal more than just interactions between strangers—they can reflect how online communities construct, challenge, or echo narratives around ongoing social issues.

But right now, existing tools for studying discourse fall short. They usually focus on one platform or treat comments as isolated utterances rather than parts of larger conversations. They also rarely combine toxicity metrics with network structure, which means they don’t fully capture complex social relationships.

This project addresses those gaps, guided by two research questions:

How do different online communities talk about the news, and do they talk across ideological lines?

How can we measure the health of these discussions in ways that go beyond just likes or shares?

To answer these questions, we built a lightweight tool that scrapes and standardizes comments from three platforms: The New York Times, Fox News, and political subreddits. The tool unifies this data into a common format, adding metadata like reply structure and timestamps. This makes it easy to analyze both the content and the shape of these discussions—like who’s replying to whom, and how toxic the exchanges are.
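To give a sense of what that common format might look like, here is a minimal sketch of a unified comment record and a normalizer for one platform. The field names and the `normalize_reddit` helper are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Comment:
    """One comment in a platform-agnostic format."""
    platform: str             # e.g. "nyt", "fox", "reddit"
    comment_id: str
    parent_id: Optional[str]  # None for top-level comments; encodes reply structure
    author: str
    timestamp: str            # ISO-8601 string
    text: str

def normalize_reddit(raw: dict) -> Comment:
    """Map a (hypothetical) raw Reddit payload into the common format."""
    return Comment(
        platform="reddit",
        comment_id=raw["id"],
        parent_id=raw.get("parent_id"),
        author=raw["author"],
        timestamp=raw["created_utc"],
        text=raw["body"],
    )

raw = {"id": "c1", "parent_id": None, "author": "u1",
       "created_utc": "2025-03-01T12:00:00Z", "body": "Interesting article."}
print(normalize_reddit(raw).platform)  # prints "reddit"
```

One normalizer per platform, all emitting the same record type, is what lets downstream analysis ignore where a comment came from.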

So far, we’ve collected about 4,000 comments from The New York Times, another 4,000 from Reddit, and around 1,000 from Fox News. That’s roughly 9,000 comments in total, and while collection is still ongoing, it’s already yielded some compelling visualizations.

These are reply networks for Fox News, and you’ll also see one for The New York Times. Each node is a comment, and the size corresponds to the number of replies it received. Every edge—or line—represents a reply relationship. Nodes are color-coded by toxicity: green for less toxic and red for more toxic. These toxicity labels were generated using Google’s Perspective API.

These networks already hint at platform-specific dynamics and how polarized—or productive—these spaces can be. For example, if we look at the New York Times network, some circles are larger simply because they have more replies. One discussion concerns an op-ed about a former Kamala Harris skeptic, which is represented as the largest node in the middle. As expected, most comments have zero replies and cluster near the original post, but there are a few longer threads as well.
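A reply network like the ones just described can be assembled directly from the unified records. The sketch below uses only toy data: the comment IDs and toxicity values are placeholders standing in for scores a service like the Perspective API would return, and in practice a graph library such as networkx would handle layout and drawing.

```python
from collections import defaultdict

# Toy comments in a unified format; "toxicity" is a placeholder for a
# Perspective-API-style score in [0, 1].
comments = [
    {"id": "root", "parent": None,   "toxicity": 0.05},
    {"id": "a",    "parent": "root", "toxicity": 0.10},
    {"id": "b",    "parent": "root", "toxicity": 0.72},
    {"id": "c",    "parent": "b",    "toxicity": 0.65},
]

# Build the reply structure: each parent->child link is one edge in the network.
replies = defaultdict(list)
for c in comments:
    if c["parent"] is not None:
        replies[c["parent"]].append(c["id"])

# Node size ~ number of replies received; node color would map toxicity
# from green (low) to red (high) when the network is drawn.
for c in comments:
    print(c["id"], len(replies[c["id"]]), c["toxicity"])
```

Even this simple structure is enough to ask the questions above: most nodes have zero replies, and the few with many become the large central circles in the visualization.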

This opens up a range of further research questions, like: Do toxic comments produce toxic replies? Could we predict when a conversation will become toxic?

Why does any of this matter? Tools like this can help journalists, sociologists, and NLP researchers ask new kinds of questions—not just what people are saying, but how they’re saying it. If we want healthier discourse, we first need to understand how people talk. This project offers a step in that direction by making comment sections a little less mysterious and a lot more measurable.

This work builds on research from the 2024 Jelinek Workshop at Johns Hopkins and is supported by the OSCAR Program at George Mason University. I’d like to thank Dr. Antonis Anastasopoulos and the AI-Curated Democratic Discourse team from the JHU workshop.

Thank you so much for listening. Feel free to reach out if you want to explore the tool or just talk about the project. I’m available at arhavegu.edu.

Thank you again.
