Investigating Disinformation Through the Lens of Mass Media: A System Design

Author(s): Ekrem Kaya, Luke Palmieri, Gowri Prathap

Mentor(s): Hamdi Kavak, Computational and Data Sciences

Abstract
Mass media is a medium of communication with a significant impact on public opinion and perception of issues of global significance. This study is centered around developing a software system to detect and analyze disinformation efforts through mass media outlets and to predict shifts in public opinion or reveal active campaigns. The developed system uses a multi-step process to analyze and reveal anti-American sentiment in any country of interest, particularly US allies. We used Turkey as a use case to test our system. Turkey is an important country because it holds a critical role within NATO as a US ally and has recently had significant shifts in anti-American views. We collected mass media articles from various Turkish media outlets. The articles were translated into English and stored in the system database. Roughly 3,500 new articles are published and added to the database each month. Using this system, we were able to conduct both exploratory and targeted analyses. For instance, an Iranian-disseminated Turkish-language newspaper was found to have the most negative anti-American sentiment. Additionally, we retrieved and analyzed news articles related to Turkey’s S-400 missile purchase from Russia, demonstrating the significant potential of our system.
Audio Transcript
My name is Ekrem Kaya, and I am a freshman majoring in Computational and Data Sciences at George Mason University. The project we are presenting today was conducted by my partner and me with the help of Gowri Prathap, Alex Korb, and our mentors Saltuk Karahan and Hamdi Kavak, through a joint effort of George Mason University, Old Dominion University, and Tidewater Community College.

Our project is titled Investigating Disinformation Through the Lens of Mass Media: A System, and the goal of this project is to create a software system to study disinformation efforts in mass media. This addresses the national safety concerns arising from an increasing number of states using digital tools and media to distort facts, shape public perception, and create anti-American views.

Our main data set included two of the most widely circulated newspapers in Turkey, Sabah and Hurriyet. Many Turkish media outlets have ties to the government and publish only under government supervision. Other sources included FARS, a direct mouthpiece of the Islamic Revolutionary Guard Corps. We were able to extract data from over 80,000 articles, store it in a SQL database, and translate it into English.
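The transcript says the articles were stored in a SQL database and that the publisher and date of every sentence were tracked. A minimal sketch of such a store, assuming SQLite through Python's built-in sqlite3 module and a single hypothetical table named sentences (the project's actual schema is not described), could look like this:

```python
import sqlite3


def create_store(path=":memory:"):
    """Create a hypothetical sentence-level table that records, for every
    translated sentence, the publisher and publication date."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sentences (
               id        INTEGER PRIMARY KEY,
               publisher TEXT NOT NULL,
               pub_date  TEXT NOT NULL,  -- ISO 8601 date string
               text_en   TEXT NOT NULL   -- English translation of the sentence
           )"""
    )
    return conn


def add_article(conn, publisher, pub_date, sentences):
    """Insert one article's translated sentences; they share publisher and date."""
    conn.executemany(
        "INSERT INTO sentences (publisher, pub_date, text_en) VALUES (?, ?, ?)",
        [(publisher, pub_date, s) for s in sentences],
    )
    conn.commit()


# Toy rows for illustration only, not data from the study.
conn = create_store()
add_article(conn, "Sabah", "2019-07-12", ["First sentence.", "Second sentence."])
count = conn.execute(
    "SELECT COUNT(*) FROM sentences WHERE publisher = ?", ("Sabah",)
).fetchone()[0]
print(count)  # 2
```

Keeping one row per sentence, rather than per article, is what later makes it cheap to aggregate sentence-level predictions by publisher and date.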

With mass media dominated by these actors, one problem with identifying these changing views towards the US is that traditional sentiment analysis is often poor at identifying the opinions found in mass media, which are less explicit than those found in tweets. Previous approaches have also included neural models trained on spam and tweet datasets, which are poorly suited to this task.

The methodology we have outlined in our system sidesteps these concerns by using target-based sentiment analysis (TBSA). The goal of TBSA is to identify a target within a sentence that can then be classified according to its sentiment polarity. For example, given "I love X but hate Y", TBSA classifies whether the sentence is positive or negative towards a particular target, say X or Y. Traditional sentiment methods, such as word-based approaches or others that do not incorporate target-based or aspect-level sentiment analysis, would most likely classify the whole sentence as neutral, since the words "love" and "hate" roughly cancel out.
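To make the cancellation concrete, the toy example below compares a word-based score with a deliberately crude target-aware heuristic. The two-word lexicon is a hypothetical stand-in for a real lexicon such as AFINN, and the clause-splitting heuristic only illustrates the idea of scoring relative to a target; it is not GRU-TSC or any method used in the project.

```python
import re

# Hypothetical two-word lexicon standing in for a real one such as AFINN.
LEXICON = {"love": 1, "hate": -1}


def word_score(text):
    """Word-based score: sum of lexicon values over every word in the text."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", text.lower()))


def label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def naive_target_score(text, target):
    """Crude target-aware heuristic (NOT GRU-TSC): split the sentence into
    clauses on the word 'but' and score only the clause naming the target."""
    for clause in re.split(r"\bbut\b", text, flags=re.IGNORECASE):
        if re.search(rf"\b{re.escape(target)}\b", clause):
            return word_score(clause)
    return 0


sentence = "I love X but hate Y"
print(label(word_score(sentence)))               # neutral
print(label(naive_target_score(sentence, "X")))  # positive
print(label(naive_target_score(sentence, "Y")))  # negative
```

A learned target-dependent classifier replaces the clause heuristic with a model that reads the whole sentence together with the marked target, so it is not limited to sentences that happen to split neatly on "but".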

So, to the best of our knowledge, we are the first to apply the GRU-TSC model, which was developed specifically for mass media, to a mass media data set. We then applied it using temporal sentiment analysis by event and by publisher, and we have further outlined an approach that others can take up, using LDA or various other methods, to extend our work.

The first part of our process involved data collection and management. We stored our data in a SQL database, then cleaned the text by removing links and artifacts left by the extraction program we created. In the end, we collected articles from several of the largest newspapers in Turkey, including Sabah and Hurriyet, the largest-circulation newspapers in the country. We then tokenized the text into sentences so that they could be used in our TBSA.
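The cleaning and sentence-splitting steps can be sketched with Python's standard re module. This is a sketch under stated assumptions: the transcript does not say which extraction artifacts were present or which tokenizer was used, so HTML-like tags stand in for the artifacts and a naive rule-based splitter stands in for a proper sentence tokenizer.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")  # assumed extraction residue: HTML-like tags
# Split after . ! ? followed by whitespace and a capital letter.
SENT_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def clean_text(raw):
    """Remove links and tag residue, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", raw)
    text = TAG_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive rule-based sentence splitter; drops empty pieces."""
    return [s for s in SENT_RE.split(text) if s]


# Toy input for illustration only.
raw = "The deal was signed.<br>Officials objected! https://example.com/a"
sentences = split_sentences(clean_text(raw))
print(sentences)  # ['The deal was signed.', 'Officials objected!']
```

Because the splitter requires a capital letter after the punctuation, "The U.S. objected." stays in one piece, but "The U.S. Congress" would be split after "U.S.". That matters here because "US" is a target keyword, so a trained sentence tokenizer is usually the safer choice.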

The second part of our system is the target-based sentiment analysis. The first step is to extract targets to compare, sentence by sentence. This could use named entity recognition to generate targets, for example actors such as politicians, states, and non-state organisations, or alternatively, as we did in our case study of the S-400 missiles, use keywords and then select all sentences that contain a keyword. Next, in the classification step, we use the GRU-TSC model to generate the class polarity (negative, neutral, or positive) of a target within a sentence. In this case we used the USA as the target within a sentence (the keywords were USA, US, and America).
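The keyword route to target extraction amounts to selecting every sentence that mentions one of the keywords and recording where the target sits, since target-dependent classifiers take the target's position in the sentence as part of their input. The sketch below uses the case study's keywords (USA, US, America); the function name select_targets and the (sentence, start, end) output format are illustrative assumptions, not the GRU-TSC input API.

```python
import re
from typing import List, Tuple

# Keywords from the case study. Matching is case-sensitive and whole-word,
# so the pronoun "us" and longer words such as "American" are not matched.
KEYWORDS = ("USA", "US", "America")
TARGET_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b")


def select_targets(sentences: List[str]) -> List[Tuple[str, int, int]]:
    """Return (sentence, start, end) for the first keyword hit in each sentence;
    sentences with no hit are skipped."""
    hits = []
    for s in sentences:
        m = TARGET_RE.search(s)
        if m:
            hits.append((s, m.start(), m.end()))
    return hits


# Toy sentences for illustration only.
sample = ["The US imposed sanctions.", "Let us wait.", "Ties with America worsened."]
print(select_targets(sample))
# [('The US imposed sanctions.', 4, 6), ('Ties with America worsened.', 10, 17)]
```

Whether forms such as "American" or "U.S." should also count as the target is a design choice the keyword list would need to settle explicitly.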

Then we can calculate the sentiment score, which we took as the sum over positive and negative sentences of each sentence's class probability, divided by the overall number of sentences. We also calculated the ratio of positive to negative sentences and the total numbers of negatively and positively classified sentences each day.
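One reading of the sentiment score described above is a probability-weighted count in which positive sentences add their class probability and negative sentences subtract it, divided by the total number of target sentences (neutral ones included). The sketch below implements that reading, along with the positive-to-negative ratio and per-day counts; the exact formula used in the project may differ, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Sign applied to each predicted class when building the score.
SIGN = {"positive": 1, "neutral": 0, "negative": -1}


def sentiment_score(preds):
    """preds: list of (label, class_probability) for target sentences.
    Positive sentences add their probability, negative ones subtract it,
    neutral ones add nothing but still count in the denominator."""
    if not preds:
        return 0.0
    return sum(SIGN[label] * prob for label, prob in preds) / len(preds)


def pos_neg_ratio(preds):
    """Positive-to-negative sentence ratio (inf if no negatives, nan if neither)."""
    counts = Counter(label for label, _ in preds)
    if counts["negative"] == 0:
        return float("inf") if counts["positive"] else float("nan")
    return counts["positive"] / counts["negative"]


def daily_counts(rows):
    """rows: list of (date, label); returns {date: Counter of labels}."""
    per_day = defaultdict(Counter)
    for day, label in rows:
        per_day[day][label] += 1
    return dict(per_day)


# Toy predictions for illustration only.
preds = [("negative", 0.9), ("negative", 0.6), ("neutral", 0.8), ("positive", 0.7)]
print(round(sentiment_score(preds), 3))  # -0.2
print(pos_neg_ratio(preds))              # 0.5
```

Weighting by class probability means a confidently negative sentence pulls the score down more than a borderline one, while counting neutral sentences in the denominator keeps the score comparable across days with different volumes of coverage.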

Because we tracked the publisher and date of every single sentence in our database, it was possible to calculate the sentiment score over time and aggregate it by publisher. We can also conduct TBSA within different targeted events or topics. We used LDA on our main data set to find different topics, but in our case study we simply used keywords to identify the articles about the S-400 missiles within our data set; any topic modeling method could be used to compare the change in sentiment within a topic over time. This gives us a wide survey of mass media opinion, in our case the Turkish mass media's opinion of the US, including how it changed by event and how the different publishers reacted.
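Because each prediction carries its publisher and date, the over-time, per-publisher view is a grouped aggregation. The sketch below assumes a hypothetical SQLite table of classifier outputs with a topic column (filled by keyword matching or a topic model) and computes a monthly score per publisher for one topic; the table layout, the topic label s400, and the monthly granularity are assumptions for illustration.

```python
import sqlite3

# Hypothetical table of classifier outputs: one row per target sentence.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions "
    "(publisher TEXT, pub_date TEXT, topic TEXT, label TEXT, prob REAL)"
)
# Toy rows for illustration only, not results from the study.
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?, ?, ?)",
    [
        ("Sabah", "2019-07-03", "s400", "negative", 0.8),
        ("Sabah", "2019-07-20", "s400", "neutral", 0.9),
        ("Sabah", "2019-08-02", "s400", "positive", 0.6),
        ("Hurriyet", "2019-07-10", "s400", "negative", 0.5),
        ("Hurriyet", "2019-07-11", "other", "positive", 0.9),
    ],
)

# Probability-weighted score (positive adds prob, negative subtracts it,
# neutral adds 0), averaged over all sentences in each publisher-month group.
MONTHLY_SCORE_SQL = """
SELECT publisher,
       substr(pub_date, 1, 7) AS month,
       SUM(CASE label WHEN 'positive' THEN prob
                      WHEN 'negative' THEN -prob
                      ELSE 0 END) * 1.0 / COUNT(*) AS score,
       COUNT(*) AS n_sentences
FROM predictions
WHERE topic = ?
GROUP BY publisher, month
ORDER BY publisher, month
"""


def monthly_scores(conn, topic):
    """Return (publisher, 'YYYY-MM', score, n_sentences) rows for one topic."""
    return conn.execute(MONTHLY_SCORE_SQL, (topic,)).fetchall()


for row in monthly_scores(conn, "s400"):
    print(row)
```

Reporting n_sentences next to each score is a deliberate choice: a month with a single matching sentence can swing to an extreme value, so low-count groups should be read with caution.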

We have presented a case study that used the S-400 missile controversy as an example topic: the controversy between the US, NATO, and Turkey over Turkey purchasing S-400 missiles from Russia, which the US felt was dangerous and undermined NATO’s capabilities. We used only these articles because the topic is of particular geopolitical interest, and it gave us a feel for the potential usage of our system in a setting where we were certain what the topic was about.

The results of our application showed that sentiment on the main data set was negative overall, although the mean was less negative than on the S-400 subset of articles. We also found that at the height of the S-400 controversy, when the US imposed sanctions against Turkey for its purchase of the missile system, sentiment was at its most negative and peaked around that time, as expected. The majority of sentences were negative towards the US. This was a finding of interest that AFINN or other traditional methods, which do not take the target into account, may have missed, and it shows the usefulness of our approach for understanding how the mass media views the US, particularly in a media ecosystem that is heavily dominated by government-run or government-influenced news.

The comparison of publishers within the same S-400 subset shows the same extreme negative sentiment: very different publishers with different ideological backgrounds, for the most part, all share this negative sentiment. Only outlets like Euronews and Reuters, which focused on reporting market news, had slightly less negative ratios between positive and negative sentences. It must be noted that all of the outlets publishing in the Turkish language face potential barriers and censorship from the government.

Another benefit of our system is that it allows us to analyze sentiment over time at a finer granularity, for each publisher separately. Even though sentiment was highly negative overall and spiked around the time of the sanctions against Turkey, when you would expect negative sentiment to be highest, we found that the two publishers from Iran experienced much more negative sentiment earlier, around 2019, before the sanctions against Turkey, when Turkey was acquiring the S-400 missile system from Russia. One of these publishers, FARS, has a direct connection to the Revolutionary Guard Corps in Iran and is also ideologically motivated by Islamism. Our finding brings into focus the complex use of anti-Americanism within Turkey and the Turkish-language media ecosystem. Even though the Erdogan regime heavily censors mass media, anti-Americanism is driven by a wide variety of factors, including influence from state actors like Iran inside Turkey, which can also spread misinformation and exacerbate negative sentiment towards actors like the US, even if they have conflicting aims or use anti-Americanism in different ways from, say, opposition parties in Turkey, the Erdogan regime, or the media influenced by the Erdogan regime. Through this case study, we have also demonstrated our ability to track changes like these in the mass media across time and publisher, based on sentence-level analysis.

Further, as of yet we have not seen this type of analysis or system applied anywhere within international relations using the state-of-the-art models for news and mass media that we have applied. Future work could use multiple targets and named entity recognition to create a larger mapping of perceptions of actors within the Turkish mass media or other environments.

Limitations: the model could eventually be improved to further diversify its output and capture more of the implied framings used within the media, making it more multi-dimensional. The methods could also be applied to more topics, possibly comparing those topics using measures of change over time. More validation on the specific dataset would also be valuable.

Thank you for your time.

2 replies on “Investigating Disinformation Through the Lens of Mass Media: A System Design”

Fascinating work, team. Some questions and feedback from a strategic comms person who trains presenters and whose PhD work is focusing on truth and bias in media. 1) From the presentation perspective, remember to deliver your findings in a conversational manner. I can tell you felt a time crunch based on the second half of your presentation. When that happens, speakers need to edit what they share ruthlessly to focus on the main, main, main ideas. Your goal is not to tell them everything, but rather to tell your audience your method to uncover the most important findings and potential future work (helps get more funding 🙂), which leads them to read your paper, which includes more details. Your slides can share details as well. When your presentation speed is fast, it’s harder to follow.
2) I’m wondering if your model could/does also map the sentiment and its effects from a social influencer perspective. Can you build a network visualization to identify actors and their reach? If so, I’d love to collaborate with you to look at this more within a domestic context to uncover truths and lies in media. LET’s COLLABORATE.

3) This is really cool work. You mentioned you hadn’t uncovered anyone else doing this type of thing. Anyone coming close? Where did you get your ideas for methodology?

Keep up the great work. Can’t recall if all of you are first years or just the first speaker. If so, can’t wait to see if/how you take this research farther. Congrats! ~ Tracy Mason, Asst. Dean College of Science tmason11@gmu.edu

Interesting work. I had never thought about how complicated it is to categorize social media posts as negative or positive. Thank you for sharing your work.
