Computer-Assisted Understanding of Stance in Social Media
Formalizations, Data Creation, and Prediction Models
University of Duisburg-Essen
Michael Wojatzki completed his doctorate in January 2019 at the graduate school "User-Centred Social Media" at the University of Duisburg-Essen. Besides his academic career, Michael has worked for several companies in the information industry and software development. His research focuses on automatic, AI-based systems that recognise and analyse opinions shared in social media.
Social Media, Stance
Stance can be defined as positively or negatively evaluating persons, things, or ideas (Du Bois, 2007). Understanding the stance that people express through social media has several applications: It allows governments, companies, or other information seekers to gain insights into how people evaluate their ideas or products. Being aware of the stance of others also enables social media users to engage in discussions more efficiently, which may ultimately lead to better collective decisions.
Since the volume of social media posts is too large to be analyzed manually, computeraided methods for understanding stance are necessary. In this thesis, we study three major aspects of such computer-aided methods: (i) abstract formalizations of stance which we can quantify across multiple social media posts, (ii) the creation of suitable datasets that correspond to a certain formalization, and (iii) stance detection systems that can automatically assign stance labels to social media posts.
We examine four different formalizations that differ in how specific the insights and supported use-cases are: Stance on Single Targets defines stance as a tuple consisting of a single target (e.g. Atheism) and a polarity (e.g. being in favor of the target), Stance on Multiple Targets models a polarity expressed towards an overall target and several logically linked targets, and Stance on Nuanced Targets is defined as a stance towards all texts in a given dataset. Moreover, we study Hateful Stance, which models whether a post expresses hatefulness towards a single target (e.g. women or refugees).
Machine learning-based systems require training data that is annotated with stance labels. Since annotated data is not readily available for every formalization, we create our own datasets. On these datasets, we perform quantitative analyses, which provide insights into how reliable the data is, and into how social media users express stance. Our analysis shows that the reliability of datasets is affected by subjective interpretations and by the frequency with which targets occur. Additionally, we show that the perception of hatefulness correlates with the personal stance of the annotators. We conclude that stance annotations are, to a certain extent, subjective and that future attempts on data creation should account for this subjectivity. We present a novel process for creating datasets that contain subjective stances towards nuanced assertions and which provide comprehensive insights into debates on controversial issues.
To investigate the state-of-the-art of stance detection methods, we organized and participated in relevant shared tasks, and conducted experiments on our own datasets. Across all datasets, we find that comparatively simple methods yield a competitive performance. Furthermore, we find that neuronal approaches are competitive, but not clearly superior to more traditional approaches on text classification. We show that approaches based on judgment similarity – the degree to which texts are judged similarly by a large number of people – outperform reference approaches by a large margin. We conclude that judgment similarity is a promising direction to achieve improvements beyond the state-of-the-art in automatic stance detection and related tasks such as sentiment analysis or argument mining.
The full text of this dissertation is available on OpenD. Online and OpenAccess.Read it now
■doi: 10.17185/duepublico/48043 coisas
Um OpenD zu verbessern legen wir manchmal kleine Dateien – sogenannte Cookies – auf Ihrem Gerät ab. Das ist bei den meisten großen Websites üblich).