Characterizing Online Public Discussions through Patterns of Participant Interactions

New Paper at CSCW 2018 with Justine Zhang, Cristian Danescu-Niculescu-Mizil, and Christy Sauper [link to paper]

Facebook has an incredible number of public (between non-friends) discussions that can grow quite large (# participants and # comments). A hard problem we face is understanding which discussions are going well for users. Imagine doing that for dozens of languages and use cases.

The topic of this paper is a natural extension of my work with George Berry where we study whether seeing higher quality comments causes people to write higher quality comments (it does!). We had a very limited definition of “quality” in that paper that allowed us to focus on the social influence component.

The key idea in this new paper is to encode a discussion with multiple participants as a hypergraph. This is a generalization of earlier reply-tree and sequence models of discussions. We actually ignore text entirely and focus on interactions like reactions/replies.

From this hypergraph, we can extract a number of features and create a high-dimensional vector of discussion characteristics. An important idea that we employ is to count subgraph motifs that stylized types of interactions people can have in discussions.

This high-dimensional representation of discussion structure can be easily embedded in low dimensions. What’s wild is that we can aggregate to the page level, finding certain types of pages are clearly associated with certain discussion structures. (No text/topics are used here!)

Our discussion embeddings allow us to do some neat tricks, like characterizing whether a conversation “style” emerges from the topic itself or its initiator (the latter!). The discussion vectors are also excellent input features for classification tasks, like predicting blocks.

The takeaway here is that the patterns of interaction in comment threads provides incredibly useful signal about what kind of discussion is happening, both in a descriptive and predictive sense. Some dimensions also have clear human interpretations, which we discuss in the paper.

Our hypergraph representation of online public discussions is flexible, extensible, complementary to text representations, and (we think) is more agnostic to language/culture. We also provide code that allows you to apply this method to Reddit data.

Public discussions on social media platforms are an intrinsic part of online information consumption. Characterizing the diverse range of discussions that can arise is crucial for these platforms, as they may seek to organize and curate them. This paper introduces a computational framework to characterize public discussions, relying on a representation that captures a broad set of social patterns which emerge from the interactions between interlocutors, comments and audience reactions.We apply our framework to study public discussions on Facebook at two complementary scales. First, we use it to predict the eventual trajectory of individual discussions, anticipating future antisocial actions (such as participants blocking each other) and forecasting a discussion’s growth. Second, we systematically analyze the variation of discussions across thousands of Facebook sub-communities, revealing subtle differences (and unexpected similarities) in how people interact when discussing online content. We further show that this variation is driven more by participant tendencies than by the content triggering these discussions.