Discourse Network Analysis: Undertaking Literature Reviews in R

Literature reviews are the cornerstone of science. Keeping abreast of developments within any given field of enquiry has become increasingly difficult given the enormous amounts of new research. Databases and search technology have made finding relevant literature easy but, keeping a coherent overview of the discourse within a field of enquiry is an ever more encompassing task.

Scholars have proposed many approaches to analysing literature, which can be placed along a continuum from traditional narrative methods to systematic analytic syntheses of text using machine learning. Traditional reviews are biased because they rely entirely on the interpretation of the researcher. Analytical approaches follow a process that is more like scientific experimentation. These systematic methods are reproducible in the way literature is searched and collated but still rely on subjective interpretation.

Machine learning provides new methods to analyse large swaths of text. Although these methods sound exciting, these methods are incapable of providing insight. Machine learning cannot interpret a text; it can only summarise and structure a corpus. Machine learning still requires human interpretation to make sense of the information.

This article introduces a mixed-method technique for reviewing literature, combining qualitative and quantitative methods. I used this method to analyse literature published by the International Water Association as part of my dissertation into water utility marketing. You can read the code below, or download it from GitHub. Detailed infromation about the methodology is available through FigShare.

A literature review with RQDA

The purpose of this review was to ascertain the relevance of marketing theory to the discourse of literature in water management. This analysis uses a sample of 244 journal abstracts, each of which was coded with the RQDA library. This library provides functionality for qualitative data analysis. RQDA provides a graphical user interface to mark sections of text and assign them to a code, as shown below.

You can load a corpus of text into RQDA and mark each of the texts with a series of codes. The texts and the codes are stored in an SQLite database, which can be easily queried for further analysis.

I used a marketing dictionary to assess the abstracts from journals published by the International Water Association from the perspective of marketing. This phase resulted in a database with 244 abstracts and their associated coding.

Discourse Network Analysis

Once all abstracts are coded, we can start analysing the internal structure of the IWA literature. First, let’s have a look at the occurrence of the topics identified for the corpus of abstracts.

The first lines in this snippet call the tidyverse and RQDA libraries and open the abstracts database. The

function provides a data frame with each of the marked topics and their location.  This function allows us to visualise the occurrence of the topics in the literature.