Discursis

2022-11-10

Discursis is a communication analytics tool for analysing text-based communication data, such as conversations, web forums and training scenarios. It uses natural language processing (NLP) algorithms to process transcribed text automatically, highlighting how participants interact around specific topics over the course of a conversation. Discursis can help practitioners understand the structure, information content, and inter-speaker relationships present in the input data. It also provides quantitative measures of key metrics, such as topic introduction, topic consistency, and topic novelty.

The NLP algorithms construct a matrix of concept similarity scores between the sections into which a text has been divided. In the typical use case, a discourse with several speakers, those sections are speaker turns, and the similarity matrix shows the extent to which any pair of turns share concepts. Combined with the sequential order of the interaction, this information makes it possible to track topics that are maintained, dropped, or dropped and later picked up again. It is also possible to examine the extent to which speakers share concepts. These capabilities have been used to analyse various kinds of interaction, including medical consultations (see references below).
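Discursis's own concept model is not described on this page. Purely to illustrate the idea of a turn-by-turn similarity matrix, the following Python sketch substitutes plain bag-of-words cosine similarity between turns and adds a simple measure of how much different speakers share concepts. The stopword list, the function names (`term_counts`, `similarity_matrix`, `cross_speaker_sharing`) and the sample turns are hypothetical and are not part of the Discursis codebase.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "we", "i", "you", "it", "that", "this", "will", "be"}


def term_counts(turn: str) -> Counter:
    """Lower-cased word counts for one turn, with stopwords removed."""
    words = re.findall(r"[a-z']+", turn.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similarity_matrix(turns: list[str]) -> list[list[float]]:
    """Square matrix whose [i][j] entry scores how much turns i and j share."""
    vectors = [term_counts(t) for t in turns]
    return [[cosine(a, b) for b in vectors] for a in vectors]


def cross_speaker_sharing(speakers: list[str], matrix: list[list[float]]) -> float:
    """Mean similarity over all pairs of turns taken by different speakers."""
    scores = [matrix[i][j]
              for i in range(len(speakers)) for j in range(i)
              if speakers[i] != speakers[j]]
    return sum(scores) / len(scores) if scores else 0.0


# Invented example turns (not taken from the Rudd-Abbott transcript).
turns = [
    "We need to invest in schools and hospitals.",
    "Hospitals are under pressure and the budget matters.",
    "Let us talk about the carbon tax.",
    "Schools deserve more funding.",
]
speakers = ["A", "B", "A", "B"]

matrix = similarity_matrix(turns)
for row in matrix:
    print(" ".join(f"{score:.2f}" for score in row))
print(f"cross-speaker sharing: {cross_speaker_sharing(speakers, matrix):.3f}")
```

Here turn 1 shares "hospitals" with turn 0 and turn 3 shares "schools" with turn 0, so both pairs receive non-zero scores, while the carbon tax turn scores zero against everything else. Exact word matching like this misses related words such as "funding" and "budget", which is the kind of link a concept model is meant to capture.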

Discursis also provides tools for visualising the analysis, and an example is shown below. The graphics are based on a debate between Kevin Rudd and Tony Abbott held at the National Press Club on 11 August 2013. Figure 1 shows a visualisation of the whole debate.

Figure 2 zooms in on a section of the interaction. The boxes on the diagonal represent the speaker turns, and, as Figure 2 shows, hovering the cursor over a box displays the text of that turn.

The off-diagonal boxes in the matrix represent the conceptual similarity between each pair of turns. A heavily populated column means that the topics of a turn recur in many following turns, and a heavily populated row means that a turn shares topics with many preceding turns. Selecting a point of intersection in the matrix displays the similarity score for that pair of turns, and the text of both turns appears below the main graphic (not shown here).
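To make the row and column reading concrete, here is a minimal Python sketch that counts, for each turn, how many later turns (its column) and how many earlier turns (its row) exceed a similarity threshold, and flags later turns that return to a turn's topics after a gap. It assumes the layout described above, in which entry `[i][j]` with `i > j` compares a later turn `i` with an earlier turn `j`; the threshold of 0.1, the function names and the example matrix are hypothetical and do not come from Discursis itself.

```python
def turn_recurrence(matrix: list[list[float]],
                    threshold: float = 0.1) -> tuple[list[int], list[int]]:
    """Return (forward, backward) counts of linked turns for each turn.

    forward[j]: later turns i > j with matrix[i][j] above threshold (column j).
    backward[i]: earlier turns j < i with matrix[i][j] above threshold (row i).
    """
    n = len(matrix)
    forward = [sum(1 for i in range(j + 1, n) if matrix[i][j] > threshold)
               for j in range(n)]
    backward = [sum(1 for j in range(i) if matrix[i][j] > threshold)
                for i in range(n)]
    return forward, backward


def picked_up_later(matrix: list[list[float]], j: int,
                    threshold: float = 0.1) -> list[int]:
    """Later turns that return to turn j's topics after at least one turn that did not."""
    hits = [i for i in range(j + 1, len(matrix)) if matrix[i][j] > threshold]
    picked_up = []
    previous = j
    for i in hits:
        if i > previous + 1:  # gap: an intervening turn did not share with turn j
            picked_up.append(i)
        previous = i
    return picked_up


# Hypothetical similarity scores for four turns; only the lower triangle is read.
example = [
    [1.00, 0.00, 0.00, 0.00],
    [0.20, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.00],
    [0.25, 0.00, 0.00, 1.00],
]

forward, backward = turn_recurrence(example)
print(forward, backward)            # [2, 0, 0, 0] [0, 1, 0, 1]
print(picked_up_later(example, 0))  # [3]
```

Column 0 has two entries above the threshold, so the topics of turn 0 recur in two later turns; turn 2 does not share them, and turn 3 picks them up again, which is the "dropped and then picked up" pattern.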

Discursis was developed by Dan Angus, Janet Wiles and Andrew Smith and has been reworked as an open source tool by staff of the Sydney Informatics Hub. A version of the tool that runs in a Jupyter notebook is available in the GitHub repository at https://github.com/Australian-Text-Analytics-Platform/discursis.

References

Acknowledgments

This Jupyter notebook and the relevant Python scripts were developed by the Sydney Informatics Hub (SIH) in collaboration with the Sydney Corpus Lab under the Australian Text Analytics Platform program and the HASS Research Data Commons and Indigenous Research Capability Program. These projects received investment from the Australian Research Data Commons (ARDC), which is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).

How to cite the notebook:

If you are using this notebook in your research, please include the following statement or an appropriate variation thereof:

This study has utilised a notebook/notebooks developed for the Australian Text Analytics Platform (https://www.atap.edu.au) available at (https://github.com/Australian-Text-Analytics-Platform/discursis).

In addition, please inform ATAP (info@atap.edu.au) of publications and grant applications deriving from the use of any ATAP notebooks in order to support continued funding and development of the platform.