Topic Models in R


5 Dec 2018, 9-12:30 | Oriental Room S204, University of Sydney

View the Project on GitHub Digital-Methods-Sydney/ws-201812

Topic models allow you to find topics and relationships among documents in large collections of texts. There is a growing number of approaches for exploring and making statistical inferences about texts. This session will provide a hands-on introduction to text analysis and topic modelling using stm: An R package for Structural Topic Models. stm is a comprehensive and highly regarded package for preparing, modelling and visualising textual data. The session will be mainly practical, but I will also provide a short theoretical introduction to topic models and present some applications of stm currently found in the literature.

In this session, we will walk the length of a standard pipeline for textual analysis. We will explore different techniques and methods for importing texts and associated metadata into R from a variety of sources such as PDFs, webpages, spreadsheets and APIs. We will prepare the data and clean the text using Hadley Wickham’s “tidy” approach. We will compute document-term matrices and discuss the benefits of different weighting techniques. Finally, we will estimate, evaluate and visualise topic models to facilitate their interpretation and to communicate the results of the analysis.
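To give a feel for the document-term matrix and weighting step mentioned above, here is a minimal sketch in base R. It is not the workshop's own code: the three toy documents are made up for illustration, and it deliberately avoids the tidy tooling and stm so that it runs without installing any packages. The weighting shown is tf-idf (term frequency times the log of inverse document frequency), which is one common choice among several.

```r
# Toy corpus: three short documents, invented for illustration only.
docs <- c(
  d1 = "Topic models find topics in text",
  d2 = "Text analysis in R",
  d3 = "Topic models in R"
)

# Tokenise: lower-case, split on any run of non-letters, drop empty strings.
tokens <- lapply(tolower(docs), function(x) {
  words <- strsplit(x, "[^a-z]+")[[1]]
  words[nzchar(words)]
})

# Vocabulary: every distinct term, sorted alphabetically.
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix of raw counts: one row per document, one column per term.
dtm <- t(vapply(
  tokens,
  function(w) as.numeric(table(factor(w, levels = vocab))),
  numeric(length(vocab))
))
dimnames(dtm) <- list(names(docs), vocab)

# tf-idf weighting.
tf    <- dtm / rowSums(dtm)          # term frequency within each document
df    <- colSums(dtm > 0)            # number of documents containing each term
idf   <- log(nrow(dtm) / df)         # inverse document frequency
tfidf <- sweep(tf, 2, idf, `*`)      # multiply each column by its idf

print(dtm)
print(round(tfidf, 3))
```

In this toy corpus the word "in" occurs in every document, so its inverse document frequency is log(1) = 0 and its tf-idf weight is zero everywhere, whereas a term that occurs in a single document keeps a positive weight. This is the basic motivation for weighting: raw counts favour words that are frequent everywhere, while tf-idf favours words that distinguish documents from one another.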

Requirements for attendance

Suggested readings

Additional readings on text analysis

Data used in the workshop

See here.


9-10.30 Block 1: Preparing your text

10.30-11 Morning tea

11-12.30 Block 2: Model your text


Oriental Room S204, The Quadrangle, University of Sydney


Contacts +61 2 8627 6895