Topic Models in R

Topic models allow you to find topics and relationships among documents in large collections of texts. There is a growing number of approaches for exploring and making statistical inferences about texts. This session will provide a hands-on introduction to text analysis and topic modelling using stm: An R package for Structural Topic Models. stm is a comprehensive and highly regarded package for preparing, modelling and visualising textual data. The session will be mainly practical, but I will also provide a short theoretical introduction to topic models and present some applications of stm currently found in the literature.

In this session, we will walk the length of a standard pipeline for textual analysis. We will explore different techniques and methods to import texts and associated metadata into R from a variety of sources such as PDFs, webpages, spreadsheets and APIs. We will prepare the data and clean the text using Hadley Wickham’s “tidy” approach. We will compute document-term matrices and discuss the benefits of different weighting techniques. Finally, we will estimate, evaluate and visualise topic models to facilitate their interpretation and to communicate the results of the analysis.
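To give a feel for the middle of that pipeline, here is a minimal sketch of building a document-term matrix and applying one common weighting scheme (tf-idf). It deliberately uses only base R so that it runs without any of the workshop packages; it is not the workshop's own script, which follows the tidy approach and stm. The toy corpus, the two-word stop word list and all object names are assumptions made up for illustration.

```r
# Toy corpus (hypothetical data, for illustration only)
docs <- c(
  d1 = "Topic models find topics in texts",
  d2 = "Texts need cleaning before modelling",
  d3 = "Topic models need clean texts"
)

# Tokenise: lower-case, then split on anything that is not a letter
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Drop empty strings and a tiny hand-picked stop word list
# (an assumption, not a standard list)
stop_words <- c("in", "before")
tokens <- lapply(tokens, function(x) x[nzchar(x) & !(x %in% stop_words)])

# Document-term matrix of raw counts:
# one row per document, one column per term
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(x) as.numeric(table(factor(x, levels = vocab))),
  numeric(length(vocab))
))
rownames(dtm) <- names(docs)
colnames(dtm) <- vocab

# tf-idf weighting: term frequency within each document,
# scaled by the log inverse document frequency of each term
tf <- dtm / rowSums(dtm)
idf <- log(nrow(dtm) / colSums(dtm > 0))
tfidf <- sweep(tf, 2, idf, `*`)

print(dtm)
print(round(tfidf, 3))
```

Note that a term appearing in every document, such as "texts" here, receives a tf-idf weight of zero, which is one reason weighting choices matter. Topic models are generally estimated from raw counts such as dtm rather than from tf-idf weights.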

Requirements for attendance

Bring your own laptop;
Preinstall R and RStudio;
Make sure you have installed all the packages listed here;
The day before the session, download the entire repository of session materials from GitHub so that you have the most recent version of the scripts and data. Then uncompress the archive file ws-201812-master.zip and open ws-201812.Rproj in RStudio to load the project.

Additional readings on text analysis

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York, NY: Cambridge University Press. nlp.stanford.edu/IR-book

Data used in the workshop

See here.

Program

9-10.30 Block 1: Preparing your text

10.30-11 Morning tea

11-12.30 Block 2: Modelling your text

Location

Oriental Room S204, The Quadrangle, University of Sydney

Contacts

francesco.bailo@sydney.edu.au

+61 2 8627 6895