Latent Dirichlet Allocation

Introduction

Latent Dirichlet Allocation (LDA) is an unsupervised probabilistic model for topic modelling of unstructured text data. It assumes that each document is a mixture of topics in certain proportions and that each topic is a distribution over words. Both sets of proportions are treated as random variables drawn from Dirichlet priors.
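The generative assumption above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `sample_dirichlet` and `generate_corpus`, the toy vocabulary, and the choice of symmetric priors are all assumptions made for the example. Here `alpha` is taken to be the prior on document-topic proportions and `beta` the prior on topic-word proportions.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_length, vocab, num_topics, alpha, beta, seed=0):
    """Generate toy documents following the LDA generative story (hypothetical helper)."""
    rng = random.Random(seed)
    # Each topic is a distribution over the vocabulary, drawn from Dirichlet(beta).
    topic_word = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    corpus, doc_topic = [], []
    for _ in range(num_docs):
        # Each document has its own topic proportions, drawn from Dirichlet(alpha).
        theta = sample_dirichlet(alpha, num_topics, rng)
        doc_topic.append(theta)
        words = []
        for _ in range(doc_length):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=topic_word[z])[0])
        corpus.append(words)
    return corpus, doc_topic, topic_word


if __name__ == "__main__":
    toy_vocab = ["cat", "dog", "stock", "bond"]  # assumed toy vocabulary
    docs, thetas, phis = generate_corpus(3, 8, toy_vocab, 2, alpha=0.5, beta=0.5)
    for doc, theta in zip(docs, thetas):
        print([round(p, 2) for p in theta], doc)
```

Training LDA runs this story in reverse: given only the documents, it estimates the per-document topic proportions and per-topic word distributions that plausibly generated them.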

Input

Output

Pros

No external dictionary is needed. The process is unsupervised, so no labelled data is required. It is suitable for big data analysis.

Cons

The inputs $\alpha$, $\beta$, the number of topics, and the number of iterations need to be tuned to get better results.
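To make the four tunable inputs concrete, here is a small sketch of collapsed Gibbs sampling, one common way to train LDA (the original paper uses variational inference instead). The function name `gibbs_lda`, the symmetric scalar priors, and the toy documents are assumptions for illustration. `alpha` smooths the document-topic counts and `beta` smooths the topic-word counts.

```python
import random


def gibbs_lda(docs, num_topics, alpha, beta, num_iterations, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]              # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                            # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(num_iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of document-topic and topic-word proportions.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][v] + beta) / (topic_total[t] + vocab_size * beta) for v in range(vocab_size)]
        for t in range(num_topics)
    ]
    return vocab, theta, phi


if __name__ == "__main__":
    toy_docs = [["cat", "dog", "cat"], ["stock", "bond", "stock"], ["dog", "cat"]]
    vocab, theta, phi = gibbs_lda(toy_docs, num_topics=2, alpha=0.1, beta=0.01, num_iterations=50)
    for row in theta:
        print([round(p, 2) for p in row])
```

In this sketch, smaller `alpha` and `beta` values push documents toward fewer topics and topics toward fewer words, while too few iterations leaves the assignments close to their random starting point. None of these settings has a universally good default, which is why they need tuning per corpus.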
(Original Paper) Latent Dirichlet Allocation (algorithm) | AISC Foundational
Serrano -- Latent Dirichlet Allocation (Part 1 of 2)
Serrano -- Training Latent Dirichlet Allocation: Gibbs Sampling (Part 2 of 2)

Created: 30 Oct 2022
Last Modified: 30 Oct 2022