Latent Dirichlet Allocation

Introduction
Input
Output
Pros
Cons
Links and References

Introduction

Latent Dirichlet Allocation (LDA) is an unsupervised probabilistic model for topic modelling of unstructured text data. It assumes that each document is made up of topics in certain proportions and that each topic is made up of words in certain proportions. These proportions are unknown, so they are treated as random variables with Dirichlet priors.
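
The assumption above can be written as a generative process. The following is a sketch under the common smoothed formulation with symmetric priors; the symbols $K$ (number of topics), $\theta_d$ (topic proportions of document $d$), $\phi_k$ (word proportions of topic $k$), $z_{d,n}$ (topic of the $n$-th word in document $d$) and $w_{d,n}$ (the word itself) are notation introduced here for illustration, not taken from this note.

$$
\begin{aligned}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & &\text{for each document } d \\
z_{d,n} &\sim \mathrm{Categorical}(\theta_d), & &\text{for each word position } n \text{ in } d \\
w_{d,n} &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
\end{aligned}
$$

Only the words $w_{d,n}$ are observed; fitting LDA means inferring $\theta$, $\phi$ and $z$ from them.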

Input

$\alpha$ and $\beta$: Dirichlet prior parameters that control how spread out the document-topic and topic-word distributions are.
Number of topics
Number of iterations
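
To show how these inputs fit together, here is a minimal sketch of LDA trained by collapsed Gibbs sampling, using only the Python standard library. It assumes symmetric scalar $\alpha$ and $\beta$ and documents that are already tokenised; the function name `lda_gibbs`, its parameter names and the toy corpus are made up for illustration and are not taken from any library.

```python
import random


def lda_gibbs(docs, num_topics, alpha, beta, num_iterations, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (topics, doc_mix):
    topics[k] maps word -> probability, doc_mix[d][k] is the
    estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                     # tokens assigned to topic k overall

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(num_iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k_old = assignments[d][n]
                # Remove the current token from the counts.
                doc_topic[d][k_old] -= 1
                topic_word[k_old][v] -= 1
                topic_total[k_old] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][n] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][v] += 1
                topic_total[k_new] += 1

    # Smoothed estimates taken from the final counts.
    topics = [
        {vocab[v]: (topic_word[k][v] + beta) / (topic_total[k] + V * beta)
         for v in range(V)}
        for k in range(num_topics)
    ]
    doc_mix = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return topics, doc_mix


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["cat", "dog", "pet", "dog"],
    ["dog", "pet", "cat"],
    ["stock", "market", "trade"],
    ["market", "stock", "price", "trade"],
]
topics, doc_mix = lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01,
                            num_iterations=200)
```

Smaller values of `alpha` push each document towards fewer topics, and smaller values of `beta` push each topic towards fewer words. Production libraries typically use faster inference than this per-token loop.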

Output

Topics and word constituents
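
In practice a topic is usually read off as its highest-probability words. The sketch below assumes each topic is available as a mapping from word to probability; the helper name `top_words` and the example probabilities are hypothetical and serve only to show the shape of the output.

```python
def top_words(topic, n=3):
    """Return the n most probable words of one topic (word -> probability)."""
    return sorted(topic, key=topic.get, reverse=True)[:n]


# Hypothetical fitted topics, hand-written for illustration.
example_topics = [
    {"dog": 0.40, "cat": 0.30, "pet": 0.20, "stock": 0.06, "market": 0.04},
    {"stock": 0.45, "market": 0.35, "trade": 0.15, "dog": 0.03, "cat": 0.02},
]

summaries = [top_words(t) for t in example_topics]
# summaries == [["dog", "cat", "pet"], ["stock", "market", "trade"]]
```

LDA does not name the topics; a label such as "pets" or "finance" is something a reader assigns after looking at these word lists.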

Pros

No external dictionary is needed. The process is unsupervised, so no labelled data is required. It is well suited to analysing large text collections.

Cons

The inputs $\alpha$, $\beta$, the number of topics and the number of iterations all need to be tuned to get better results.

Links and References

(Original Paper) Latent Dirichlet Allocation (algorithm) | AISC Foundational
Serrano -- Latent Dirichlet Allocation (Part 1 of 2)
Serrano -- Training Latent Dirichlet Allocation: Gibbs Sampling (Part 2 of 2)

Created: 30 Oct 2022
Last Modified: 30 Oct 2022