Latent Dirichlet Allocation
Introduction
Latent Dirichlet Allocation (LDA) is an unsupervised probabilistic model for topic modelling of unstructured text data. It assumes that each document is a mixture of topics in certain proportions and that each topic is a mixture of words. These proportions are treated as random variables with Dirichlet priors.
Input
- $\alpha$ and $\beta$: Dirichlet prior parameters that control how spread out the document-topic and topic-word distributions are.
- Number of topics
- Number of iterations
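To illustrate where these inputs enter the model, here is a minimal sketch of the generative story LDA assumes, using only the Python standard library. The helper names (`sample_dirichlet`, `generate_document`), the symmetric priors, and the toy sizes are assumptions made for illustration; they are not taken from the original paper or from any particular library.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, topic_word, length, rng):
    """Generate one document (a list of word ids) under the LDA assumptions."""
    num_topics = len(topic_word)
    # Document-topic proportions, drawn from a Dirichlet(alpha) prior.
    theta = sample_dirichlet(alpha, num_topics, rng)
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        z = rng.choices(range(num_topics), weights=theta)[0]
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        words.append(w)
    return theta, words


rng = random.Random(0)
num_topics, vocab_size = 3, 8   # hypothetical toy sizes
alpha, beta = 0.5, 0.5          # hypothetical prior values

# Topic-word distributions, each drawn from a Dirichlet(beta) prior.
topic_word = [sample_dirichlet(beta, vocab_size, rng) for _ in range(num_topics)]
theta, words = generate_document(alpha, topic_word, 20, rng)
```

Smaller values of `alpha` tend to concentrate each document on a few topics, and smaller values of `beta` tend to concentrate each topic on a few words; larger values spread the mass more evenly.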
Output
- Topics and word constituents
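The mapping from the inputs above to this output can be sketched with a small collapsed Gibbs sampler. This is a hedged illustration, not a reference implementation: the note does not say which inference method is used, so Gibbs sampling is an assumption here (it is the method covered in the second Serrano video listed under the references). The function names `lda_gibbs` and `top_words`, the toy vocabulary, and the parameter values are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha, beta, num_iterations, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n_kw
    topic_total = [0] * num_topics                               # n_k
    assignments = []

    # Assign a random initial topic to every token and fill the count tables.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(num_iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional for each topic t:
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -counts[w])[:n]]
        for counts in topic_word
    ]


# Hypothetical toy corpus: each document is a list of indices into vocab.
vocab = ["ball", "goal", "team", "vote", "law", "party"]
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 4, 3, 5]]
topic_word, doc_topic = lda_gibbs(
    docs, num_topics=2, vocab_size=len(vocab),
    alpha=0.1, beta=0.01, num_iterations=200,
)
topics = top_words(topic_word, vocab)
```

Here `topics` holds the most frequent words per topic, and `doc_topic` holds the per-document topic counts from the final sweep. A practical implementation would typically average over several samples after a burn-in period instead of reading off a single final state.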
Pros
No external dictionary is needed. The process is unsupervised, so no labelled data is required. It is good for big data analysis.
Cons
The inputs $\alpha$, $\beta$, number of topics and number of iterations need to be tuned to get better results.
Links and References
(Original Paper) Latent Dirichlet Allocation (algorithm) | AISC Foundational
Serrano -- Latent Dirichlet Allocation (Part 1 of 2)
Serrano -- Training Latent Dirichlet Allocation: Gibbs Sampling (Part 2 of 2)
Created: 30 Oct 2022
Last Modified: 30 Oct 2022