Latent Dirichlet Allocation
Introduction
Latent Dirichlet Allocation (LDA) is an unsupervised probabilistic model for topic modelling of unstructured text data. It assumes that each document is a mixture of topics in certain proportions, and that each topic is in turn a distribution over words. Both sets of proportions are treated as random quantities with Dirichlet priors.
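The generative assumption described above can be sketched in a few lines of standard-library Python. This is only an illustration: the toy vocabulary, the function names (`sample_dirichlet`, `generate_corpus`) and the parameter values are assumptions made for this example, and symmetric Dirichlet priors are assumed for simplicity.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalised Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(vocab, num_topics, num_docs, doc_length, alpha, beta, seed=0):
    """Generate toy documents following the LDA generative assumption."""
    rng = random.Random(seed)
    # Each topic is a distribution over words, drawn from Dirichlet(beta).
    topic_word = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Each document is a mixture of topics, drawn from Dirichlet(alpha).
        doc_topic = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(doc_length):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(num_topics), weights=doc_topic)[0]
            words.append(rng.choices(vocab, weights=topic_word[z])[0])
        corpus.append(words)
    return corpus, topic_word

# Hypothetical toy vocabulary, chosen only for illustration.
vocab = ["goal", "match", "team", "vote", "party", "election"]
corpus, topic_word = generate_corpus(
    vocab, num_topics=2, num_docs=3, doc_length=8, alpha=0.5, beta=0.5
)
for doc in corpus:
    print(" ".join(doc))
```

Fitting LDA amounts to running this story in reverse: given only the documents, infer the topic mixtures and the word distributions that plausibly produced them.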
Inputs
- $\alpha$ and $\beta$: Dirichlet parameters controlling the spread of the document-topic and topic-word distributions, respectively.
- Number of topics
- Number of iterations
Outputs
- Topics and their constituent words
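To show how these quantities fit together, here is a minimal sketch of a collapsed Gibbs sampler, one common way to fit LDA (the second Serrano video in the references covers Gibbs sampling). It is a toy under stated assumptions, not a reference implementation: the function names `lda_gibbs` and `top_words`, the example documents and the parameter values are all made up for illustration, and the priors are assumed symmetric.

```python
import random

def lda_gibbs(docs, num_topics, alpha, beta, num_iterations, seed=0):
    """Toy collapsed Gibbs sampler; returns (topic-word counts, vocabulary)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                     # tokens assigned to topic k overall
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(num_iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return topic_word, vocab

def top_words(topic_word, vocab, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]

# Hypothetical toy documents, already tokenised.
docs = [
    ["goal", "match", "team", "goal"],
    ["team", "match", "goal", "team"],
    ["vote", "party", "election", "vote"],
    ["election", "party", "vote", "party"],
]
topic_word, vocab = lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, num_iterations=200)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

The four inputs map directly onto the function arguments, and the output is the per-topic word list. On a corpus this small the result depends heavily on the random seed, so the printed topics should be read as illustrative only.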
Pros
No external dictionary is needed. The process is unsupervised, so no labelled data is needed. It is well suited to analysing large text collections.
Cons
The inputs $\alpha$, $\beta$, the number of topics and the number of iterations all need to be tuned to get better results.
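As a small illustration of why the Dirichlet parameters matter, the sketch below compares topic mixtures drawn with a small and a large $\alpha$. The helper names and the specific values (5 topics, $\alpha = 0.1$ versus $\alpha = 10$) are assumptions chosen for the example; a symmetric prior is assumed.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalised Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_largest_share(alpha, num_topics=5, num_samples=2000, seed=0):
    """Average weight of the dominant topic over many sampled topic mixtures."""
    rng = random.Random(seed)
    total = sum(max(sample_dirichlet(alpha, num_topics, rng)) for _ in range(num_samples))
    return total / num_samples

sparse = mean_largest_share(alpha=0.1)   # small alpha: one topic tends to dominate
flat = mean_largest_share(alpha=10.0)    # large alpha: topics are mixed more evenly
print(f"alpha=0.1  -> mean largest topic share {sparse:.2f}")
print(f"alpha=10.0 -> mean largest topic share {flat:.2f}")
```

A small $\alpha$ pushes each document towards a single dominant topic, while a large $\alpha$ spreads documents across many topics; $\beta$ plays the same role for words within a topic. Which setting works better depends on the corpus, which is why these values have to be tuned.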
Links and References
- (Original Paper) Latent Dirichlet Allocation (algorithm) | AISC Foundational
- Serrano -- Latent Dirichlet Allocation (Part 1 of 2)
- Serrano -- Training Latent Dirichlet Allocation: Gibbs Sampling (Part 2 of 2)
Created: 30 Oct 2022
Last Modified: 30 Oct 2022