The Meaning of the Parameters in Latent Dirichlet Allocation

A Zinger
Nov 28, 2020

Latent Dirichlet Allocation (LDA) is a popular method for topic modelling. It is now used well beyond natural language processing, for example in processing financial data. The original paper we refer to is Blei et al. (2003)¹. The implementation we use is Gensim's LdaModel². Here is how we understand the link between Gensim's LdaModel and the parameters in the original paper.

Figure 1. Graphical model representation of the smoothed LDA model.

Gensim's LdaModel implements the smoothed LDA model described in Section 5.4 of Blei et al. (2003)¹. Figure 1 shows the model in plate notation (it is Figure 7 in the paper). What do these parameters mean?

In short:

  • α: the Dirichlet prior from which each document's topic probabilities are drawn
  • θ: the topic probabilities of a document
  • η: the Dirichlet prior from which each topic's word probabilities are drawn
  • β: the word probabilities of each topic
  • k: the number of topics
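
To make the roles of these symbols concrete, here is a minimal NumPy sketch of the generative process of the smoothed model for a single document. The vocabulary size, document length, and prior values are made-up illustration values, not anything taken from the paper or from Gensim.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 3            # number of topics
vocab_size = 8   # hypothetical vocabulary size
doc_length = 10  # hypothetical number of words in the document

alpha = np.full(k, 0.5)          # symmetric Dirichlet prior over topics (assumed value)
eta = np.full(vocab_size, 0.5)   # symmetric Dirichlet prior over words (assumed value)

# beta: one word distribution per topic, each row drawn from Dirichlet(eta)
beta = rng.dirichlet(eta, size=k)   # shape (k, vocab_size)

# theta: the topic distribution of one document, drawn from Dirichlet(alpha)
theta = rng.dirichlet(alpha)        # shape (k,)

# For each word position, pick a topic from theta, then a word id from that topic's row of beta
z = rng.choice(k, size=doc_length, p=theta)
words = np.array([rng.choice(vocab_size, p=beta[topic]) for topic in z])

print("theta:", theta.round(3))
print("topic assignments:", z)
print("word ids:", words)
```

Training an LDA model runs this process in reverse: given only the observed words, it estimates θ and β.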

The Gensim documentation specifies a number of parameters. We list the important ones here.

  • num_topics (k): The number of requested latent topics to be extracted from the training corpus.
  • alpha (α): a-priori belief about each topic's probability.
  • eta (η): a-priori belief on word probability.

The following methods return the learned probabilities.

  • get_document_topics (θ): get the topic distribution for the given document.
  • get_topics (β): get the term-topic matrix learned during inference.

Hope this helps.

[1] Blei, D. M., Ng, A. Y., Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993-1022.

[2] GENSIM. models.ldamodel – Latent Dirichlet Allocation. https://radimrehurek.com/gensim/models/ldamodel.html.
