Latent Dirichlet Reallocation for Term Swapping


This paper is an extended abstract of a work in progress that proposes latent Dirichlet reallocation (LDR), a probabilistic model for text data from different dialects over a shared vocabulary. LDR first uses a topic model to allocate word probabilities to vocabulary terms; it then uses a subtopic model to allow probability to be reallocated among a few terms that are potentially swappable between dialects. We derive an MCMC inference procedure that combines Gibbs sampling with Hamiltonian Monte Carlo. Finally, we use a toy example to demonstrate that LDR correctly switches the probabilities of swappable terms under the subtopics.
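The notion of "reallocating" probability between swappable terms can be pictured with a small sketch. This is not the LDR model or its subtopic parameterization, which the abstract does not specify. It assumes, purely for illustration, that each dialect has a hypothetical split fraction `rho` for every pair of swappable terms, and that the pair's pooled probability under a shared topic is divided according to that fraction. The function name `reallocate`, the pair list, and the example terms are all hypothetical.

```python
# Illustrative sketch only: NOT the LDR model from the paper.
# Assumption (hypothetical): each dialect has a split fraction `rho` per pair of
# swappable terms, and the pair's pooled topic probability is divided by it.

def reallocate(phi, pairs, rho):
    """Return a dialect-specific copy of the topic-word distribution `phi`.

    phi   : dict mapping term -> probability (assumed to sum to 1)
    pairs : list of disjoint (term_a, term_b) swappable pairs (hypothetical)
    rho   : dict mapping (term_a, term_b) -> fraction in [0, 1] of the pair's
            pooled mass given to term_a in this dialect (hypothetical)
    """
    seen = set()
    out = dict(phi)
    for a, b in pairs:
        if a in seen or b in seen:
            raise ValueError("swappable pairs must be disjoint")
        seen.update((a, b))
        r = rho[(a, b)]
        if not 0.0 <= r <= 1.0:
            raise ValueError("split fraction must lie in [0, 1]")
        pooled = phi[a] + phi[b]       # total mass shared by the pair
        out[a] = r * pooled            # dialect-specific share for term_a
        out[b] = (1.0 - r) * pooled    # remainder goes to term_b
    return out                         # other terms are left unchanged


# Toy usage: one shared topic, two dialects that prefer different terms.
phi = {"elevator": 0.3, "lift": 0.1, "floor": 0.4, "building": 0.2}
pairs = [("elevator", "lift")]
us = reallocate(phi, pairs, {("elevator", "lift"): 0.9})
uk = reallocate(phi, pairs, {("elevator", "lift"): 0.1})

for name, dist in (("dialect A", us), ("dialect B", uk)):
    print(name, {t: round(p, 3) for t, p in dist.items()})
```

Because only the pooled mass of each pair is redistributed, every dialect-specific distribution still sums to one, and terms outside the swappable pairs keep their shared topic probability.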

