Publication date:
2022
Abstract:
Topic modeling algorithms help unveil the latent thematic structure of large document collections. Previous work has shown that traditional approaches can be less effective on short texts such as tweets; this limitation can be mitigated by assuming that each document is about a single topic, as done in Twitter-LDA. In this work, we relax this assumption and propose a new model in which a document can be about a single topic or multiple topics. Our model allows the generation of diverse types of descriptors from latent topics, e.g., words and hashtags, similarly to Hashtag-LDA. Moreover, words and hashtags can be generated either from topics or from a background (global) distribution. The proposed model is modular, and our goal is to tailor it to collections that are heterogeneous both in their mix of single-topic and multiple-topic documents and in their use of diverse topic representations.
CRIS type:
04.01 - Contribution in conference proceedings
Keywords:
Heterogeneous Text Topic Modeling; Text Mining; Topic Modeling; Topic Modeling for Microblogs
Authors:
Toto, G.; Di Buccio, E.
Book title:
CEUR Workshop Proceedings
Published in: