A Modular Approach to Topic Modeling for Heterogeneous Documents

Conference proceedings contribution
Publication date:
2022
Abstract:
Topic Modeling algorithms help unveil the latent thematic structure from large document collections. Previous works showed that traditional approaches could be less effective when applied to short texts, e.g., tweets; however, that can be mitigated by assuming that each document is about a single topic, as done in Twitter-LDA. In this work, we relax this assumption and propose a new model where a document can be about single or multiple topics. Our model allows the generation of diverse types of descriptors from latent topics, e.g., words and hashtags, similarly to Hashtag-LDA. Moreover, words/hashtags can be generated from topics or a background/global distribution. The proposed model is modular, and our goal is to tailor it to collections that can be heterogeneous both in the presence of single or multiple-topic documents and in the adoption of diverse topic representations.
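
The generative story outlined in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' model: the toy distributions, the function names `draw` and `generate_document`, and the switch probabilities `p_single`, `p_background`, and `p_hashtag` are all hypothetical, and multi-topic documents pick topics uniformly here instead of using per-document topic proportions.

```python
import random

# Toy, hand-picked distributions (hypothetical; not taken from the paper).
TOPICS = {
    "sports": {"word": {"match": 0.6, "goal": 0.4}, "hashtag": {"#football": 1.0}},
    "politics": {"word": {"vote": 0.5, "law": 0.5}, "hashtag": {"#election": 1.0}},
}
BACKGROUND = {"word": {"the": 0.7, "today": 0.3}, "hashtag": {"#news": 1.0}}


def draw(rng, dist):
    """Sample one key from an {item: probability} dictionary."""
    items = list(dist)
    return rng.choices(items, weights=[dist[i] for i in items], k=1)[0]


def generate_document(rng, n_tokens, p_single=0.5, p_background=0.2, p_hashtag=0.3):
    """Return (is_single_topic, tokens); each token is (kind, source, text)."""
    topic_names = list(TOPICS)
    single = rng.random() < p_single
    if single:
        # Single-topic document: one topic is shared by every token.
        doc_topics = [rng.choice(topic_names)]
    else:
        # Multi-topic document: each token may come from any topic
        # (uniform choice is a simplification made for this sketch).
        doc_topics = topic_names

    tokens = []
    for _ in range(n_tokens):
        # Choose the descriptor type: plain word or hashtag.
        kind = "hashtag" if rng.random() < p_hashtag else "word"
        if rng.random() < p_background:
            # Token generated by the background (global) distribution.
            source = "background"
            text = draw(rng, BACKGROUND[kind])
        else:
            # Token generated by one of the document's topics.
            source = rng.choice(doc_topics)
            text = draw(rng, TOPICS[source][kind])
        tokens.append((kind, source, text))
    return single, tokens


if __name__ == "__main__":
    rng = random.Random(0)
    is_single, doc = generate_document(rng, n_tokens=8)
    print("single-topic:", is_single)
    for kind, source, text in doc:
        print(f"{kind:8s} {source:10s} {text}")
```

Setting `p_single=1.0` mimics the one-topic-per-document assumption mentioned for Twitter-LDA, while `p_single=0.0` makes every document multi-topic; intermediate values give a mixed collection of the kind the abstract describes.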
CRIS type:
04.01 - Conference proceedings contribution
Keywords:
Heterogeneous Text Topic Modeling; Text Mining; Topic Modeling; Topic Modeling for Microblogs
Author list:
Toto, G.; Di Buccio, E.
University authors:
DI BUCCIO EMANUELE
Link to the full record:
https://www.research.unipd.it/handle/11577/3457706
Link to the full text:
https://www.research.unipd.it//retrieve/handle/11577/3457706/1011575/paper1.pdf
Book title:
CEUR Workshop Proceedings
Published in:
CEUR WORKSHOP PROCEEDINGS
Journal
CEUR WORKSHOP PROCEEDINGS
Series