    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it).

    A Modular Approach to Topic Modeling for Heterogeneous Documents

    Topic modeling algorithms help unveil the latent thematic structure of large document collections. Previous work showed that traditional approaches can be less effective when applied to short texts, e.g., tweets; however, this can be mitigated by assuming that each document is about a single topic, as done in Twitter-LDA. In this work, we relax this assumption and propose a new model in which a document can be about a single topic or multiple topics. Similarly to Hashtag-LDA, our model allows latent topics to generate diverse types of descriptors, e.g., words and hashtags. Moreover, words and hashtags can be generated either from topics or from a background (global) distribution. The proposed model is modular, and our goal is to tailor it to collections that may be heterogeneous both in their mix of single- and multiple-topic documents and in their use of diverse topic representations.
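    To make the generative story concrete, here is a minimal Python sketch of how such a document could be generated. It is an illustration under our own assumptions, not the authors' model: the single-topic probability `p_single`, the fixed background probability `p_background`, the symmetric Dirichlet prior `alpha`, the vocabulary sizes, and all function names are hypothetical, and inference (e.g., Gibbs sampling) is not shown.

```python
import random

def dirichlet(alpha, k, rng):
    """Symmetric Dirichlet draw, obtained by normalizing k Gamma(alpha, 1) samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def categorical(probs, rng):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topics, background, n_tokens, rng,
                      p_single=0.5, p_background=0.2, alpha=0.5):
    """Generate one document as {descriptor_type: [token ids]}.

    Hypothetical sketch: topics[k][dtype] and background[dtype] are distributions
    over the vocabulary of a descriptor type (here "word" and "hashtag");
    n_tokens[dtype] is the number of tokens of that type to emit.
    """
    theta = dirichlet(alpha, len(topics), rng)   # per-document topic mixture
    single = rng.random() < p_single             # single- or multiple-topic document?
    doc_topic = categorical(theta, rng)          # shared by all tokens if single
    doc = {}
    for dtype, n in n_tokens.items():
        tokens = []
        for _ in range(n):
            if rng.random() < p_background:      # switch: background vs. topic
                dist = background[dtype]
            else:
                z = doc_topic if single else categorical(theta, rng)
                dist = topics[z][dtype]
            tokens.append(categorical(dist, rng))
        doc[dtype] = tokens
    return doc, single

# Toy setup: 3 topics, 50 words, 10 hashtags (all sizes are arbitrary).
rng = random.Random(0)
topics = [{"word": dirichlet(0.5, 50, rng), "hashtag": dirichlet(0.5, 10, rng)}
          for _ in range(3)]
background = {"word": dirichlet(0.5, 50, rng), "hashtag": dirichlet(0.5, 10, rng)}
doc, single = generate_document(topics, background, {"word": 12, "hashtag": 2}, rng)
```

    Each ingredient (the single/multiple-topic switch, the background switch, the set of descriptor types) is a separate step in the sketch, so any of them can be dropped or replaced independently.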

    Sequential modeling in vector space

    In Information Retrieval and Natural Language Processing, the representation of discrete objects, e.g., words, usually relies on embeddings in a vector space; such representations typically ignore sequential information. One instance of such sequential information is temporal evolution: for example, when the discrete objects are words, their meaning may change smoothly over time. For this reason, previous work proposed dynamic word embeddings to explicitly model this sequential information in word representations. This paper introduces a representation that relies on sinusoidal functions to capture the sequential order of discrete objects in vector space.
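    To give a concrete sense of how sinusoidal functions can encode order, the sketch below uses the well-known positional encoding of the Transformer architecture as a stand-in; the representation proposed in the paper may differ. The function names, the dimension `d = 8`, and the base `10000` are assumptions. A useful property is that the dot product of two encodings depends only on the offset between the positions, because sin(a)sin(b) + cos(a)cos(b) = cos(a - b).

```python
import math

def sinusoidal_encoding(pos, d=8, base=10000.0):
    """Map an integer position to a d-dimensional vector of sines and cosines.

    Assumed Transformer-style scheme: dimension pair (2i, 2i+1) uses the
    frequency w_i = base ** (-2i / d).
    """
    vec = []
    for i in range(d // 2):
        w = base ** (-2 * i / d)
        vec.append(math.sin(pos * w))
        vec.append(math.cos(pos * w))
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# dot(enc(p), enc(q)) = sum_i cos((p - q) * w_i), so equal offsets give equal scores.
near_a = dot(sinusoidal_encoding(3), sinusoidal_encoding(5))
near_b = dot(sinusoidal_encoding(10), sinusoidal_encoding(12))
```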

    University of Padova @ DIACR-Ita

    Semantic change detection in a relatively low-resource language like Italian is challenging. Using contextualized word embeddings, we formalize the task as computing a distance between two variable-size sets of vectors. We use various distance metrics, such as the average Euclidean distance, the average Canberra distance, and the Hausdorff distance, as well as the Jensen-Shannon divergence between cluster distributions obtained with K-means clustering and Gaussian mixture models. The final prediction is given by an ensemble of the top-ranked words according to each distance metric. The proposed method achieved better performance than the frequency-based and collocation-based baselines.
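    The set-to-set distances listed above can be sketched in plain Python. The function names are hypothetical, and we assume that an "average" distance is the mean over all cross pairs between the two sets, which the abstract does not specify. Cluster labels are taken as given (in practice they would come from K-means or a Gaussian mixture fitted on the pooled vectors), and the ensemble step is not shown.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def canberra(u, v):
    """Canberra distance; coordinates where both values are 0 contribute 0."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(u, v) if abs(a) + abs(b) > 0)

def average_distance(X, Y, dist):
    """Mean distance over all cross pairs (x, y), x in X, y in Y (assumed definition)."""
    return sum(dist(x, y) for x in X for y in Y) / (len(X) * len(Y))

def hausdorff(X, Y, dist=euclidean):
    """Symmetric Hausdorff distance between two finite sets of vectors."""
    d_xy = max(min(dist(x, y) for y in Y) for x in X)
    d_yx = max(min(dist(x, y) for x in X) for y in Y)
    return max(d_xy, d_yx)

def cluster_distribution(labels, n_clusters):
    """Share of one period's usages assigned to each cluster."""
    counts = [0] * n_clusters
    for c in labels:
        counts[c] += 1
    return [c / len(labels) for c in counts]

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        # Terms with a zero in the first distribution contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy 2-d vectors for one word in two periods (real ones come from a contextual model).
X = [[0.0, 0.0], [1.0, 0.0]]
Y = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0]]
scores = {
    "avg_euclidean": average_distance(X, Y, euclidean),
    "avg_canberra": average_distance(X, Y, canberra),
    "hausdorff": hausdorff(X, Y),
    # Hypothetical cluster labels for each period's usages (2 clusters).
    "jsd": jensen_shannon(cluster_distribution([0, 0], 2),
                          cluster_distribution([0, 1, 1], 2)),
}
```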

    Efficient Parameter Estimation for Information Retrieval Using Black-Box Optimization

    Interactive analysis and exploration of experimental evaluation results

    This paper proposes a methodology based on discounted cumulated gain measures and visual analytics techniques to improve the analysis and understanding of IR experimental evaluation results. The methodology is designed to let researchers and developers interact naturally and effectively with the experimental data, and it is demonstrated through an innovative application developed for the Apple iPad. Copyright © 2011 for the individual papers by the papers' authors.
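    As a reminder of what discounted cumulated gain measures compute, here is a short Python sketch. We assume the original formulation by Järvelin and Kekäläinen, with log base `b = 2` and no discount for ranks below `b`; the paper may use a different variant, and the function names are hypothetical. The normalized vector divides the run's DCG, rank by rank, by the DCG of an ideal run built from all judged gains for the topic sorted in decreasing order.

```python
import math

def dcg(gains, b=2):
    """Discounted cumulated gain vector, one value per rank.

    gains[i] is the relevance gain of the document at rank i + 1; gains at
    ranks below the log base b are added without discount.
    """
    vector, total = [], 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank < b else gain / math.log(rank, b)
        vector.append(total)
    return vector

def ideal_dcg(judged_gains, length, b=2):
    """DCG of the ideal run: all judged gains sorted best-first, padded with zeros."""
    best = sorted(judged_gains, reverse=True)[:length]
    best += [0] * (length - len(best))
    return dcg(best, b)

def ndcg(run_gains, judged_gains, b=2):
    """Normalized DCG vector: run DCG divided rank by rank by the ideal DCG."""
    ideal = ideal_dcg(judged_gains, len(run_gains), b)
    return [d / i if i > 0 else 0.0 for d, i in zip(dcg(run_gains, b), ideal)]

# Toy example: graded gains (0-3) of a five-document run and of all judged documents.
run = [3, 2, 3, 0, 1]
judged = [3, 3, 2, 1]
run_dcg = dcg(run)
run_ndcg = ndcg(run, judged)
```

    Working with whole vectors rather than a single cutoff value is what makes these measures convenient to plot and explore rank by rank.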