30 research outputs found

    Towards improving WEBSOM with multi-word expressions

    Dissertation submitted for the degree of Master in Computer Engineering (Mestre em Engenharia Informática).
    Large collections of free-text documents are usually rich in information and cover several topics. However, because these collections are very large, searching and filtering them is an exhausting task. A large text collection covers a set of topics, where each topic is associated with a group of documents. This thesis presents a method for building a document map of the core contents covered in the collection. WEBSOM is an approach that combines document encoding methods and Self-Organising Maps (SOM) to generate a document map. However, this methodology has a weakness in its document encoding method, because it uses single words to characterise documents. Single words tend to be ambiguous and semantically vague, so some documents can be incorrectly related. This thesis proposes a new document encoding method that improves the WEBSOM approach by using multi-word expressions (MWEs) to describe documents. Previous research and ongoing experiments encourage us to use MWEs to characterise documents because they are semantically more accurate and more descriptive than single words.
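To make the ambiguity argument concrete, here is a minimal sketch of the difference between single-word and multi-word encodings. It is not the thesis's method: the MWE extractor is replaced by plain adjacent-word pairs (an assumption), the SOM step is omitted, and the function names and the two sample documents are hypothetical.

```python
import math
from collections import Counter

def word_features(text):
    """Single-word features: lowercase tokens split on whitespace."""
    return Counter(text.lower().split())

def mwe_features(text, n=2):
    """Stand-in for MWE features (assumption): every run of n adjacent tokens."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two hypothetical documents on unrelated topics that share the word "bank".
doc_finance = "central bank raises interest rates"
doc_river = "river bank erosion after floods"

sim_words = cosine(word_features(doc_finance), word_features(doc_river))
sim_mwes = cosine(mwe_features(doc_finance), mwe_features(doc_river))
```

With single words the two documents look partly related because of the shared token "bank"; with the two-word features they share nothing, so the spurious similarity disappears.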

    Incremental context creation and its effects on semantic query precision

    Proceedings of the 4th International Conference on Semantic and Digital Media Technologies, SAMT 2009, Graz, Austria, December 2-4, 2009. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-10543-2_19
    We briefly describe the results of an experimental study on the incremental creation of context out of the results of targeted queries, and discuss the increase in retrieval precision that results from the incremental enrichment of context.
    This work was supported in part by Consejería de Educación, Comunidad Autónoma de Madrid, under the grant CCG08-UAM/TIC/4303, Búsqueda basada en contexto como alternativa semántica al modelo ontológico. Simone Santini was in part supported by the Ramón y Cajal initiative of the Ministerio de Educación y Ciencia. Alexandra Dumitrescu was in part supported by the European Social Fund, Universidad Autónoma de Madrid.

    Combining SOMs and Ontologies for Effective Web Site Mining


    Using the reader's context to customize news streams

    This is an electronic version of the paper presented at the Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, held in A Coruña in 2011.
    Many people today subscribe to streaming news services, and their number can be expected to grow together with the number of web sites that offer a streaming service using protocols such as Really Simple Syndication. In this paper we present a method, and the related algorithms, for ranking news by relevance based on the working context of the reader. We use the contents of the reader's computer as an indication of his or her interests, and build from them a suitable context representation. We then use it to filter the incoming stream of news.
    The authors were supported in part by the Ministerio de Educación y Ciencia under the grant N. MEC TIN2008-06566-C04-02, Information Retrieval on different media based on multidimensional models: relevance, novelty, personalization and context.
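The abstract does not say how the context is represented, so the following is only an illustrative sketch under simple assumptions: the context is a table of normalised term frequencies over the reader's local documents, and a news item is scored by the average context weight of its words. The function names, sample texts, and threshold are all hypothetical.

```python
from collections import Counter

def build_context(documents):
    """Normalised term frequencies over the reader's local documents."""
    counts = Counter(w for d in documents for w in d.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def relevance(item, context):
    """Average context weight of the words in a news item."""
    words = item.lower().split()
    if not words:
        return 0.0
    return sum(context.get(w, 0.0) for w in words) / len(words)

def filter_stream(items, context, threshold):
    """Keep items scoring at least `threshold`, most relevant first."""
    scored = [(relevance(it, context), it) for it in items]
    return [it for score, it in sorted(scored, reverse=True) if score >= threshold]

# Hypothetical local documents and incoming news headlines.
local_docs = ["self organising map training", "document map clustering"]
stream = ["new map clustering library released", "football cup final tonight"]

context = build_context(local_docs)
kept = filter_stream(stream, context, threshold=0.05)
```

Only the headline that shares vocabulary with the local documents passes the threshold; the unrelated one scores zero and is dropped.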

    A visual analytics platform for competitive intelligence

    Silva, D., & Bação, F. (2023). MapIntel: A visual analytics platform for competitive intelligence. Expert Systems, e13445. https://www.authorea.com/doi/full/10.22541/au.166785335.50477185, https://doi.org/10.1111/exsy.13445
    Funding Information: This work was supported by the Fundação para a Ciência e Tecnologia of Ministério da Ciência e Tecnologia e Ensino Superior (research grant under the DSAIPA/DS/0116/2019 project).
    Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mainly performed by analysts scanning for any piece of valuable information in a myriad of dispersed and unstructured sources. Here we present MapIntel, a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its semantics. The system is designed to handle complex Natural Language queries and visual exploration of the corpus, potentially aiding overburdened analysts in finding meaningful insights to help decision-making. The system's search module uses a retriever and re-ranker engine that first finds the closest neighbours to the query embedding and then passes the results through a cross-encoder model that identifies the most relevant documents. The browsing or visualization module also leverages the embeddings by projecting them onto two dimensions while preserving the multidimensional landscape, resulting in a map where semantically related documents form topical clusters, which we capture using topic modelling. This map aims to provide a fast overview of the corpus while allowing a more detailed exploration and an interactive information encountering process. We evaluate the system and its components on the 20 newsgroups data set, using the semantic document labels provided, and demonstrate the superiority of Transformer-based components. Finally, we present a prototype of the system in Python and show how some of its features can be used to acquire intelligence from a news article corpus we collected over a period of 8 months.
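The two-stage search described in this abstract can be illustrated with a toy sketch. This is not MapIntel's code: the two-dimensional vectors are hand-made stand-ins for real embeddings, and `overlap_score` is a hypothetical word-overlap scorer standing in for a cross-encoder model. All names and data are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query_vec, doc_vecs, k):
    """Stage 1: ids of the k documents whose embeddings are closest to the query."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

def rerank(query, candidates, texts, pair_score):
    """Stage 2: re-order candidates with a scorer that reads query and document together."""
    return sorted(candidates, key=lambda d: pair_score(query, texts[d]), reverse=True)

def overlap_score(query, text):
    """Toy stand-in for a cross-encoder (assumption): number of shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

# Hypothetical corpus with hand-made 2-d "embeddings".
texts = {
    "a": "market trends in retail",
    "b": "retail market trends and business opportunities",
    "c": "football results",
}
doc_vecs = {"a": [1.0, 0.1], "b": [0.9, 0.3], "c": [0.0, 1.0]}

query = "business opportunities in retail market"
query_vec = [1.0, 0.0]

candidates = retrieve(query_vec, doc_vecs, k=2)
final = rerank(query, candidates, texts, overlap_score)
```

The cheap first stage narrows the corpus to a short candidate list, so the more expensive pairwise scorer only has to run on a handful of documents; here it also reverses the order of the two candidates.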

    An oddly-positioned position paper on context and ontology

    Proceedings of the 2008 IEEE International Conference on Semantic Computing.
    This paper is a theoretical analysis of formal annotation and ontology as means of expressing the semantics of documents. They are found wanting in this respect, not only for technical reasons, but because they embody a fundamentally misunderstood model of the process of signification. The author proposes an alternative model in which the interpretation context plays a fundamental role, and briefly discusses it and its current technical embodiment.

    Context as a non-ontological determinant of semantics

    Proceedings of the Third International Conference on Semantic and Digital Media Technologies, SAMT 2008, Koblenz, Germany, December 3-5, 2008. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-540-92235-3_11
    This paper proposes an alternative to formal annotation for the representation of semantics. Drawing on the position of most of last century's linguistics and interpretation theory, the article argues that meaning is not a property of a document, but an outcome of a contextualized and situated process of interpretation. The consequence of this position is that one should not try to represent the meaning of a document (the way formal annotation does), but rather the context of the activity of which search is part. We present some general considerations on the representation and use of the context, and a simple example of a technique that encodes the context represented by the documents collected on the computer on which one is working and uses it to direct search. Preliminary results show that even this rather simple-minded context representation can lead to considerable improvements with respect to commercial search engines.

    Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies

    In an increasingly data-rich environment, actionable information must be extracted, filtered, and correlated from massive amounts of disparate, often free-text sources. The usefulness of the retrieved information depends on how we accomplish these steps and present the most relevant information to the analyst. One method for extracting information from free text is Latent Dirichlet Allocation (LDA), a document categorization technique that classifies documents into cohesive topics. Although LDA accounts for some implicit relationships such as synonymy (same meaning), it often ignores other semantic relationships such as polysemy (different meanings), hyponymy (subordinate), meronymy (part of), and troponymy (manner). To compensate for this deficiency, we incorporate explicit word ontologies, such as WordNet, into the LDA algorithm to account for various semantic relationships. Experiments over the 20 Newsgroups, NIPS, OHSUMED, and IED document collections demonstrate that incorporating such knowledge improves the perplexity measure over LDA alone for given parameters. In addition, the same ontology augmentation improves recall and precision results for user queries.
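For readers unfamiliar with the perplexity measure mentioned above, the sketch below uses the standard definition: the exponential of the negative mean log-probability a model assigns to held-out tokens, where lower is better. It does not reproduce the paper's experiments; the function name and the per-token probabilities are made up for illustration.

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability per held-out token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical per-token probabilities from two models on the same 8 tokens.
uniform = [math.log(0.25)] * 8   # model that spreads mass over 4 equally likely words
sharper = [math.log(0.5)] * 8    # model that assigns each token probability 0.5

ppl_uniform = perplexity(uniform)
ppl_sharper = perplexity(sharper)
```

A model that assigns higher probability to the observed words gets a lower perplexity, which is the sense in which a perplexity improvement is reported.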