30 research outputs found
Towards improving WEBSOM with multi-word expressions
Dissertation for the degree of Master in Computer Engineering (Engenharia Informática).
Large collections of free-text documents are usually rich in information and cover several topics. However, because these collections are very large, searching and filtering the data is an exhausting task. A large text collection covers a set of topics, each associated with a group of documents. This thesis presents a method for building a document map of the core contents covered in the collection.
WEBSOM is an approach that combines document encoding methods and Self-Organising Maps (SOM) to generate a document map. However, this methodology has a weakness in the document encoding method because it uses single words to characterise documents.
Single words tend to be ambiguous and semantically vague, so some documents can be incorrectly related. This thesis proposes a new document encoding method that improves the WEBSOM approach by using multi-word expressions (MWEs) to describe documents. Previous research and ongoing experiments encourage the use of MWEs to characterise documents because they are semantically more accurate and more descriptive than single words.
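The contrast between single-word and multi-word encodings can be illustrated with a minimal sketch. It assumes adjacent word pairs as a crude stand-in for MWEs (the thesis's actual MWE extraction method is not described in this abstract) and omits the SOM step entirely; the names `encode` and `cosine` and the two sample documents are hypothetical.

```python
from collections import Counter
import math

def encode(text, use_mwes):
    """Encode a document as term counts.

    Assumption: adjacent word pairs stand in for multi-word expressions.
    """
    tokens = text.lower().split()
    if not use_mwes:
        return Counter(tokens)
    return Counter(zip(tokens, tokens[1:]))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two hypothetical documents on unrelated topics that share one ambiguous word.
doc_a = "apple pie recipe"
doc_b = "apple stock price"

sim_words = cosine(encode(doc_a, False), encode(doc_b, False))
sim_mwes = cosine(encode(doc_a, True), encode(doc_b, True))
print(sim_words, sim_mwes)
```

With single words, the shared term "apple" makes the two documents look related; with word pairs, they share no terms, so the spurious similarity disappears.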
Incremental context creation and its effects on semantic query precision
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-10543-2_19
Proceedings of 4th International Conference on Semantic and Digital Media Technologies, SAMT 2009, Graz, Austria, December 2-4, 2009.
We briefly describe the results of an experimental study on the incremental creation of context out of the results of targeted queries, and discuss the increase in retrieval precision that results from the incremental enrichment of context.
This work was supported in part by Consejería de Educación, Comunidad Autónoma de Madrid, under the grant CCG08-UAM/TIC/4303, Búsqueda basada en contexto como alternativa semántica al modelo ontológico. Simone Santini was in part supported by the Ramón y Cajal initiative of the Ministerio de Educación y Ciencia. Alexandra Dumitrescu was in part supported by the European Social Fund, Universidad Autónoma de Madrid.
Using the reader's context to customize news streams
This is an electronic version of the paper presented at the Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, held in A Coruña in 2011.
Many people today subscribe to streaming news services, and their number can be expected to grow together with the number of web sites that offer a streaming service using protocols such as Really Simple Syndication (RSS).
In this paper we present a method, and the related algorithms, for ranking news by relevance based on the working context of the reader. We use the contents of the reader's computer as an indication of his or her interests, and build from them a suitable context representation. We then use it to filter the incoming stream of news.
The authors were supported in part by the Ministerio de Educación y Ciencia under the grant N. MEC TIN2008-06566-C04-02, Information Retrieval on different media based on multidimensional models: relevance, novelty, personalization and context.
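The idea of ranking incoming news against the reader's working context can be sketched as follows. This is only an illustration under stated assumptions: the context is a plain bag of words built from a few local documents, and relevance is cosine similarity. The abstract does not specify the paper's actual context representation or algorithms, and `bag`, `cosine`, `rank_news` and all sample texts are hypothetical.

```python
from collections import Counter
import math

def bag(text):
    """Bag-of-words counts for a piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_news(context_docs, news_items):
    """Order news items by similarity to the merged context of local documents."""
    context = Counter()
    for doc in context_docs:
        context.update(bag(doc))
    return sorted(news_items, key=lambda item: cosine(context, bag(item)), reverse=True)

# Hypothetical contents of the reader's computer and a hypothetical news stream.
context_docs = [
    "self organising maps for document clustering",
    "semantic search and document retrieval",
]
news = [
    "football league results announced",
    "new document retrieval engine released",
    "weather forecast for the weekend",
]

ranked = rank_news(context_docs, news)
print(ranked[0])
```

Filtering the stream then amounts to keeping only the top-ranked items, or those whose score exceeds a chosen threshold.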
A visual analytics platform for competitive intelligence
Silva, D., & Bação, F. (2023). MapIntel: A visual analytics platform for competitive intelligence. Expert Systems, [e13445]. https://doi.org/https://www.authorea.com/doi/full/10.22541/au.166785335.50477185, https://doi.org/10.1111/exsy.13445
Funding Information: This work was supported by the Fundação para a Ciência e Tecnologia of Ministério da Ciência e Tecnologia e Ensino Superior (research grant under the DSAIPA/DS/0116/2019 project).
Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mainly performed by analysts scanning for any piece of valuable information in a myriad of dispersed and unstructured sources. Here we present MapIntel, a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its own semantics. The system is designed to handle complex Natural Language queries and visual exploration of the corpus, potentially aiding overburdened analysts in finding meaningful insights to help decision-making. The system's searching module uses a retriever and re-ranker engine that first finds the closest neighbours to the query embedding and then sifts the results through a cross-encoder model that identifies the most relevant documents. The browsing or visualization module also leverages the embeddings by projecting them onto two dimensions while preserving the multidimensional landscape, resulting in a map where semantically related documents form topical clusters, which we capture using topic modelling. This map aims at promoting a fast overview of the corpus while allowing a more detailed exploration and an interactive information encountering process. We evaluate the system and its components on the 20 newsgroups data set, using the semantic document labels provided, and demonstrate the superiority of Transformer-based components.
Finally, we present a prototype of the system in Python and show how some of its features can be used to acquire intelligence from a news article corpus we collected during a period of 8 months.
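The two-stage search described in this abstract (nearest neighbours of the query embedding first, then re-ranking of those candidates) can be sketched in plain Python. This is not MapIntel's code: the embeddings are hand-written toy vectors, and a simple token-overlap score stands in for the cross-encoder, which is an assumption made only to keep the example self-contained. All names (`retrieve`, `overlap_score`, `search`) and data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, k):
    """Stage 1: return the ids of the k documents closest to the query embedding."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

def overlap_score(query, text):
    """Placeholder re-ranking score (assumption: stands in for a cross-encoder)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def search(query, query_vec, docs, doc_vecs, k=2):
    """Stage 2: re-rank the retrieved candidates with the more precise scorer."""
    candidates = retrieve(query_vec, doc_vecs, k)
    return sorted(candidates, key=lambda d: overlap_score(query, docs[d]), reverse=True)

# Toy corpus with hand-written 2-dimensional "embeddings".
docs = {
    "a": "market trends in renewable energy",
    "b": "energy drink market share",
    "c": "local football results",
}
doc_vecs = {"a": [0.9, 0.1], "b": [1.0, 0.0], "c": [0.0, 1.0]}

query = "renewable energy market"
query_vec = [1.0, 0.05]

candidates = retrieve(query_vec, doc_vecs, 2)
results = search(query, query_vec, docs, doc_vecs, k=2)
print(candidates, results)
```

In this toy run the retriever places document "b" first, and the re-ranking stage promotes document "a", which matches the query more fully. The design rationale is that the cheap vector comparison narrows the corpus so the costlier scorer only sees a few candidates.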
An oddly-positioned position paper on context and ontology
Proceedings of the 2008 IEEE International Conference on Semantic Computing.
This paper is a theoretical analysis of formal annotation and ontology for the expression of the semantics of documents. They are found wanting in this respect, not only for technical reasons, but because they embody a fundamentally misunderstood model of the process of signification. The author proposes an alternative model in which the interpretation context plays a fundamental role, and briefly discusses it and its current technical embodiment.
Context as a non-ontological determinant of semantics
The final publication is available at Springer via http://dx.doi.org/110.1007/978-3-540-92235-3_11
Proceedings of Third International Conference on Semantic and Digital Media Technologies, SAMT 2008, Koblenz, Germany, December 3-5, 2008.
This paper proposes an alternative to formal annotation for the representation of semantics. Drawing on the position of most of last century’s linguistics and interpretation theory, the article argues that meaning is not a property of a document, but an outcome of a contextualized and situated process of interpretation. The consequence of this position is that one should not quite try to represent the meaning of a document (the way formal annotation does), but the context of the activity of which search is part.
We present some general considerations on the representation and use of the context, and a simple example of a technique to encode the context represented by the documents collected in the computer in which one is working, and to use them to direct search. We show preliminary results showing that even this rather simpleminded context representation can lead to considerable improvements with respect to commercial search engines.
Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies
In an increasingly data-rich environment, actionable information must be extracted, filtered, and correlated from massive amounts of disparate, often free-text sources. The usefulness of the retrieved information depends on how we accomplish these steps and present the most relevant information to the analyst. One method for extracting information from free text is Latent Dirichlet Allocation (LDA), a document categorization technique that classifies documents into cohesive topics. Although LDA accounts for some implicit relationships such as synonymy (same meaning), it often ignores other semantic relationships such as polysemy (different meanings), hyponymy (subordinate), meronymy (part of), and troponymy (manner). To compensate for this deficiency, we incorporate explicit word ontologies, such as WordNet, into the LDA algorithm to account for various semantic relationships. Experiments over the 20 Newsgroups, NIPS, OHSUMED, and IED document collections demonstrate that incorporating such knowledge improves the perplexity measure over LDA alone for given parameters. In addition, the same ontology augmentation improves recall and precision results for user queries.