Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies

Author: Isaly Laura A.
Publication venue: AFIT Scholar
Publication date: 10/03/2010

In an ever-increasing data-rich environment, actionable information must be extracted, filtered, and correlated from massive amounts of disparate, often free-text, sources. The usefulness of the retrieved information depends on how we accomplish these steps and present the most relevant information to the analyst. One method for extracting information from free text is Latent Dirichlet Allocation (LDA), a document categorization technique that classifies documents into cohesive topics. Although LDA accounts for some implicit relationships, such as synonymy (same meaning), it often ignores other semantic relationships, such as polysemy (multiple meanings), hyponymy (subordination), meronymy (part-whole), and troponymy (manner). To compensate for this deficiency, we incorporate explicit word ontologies, such as WordNet, into the LDA algorithm to account for various semantic relationships. Experiments over the 20 Newsgroups, NIPS, OHSUMED, and IED document collections demonstrate that incorporating such knowledge improves the perplexity measure over LDA alone for given parameters. In addition, the same ontology augmentation improves recall and precision results for user queries.

AFIT Scholar (Air Force Institute of Technology)
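The LDA abstract reports its gains in terms of perplexity. As a minimal sketch, assuming the standard held-out definition (the exponentiated negative per-token log-likelihood, where lower is better) rather than anything specific to this thesis, perplexity can be computed from per-document log-likelihoods as shown below. The function name `perplexity` and its inputs are hypothetical, and the log-likelihoods are assumed to come from an already trained topic model.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    Hypothetical helper. doc_log_likelihoods holds the natural-log
    likelihood log p(w_d) of each test document under a trained model
    (assumed to be computed elsewhere); doc_lengths holds the token
    count N_d of each test document.
    """
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("one log-likelihood per document is required")
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("test set must contain at least one token")
    # Average the negative log-likelihood over all tokens, then exponentiate.
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)
```

For example, a model that assigned every token probability 1/1000 would have perplexity 1000 on any test set; lower values mean the model finds the held-out text less surprising.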

Novel Event Detection and Classification for Historical Texts

Authors: Tonelli Sara; Sprugnoli Rachele (ORCID: 0000-0001-6861-5595)
Publication venue: MIT Press - Journals
Publication date: 01/01/2019

Event processing is an active area of research in the Natural Language Processing community, but the resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts, particularly in the current era of massive digitization of historical sources: research in this domain can lead to the development of methodologies and tools that can assist historians in their work, while also having an impact on the field of Natural Language Processing. Our work aims to shed light on the complex concept of events when dealing with historical texts. More specifically, we introduce new annotation guidelines for event mentions and types, categorised into 22 classes. We then annotate a historical corpus accordingly and compare two approaches for automatic event detection and classification following this novel scheme. We believe that this work can foster research in a field of inquiry so far underestimated in the area of Temporal Information Processing. To this end, we release new annotation guidelines, a corpus, and new models for automatic annotation.

Archivio istituzionale della Ricerca - Università degli Studi di Parma

PubliCatt

Archivio della ricerca - Fondazione Bruno Kessler

Implementation of a knowledge discovery and enhancement module from structured information gained from unstructured sources of information

Author: Costa Celso Ricardo Martins Maia
Publication date: 01/01/2010

Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201

Repositório Aberto da Universidade do Porto

A Hierarchical Core Reference Ontology for New Technology Insertion Design in Long Life Cycle, Complex Mission Critical Systems

Author: Michael Kevin J.
Publication venue: ODU Digital Commons
Publication date: 01/04/2019

Organizations, whether governmental, commercial, or otherwise, face numerous challenges in maintaining and upgrading long life-cycle, complex, mission-critical systems. Maintaining and upgrading these systems requires the insertion and integration of new technology to avoid obsolescence of hardware, software, and human skills; to improve performance; to maintain and improve security; and to extend useful life. This is particularly true of information technology (IT) intensive systems. The lack of a coherent body of knowledge to organize new technology insertion theory and practice is a significant contributor to this difficulty. This research organized the existing design, technology roadmapping, obsolescence, and sustainability literature into an ontology of theory and application as the foundation for a hierarchical core reference ontology for technology design and technology insertion design, and laid the foundation for a body of knowledge that better integrates the new technology insertion problem into the technology design architecture.

Old Dominion University

Integrating and conceptualizing heterogeneous ontologies on the web

Author: Goh Hai Kiat
Publication date: 21/12/2006

Master's (Master of Science)

ScholarBank@NUS

Further with Knowledge Graphs: Proceedings of the 17th International Conference on Semantic Systems, 6-9 September 2021, Amsterdam, The Netherlands

Publication venue: IOS Press
Publication date: 01/01/2021

International Migration, Integration and Social Cohesion online publications

Explaining ambiguity in scientific language

Author: Sterner Beckett
Publication date: 01/01/2022

PhilPapers

Data sensitivity detection in chat interactions for privacy protection

Author: Gambarelli Gaia <1994>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 28/03/2023

In recent years, there has been exponential growth in the use of virtual spaces, including dialogue systems, that handle personal information. The concept of personal privacy is debated and controversial in the literature, whereas in the technological field it directly influences the degree of reliability perceived in the information system (privacy ‘as trust’). This work aims to protect the right to privacy of personal data (GDPR, 2018) and to avoid the loss of sensitive content by exploring the sensitive information detection (SID) task. It is grounded in the following research questions: (RQ1) What does sensitive data mean, and how can a personal sensitive information domain be defined? (RQ2) How can a state-of-the-art model for SID be created? (RQ3) How should the model be evaluated? RQ1 theoretically investigates the concepts of privacy and the state-of-the-art ontological representations of personal information. The Data Privacy Vocabulary (DPV) is the taxonomic resource taken as an authoritative reference for the definition of the knowledge domain. Concerning RQ2, we investigate two approaches to classifying sensitive data: the first (bottom-up) explores automatic learning methods based on transformer networks; the second (top-down) proposes logical-symbolic methods with the construction of privaframe, a knowledge graph of compositional frames representing personal data categories. Both approaches are tested. For the evaluation (RQ3), we create SPeDaC, a sentence-level labeled resource that can be used as a benchmark or as training data for the SID task, filling the gap of a shared resource in this field. While the approach based on artificial neural networks confirms the validity of the direction adopted in the most recent studies on SID, the logical-symbolic approach emerges as the preferred way to classify fine-grained personal data categories, thanks to the semantically grounded, tailored modeling it allows.
At the same time, the results highlight the strong potential of hybrid architectures in solving automatic tasks.

AMS Tesi di Dottorato