Search CORE

17,158 research outputs found

Towards Building a Knowledge Base of Monetary Transactions from a News Collection

Author: Balog Krisztian
Benetka Jan R.
Nørvåg Kjetil
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 17/09/2017
Field of study

We address the problem of extracting structured representations of economic events from a large corpus of news articles, using a combination of natural language processing and machine learning techniques. The developed techniques allow for semi-automatic population of a financial knowledge base, which, in turn, may be used to support a range of data mining and exploration tasks. The key challenge we face in this domain is that the same event is often reported multiple times, with varying correctness of details. We address this challenge by first collecting all information pertinent to a given event from the entire corpus, then considering all possible representations of the event, and finally, using a supervised learning method, to rank these representations by the associated confidence scores. A main innovative element of our approach is that it jointly extracts and stores all attributes of the event as a single representation (quintuple). Using a purpose-built test set we demonstrate that our supervised learning approach can achieve 25% improvement in F1-score over baseline methods that consider the earliest, the latest or the most frequent reporting of the event.Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '17), 201

arXiv.org e-Print Archive

Crossref

Applying semantic web technologies to knowledge sharing in aerospace engineering

Author: A. Arasu
A. Chakravarthy
A.-S. Dadzie
A.H.F. Laender
B. Rosenfeld
C. Manning
C. Preisach
D. Petrelli
F. Ciravegna
J. Broekstra
J. Hendler
J. Iria
J. Magalhães
J. Magalhães
J. Xu
M.R. Naphade
R. Bhagdev
S. Chapman
S. Gupta
V. Lanfranchi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 31/08/2009
Field of study

This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy docu-ments via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale

CiteSeerX

Crossref

White Rose Research Online

Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

Author: Ferrández Sergio
Monachini Monica
Muñoz Rafael
Toral Antonio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/06/2011
Field of study

This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for diﬀerent languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which aﬀects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The diﬀerent steps of the procedure (mapping, disambiguation, extraction, NE identiﬁcation and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system’s accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented

DCU Online Research Access Service

Using distributional similarity to organise biomedical terminology

Author: Dowdall James
Keller Bill
Schneider Gerold
Weeds Julie
Weir David
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2005
Field of study

We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy

ZORA

Sussex Research Online

Constructing Ontology-Based Cancer Treatment Decision Support System with Case-Based Reasoning

Author: Colloc Joël
Guo Ziyi
Jacquet-Andrieu Armelle
Liu Yong
Shen Ying
Publication venue
Publication date: 18/01/2018
Field of study

Decision support is a probabilistic and quantitative method designed for modeling problems in situations with ambiguity. Computer technology can be employed to provide clinical decision support and treatment recommendations. The problem of natural language applications is that they lack formality and the interpretation is not consistent. Conversely, ontologies can capture the intended meaning and specify modeling primitives. Disease Ontology (DO) that pertains to cancer's clinical stages and their corresponding information components is utilized to improve the reasoning ability of a decision support system (DSS). The proposed DSS uses Case-Based Reasoning (CBR) to consider disease manifestations and provides physicians with treatment solutions from similar previous cases for reference. The proposed DSS supports natural language processing (NLP) queries. The DSS obtained 84.63% accuracy in disease classification with the help of the ontology

arXiv.org e-Print Archive

HAL - Normandie Université