Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

Author: Habib Mena B.
Keulen Maurice van
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2011

Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Much of this research has tackled each area individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is important for improving data quality in such integrated systems. This article presents the state of the art in these research areas, shows their common ground, and describes how to integrate information extraction and data integration under the cover of uncertainty management.

Maastricht University Research Portal

University of Twente Research Information

Statistical analysis of the owl:sameAs network for aligning concepts in the linking open data cloud

Author: A. Isaac
A. Miles
A.K. Elmagarmid
H. Al-Mubaid
J. Euzenat
L. Ding
M. Mao
M. Salvadores
P. Jaccard
P. Jain
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

The massively distributed publication of linked data has drawn the scientific community's attention to the limitations of classic methods for achieving data integration, and to the opportunities for pushing the boundaries of the field by experimenting with the collective enterprise that is the Linking Open Data cloud. While reusing existing ontologies is the preferred choice, exploiting ontology alignments is still a required step for easing the burden of integrating heterogeneous data sets. Alignments, even between the most widely used vocabularies, are still poorly supported in current systems, whereas links between instances are the most widely used means of bridging the gap between different data sets. In this paper we give an account of our statistical and qualitative analysis of the network of instance-level equivalences in the Linking Open Data cloud (i.e. the sameAs network), carried out in order to automatically compute alignments at the conceptual level. Moreover, we explore the effect of ontological information when applying classical Jaccard methods to the ontology alignment task. Automating this task will make it possible to achieve a clearer conceptual description of the data at the cloud level, while improving the level of integration between datasets.
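
As a rough illustration of the idea in this abstract (not the authors' implementation), the sketch below merges instances connected by sameAs links into clusters and then scores a pair of concepts by the Jaccard coefficient of their instance sets. The function names, the concept and instance identifiers, and the input layout are all assumptions made for the example.

```python
def sameas_clusters(links):
    """Union-find over sameAs pairs; maps each linked URI to a cluster representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return {x: find(x) for x in list(parent)}


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align(concepts_a, concepts_b, links):
    """Score every concept pair by the Jaccard overlap of their sameAs-merged instances.

    concepts_a / concepts_b: dict mapping a concept name to a set of instance URIs
    (hypothetical layout chosen for this sketch).
    """
    cluster = sameas_clusters(links)

    def canon(instances):
        return {cluster.get(i, i) for i in instances}

    return {
        (ca, cb): jaccard(canon(ia), canon(ib))
        for ca, ia in concepts_a.items()
        for cb, ib in concepts_b.items()
    }


# Hypothetical toy data: two datasets, two sameAs links between their instances.
scores = align(
    {"a:City": {"a:Paris", "a:Rome", "a:Oslo"}},
    {"b:PopulatedPlace": {"b:1", "b:2", "b:9"}},
    [("a:Paris", "b:1"), ("a:Rome", "b:2")],
)
print(scores[("a:City", "b:PopulatedPlace")])
```

Two of the three instances on each side fall into shared sameAs clusters, so the merged sets overlap in two of four distinct elements. A real alignment pipeline would also need a threshold on the score and, as the abstract suggests, some use of ontological information such as subclass relations.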

Southampton (e-Prints Soton)

Crossref

Integrating heterogeneous distributed COTS discrete-event simulation packages: An emerging standards-based approach

Author: Low M Y H
Taylor S J E
Turner S J
Wang X
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

This paper reports on the progress made toward the emergence of standards to support the integration of heterogeneous discrete-event simulations (DESs) created in specialist support tools called commercial-off-the-shelf (COTS) discrete-event simulation packages (CSPs). The general standard for heterogeneous integration in this area has been developed from research in distributed simulation and is the IEEE 1516 standard, the High Level Architecture (HLA). However, the specific needs of heterogeneous CSP integration require that the HLA be augmented by additional complementary standards. These are the suite of CSP interoperability (CSPI) standards being developed under the Simulation Interoperability Standards Organization (SISO, http://www.sisostds.org) by the CSPI Product Development Group (CSPI-PDG). The suite consists of several interoperability reference models (IRMs) that outline different integration needs of CSPI, interoperability frameworks (IFs) that define the HLA-based solution to each IRM, appropriate data exchange representations to specify the data exchanged in an IF, and benchmarks termed CSP emulators (CSPEs). This paper contributes to the development of the Type I IF, which is intended to represent the HLA-based solution to the problem outlined by the Type I IRM (asynchronous entity passing), by developing the entity transfer specification (ETS) data exchange representation. The use of the ETS is shown in an illustrative case study implemented using a prototype CSPE. This case study also allows us to highlight the importance of event granularity and lookahead in the performance and development of the Type I IF, and to discuss possible methods to automate the capture of appropriate values of lookahead.

CiteSeerX

Brunel University Research Archive

Comprehensive Review of Opinion Summarization

Author: Ganesan Kavita
Kim Hyun Duk
Sondhi Parikshit
Zhai ChengXiang
Publication date: 01/01/2011

The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. Researchers have introduced various techniques and paradigms for solving this special task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of their key weaknesses. This survey also covers the evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that remain to be addressed, as this will help set the trend for future research in this area.

CiteSeerX

Illinois Digital Environment for Access to Learning and Scholarship Repository

The determinants of credit default swap spreads in the presence of structural breaks and counterparty risk

Author: Kapar B.
Olmo J.
Publication venue: Department of Economics, City University London
Publication date: 01/01/2011

By investigating the determinants of CDS spreads on European contracts before and after the recent crisis, we observe significant differences in the explanatory power of market and firm-specific variables. Before the crisis, the underlying credit risk in the overall CDS market is sufficient to explain credit risk. During the crisis, investors have a differing view on the risk of financial and non-financial contracts: whereas non-financial CDS contracts reflect the credit risk of the counterparty, financial contracts do not. Our results suggest that, in the event of default of financial firms, investors expect the government to intervene to alleviate the credit risk of the counterparty and fears of systemic risk.

City Research Online

Integrating and Ranking Uncertain Scientific Data

Author: Detwiler Landon T
Gatterbauer Wolfgang
Louie Brenton
Suciu Dan
Tarczy-Hornoch Peter
Publication date: 01/01/2008

Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve prediction of the well-known ones). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.

CiteSeerX

Crossref

University of Washington Structural Informatics Group Publications

Type-Constrained Representation Learning in Knowledge Graphs

Author: A Swartz
C Bizer
D Krompaß
GA Miller
N Lao
Publication date: 28/08/2015

Large knowledge graphs increasingly add value to various applications that require machines to recognize and understand queries and their semantics, as in search or question answering systems. Latent variable models have increasingly gained attention for the statistical modeling of knowledge graphs, showing promising results in tasks related to knowledge graph completion and cleaning. Besides storing facts about the world, schema-based knowledge graphs are backed by rich semantic descriptions of entities and relation-types that allow machines to understand the notion of things and their semantic relationships. In this work, we study how type-constraints can generally support statistical modeling with latent variable models. More precisely, we integrated prior knowledge in the form of type-constraints into various state-of-the-art latent variable approaches. Our experimental results show that prior knowledge on relation-types significantly improves these models, by up to 77% in link-prediction tasks. The achieved improvements are especially prominent when a low model complexity is enforced, a crucial requirement when these models are applied to very large datasets. Unfortunately, type-constraints are neither always available nor always complete; for example, they can become fuzzy when entities lack proper typing. We show that in these cases it can be beneficial to apply a local closed-world assumption that approximates the semantics of relation-types based on observations made in the data.
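
The local closed-world assumption mentioned at the end of this abstract can be pictured with a minimal sketch. It assumes that the domain of a relation is approximated by the entities observed as its subjects and the range by the entities observed as its objects; candidate links outside these sets are not scored at all. The function names and toy triples are hypothetical and the latent variable model itself is omitted, so this is an illustration of the filtering step only, not the authors' code.

```python
from collections import defaultdict


def lcwa_constraints(triples):
    """Approximate each relation's domain and range from the observed triples."""
    domain = defaultdict(set)   # relation -> entities seen as subject
    range_ = defaultdict(set)   # relation -> entities seen as object
    for subject, relation, obj in triples:
        domain[relation].add(subject)
        range_[relation].add(obj)
    return domain, range_


def candidate_objects(subject, relation, entities, domain, range_):
    """Return the objects a link-prediction model would still need to score.

    Subjects outside the approximated domain yield no candidates; otherwise the
    candidates are restricted to the approximated range of the relation.
    """
    if subject not in domain.get(relation, set()):
        return []
    allowed = range_.get(relation, set())
    return sorted(e for e in entities if e in allowed)


# Hypothetical toy graph.
triples = [
    ("alice", "bornIn", "paris"),
    ("bob", "bornIn", "rome"),
    ("alice", "knows", "bob"),
]
entities = {"alice", "bob", "paris", "rome"}
domain, range_ = lcwa_constraints(triples)

print(candidate_objects("alice", "bornIn", entities, domain, range_))
print(candidate_objects("paris", "bornIn", entities, domain, range_))
```

Restricting candidates this way shrinks the set of triples a model has to rank, which fits the abstract's observation that the benefit is largest when model complexity is kept low. Where explicit schema information is available, the observed sets would be replaced by the declared domain and range of each relation.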

arXiv.org e-Print Archive

Crossref