Search CORE

9,272 research outputs found

Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

Author: Habib Mena B.
Keulen Maurice van
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2011
Field of study

Information Extraction, data Integration, and uncertain data management are different areas of research that got vast focus in the last two decades. Many researches tackled those areas of research individually. However, information extraction systems should have integrated with data integration methods to make use of the extracted information. Handling uncertainty in extraction and integration process is an important issue to enhance the quality of the data in such integrated systems. This article presents the state of the art of the mentioned areas of research and shows the common grounds and how to integrate information extraction and data integration under uncertainty management cover

Maastricht University Research Portal

University of Twente Research Information

On the freezing of variables in random constraint satisfaction problems

Author: A. Goerdt
A. Montanari
A. Montanari
A. Montanari
D. Aldous
D. Aldous
D. Panchenko
E. Friedgut
E. Mossel
F. Guerra
F. Krzakala
F. Krzakala
F.R. Kschischang
G. Biroli
Guilhem Semerjian
H. Zhou
H.-O. Georgii
J. Franco
J. Mourik van
J.M. Schwarz
M. Mézard
M. Mézard
M. Mézard
M. Mézard
M. Mézard
M. Mézard
M. Mézard
M. Mézard
M. Sellitto
M. Talagrand
M. Talagrand
M.R. Garey
O. Dubois
P. Flajolet
P.M. Duxbury
R. Monasson
S. Cocco
S. Franz
S. Janson
S. Mertens
T. Castellani
T. Lindvall
W. Götze
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/05/2007
Field of study

The set of solutions of random constraint satisfaction problems (zero energy groundstates of mean-field diluted spin glasses) undergoes several structural phase transitions as the amount of constraints is increased. This set first breaks down into a large number of well separated clusters. At the freezing transition, which is in general distinct from the clustering one, some variables (spins) take the same value in all solutions of a given cluster. In this paper we study the critical behavior around the freezing transition, which appears in the unfrozen phase as the divergence of the sizes of the rearrangements induced in response to the modification of a variable. The formalism is developed on generic constraint satisfaction problems and applied in particular to the random satisfiability of boolean formulas and to the coloring of random graphs. The computation is first performed in random tree ensembles, for which we underline a connection with percolation models and with the reconstruction problem of information theory. The validity of these results for the original random ensembles is then discussed in the framework of the cavity method.Comment: 32 pages, 7 figure

arXiv.org e-Print Archive

Crossref

Modern Coding Theory: The Statistical Mechanics and Computer Science Point of View

Author: Montanari Andrea
Urbanke Rudiger
Publication venue
Publication date: 01/01/2007
Field of study

These are the notes for a set of lectures delivered by the two authors at the Les Houches Summer School on `Complex Systems' in July 2006. They provide an introduction to the basic concepts in modern (probabilistic) coding theory, highlighting connections with statistical mechanics. We also stress common concepts with other disciplines dealing with similar problems that can be generically referred to as `large graphical models'. While most of the lectures are devoted to the classical channel coding problem over simple memoryless channels, we present a discussion of more complex channel models. We conclude with an overview of the main open challenges in the field.Comment: Lectures at Les Houches Summer School on `Complex Systems', July 2006, 44 pages, 25 ps figure

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

Author: EHRMANN MAUD
TURCHI MARCO
Publication venue: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Mexico
Publication date: 09/08/2011
Field of study

Translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data but using external morphological resources. A set of new phrase associations is added to translation and reordering models; each of them corresponds to a morphological variation of the source/target/both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations and results showed improvements of the performance in terms of automatic scores (BLEU and Meteor) and reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.JRC.G.2-Global security and crisis managemen

JRC Publications Repository

Efficient Decomposed Learning for Structured Prediction

Author: Roth Dan
Samdani Rajhans
Publication venue
Publication date: 01/01/2012
Field of study

Structured prediction is the cornerstone of several machine learning applications. Unfortunately, in structured prediction settings with expressive inter-variable interactions, exact inference-based learning algorithms, e.g. Structural SVM, are often intractable. We present a new way, Decomposed Learning (DecL), which performs efficient learning by restricting the inference step to a limited part of the structured spaces. We provide characterizations based on the structure, target parameters, and gold labels, under which DecL is equivalent to exact learning. We then show that in real world settings, where our theoretical assumptions may not completely hold, DecL-based algorithms are significantly more efficient and as accurate as exact learning.Comment: ICML201

arXiv.org e-Print Archive

CiteSeerX

Uncertainty Detection as Approximate Max-Margin Sequence Labelling

Author: Dalianis Hercules
Eriksson Gunnar
Hassel Martin
Karlgren Jussi
Täckström Oscar
Velupillai Sumithra
Publication venue
Publication date: 01/01/2010
Field of study

This paper reports experiments for the CoNLL 2010 shared task on learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence level weasel detection in the Wikipedia domain, we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO-encoding. In addition to surface level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used in exploring the large set of potential features. Our official results for Task 1 for the biological domain are 85.2 F1-score, for the Wikipedia set 55.4 F1-score. For Task 2, our official results are 2.1 for the entire task with a score of 62.5 for cue detection. After resolving errors and final bugs, our final results are for Task 1, biological: 86.0, Wikipedia: 58.2; Task 2, scopes: 39.6 and cues: 78.5

CiteSeerX

Publikationer från Stockholms universitet

Publikationer från Uppsala Universitet

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive