Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Most work has addressed these areas individually. However, information extraction systems should be combined with data integration methods so that the extracted information can be put to use. Handling uncertainty in the extraction and integration process is important for improving data quality in such integrated systems. This article presents the state of the art in these areas, identifies their common ground, and shows how to integrate information extraction and data integration under the umbrella of uncertainty management.
On the freezing of variables in random constraint satisfaction problems
The set of solutions of random constraint satisfaction problems (zero energy
groundstates of mean-field diluted spin glasses) undergoes several structural
phase transitions as the amount of constraints is increased. This set first
breaks down into a large number of well separated clusters. At the freezing
transition, which is in general distinct from the clustering one, some
variables (spins) take the same value in all solutions of a given cluster. In
this paper we study the critical behavior around the freezing transition, which
appears in the unfrozen phase as the divergence of the sizes of the
rearrangements induced in response to the modification of a variable. The
formalism is developed on generic constraint satisfaction problems and applied
in particular to the random satisfiability of boolean formulas and to the
coloring of random graphs. The computation is first performed in random tree
ensembles, for which we underline a connection with percolation models and with
the reconstruction problem of information theory. The validity of these results
for the original random ensembles is then discussed in the framework of the
cavity method. Comment: 32 pages, 7 figures
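The notion of a frozen variable in the abstract above can be illustrated with a toy sketch. It is not the paper's method: it brute-forces a tiny hand-written formula rather than a random ensemble, and it treats the whole solution set as a single cluster, which is a simplifying assumption. The function names and the example clauses are hypothetical.

```python
from itertools import product

def solutions(n_vars, clauses):
    """Enumerate all satisfying assignments of a small CNF formula.

    Each clause is a tuple of nonzero ints: literal k means variable k is
    True, literal -k means variable k is False (variables are 1-indexed).
    """
    sols = []
    for assignment in product([False, True], repeat=n_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            sols.append(assignment)
    return sols

def frozen_variables(sols, n_vars):
    """Return {variable: value} for variables taking one value in every solution."""
    if not sols:
        return {}
    return {i + 1: sols[0][i]
            for i in range(n_vars)
            if len({s[i] for s in sols}) == 1}

# Hypothetical formula: x1, (not x1 or x2), (x3 or x4)
clauses = [(1,), (-1, 2), (3, 4)]
sols = solutions(4, clauses)
frozen = frozen_variables(sols, 4)
print(len(sols), frozen)
```

Here x1 and x2 are forced by the first two clauses, while x3 and x4 still vary across solutions, so only the first two variables count as frozen in this simplified sense.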
Modern Coding Theory: The Statistical Mechanics and Computer Science Point of View
These are the notes for a set of lectures delivered by the two authors at the
Les Houches Summer School on `Complex Systems' in July 2006. They provide an
introduction to the basic concepts in modern (probabilistic) coding theory,
highlighting connections with statistical mechanics. We also stress common
concepts with other disciplines dealing with similar problems that can be
generically referred to as `large graphical models'.
While most of the lectures are devoted to the classical channel coding
problem over simple memoryless channels, we present a discussion of more
complex channel models. We conclude with an overview of the main open
challenges in the field. Comment: Lectures at Les Houches Summer School on `Complex Systems', July
2006, 44 pages, 25 ps figures
Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources
The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis management
Efficient Decomposed Learning for Structured Prediction
Structured prediction is the cornerstone of several machine learning
applications. Unfortunately, in structured prediction settings with expressive
inter-variable interactions, exact inference-based learning algorithms, e.g.
Structural SVM, are often intractable. We present a new way, Decomposed
Learning (DecL), which performs efficient learning by restricting the inference
step to a limited part of the structured spaces. We provide characterizations
based on the structure, target parameters, and gold labels, under which DecL is
equivalent to exact learning. We then show that in real world settings, where
our theoretical assumptions may not completely hold, DecL-based algorithms are
significantly more efficient and as accurate as exact learning. Comment: ICML201
Uncertainty Detection as Approximate Max-Margin Sequence Labelling
This paper reports experiments for the CoNLL 2010 shared task on learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence level weasel detection in the Wikipedia domain, we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO-encoding. In addition to surface level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used in exploring the large set of potential features.
Our official results for Task 1 are an F1-score of 85.2 for the biological domain and 55.4 for the Wikipedia set. For Task 2, our official results are 2.1 for the entire task, with a score of 62.5 for cue detection. After resolving errors and final bugs, our final results are, for Task 1, biological: 86.0 and Wikipedia: 58.2; for Task 2, scopes: 39.6 and cues: 78.5.
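The BIO-encoding mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the half-open span convention, and the example sentence with its cue annotation are all assumptions made for the example.

```python
from typing import List, Tuple

def bio_encode(n_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Turn half-open (start, end) token spans into B/I/O labels.

    "B" marks the first token of a span, "I" the tokens inside it,
    and "O" every token outside any span.
    """
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels

# Hypothetical sentence with one single-token hedge cue ("may").
tokens = ["These", "results", "may", "indicate", "that", "X", "binds", "Y"]
cue_spans = [(2, 3)]
labels = bio_encode(len(tokens), cue_spans)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Encoding spans as one label per token is what lets cue and scope detection be posed as a sequence labelling problem, with a classifier predicting one of the three labels at each position.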