Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be combined with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is important for improving the quality of the data in such integrated systems. This article presents the state of the art in these research areas, identifies their common ground, and shows how to integrate information extraction and data integration under the cover of uncertainty management.
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces Memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data modeling technology rooted in the second
quantization formulation of quantum mechanics. It is an all-purpose incremental
and unsupervised data storage and retrieval system which can be applied to all
types of signal or data, structured or unstructured, textual or not. mARC can
be applied to a wide range of information classification and retrieval
problems such as e-Discovery or contextual navigation. It can also be formulated in
the artificial life framework, also known as Conway's "Game of Life" theory. In contrast
to Conway's approach, the objects evolve in a massively multidimensional space.
In order to start evaluating the potential of mARC we have built a mARC-based
Internet search engine demonstrator with contextual functionality. We compare
the behavior of the mARC demonstrator with Google search both in terms of
performance and relevance. In the study we find that the mARC search engine
demonstrator outperforms Google search by an order of magnitude in response
time while providing more relevant results for some classes of queries.
Enhancing hyperspectral image unmixing with spatial correlations
This paper describes a new algorithm for hyperspectral image unmixing. Most
of the unmixing algorithms proposed in the literature do not take into account
the possible spatial correlations between the pixels. In this work, a Bayesian
model is introduced to exploit these correlations. The image to be unmixed is
assumed to be partitioned into regions (or classes) where the statistical
properties of the abundance coefficients are homogeneous. A Markov random field
is then proposed to model the spatial dependency of the pixels within any
class. Conditionally upon a given class, each pixel is modeled by using the
classical linear mixing model with additive white Gaussian noise. This strategy
is investigated for the well-known linear mixing model. For this model, the
posterior distributions of the unknown parameters and hyperparameters allow
one to infer the parameters of interest. These parameters include the
abundances for each pixel, the means and variances of the abundances for each
class, as well as a classification map indicating the classes of all pixels in
the image. To overcome the complexity of the posterior distribution of
interest, we consider Markov chain Monte Carlo methods that generate samples
distributed according to the posterior of interest. The generated samples are
then used for parameter and hyperparameter estimation. The accuracy of the
proposed algorithms is illustrated on synthetic and real data.
Comment: Manuscript accepted for publication in IEEE Trans. Geoscience and
Remote Sensing
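As a minimal illustration of the linear mixing model named in the abstract above, the sketch below synthesizes one pixel spectrum as an abundance-weighted sum of endmember spectra plus white Gaussian noise. The function name `mix_pixel`, the endmember values, and the abundances are made up for illustration. The non-negativity and sum-to-one checks reflect the usual constraints on abundances and are not stated in the abstract. The Bayesian estimation and the Markov random field prior described in the paper are not reproduced here.

```python
import random


def mix_pixel(endmembers, abundances, noise_std=0.0, rng=None):
    """Return y = sum_r a_r * m_r + n for one pixel (linear mixing model).

    endmembers: list of R spectra, each a list of L band values (hypothetical data).
    abundances: list of R fractions, assumed non-negative and summing to one.
    noise_std:  standard deviation of the additive white Gaussian noise.
    """
    if len(endmembers) != len(abundances):
        raise ValueError("one abundance per endmember is required")
    if any(a < 0 for a in abundances) or abs(sum(abundances) - 1.0) > 1e-9:
        raise ValueError("abundances must be non-negative and sum to one")
    rng = rng or random.Random(0)
    n_bands = len(endmembers[0])
    pixel = []
    for band in range(n_bands):
        # Noise-free mixture for this band.
        clean = sum(a * m[band] for a, m in zip(abundances, endmembers))
        # Independent zero-mean Gaussian noise per band.
        pixel.append(clean + rng.gauss(0.0, noise_std))
    return pixel


# Illustrative values only: two endmembers observed in three spectral bands.
endmembers = [[0.25, 0.5, 0.75], [1.0, 0.5, 0.0]]
noiseless = mix_pixel(endmembers, [0.5, 0.5])
noisy = mix_pixel(endmembers, [0.5, 0.5], noise_std=0.01)
```

Unmixing is the inverse problem: given `noisy` and the endmembers, recover the abundances.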
Context and Keyword Extraction in Plain Text Using a Graph Representation
Document indexing is an essential task performed by archivists or automatic
indexing tools. To retrieve documents relevant to a query, the keywords describing
each document have to be carefully chosen. Archivists have to identify the
right topic of a document before starting to extract the keywords. For an
archivist indexing specialized documents, experience plays an important role.
But indexing documents on different topics is much harder. This article
proposes an innovative method for an indexing support system. This system takes
as input an ontology and a plain text document and provides as output
contextualized keywords of the document. The method has been evaluated by
exploiting Wikipedia's category links as a termino-ontological resource.
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Joint extraction of entities and relations is an important task in
information extraction. To tackle this problem, we first propose a novel
tagging scheme that can convert the joint extraction task to a tagging problem.
Then, based on our tagging scheme, we study different end-to-end models to
extract entities and their relations directly, without identifying entities and
relations separately. We conduct experiments on a public dataset produced by
a distant supervision method, and the experimental results show that the
tagging-based methods are better than most of the existing pipelined and joint
learning methods. Moreover, the end-to-end model proposed in this paper achieves
the best results on the public dataset.
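To make the idea of casting joint extraction as a tagging problem concrete, here is a small sketch of how per-token tags could be decoded into relation triples. The abstract does not spell out the tag format, so everything here is an assumption for illustration: tags are either `O` or `<position>-<relation>-<role>` with position in B/I/E/S and role 1 or 2, a role-1 entity is paired with the nearest role-2 entity of the same relation, and the function name `decode_tags`, the relation label `BornIn`, and the example sentence are hypothetical.

```python
def decode_tags(tokens, tags):
    """Turn per-token tags into (entity1, relation, entity2) triples.

    Assumed tag format (hypothetical): "O", or "<pos>-<relation>-<role>"
    with pos in B/I/E/S and role in 1/2. Relation names must not contain "-".
    """
    spans = []      # finished entities: (start_index, text, relation, role)
    current = None  # entity being built: [start_index, words, relation, role]
    for i, (token, tag) in enumerate(zip(tokens, tags)):
        if tag == "O":
            current = None
            continue
        pos, relation, role = tag.split("-")
        if pos in ("B", "S"):
            current = [i, [token], relation, role]
        elif current and current[2:] == [relation, role]:
            current[1].append(token)
        else:
            current = None  # malformed sequence: drop the fragment
            continue
        if pos in ("E", "S"):
            spans.append((current[0], " ".join(current[1]), relation, role))
            current = None

    triples = []
    for start, text, relation, role in spans:
        if role != "1":
            continue
        # Candidate second arguments: role-2 entities with the same relation.
        partners = [s for s in spans if s[2] == relation and s[3] == "2"]
        if partners:
            nearest = min(partners, key=lambda s: abs(s[0] - start))
            triples.append((text, relation, nearest[1]))
    return triples


# Illustrative sentence and tags only.
tokens = ["Ada", "Lovelace", "was", "born", "in", "London"]
tags = ["B-BornIn-1", "E-BornIn-1", "O", "O", "O", "S-BornIn-2"]
triples = decode_tags(tokens, tags)
```

With tags of this kind, a single sequence labeler emits entities and their relation in one pass, which is what removes the need for separate entity and relation steps.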