Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration processes is important for improving data quality in such integrated systems. This article presents the state of the art in these research areas, identifies their common ground, and shows how to integrate information extraction and data integration under uncertainty management cover
Multi-camera Realtime 3D Tracking of Multiple Flying Animals
Automated tracking of animal movement allows analyses that would not
otherwise be possible by providing great quantities of data. The additional
capability of tracking in realtime - with minimal latency - opens up the
experimental possibility of manipulating sensory feedback, thus allowing
detailed explorations of the neural basis for control of behavior. Here we
describe a new system capable of tracking the position and body orientation of
animals such as flies and birds. The system operates with less than 40 msec
latency and can track multiple animals simultaneously. To achieve these
results, a multi target tracking algorithm was developed based on the Extended
Kalman Filter and the Nearest Neighbor Standard Filter data association
algorithm. In one implementation, an eleven camera system is capable of
tracking three flies simultaneously at 60 frames per second using a gigabit
network of nine standard Intel Pentium 4 and Core 2 Duo computers. This
manuscript presents the rationale and details of the algorithms employed and
shows three implementations of the system. An experiment was performed using
the tracking system to measure the effect of visual contrast on the flight
speed of Drosophila melanogaster. At low contrasts, speed is more variable and
faster on average than at high contrasts. Thus, the system is already a useful
tool to study the neurobiology and behavior of freely flying animals. If
combined with other techniques, such as `virtual reality'-type computer
graphics or genetic manipulation, the tracking system would offer a powerful
new way to investigate the biology of flying animals.
Comment: 18 pages with 9 figures
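The data association step named in this abstract can be illustrated with a minimal sketch. It assumes that each track already has a predicted 3D position (in the paper this would come from the Extended Kalman Filter, which is not reproduced here) and that plain Euclidean distance with a fixed gate is an acceptable stand-in for the distance measure. The function name, the gate value, and the rule that an observation may be claimed only once are illustrative assumptions, not details taken from the paper.

```python
import math


def nearest_neighbor_associate(predicted, observations, gate):
    """Match each predicted track position to its closest unused observation.

    predicted:    dict mapping track id -> (x, y, z) predicted position
    observations: list of (x, y, z) measured positions
    gate:         largest distance still accepted as a match (assumed value)
    Returns a dict mapping track id -> observation index, or None if unmatched.
    """
    assignments = {}
    used = set()  # observation indices already claimed by an earlier track
    for track_id, pred in predicted.items():
        best_idx, best_dist = None, gate
        for idx, obs in enumerate(observations):
            if idx in used:
                continue
            d = math.dist(pred, obs)  # Euclidean distance (Python 3.8+)
            if d <= best_dist:
                best_idx, best_dist = idx, d
        if best_idx is not None:
            used.add(best_idx)
        assignments[track_id] = best_idx
    return assignments


# Hypothetical predicted positions for two tracked animals and three detections.
predicted = {"fly_a": (0.0, 0.0, 0.0), "fly_b": (1.0, 1.0, 1.0)}
observations = [(1.1, 0.9, 1.0), (0.05, 0.0, 0.1), (5.0, 5.0, 5.0)]
matches = nearest_neighbor_associate(predicted, observations, gate=0.5)
print(matches)
```

In a full tracker, each matched observation would then feed the filter update for its track, and a track left unmatched would simply carry its prediction forward for that frame.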
Digital-forensics based pattern recognition for discovering identities in electronic evidence
With the pervasiveness of computers and mobile devices, digital forensics is becoming more important in law enforcement. Detectives increasingly depend on the scarce support of digital specialists, which impedes the efficiency of criminal investigations. This paper proposes an algorithm to extract, merge, and rank identities encountered in electronic evidence during processing. Two experiments are described, demonstrating that our approach can assist with the identification of frequently occurring identities so that investigators can prioritize the investigation of evidence units accordingly
JigsawNet: Shredded Image Reassembly using Convolutional Neural Network and Loop-based Composition
This paper proposes a novel algorithm to reassemble an arbitrarily shredded
image to its original status. Existing reassembly pipelines commonly consist of
a local matching stage and a global composition stage. In the local stage, a
key challenge in fragment reassembly is to reliably compute and identify
correct pairwise matching, for which most existing algorithms use handcrafted
features, and hence, cannot reliably handle complicated puzzles. We build a
deep convolutional neural network to detect the compatibility of a pairwise
stitching, and use it to prune computed pairwise matches. To improve the
network efficiency and accuracy, we transfer the calculation of CNN to the
stitching region and apply a boost training strategy. In the global composition
stage, we modify the commonly adopted greedy edge selection strategies to two
new loop closure based searching algorithms. Extensive experiments show that
our algorithm significantly outperforms existing methods on solving various
puzzles, especially those challenging ones with many fragment pieces
Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of applying them to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution
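The idea of binning by covarying coverage can be illustrated with a toy sketch: contigs whose coverage rises and falls together across samples are grouped into the same bin. This is not any specific tool reviewed in the article. The greedy grouping rule, the Pearson correlation threshold of 0.95, and the contig names and coverage values are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile carries no covariation signal
    return cov / (norm_x * norm_y)


def bin_by_covariation(coverage, threshold=0.95):
    """Greedily group contigs whose coverage profiles correlate strongly.

    coverage:  dict mapping contig name -> list of per-sample coverage values
    threshold: minimum correlation with a bin's first contig (assumed value)
    """
    bins = []
    for contig, profile in coverage.items():
        for members in bins:
            representative = coverage[members[0]]
            if pearson(profile, representative) >= threshold:
                members.append(contig)
                break
        else:
            bins.append([contig])  # no existing bin matched; start a new one
    return bins


# Hypothetical coverage of four contigs across four samples.
coverage = {
    "contig_1": [10, 20, 5, 40],
    "contig_2": [11, 19, 6, 42],
    "contig_3": [30, 5, 25, 2],
    "contig_4": [28, 6, 27, 3],
}
bins = bin_by_covariation(coverage)
print(bins)
```

With a single sample every profile has length one and no correlation can be computed, which is one way to see why coverage from multiple datasets adds information to the binning step.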
Objective Classification of Galaxy Spectra using the Information Bottleneck Method
A new method for classification of galaxy spectra is presented, based on a
recently introduced information theoretical principle, the `Information
Bottleneck'. For any desired number of classes, galaxies are classified such
that the information content about the spectra is maximally preserved. The
result is classes of galaxies with similar spectra, where the similarity is
determined via a measure of information. We apply our method to approximately
6000 galaxy spectra from the ongoing 2dF redshift survey, and a mock-2dF
catalogue produced by a Cold Dark Matter-based semi-analytic model of galaxy
formation. We find a good match between the mean spectra of the classes found
in the data and in the models. For the mock catalogue, we find that the classes
produced by our algorithm form an intuitively sensible sequence in terms of
physical properties such as colour, star formation activity, morphology, and
internal velocity dispersion. We also show the correlation of the classes with
the projections resulting from a Principal Component Analysis.
Comment: submitted to MNRAS, 17 pages, LaTeX, with 14 figures embedded
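For orientation, the Information Bottleneck principle is usually stated as the minimization of a single functional. The form below is the standard general statement, not a formula quoted from this abstract, and the mapping of symbols to this application is an assumption: X stands for the galaxies, Y for the spectral bins, and T for the class assigned to each galaxy.

```latex
% Standard Information Bottleneck functional (general form).
% Assumed mapping for this application: X = galaxy, Y = spectral bin, T = class.
\min_{p(t \mid x)} \; \mathcal{L}\bigl[p(t \mid x)\bigr] = I(X;T) - \beta \, I(T;Y),
\qquad
I(T;Y) = \sum_{t,\,y} p(t,y) \, \log \frac{p(t,y)}{p(t)\,p(y)}
```

Here I(·;·) denotes mutual information and β sets the trade-off between compressing X into T and keeping T informative about Y. Under this reading, fixing the number of classes and preserving as much information about the spectra as possible corresponds to making I(T;Y) as large as possible for a limited number of values of T.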
A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web
Over the past decade, rapid advances in web technologies, coupled with
innovative models of spatial data collection and consumption, have generated a
robust growth in geo-referenced information, resulting in spatial information
overload. Increasing 'geographic intelligence' in traditional text-based
information retrieval has become a prominent approach to respond to this issue
and to fulfill users' spatial information needs. Numerous efforts in the
Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the
Linking Open Data initiative have converged in a constellation of open
knowledge bases, freely available online. In this article, we survey these open
knowledge bases, focusing on their geospatial dimension. Particular attention
is devoted to the crucial issue of the quality of geo-knowledge bases, as well
as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic
Network, is outlined as our contribution to this area. Research directions in
information integration and Geographic Information Retrieval (GIR) are then
reviewed, with a critical discussion of their current limitations and future
prospects