Duplicate Detection in Probabilistic Data
Collected data often contain uncertainties. Probabilistic databases have been proposed to manage such uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML); there has been no work on the integration of uncertain (especially probabilistic) source data. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process, and present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, to increase the efficiency of the duplicate detection process, we introduce search space reduction methods adapted to probabilistic data.
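The abstract does not spell out how two probabilistic representations are compared. As a purely illustrative sketch (not the authors' technique), one natural starting point is to score a candidate pair by the expected similarity of their attribute-value distributions; every name and the 0.8 threshold below are assumptions.

```python
# Hypothetical sketch: pairwise duplicate detection over probabilistic
# tuples, where each attribute is a distribution over possible values,
# e.g. {"Smith": 0.7, "Smyth": 0.3}. A pair is scored by the expected
# string similarity over both value distributions.
from difflib import SequenceMatcher

def string_sim(a: str, b: str) -> float:
    """Plain string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def expected_similarity(dist_a: dict, dist_b: dict) -> float:
    """Expected similarity over all pairs of possible attribute values."""
    return sum(pa * pb * string_sim(va, vb)
               for va, pa in dist_a.items()
               for vb, pb in dist_b.items())

def is_duplicate(t_a: dict, t_b: dict, threshold: float = 0.8) -> bool:
    """Average per-attribute expected similarity over shared attributes."""
    attrs = t_a.keys() & t_b.keys()
    score = sum(expected_similarity(t_a[k], t_b[k]) for k in attrs) / len(attrs)
    return score >= threshold

t1 = {"name": {"Smith": 0.7, "Smyth": 0.3}, "city": {"Berlin": 1.0}}
t2 = {"name": {"Smith": 0.9, "Schmidt": 0.1}, "city": {"Berlin": 0.8, "Bonn": 0.2}}
print(is_duplicate(t1, t2))
```

A search space reduction method in this setting would then prune candidate pairs before this comparison is ever run, for example by blocking on high-probability attribute values.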
Standardization and application of microsatellite markers for variety identification in tomato and wheat
The present study is part of an EU project that aims to demonstrate the technical viability of STMS markers for variety identification. As examples, two important European crop species, tomato and wheat, were chosen. Initially, about 30-40 STMS markers were used to identify a set of 20 good markers per crop and to standardise the methodology and the interpretation of the results in different laboratories. Several systems were used for the detection of STMS polymorphisms. The selected STMS markers are being tested on 500 varieties of each species, and databases are being constructed. The first comparisons of data generated by the different laboratories revealed a high degree of agreement. The causes of discrepancies between duplicate samples analysed in different laboratories, and precautions to prevent them, are discussed.
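To make the cross-laboratory comparison concrete, the sketch below flags markers whose allele-size calls differ between two laboratories for the same duplicate sample; the data layout, marker names and function name are illustrative assumptions, not the project's actual pipeline.

```python
# Hypothetical sketch: compare duplicate-sample genotypes from two
# laboratories. A profile maps each STMS marker to its called allele
# sizes in base pairs; marker names here are invented placeholders.

def find_discrepancies(profile_lab1: dict, profile_lab2: dict) -> list:
    """Return the markers whose allele calls differ between the labs."""
    shared = profile_lab1.keys() & profile_lab2.keys()
    return sorted(m for m in shared
                  if set(profile_lab1[m]) != set(profile_lab2[m]))

lab1 = {"marker_A": [152, 158], "marker_B": [201, 201]}
lab2 = {"marker_A": [152, 158], "marker_B": [199, 201]}  # sizing shift
print(find_discrepancies(lab1, lab2))  # ['marker_B']
```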
Ranking News-Quality Multimedia
News editors need to find the photos that best illustrate a news piece and fulfill news-media quality standards, while also being pressed to find the most recent photos of live events. Recently, it has become common to use social-media content in the context of news media for its unique value in terms of immediacy and quality. Consequently, the number of images to be considered and filtered through is now too large to be handled by a person. To aid the news editor in this process, we propose a framework designed to deliver high-quality, news-press-type photos to the user. The framework is composed of two parts: a ranking algorithm tuned to rank professional media highly, and a visual SPAM detection module designed to filter out low-quality media. The core ranking algorithm leverages aesthetic, social and deep-learning semantic features. Evaluation showed that the proposed framework is effective at finding high-quality photos (true-positive rate), achieving a retrieval MAP of 64.5% and a classification precision of 70%.
Comment: To appear in ICMR'1
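The abstract describes the two-stage design (filter, then rank) without its combination rule. The following is a minimal sketch under stated assumptions: the feature names, the spam threshold and the linear weights are all invented for illustration and are not the paper's model.

```python
# Hypothetical sketch of the two-stage pipeline: a visual-SPAM filter
# followed by a ranker combining aesthetic, social and deep-learning
# semantic scores. Weights and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Photo:
    url: str
    aesthetic: float   # aesthetic-quality score in [0, 1]
    social: float      # social-signal score in [0, 1]
    semantic: float    # semantic relevance to the piece in [0, 1]
    spam_score: float  # output of the visual-SPAM classifier in [0, 1]

def is_spam(photo: Photo, threshold: float = 0.5) -> bool:
    """Stage 1: drop low-quality media flagged by the SPAM module."""
    return photo.spam_score >= threshold

def rank_score(photo: Photo) -> float:
    """Stage 2: linear combination of feature scores (assumed weights)."""
    return 0.4 * photo.aesthetic + 0.2 * photo.social + 0.4 * photo.semantic

def rank_photos(photos: list) -> list:
    """Filter, then rank professional-looking media highly."""
    return sorted((p for p in photos if not is_spam(p)),
                  key=rank_score, reverse=True)
```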
Localization Recall Precision (LRP): A New Performance Metric for Object Detection
Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of a direct measure of bounding-box localization accuracy. In this paper, we propose 'Localization Recall Precision (LRP) Error', a new metric specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the 'Optimal LRP', the minimum achievable LRP error, representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, Optimal LRP determines the 'best' confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that, for state-of-the-art (SOTA) object detectors, Optimal LRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector that uses a SOTA still-image object detector, and show that the class-specific optimized thresholds increase accuracy relative to the common approach of using a general threshold for all classes. At https://github.com/cancam/LRP we provide source code that can compute LRP for the PASCAL VOC and MSCOCO datasets; it can easily be adapted to other datasets as well.
Comment: to appear in ECCV 201
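As a hedged illustration of the metric's structure, the sketch below computes LRP for one class at a fixed confidence threshold using the commonly cited formulation, LRP = [Σ_TP (1 − IoU_i)/(1 − τ) + N_FP + N_FN] / (N_TP + N_FP + N_FN), where τ is the IoU threshold for counting a detection as a true positive; treat the linked repository as the authoritative definition.

```python
# Hedged sketch of the LRP error for a single class at one confidence
# threshold. tp_ious holds the IoU of each matched (true-positive)
# detection; tau is the IoU validity threshold. Not the official code.

def lrp_error(tp_ious: list, n_fp: int, n_fn: int, tau: float = 0.5) -> float:
    n_tp = len(tp_ious)
    total = n_tp + n_fp + n_fn
    if total == 0:
        return 0.0
    localization = sum((1.0 - iou) / (1.0 - tau) for iou in tp_ious)
    return (localization + n_fp + n_fn) / total

# A perfect detector scores 0; pure FPs/FNs score 1.
print(lrp_error(tp_ious=[0.9, 0.75, 0.6], n_fp=1, n_fn=2))
```

Optimal LRP would then be the minimum of this quantity over the detector's confidence-score thresholds, evaluated per class.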
Maximum Production Of Transmission Messages Rate For Service Discovery Protocols
Minimizing the number of dropped User Datagram Protocol (UDP) messages in a network is regarded as a challenge by researchers. This issue poses serious problems for many protocols, particularly those that depend on sending messages as part of their strategy, such as service discovery protocols. This paper proposes and evaluates an algorithm to predict the minimum period of time required between two or more consecutive messages, and suggests minimum queue sizes for the routers, in order to manage the traffic and minimise the number of dropped messages caused by congestion, queue overflow, or both. The algorithm was applied to the Universal Plug and Play (UPnP) protocol using the ns-2 simulator, and was tested with the routers connected in two configurations, centralized and decentralized. The message length and the bandwidth of the links among the routers were taken into consideration. The results show an improvement in the number of dropped messages among the routers.
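The abstract does not reproduce the prediction algorithm itself; the sketch below shows only the first-order arithmetic such an algorithm builds on: a message's serialization delay on the bottleneck link lower-bounds the spacing between consecutive messages if the router queue is not to grow. The names and example figures are assumptions.

```python
# Hypothetical sketch: lower bound on the period between consecutive
# UDP messages so that a bottleneck router's queue does not build up.
# This is the standard serialization-delay bound, not the paper's
# actual prediction algorithm.

def min_inter_message_period(msg_bytes: int, bottleneck_bps: float) -> float:
    """Seconds to serialize one message on the slowest link; sending
    faster than this guarantees queue growth at that router."""
    return (msg_bytes * 8) / bottleneck_bps

# Example: 512-byte discovery messages over a 1 Mbit/s link.
period = min_inter_message_period(512, 1_000_000)
print(f"space messages at least {period * 1000:.2f} ms apart")  # ~4.10 ms
```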
An automated wrapper-based approach to the design of dependable software
The design of dependable software systems invariably comprises two main activities: (i) the design of dependability mechanisms, and (ii) the location of dependability mechanisms. It has been shown that these activities are intrinsically difficult. In this paper, we propose an automated wrapper-based methodology to circumvent the problems associated with the design and location of dependability mechanisms. To achieve this, we replicate important variables so that they can be used as part of standard, efficient dependability mechanisms; these well-understood mechanisms are then deployed in all relevant locations. To validate the proposed methodology, we apply it to three complex software systems, evaluating the dependability enhancement and execution overhead in each case. The results demonstrate that the system failure rate of a wrapped software system can be several orders of magnitude lower than that of an unwrapped equivalent.
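The abstract names variable replication as the core wrapper mechanism without detailing it. Below is a minimal sketch of one standard realization, triple redundancy with majority voting on read; the class and its interface are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: a wrapper that replicates an important variable
# and masks a single corrupted copy by majority vote, the kind of
# well-understood mechanism the methodology deploys automatically.

class ReplicatedVar:
    def __init__(self, value):
        self._copies = [value, value, value]  # triple redundancy

    def write(self, value):
        self._copies = [value, value, value]

    def read(self):
        """Majority vote over the copies, then scrub the odd one out."""
        a, b, c = self._copies
        majority = a if a in (b, c) else b
        self._copies = [majority] * 3
        return majority

v = ReplicatedVar(42)
v._copies[1] = 999   # simulate corruption of one replica
print(v.read())      # 42: the single fault is masked
```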