
345 research outputs found

Indeterministic Handling of Uncertain Decisions in Duplicate Detection

Author: Keulen Maurice van
Panse Fabian
Ritter Norbert
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2010

In current research, duplicate detection is usually treated as a deterministic process in which tuples are either declared duplicates or not. Most often, however, it is not completely clear whether two tuples represent the same real-world entity. Deterministic approaches ignore this uncertainty, which can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impact of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic, and human effort can be reduced to a large extent. Unfortunately, a fully indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.
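One way to picture the semi-indeterministic idea in the abstract above is a two-threshold rule: clear cases are decided deterministically, and only pairs in the middle band are kept as uncertain with an attached probability. The sketch below is an illustration under assumptions, not the method from the paper: the `Decision` type, the `decide` function, the threshold values, and the use of the similarity score as the duplicate probability are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    label: str                    # "duplicate", "distinct", or "uncertain"
    probability: Optional[float]  # set only for uncertain pairs


def decide(similarity: float, lower: float = 0.4, upper: float = 0.8) -> Decision:
    """Classify a tuple pair by its similarity score in [0, 1].

    `lower` and `upper` are hypothetical thresholds, not values from the paper.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if similarity >= upper:
        return Decision("duplicate", None)   # decided deterministically
    if similarity <= lower:
        return Decision("distinct", None)    # decided deterministically
    # Middle band: keep both possible worlds and attach a probability that
    # the pair is a duplicate (assumed here to be the similarity itself).
    return Decision("uncertain", similarity)
```

Narrowing the band between `lower` and `upper` reduces the number of decisions handled indeterministically, which is the trade-off between cost and robustness that the abstract describes.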

University of Twente Research Information

Accommodations deduplication

Author: Pérez Sena Francis Damián
Publication date: 25/09/2018

The problem addressed is accommodation deduplication. Deduplication is a special case of entity resolution (ER) that consists of grouping different representations of the same entity, usually coming from different sources. It is a complex process that requires several phases, the most common being blocking and pair resolution. In addition to these, we introduce a new phase, clustering, that was not considered in previous work. We aim to build a framework that covers the different phases and to design a clustering strategy that maximizes precision with the maximal possible recall.
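The three phases named in the abstract above (blocking, pair resolution, clustering) can be sketched as a small pipeline. This is a minimal illustration under assumptions, not the framework from the thesis: the record fields `name` and `city`, the blocking key, the similarity threshold, and the use of transitive closure (union-find) as the clustering step are all hypothetical choices.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    # Hypothetical blocking key: the lowercased city name.
    return record["city"].strip().lower()


def is_match(a, b, threshold=0.85):
    # Hypothetical pair resolution: string similarity of the two names.
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def deduplicate(records):
    """Return clusters of record indices that are assumed to be duplicates."""
    # Phase 1, blocking: only records sharing a key are ever compared.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    # Union-find structure used by the clustering phase.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Phase 2, pair resolution: compare candidate pairs inside each block.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if is_match(records[i], records[j]):
                parent[find(i)] = find(j)

    # Phase 3, clustering: group records by transitive closure of matches.
    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

Transitive closure favors recall, since a single wrong match merges two clusters. A precision-oriented strategy such as the one the abstract aims for would need a stricter merge criterion.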

Archivo Digital para la Docencia y la Investigación

Indeterministic Handling of Uncertain Decisions in Deduplication

Author: Batini C.
Benjelloun O.
Dechter R.
Fabian Panse
Koch C.
Koudas N.
Maurice van Keulen
Norbert Ritter
Ravikumar P. D.
Sen P.
Wang Y. R.
Widom J.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Deduplication potential of HPC applications' checkpoints

Author: Andre Brinkmann
Federico Padua
Jurgen Kaiser
Lars Nagel
Ramy Gad
Tim Suss
Publication date: 01/01/2016

HPC systems contain an increasing number of components, decreasing the mean time between failures. Checkpoint mechanisms help to overcome such failures for long-running applications. A viable solution to remove the resulting pressure from the I/O backends is to deduplicate the checkpoints. However, there is little knowledge about the potential to save I/Os for HPC applications by using deduplication within the checkpointing process. In this paper, we perform a broad study of the deduplication behavior of HPC application checkpointing and its impact on system design.
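To make "deduplication potential" concrete, one simple measure is the fraction of checkpoint chunks that would not need to be written again because an identical chunk has already been stored. The sketch below assumes fixed-size chunking and SHA-256 fingerprints. The abstract does not say which chunking or hashing scheme the study uses, and the names `chunk_hashes` and `dedup_ratio` are hypothetical.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int = 4096):
    """Yield the SHA-256 digest of each fixed-size chunk of `data`."""
    for offset in range(0, len(data), chunk_size):
        yield hashlib.sha256(data[offset:offset + chunk_size]).digest()


def dedup_ratio(checkpoints, chunk_size: int = 4096) -> float:
    """Fraction of chunks that duplicate an earlier chunk (0.0 if no chunks)."""
    seen = set()   # fingerprints of chunks that have been stored once
    total = 0      # number of chunks across all checkpoints
    for data in checkpoints:
        for digest in chunk_hashes(data, chunk_size):
            total += 1
            seen.add(digest)
    return 0.0 if total == 0 else 1.0 - len(seen) / total
```

For two checkpoints of two chunks each that share one identical chunk, three of the four chunks are unique, so the ratio is 0.25.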

Loughborough University Institutional Repository

Accommodations deduplication

Author: Pérez Sena Francis Damián
Publication date: 01/01/2018

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Undermining User Privacy on Mobile Devices Using AI

Author: Abadi Martin
Berk Gü
Diao Wenrui
Genkin Daniel
Green Marc
Gruss Daniel
Hu Wei-Ming
Lipp Moritz
Maghrebi Houssem
Martinasek Zdenek
Prouff Emmanuel
Schuster Roei
Schwarz Michael
Varadarajan Venkatanathan
Vila Pepe
Yarom Yuval
Publication date: 01/01/2019

Over the past years, the literature has shown that attacks exploiting the microarchitecture of modern processors pose a serious threat to the privacy of mobile phone users. This is because applications leave distinct footprints in the processor, which can be used by malware to infer user activities. In this work, we show that these inference attacks are considerably more practical when combined with advanced AI techniques. In particular, we focus on profiling the activity in the last-level cache (LLC) of ARM processors. We employ a simple Prime+Probe based monitoring technique to obtain cache traces, which we classify with Deep Learning methods including Convolutional Neural Networks. We demonstrate our approach on an off-the-shelf Android phone by launching a successful attack from an unprivileged, zero-permission app in well under a minute. The app thereby detects running applications with an accuracy of 98% and reveals opened websites and streaming videos by monitoring the LLC for at most 6 seconds. This is possible because Deep Learning compensates for measurement disturbances stemming from the inherently noisy LLC monitoring and unfavorable cache characteristics such as random line replacement policies. In summary, our results show that, thanks to advanced AI techniques, inference attacks are becoming alarmingly easy to implement and execute in practice. This once more calls for countermeasures that confine microarchitectural leakage and protect mobile phone applications, especially those valuing the privacy of their users.

arXiv.org e-Print Archive

Crossref

OpenMETU (Middle East Technical University)

XML Matchers: approaches and challenges

Author: Agreste Santa
De Meo Pasquale
Ferrara Emilio
Ursino Domenico
Publication venue: 'Elsevier BV'
Publication date: 10/07/2014

Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was investigated largely for classical database models (e.g., E/R schemas, relational databases, etc.). However, in recent years, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them to DTDs/XSDs; they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs affect the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers. Comment: 34 pages, 8 tables, 7 figures

arXiv.org e-Print Archive

IRIS Università Politecnica delle Marche