EFFICIENT DUPLICATE DETECTION USING PROGRESSIVE ALGORITHMS
Duplicate detection is the process of identifying multiple representations of the same real-world entities. Today, duplicate detection methods need to process ever larger datasets in ever shorter time, making it increasingly difficult to maintain the quality of a dataset. We present two novel, progressive duplicate detection algorithms that significantly increase the ability to find duplicates when execution time is limited: they maximize the gain of the overall process within the time available by reporting most results much earlier than traditional approaches. Comprehensive experiments show that our progressive algorithms can double the efficiency over time of traditional duplicate detection and significantly improve upon related work.
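The idea of reporting most duplicates early can be illustrated with a progressive sorted-neighborhood scheme: records are sorted once by a key, then neighboring records are compared at increasing rank distance, so the closest (most promising) pairs are emitted first. The following is a minimal sketch, not the authors' implementation; the function names, the `difflib`-based similarity, and the threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Simple string similarity as a stand-in for a real matcher.
    return SequenceMatcher(None, a, b).ratio()

def progressive_sorted_neighborhood(records, key, threshold=0.85, max_window=5):
    """Yield likely duplicate pairs early: sort once, then compare
    neighbors at rank distance 1, 2, ..., so the most promising
    pairs are reported before distant ones (hypothetical sketch)."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    for dist in range(1, max_window):
        for pos in range(len(order) - dist):
            i, j = order[pos], order[pos + dist]
            if similarity(key(records[i]), key(records[j])) >= threshold:
                yield records[i], records[j]
```

Because the generator yields pairs as it finds them, stopping it after a time budget still returns the highest-confidence duplicates found so far, which is the core progressive idea.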
A Progressive Technique for Duplicate Detection Evaluating Multiple Data Using Genetic Algorithm with Real World Objects
In this paper we present an analysis of progressive duplicate record detection on real-world data containing redundant records in a database. Duplicate detection is the technique of identifying all instances of multiple representations of the same real-world objects, for example in customer relationship management or data mining. A representative case is customer relationship management, where a company loses money by sending multiple catalogues to the same person, which also lowers customer satisfaction. Another application is data mining, where correct input data is essential to build useful reports that form the basis for decisions. In this paper we study the progressive duplicate detection algorithm, implemented with the help of MapReduce, to identify duplicate data and delete those duplicate records.
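A MapReduce-style duplicate detection job can be sketched in two phases: a map phase that groups records under a blocking key, and a reduce phase that compares records within each group. The sketch below is a single-process illustration under assumed names (`map_phase`, `reduce_phase`, and a normalization-based matcher), not the paper's actual pipeline.

```python
from collections import defaultdict

def map_phase(records, blocking_key):
    """Map: emit (key, record) pairs so that only records sharing a
    blocking key are ever compared (hypothetical sketch)."""
    groups = defaultdict(list)
    for rec in records:
        groups[blocking_key(rec)].append(rec)
    return groups

def reduce_phase(groups):
    """Reduce: within each group, detect records that normalize to the
    same form; a real system would use a fuzzier matcher here."""
    duplicates = []
    for recs in groups.values():
        seen = {}
        for rec in recs:
            norm = " ".join(rec.lower().split())
            if norm in seen:
                duplicates.append((seen[norm], rec))
            else:
                seen[norm] = rec
    return duplicates
```

For the customer-catalogue example, blocking on the customer's surname keeps the quadratic pairwise comparison confined to small groups, which is what makes the map/reduce split pay off at scale.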
MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities
Entity Resolution (ER) aims to identify different descriptions in various
Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the
Variety, Volume and Veracity of entity descriptions published in the Web of
Data. To address them, we propose the MinoanER framework that simultaneously
fulfills full automation, support of highly heterogeneous entities, and massive
parallelization of the ER process. MinoanER leverages a token-based similarity
of entities to define a new metric that derives the similarity of neighboring
entities from the most important relations, as they are indicated only by
statistics. A composite blocking method is employed to capture different
sources of matching evidence from the content, neighbors, or names of entities.
The search space of candidate pairs for comparison is compactly abstracted by a
novel disjunctive blocking graph and processed by a non-iterative, massively
parallel matching algorithm that consists of four generic, schema-agnostic
matching rules that are quite robust with respect to their internal
configuration. We demonstrate that the effectiveness of MinoanER is comparable
to existing ER tools over real KBs exhibiting low Variety, but it outperforms
them significantly when matching KBs with high Variety. Comment: Presented at EDBT 2019.
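The schema-agnostic, token-based blocking described above can be illustrated with a minimal token-blocking sketch: every token appearing in any attribute value becomes a block, and entities sharing a token become candidate matches. This is a simplified illustration under assumed names (`token_blocking`, `candidate_pairs`), not MinoanER's composite blocking method, which also exploits neighbors and names.

```python
from collections import defaultdict
from itertools import combinations

def token_blocking(entities):
    """Schema-agnostic token blocking: index each entity under every
    token of every attribute value, ignoring attribute names entirely."""
    blocks = defaultdict(set)
    for eid, attrs in entities.items():
        for value in attrs.values():
            for token in str(value).lower().split():
                blocks[token].add(eid)
    return blocks

def candidate_pairs(blocks):
    """The search space: the union of pairwise combinations within
    each block, to be pruned later by matching rules."""
    pairs = set()
    for eids in blocks.values():
        pairs.update(combinations(sorted(eids), 2))
    return pairs
```

Because attribute names are never consulted, entities from highly heterogeneous KBs (e.g. `name` vs. `label`) still land in the same blocks whenever their values share tokens, which is the property that makes the approach robust to high Variety.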