425 research outputs found

    On Demand Quality of web services using Ranking by multi criteria

    In the Web database scenario, the records to match are highly query-dependent, since they can only be obtained through online queries. Moreover, they are only a partial and biased portion of all the data in the source Web databases. Consequently, hand-coding or offline-learning approaches are not appropriate for two reasons. First, the full data set is not available beforehand, and therefore, good representative data for training are hard to obtain. Second, and most importantly, even if good representative data are found and labeled for learning, the rules learned on the representatives of a full data set may not work well on a partial and biased part of that data set. Keywords: SOA, Web Services, Network

    INDEPENDENT DE-DUPLICATION IN DATA CLEANING

    Many organizations collect large amounts of data to support their business and decision-making processes. The data originate from a variety of sources that may have inherent data-quality problems. These problems become more pronounced when heterogeneous data sources are integrated (for example, in data warehouses). A major problem that arises from integrating different databases is the existence of duplicates. The challenge of de-duplication is identifying "equivalent" records within the database. Most published research in de-duplication proposes techniques that rely heavily on domain knowledge. A few propose solutions that are partially domain-independent. This paper identifies two levels of domain-independence in de-duplication, namely domain-independence at the attribute level and domain-independence at the record level. The paper then proposes a positional algorithm that achieves domain-independent de-duplication at the attribute level, and a technique for field weighting by data profiling which, when used with the positional algorithm, achieves domain-independence at the record level. Experiments show that the proposed techniques achieve more accurate de-duplication than existing algorithms.
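
    The abstract does not give the positional algorithm's internals, so the following is only a minimal sketch of the two levels it distinguishes: a domain-independent attribute-level comparison and record-level weighting derived from simple data profiling. The position-wise character comparison, the distinctiveness-based weights, and all function names are illustrative assumptions, not the paper's actual method.

```python
def attribute_similarity(a, b):
    """Attribute-level, domain-independent similarity in [0, 1]:
    position-wise character agreement with a token-overlap fallback
    (an illustrative stand-in for the paper's positional algorithm)."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    positional = sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))
    tokens_a, tokens_b = set(a.split()), set(b.split())
    token_overlap = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    return max(positional, token_overlap)


def profile_field_weights(records):
    """Record-level weighting by data profiling: fields with more distinct
    values are more discriminative and receive a larger weight."""
    fields = list(records[0].keys())
    raw = {f: len({r[f] for r in records}) / len(records) for f in fields}
    total = sum(raw.values()) or 1.0
    return {f: w / total for f, w in raw.items()}


def record_similarity(r1, r2, weights):
    """Weighted sum of attribute similarities over all profiled fields."""
    return sum(w * attribute_similarity(r1[f], r2[f]) for f, w in weights.items())


# Two records that likely describe the same customer despite formatting differences.
records = [
    {"name": "Jon Smith", "city": "New York", "phone": "212-555-0199"},
    {"name": "John Smith", "city": "new york", "phone": "2125550199"},
]
weights = profile_field_weights(records)
print(record_similarity(records[0], records[1], weights))
# Higher scores suggest duplicates; the decision threshold would be tuned empirically.
```

    Distinctiveness is only one plausible profiling signal; the paper's own field-weighting technique may use different statistics.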

    Improving Database Quality through Eliminating Duplicate Records

    Redundant or duplicate data are among the most troublesome problems in database management and applications. Approximate field matching is the key to resolving the problem by identifying semantically equivalent string values in syntactically different representations. This paper considers token-based solutions and proposes a general field matching framework that generalizes the field matching problem across different domains. By introducing the concept of String Matching Points (SMP) in string comparison, string matching accuracy and efficiency are improved compared with other commonly applied field matching algorithms. The paper discusses the development of field matching algorithms from the proposed general framework. The framework and the corresponding algorithm are tested on a public data set of the NASA publication abstract database. The approach can be applied to address similar problems in other databases.
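
    The abstract introduces String Matching Points (SMP) without defining them in detail, so the sketch below is only a generic token-based field matcher in that spirit: exact token matches act as anchor points and leftover tokens are paired by character-level similarity. The anchoring interpretation, the threshold, and the function names are assumptions for illustration, not the paper's algorithm.

```python
from difflib import SequenceMatcher


def token_similarity(t1, t2):
    """Character-level similarity between two tokens, in [0, 1]."""
    return SequenceMatcher(None, t1, t2).ratio()


def field_match(s1, s2, fuzzy_threshold=0.5):
    """Token-based field matching: exact token matches are consumed first as
    anchor points (a rough stand-in for String Matching Points), then the
    remaining tokens are paired greedily by best character-level similarity."""
    tokens1, tokens2 = s1.lower().split(), s2.lower().split()
    if not tokens1 or not tokens2:
        return 0.0

    # Anchor pass: consume exact token matches first.
    matched, leftover1, leftover2 = 0.0, [], list(tokens2)
    for t in tokens1:
        if t in leftover2:
            matched += 1.0
            leftover2.remove(t)
        else:
            leftover1.append(t)

    # Fuzzy pass: pair remaining tokens with their most similar counterpart.
    for t in leftover1:
        if not leftover2:
            break
        best = max(leftover2, key=lambda u: token_similarity(t, u))
        score = token_similarity(t, best)
        if score >= fuzzy_threshold:
            matched += score
            leftover2.remove(best)

    return matched / max(len(tokens1), len(tokens2))


# Syntactically different but semantically equivalent field values:
print(field_match("Dept. of Computer Science", "Department of Computer Science"))
```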

    The role of Industry 4.0 enabling technologies for safety management: A systematic literature review

    Innovations introduced during the Industry 4.0 era consist of the integration of the so-called "nine pillars of technology" in manufacturing, transforming the conventional factory into a smart factory. The aim of this study is to investigate the enabling technologies of Industry 4.0, focusing on those with the greatest impact on safety management. The main characteristics of these technologies are identified and described according to their use in an industrial environment. To do this, we chose a systematic literature review (SLR) to answer the research question comprehensively. Results show that the articles can be grouped according to different criteria. Moreover, we found that Industry 4.0 can increase safety levels in warehousing and logistics, and that several solutions are available for the building sector.

    A METHOD TO IDENTIFY DUPLICATE REFRESH RECORDS WITH CONTINUOUS QUERY BASED MULTIPLE WEB DATABASES

    Record matching, which identifies records that represent the same real-world entity, is an important step in data integration. In information retrieval, one of the main problems is to retrieve a set of documents that is semantically related to a given user query. Most existing work requires human-labelled training data (positive, negative, or both), which places a heavy burden on users. Existing supervised record matching methods require users to provide training data and therefore cannot be applied to web databases, where query results are generated on the fly. A new record matching method named Unsupervised Duplicate Refresh Elimination (UDRE) is proposed for identifying and eliminating duplicates among refresh records in dynamic query results. The idea of this research is to adjust the field weights used by the classifiers when calculating similarities among refresh records. Three classifiers, namely a weighted component similarity summing time-bound classifier, a support vector machine classifier, and a threshold-based support vector machine classifier, are iteratively employed within UDRE, where the first classifier uses weights concentrated on string similarity measures for comparing records from different data sources. We also design a new record alignment algorithm that aligns the attributes used to identify duplicate refresh records.
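
    The abstract names the three classifiers but not their internals, so the following only sketches the kind of first-stage step it describes: a weighted component similarity summing rule that labels the most and least similar pairs, leaving the rest for the SVM stages. The similarity function, the weights, and the thresholds are illustrative assumptions.

```python
from difflib import SequenceMatcher


def component_similarities(r1, r2, fields):
    """Per-field string similarities between two query-result records."""
    return [SequenceMatcher(None, str(r1[f]).lower(), str(r2[f]).lower()).ratio()
            for f in fields]


def wcss_label(r1, r2, fields, weights, dup_threshold=0.85, nondup_threshold=0.4):
    """Weighted component similarity summing: label a record pair as a
    duplicate, a non-duplicate, or unknown from the weighted sum of field
    similarities. The 'unknown' pairs are those a downstream SVM classifier
    would decide once trained on the confidently labelled pairs."""
    sims = component_similarities(r1, r2, fields)
    score = sum(w * s for w, s in zip(weights, sims))
    if score >= dup_threshold:
        return "duplicate", sims
    if score <= nondup_threshold:
        return "non-duplicate", sims
    return "unknown", sims


# Two refresh records returned by different web databases for the same query.
fields = ["title", "author", "year"]
weights = [0.5, 0.3, 0.2]  # would be re-estimated on each iteration
a = {"title": "Data Cleaning Basics", "author": "J. Smith", "year": "2010"}
b = {"title": "Data cleaning basics", "author": "John Smith", "year": "2010"}
label, sims = wcss_label(a, b, fields, weights)
print(label, [round(s, 2) for s in sims])
```

    In the iterative scheme the abstract describes, the pairs labelled confidently by this first stage could serve as training data for the support vector machine and threshold-based SVM classifiers, with the field weights re-estimated between rounds.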

    Quality and complexity measures for data linkage and deduplication

    Summary. Deduplicating one data set or linking several data sets are increasingly important tasks in the data preparation steps of many data mining projects. The aim of such linkages is to match all records relating to the same entity. Research interest in this area has increased in recent years, with techniques originating from statistics, machine learning, information retrieval, and database research being combined and applied to improve the linkage quality, as well as to increase performance and efficiency when linking or deduplicating very large data sets. Different measures have been used to characterise the quality and complexity of data linkage algorithms, and several new metrics have been proposed. An overview of the issues involved in measuring data linkage and deduplication quality and complexity is presented in this chapter. It is shown that measures in the space of record pair comparisons can produce deceptive quality results. Various measures are discussed and recommendations are given on how to assess data linkage and deduplication quality and complexity. Key words: data or record linkage, data integration and matching, deduplication, data mining pre-processing, quality and complexity measures
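
    The chapter's warning that measures defined over the full space of record pair comparisons can be deceptive can be illustrated with the standard pairwise measures; the counts below are hypothetical, and the exact metrics the chapter recommends may differ.

```python
def linkage_quality(tp, fp, tn, fn):
    """Standard pairwise quality measures for data linkage / deduplication.
    Because the comparison space grows quadratically with data set size,
    true negatives dominate and accuracy can look deceptively high."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall,
            "f-measure": f_measure, "accuracy": accuracy}


# Linking two hypothetical data sets of 1,000 records each gives 1,000,000
# candidate record pairs, of which only 500 are true matches.
print(linkage_quality(tp=400, fp=100, tn=999_400, fn=100))
# Accuracy is ~0.9998 even though precision and recall are both only 0.8.
```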

    Correlation-based methods for data cleaning, with application to biological databases

    Ph.D. thesis (Doctor of Philosophy)
