Review on Master Patient Index
In today's health care establishments there is great diversity among
information systems: each has different capabilities and capacities,
proprietary communication methods, and limited scalability. These
characteristics hinder the interoperability of all these systems, to the
detriment of the patient. It is common that, when we look at the databases of
these information systems, we come across different records that refer to the
same person; records with insufficient data; records with erroneous data
caused by errors or misunderstandings when entering patient data; and records
with outdated data. These problems cause duplication, inconsistency,
discontinuity, and dispersion in patient data. It is with the intention of
minimizing these problems that the concept of a Master Patient Index becomes
necessary. A Master Patient Index is a centralized repository that indexes all
patient records of a given set of information systems. It is composed of a set
of demographic data sufficient to unambiguously identify a person, together
with a list of identifiers for the various records the patient has in the
repositories of each information system. This solution allows for
synchronization between all the actors, minimizing inconsistency,
outdatedness, and missing data, and reducing duplicate registrations. The
Master Patient Index is an asset to patients, medical staff, and health care
providers
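As a rough illustration of the structure described above, an MPI entry can be modelled as demographic attributes plus a map from source system to local record identifier. This is a minimal sketch; all field and system names are invented for illustration and not taken from any health-care standard.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class MasterPatientIndexEntry:
    """One entry of a hypothetical Master Patient Index: demographic
    attributes sufficient to identify a person, plus the local record
    identifiers that person has in each connected information system."""
    given_name: str
    family_name: str
    date_of_birth: str                      # ISO 8601, e.g. "1980-05-17"
    national_id: str
    # source-system name -> local record identifier in that system
    local_ids: dict[str, str] = field(default_factory=dict)

    def register(self, system: str, local_id: str) -> None:
        """Link (or update) this patient's record id in one source system."""
        self.local_ids[system] = local_id

    def resolve(self, system: str) -> str | None:
        """Return the patient's record id in `system`, if indexed."""
        return self.local_ids.get(system)

# Usage: one MPI entry synchronizing two hospital systems.
entry = MasterPatientIndexEntry("Ana", "Silva", "1980-05-17", "PT-12345")
entry.register("radiology", "RAD-0099")
entry.register("lab", "LAB-4521")
```

Keeping only the identifier mapping centralized, rather than copying clinical data, is what lets each system stay authoritative for its own records while the index resolves them to one person.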
A Fast Detection of Duplicates Using Progressive Methods
Any large database accumulates data-quality problems as different people use it; one of the major problems is that similar objects are represented in different forms, called 'duplicates', and these must be identified. Nowadays, duplicate-detection methods need to process huge datasets in ever shorter amounts of time while still maintaining the quality of the dataset, which is becoming difficult. Existing methods such as the Sorted Neighborhood Method (SNM) and blocking methods are used to increase the efficiency of finding duplicate records. In this paper, two new progressive duplicate-detection algorithms are used to increase the efficiency of finding duplicate records and to eliminate the identified duplicates when only limited time is available for the detection process. These algorithms increase the overall process gain by delivering complete results faster. The paper compares the two progressive algorithms and presents the results
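A minimal sketch of the sorted-neighborhood idea with a progressive twist (an illustrative simplification, not the paper's exact PSNM algorithm): sort once, then emit comparisons at growing neighbour distances, so that the most promising pairs (those adjacent after sorting) are checked and reported first.

```python
import difflib

def progressive_snm(records, key, is_duplicate, max_window=5):
    """Progressive sorted-neighborhood sketch: compare pairs at neighbour
    distance 1 first, then 2, ..., so likely duplicates surface early even
    if the run is cut off before max_window is exhausted."""
    ordered = sorted(records, key=key)
    for dist in range(1, max_window):
        for i in range(len(ordered) - dist):
            a, b = ordered[i], ordered[i + dist]
            if is_duplicate(a, b):
                yield a, b

# Toy data and similarity predicate, invented for illustration.
people = ["jon smith", "john smith", "jane doe", "janet doe", "bob ray"]
def close(a, b):
    return difflib.SequenceMatcher(None, a, b).ratio() > 0.7

dupes = list(progressive_snm(people, key=str.lower, is_duplicate=close))
```

The gain over plain SNM is ordering, not asymptotics: both inspect the same windowed pairs, but the progressive schedule front-loads the comparisons most likely to be matches.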
Automatic Threshold Selections by exploration and exploitation of optimization algorithm in Record Deduplication
A deduplication process uses a similarity function, together with a threshold, to decide whether two entries are duplicates. Setting this threshold is an important issue for achieving high accuracy, and it relies heavily on human intervention. Swarm-intelligence algorithms such as Particle Swarm Optimization (PSO) and the Artificial Bee Colony (ABC) algorithm have been used to detect the threshold automatically. Although these algorithms perform well, there is still an insufficiency in the solution-search equation, which generates new candidate solutions from the information in previous solutions. The proposed work addresses two problems: first, it finds the optimal search equation using a Genetic Algorithm (GA); second, it adopts a modified Artificial Bee Colony (ABC) algorithm to obtain the optimal threshold, detecting duplicate records more accurately while reducing human intervention. The CORA dataset is used to analyze the proposed algorithm
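To make the threshold-selection problem concrete, here is a toy sketch in which a plain grid search over an F1 objective stands in for the GA/ABC optimization described above; the similarity function, dataset, and labels are invented for illustration.

```python
import difflib

def f1_at_threshold(pairs, labels, t):
    """F1 score of 'duplicate iff similarity >= t' on labelled pairs."""
    sim = [difflib.SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    tp = sum(1 for s, y in zip(sim, labels) if s >= t and y)
    fp = sum(1 for s, y in zip(sim, labels) if s >= t and not y)
    fn = sum(1 for s, y in zip(sim, labels) if s < t and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(pairs, labels, steps=100):
    # Exhaustive scan over candidate thresholds; a swarm method such as
    # ABC or PSO would sample this same objective instead of enumerating it.
    return max((i / steps for i in range(steps + 1)),
               key=lambda t: f1_at_threshold(pairs, labels, t))

# Tiny labelled sample: two true duplicate pairs, one non-duplicate.
pairs = [("john smith", "jon smith"), ("alice", "alicia"), ("bob", "carol")]
labels = [True, True, False]
t = best_threshold(pairs, labels)
```

On real data the objective surface is noisier and the search space larger, which is exactly why derivative-free searches like ABC are attractive here.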
Data Cleaning Service for Data Warehouse: An Experimental Comparative Study on Local Data
A data warehouse is a collective entity of data from various data sources, and its data are prone to several complications and irregularities. Data cleaning is a non-trivial activity needed to ensure data quality: it involves identifying errors, removing them, and improving the quality of the data. One common method is duplicate elimination. This research focuses on duplicate-elimination services for local data. It first surveys data quality, focusing on quality problems, cleaning methodology, the stages involved, and services within the data-warehouse environment. It then compares several services through experiments on local data covering different cases, such as spelling variants arising from different pronunciations, misspellings, name abbreviations, honorific prefixes, common nicknames, split names, and exact matches. All services are evaluated against proposed quality-of-service metrics such as performance, the number of records they can process, platform support, data heterogeneity, and price, so that in the future these services can reliably handle big data in a data warehouse
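Several of the comparison cases listed above (misspellings, honorific prefixes, nicknames) can be sketched with a simple normalize-then-compare matcher. The nickname table, honorific list, and threshold below are invented for illustration and far smaller than anything a production cleaning service would use.

```python
import difflib

HONORIFICS = {"mr", "mrs", "ms", "dr", "prof"}
# Illustrative nickname map only; real services ship large curated tables.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}

def normalize(name):
    """Lowercase, strip honorific prefixes, expand common nicknames."""
    parts = [p.strip(".").lower() for p in name.split()]
    parts = [NICKNAMES.get(p, p) for p in parts if p not in HONORIFICS]
    return " ".join(parts)

def same_person(a, b, threshold=0.85):
    """Fuzzy match after normalization; tolerates minor misspellings."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold
```

Normalization handles the systematic variants (prefixes, nicknames) exactly, leaving the fuzzy ratio to absorb only the residual misspellings; mixing the two in one step tends to inflate false positives.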
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Entity resolution (ER), an important and common data cleaning problem, is
about detecting data duplicate representations for the same external entities,
and merging them into single representations. Relatively recently, declarative
rules called "matching dependencies" (MDs) have been proposed for specifying
similarity conditions under which attribute values in database records are
merged. In this work we show the process and the benefits of integrating four
components of ER: (a) building a classifier for duplicate/non-duplicate record
pairs using machine learning (ML) techniques; (b) using MDs to support the
blocking phase of ML; (c) merging records on the basis of the classifier
results; and (d) using the declarative language "LogiQL" (an extended form of
Datalog supported by the "LogicBlox" platform) for all activities related to
data processing and the specification and enforcement of MDs.
Comment: Final journal version, with some minor technical corrections.
Extended version of arXiv:1508.0601
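The blocking phase mentioned in (b) can be sketched as follows. In ERBlox the blocks are induced by matching dependencies; in this illustrative stand-in, an ordinary key function plays that role, and the example records are invented.

```python
from collections import defaultdict
from itertools import combinations

def block_pairs(records, blocking_key):
    """Blocking sketch: form candidate pairs only inside blocks that share
    a key, instead of comparing all n*(n-1)/2 record pairs. The classifier
    then only has to judge the pairs yielded here."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    for group in blocks.values():
        yield from combinations(group, 2)

# Toy records blocked on their first token (assumption for illustration).
papers = ["entity resolution survey", "entity matching tutorial", "graph neural nets"]
pairs = list(block_pairs(papers, blocking_key=lambda r: r.split()[0]))
```

Here only the two "entity" records form a candidate pair; the third record never reaches the classifier, which is the entire point of blocking.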
Generalized Bayesian Record Linkage and Regression with Exact Error Propagation
Record linkage (de-duplication or entity resolution) is the process of
merging noisy databases to remove duplicate entities. While record linkage
removes duplicate entities from such databases, the downstream task is any
inferential, predictive, or post-linkage task on the linked data. One goal of
the downstream task is obtaining a larger reference data set, allowing one to
perform more accurate statistical analyses. In addition, there is inherent
record linkage uncertainty passed to the downstream task. Motivated by the
above, we propose a generalized Bayesian record linkage method and consider
multiple regression analysis as the downstream task. Records are linked via a
random partition model, which allows for a wide class to be considered. In
addition, we jointly model the record linkage and downstream task, which allows
one to account for the record linkage uncertainty exactly. Moreover, one is
able to generate a feedback propagation mechanism of the information from the
proposed Bayesian record linkage model into the downstream task. This feedback
effect is essential to eliminate potential biases that can jeopardize the
resulting downstream task. We apply our methodology to multiple linear
regression, and
illustrate empirically that the "feedback effect" is able to improve the
performance of record linkage.
Comment: 18 pages, 5 figures
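In symbols, the joint-modelling idea reads roughly as follows; the notation is a generic sketch under the assumptions above, not the paper's own formulation.

```latex
% C: random partition linking records into entities (the linkage structure)
% \beta: regression coefficients of the downstream task
% X_C, y: covariates and response assembled under the linkage C
p(C, \beta \mid \text{data}) \;\propto\;
  \underbrace{p(y \mid X_C, \beta)}_{\text{downstream regression}}\;\;
  \underbrace{p(\text{records} \mid C)}_{\text{linkage likelihood}}\;\;
  p(C)\, p(\beta)
```

Because the regression likelihood appears in the same posterior as the linkage term, inference on C is informed by how well each candidate linkage explains y, which is the "feedback effect" the abstract describes.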
An Analysis of the Payroll Accounting System to Improve Internal Control (A Study at the Dinas Kebudayaan dan Pariwisata Kabupaten Kediri)
A payroll accounting system is closely related to internal control, which helps minimize problems and find appropriate solutions. The aim of this study was to analyze the payroll accounting system in place at the Dinas Kebudayaan dan Pariwisata Kabupaten Kediri and whether it already improves existing internal controls. The study concludes that the existing payroll system at the Dinas Kebudayaan dan Pariwisata Kabupaten Kediri is adequate: the documents and records used in the payroll system already contain all the necessary information. However, some weaknesses remain, such as duplicated duties between the payroll-preparation and payroll functions. In addition, employee hiring does not follow the required procedures. These weaknesses could lead to problems in the payroll system that prevent internal control from being achieved. Improvements are therefore needed so that internal control can be strengthened
Better duplicate detection for systematic reviewers: Evaluation of Systematic Review Assistant-Deduplication Module
BACKGROUND: A major problem arising from searching across bibliographic databases is the retrieval of duplicate citations. Removing such duplicates is an essential task to ensure systematic reviewers do not waste time screening the same citation multiple times. Although reference management software uses algorithms to remove duplicate records, this is only partially successful, and the remaining duplicates must be removed manually. This time-consuming task wastes resources. We sought to evaluate the effectiveness of a newly developed deduplication program against EndNote. METHODS: A literature search of 1,988 citations was manually inspected, and duplicate citations were identified and coded to create a benchmark dataset. The Systematic Review Assistant-Deduplication Module (SRA-DM) was iteratively developed and tested using the benchmark dataset and compared with EndNote's default one-step auto-deduplication process, which matches on 'author', 'year', and 'title'. The accuracy of deduplication was reported by calculating the sensitivity and specificity. Further validation tests, with three additional benchmarked literature searches comprising a total of 4,563 citations, were performed to determine the reliability of the SRA-DM algorithm. RESULTS: The sensitivity (84%) and specificity (100%) of the SRA-DM were superior to those of EndNote (sensitivity 51%, specificity 99.83%). Validation testing on three additional biomedical literature searches demonstrated that SRA-DM consistently achieved higher sensitivity than EndNote (90% vs 63%, 84% vs 73%, and 84% vs 64%). Furthermore, the specificity of SRA-DM was 100%, whereas the specificity of EndNote was imperfect (average 99.75%), with some unique records wrongly assigned as duplicates. Overall, there was a 42.86% increase in the number of duplicate records detected with SRA-DM compared with EndNote auto-deduplication.
CONCLUSIONS: The Systematic Review Assistant-Deduplication Module offers users a reliable program to remove duplicate records with greater sensitivity and specificity than EndNote. This application will save researchers and information specialists time and avoid research waste. The deduplication program is freely available online
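Matching on ('author', 'year', 'title') can be sketched as key-based deduplication after normalization. The normalization rules and example citations below are invented for illustration; tools such as SRA-DM use considerably more elaborate heuristics, which is where the sensitivity gap comes from.

```python
import re

def dedup_key(citation):
    """Crude stand-in for matching on (author, year, title): lowercase and
    strip punctuation, so trivial formatting variants collide on one key."""
    author, year, title = citation
    def norm(s):
        return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()
    return (norm(author), year, norm(title))

def deduplicate(citations):
    """Keep the first citation seen for each normalized key."""
    seen, unique = set(), []
    for c in citations:
        k = dedup_key(c)
        if k not in seen:
            seen.add(k)
            unique.append(c)
    return unique

refs = [
    ("Smith, J.", 2014, "Duplicate detection in reviews"),
    ("smith j", 2014, "Duplicate Detection in Reviews."),
    ("Doe, J.", 2015, "Another study"),
]
```

Exact key matching like this yields near-perfect specificity but misses duplicates whose metadata differs beyond punctuation, mirroring the sensitivity/specificity trade-off the evaluation reports.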