
    Review on Master Patient Index

    In today's health care establishments there is a great diversity of information systems, each with different specificities and capacities, proprietary communication methods, and little room for scalability. This set of characteristics hinders the interoperability of all these systems, which should work together for the good of the patient. It is common, when we look at the databases of each of these information systems, to come across different records that refer to the same person; records with insufficient data; records with erroneous data due to errors or misunderstandings when patient data were entered; and records with outdated data. These problems cause duplication, inconsistency, discontinuity and dispersion in patient data. It is with the intention of minimizing these problems that the concept of a Master Patient Index becomes necessary. A Master Patient Index is a centralized repository that indexes all patient records of a given set of information systems. It is composed of a set of demographic data sufficient to unambiguously identify a person and a list of identifiers that point to the various records the patient has in the repository of each information system. This solution allows synchronization between all the actors, minimizing inconsistency, outdated data, missing data, and duplicate registrations. The Master Patient Index is an asset to patients, medical staff and health care providers.
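
    As a rough illustration of the idea described above, the sketch below (Python) shows one way a Master Patient Index entry could map a single person to the identifiers used for them in several source systems. All names here (MasterPatientRecord, local_ids, the example systems) are assumptions for illustration, not structures from the reviewed systems.

```python
# Minimal sketch of a Master Patient Index (MPI) entry and lookup.
# All names (MasterPatientRecord, local_ids, etc.) are illustrative assumptions,
# not the structures used by the systems reviewed in the paper.
from dataclasses import dataclass, field

@dataclass
class MasterPatientRecord:
    mpi_id: str                                    # repository-wide identifier
    demographics: dict                             # e.g. name, date of birth
    local_ids: dict = field(default_factory=dict)  # system name -> local record id

class MasterPatientIndex:
    def __init__(self):
        self._records = {}

    def register(self, record: MasterPatientRecord):
        self._records[record.mpi_id] = record

    def link(self, mpi_id: str, system: str, local_id: str):
        """Attach the identifier a source system uses for this patient."""
        self._records[mpi_id].local_ids[system] = local_id

    def resolve(self, system: str, local_id: str):
        """Find the master record that a local identifier belongs to."""
        for record in self._records.values():
            if record.local_ids.get(system) == local_id:
                return record
        return None

# Usage: one person known under different identifiers in two systems.
mpi = MasterPatientIndex()
mpi.register(MasterPatientRecord("MPI-001", {"name": "Ana Silva", "dob": "1980-02-14"}))
mpi.link("MPI-001", "radiology", "RAD-4471")
mpi.link("MPI-001", "laboratory", "LAB-982")
print(mpi.resolve("laboratory", "LAB-982").mpi_id)   # -> MPI-001
```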

    A Fast Detection of Duplicates Using Progressive Methods

    Any database holds large amounts of data, and as different people use these data, data quality problems arise: similar objects are represented in different forms, called ‘duplicates’, and identifying these duplicates is one of the major problems. Nowadays, duplicate-detection methods need to process huge datasets in ever shorter amounts of time while maintaining the quality of the dataset, which is becoming difficult. Existing systems use duplicate-detection methods such as the Sorted Neighborhood Method (SNM) and blocking methods to increase the efficiency of finding duplicate records. In this paper, two new progressive duplicate-detection algorithms are used to increase the efficiency of finding duplicate records and to eliminate the identified duplicates when only limited time is available for the duplicate-detection process. These algorithms increase the overall process gain by delivering complete results faster. The paper compares the two progressive algorithms and presents the results.
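
    To make the baseline named above concrete, the following sketch (Python) shows a bare-bones Sorted Neighborhood pass: records are sorted by a key and only pairs inside a small sliding window are compared. The sort key, window size and similarity test are assumptions for illustration; the paper's progressive algorithms additionally reorder comparisons so that the most likely duplicates are reported first.

```python
# Bare-bones Sorted Neighborhood Method (SNM) sketch.
# The sort key, window size and similarity cut-off are illustrative assumptions.
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    return SequenceMatcher(None, a, b).ratio() >= threshold

def sorted_neighborhood(records, key, window=3):
    """Return candidate duplicate pairs found inside a sliding window."""
    ordered = sorted(records, key=key)
    pairs = []
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1 : i + window]:
            if similar(key(rec), key(other)):
                pairs.append((rec["id"], other["id"]))
    return pairs

records = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "Jon Smith"},    # likely duplicate of record 1
    {"id": 3, "name": "Mary Jones"},
]
print(sorted_neighborhood(records, key=lambda r: r["name"]))   # -> [(1, 2)]
```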

    Automatic Threshold Selections by exploration and exploitation of optimization algorithm in Record Deduplication

    A deduplication process uses a similarity function and a threshold to decide whether two entries are duplicates. Setting this threshold is an important issue for achieving higher accuracy, and it relies heavily on human intervention. Swarm-intelligence algorithms such as Particle Swarm Optimization (PSO) and the Artificial Bee Colony (ABC) have been used to detect the threshold automatically and find duplicate records. Although these algorithms perform well, there is still a shortcoming in the solution search equation, which generates new candidate solutions from the information of previous solutions. The proposed work addresses two problems: first, it finds the optimal search equation using a Genetic Algorithm (GA); second, it adopts a modified Artificial Bee Colony (ABC) to obtain the optimal threshold, so that duplicate records are detected more accurately and human intervention is reduced. The CORA dataset is used to analyze the proposed algorithm.
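
    The core idea above, choosing the similarity threshold automatically rather than by hand, can be pictured with a much simpler stand-in than the GA/ABC machinery of the paper: the sketch below (Python) simply sweeps candidate thresholds and keeps the one with the best F1 score on a set of labelled pairs. The labelled scores and the grid sweep are assumptions for illustration only.

```python
# Toy automatic threshold selection: a plain grid sweep stands in for the
# paper's GA/ABC search. The labelled similarity scores below are invented.
def f1_at_threshold(pairs, threshold):
    tp = sum(1 for sim, dup in pairs if sim >= threshold and dup)
    fp = sum(1 for sim, dup in pairs if sim >= threshold and not dup)
    fn = sum(1 for sim, dup in pairs if sim < threshold and dup)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(pairs, step=0.01):
    candidates = [i * step for i in range(int(1 / step) + 1)]
    return max(candidates, key=lambda t: f1_at_threshold(pairs, t))

# (similarity score, is_duplicate) pairs from some record-comparison step
labelled = [(0.95, True), (0.93, True), (0.91, True), (0.88, True),
            (0.85, False), (0.72, False), (0.65, False), (0.40, False)]
t = best_threshold(labelled)
print(f"selected threshold: {t:.2f}, F1 = {f1_at_threshold(labelled, t):.2f}")
```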

    Data Cleaning Service for Data Warehouse: An Experimental Comparative Study on Local Data

    A data warehouse is a collective entity of data from various data sources, and the data it holds are prone to several complications and irregularities. A data cleaning service is a non-trivial activity for ensuring data quality: it involves identifying errors, removing them, and improving the quality of the data. One of the common methods is duplicate elimination. This research focuses on duplicate-elimination services applied to local data. It first surveys data quality, focusing on quality problems, cleaning methodology, the stages involved, and the services available within a data warehouse environment. It then provides a comparison through experiments on local data covering different cases, such as spelling variants due to different pronunciations, misspellings, name abbreviations, honorific prefixes, common nicknames, split names, and exact matches. All services are evaluated against the proposed quality-of-service metrics, such as performance, capability to process large numbers of records, platform support, data heterogeneity, and price, so that in the future these services can reliably handle big data in a data warehouse.
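
    The duplicate cases listed above (misspellings, abbreviations, honorific prefixes, nicknames, split names) are typically handled by normalising names before comparing them. The sketch below (Python) shows the general shape of such a check; the nickname table, honorific list and fuzzy cut-off are assumptions for illustration and not the services compared in the study.

```python
# Rough name-matching sketch for the duplicate cases mentioned above.
# The nickname map, honorific list and 0.85 cut-off are illustrative assumptions.
from difflib import SequenceMatcher

HONORIFICS = {"mr", "mrs", "ms", "dr", "prof"}
NICKNAMES = {"bob": "robert", "liz": "elizabeth", "bill": "william"}

def normalize(name: str) -> str:
    tokens = name.lower().replace(".", "").split()
    tokens = [t for t in tokens if t not in HONORIFICS]   # drop honorific prefixes
    tokens = [NICKNAMES.get(t, t) for t in tokens]        # expand common nicknames
    return " ".join(tokens)

def probably_same(name_a: str, name_b: str, cutoff: float = 0.85) -> bool:
    a, b = normalize(name_a), normalize(name_b)
    return a == b or SequenceMatcher(None, a, b).ratio() >= cutoff

print(probably_same("Dr. Bob Smith", "Robert Smith"))    # honorific + nickname -> True
print(probably_same("Jon Andersson", "John Anderson"))   # misspelling -> True
print(probably_same("Mary Jones", "Martin Johnson"))     # different people -> False
```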

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying the similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) using MDs to support the blocking phase of ML; (c) merging records on the basis of the classifier results; and (d) using the declarative language "LogiQL", an extended form of Datalog supported by the "LogicBlox" platform, for all activities related to data processing and for the specification and enforcement of MDs.
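
    As a very rough illustration of how the four components fit together, the sketch below (Python) walks a toy table through blocking, pairwise classification and merging. The blocking key, the similarity-based stand-in classifier and the merge rule are all assumptions for illustration; ERBlox itself drives blocking and merging with matching dependencies and a trained ML classifier.

```python
# Toy entity-resolution pipeline: block -> classify pairs -> merge.
# Blocking key, classifier and merge rule are illustrative assumptions only.
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": 1, "name": "ACME Inc.", "city": "Boston"},
    {"id": 2, "name": "ACME Inc",  "city": "Boston"},   # duplicate of record 1
    {"id": 3, "name": "Widget Co.", "city": "Denver"},
]

def block_key(r):                      # (b) crude blocking: name prefix + city
    return (r["name"][:4].lower(), r["city"].lower())

def classify(a, b):                    # (a) stand-in classifier: name similarity
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio() >= 0.8

def merge(a, b):                       # (c) keep the longer value per attribute
    return {k: max(a[k], b[k], key=lambda v: len(str(v))) for k in a}

blocks = {}
for r in records:
    blocks.setdefault(block_key(r), []).append(r)

merged = []
for group in blocks.values():
    for a, b in combinations(group, 2):
        if classify(a, b):
            merged.append(merge(a, b))

print(merged)   # -> a single merged representation of records 1 and 2
```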

    Generalized Bayesian Record Linkage and Regression with Exact Error Propagation

    Record linkage (de-duplication or entity resolution) is the process of merging noisy databases to remove duplicate entities. While record linkage removes duplicate entities from such databases, the downstream task is any inferential, predictive, or post-linkage task on the linked data. One goal of the downstream task is obtaining a larger reference data set, allowing one to perform more accurate statistical analyses. In addition, there is inherent record linkage uncertainty that is passed on to the downstream task. Motivated by the above, we propose a generalized Bayesian record linkage method and consider multiple regression analysis as the downstream task. Records are linked via a random partition model, which allows a wide class of models to be considered. In addition, we jointly model the record linkage and the downstream task, which allows one to account for the record linkage uncertainty exactly. Moreover, one is able to generate a feedback propagation mechanism that carries information from the proposed Bayesian record linkage model into the downstream task. This feedback effect is essential for eliminating potential biases that can jeopardize the resulting downstream task. We apply our methodology to multiple linear regression and illustrate empirically that the "feedback effect" is able to improve the performance of record linkage.
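
    One way to picture how linkage uncertainty reaches the downstream regression is to avoid committing to a single "best" linkage and instead average the regression fit over several plausible linkage structures. The sketch below (Python) illustrates only that propagation step; the candidate linkages, the toy data and the use of a simple through-the-origin least-squares slope are all assumptions, and the paper's joint Bayesian model with its feedback mechanism is far richer than this averaging.

```python
# Propagating linkage uncertainty into a downstream regression (toy version).
# Candidate linkages and data are invented; the paper instead models linkage
# and regression jointly in a Bayesian framework with exact error propagation.
import statistics

# Each candidate linkage pairs records in file A (x values) with file B (y values).
file_a = {"a1": 1.0, "a2": 2.0, "a3": 3.0, "a4": 4.0}
file_b = {"b1": 2.1, "b2": 3.9, "b3": 6.2, "b4": 7.8}

candidate_linkages = [
    {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"},   # plausible linkage 1
    {"a1": "b1", "a2": "b2", "a3": "b4", "a4": "b3"},   # plausible linkage 2 (two links swapped)
]

def slope(pairs):
    """Ordinary least-squares slope through the origin for (x, y) pairs."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)

slopes = []
for linkage in candidate_linkages:
    pairs = [(file_a[a], file_b[b]) for a, b in linkage.items()]
    slopes.append(slope(pairs))

print("slope per linkage:", [round(s, 3) for s in slopes])
print(f"mean = {statistics.mean(slopes):.3f}, spread = {statistics.stdev(slopes):.3f}")
```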

    Analysis of the Payroll Accounting System to Improve Internal Control (A Study at the Dinas Kebudayaan dan Pariwisata Kabupaten Kediri)

    A payroll accounting system is closely related to internal control, which helps minimize problems and find appropriate solutions. The aim of this study was to analyze the payroll accounting system in place at the Dinas Kebudayaan dan Pariwisata Kabupaten Kediri and whether it already strengthens the existing internal control. The study concludes that the existing payroll system at the Dinas Kebudayaan dan Pariwisata Kabupaten Kediri is good enough: the documents and records used in the payroll system already contain all the necessary information. However, some weaknesses remain; duties are duplicated between the function that prepares the payroll and the payroll function, and employee recruitment is not in accordance with the fields required. These weaknesses could lead to problems in the payroll system, so that internal control cannot be achieved. Therefore, improvements are needed so that internal control can be strengthened.

    Better duplicate detection for systematic reviewers: Evaluation of Systematic Review Assistant-Deduplication Module

    BACKGROUND: A major problem arising from searching across bibliographic databases is the retrieval of duplicate citations. Removing such duplicates is an essential task to ensure systematic reviewers do not waste time screening the same citation multiple times. Although reference management software uses algorithms to remove duplicate records, this is only partially successful and necessitates removing the remaining duplicates manually. This time-consuming task leads to wasted resources. We sought to evaluate the effectiveness of a newly developed deduplication program against EndNote. METHODS: A literature search of 1,988 citations was manually inspected, and duplicate citations were identified and coded to create a benchmark dataset. The Systematic Review Assistant-Deduplication Module (SRA-DM) was iteratively developed and tested using the benchmark dataset and compared with EndNote’s default one-step auto-deduplication process matching on (‘author’, ‘year’, ‘title’). The accuracy of deduplication was reported by calculating the sensitivity and specificity. Further validation tests, with three additional benchmarked literature searches comprising a total of 4,563 citations, were performed to determine the reliability of the SRA-DM algorithm. RESULTS: The sensitivity (84%) and specificity (100%) of the SRA-DM were superior to EndNote (sensitivity 51%, specificity 99.83%). Validation testing on three additional biomedical literature searches demonstrated that SRA-DM consistently achieved higher sensitivity than EndNote (90% vs 63%, 84% vs 73% and 84% vs 64%). Furthermore, the specificity of SRA-DM was 100%, whereas the specificity of EndNote was imperfect (average 99.75%), with some unique records wrongly assigned as duplicates. Overall, there was a 42.86% increase in the number of duplicate records detected with SRA-DM compared with EndNote auto-deduplication. CONCLUSIONS: The Systematic Review Assistant-Deduplication Module offers users a reliable program to remove duplicate records with greater sensitivity and specificity than EndNote. This application will save researchers and information specialists time and avoid research waste. The deduplication program is freely available online.
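
    Since the evaluation above hinges on sensitivity and specificity measured against a hand-coded benchmark, the short sketch below (Python) shows how those two figures are computed from a deduplicator's output. The benchmark and tool labels are invented; this is not the SRA-DM code and does not reproduce the reported results.

```python
# Computing a deduplicator's sensitivity and specificity against a benchmark.
# The labels below are invented for illustration only.
def sensitivity_specificity(benchmark_dup, predicted_dup):
    tp = sum(1 for b, p in zip(benchmark_dup, predicted_dup) if b and p)          # duplicates caught
    fn = sum(1 for b, p in zip(benchmark_dup, predicted_dup) if b and not p)      # duplicates missed
    tn = sum(1 for b, p in zip(benchmark_dup, predicted_dup) if not b and not p)  # uniques kept
    fp = sum(1 for b, p in zip(benchmark_dup, predicted_dup) if not b and p)      # uniques wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)

# One flag per citation: True means the citation is a duplicate.
benchmark = [True, True, True, False, False, False, False, True]   # manual coding
tool      = [True, True, False, False, False, False, False, True]  # deduplicator output

sens, spec = sensitivity_specificity(benchmark, tool)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")   # 75.00%, 100.00%
```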