8,917 research outputs found

    California's Most Vulnerable Parents: When Maltreated Children Have Children

    This report takes an in-depth look at the intersection between teen births, child maltreatment, and involvement with the child protection system. Putnam-Hornstein, along with other researchers at USC and the University of California, Berkeley, linked and then analyzed roughly 1.5 million California birth records and 1 million CPS records, with a second phase of research focusing on the maltreatment risk of children born to adolescent mothers. In 2012, California became one of the first states in the nation to extend foster youth status until age 21. Different programs and services will likely be required to adequately respond to the needs and circumstances of non-minor youth who remain in the foster care system, particularly in the area of parenting supports. This report finds that as many as one in three female youth in California may be parenting by the time they exit the foster care system on their 21st birthday.

    Accuracy of Probabilistic Linkage Using the Enhanced Matching System for Public Health and Epidemiological Studies

    The Enhanced Matching System (EMS) is a probabilistic record linkage program developed by the tuberculosis section at Public Health England to match data for individuals across two datasets. This paper outlines how EMS works and investigates its accuracy for linkage across public health datasets.

    SLIM: Scalable Linkage of Mobility Data

    We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location-based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility-based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings two to four orders of magnitude speedup.

    Stat J IAOS

    Record linkage enables survey data to be integrated with other data sources, expanding the analytic potential of both sources. However, depending on the number of records being linked, the processing time can be prohibitive. This paper describes a case study using a supervised machine learning algorithm, known as the Sequential Coverage Algorithm (SCA). The SCA was used to develop the join strategy for two data sources, the National Center for Health Statistics' (NCHS) 2016 National Hospital Care Survey (NHCS) and the Center for Medicare & Medicaid Services (CMS) Enrollment Database (EDB), during record linkage. Due to the size of the CMS data, common record joining methods (i.e., blocking) were used to reduce the number of pairs that need to be evaluated to identify the vast majority of matches. NCHS conducted a case study examining how the SCA improved the efficiency of blocking. This paper describes how the SCA was used to design the blocking used in this linkage.
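
The blocking idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the SCA or the NCHS join strategy: the blocking key (surname initial plus birth year), the field names, and the sample records are all hypothetical, chosen only to show how blocking shrinks the set of pairs that must be compared.

```python
from collections import defaultdict


def blocking_key(record):
    # Hypothetical key: surname initial plus birth year.
    return (record["surname"][:1].upper(), record["birth_year"])


def candidate_pairs(left, right, key=blocking_key):
    """Yield (left_id, right_id) for records that share a blocking key."""
    # Index the right-hand file once by blocking key.
    index = defaultdict(list)
    for rec in right:
        index[key(rec)].append(rec["id"])
    # Each left-hand record is compared only within its own block.
    for rec in left:
        for right_id in index.get(key(rec), []):
            yield rec["id"], right_id


# Made-up records for illustration only.
survey = [
    {"id": "s1", "surname": "Garcia", "birth_year": 1950},
    {"id": "s2", "surname": "Nguyen", "birth_year": 1948},
]
enrollment = [
    {"id": "e1", "surname": "Garcia", "birth_year": 1950},
    {"id": "e2", "surname": "Gomez", "birth_year": 1950},
    {"id": "e3", "surname": "Nguyen", "birth_year": 1962},
]

pairs = list(candidate_pairs(survey, enrollment))
full_comparisons = len(survey) * len(enrollment)
```

Here only the pairs that fall in the same block are passed on to detailed comparison, instead of all `full_comparisons` pairs. A key that is too strict drops true matches, which is why the choice of blocking rules matters.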

    Automatically Generating Data Linkages Using a Domain-Independent Candidate Selection Approach

    One challenge for Linked Data is scalably establishing high-quality owl:sameAs links between instances (e.g., people, geographical locations, publications, etc.) in different data sources. Traditional approaches to this entity coreference problem do not scale because they exhaustively compare every pair of instances. In this paper, we propose a candidate selection algorithm for pruning the search space for entity coreference. We select candidate instance pairs by computing a character-level similarity on discriminating literal values that are chosen using domain-independent unsupervised learning. We index the instances on the chosen predicates' literal values to efficiently look up similar instances. We evaluate our approach on two RDF and three structured datasets. We show that the traditional metrics don’t always accurately reflect the relative benefits of candidate selection, and propose additional metrics. We show that our algorithm frequently outperforms alternatives and is able to process 1 million instances in under one hour on a single Sun Workstation. Furthermore, on the RDF datasets, we show that the entire entity coreference process scales well by applying our technique. Surprisingly, this high-recall, low-precision filtering mechanism frequently leads to higher F-scores in the overall system.
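
A rough sketch of index-based candidate selection with a character-level similarity is given below. It is not the paper's algorithm: the character bigrams, the Jaccard measure, the 0.5 threshold, and the sample literals are assumptions for illustration, and the unsupervised choice of discriminating predicates described in the abstract is not reproduced.

```python
from collections import defaultdict


def bigrams(text):
    """Return the set of lowercase character bigrams of a literal."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def build_index(instances):
    """Map each bigram to the ids of instances whose literal contains it."""
    index = defaultdict(set)
    for inst_id, literal in instances.items():
        for gram in bigrams(literal):
            index[gram].add(inst_id)
    return index


def select_candidates(query_literal, index, instances, threshold=0.5):
    """Look up instances sharing a bigram, then keep those above threshold."""
    query_grams = bigrams(query_literal)
    seen = set()
    for gram in query_grams:
        seen |= index.get(gram, set())
    selected = []
    for inst_id in sorted(seen):
        other = bigrams(instances[inst_id])
        jaccard = len(query_grams & other) / len(query_grams | other)
        if jaccard >= threshold:
            selected.append((inst_id, jaccard))
    return selected


# Made-up literal values for illustration only.
instances = {"a": "Jon Smith", "b": "John Smith", "c": "Mary Jones"}
index = build_index(instances)
candidates = select_candidates("John Smyth", index, instances)
candidate_ids = [inst_id for inst_id, _ in candidates]
```

The inverted index means a query literal is scored only against instances that share at least one bigram with it, rather than against every instance. Lowering the threshold raises recall at the cost of precision, matching the abstract's description of the filter as high recall and low precision.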