
    Data linkage algebra, data linkage dynamics, and priority rewriting

    We introduce an algebra of data linkages. Data linkages are intended for modelling the states of computations in which dynamic data structures are involved. We present a simple model of computation in which states of computations are modelled as data linkages and state changes take place by means of certain actions. We describe the state changes and replies that result from performing those actions by means of a term rewriting system with rule priorities. The model in question is an upgrade of molecular dynamics. The upgrading is mainly concerned with the features to deal with values and the features to reclaim garbage. Comment: 48 pages; typos corrected, phrasing improved, definition of services replaced, presentation improved, and appendix added.

    SLIM: Scalable Linkage of Mobility Data

    We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location-based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility-based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To demonstrate the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings a speedup of two to four orders of magnitude.
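The idea of using LSH to cut down candidate pairs can be sketched roughly as follows. This is not the SLIM algorithm itself: it is a generic MinHash-with-banding sketch, and it assumes each entity has already been reduced to a non-empty set of hypothetical (cell, time window) tokens. The function names, the token format, and the parameters `num_hashes` and `bands` are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature of a non-empty token set (one minimum per seed)."""
    return tuple(min(_hash(t, s) for t in tokens) for s in range(num_hashes))


def lsh_candidate_pairs(entities, num_hashes=32, bands=8):
    """Return pairs of entity names that share at least one LSH band bucket.

    `entities` maps a name to a non-empty set of tokens. Only the returned
    pairs would be passed on to a (more expensive) similarity computation.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, tokens in entities.items():
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


if __name__ == "__main__":
    # Toy data: tokens are hypothetical (grid cell, hour) observations.
    entities = {
        "A:u1": {("c1", 0), ("c2", 1), ("c3", 2)},
        "B:u7": {("c1", 0), ("c2", 1), ("c3", 2)},
        "B:u9": {("c8", 0), ("c9", 1)},
    }
    print(lsh_candidate_pairs(entities))
```

Entities with similar token sets are likely to agree on at least one band and so become candidates, while entities with nothing in common are almost never compared. More bands with fewer rows each raise recall at the cost of more candidates.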

    Linking routinely collected social work, education and health data to enable monitoring of the health and health care of school-aged children in state care (‘looked after children’) in Scotland: a national demonstration project

    Background and objectives: Children in state care (‘looked after children’) have poorer health than children who are not looked after. Recent developments in Scotland and elsewhere have aimed to improve services and outcomes for looked after children. Routine monitoring of the health outcomes of looked after children compared to those of their non-looked after peers is currently lacking. Developing capacity for comparative monitoring of population-based outcomes based on linkage of routinely collected administrative data has been identified as a priority. To our knowledge there are no existing population-based data linkage studies providing data on the health of looked after and non-looked after children at national level. Smaller scale studies that are available generally provide very limited information on linkage methods and hence do not allow scrutiny of bias that may be introduced through the linkage process. Study design and methods: National demonstration project testing the feasibility of linking routinely collected looked after children, education, and health data. Participants: All children in publicly funded schools in Scotland in 2011/12. Results: Linkage between looked after children data and the national pupil census classified 10,009 (1.5%) and 1,757 (0.3%) of 670,952 children as, respectively, currently and previously looked after. Recording of the unique pupil identifier (Scottish Candidate Number, SCN) on looked after children returns is incomplete, with 66% of looked after records for 2011/12 for children of possible school age containing a valid SCN. This will have resulted in some under-ascertainment of currently and, particularly, previously looked after children within the general pupil population. Further linkage of the pupil census to the NHS Scotland master patient index demonstrated that a safe link to the child’s unique health service (Community Health Index, CHI) number could be obtained for a very high proportion of children in each group (94%, 95%, and 95% of children classified as currently, previously, and non-looked after respectively). In general, linkage rates were higher for older children and those living in more affluent areas. Within the looked after group, linkage rates were highest for children with the fewest placements and for those in permanent fostering. Conclusions: This novel data linkage demonstrates the feasibility of monitoring population-based health outcomes of school-aged looked after and non-looked after children using linked routine administrative data. Improved recording of the unique pupil identifier number on looked after data returns would be beneficial. Extending the range of personal identifiers on looked after children returns would enable linkage to health data for looked after children who are not in publicly funded schooling (i.e. those who are pre- or post-school, home schooled, or in independent schooling).
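The effect of incomplete identifiers on linkage can be illustrated with a small sketch of deterministic linkage on a single shared key. This is not the project's actual linkage procedure: the record layout, the field name `scn`, and both function names are assumptions made for illustration, and the records are toy values.

```python
def link_by_identifier(looked_after, pupil_census, key="scn"):
    """Split looked-after records into those that match a census record
    on `key` and those that do not (missing or unmatched identifier)."""
    census_ids = {p[key] for p in pupil_census if p.get(key)}
    linked, unlinked = [], []
    for rec in looked_after:
        ident = rec.get(key)
        if ident and ident in census_ids:
            linked.append(rec)
        else:
            unlinked.append(rec)
    return linked, unlinked


def linkage_rate(linked, unlinked):
    """Share of records that were linked (0.0 if there are no records)."""
    total = len(linked) + len(unlinked)
    return len(linked) / total if total else 0.0


if __name__ == "__main__":
    # Toy records only; identifiers are made up.
    census = [{"scn": "S001"}, {"scn": "S002"}]
    returns = [
        {"child": "x", "scn": "S001"},   # valid identifier, present in census
        {"child": "y", "scn": None},     # identifier not recorded
        {"child": "z", "scn": "S999"},   # identifier not found in census
    ]
    linked, unlinked = link_by_identifier(returns, census)
    print(linkage_rate(linked, unlinked))
```

Records with a missing identifier can never be linked under this scheme, which is one way under-ascertainment of the kind described above can arise.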

    De novo construction of polyploid linkage maps using discrete graphical models

    Linkage maps are used to identify the location of genes responsible for traits and diseases. New sequencing techniques have created opportunities to substantially increase the density of genetic markers. Such revolutionary advances in technology have given rise to new challenges, such as creating high-density linkage maps. Current multiple testing approaches based on pairwise recombination fractions are underpowered in the high-dimensional setting and do not extend easily to polyploid species. We propose to construct linkage maps using graphical models either via a sparse Gaussian copula or a nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes, and the order of markers in each LG are determined by inferring the conditional independence relationships among large numbers of markers in the genome. Through simulations, we illustrate the utility of our map construction method and compare its performance with other available methods, both when the data are clean and contain no missing observations and when data contain genotyping errors and are incomplete. We apply the proposed method to two genotype datasets: barley and potato from diploid and polyploid populations, respectively. Our comprehensive map construction method makes full use of the dosage SNP data to reconstruct linkage maps for any bi-parental diploid and polyploid species. We have implemented the method in the R package netgwas. Comment: 25 pages, 7 figures.

    Spanning Trees and bootstrap reliability estimation in correlation based networks

    We introduce a new technique to associate a spanning tree to the average linkage cluster analysis. We call this tree the Average Linkage Minimum Spanning Tree. We also introduce a technique to associate a value of reliability to links of correlation-based graphs by using bootstrap replicas of data. Both techniques are applied to the portfolio of the 300 most capitalized stocks traded on the New York Stock Exchange during the time period 2001-2003. We show that the Average Linkage Minimum Spanning Tree recognizes economic sectors and sub-sectors as communities in the network slightly better than the Minimum Spanning Tree does. We also show that the average reliability of links in the Minimum Spanning Tree is slightly greater than the average reliability of links in the Average Linkage Minimum Spanning Tree. Comment: 17 pages, 3 figures.
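Bootstrap reliability of links can be sketched for the plain Minimum Spanning Tree as follows. This does not construct the Average Linkage Minimum Spanning Tree, and the distance d = sqrt(2(1 - rho)) is an assumed, commonly used choice that the abstract does not state. Function names, the replica count, and the toy series are likewise illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def mst_edges(series):
    """Kruskal's algorithm on the assumed distance d = sqrt(2 * (1 - rho))."""
    names = sorted(series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = pearson(series[a], series[b])
            edges.append((math.sqrt(max(0.0, 2.0 * (1.0 - rho))), a, b))
    edges.sort()
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = set()
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.add((a, b))
    return tree


def bootstrap_reliability(series, replicas=200, seed=0):
    """Fraction of bootstrap replicas in which each original MST link reappears."""
    rng = random.Random(seed)
    length = len(next(iter(series.values())))
    base = mst_edges(series)
    counts = {edge: 0 for edge in base}
    for _ in range(replicas):
        # Resample time indices with replacement, jointly for all series.
        idx = [rng.randrange(length) for _ in range(length)]
        resampled = {k: [v[i] for i in idx] for k, v in series.items()}
        for edge in mst_edges(resampled) & base:
            counts[edge] += 1
    return {edge: c / replicas for edge, c in counts.items()}


if __name__ == "__main__":
    # Toy series, not market data.
    series = {
        "a": [1, 2, 3, 4, 5, 6, 7, 8],
        "b": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9],
        "c": [3, 1, 4, 1, 5, 9, 2, 6],
    }
    print(bootstrap_reliability(series))
```

Resampling whole time points, rather than each series independently, preserves the cross-correlations between series within every replica.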