Data linkage algebra, data linkage dynamics, and priority rewriting
We introduce an algebra of data linkages. Data linkages are intended for
modelling the states of computations in which dynamic data structures are
involved. We present a simple model of computation in which states of
computations are modelled as data linkages and state changes take place by
means of certain actions. We describe the state changes and replies that result
from performing those actions by means of a term rewriting system with rule
priorities. The model in question is an upgrade of molecular dynamics. The
upgrading is mainly concerned with the features to deal with values and the
features to reclaim garbage. Comment: 48 pages, typos corrected, phrasing improved, definition of services
replaced; presentation improved; presentation improved and appendix added
SLIM : Scalable Linkage of Mobility Data
We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location-based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility-based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings two to four orders of magnitude speedup.
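The candidate-reduction idea mentioned in the abstract can be illustrated with a generic MinHash banding sketch. This is not the SLIM algorithm itself: the abstract does not specify its hashing scheme, so the token representation (sets of hypothetical `(cell, time_window)` tuples), the function names, and the parameters `num_hashes` and `bands` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def _hash(token, seed):
    # Deterministic 64-bit hash of a token under a given seed.
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=32):
    # One minimum per seeded hash function; tokens must be non-empty.
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(left, right, num_hashes=32, bands=16):
    # left/right map an entity id to its set of (cell, time_window) tokens
    # (an assumed representation, not the one used in the paper).
    rows = num_hashes // bands
    buckets = defaultdict(lambda: (set(), set()))
    for side, dataset in enumerate((left, right)):
        for entity, tokens in dataset.items():
            sig = minhash_signature(tokens, num_hashes)
            for b in range(bands):
                key = (b, tuple(sig[b * rows:(b + 1) * rows]))
                buckets[key][side].add(entity)
    # Only entities that share at least one band bucket are compared later.
    pairs = set()
    for lefts, rights in buckets.values():
        pairs.update((l, r) for l in lefts for r in rights)
    return pairs
```

Entities whose token sets overlap heavily tend to collide in at least one band, while dissimilar entities rarely do, so the expensive similarity computation only needs to run on the returned pairs instead of on every cross-dataset combination.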
Linking routinely collected social work, education and health data to enable monitoring of the health and health care of school-aged children in state care (‘looked after children’) in Scotland: a national demonstration project
Background and objectives: Children in state care (‘looked after children’) have poorer health than children who are not looked after. Recent developments in Scotland and elsewhere have aimed to improve services and outcomes for looked after children. Routine monitoring of the health outcomes of looked after children compared to those of their non-looked after peers is currently lacking. Developing capacity for comparative monitoring of population-based outcomes based on linkage of routinely collected administrative data has been identified as a priority. To our knowledge there are no existing population-based data linkage studies providing data on the health of looked after and non-looked after children at national level. Smaller scale studies that are available generally provide very limited information on linkage methods and hence do not allow scrutiny of bias that may be introduced through the linkage process. Study design and methods: National demonstration project testing the feasibility of linking routinely collected looked after children, education, and health data. Participants: All children in publicly funded schools in Scotland in 2011/12. Results: Linkage between looked after children data and the national pupil census classified 10,009 (1.5%) and 1,757 (0.3%) of 670,952 children as, respectively, currently and previously looked after. Recording of the unique pupil identifier (Scottish Candidate Number, SCN) on looked after children returns is incomplete, with 66% of looked after records for 2011/12 for children of possible school age containing a valid SCN. This will have resulted in some under-ascertainment of currently and, particularly, previously looked after children within the general pupil population.
Further linkage of the pupil census to the NHS Scotland master patient index demonstrated that a safe link to the child’s unique health service (Community Health Index, CHI) number could be obtained for a very high proportion of children in each group (94%, 95%, and 95% of children classified as currently, previously, and non-looked after respectively). In general, linkage rates were higher for older children and those living in more affluent areas. Within the looked after group, linkage rates were highest for children with the fewest placements and for those in permanent fostering. Conclusions: This novel data linkage demonstrates the feasibility of monitoring population-based health outcomes of school-aged looked after and non-looked after children using linked routine administrative data. Improved recording of the unique pupil identifier number on looked after data returns would be beneficial. Extending the range of personal identifiers on looked after children returns would enable linkage to health data for looked after children who are not in publicly funded schooling (i.e. those who are pre- or post-school, home schooled, or in independent schooling).
De novo construction of polyploid linkage maps using discrete graphical models
Linkage maps are used to identify the location of genes responsible for
traits and diseases. New sequencing techniques have created opportunities to
substantially increase the density of genetic markers. Such revolutionary
advances in technology have given rise to new challenges, such as creating
high-density linkage maps. Current multiple testing approaches based on
pairwise recombination fractions are underpowered in the high-dimensional
setting and do not extend easily to polyploid species. We propose to construct
linkage maps using graphical models either via a sparse Gaussian copula or a
nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes,
and the order of markers in each LG are determined by inferring the conditional
independence relationships among large numbers of markers in the genome.
Through simulations, we illustrate the utility of our map construction method
and compare its performance with other available methods, both when the data
are clean and contain no missing observations and when data contain genotyping
errors and are incomplete. We apply the proposed method to two genotype
datasets: barley and potato from diploid and polyploid populations,
respectively. Our comprehensive map construction method makes full use of the
dosage SNP data to reconstruct linkage maps for any bi-parental diploid and
polyploid species. We have implemented the method in the R package netgwas. Comment: 25 pages, 7 figures
Spanning Trees and bootstrap reliability estimation in correlation based networks
We introduce a new technique to associate a spanning tree to the average
linkage cluster analysis. We term this tree as the Average Linkage Minimum
Spanning Tree. We also introduce a technique to associate a value of
reliability to links of correlation based graphs by using bootstrap replicas of
data. Both techniques are applied to the portfolio of the 300 most capitalized
stocks traded at New York Stock Exchange during the time period 2001-2003. We
show that the Average Linkage Minimum Spanning Tree recognizes economic sectors
and sub-sectors as communities in the network slightly better than the Minimum
Spanning Tree does. We also show that the average reliability of links in the
Minimum Spanning Tree is slightly greater than the average reliability of links
in the Average Linkage Minimum Spanning Tree. Comment: 17 pages, 3 figures
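The bootstrap idea for link reliability can be sketched in a few lines. The sketch below builds an ordinary Minimum Spanning Tree with Kruskal's algorithm, not the Average Linkage Minimum Spanning Tree introduced in the paper, and it assumes the common correlation distance d = sqrt(2(1 - rho)), which the abstract does not state. The function names, the default of 200 replicas, and the toy input format (a dict of equal-length return series) are hypothetical.

```python
import math
import random


def pearson(x, y):
    # Pearson correlation; returns 0.0 if either series has zero variance.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx * vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mst_edges(series):
    # Kruskal's algorithm on the assumed distance d = sqrt(2 * (1 - rho)).
    names = sorted(series)
    edges = sorted(
        (math.sqrt(max(0.0, 2 * (1 - pearson(series[a], series[b])))), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = set()
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.add((a, b))
    return tree


def bootstrap_reliability(series, replicas=200, seed=0):
    # Fraction of bootstrap replicas in which each original MST link reappears.
    rng = random.Random(seed)
    length = len(next(iter(series.values())))
    counts = {edge: 0 for edge in mst_edges(series)}
    for _ in range(replicas):
        idx = [rng.randrange(length) for _ in range(length)]
        resampled = {k: [v[i] for i in idx] for k, v in series.items()}
        replica_tree = mst_edges(resampled)
        for edge in counts:
            counts[edge] += edge in replica_tree
    return {edge: c / replicas for edge, c in counts.items()}
```

Each replica resamples time points with replacement, applying the same indices to every series so that cross-correlations are preserved within a replica. A link that survives in most replicas is considered more reliable than one that appears only occasionally.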
