    A linked data representation for summary statistics and grouping criteria

    Summary statistics are fundamental to data science and are the building blocks of statistical reasoning. Most of the data and statistics made available on government web sites are aggregate; however, until now, we have not had a suitable linked data representation for them. We propose a way to express summary statistics across aggregate groups as linked data using Web Ontology Language (OWL) class-based sets, where members of the set contribute to the overall aggregate value. Additionally, many clinical studies in the biomedical field rely on demographic summaries of their study cohorts and the patients assigned to each arm. While most data query languages, including SPARQL, allow for computation of summary statistics, they do not provide a way to integrate those values back into the RDF graphs they were computed from. We represent this knowledge, which would otherwise be lost, through the use of OWL 2 punning semantics, the expression of aggregate grouping criteria as OWL classes with variables, and constructs from the Semanticscience Integrated Ontology (SIO) and the World Wide Web Consortium’s provenance ontology, PROV-O, providing interoperable representations that are well supported across the web of Linked Data. We evaluate these semantics using a Resource Description Framework (RDF) representation of patient case information from the Genomic Data Commons, a data portal from the National Cancer Institute.
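
    A minimal sketch of the general idea, assuming plain N-Triples output and stdlib Python only: records are grouped, a mean is computed per group, and the statistic is attached to an IRI that is declared an owl:Class and also used as the subject of a property assertion (OWL 2 punning). Each contributing member is typed by that class and cited with PROV-O's prov:wasDerivedFrom. The http://example.org/ vocabulary (ex:hasAggregate, ex:mean, the record IRIs) and the function names are hypothetical; the paper's actual representation uses SIO terms that this sketch does not reproduce.

```python
from collections import defaultdict
from statistics import mean

# Real W3C namespace IRIs; everything under EX is a hypothetical vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/"


def iri(value):
    """Wrap an IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def aggregate_triples(records, group_key, value_key):
    """Emit N-Triples attaching a per-group mean to an OWL class for that group.

    Assumes every record has an "id" IRI and that group labels are IRI-safe.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    triples = []
    for label, members in sorted(groups.items()):
        group_class = f"{EX}{group_key}_{label}"
        stat = f"{EX}mean_{value_key}_{group_key}_{label}"
        avg = mean(m[value_key] for m in members)
        # Punning: the same IRI is an owl:Class and the subject of a property.
        triples.append((iri(group_class), iri(RDF_TYPE), iri(OWL_CLASS)))
        triples.append((iri(group_class), iri(EX + "hasAggregate"), iri(stat)))
        triples.append((iri(stat), iri(EX + "mean"), f'"{avg:.2f}"^^{iri(XSD_DECIMAL)}'))
        for m in members:
            # Members are typed by the group class and cited as provenance.
            triples.append((iri(m["id"]), iri(RDF_TYPE), iri(group_class)))
            triples.append((iri(stat), iri(PROV_DERIVED), iri(m["id"])))
    return [" ".join(t) + " ." for t in triples]
```

    For example, aggregate_triples(patients, "sex", "age") over a list of patient dictionaries yields one mean-age node per sex, each linked back to the patients it was computed from, so the value stays in the graph rather than living only in a query result.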

    Shrinkage and Variable Selection by Polytopes

    Constrained estimators that enforce variable selection and grouping of highly correlated data have been shown to be successful in finding sparse representations and obtaining good predictive performance. We consider polytopes as a general class of compact and convex constraint regions. Well-established procedures like LASSO (Tibshirani, 1996) or OSCAR (Bondell and Reich, 2008) are shown to be based on specific subclasses of polytopes. The general framework of polytopes can be used to investigate the geometric structure that underlies these procedures. Moreover, we propose a specifically designed class of polytopes that enforces variable selection and grouping. Simulation studies and an application illustrate the usefulness of the proposed method.
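
    To make the polytope view concrete: the LASSO region ‖β‖₁ ≤ t is the intersection of the 2^p half-spaces s·β ≤ t, one per sign vector s ∈ {−1, +1}^p. The OSCAR penalty Σ|β_j| + c Σ_{j<k} max(|β_j|, |β_k|) can be rewritten as Σ_j (1 + c·j)|β|_(j), with |β| sorted ascending and j counted from 0, so by the rearrangement inequality its region is the intersection of half-spaces obtained from every sign flip and permutation of those weights (an octagon when p = 2). The sketch below builds both half-space descriptions; it illustrates the geometry only and is not the new polytope class the authors propose.

```python
import itertools


def l1_halfspaces(p):
    """Normals of the 2**p half-spaces s . beta <= t that describe the L1 ball."""
    return list(itertools.product((-1.0, 1.0), repeat=p))


def oscar_penalty(beta, c):
    """OSCAR: L1 norm plus c times the pairwise maxima of absolute values."""
    abs_b = [abs(b) for b in beta]
    pairs = itertools.combinations(range(len(beta)), 2)
    return sum(abs_b) + c * sum(max(abs_b[j], abs_b[k]) for j, k in pairs)


def oscar_halfspaces(p, c):
    """Normals whose half-spaces a . beta <= t describe the OSCAR region.

    The j-th smallest |beta| (0-based) is the larger entry in j pairs, so the
    penalty is sum_j (1 + c*j) * |beta|_(j); it equals the maximum of a . beta
    over all sign flips and permutations of these weights (for c >= 0).
    """
    weights = [1.0 + c * j for j in range(p)]
    normals = []
    for perm in itertools.permutations(range(p)):
        for signs in itertools.product((-1.0, 1.0), repeat=p):
            a = [0.0] * p
            for j, idx in enumerate(perm):
                a[idx] = signs[idx] * weights[j]
            normals.append(tuple(a))
    return normals


def in_polytope(beta, normals, t):
    """True if beta satisfies every half-space constraint a . beta <= t."""
    return all(sum(a_j * b_j for a_j, b_j in zip(a, beta)) <= t + 1e-12 for a in normals)
```

    The half-space count grows as 2^p (LASSO) and p!·2^p (OSCAR), so this explicit form is only practical for tiny p; it is meant to show why both constraint regions are polytopes, not how to fit them.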

    Perceptual Abstraction for Robotic Cognitive Development

    We are concerned with the design of a developmental robot that learns simple models of itself and its surroundings from scratch. Particular attention is given to perceptual abstraction from high-dimensional sensors.

    Inferring Strategies for Sentence Ordering in Multidocument News Summarization

    The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single-document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization, where summary sentences may be drawn from different input articles. In this paper, we propose a methodology for studying the properties of ordering information in the news genre and describe experiments done on a corpus of multiple acceptable orderings that we developed for the task. Based on these experiments, we implemented a strategy for ordering information that combines constraints from the chronological order of events and topical relatedness. Evaluation of our augmented algorithm shows a significant improvement in ordering over two baseline strategies.
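
    A toy sketch of one way to combine the two kinds of constraints: sentences are grouped into topical blocks, blocks are ordered by the date of their earliest event, and sentences within a block follow chronological order. The Sentence type, its event_date and topic fields, and order_sentences are hypothetical; topic labels are assumed to be precomputed, and the paper's own measure of topical relatedness is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sentence:
    text: str
    event_date: date  # assumed: date of the event the sentence reports
    topic: str        # assumed: precomputed topical cluster label


def order_sentences(sentences):
    """Order topical blocks chronologically, then sentences within each block."""
    blocks = {}
    for s in sentences:
        blocks.setdefault(s.topic, []).append(s)
    # A block's position is set by its earliest event; ties broken by topic name.
    ordered_blocks = sorted(
        blocks.values(),
        key=lambda block: (min(s.event_date for s in block), block[0].topic),
    )
    result = []
    for block in ordered_blocks:
        result.extend(sorted(block, key=lambda s: s.event_date))
    return result
```

    A purely chronological order would interleave sentences from different topics whenever their dates alternate; grouping first keeps related sentences adjacent while still respecting time within and across blocks.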

    On Cross-Country Differences in the Persistence of Real Exchange Rates

    Previous findings of long-run purchasing power parity come mainly from data for industrial countries, raising the issue of whether the results suffer from sample-selection bias and exaggerate the general relevance of parity reversion. This study uncovers substantial cross-country heterogeneity in the persistence of deviations from parity. The results show that parity reversion is more likely, rather than less likely, to be found for developing countries than for industrial countries. Although some of the variation in persistence may partly reflect country differences in structural characteristics such as inflation experience and government spending, a considerable portion of it seems unaccounted for.

    Keywords: parity deviations, cross-country persistence differences, structural determinants
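
    Persistence of deviations from parity is commonly summarised as a half-life derived from an AR(1) coefficient ρ: a deviation shrinks by a factor ρ each period, so half of it remains after ln(0.5)/ln(ρ) periods. The sketch below estimates ρ by ordinary least squares (with intercept) of q_t on q_{t−1} and converts it to a half-life. It is a generic illustration; the abstract does not say which estimator the study uses, and this simple OLS estimate ignores the small-sample bias that matters in practice. The function names are hypothetical.

```python
import math


def ar1_coefficient(series):
    """OLS slope (with intercept) from regressing q_t on q_{t-1}."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def half_life(rho):
    """Periods until half of a deviation has decayed, given AR(1) coefficient rho."""
    if not 0 < rho < 1:
        raise ValueError("half-life is defined here only for 0 < rho < 1")
    return math.log(0.5) / math.log(rho)
```

    With monthly real exchange rate data, half_life(ar1_coefficient(q)) is measured in months; a ρ closer to 1 means slower parity reversion and a longer half-life.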

    Regionalization of landscape pattern indices using multivariate cluster analysis

    This project was funded by the Government of Canada through the Mountain Pine Beetle Program, a six-year, $40 million program administered by Natural Resources Canada, Canadian Forest Service. Additional information on the Mountain Pine Beetle Program may be found at: http://mpb.cfs.nrcan.gc.ca.

    Regionalization, or the grouping of objects in space, is a useful tool for organizing, visualizing, and synthesizing the information contained in multivariate spatial data. Landscape pattern indices can be used to quantify the spatial pattern (composition and configuration) of land cover features. Observable patterns can be linked to underlying processes affecting the generation of landscape patterns (e.g., forest harvesting). The objective of this research is to develop an approach for investigating the spatial distribution of forest pattern across a study area where forest harvesting, other anthropogenic activities, and topography all influence forest pattern. We generate spatial pattern regions (SPR) that describe forest pattern with a regionalization approach. Analysis is performed using a 2006 land cover dataset covering the Prince George and Quesnel Forest Districts, 5.5 million ha of primarily forested land situated within the interior plateau of British Columbia, Canada. Multivariate cluster analysis (with the CLARA algorithm) is used to group landscape objects containing forest pattern information into SPR. Of the six generated SPR, the second cluster (SPR2) is the most prevalent, covering 22% of the study area. On average, landscapes in SPR2 consist of 55.5% forest cover and contain the highest number of patches and forest/non-forest joins, indicating highly fragmented landscapes. Regionalization of landscape pattern metrics provides a useful approach for examining the spatial distribution of forest pattern. Where forest patterns are associated with positive or negative environmental conditions, SPR can be used to identify similar regions for conservation or management activities.

    Postprint. Peer reviewed.
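
    CLARA (Clustering LARge Applications, Kaufman and Rousseeuw) scales k-medoids clustering to large datasets: it runs a medoid search (PAM) on several random subsamples, evaluates each candidate medoid set on the full dataset, and keeps the set with the lowest total dissimilarity. The sketch below follows that outline with a simplified PAM (random initial medoids plus greedy swaps instead of the original BUILD step) and Euclidean distance. The default number of samples and sample size are illustrative assumptions, and the feature vectors stand in for whatever landscape pattern indices describe each landscape object; none of these are the study's settings.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k, rng):
    """Simplified PAM: random initial medoids, then greedy swaps until no gain."""
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, n_samples=5, sample_size=None, seed=0):
    """Run PAM on subsamples; keep the medoids with the lowest full-data cost."""
    rng = random.Random(seed)
    size = min(len(points), sample_size or 40 + 2 * k)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, size)
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)  # evaluated on the whole dataset
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Assign every object to its nearest selected medoid.
    labels = [min(range(k), key=lambda j: math.dist(p, best_medoids[j])) for p in points]
    return best_medoids, labels
```

    Because PAM only ever runs on a subsample, the expensive swap search stays small however many landscape objects there are; only the final cost evaluation and label assignment touch every object.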

    Cause for alarm?: A multi-national, multi-institutional study of student-generated software designs

    This paper reports a multi-national, multi-institutional study investigating Computer Science students' understanding of software design and software design criteria. Students were recruited at two levels: those termed 'first competency' programmers, and those completing their Bachelor's degrees. The study, which included participants from 21 institutions over the academic year 2003/4, aimed to examine students' ability to generate software designs, to elicit students' understanding and valuation of key design activities, and to examine whether students at different stages in their undergraduate education display different understanding of software design. Differences were found in participants' recognition of ambiguity in requirements, in their use of formal (and semi-formal) design representations, and in their prioritisation of design criteria.

    Haplotype affinities resolve a major component of goat (Capra hircus) mtDNA D-loop diversity and reveal specific features of the Sardinian stock

    Goat mtDNA haplogroup A is a poorly resolved lineage absorbing most of the overall diversity and is found in locations as distant as Eastern Asia and Southern Africa. Its phylogenetic dissection would cast light on an important portion of the spread of goat breeding. The aims of this work were 1) to provide an operational definition of meaningful mtDNA units within haplogroup A, 2) to investigate the mechanisms underlying the maintenance of diversity by considering the modes of selection operated by breeders, and 3) to identify the peculiarities of Sardinian mtDNA types. We sequenced the mtDNA D-loop in a large sample of animals (1,591), which represents a non-trivial share of the entire goat population of Sardinia. We found that Sardinia mirrors a large share of the mtDNA diversity of Western Eurasia in the number of variable sites, their mutational pattern, and allele frequency. Using Bayesian analysis, a distance-based tree, and a network analysis, we recognized demographically coherent groups of sequences identified by particular subsets of the variable positions. The results showed that this assignment system could be reproduced in other studies, capturing the greatest part of haplotype diversity. We identified haplotype groups overrepresented in Sardinian goats as a result of founder effects. We found that breeders maintain the diversity of matrilines most likely through equalization of the reproductive potential. Moreover, the considerable inter-farm mtDNA diversity we found does not increase proportionally with distance. Our results illustrate the effects of breeding practices on the composition of the maternal gene pool and identify mtDNA types that may be considered in projects aimed at retrieving the maternal component of the oldest breeds of Sardinia.
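
    Two of the summaries named in this abstract can be computed directly from aligned D-loop sequences: the variable (polymorphic) sites with their per-site allele counts, and haplotype diversity, here in Nei's form H = n/(n−1)·(1 − Σ p_i²), where p_i is the frequency of haplotype i among n sequences. The sketch assumes the sequences are already aligned to equal length and treats gaps and ambiguous bases as ordinary characters; the function names are hypothetical, and it is not the Bayesian, tree-based, or network analysis used in the study.

```python
from collections import Counter


def variable_sites(seqs):
    """Return {position: Counter of bases} for every column with >1 distinct base."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    sites = {}
    for pos, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        if len(counts) > 1:
            sites[pos] = counts
    return sites


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))
```

    Identical sequences count as one haplotype, so a sample dominated by a few matrilines yields a low H even when many variable sites exist elsewhere in the alignment.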