1,132 research outputs found

    Desiderata for the development of next-generation electronic health record phenotype libraries

    Background: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. Methods: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. Results: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. Conclusions: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.

    Generalized Geometry and M theory

    We reformulate the Hamiltonian form of bosonic eleven-dimensional supergravity in terms of an object that unifies the three-form and the metric. For the case of four spatial dimensions, the duality group is manifest and the metric and C-field are on an equal footing, even though no dimensional reduction is required for our results to hold. One may also describe our results using the generalized geometry that emerges from membrane duality. The relationship between the twisted Courant algebra and the gauge symmetries of eleven-dimensional supergravity is described in detail. Comment: 29 pages of LaTeX; v2: references added, typos fixed; v3: corrected kinetic term and references added

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

    Environmental Costs of Government-Sponsored Agrarian Settlements in Brazilian Amazonia

    Brazil has presided over the most comprehensive agrarian reform frontier colonization program on Earth, in which ~1.2 million settlers have been translocated by successive governments since the 1970s, mostly into forested hinterlands of Brazilian Amazonia. These settlements encompass 5.3% of this ~5 million km² region, but have contributed 13.5% of all land conversion into agropastoral land uses. The Brazilian Federal Agrarian Agency (INCRA) has repeatedly claimed that deforestation in these areas largely predates the sanctioned arrival of new settlers. Here, we quantify rates of natural vegetation conversion across 1911 agrarian settlements allocated to 568 Amazonian counties and compare fire incidence and deforestation rates before and after the official occupation of settlements by migrant farmers. The timing and spatial distribution of deforestation and fires in our analysis provide irrefutable chronological and spatially explicit evidence of agropastoral conversion both inside and immediately outside agrarian settlements over the last decade. Deforestation rates are strongly related to local human population density and road access to regional markets. Agrarian settlements consistently accelerated rates of deforestation and fires compared to neighboring areas outside settlements but within the same counties. Relocated smallholders allocated to forest areas undoubtedly operate as pivotal agents of deforestation, and most of the forest clearance occurs in the aftermath of government-induced migration.

    Coalition-structured governance improves cooperation to provide public goods

    While the benefits of common and public goods are shared, they tend to be scarce when contributions are provided voluntarily. Failure to cooperate in the provision or preservation of these goods is fundamental to sustainability challenges, ranging from local fisheries to global climate change. In the real world, such cooperative dilemmas occur in multiple interactions with complex strategic interests and frequently without full information. We argue that voluntary cooperation enabled across overlapping coalitions (akin to polycentricity) not only facilitates a higher generation of non-excludable public goods, but may also allow evolution toward a more cooperative, stable, and inclusive approach to governance. In contrast to previous studies, we show that these merits of multi-coalition governance are far more general than the singular examples occurring in the literature, and that they are robust under diverse conditions of excludability, congestion of the non-excludable public good, and arbitrary shapes of the return-to-contribution function. We first confirm the intuition that a single coalition without enforcement, with players pursuing their self-interest without knowledge of returns to contribution, is prone to cooperative failure. Next, we demonstrate that the same pessimistic model with a multi-coalition structure of governance experiences relatively higher cooperation by enabling recognition of marginal gains of cooperation in the game at stake. In the absence of enforcement, public-goods regimes that evolve through a proliferation of voluntary cooperative forums can maintain and increase cooperation more successfully than singular, inclusive regimes. Supported by US Defense Advanced Research Projects Agency (D17AC00005), National Science Foundation grant GEO-1211972, and Fundacao para a Ciencia e Tecnologia (FCT) through grants PTDC/MAT/STA/3358/2014, PTDC/EEI-SII/5081/2014, and UID/BIA/04050/2013. P.M.H. was supported by the Walbridge Fund at the Princeton Environmental Institute.

    GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest

    Background: Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information on selected genes. Methodologically, such a tool would be of greater use if it incorporated state-of-the-art computational approaches and made source code available. Results: We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Computation is parallelized, taking advantage of multicore CPUs and clusters of workstations. Output includes bootstrapped estimates of prediction error rate and assessments of the stability of the solutions. Clickable tables link to additional information for each gene (GO terms, PubMed citations, KEGG pathways), and output can be sent to PaLS for examination of PubMed references, GO terms, and KEGG and Reactome pathways characteristic of sets of genes selected for class prediction. The full source code is available, so the software can be extended. The web-based application is available from http://genesrf2.bioinfo.cnio.es. All source code is available from Bioinformatics.org or The Launchpad. The R package is also available from CRAN. Conclusion: varSelRF and GeneSrF implement a validated method for gene selection, including bootstrap estimates of classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology used (a combination of parallelization with a web-based application), they are also of methodological interest to bioinformaticians and biostatisticians.

    A2 gene of Old World cutaneous Leishmania is a single highly conserved functional gene

    BACKGROUND: Leishmaniases are among the most proteiform parasitic infections in humans, ranging from inapparent to cutaneous, mucocutaneous or visceral diseases. The various clinical outcomes depend on complex and still poorly understood mechanisms in which both host and parasite factors interact. Among the candidate factors of parasite virulence are the A2 genes, a family of multiple genes that are developmentally expressed in species of the Leishmania donovani group responsible for visceral diseases (VL). By contrast, in L. major, which causes cutaneous infections (CL), we showed that A2 genes are present in a truncated form only. Furthermore, the A2 genomic sequences of L. major were subsequently considered to represent non-expressed pseudogenes [1]. Consequently, it was suggested that the structural and functional properties of A2 genes could play a role in the differential tropism of CL and VL Leishmania. On this basis, it was important to determine whether the observed structural/functional particularities of the L. major A2 genes were shared by other CL Leishmania, and therefore represent a characteristic of CL A2 genes as opposed to those of VL isolates. METHODS: In the present study we amplified by PCR and sequenced the A2 genes from genomic DNA and from clonal libraries of the four Old World CL species, in comparison with a clonal population of L. infantum VL parasites. Using RT-PCR we also amplified and sequenced A2 mRNA transcripts from L. major. RESULTS: A unique A2 sequence was identified in Old World cutaneous Leishmania by sequencing. The shared sequence was highly conserved among the various CL strains and species analysed, showing a single C/G polymorphism at position 58. The CL A2 gene was found to be functionally transcribed at both parasite stages. CONCLUSION: The present study shows that cutaneous strains of Leishmania share a conserved functional A2 gene. As opposed to the multiple A2 genes described in VL isolates, the CL A2 gene is unique, lacking most of the nucleotide repeats that constitute the variable region at the 5' end of the VL A2 sequences. As the variable region of the VL A2 gene has been shown to correspond to a portion of the protein which is highly immunogenic, the present results support the hypothesis of a possible role of the A2 gene in the differential tropism of CL and VL Leishmania parasites.

    Indirect Reciprocity under Incomplete Observation

    Indirect reciprocity, in which individuals help others with a good reputation but not those with a bad reputation, is a mechanism for cooperation in social dilemma situations when individuals do not repeatedly interact with the same partners. In a relatively large society where indirect reciprocity is relevant, individuals may not know each other's reputation even indirectly. Previous studies investigated situations in which individuals playing the game have to determine their action, possibly without knowing others' reputations. Nevertheless, the possibility that observers of the game, who generate the reputation of the interacting players, assign reputations without complete information about them has been neglected. Because an individual acts as an interacting player and as an observer on different occasions if indirect reciprocity is endogenously sustained in a society, the incompleteness of information may affect either role. We examine the game of indirect reciprocity when the reputations of players are not necessarily known to observers and to interacting players. We find that the trustful discriminator, which cooperates with good and unknown players and defects against bad players, realizes cooperative societies under seven social norms. Among the seven social norms, three of the four suspicious norms, under which cooperation (defection) to unknown players leads to a good (bad) reputation, enable cooperation down to a relatively small observation probability. In contrast, the three trustful norms, under which both cooperation and defection to unknown players lead to a good reputation, are relatively efficient.