
    Removing the influence of a group variable in high-dimensional predictive modelling

    In many application areas, predictive models are used to support or make important decisions. There is increasing awareness that these models may contain spurious or otherwise undesirable correlations. Such correlations may arise from a variety of sources, including batch effects, systematic measurement errors, or sampling bias. Without explicit adjustment, machine learning algorithms trained using these data can produce poor out-of-sample predictions which propagate these undesirable correlations. We propose a method to pre-process the training data, producing an adjusted dataset that is statistically independent of the nuisance variables with minimum information loss. We develop a conceptually simple approach for creating an adjusted dataset in high-dimensional settings based on a constrained form of matrix decomposition. The resulting dataset can then be used in any predictive algorithm with the guarantee that predictions will be statistically independent of the group variable. We develop a scalable algorithm for implementing the method, along with theoretical support in the form of independence guarantees and optimality. The method is illustrated on some simulation examples and applied to two case studies: removing machine-specific correlations from brain scan data, and removing race and ethnicity information from a dataset used to predict recidivism. That the motivation for removing undesirable correlations is quite different in the two applications illustrates the broad applicability of our approach. Comment: Update. 18 pages, 3 figures.
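    The pre-processing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single categorical group variable, removes only linear dependence (zero within-group means rather than full statistical independence), and uses a plain truncated SVD in place of the constrained decomposition. The function name `remove_group_influence` and all parameter values are hypothetical.

```python
import numpy as np

def remove_group_influence(X, z, rank):
    """Low-rank approximation of X whose columns are orthogonal to the group design.

    Illustrative only: a linear residualisation followed by a truncated SVD,
    not the constrained decomposition described in the abstract.
    """
    # One-hot design matrix for the group labels (all levels, so it spans the intercept).
    levels, codes = np.unique(z, return_inverse=True)
    Z = np.eye(len(levels))[codes]
    # Subtract the least-squares projection of X onto the column space of Z.
    coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    R = X - Z @ coef
    # Truncated SVD of the residual; its left singular vectors stay orthogonal to Z.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Synthetic data in which the group label shifts every feature (assumed setup).
rng = np.random.default_rng(0)
n, p = 200, 50
z = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p)) + 2.0 * z[:, None]

X_adj = remove_group_influence(X, z, rank=10)
# Difference in feature means between the two groups after adjustment.
group_gap = X_adj[z == 1].mean(axis=0) - X_adj[z == 0].mean(axis=0)
```

    After adjustment the two groups have the same feature means up to floating-point error, so a linear model fitted to `X_adj` cannot pick up the group shift. Nonlinear dependence on the group is not addressed by this sketch.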

    Assessing the prognosis of dengue-infected patients

    Dengue infections pose a huge burden to health care providers in most tropical countries. Careful clinical examination and history-taking supplemented by newer rapid diagnostic tests may lead to early etiological diagnosis. For severe dengue, early recognition of vascular permeability followed by rapid physiological replacement of fluids is life-saving. Prognosis of patients depends upon optimum management, an outcome that requires preparation via organization, training, and use of evidence-based practice guidelines

    Virus Propagation in Multiple Profile Networks

    Suppose we have a virus or one competing idea/product that propagates over a multiple profile (e.g., social) network. Can we predict what proportion of the network will actually get "infected" (e.g., spread the idea or buy the competing product), when the nodes of the network appear to have different sensitivity based on their profile? For example, if there are two profiles $\mathcal{A}$ and $\mathcal{B}$ in a network and the nodes of profile $\mathcal{A}$ and profile $\mathcal{B}$ are susceptible to a highly spreading virus with probabilities $\beta_{\mathcal{A}}$ and $\beta_{\mathcal{B}}$ respectively, what percentage of both profiles will actually get infected from the virus at the end? To reverse the question, what are the necessary conditions so that a predefined percentage of the network is infected? We assume that nodes of different profiles can infect one another and we prove that under realistic conditions, apart from the weak profile (great sensitivity), the stronger profile (low sensitivity) will get infected as well. First, we focus on cliques with the goal to provide exact theoretical results as well as to get some intuition as to how a virus affects such a multiple profile network. Then, we move to the theoretical analysis of arbitrary networks. We provide bounds on certain properties of the network based on the probabilities of infection of each node in it when it reaches the steady state. Finally, we provide extensive experimental results that verify our theoretical results and at the same time provide more insight on the problem.
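    The two-profile clique question can be explored numerically with a rough sketch. The abstract does not specify the propagation model, so everything here is an assumption: a discrete-time mean-field approximation in which an infected node recovers with probability `delta` per step (a parameter not mentioned in the abstract), and each infected neighbour transmits independently with the receiving node's own profile probability. The function name and all values are hypothetical.

```python
def steady_state_clique(n_a, n_b, beta_a, beta_b, delta, p0=0.1, iters=5000):
    """Mean-field infection probabilities (p_a, p_b) for a two-profile clique.

    Assumed model, not taken from the paper: nodes of profile A (resp. B) are
    infected by each infected neighbour with probability beta_a (resp. beta_b)
    and recover with probability delta per time step.
    """
    p_a = p_b = p0
    for _ in range(iters):
        # Probability that a node escapes infection from every neighbour in one step.
        escape_a = (1 - beta_a * p_a) ** (n_a - 1) * (1 - beta_a * p_b) ** n_b
        escape_b = (1 - beta_b * p_a) ** n_a * (1 - beta_b * p_b) ** (n_b - 1)
        # Infected nodes stay infected w.p. 1 - delta; healthy ones are infected w.p. 1 - escape.
        p_a, p_b = (
            (1 - delta) * p_a + (1 - p_a) * (1 - escape_a),
            (1 - delta) * p_b + (1 - p_b) * (1 - escape_b),
        )
    return p_a, p_b

# Profile A is the "weak" (more sensitive) profile, profile B the "strong" one.
p_a, p_b = steady_state_clique(n_a=50, n_b=50, beta_a=0.05, beta_b=0.01, delta=0.2)
```

    Under these assumed parameters the iteration settles at a point where the less sensitive profile B is also infected with positive probability, though less than profile A, which matches the qualitative claim in the abstract. Other parameter choices may instead drive both probabilities to zero.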

    Genome-wide landscape of alternative splicing events in Brachypodium distachyon

    Recently, Brachypodium distachyon has emerged as a model plant for studying monocot grasses and cereal crops. Using assembled expressed transcript sequences and subsequent mapping to the corresponding genome, we identified 1219 alternative splicing (AS) events spanning 2021 putatively assembled transcripts generated from 941 genes. Approximately 6.3% of expressed genes are alternatively spliced in B. distachyon. We observed that a majority of the identified AS events were related to retained introns (55.5%), followed by alternative acceptor sites (16.7%). We also observed a low percentage of exon skipping (5.0%) and alternative donor site events (8.8%). The 'complex event' that consists of a combination of two or more basic splicing events accounted for ~14.0%. Comparative AS transcript analysis revealed 163 and 39 homologous pairs between B. distachyon and Oryza sativa and between B. distachyon and Arabidopsis thaliana, respectively. In all, we found 16 AS transcripts to be conserved in all 3 species. AS events and related putative assembled transcript annotations can be systematically browsed at the Plant Alternative Splicing Database (http://proteomics.ysu.edu/altsplice/plant/). © The Author 2012

    Interactive Multi-volume Visualization

    This paper is concerned with simultaneous visualization of two or more volumes, which may be from different imaging modalities or numerical simulations for the same subject of study. The main visualization challenge is to establish visual correspondences while maintaining distinctions among multiple volumes. One solution is to use different rendering styles for different volumes. Interactive rendering is required so the user can choose with ease an appropriate rendering style and its associated parameters for each volume. Rendering efficiency is maximized by utilizing commodity graphics cards. We demonstrate our preliminary results with two case studies.

    Removing the influence of a group variable in high-dimensional predictive modelling

    In many application areas, predictive models are used to support or make important decisions. There is increasing awareness that these models may contain spurious or otherwise undesirable correlations. Such correlations may arise from a variety of sources, including batch effects, systematic measurement errors or sampling bias. Without explicit adjustment, machine learning algorithms trained using these data can produce out-of-sample predictions which propagate these undesirable correlations. We propose a method to pre-process the training data, producing an adjusted dataset that is statistically independent of the nuisance variables with minimum information loss. We develop a conceptually simple approach for creating an adjusted dataset in high-dimensional settings based on a constrained form of matrix decomposition. The resulting dataset can then be used in any predictive algorithm with the guarantee that predictions will be statistically independent of the nuisance variables. We develop a scalable algorithm for implementing the method, along with theoretical support in the form of independence guarantees and optimality. The method is illustrated on some simulation examples and applied to two case studies: removing machine-specific correlations from brain scan data, and removing ethnicity information from a dataset used to predict recidivism. That the motivation for removing undesirable correlations is quite different in the two applications illustrates the broad applicability of our approach.

    Arc-Smooth Continua

    Continua admitting arc-structures and arc-smooth continua are introduced as higher dimensional analogues of dendroids and smooth dendroids, respectively. These continua include such spaces as: cones over compacta, convex continua in $\ell_2$, strongly convex metric continua, injectively metrizable continua, as well as various topological semigroups, partially ordered spaces, and hyperspaces. The arc-smooth continua are shown to coincide with the freely contractible continua and with the metric K-spaces of Stadtlander. Known characterizations of smoothness in dendroids involving closed partial orders, the set function T, radially convex metrics, continuous selections, and order preserving mappings are extended to the setting of continua with arc-structures. Various consequences of the special contractibility properties of arc-smooth continua are also obtained.