Removing the influence of a group variable in high-dimensional predictive modelling
In many application areas, predictive models are used to support or make
important decisions. There is increasing awareness that these models may
contain spurious or otherwise undesirable correlations. Such correlations may
arise from a variety of sources, including batch effects, systematic
measurement errors, or sampling bias. Without explicit adjustment, machine
learning algorithms trained using these data can produce poor out-of-sample
predictions which propagate these undesirable correlations. We propose a method
to pre-process the training data, producing an adjusted dataset that is
statistically independent of the nuisance variables with minimum information
loss. We develop a conceptually simple approach for creating an adjusted
dataset in high-dimensional settings based on a constrained form of matrix
decomposition. The resulting dataset can then be used in any predictive
algorithm with the guarantee that predictions will be statistically independent
of the group variable. We develop a scalable algorithm for implementing the
method, along with theoretical support in the form of independence and
optimality guarantees. The method is illustrated on simulated examples and applied
to two case studies: removing machine-specific correlations from brain scan
data, and removing race and ethnicity information from a dataset used to
predict recidivism. That the motivation for removing undesirable correlations
is quite different in the two applications illustrates the broad applicability
of our approach.
Comment: Update. 18 pages, 3 figures
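As a rough illustration of the pre-processing idea in this abstract, the sketch below removes group-level mean shifts from a data matrix by projecting each column onto the orthogonal complement of the group indicators. This is a plain least-squares residualization, a much simpler stand-in for the constrained matrix decomposition the abstract describes, and it only guarantees that the adjusted columns have identical (zero) group means, not full statistical independence. The function name `remove_group_mean_effects` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def remove_group_mean_effects(X, groups):
    """Residualize every column of X against one-hot group indicators.

    Hypothetical helper: a linear projection, not the paper's method.
    """
    X = np.asarray(X, dtype=float)
    labels, idx = np.unique(groups, return_inverse=True)
    # One-hot design matrix Z with one column per group label.
    Z = np.zeros((X.shape[0], labels.size))
    Z[np.arange(X.shape[0]), idx] = 1.0
    # Least-squares fit of X on Z; the fitted values are the group means.
    coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    # Subtracting the fit leaves the part of X orthogonal to the indicators.
    return X - Z @ coef


# Simulated example (assumed data): group 1 carries a batch shift of 3.0.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5)) + 3.0 * groups[:, None]
X_adj = remove_group_mean_effects(X, groups)
```

After the adjustment, the per-group column means of `X_adj` coincide, so a model trained on it cannot recover the group from mean shifts alone. Higher-order dependence on the group could still remain, which is the gap the abstract's independence guarantees are meant to close.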
Assessing the prognosis of dengue-infected patients
Dengue infections pose a huge burden on health care providers in most tropical countries. Careful clinical examination and history-taking, supplemented by newer rapid diagnostic tests, may lead to early etiological diagnosis. For severe dengue, early recognition of vascular permeability followed by rapid physiological replacement of fluids is life-saving. Prognosis of patients depends upon optimum management, an outcome that requires preparation via organization, training, and use of evidence-based practice guidelines.
Virus Propagation in Multiple Profile Networks
Suppose we have a virus or one competing idea/product that propagates over a
multiple profile (e.g., social) network. Can we predict what proportion of the
network will actually get "infected" (e.g., spread the idea or buy the
competing product), when the nodes of the network appear to have different
sensitivity based on their profile? For example, if a network contains two
profiles whose nodes are susceptible to a highly spreading virus with different,
profile-specific probabilities, what percentage of each profile will actually be
infected by the virus at the end? To reverse the question, what are the necessary
conditions so that a predefined percentage of the network is infected? We
assume that nodes of different profiles can infect one another and we prove
that under realistic conditions, apart from the weak profile (great
sensitivity), the stronger profile (low sensitivity) will get infected as well.
First, we focus on cliques with the goal to provide exact theoretical results
as well as to get some intuition as to how a virus affects such a multiple
profile network. Then, we move to the theoretical analysis of arbitrary
networks. We provide bounds on certain properties of the network based on the
probabilities of infection of each node in it when it reaches the steady state.
Finally, we provide extensive experimental results that verify our theoretical
results and at the same time provide more insight into the problem.
Genome-wide landscape of alternative splicing events in Brachypodium distachyon
Recently, Brachypodium distachyon has emerged as a model plant for studying monocot grasses and cereal crops. Using assembled expressed transcript sequences and subsequent mapping to the corresponding genome, we identified 1219 alternative splicing (AS) events spanning 2021 putatively assembled transcripts generated from 941 genes. Approximately 6.3% of expressed genes are alternatively spliced in B. distachyon. We observed that a majority of the identified AS events were related to retained introns (55.5%), followed by alternative acceptor sites (16.7%). We also observed a low percentage of exon skipping (5.0%) and alternative donor site events (8.8%). The 'complex event', which consists of a combination of two or more basic splicing events, accounted for ~14.0%. Comparative AS transcript analysis revealed 163 and 39 homologous pairs between B. distachyon and Oryza sativa and between B. distachyon and Arabidopsis thaliana, respectively. In all, we found 16 AS transcripts to be conserved in all 3 species. AS events and the annotation of the related putative assembled transcripts can be systematically browsed at the Plant Alternative Splicing Database (http://proteomics.ysu.edu/altsplice/plant/). © The Author 2012
Interactive Multi-volume Visualization
Abstract. This paper is concerned with simultaneous visualization of two or more volumes, which may be from different imaging modalities or numerical simulations for the same subject of study. The main visualization challenge is to establish visual correspondences while maintaining distinctions among multiple volumes. One solution is to use different rendering styles for different volumes. Interactive rendering is required so the user can choose with ease an appropriate rendering style and its associated parameters for each volume. Rendering efficiency is maximized by utilizing commodity graphics cards. We demonstrate our preliminary results with two case studies.
Removing the influence of a group variable in high-dimensional predictive modelling
In many application areas, predictive models are used to support or make important decisions. There is increasing awareness that these models may contain spurious or otherwise undesirable correlations. Such correlations may arise from a variety of sources, including batch effects, systematic measurement errors or sampling bias. Without explicit adjustment, machine learning algorithms trained using these data can produce out-of-sample predictions which propagate these undesirable correlations. We propose a method to pre-process the training data, producing an adjusted dataset that is statistically independent of the nuisance variables with minimum information loss. We develop a conceptually simple approach for creating an adjusted dataset in high-dimensional settings based on a constrained form of matrix decomposition. The resulting dataset can then be used in any predictive algorithm with the guarantee that predictions will be statistically independent of the nuisance variables. We develop a scalable algorithm for implementing the method, along with theoretical support in the form of independence and optimality guarantees. The method is illustrated on simulated examples and applied to two case studies: removing machine-specific correlations from brain scan data, and removing ethnicity information from a dataset used to predict recidivism. That the motivation for removing undesirable correlations is quite different in the two applications illustrates the broad applicability of our approach.
Arc-Smooth Continua
Continua admitting arc-structures and arc-smooth continua are introduced as higher dimensional analogues of dendroids and smooth dendroids, respectively. These continua include such spaces as: cones over compacta, convex continua in l2, strongly convex metric continua, injectively metrizable continua, as well as various topological semigroups, partially ordered spaces, and hyperspaces. The arc-smooth continua are shown to coincide with the freely contractible continua and with the metric K-spaces of Stadtlander. Known characterizations of smoothness in dendroids involving closed partial orders, the set function T, radially convex metrics, continuous selections, and order preserving mappings are extended to the setting of continua with arc-structures. Various consequences of the special contractibility properties of arc-smooth continua are also obtained.