Multiscale Dictionary Learning for Estimating Conditional Distributions
Nonparametric estimation of the conditional distribution of a response given
high-dimensional features is a challenging problem. It is important to allow
not only the mean but also the variance and shape of the response density to
change flexibly with features, which are massive-dimensional. We propose a
multiscale dictionary learning model, which expresses the conditional response
density as a convex combination of dictionary densities, with the densities
used and their weights dependent on the path through a tree decomposition of
the feature space. A fast graph partitioning algorithm is applied to obtain the
tree decomposition, with Bayesian methods then used to adaptively prune and
average over different sub-trees in a soft probabilistic manner. The algorithm
scales efficiently to approximately one million features. State-of-the-art
predictive performance is demonstrated for toy examples and two neuroscience
applications including up to a million features.
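The path-dependent convex combination described above can be illustrated with a toy sketch. Everything here is a simplifying assumption: a fixed, hand-built binary tree that splits on one feature coordinate per node, Gaussian dictionary densities, and fixed node weights. The names `Node`, `path_to_leaf` and `conditional_density` are hypothetical. The paper instead builds the tree with a graph partitioning algorithm and averages over sub-trees with Bayesian methods, none of which is reproduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """One node of a hypothetical partition tree over the feature space."""
    weight: float                      # unnormalized weight of this node's dictionary density
    mu: float                          # mean of the node's Gaussian dictionary density
    sigma: float                       # standard deviation of that density
    dim: int = 0                       # feature index used to split (internal nodes only)
    threshold: Optional[float] = None  # None marks a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def normal_pdf(y: float, mu: float, sigma: float) -> float:
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def path_to_leaf(root: Node, x: Sequence[float]) -> List[Node]:
    """Nodes visited from the root down to the leaf cell containing x."""
    path = [root]
    node = root
    while node.threshold is not None:
        node = node.left if x[node.dim] < node.threshold else node.right
        path.append(node)
    return path


def conditional_density(root: Node, x: Sequence[float], y: float) -> float:
    """f(y | x): convex combination of the dictionary densities on x's path."""
    path = path_to_leaf(root, x)
    total = sum(n.weight for n in path)
    return sum(n.weight / total * normal_pdf(y, n.mu, n.sigma) for n in path)
```

For a root that splits feature 0 at 0.0, a query with `x[0] >= 0` mixes the root's coarse density with the right child's finer one, so both scales contribute to the conditional density and its mean, spread and shape all change with `x`.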
Nonparametric Bayes Modeling of Populations of Networks
Replicated network data are increasingly available in many research fields.
In connectomic applications, inter-connections among brain regions are
collected for each patient under study, motivating statistical models which can
flexibly characterize the probabilistic generative mechanism underlying these
network-valued data. Available models for a single network are not designed
specifically for inference on the entire probability mass function of a
network-valued random variable and therefore lack flexibility in characterizing
the distribution of relevant topological structures. We propose a flexible
Bayesian nonparametric approach for modeling the population distribution of
network-valued data. The joint distribution of the edges is defined via a
mixture model which reduces dimensionality and efficiently incorporates network
information within each mixture component by leveraging latent space
representations. The formulation leads to an efficient Gibbs sampler and
provides simple and coherent strategies for inference and goodness-of-fit
assessments. We provide theoretical results on the flexibility of our model and
illustrate improved performance over state-of-the-art models in
simulations and in an application to human brain networks.
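A generative sketch of a mixture of latent-space network models may help fix ideas. The logistic link, the shared baseline logits `Z`, and the per-component latent coordinates `X` are assumptions chosen for illustration, and `sample_networks` is a hypothetical name; the sketch only draws networks from such a mixture and does not implement the Gibbs sampler.

```python
import numpy as np


def sample_networks(n_samples, weights, Z, X, rng):
    """Draw undirected networks from a mixture of latent-space edge models.

    weights: (H,) mixture weights summing to one
    Z:       (V, V) symmetric baseline logits shared by all components
    X:       (H, V, R) latent coordinates of the V nodes in each component
    """
    H, V, _ = X.shape
    comps = rng.choice(H, size=n_samples, p=weights)    # mixture component per network
    iu = np.triu_indices(V, k=1)                         # each node pair once, no self-loops
    nets = np.zeros((n_samples, V, V), dtype=int)
    for s, h in enumerate(comps):
        logits = Z + X[h] @ X[h].T                       # pairwise similarity in component h
        probs = 1.0 / (1.0 + np.exp(-logits))            # logistic link
        upper = np.zeros((V, V), dtype=int)
        upper[iu] = rng.random(len(iu[0])) < probs[iu]   # independent Bernoulli edges
        nets[s] = upper + upper.T                        # symmetrize
    return comps, nets
```

Because the edge probabilities are driven by low-dimensional coordinates within each component, the number of free parameters grows with the number of nodes rather than with the number of node pairs.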
The Minimum Wiener Connector
The Wiener index of a graph is the sum of all pairwise shortest-path
distances between its vertices. In this paper we study the novel problem of
finding a minimum Wiener connector: given a connected graph $G=(V,E)$ and a set
$Q \subseteq V$ of query vertices, find a subgraph of $G$ that connects all
query vertices and has minimum Wiener index.
We show that the Minimum Wiener Connector problem admits a polynomial-time (albeit
impractical) exact algorithm for the special case where the number of query
vertices is bounded. We show that in general the problem is NP-hard, and has no
PTAS unless $\mathsf{P} = \mathsf{NP}$. Our main contribution is a
constant-factor approximation algorithm running in time
$\widetilde{O}(|Q||E|)$.
Thorough experiments on a large variety of real-world graphs confirm
that our method returns smaller and denser solutions than other methods, and
does so by adding to the query set a small number of important vertices
(i.e., vertices with high centrality).
Comment: Published in Proceedings of the 2015 ACM SIGMOD International
Conference on Management of Data
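To make the objective concrete, the sketch below computes the Wiener index of an unweighted graph with breadth-first search and finds a minimum Wiener connector by exhaustive search over vertex subsets. This brute force is exponential in the number of non-query vertices, so it is only a reference for tiny graphs and not the paper's approximation algorithm; the function names are hypothetical. Searching induced subgraphs suffices, because adding edges among the chosen vertices can never increase a shortest-path distance.

```python
from collections import deque
from itertools import combinations


def wiener_index(adj, nodes):
    """Sum of shortest-path distances over unordered pairs in the subgraph
    induced by `nodes`; returns None if that subgraph is disconnected."""
    nodes = set(nodes)
    total = 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(nodes):
            return None                  # some vertex unreachable from s
        total += sum(dist.values())
    return total // 2                    # each unordered pair was counted twice


def min_wiener_connector_bruteforce(adj, query):
    """Exhaustive search over supersets of `query` (tiny graphs only)."""
    query = set(query)
    others = [v for v in adj if v not in query]
    best = None
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            candidate = query | set(extra)
            w = wiener_index(adj, candidate)
            if w is not None and (best is None or w < best[0]):
                best = (w, candidate)
    return best
```

For example, on a path with four vertices the pairwise distances are 1+2+3+1+2+1, giving a Wiener index of 10.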
The Importance of DNA Repair in Tumor Suppression
The transition from a normal to cancerous cell requires a number of highly
specific mutations that affect cell cycle regulation, apoptosis,
differentiation, and many other cell functions. One hallmark of cancerous
genomes is genomic instability, with mutation rates far greater than those of
normal cells. In tumors with microsatellite instability (MIN tumors), these elevated rates are often
caused by damage to mismatch repair genes, allowing further mutation of the
genome and tumor progression. These mutation rates may lie near the error
catastrophe found in the quasispecies model of adaptive RNA genomes, suggesting
that further increasing mutation rates will destroy cancerous genomes. However,
recent results have demonstrated that semiconservatively replicating DNA genomes exhibit an error
threshold at mutation rates far lower than those of their conservatively replicating counterparts. Furthermore,
while the maximum viable mutation rate in conservative systems increases
indefinitely with increasing master sequence fitness, the semiconservative
threshold plateaus at a relatively low value. This implies a paradox: mutation
rates that should be inaccessible are nonetheless found in viable tumor cells. In this paper, we
address this paradox, demonstrating an isomorphism between the conservatively
replicating (RNA) quasispecies model and the semiconservative (DNA) model with
post-methylation DNA repair mechanisms impaired. Thus, as DNA repair becomes
inactivated, the maximum viable mutation rate increases smoothly to that of a
conservatively replicating system on a transformed landscape, with an upper
bound that is dependent on replication rates. We postulate that inactivation of
post-methylation repair mechanisms is fundamental to the progression of a
tumor cell, and hence that these mechanisms act as a means of preventing and
destroying cancerous genomes.
Comment: 7 pages, 5 figures; Approximation replaced with exact calculation;
Minor error corrected; Minor changes to model system
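As background for the paradox described above, the single-fitness-peak error thresholds are commonly written as follows. These are standard long-genome results stated here as an assumption, not taken from this abstract: $\mu$ is the expected number of replication errors per genome and $k > 1$ is the fitness of the master sequence relative to all mutants. The conservative threshold grows without bound in $k$, while the semiconservative one saturates:

$$
\mu^{*}_{\mathrm{cons}} = \ln k,
\qquad
\mu^{*}_{\mathrm{semi}} = \ln\frac{2k}{k+1} \;\longrightarrow\; \ln 2 \approx 0.69 \quad (k \to \infty).
$$

For example, with $k = 10$ these give $\ln 10 \approx 2.30$ versus $\ln(20/11) \approx 0.60$.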
Accumulation of driver and passenger mutations during tumor progression
Major efforts to sequence cancer genomes are now occurring throughout the
world. Though the emerging data from these studies are illuminating, their
reconciliation with epidemiologic and clinical observations poses a major
challenge. In the current study, we provide a novel mathematical model that
begins to address this challenge. We model tumors as a discrete time branching
process that starts with a single driver mutation and proceeds as each new
driver mutation leads to a slightly increased rate of clonal expansion. Using
the model, we observe tremendous variation in the rate of tumor development,
providing an explanation for the heterogeneity in tumor sizes and development
times that has been observed by epidemiologists and clinicians. Furthermore,
the model provides a simple formula for the number of driver mutations as a
function of the total number of mutations in the tumor. Finally, when applied
to recent experimental data, the model allows us to calculate, for the first
time, the actual selective advantage provided by typical somatic mutations in
human tumors in situ. This selective advantage is surprisingly small, 0.005 ±
0.0005, and has major implications for experimental cancer research.
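A minimal simulation of a discrete-time branching process in which each additional driver raises the rate of clonal expansion is sketched below. The specific rule (a cell with j drivers dies with probability (1 - s)^j / 2 and otherwise divides, and each daughter gains one driver with probability u) and the name `simulate_tumor` are assumptions for illustration, not taken from the abstract; passenger mutations are not tracked.

```python
import random


def simulate_tumor(s, u, max_generations, max_cells=100_000, seed=0):
    """Discrete-time branching process for cells grouped by driver count.

    Assumed rules (hypothetical parameterization): a cell carrying j drivers
    dies with probability d_j = (1 - s)**j / 2 and otherwise divides; each
    daughter independently gains one extra driver with probability u.
    Returns (generation reached, {driver count: number of cells}).
    """
    rng = random.Random(seed)
    counts = {1: 1}                          # start from a single cell with one driver
    for gen in range(1, max_generations + 1):
        nxt = {}
        for j, n in counts.items():
            death = (1.0 - s) ** j / 2.0     # more drivers -> lower death probability
            for _ in range(n):
                if rng.random() < death:
                    continue                 # cell dies without dividing
                for _daughter in range(2):
                    k = j + 1 if rng.random() < u else j
                    nxt[k] = nxt.get(k, 0) + 1
        counts = nxt
        total = sum(counts.values())
        if total == 0 or total >= max_cells:  # extinct, or large enough to stop
            return gen, counts
    return max_generations, counts
```

Repeating a run with the same `s` and `u` but different seeds gives a quick feel for how variable extinction and the time to reach a given size can be under a small selective advantage.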
STATISTICAL METHODS FOR THE ANALYSIS OF CANCER GENOME SEQUENCING DATA
The purpose of cancer genome sequencing studies is to determine the nature and types of alterations present in a typical cancer and to discover genes mutated at high frequencies. In this article we discuss statistical methods for the analysis of data generated in these studies. We place special emphasis on a two-stage study design introduced by Sjöblom et al.[1]. In this context, we describe statistical methods for constructing scores that can be used to prioritize candidate genes for further investigation and to assess the statistical significance of the candidates thus identified.
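One simple way to score a candidate gene, shown only as an illustration and not as the article's scoring method, is the binomial tail probability of observing at least the recorded number of mutations given the number of bases sequenced and a single background mutation rate. Independence across positions and a uniform rate are simplifying assumptions, and `gene_mutation_pvalue` is a hypothetical helper.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        term = math.exp(log_term)
        total += term
        if i > n * p and term < total * 1e-17:   # remaining terms are negligible
            break
    return min(total, 1.0)


def gene_mutation_pvalue(mutations, bases_sequenced, background_rate):
    """Hypothetical score: chance of at least this many mutations in a gene
    under a single uniform background mutation rate per sequenced base."""
    return binom_upper_tail(mutations, bases_sequenced, background_rate)
```

Summing the upper tail directly, rather than computing one minus the lower tail, keeps very small p-values from collapsing to zero through cancellation.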
Exact solution of a two-type branching process: Clone size distribution in cell division kinetics
We study a two-type branching process which provides an excellent description of
experimental data on cell dynamics in skin tissue (Clayton et al., 2007). The
model involves only a single type of progenitor cell, and does not require
support from a self-renewing population of stem cells. The progenitor cells
divide and may differentiate into post-mitotic cells. We derive an exact
solution of this model in terms of generating functions for the total number of
cells, and for the number of cells of different types. We also deduce the large-time
asymptotic behavior, drawing on our exact results and on an independent
diffusion approximation.
Comment: 16 pages
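A Monte Carlo sketch of a two-type process of this kind can serve as a sanity check on exact results. The division rules (a progenitor A divides at rate `lam` into A+A with probability `r`, B+B with probability `r`, and A+B otherwise) and the choice never to remove post-mitotic B cells are assumptions for illustration; the function names are hypothetical.

```python
import random


def simulate_clone(r, lam, t_max, rng):
    """Gillespie simulation of one clone started from a single progenitor (A).

    Each A divides at rate lam: A -> A+A with probability r, A -> B+B with
    probability r, and A -> A+B otherwise. B cells are post-mitotic and, by
    assumption here, are never lost from the clone.
    """
    a, b, t = 1, 0, 0.0
    while a > 0:
        t += rng.expovariate(lam * a)   # waiting time to the next division
        if t > t_max:
            break
        u = rng.random()
        if u < r:                       # symmetric self-renewal
            a += 1
        elif u < 2 * r:                 # symmetric differentiation
            a -= 1
            b += 2
        else:                           # asymmetric division
            b += 1
    return a, b


def mean_clone_size(r, lam, t_max, n_runs=2000, seed=0):
    """Monte Carlo estimates of E[A(t_max)] and E[B(t_max)]."""
    rng = random.Random(seed)
    sum_a = sum_b = 0
    for _ in range(n_runs):
        a, b = simulate_clone(r, lam, t_max, rng)
        sum_a += a
        sum_b += b
    return sum_a / n_runs, sum_b / n_runs
```

Under these rules each division yields on average one progenitor (2r + (1 - 2r) = 1) and one post-mitotic cell, so E[A(t)] = 1 and E[B(t)] = lam * t; the estimates should approach these values as `n_runs` grows.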
Mapping the spatiotemporal dynamics of calcium signaling in cellular neural networks using optical flow
An optical flow gradient algorithm was applied to spontaneously forming networks
of neurons and glia in culture, imaged by fluorescence optical microscopy,
in order to map functional calcium signaling with single-pixel resolution.
Optical flow estimates the direction and speed of motion of objects in an image
between subsequent frames in a recorded digital sequence of images (i.e., a
movie). The vector fields computed by the algorithm were able to track the
spatiotemporal dynamics of calcium signaling patterns. We begin by briefly
reviewing the mathematics of the optical flow algorithm, and then describe how
to solve for the displacement vectors and how to measure their reliability. We
then compare computed flow vectors with manually estimated vectors for the
progression of a calcium signal recorded from representative astrocyte
cultures. Finally, we apply the algorithm to preparations of primary
astrocytes and hippocampal neurons and to the rMC-1 Müller glial cell line in
order to illustrate the capability of the algorithm for capturing different
types of spatiotemporal calcium activity. We discuss the imaging requirements,
parameter selection and threshold selection for reliable measurements, and
offer perspectives on uses of the vector data.
Comment: 23 pages, 5 figures. Peer-reviewed accepted version in press in
Annals of Biomedical Engineering
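A gradient-based optical flow estimate at a single pixel can be sketched with the classic Lucas–Kanade local least-squares formulation. This is a swapped-in technique chosen as an assumption; the paper's exact gradient algorithm may differ. The smallest eigenvalue of the 2x2 structure tensor serves here as one common reliability measure, and `lucas_kanade_at` and its parameters are hypothetical.

```python
import numpy as np


def lucas_kanade_at(frame0, frame1, row, col, half_win=3, min_eig_tol=1e-6):
    """Estimate the (u, v) displacement at one pixel from two frames.

    Solves the brightness-constancy equations Ix*u + Iy*v + It = 0 in the
    least-squares sense over a (2*half_win+1)^2 window. Returns (velocity,
    smallest eigenvalue of the structure tensor); velocity is None when that
    eigenvalue is below min_eig_tol, i.e. when the estimate is unreliable.
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    iy, ix = np.gradient(0.5 * (f0 + f1))   # spatial gradients (rows = y, cols = x)
    it = f1 - f0                             # temporal derivative
    win = (slice(row - half_win, row + half_win + 1),
           slice(col - half_win, col + half_win + 1))
    A = np.stack([ix[win].ravel(), iy[win].ravel()], axis=1)
    b = -it[win].ravel()
    G = A.T @ A                              # 2x2 structure tensor
    min_eig = np.linalg.eigvalsh(G)[0]       # eigenvalues in ascending order
    if min_eig < min_eig_tol:
        return None, min_eig
    return np.linalg.solve(G, A.T @ b), min_eig
```

In a uniformly bright region the structure tensor is singular and the function declines to report a vector, which mirrors the need, noted above, for threshold selection before trusting a measurement.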
Multi-Probe Investigation of Proteomic Structure of Pathogens
Complete genome sequences are available for understanding the biotransformation, environmental resistance and pathogenesis of microbial, cellular and pathogen systems. The present technological and scientific challenges are to unravel the relationships between the organization and function of protein complexes at cell, microbial and pathogen surfaces, to understand how these complexes evolve during the bacterial, cellular and pathogen life cycles, and to understand how they respond to environmental changes, chemical stimulants and therapeutics. In particular, elucidating the molecular structure and architecture of human pathogen surfaces is essential to understanding mechanisms of pathogenesis, immune response, physicochemical interactions and environmental resistance, and to developing countermeasures against bioterrorist agents. The objective of this project was to investigate the architecture, proteomic structure, and function of bacterial spores through a combination of high-resolution in vitro atomic force microscopy (AFM) and AFM-based immunolabeling with threat-specific antibodies. The project focused in particular on spore-forming Bacillus species, including the Sterne vaccine strain of Bacillus anthracis, and on C. novyi-NT, a spore-forming near-neighbor of Clostridium botulinum. Bacillus species, including B. anthracis, the causative agent of inhalation anthrax, are laboratory models for elucidating spore structure/function. Even though complete genome sequences are available for B. subtilis, B. cereus, B. anthracis and other species, spore structure and composition, and their relationship to function, are not understood. Prof. B. Vogelstein and colleagues at the Johns Hopkins University have recently developed a breakthrough bacteriolytic therapy for cancer treatment (1). They discovered that intravenously injected Clostridium novyi-NT spores germinate exclusively within the avascular regions of tumors in mice and destroy advanced cancerous lesions.
The bacteria were also found to significantly improve the efficacy of chemotherapeutic drugs and radiotherapy (2,3). Currently, there is no understanding of the structure-function relationships of Clostridium novyi-NT spores. Besides their therapeutic interest, studies of Clostridium novyi spores could provide a model for further studies of human pathogenic spore formers, including Clostridium botulinum and Clostridium perfringens. This project involved a multi-institutional collaboration of our LLNL group with the groups of Prof. T.J. Leighton (Children's Hospital Oakland Research Institute) and Prof. B. Vogelstein (the Howard Hughes Medical Institute and the Ludwig Center for Cancer Genetics and Therapeutics at the Johns Hopkins Sidney Kimmel Comprehensive Cancer Center).