227 research outputs found
Propagation of charged particle waves in a uniform magnetic field
This paper considers the probability density and current distributions
generated by a point-like, isotropic source of monoenergetic charges embedded
into a uniform magnetic field environment. Electron sources of this kind have
been realized in recent photodetachment microscopy experiments. Unlike the
total photocurrent cross section, which is largely understood, the spatial
profiles of charge and current emitted by the source display an unexpected
hierarchy of complex patterns, even though the distributions, apart from
scaling, depend only on a single physical parameter. We examine the electron
dynamics both by solving the quantum problem, i.e., finding the energy Green
function, and from a semiclassical perspective based on the simple cyclotron
orbits followed by the electron. Simulations suggest that the semiclassical
method, which involves here interference between an infinite set of paths,
faithfully reproduces the features observed in the quantum solution, even in
extreme circumstances, and lends itself to an interpretation of some (though
not all) of the rich structure exhibited in this simple problem. Comment: 39 pages, 16 figures
Motif Discovery through Predictive Modeling of Gene Regulation
We present MEDUSA, an integrative method for learning motif models of
transcription factor binding sites by incorporating promoter sequence and gene
expression data. We use a modern large-margin machine learning approach, based
on boosting, to enable feature selection from the high-dimensional search space
of candidate binding sequences while avoiding overfitting. At each iteration of
the algorithm, MEDUSA builds a motif model whose presence in the promoter
region of a gene, coupled with activity of a regulator in an experiment, is
predictive of differential expression. In this way, we learn motifs that are
functional and predictive of regulatory response rather than motifs that are
simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model
of the transcriptional control logic that can predict the expression of any
gene in the organism, given the sequence of the promoter region of the target
gene and the expression state of a set of known or putative transcription
factors and signaling molecules. Each motif model is either a k-length
sequence, a dimer, or a PSSM that is built by agglomerative probabilistic
clustering of sequences with similar boosting loss. By applying MEDUSA to a set
of environmental stress response expression data in yeast, we learn motifs
whose ability to predict differential expression of target genes outperforms
motifs from the TRANSFAC dataset and from a previously published candidate set
of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed
binding sites associated with environmental stress response from the
literature. Comment: RECOMB 200
A genetic algorithm for interpretable model extraction from decision tree ensembles
Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in a low predictive performance. Ensemble techniques provide a solution to this problem, and are hence able to achieve higher accuracies. However, this comes at a cost of losing the excellent interpretability of the resulting model, making ensemble techniques impractical in applications where decision support, instead of decision making, is crucial.
To bridge this gap, we present the genesim algorithm, which uses a genetic algorithm to transform an ensemble of decision trees into a single decision tree with enhanced predictive performance while maintaining interpretability. We compared genesim to prevalent decision tree induction algorithms, ensemble techniques and a similar technique, called ism, using twelve publicly available data sets. The results show that genesim achieves better predictive performance on most of these data sets compared to decision tree induction techniques and ism. The results also show that genesim's predictive performance is in the same order of magnitude as the ensemble techniques. However, the resulting model of genesim outperforms the ensemble techniques regarding interpretability as it has a very low complexity.
Ballistic matter waves with angular momentum: Exact solutions and applications
An alternative description of quantum scattering processes rests on
inhomogeneous terms amended to the Schroedinger equation. We detail the
structure of sources that give rise to multipole scattering waves of definite
angular momentum, and introduce pointlike multipole sources as their limiting
case. Partial wave theory is recovered for freely propagating particles. We
obtain novel results for ballistic scattering in an external uniform force
field, where we provide analytical solutions for both the scattering waves and
the integrated particle flux. Our theory directly applies to p-wave
photodetachment in an electric field. Furthermore, illustrating the effects of
extended sources, we predict some properties of vortex-bearing atom laser beams
outcoupled from a rotating Bose-Einstein condensate under the influence of
gravity. Comment: 42 pages, 8 figures, extended version including photodetachment and
semiclassical theory
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys
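The three problems named in this abstract (document representation, classifier construction, classifier evaluation) can be illustrated with a minimal sketch. The choice of a bag-of-words representation, a multinomial naive Bayes learner with add-one smoothing, and plain accuracy as the evaluation measure are assumptions made here for illustration; the abstract does not single out any of them, and the toy documents and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def represent(text):
    """Document representation: a bag of lowercase word counts."""
    return Counter(text.lower().split())

def train(docs, labels):
    """Classifier construction: collect per-class word counts from preclassified documents."""
    class_docs = Counter(labels)          # number of training documents per class
    word_counts = defaultdict(Counter)    # word frequencies per class
    vocab = set()
    for text, y in zip(docs, labels):
        bag = represent(text)
        word_counts[y].update(bag)
        vocab.update(bag)
    return class_docs, word_counts, vocab

def classify(model, text):
    """Pick the class with the highest naive Bayes log score (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for y in sorted(class_docs):
        total = sum(word_counts[y].values())
        score = math.log(class_docs[y] / n_docs)
        for w, c in represent(text).items():
            if w in vocab:  # words never seen in training are ignored
                score += c * math.log((word_counts[y][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = y, score
    return best

def accuracy(model, docs, labels):
    """Classifier evaluation: fraction of held-out documents labeled correctly."""
    hits = sum(classify(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)

# Hypothetical toy corpus, for illustration only.
train_docs = ["cheap pills buy now", "buy cheap watches",
              "meeting agenda attached", "lunch meeting tomorrow"]
train_labels = ["spam", "spam", "ham", "ham"]
model = train(train_docs, train_labels)

test_docs = ["buy pills", "agenda for lunch"]
test_labels = ["spam", "ham"]
acc = accuracy(model, test_docs, test_labels)
```

Keeping the three stages as separate functions mirrors the split in the abstract: any one of them can be swapped (for example a different learner) without touching the other two.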
Evolution of Resistance to Targeted Anti-Cancer Therapies during Continuous and Pulsed Administration Strategies
The discovery of small molecules targeted to specific oncogenic pathways has revolutionized anti-cancer therapy. However, such therapy often fails due to the evolution of acquired resistance. One long-standing question in clinical cancer research is the identification of optimum therapeutic administration strategies so that the risk of resistance is minimized. In this paper, we investigate optimal drug dosing schedules to prevent, or at least delay, the emergence of resistance. We design and analyze a stochastic mathematical model describing the evolutionary dynamics of a tumor cell population during therapy. We consider drug resistance emerging due to a single (epi)genetic alteration and calculate the probability of resistance arising during specific dosing strategies. We then optimize treatment protocols such that the risk of resistance is minimal while considering drug toxicity and side effects as constraints. Our methodology can be used to identify optimum drug administration schedules to avoid resistance conferred by one (epi)genetic alteration for any cancer and treatment type.
Global Considerations in Hierarchical Clustering Reveal Meaningful Patterns in Data
BACKGROUND: A hierarchy, characterized by tree-like relationships, is a natural method of organizing data in various domains. When considering an unsupervised machine learning routine, such as clustering, a bottom-up hierarchical (BU, agglomerative) algorithm is used as a default and is often the only method applied. METHODOLOGY/PRINCIPAL FINDINGS: We show that hierarchical clustering algorithms that involve global considerations, such as top-down (TD, divisive) or glocal (global-local) algorithms, are better suited to reveal meaningful patterns in the data. This is demonstrated by testing the correspondence between the results of several algorithms (TD, glocal and BU) and the correct annotations provided by experts. The correspondence was tested in multiple domains including gene expression experiments, stock trade records and functional protein families. The performance of each of the algorithms is evaluated by statistical criteria that are assigned to clusters (nodes of the hierarchy tree) based on expert-labeled data. Whereas TD algorithms perform better on global patterns, BU algorithms perform well and are advantageous when finer granularity of the data is sought. In addition, a novel TD algorithm that is based on genuine density of the data points is presented and is shown to outperform other divisive and agglomerative methods. Application of the algorithm to more than 500 protein sequences belonging to ion-channels illustrates the potential of the method for inferring overlooked functional annotations. ClustTree, a graphical Matlab toolbox for applying various hierarchical clustering algorithms and testing their quality, is made available. CONCLUSIONS: Although currently rarely used, global approaches, in particular TD or glocal algorithms, should be considered in the exploratory process of clustering. In general, applying unsupervised clustering methods can leverage the quality of manually-created mapping of protein families. As demonstrated, it can also provide insights into erroneous and missed annotations.
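For readers unfamiliar with the bottom-up (BU, agglomerative) baseline that this abstract contrasts with top-down and glocal methods, the following is a minimal sketch. The single-linkage merge rule, the Euclidean distance, and the toy one-dimensional points are assumptions chosen for brevity; the sketch does not reproduce the paper's TD or glocal algorithms, nor the ClustTree toolbox.

```python
import numpy as np

def agglomerative(points, n_clusters):
    """Bottom-up clustering: start from singletons and repeatedly merge
    the two clusters with the smallest single-linkage distance."""
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    merges = []  # record of (cluster_a, cluster_b, distance) in merge order
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# Hypothetical toy data: two well-separated groups on a line.
points = np.array([[0.0], [0.1], [0.3], [5.0], [5.2]])
clusters, merges = agglomerative(points, n_clusters=2)
```

Each merge decision here uses only the locally closest pair, which is the sense in which a BU method lacks the global considerations that the abstract argues for.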
A novel approach to the clustering of microarray data via nonparametric density estimation
Background: Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables can be much higher than the number of observations. Results: Here, we present a general framework to deal with the clustering of microarray data, based on a three-step procedure: (i) gene filtering; (ii) dimensionality reduction; (iii) clustering of observations in the reduced space. Via a nonparametric model-based clustering approach we obtain promising results both in simulated and real data. Conclusions: The proposed algorithm is a simple and effective tool for the clustering of microarray data, in an unsupervised setting.
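The three-step procedure in this abstract can be sketched as a small pipeline. Everything concrete below is an assumption made for illustration: variance-based gene filtering, principal components computed by SVD, and plain k-means as a stand-in for step (iii). The abstract describes a nonparametric density-based clustering method, which this sketch does not implement; the simulated data are hypothetical.

```python
import numpy as np

def filter_genes(X, keep):
    """Step (i): keep the `keep` columns (genes) with the highest variance."""
    variances = X.var(axis=0)
    top = np.sort(np.argsort(variances)[::-1][:keep])
    return X[:, top], top

def reduce_dimension(X, n_components):
    """Step (ii): project centered data onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered matrix.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(Z, k, n_iter=50):
    """Step (iii), stand-in only: k-means with farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Simulated data: 20 observations, 200 genes, far more variables than observations.
rng = np.random.default_rng(0)
n_per_group, n_genes = 10, 200
X = rng.normal(size=(2 * n_per_group, n_genes))
X[n_per_group:, :20] += 5.0  # second group differs in the first 20 genes

X_f, kept = filter_genes(X, keep=30)
Z = reduce_dimension(X_f, n_components=2)
labels = kmeans(Z, k=2)
```

Filtering before the projection keeps the SVD small and removes genes that carry mostly noise, which is the motivation for ordering the steps this way when variables greatly outnumber observations.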
Pompe disease diagnosis and management guideline
ACMG standards and guidelines are designed primarily as an educational resource for physicians and other health care providers to help them provide quality medical genetic services. Adherence to these standards and guidelines does not necessarily ensure a successful medical outcome. These standards and guidelines should not be considered inclusive of all proper procedures and tests or exclusive of other procedures and tests that are reasonably directed to obtaining the same results. In determining the propriety of any specific procedure or test, the geneticist should apply his or her own professional judgment to the specific clinical circumstances presented by the individual patient or specimen. It may be prudent, however, to document in the patient's record the rationale for any significant deviation from these standards and guidelines. Duke Univ, Med Ctr, Durham, NC 27706 USA; Oregon Hlth Sci Univ, Portland, OR 97201 USA; NYU, Sch Med, New York, NY USA; Univ Florida, Coll Med, Powell Gene Therapy Ctr, Gainesville, FL 32611 USA; Indiana Univ, Bloomington, IN 47405 USA; Univ Miami, Miller Sch Med, Coral Gables, FL 33124 USA; Harvard Univ, Childrens Hosp, Sch Med, Cambridge, MA 02138 USA; Universidade Federal de São Paulo, São Paulo, Brazil; Columbia Univ, New York, NY 10027 USA; NYU, Bellevue Hosp, Sch Med, New York, NY USA; Columbia Univ, Med Ctr, New York, NY 10027 USA; Universidade Federal de São Paulo, São Paulo, Brazil; Web of Science
Pairwise maximum entropy models for studying large biological systems: when they can and when they can't work
One of the most critical problems we face in the study of biological systems
is building accurate statistical descriptions of them. This problem has been
particularly challenging because biological systems typically contain large
numbers of interacting elements, which precludes the use of standard brute
force approaches. Recently, though, several groups have reported that there may
be an alternate strategy. The reports show that reliable statistical models can
be built without knowledge of all the interactions in a system; instead,
pairwise interactions can suffice. These findings, however, are based on the
analysis of small subsystems. Here we ask whether the observations will
generalize to systems of realistic size, that is, whether pairwise models will
provide reliable descriptions of true biological systems. Our results show
that, in most cases, they will not. The reason is that there is a crossover in
the predictive power of pairwise models: If the size of the subsystem is below
the crossover point, then the results have no predictive power for large
systems. If the size is above the crossover point, the results do have
predictive power. This work thus provides a general framework for determining
the extent to which pairwise models can be used to predict the behavior of
whole biological systems. Applied to neural data, the size of most systems
studied so far is below the crossover point.
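As a concrete picture of what a pairwise model is, the sketch below enumerates every state of a small binary system and assigns it the probability P(x) proportional to exp(sum_i h_i x_i + sum_{i<j} J_ij x_i x_j). The binary 0/1 encoding, the parameter names `h` and `J`, and the example values are assumptions for illustration; the abstract does not specify a parameterization. The enumeration over 2^N states is the kind of brute-force computation that stops being feasible for large systems.

```python
import itertools
import numpy as np

def pairwise_maxent(h, J):
    """Return all binary states and their probabilities under a pairwise model
    with single-element terms h_i and pair terms J_ij (each pair i<j used once)."""
    n = len(h)
    states = np.array(list(itertools.product([0, 1], repeat=n)))
    Ju = np.triu(J, k=1)  # upper triangle only, so each pair is counted once
    log_weight = states @ h + np.einsum("si,ij,sj->s", states, Ju, states)
    weights = np.exp(log_weight)
    return states, weights / weights.sum()

# Hypothetical parameters: three elements, one positive coupling between 0 and 1.
h = np.array([-1.0, -1.0, -1.0])
J = np.zeros((3, 3))
J[0, 1] = 2.0

states, p = pairwise_maxent(h, J)
mean = p @ states                        # first-order statistics <x_i>
corr = states.T @ (states * p[:, None])  # second-order statistics <x_i x_j>
```

With these values the coupled pair is active together more often than independence would predict (`corr[0, 1]` exceeds `mean[0] * mean[1]`), while the uncoupled element stays independent of the other two.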