227 research outputs found

    Propagation of charged particle waves in a uniform magnetic field

    This paper considers the probability density and current distributions generated by a point-like, isotropic source of monoenergetic charges embedded into a uniform magnetic field environment. Electron sources of this kind have been realized in recent photodetachment microscopy experiments. Unlike the total photocurrent cross section, which is largely understood, the spatial profiles of charge and current emitted by the source display an unexpected hierarchy of complex patterns, even though the distributions, apart from scaling, depend only on a single physical parameter. We examine the electron dynamics both by solving the quantum problem, i.e., finding the energy Green function, and from a semiclassical perspective based on the simple cyclotron orbits followed by the electron. Simulations suggest that the semiclassical method, which here involves interference between an infinite set of paths, faithfully reproduces the features observed in the quantum solution, even in extreme circumstances, and lends itself to an interpretation of some (though not all) of the rich structure exhibited in this simple problem. Comment: 39 pages, 16 figures

    Motif Discovery through Predictive Modeling of Gene Regulation

    We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature. Comment: RECOMB 200

    A genetic algorithm for interpretable model extraction from decision tree ensembles

    Models obtained by decision tree induction techniques excel in being interpretable. However, they can be prone to overfitting, which results in a low predictive performance. Ensemble techniques provide a solution to this problem, and are hence able to achieve higher accuracies. However, this comes at a cost of losing the excellent interpretability of the resulting model, making ensemble techniques impractical in applications where decision support, instead of decision making, is crucial. To bridge this gap, we present the genesim algorithm, which uses a genetic algorithm to transform an ensemble of decision trees into a single decision tree with enhanced predictive performance while maintaining interpretability. We compared genesim to prevalent decision tree induction algorithms, ensemble techniques and a similar technique, called ism, using twelve publicly available data sets. The results show that genesim achieves better predictive performance on most of these data sets compared to decision tree induction techniques and ism. The results also show that genesim's predictive performance is in the same order of magnitude as the ensemble techniques. However, the resulting model of genesim outperforms the ensemble techniques regarding interpretability as it has a very low complexity.

    Ballistic matter waves with angular momentum: Exact solutions and applications

    An alternative description of quantum scattering processes rests on inhomogeneous terms amended to the Schroedinger equation. We detail the structure of sources that give rise to multipole scattering waves of definite angular momentum, and introduce pointlike multipole sources as their limiting case. Partial wave theory is recovered for freely propagating particles. We obtain novel results for ballistic scattering in an external uniform force field, where we provide analytical solutions for both the scattering waves and the integrated particle flux. Our theory directly applies to p-wave photodetachment in an electric field. Furthermore, illustrating the effects of extended sources, we predict some properties of vortex-bearing atom laser beams outcoupled from a rotating Bose-Einstein condensate under the influence of gravity. Comment: 42 pages, 8 figures, extended version including photodetachment and semiclassical theory

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys

    Evolution of Resistance to Targeted Anti-Cancer Therapies during Continuous and Pulsed Administration Strategies

    The discovery of small molecules targeted to specific oncogenic pathways has revolutionized anti-cancer therapy. However, such therapy often fails due to the evolution of acquired resistance. One long-standing question in clinical cancer research is the identification of optimum therapeutic administration strategies so that the risk of resistance is minimized. In this paper, we investigate optimal drug dosing schedules to prevent, or at least delay, the emergence of resistance. We design and analyze a stochastic mathematical model describing the evolutionary dynamics of a tumor cell population during therapy. We consider drug resistance emerging due to a single (epi)genetic alteration and calculate the probability of resistance arising during specific dosing strategies. We then optimize treatment protocols such that the risk of resistance is minimal while considering drug toxicity and side effects as constraints. Our methodology can be used to identify optimum drug administration schedules to avoid resistance conferred by one (epi)genetic alteration for any cancer and treatment type.

    Global Considerations in Hierarchical Clustering Reveal Meaningful Patterns in Data

    BACKGROUND: A hierarchy, characterized by tree-like relationships, is a natural method of organizing data in various domains. When considering an unsupervised machine learning routine, such as clustering, a bottom-up hierarchical (BU, agglomerative) algorithm is used as a default and is often the only method applied. METHODOLOGY/PRINCIPAL FINDINGS: We show that hierarchical clustering algorithms that involve global considerations, such as top-down (TD, divisive) or glocal (global-local) algorithms, are better suited to reveal meaningful patterns in the data. This is demonstrated by testing the correspondence between the results of several algorithms (TD, glocal and BU) and the correct annotations provided by experts. The correspondence was tested in multiple domains including gene expression experiments, stock trade records and functional protein families. The performance of each of the algorithms is evaluated by statistical criteria that are assigned to clusters (nodes of the hierarchy tree) based on expert-labeled data. Whereas TD algorithms perform better on global patterns, BU algorithms perform well and are advantageous when finer granularity of the data is sought. In addition, a novel TD algorithm that is based on genuine density of the data points is presented and is shown to outperform other divisive and agglomerative methods. Application of the algorithm to more than 500 protein sequences belonging to ion-channels illustrates the potential of the method for inferring overlooked functional annotations. ClustTree, a graphical Matlab toolbox for applying various hierarchical clustering algorithms and testing their quality, is made available. CONCLUSIONS: Although currently rarely used, global approaches, in particular TD or glocal algorithms, should be considered in the exploratory process of clustering. In general, applying unsupervised clustering methods can leverage the quality of manually created mappings of protein families. As demonstrated, it can also provide insights into erroneous and missed annotations.

    A novel approach to the clustering of microarray data via nonparametric density estimation

    Background: Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables can be much higher than the number of observations. Results: Here, we present a general framework to deal with the clustering of microarray data, based on a three-step procedure: (i) gene filtering; (ii) dimensionality reduction; (iii) clustering of observations in the reduced space. Via a nonparametric model-based clustering approach we obtain promising results both in simulated and real data. Conclusions: The proposed algorithm is a simple and effective tool for the clustering of microarray data, in an unsupervised setting.

    Pompe disease diagnosis and management guideline

    ACMG standards and guidelines are designed primarily as an educational resource for physicians and other health care providers to help them provide quality medical genetic services. Adherence to these standards and guidelines does not necessarily ensure a successful medical outcome. These standards and guidelines should not be considered inclusive of all proper procedures and tests or exclusive of other procedures and tests that are reasonably directed to obtaining the same results. In determining the propriety of any specific procedure or test, the geneticist should apply his or her own professional judgment to the specific clinical circumstances presented by the individual patient or specimen. It may be prudent, however, to document in the patient's record the rationale for any significant deviation from these standards and guidelines. Duke Univ, Med Ctr, Durham, NC 27706 USA; Oregon Hlth Sci Univ, Portland, OR 97201 USA; NYU, Sch Med, New York, NY USA; Univ Florida, Coll Med, Powell Gene Therapy Ctr, Gainesville, FL 32611 USA; Indiana Univ, Bloomington, IN 47405 USA; Univ Miami, Miller Sch Med, Coral Gables, FL 33124 USA; Harvard Univ, Childrens Hosp, Sch Med, Cambridge, MA 02138 USA; Universidade Federal de São Paulo, São Paulo, Brazil; Columbia Univ, New York, NY 10027 USA; NYU, Bellevue Hosp, Sch Med, New York, NY USA; Columbia Univ, Med Ctr, New York, NY 10027 USA; Universidade Federal de São Paulo, São Paulo, Brazil; Web of Science

    Pairwise maximum entropy models for studying large biological systems: when they can and when they can't work

    One of the most critical problems we face in the study of biological systems is building accurate statistical descriptions of them. This problem has been particularly challenging because biological systems typically contain large numbers of interacting elements, which precludes the use of standard brute force approaches. Recently, though, several groups have reported that there may be an alternate strategy. The reports show that reliable statistical models can be built without knowledge of all the interactions in a system; instead, pairwise interactions can suffice. These findings, however, are based on the analysis of small subsystems. Here we ask whether the observations will generalize to systems of realistic size, that is, whether pairwise models will provide reliable descriptions of true biological systems. Our results show that, in most cases, they will not. The reason is that there is a crossover in the predictive power of pairwise models: If the size of the subsystem is below the crossover point, then the results have no predictive power for large systems. If the size is above the crossover point, the results do have predictive power. This work thus provides a general framework for determining the extent to which pairwise models can be used to predict the behavior of whole biological systems. Applied to neural data, the size of most systems studied so far is below the crossover point.