
    Robust Online Hamiltonian Learning

    In this work we combine two distinct machine learning methodologies, sequential Monte Carlo and Bayesian experimental design, and apply them to the problem of inferring the dynamical parameters of a quantum system. We design the algorithm with practicality in mind by including parameters that control trade-offs between the requirements on computational and experimental resources. The algorithm can be implemented online (during experimental data collection), avoiding the need for storage and post-processing. Most importantly, our algorithm is capable of learning Hamiltonian parameters even when the parameters change from experiment to experiment, and also when additional noise processes are present and unknown. The algorithm also numerically estimates the Cramér-Rao lower bound, certifying its own performance.
    Comment: 24 pages, 12 figures; to appear in New Journal of Physics
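    The sequential Monte Carlo step described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the single-parameter precession model, uniform prior, random experiment times, and particle counts are all invented assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: a single qubit precessing at unknown frequency omega,
# measured after an evolution time t, with outcome probabilities
# P(1 | omega, t) = sin^2(omega * t / 2).
true_omega = 0.7
n_particles = 2000

# Particle approximation of a uniform prior over omega.
particles = rng.uniform(0.0, 2.0, n_particles)
weights = np.full(n_particles, 1.0 / n_particles)

for step in range(50):
    t = rng.uniform(0.0, 10.0)            # experiment setting (random here)
    p1 = np.sin(true_omega * t / 2) ** 2
    outcome = int(rng.random() < p1)      # simulated measurement

    # Bayes update: reweight each particle by the likelihood of the outcome.
    like = np.sin(particles * t / 2) ** 2
    weights *= like if outcome == 1 else 1.0 - like
    weights /= weights.sum()

    # Resample (with a little jitter) when the effective sample size drops.
    if 1.0 / np.sum(weights ** 2) < n_particles / 2:
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx] + rng.normal(0.0, 0.01, n_particles)
        weights = np.full(n_particles, 1.0 / n_particles)

estimate = float(np.sum(weights * particles))
print(estimate)
```

    In the full method, the experiment setting would be chosen by Bayesian experimental design rather than at random, which is where the trade-off parameters mentioned in the abstract come in.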

    A Statistically Rigorous Test for the Identification of Parent-Fragment Pairs in LC-MS Datasets

    Untargeted global metabolic profiling by liquid chromatography-mass spectrometry generates numerous signals that are due to unknown compounds and whose identification forms an important challenge. The analysis of metabolite fragmentation patterns, following collision-induced dissociation, provides a valuable tool for identification, but can be severely impeded by close chromatographic coelution of distinct metabolites. We propose a new algorithm for identifying related parent-fragment pairs and for distinguishing these from signals due to unrelated compounds. Unlike existing methods, our approach addresses the problem by means of a hypothesis test based on the distribution of the recorded ion counts, and thereby provides a statistically rigorous measure of the uncertainty involved in the classification problem. Because of technological constraints, the test is of primary use at low and intermediate ion counts, above which detector saturation causes substantial bias in the recorded ion count. The validity of the test is demonstrated through its application to pairs of coeluting isotopologues and to known parent-fragment pairs, which results in test statistics consistent with the null distribution. The performance of the test is compared with a commonly used Pearson correlation approach and found to be considerably better (e.g., a false positive rate of 6.25%, compared with 50% for the correlation approach on perfectly coeluting ions). Because the algorithm may be used for the analysis of high-mass compounds in addition to metabolic data, we expect it to facilitate the analysis of fragmentation patterns for a wide range of analytical problems.
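    The Pearson correlation baseline that the abstract compares against can be sketched as follows. The elution profiles, count levels, and retention-time offset are invented for illustration; the point is that two unrelated but closely coeluting ions still correlate strongly, which is the weakness the proposed hypothesis test addresses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical chromatogram: a parent ion and its fragment share the same
# Gaussian elution profile (at different intensities); an unrelated ion
# elutes 3 scans later. Poisson noise mimics ion-counting statistics.
scans = np.arange(60)
def peak(center):
    return np.exp(-0.5 * ((scans - center) / 5.0) ** 2)

parent = rng.poisson(400 * peak(30))
fragment = rng.poisson(150 * peak(30))
unrelated = rng.poisson(150 * peak(33))

def pearson(x, y):
    """Pearson correlation between two ion-count traces."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

r_related = pearson(parent, fragment)
r_unrelated = pearson(parent, unrelated)
print(r_related, r_unrelated)
```

    Both correlations come out high, so a correlation threshold cannot separate the true parent-fragment pair from the nearly coeluting bystander, whereas a test built on the ion-count distribution can quantify the uncertainty directly.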

    Representative transcript sets for evaluating a translational initiation sites predictor

    Background: Translational initiation site (TIS) prediction is a very important and actively studied topic in bioinformatics. To enable comparative analysis, it is desirable to have several benchmark data sets that can be used to test the effectiveness of different algorithms. An ideal benchmark data set should be reliable, representative and readily available. Preferably, proteins encoded by members of the data set should also be representative of the protein population actually expressed in cellular specimens.
    Results: In this paper, we report a general algorithm for constructing a reliable sequence collection that only includes mRNA sequences whose corresponding protein products present an average profile of the general protein population of a given organism, with respect to three major structural parameters. Four representative transcript collections, each derived from a model organism, have been obtained with the proposed algorithm. Evaluation of these data sets shows that they are reasonable representations of the spectrum of proteins obtained from cellular proteomic studies. Six state-of-the-art predictors have been used to test the usefulness of the proposed construction algorithm. A comparative study reporting the predictors' performance on our data sets, as well as on three other existing benchmark collections, has demonstrated the merits of our data sets as benchmark testing collections.
    Conclusion: The proposed data set construction algorithm is a general and widely applicable scheme. Our comparison with published proteomic studies has shown that expression of our transcript data sets generates a polypeptide population representative of that obtained from evaluation of biological specimens. Our data sets thus represent "real world" transcripts that will allow more accurate evaluation of algorithms dedicated to the identification of TISs, as well as of other translational regulatory motifs within mRNA sequences. The algorithm compiles a redundancy-free data set by removing redundant copies of homologous proteins; such data sets may also be useful for statistical analyses of protein sequence-structure relations. At the current stage, our approach focuses on obtaining an "average" protein data set for any particular organism without introducing much selection bias. However, with the three major protein structural parameters deeply integrated into the scheme, it would be straightforward to extend the current method to obtain a more selective protein data set, which may facilitate the study of particular protein structures.
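    The redundancy-removal step mentioned above might look like the following greedy sketch. The identity measure, threshold, and example sequences are illustrative assumptions; the authors' actual pipeline is not specified here.

```python
# Greedy redundancy filter: keep a sequence only if its identity to every
# sequence already kept stays below a threshold.

def identity(a, b):
    """Fraction of matching positions over the shorter sequence
    (a crude stand-in for alignment-based sequence identity)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

def remove_redundant(seqs, threshold=0.9):
    kept = []
    for s in seqs:
        if all(identity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept

proteins = ["MKTAYIAKQR", "MKTAYIAKQK",   # near-identical homologs
            "MALWMRLLPL", "GSHMSEQKLI"]   # unrelated sequences
print(remove_redundant(proteins))
```

    A production pipeline would use a proper alignment tool for the identity computation and would additionally filter on the structural parameters the abstract refers to.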

    Effects of Sample Size on Estimates of Population Growth Rates Calculated with Matrix Models

    BACKGROUND: Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (lambda) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of lambda: Jensen's inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of lambda due to increased sampling variance. We investigated whether sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of lambda. METHODOLOGY/PRINCIPAL FINDINGS: Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating lambda for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of lambda with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models in studies of plant demography. CONCLUSIONS/SIGNIFICANCE: We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more realistic inverse J-shaped population structure exacerbated this bias. However, our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that remain unexplored. We suggest more intensive sampling of populations when individual survival is low, and greater sampling of stages with high elasticities.
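    The simulation design described above can be sketched with a hypothetical two-stage projection matrix. The vital rates, fecundity, and sample sizes below are illustrative, not the study's values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-stage (juvenile, adult) projection matrix:
#   A = [[0,   f  ],
#        [s_j, s_a]]
# lambda is the dominant eigenvalue of A.
f, s_j, s_a = 1.2, 0.5, 0.5

def lam(sj_hat, sa_hat):
    A = np.array([[0.0, f], [sj_hat, sa_hat]])
    return float(max(abs(np.linalg.eigvals(A))))

true_lambda = lam(s_j, s_a)

# Estimate each survival rate from n marked individuals (binomial sampling),
# compute lambda from the estimates, and average over many replicates.
# lambda is a nonlinear function of the vital rates, so by Jensen's
# inequality the averaged estimate need not equal the true lambda.
bias = {}
for n in (10, 50, 1000):
    est = [lam(rng.binomial(n, s_j) / n, rng.binomial(n, s_a) / n)
           for _ in range(2000)]
    bias[n] = float(np.mean(est)) - true_lambda
    print(n, round(bias[n], 4))
```

    The bias shrinks as n grows, which mirrors the abstract's finding that sampling-variance bias becomes negligible at the sample sizes typical of demographic studies.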

    Assessment of Mechanism Exam Questions in the Second-Year Organic Chemistry Course Sequence

    This presentation was given at the 2018 Biennial Conference on Chemical Education (BCCE).

    Power Analysis for Interleaving Experiments by Means of Offline Evaluation

    Evaluation in information retrieval takes one of two forms: collection-based offline evaluation and in-situ online evaluation. Collections constructed by the former methodology are reusable, and hence able to test the effectiveness of any experimental algorithm, while the latter requires a different experiment for every new algorithm. Because of this, a funnel approach is often used, with experimental algorithms being compared to the baseline in an online experiment only if they outperform the baseline in an offline experiment. One of the key questions in the design of online and offline experiments concerns the number of measurements required to detect a statistically significant difference between two algorithms. Power analysis can provide an answer to this question; however, it requires a priori knowledge of the difference in effectiveness to be detected and of the variance in the measurements. The variance is typically estimated using historical data, but setting a detectable difference prior to the experiment can lead to suboptimal, upper-bound results. In this work we make use of the funnel approach in evaluation and test whether the difference in the effectiveness of two algorithms measured by the offline experiment can inform the required number of impressions of an online interleaving experiment. Our analysis on simulated data shows that the number of impressions required is correlated with the difference in the offline experiment, but at the same time varies widely for any given difference.
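    A standard z-test power calculation of the kind the abstract refers to can be sketched as follows. The detectable difference, per-measurement variance, and significance/power levels are illustrative placeholders, not values from the paper.

```python
from statistics import NormalDist

def required_measurements(delta, sigma, alpha=0.05, power=0.8):
    """Measurements needed for a two-sided z-test to detect a mean
    difference `delta`, given per-measurement standard deviation `sigma`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # approx. 1.96
    z_beta = z.inv_cdf(power)            # approx. 0.84
    return (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2

# Smaller detectable differences demand many more measurements,
# which is why fixing delta a priori can be so costly.
for delta in (0.05, 0.02, 0.01):
    print(delta, round(required_measurements(delta, sigma=0.5)))
```

    The paper's proposal is, in effect, to let the offline difference stand in for delta rather than fixing it in advance; the quadratic dependence on delta in this formula shows why a poor a priori choice leads to upper-bound sample sizes.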