104 research outputs found

    Machine-learning of atomic-scale properties based on physical principles

    We briefly summarize the kernel regression approach to fitting functions, particularly potential energy surfaces, as used recently in materials modelling, and highlight how the linear algebra framework can be used both to predict and to train from linear functionals of the potential energy, such as the total energy and atomic forces. We then give a detailed account of the Smooth Overlap of Atomic Positions (SOAP) representation and kernel, showing how it arises from an abstract representation of smooth atomic densities and how it is related to several popular density-based representations of atomic structure. We also discuss recent generalisations that allow fine control of correlations between different atomic species and the prediction and fitting of tensorial properties. Finally, we discuss how to construct structural kernels (applicable to comparing entire molecules or periodic systems) that go beyond an additive combination of local environments.
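
    Training from a linear functional of the potential energy can be illustrated with the simplest case. The total energy is a sum of local energies, so the kernel between two structures is the sum of local kernels over all pairs of environments. The sketch below makes several assumptions. The random descriptor vectors are hypothetical stand-ins for SOAP power spectra. The local kernel uses the normalised dot product raised to a power zeta, a common SOAP-style form. The regulariser sigma is chosen arbitrarily. Forces, which require kernel derivatives, are omitted. All function names are illustrative.

```python
import numpy as np


def local_kernel(P, Q, zeta=2):
    """SOAP-style local kernel (p.q / (|p||q|))**zeta for every row pair of P and Q."""
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    return (Pn @ Qn.T) ** zeta


def structure_kernel(structs_a, structs_b, zeta=2):
    """Kernel between total energies: sum of local kernels over all environment pairs."""
    K = np.empty((len(structs_a), len(structs_b)))
    for a, Pa in enumerate(structs_a):
        for b, Pb in enumerate(structs_b):
            K[a, b] = local_kernel(Pa, Pb, zeta).sum()
    return K


def fit_total_energies(structs, energies, sigma=1e-3, zeta=2):
    """Regularised kernel regression weights, one per training structure."""
    K = structure_kernel(structs, structs, zeta)
    return np.linalg.solve(K + sigma**2 * np.eye(len(structs)), energies)


def predict_local_energies(P, train_structs, alpha, zeta=2):
    """Local energy of each environment in P, learned from total energies only."""
    return sum(a * local_kernel(P, Pb, zeta).sum(axis=1)
               for a, Pb in zip(alpha, train_structs))


def predict_total_energy(P, train_structs, alpha, zeta=2):
    """Total energy of a structure with environments P."""
    return (structure_kernel([P], train_structs, zeta) @ alpha)[0]


# Hypothetical data: random vectors stand in for SOAP power spectra.
rng = np.random.default_rng(0)
train = [rng.normal(size=(int(n), 4)) for n in rng.integers(2, 6, size=8)]
w_true = rng.normal(size=4)
energies = np.array([
    np.sum(((P / np.linalg.norm(P, axis=1, keepdims=True)) @ w_true) ** 2)
    for P in train
])
alpha = fit_total_energies(train, energies)

test_env = rng.normal(size=(3, 4))
local = predict_local_energies(test_env, train, alpha)
# The sum of predicted local energies equals the predicted total (up to rounding).
print(local.sum(), predict_total_energy(test_env, train, alpha))
```

    Because the model is fitted only to totals, the local energies it returns are a by-product of the additive kernel. They are not independently validated quantities.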

    Arthritis of the base of the thumb

    The purpose of this article is to outline the pathophysiology and epidemiology of arthritis of the base of the thumb. The usual presentation and diagnosis are discussed, along with the current conservative treatment options. Surgical treatment options are determined by the stage of the arthritis as well as the demands of the patient. The current standard surgical treatment options are reviewed, along with their reported results in the literature.

    Randomization in Laboratory Procedure Is Key to Obtaining Reproducible Microarray Results

    The quality of gene expression microarray data has improved dramatically since the first arrays were introduced in the late 1990s. However, the reproducibility of data generated at multiple laboratory sites remains a matter of concern, especially for scientists who are attempting to combine and analyze data from public repositories. We carried out a study in which a common set of RNA samples was assayed five times in four different laboratories using Affymetrix GeneChip arrays. We observed dramatic differences in the results across laboratories and identified batch effects in array processing as one of the primary causes of these differences. When batch processing of samples is confounded with experimental factors of interest, it is not possible to separate their effects, and lists of differentially expressed genes may include many artifacts. This study demonstrates the substantial impact of sample processing on microarray analysis results. It underscores the need for randomization in the laboratory as a means to avoid confounding biological factors with procedural effects.
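
    One way to put laboratory randomization into practice is to avoid processing all samples of one condition in the same batch. Instead, the samples of each condition can be spread across batches in random order, so that batch is not confounded with the factor of interest. The sketch below is a minimal blocked-randomization helper, not a procedure taken from the study. The function name, the sample IDs and the condition labels are hypothetical.

```python
import random
from collections import defaultdict


def randomize_batches(samples, n_batches, seed=None):
    """Assign (sample_id, condition) pairs to processing batches.

    Samples of each condition are shuffled and dealt round-robin across the
    batches, so every condition is split as evenly as possible between them.
    Processing order within each batch is then shuffled as well.
    """
    rng = random.Random(seed)
    by_condition = defaultdict(list)
    for sample_id, condition in samples:
        by_condition[condition].append(sample_id)

    batches = [[] for _ in range(n_batches)]
    slot = rng.randrange(n_batches)  # random starting batch
    for condition in sorted(by_condition):
        ids = by_condition[condition][:]
        rng.shuffle(ids)
        for sample_id in ids:
            batches[slot % n_batches].append(sample_id)
            slot += 1
    for batch in batches:
        rng.shuffle(batch)
    return batches


# Hypothetical design: 6 treated and 6 control samples, 3 processing batches.
samples = [(f"s{i}", "treated" if i < 6 else "control") for i in range(12)]
for n, batch in enumerate(randomize_batches(samples, n_batches=3, seed=42)):
    print(n, batch)
```

    With balanced conditions, each batch receives the same number of samples from every condition. A batch effect then shifts all conditions alike instead of masquerading as a biological difference.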

    Calculation of partial isotope incorporation into peptides measured by mass spectrometry

    Background: The stable isotope probing (SIP) technique was developed to link the function, structure and activity of microbial cultures that metabolize carbon- and nitrogen-containing substrates to synthesize their biomass. Currently available methods are restricted to estimating fully saturated heavy stable isotope incorporation, and convenient methods with sufficient accuracy are still missing. However, tracking carbon fluxes in microbial communities requires new methods that allow the calculation of partial incorporation into biomolecules.
    Results: In this study, we use the characteristics of the so-called "half decimal place rule" (HDPR) to accurately calculate the partial ¹³C incorporation in peptides from enzymatically digested proteins. Owing to the clade-crossing universality of proteins within bacteria, any available high-resolution mass spectrometry dataset of tryptically digested peptides can be used as a reference.
    We used a freely available peptide mass dataset from Mycobacterium tuberculosis consisting of 315,579 entries. From this, the error of estimated versus known heavy stable isotope incorporation was calculated for an increasing number of randomly drawn peptide sub-samples (100 times each; no repetition). About 100 peptide masses were needed to obtain an estimated incorporation error of less than 5 atom %. Finally, to test the general applicability of our method, we used peptide masses of tryptically digested proteins from Pseudomonas putida ML2 grown on labeled substrate at various known concentrations, and ¹³C isotopic incorporation was successfully predicted. An easy-to-use script [1] was further developed to guide users through the calculation procedure for their own data series.
    Conclusion: Our method is valuable for estimating ¹³C incorporation into peptides/proteins accurately and with high sensitivity. More generally, it holds promise for wider applications in qualitative and especially quantitative proteomics.
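
    Partial incorporation, expressed in atom %, can be made concrete in mass terms. Each carbon position occupied by ¹³C instead of ¹²C adds about 1.0033548 Da (¹³C = 13.0033548 u, ¹²C = 12 u exactly). The sketch below shows only this forward relation and its inverse. It assumes that the peptide's monoisotopic mass and carbon count are known, and it ignores heavy isotopes of other elements. It does not reproduce the HDPR procedure described above, and the function names are hypothetical.

```python
# Mass difference between 13C and 12C, in daltons.
DELTA_13C = 1.0033548


def mean_labeled_mass(mono_mass, n_carbon, atom_fraction):
    """Average mass of a peptide whose carbon positions carry 13C at atom_fraction
    (0.05 corresponds to 5 atom %). Other elements' isotopes are ignored."""
    return mono_mass + n_carbon * atom_fraction * DELTA_13C


def atom_fraction_from_shift(observed_mass, mono_mass, n_carbon):
    """Invert the relation above: fraction of carbon positions occupied by 13C."""
    return (observed_mass - mono_mass) / (n_carbon * DELTA_13C)


# Hypothetical peptide: 1000 Da monoisotopic mass, 45 carbon atoms, 20 atom % 13C.
m = mean_labeled_mass(1000.0, 45, 0.2)
print(m, atom_fraction_from_shift(m, 1000.0, 45))
```

    For the same hypothetical 45-carbon peptide, a 5 atom % difference in labeling corresponds to a shift of only about 2.26 Da.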

    Misty Mountain clustering: application to fast unsupervised flow cytometry gating

    Background: There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms applied to large, multidimensional datasets such as flow cytometry data prove unsatisfactory because of their speed, problems with local minima, or cluster shape bias. Model-based approaches are restricted by the assumptions of their fitting functions. Furthermore, model-based clustering requires serial clustering for every cluster number within a user-defined interval, after which the final cluster number is selected by various criteria. These supervised serial clustering methods are time-consuming, and different criteria frequently yield different optimal cluster numbers. Unsupervised heuristic approaches that have been developed, such as affinity propagation, are too expensive to apply to datasets on the order of 10⁶ points, which high-throughput experiments often generate.
    Results: To circumvent these limitations, we developed a new unsupervised density contour clustering algorithm, called Misty Mountain, that is based on percolation theory and efficiently analyzes large datasets. The approach can be envisioned as a progressive top-down removal of clouds covering a data histogram relief map, identifying clusters by the appearance of statistically distinct peaks and ridges. It is a parallel clustering method that finds every cluster after analyzing the cross sections of the histogram only once. The overall run time for the composite steps of the algorithm increases linearly with the number of data points. Clustering 10⁶ data points in 2D data space takes about 15 seconds on a standard laptop PC. Comparison with other state-of-the-art automated flow cytometry gating methods indicates that Misty Mountain provides substantial improvements in both run time and accuracy of cluster assignment.
    Conclusions: Misty Mountain is fast, unbiased with respect to cluster shape, identifies stable clusters and is robust to noise. It provides a useful, general solution for multidimensional clustering problems, and we demonstrate its suitability for automated gating of flow cytometry data.
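
    The "cloud removal" picture can be sketched as a sweep over cross sections of a 2D histogram. Starting from the highest bin count and lowering the level, the sketch counts how many separate islands (connected groups of bins at or above the level) are exposed. Two peaks joined by a low ridge appear as two islands at intermediate levels and merge into one near the base. This is a simplified illustration under stated assumptions, not the published algorithm. It omits the statistical test for distinct peaks and the cluster-assignment step, and it recomputes connectivity at every level rather than running in a single linear pass. The grid and function names are hypothetical.

```python
from collections import deque


def islands_at_level(grid, level):
    """Connected groups (4-neighbour) of histogram bins whose count is >= level."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    islands = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < level or seen[r][c]:
                continue
            island, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one island
                y, x = queue.popleft()
                island.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] >= level):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            islands.append(island)
    return islands


def sweep_cross_sections(grid):
    """Lower the 'cloud' level from the highest bin count down to 1 and record
    how many separate islands (candidate peaks) are exposed at each level."""
    levels = sorted({v for row in grid for v in row if v > 0}, reverse=True)
    return [(level, len(islands_at_level(grid, level))) for level in levels]


# Toy 2D histogram (bin counts) with two peaks joined by a low ridge.
grid = [
    [0, 1, 0, 0, 0],
    [1, 5, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 4, 1],
    [0, 0, 0, 1, 0],
]
profile = sweep_cross_sections(grid)
print(profile)  # [(5, 1), (4, 2), (1, 1)]
print(max(n for _, n in profile))  # 2 candidate peaks
```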

    Automated left ventricular diastolic function evaluation from phase-contrast cardiovascular magnetic resonance and comparison with Doppler echocardiography

    BACKGROUND: Early detection of diastolic dysfunction is crucial for patients with incipient heart failure. Although this evaluation could be performed from phase-contrast (PC) cardiovascular magnetic resonance (CMR) data, its usefulness in clinical routine is not yet established, mainly because interpretation of such data still relies mostly on manual post-processing. Accordingly, our goal was to develop a robust process to automatically estimate velocity- and flow-rate-related diastolic parameters from PC-CMR data. We also aimed to test the consistency of these parameters against echocardiography and their ability to characterize left ventricular (LV) diastolic dysfunction. RESULTS: We studied 35 controls and 18 patients with severe aortic valve stenosis and preserved LV ejection fraction who underwent PC-CMR and Doppler echocardiography on the same day. PC-CMR mitral flow and myocardial velocity data were analyzed using custom software for semi-automated extraction of diastolic parameters. Inter-operator reproducibility of flow pattern segmentation and functional parameters was assessed in a subgroup of 30 subjects. The mean percentage of overlap between the transmitral flow segmentations performed by two independent operators was 99.7 ± 1.6%, resulting in a small variability (… 0.71) and receiver operating characteristic (ROC) analysis revealed their ability to separate patients from controls, with sensitivity > 0.80, specificity > 0.80 and accuracy > 0.85. The PC-CMR flow-rate-related parameters showed slightly better correlation with echocardiography (r = 0.81) and accuracy in detecting LV abnormalities (sensitivity > 0.83, specificity > 0.91 and accuracy > 0.89). CONCLUSIONS: A fast and reproducible technique for PC-CMR flow and myocardial data analysis was successfully used in controls and patients to extract consistent velocity-related diastolic parameters, as well as flow-rate-related parameters.
    This technique provides a valuable addition to established CMR tools in the evaluation and management of patients with diastolic dysfunction.
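
    The sensitivity, specificity and accuracy figures quoted above come from thresholding a diastolic parameter to classify each subject as a patient or a control. Scanning that threshold traces out the ROC curve. The sketch below applies the standard definitions to hypothetical scores and labels, not study data. It assumes that higher values indicate disease (flip the comparison for parameters that decrease with dysfunction) and that both classes are present. The function name is illustrative.

```python
def binary_metrics(scores, labels, threshold):
    """Sensitivity, specificity and accuracy when score >= threshold is called 'patient'.

    labels: 1 for patient, 0 for control; both classes must be present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(labels)
    return sensitivity, specificity, accuracy


# Hypothetical parameter values for 4 patients followed by 4 controls.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(binary_metrics(scores, labels, 0.5))  # (0.75, 0.75, 0.75)

# Each threshold gives one ROC point: (1 - specificity, sensitivity).
for t in sorted(set(scores)):
    sens, spec, _ = binary_metrics(scores, labels, t)
    print(t, round(1 - spec, 2), round(sens, 2))
```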

    Genomic Data Reveal Toxoplasma gondii Differentiation Mutants Are Also Impaired with Respect to Switching into a Novel Extracellular Tachyzoite State

    Toxoplasma gondii pathogenesis includes the invasion of host cells by extracellular parasites, replication of intracellular tachyzoites, and differentiation to a latent bradyzoite stage. We present the analysis of seven novel T. gondii insertional mutants that do not undergo normal differentiation to bradyzoites. Microarray quantification of genome-wide RNA levels for each parasite line at multiple time points after induction allowed us to do three things. We could describe states in the normal differentiation process, analyze mutant lines in the context of these states, and identify genes that may have roles in initiating the transition from tachyzoite to bradyzoite. Gene expression patterns in wild-type parasites undergoing differentiation suggest a novel extracellular state within the tachyzoite stage. All mutant lines exhibit aberrant regulation of bradyzoite gene expression. Notably, some of the mutant lines appear to exhibit high proportions of the intracellular tachyzoite state regardless of whether they are intracellular or extracellular. In addition to the genes identified by the insertional mutagenesis screen, mixture model analysis identified a small number of genes in the mutants whose expression patterns could not be accounted for by the three parasite states. These genes may play a mechanistic role in switching from the tachyzoite to the bradyzoite stage.
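
    The abstract describes samples as containing proportions of the three parasite states and flags genes that those states cannot explain. One simple way to express that idea is to model a sample's expression vector as a weighted sum of three state profiles, fitted by least squares. Genes with large residuals are then flagged. This reading is an assumption and not necessarily the authors' mixture model. The state profiles, residual threshold and function names are hypothetical. The weights are unconstrained, and a nonnegative fit such as `scipy.optimize.nnls` would be a natural alternative.

```python
import numpy as np


def fit_state_proportions(state_profiles, sample):
    """Least-squares weights w such that sample ~= state_profiles @ w.

    state_profiles: (n_genes, n_states) array; sample: (n_genes,) array.
    """
    w, *_ = np.linalg.lstsq(state_profiles, sample, rcond=None)
    return w


def unexplained_genes(state_profiles, sample, threshold):
    """Indices of genes whose residual after the mixture fit exceeds threshold,
    together with the fitted state weights."""
    w = fit_state_proportions(state_profiles, sample)
    residual = sample - state_profiles @ w
    return np.flatnonzero(np.abs(residual) > threshold).tolist(), w


# Hypothetical profiles: 3 states, each marked by 3 genes (9 genes in total).
S = np.kron(np.eye(3), np.ones((3, 1)))
x = S @ np.array([0.5, 0.3, 0.2])
x[0] += 1.0  # gene 0 deviates from what any mixture of the states predicts
flagged, w = unexplained_genes(S, x, threshold=0.5)
print(flagged, w)
```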