267 research outputs found

    apex: phylogenetics with multiple genes.

    Genetic sequences of multiple genes are becoming increasingly common for a wide range of organisms including viruses, bacteria and eukaryotes. While such data may sometimes be treated as a single locus, in practice, a number of biological and statistical phenomena can lead to phylogenetic incongruence. In such cases, different loci should, at least as a preliminary step, be examined and analysed separately. The R software has become a popular platform for phylogenetics, with several packages implementing distance-based, parsimony and likelihood-based phylogenetic reconstruction, and an even greater number of packages implementing phylogenetic comparative methods. Unfortunately, basic data structures and tools for analysing multiple genes have so far been lacking, thereby limiting potential for investigating phylogenetic incongruence. In this study, we introduce the new R package apex to fill this gap. apex implements new object classes, which extend existing standards for storing DNA and amino acid sequences, and provides a number of convenient tools for handling, visualizing and analysing these data. In this study, we introduce the main features of the package and illustrate its functionalities through the analysis of a simple data set.

    A case of adaptation through a mutation in a tandem duplication during experimental evolution in Escherichia coli

    Background: DNA duplications constitute important precursors for genome variation. Here we analyzed an unequal duplication harboring a beneficial mutation that may provide alternative evolutionary outcomes. Results: We characterized this evolutionary event during experimental evolution for only 100 generations of an Escherichia coli strain under glucose limitation within chemostats. By combining Insertion Sequence based Restriction Length Polymorphism experiments, pulsed field gel electrophoresis and two independent genome re-sequencing experiments, we identified an evolved lineage carrying a 180 kb duplication of the 46′ region of the E. coli chromosome. This evolved duplication revealed a heterozygous state, with one copy harboring a 2668 bp deletion that included part of the ogrK gene and both the yegR and yegS genes. By genetically manipulating ancestral and evolved strains, we showed that the single yegS inactivation was sufficient to confer a frequency-dependent fitness increase under the chemostat selective conditions in both the ancestor and evolved genetic contexts, implying that the duplication itself was not a direct fitness contributor. Nonetheless, the heterozygous duplicated state was relatively stable in the conditions prevailing during evolution in chemostats, in striking contrast to non-selective conditions in which the duplication resolved at high frequency into either its ancestral or deleted copy.

    Bayesian optimization in ab initio nuclear physics

    Theoretical models of the strong nuclear interaction contain unknown coupling constants (parameters) that must be determined using a pool of calibration data. In cases where the models are complex, leading to time-consuming calculations, it is particularly challenging to systematically search the corresponding parameter domain for the best fit to the data. In this paper, we explore the prospect of applying Bayesian optimization to constrain the coupling constants in chiral effective field theory descriptions of the nuclear interaction. We find that Bayesian optimization performs rather well with low-dimensional parameter domains and foresee that it can be particularly useful for optimization of a smaller set of coupling constants. A specific example could be the determination of leading three-nucleon forces using data from finite nuclei or three-nucleon scattering experiments.

    Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data

    Background: Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns. Results: Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints in the clustering and discuss the biological relevance of our results. Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.

    Standardized Computer-Assisted Analysis of 5-hmC Immunoreactivity in Dysplastic Nevi and Superficial Spreading Melanomas

    5-Hydroxymethylcytosine (5-hmC) is an important intermediate of DNA demethylation. Hypomethylation of DNA is frequent in cancer, resulting in deregulation of 5-hmC levels in melanoma. However, the assessment of the intensity and distribution of 5-hmC immunoreactivity is not well standardized, which makes its interpretation difficult. In this study, 5-hmC-stained histological slides of superficial spreading melanomas (SSM) and dysplastic compound nevi (DN) were digitized and analyzed using the digital pathology and image platform QuPath. Receiver operating characteristic/area under the curve (ROCAUC) and t-tests were performed. A p-value of <0.05 was used for statistical significance, and a ROCAUC score of >0.8 was considered a “good” result. In total, 92 5-hmC-stained specimens were analyzed, including 42 SSM (45.7%) and 50 DN (54.3%). The mean of 5-hmC-positive cells/mm² for the epidermis and dermo-epidermal junction and the entire lesion differed significantly between DN and SSM (p = 0.002 and p = 0.006, respectively) and showed a trend towards higher immunoreactivity in the dermal component (p = 0.069). The ROCAUC of 5-hmC-positive cells of the epidermis and dermo-epidermal junction was 0.79, for the dermis 0.74, and for the entire lesion 0.76. These results show that the assessment of the epidermal with junctional expression of 5-hmC is slightly superior to dermal immunoreactivity in distinguishing between DN and SSM.
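
    The ROCAUC figures in the abstract above can be read as the probability that a randomly chosen specimen from one group has a higher score than a randomly chosen specimen from the other group, with ties counted as one half. The following is a minimal sketch of that pairwise definition, not the analysis pipeline used in the study; the function name `roc_auc` and the sample values are hypothetical and chosen only for illustration.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    Counts the fraction of (positive, negative) pairs in which the positive
    score is larger; tied pairs contribute 0.5.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical cell densities (cells/mm^2); NOT data from the study.
group_a = [820.0, 640.0, 910.0, 450.0]
group_b = [300.0, 700.0, 250.0, 480.0]
auc = roc_auc(group_a, group_b)
```

    An AUC of 0.5 corresponds to no separation between the groups and 1.0 to perfect separation, which is why the abstract treats values above 0.8 as "good".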

    Fast MCMC sampling for hidden Markov models to determine copy number variations

    Background: Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of a HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next generation sequencing amplify these problems. Results: We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling. Conclusions: We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate a speed-up of 10 to 60 and of 90, respectively, while achieving competitive results with the state-of-the-art Bayesian approaches. Availability: An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.
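
    The abstract above contrasts its MCMC approach with the common baseline of fixing the HMM parameters and segmenting with the Viterbi algorithm. The sketch below illustrates only that baseline, assuming Gaussian emissions and known parameters; it is not the authors' approximate sampling method and does not use the GHMM library. All names (`viterbi`, `gaussian_logpdf`) and all parameter values are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log-density of a normal distribution at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def viterbi(observations, means, sds, log_start, log_trans):
    """Most probable state path for an HMM with Gaussian emissions (log-space)."""
    n_states = len(means)
    # best[k]: log-probability of the best path that ends in state k
    best = [log_start[k] + gaussian_logpdf(observations[0], means[k], sds[k])
            for k in range(n_states)]
    backpointers = []
    for x in observations[1:]:
        step_ptr, new_best = [], []
        for k in range(n_states):
            # best predecessor state for arriving in state k at this position
            prev = max(range(n_states), key=lambda j: best[j] + log_trans[j][k])
            step_ptr.append(prev)
            new_best.append(best[prev] + log_trans[prev][k]
                            + gaussian_logpdf(x, means[k], sds[k]))
        best = new_best
        backpointers.append(step_ptr)
    # trace back from the best final state
    state = max(range(n_states), key=lambda k: best[k])
    path = [state]
    for step_ptr in reversed(backpointers):
        state = step_ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical three-state model: 0 = loss, 1 = neutral, 2 = gain.
means = [-1.0, 0.0, 1.0]
sds = [0.3, 0.3, 0.3]
log_start = [math.log(1.0 / 3.0)] * 3
stay, switch = math.log(0.98), math.log(0.01)
log_trans = [[stay if i == j else switch for j in range(3)] for i in range(3)]

# Made-up log-ratio values, not real ArrayCGH data.
log_ratios = [0.1, -0.05, 0.0, 0.9, 1.1, 1.0, 0.95, 0.05, -0.1]
segmentation = viterbi(log_ratios, means, sds, log_start, log_trans)
```

    Viterbi returns a single best path for one fixed parameter setting, so it carries no information about how uncertain the segment boundaries are; that is the limitation the Bayesian sampling approach in the abstract is meant to address.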

    A Platform for Processing Expression of Short Time Series (PESTS)

    Background: Time course microarray profiles examine the expression of genes over a time domain. They are necessary in order to determine the complete set of genes that are dynamically expressed under given conditions, and to determine the interaction between these genes. Because of cost and resource issues, most time series datasets contain fewer than 9 points and there are few tools available geared towards the analysis of this type of data. Results: To this end, we introduce a platform for Processing Expression of Short Time Series (PESTS). It was designed with a focus on usability and interpretability of analyses for the researcher. As such, it implements several standard techniques for comparability as well as visualization functions. However, it is designed specifically for the unique methods we have developed for significance analysis, multiple test correction and clustering of short time series data. The central tenet of these methods is the use of biologically relevant features for analysis. Features summarize short gene expression profiles, inherently incorporate dependence across time, and allow for both full description of the examined curve and missing data points. Conclusions: PESTS is fully generalizable to other types of time series analyses. PESTS implements novel methods as well as several standard techniques for comparability and visualization functions. These features and functionality make PESTS a valuable resource for a researcher's toolkit. PESTS is available to download for free to academic and non-profit users at http://www.mailman.columbia.edu/academic-departments/biostatistics/research-service/software-development.