210 research outputs found

    Thermodynamically based DNA strand design

    We describe a new algorithm for the design of strand sets for use in DNA computations or universal microarrays. Our algorithm can design sets that satisfy any of several thermodynamic and combinatorial constraints, which aim to maximize desired hybridizations between strands and their complements while minimizing undesired cross-hybridizations. To search heuristically for good strand sets, our algorithm uses a conflict-driven stochastic local search approach, which is known to be effective in solving comparable search problems. The PairFold program of Andronescu et al. [M. Andronescu, Z. C. Zhang and A. Condon (2005) J. Mol. Biol., 345, 987–1001; M. Andronescu, R. Aguirre-Hernandez, A. Condon and H. Hoos (2003) Nucleic Acids Res., 31, 3416–3422] is used to calculate the minimum free energy of hybridization between two mismatched strands. We describe new thermodynamic measures of the quality of strand sets. With respect to these measures, our algorithm consistently finds, within reasonable time, sets that are significantly better than previously published sets.
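The conflict-driven stochastic local search idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the thermodynamic PairFold-based constraints are replaced here by a simple combinatorial stand-in (a minimum Hamming distance between strands and between each strand and the others' reverse complements), and every function name and parameter (`design`, `noise`, `max_steps`, and so on) is hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(s):
    return "".join(COMPLEMENT[b] for b in reversed(s))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def in_conflict(s, t, d):
    # Stand-in constraint (assumption, not the paper's thermodynamic model):
    # two strands conflict if they differ in fewer than d positions, or if
    # one differs from the other's reverse complement in fewer than d positions.
    return hamming(s, t) < d or hamming(s, reverse_complement(t)) < d

def conflicts(strands, d):
    # All index pairs (i, j), i < j, that violate the constraint.
    return [(i, j)
            for i in range(len(strands))
            for j in range(i + 1, len(strands))
            if in_conflict(strands[i], strands[j], d)]

def conflicts_of(candidate, i, strands, d):
    # Number of other strands that would conflict with candidate at index i.
    return sum(in_conflict(candidate, t, d)
               for j, t in enumerate(strands) if j != i)

def design(n, length, d, max_steps=5000, noise=0.2, seed=0):
    rng = random.Random(seed)
    strands = ["".join(rng.choice("ACGT") for _ in range(length))
               for _ in range(n)]
    for _ in range(max_steps):
        bad = conflicts(strands, d)
        if not bad:
            return strands  # every pair satisfies the constraint
        # Conflict-driven selection: only strands in a violated pair are changed.
        i = rng.choice(rng.choice(bad))
        s = strands[i]
        neighbours = [s[:p] + b + s[p + 1:]
                      for p in range(length) for b in "ACGT" if b != s[p]]
        if rng.random() < noise:
            strands[i] = rng.choice(neighbours)  # random-walk step
        else:
            # Greedy step: single-base change leaving strand i with fewest conflicts.
            strands[i] = min(neighbours,
                             key=lambda c: conflicts_of(c, i, strands, d))
    return None  # no conflict-free set found within max_steps

if __name__ == "__main__":
    print(design(n=8, length=8, d=4))
```

The noise parameter mixes random moves with greedy ones so the search can leave local minima. A thermodynamic variant would swap `in_conflict` for a free-energy test while keeping the same loop.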

    Excitonic Funneling in Extended Dendrimers with Non-Linear and Random Potentials

    The mean first passage time (MFPT) for photoexcitation diffusion in a funneling potential of artificial tree-like light-harvesting antennae (phenylacetylene dendrimers with generation-dependent segment lengths) is computed. The non-linearity of the realistic funneling potential and slow random solvent fluctuations considerably slow down the center-bound diffusion beyond a temperature-dependent optimal size. Diffusion on a disordered Cayley tree with a linear potential is investigated analytically. At low temperatures we predict a phase in which the MFPT is dominated by a few paths. Comment: 4 pages, 4 figures, to be published in Phys. Rev. Lett.

    Disorder and Funneling Effects on Exciton Migration in Tree-Like Dendrimers

    The center-bound excitonic diffusion on dendrimers subjected to several types of non-homogeneous funneling potentials is considered. We first study the mean first-passage time (MFPT) for diffusion in a linear potential with different types of correlated and uncorrelated random perturbations. As the funneling force increases, there is a transition from a phase in which the MFPT grows exponentially with the number of generations g to one in which it grows linearly. Overall, the disorder slows down the diffusion, but the effect is much more pronounced in the exponential phase than in the linear phase. When the disorder gives rise to uncorrelated random forces there is, in addition, a transition as the temperature T is lowered: from a high-T regime in which all paths contribute to the MFPT to a low-T regime in which only a few of them do. We further explore the funneling within a realistic non-linear potential for extended dendrimers, in which the dependence of the lowest excitonic energy level on the segment length was derived using the time-dependent Hartree-Fock approximation. Under this potential the MFPT initially grows linearly with g but crosses over, beyond a molecule-specific and T-dependent optimal size, to an exponential increase. Finally, we consider geometrical disorder in the form of a small concentration of long connections, as in the small-world model. Beyond a critical concentration of connections the MFPT decreases significantly and changes to a power-law or logarithmic scaling with g, depending on the strength of the funneling force. Comment: 13 pages, 9 figures
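The exponential-to-linear transition described in this abstract can be reproduced in a simplified, disorder-free setting. The sketch below assumes a regular tree whose center-bound diffusion is projected onto a one-dimensional chain of generations, with a constant inward hopping rate, a constant outward hopping rate, and a reflecting periphery. The rate model in `rate_ratio` (branching factor times a Boltzmann factor) is an illustrative assumption, not taken from the paper, and all names are hypothetical.

```python
import math

def mfpt_to_core(generations, inward_rate, outward_rate):
    """MFPT from the periphery (generation g) to the core (generation 0).

    tau_n is the mean time to step from generation n to n - 1 for the first
    time. For a continuous-time hopping chain it obeys
        tau_n = 1 / a + (b / a) * tau_{n+1},
    where a is the inward rate and b the outward rate, with b = 0 at the
    periphery because there is nowhere further out to hop.
    """
    total = 0.0
    tau = 0.0
    for n in range(generations, 0, -1):
        b = 0.0 if n == generations else outward_rate
        tau = 1.0 / inward_rate + (b / inward_rate) * tau
        total += tau
    return total

def rate_ratio(branching, force):
    # Assumed model: each node has `branching` outward links and one inward
    # link, and each outward hop costs `force` in units of k_B * T.
    return branching * math.exp(-force)

if __name__ == "__main__":
    for force in (0.2, 1.5):
        ratio = rate_ratio(branching=2, force=force)
        times = [mfpt_to_core(g, 1.0, ratio) for g in (4, 8, 16)]
        print(f"force={force}, outward/inward={ratio:.3f}, MFPT={times}")
```

When the outward-to-inward ratio exceeds 1, each `tau_n` grows geometrically toward the core and the MFPT is exponential in g. When the ratio is below 1, `tau_n` saturates and the MFPT grows linearly with g.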

    Complex modeling with detailed temporal predictors does not improve health records-based suicide risk prediction

    Suicide risk prediction models can identify individuals for targeted intervention. Discussions of transparency, explainability, and transportability in machine learning presume that complex prediction models with many variables outperform simpler models. We compared random forest, artificial neural network, and ensemble models with 1500 temporally defined predictors to logistic regression models. Data from 25,800,888 mental health visits made by 3,081,420 individuals in 7 health systems were used to train and evaluate suicidal behavior prediction models. Model performance was compared across several measures. All models performed well (area under the receiver operating characteristic curve [AUC]: 0.794-0.858). Ensemble models performed best, but improvements over a regression model with 100 predictors were minimal (AUC improvements: 0.006-0.020). Results were consistent across performance metrics and subgroups defined by race, ethnicity, and sex. Our results suggest that simpler parametric models, which are easier to implement as part of routine clinical practice, perform comparably to more complex machine learning methods.

    Identification and Quantification of Proteoforms by Mass Spectrometry

    A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post-translational modifications. In top-down proteomic analyses, proteoforms are identified and quantified through mass spectrometric analysis of intact proteins. Recent technological developments have enabled comprehensive proteoform analyses in complex samples, and an increasing number of laboratories are adopting top-down proteomic workflows. In this review, we outline some recent advances and discuss current challenges and future directions for the field.

    A thermodynamic approach to designing structure-free combinatorial DNA word sets

    An algorithm is presented for the generation of sets of non-interacting DNA sequences, employing existing thermodynamic models for the prediction of duplex stabilities and secondary structures. A DNA ‘word’ structure is employed in which individual DNA ‘words’ of a given length (e.g. 12mer and 16mer) may be concatenated into longer sequences (e.g. four tandem words and six tandem words). This approach, in which multiple word variants are used at each tandem word position, allows very large sets of non-interacting DNA strands to be assembled from combinations of the individual words. Word sets were generated and their figures of merit were compared with those of sets described previously in the literature (e.g. 4, 8, 12, 15 and 16mer). The predicted hybridization behavior was experimentally verified on selected members of the sets using standard UV hyperchromism measurements of duplex melting temperatures (Tm). Additional experimental validation was obtained by using the sequences in formulating and solving a small example of a DNA computing problem.
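The combinatorial growth behind the tandem-word construction can be sketched briefly. With k word variants at each of p tandem positions, concatenation yields k^p strands. The 4-mer words below are arbitrary placeholders chosen for illustration; they are not from the paper and have not been checked for non-interaction.

```python
from itertools import product

def tandem_strands(word_sets):
    """Concatenate one word from each position's variant list, in order."""
    return ["".join(words) for words in product(*word_sets)]

if __name__ == "__main__":
    # Hypothetical variants: three placeholder words at each of four positions.
    word_sets = [
        ["ACCT", "TGGA", "CATC"],
        ["GTAC", "AGTG", "TCCA"],
        ["CTGA", "GACT", "ATGG"],
        ["TCAG", "CAGT", "GGTA"],
    ]
    strands = tandem_strands(word_sets)
    print(len(strands))   # 3 ** 4 strands
    print(strands[0])     # first variant at every position
```

Only the 12 individual words need to be designed and validated. The 81 longer strands follow from the combinations, which is why the approach scales to very large sets.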

    Enhanced protein isoform characterization through long-read proteogenomics

    [Background] The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms. [Results] We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. [Conclusions] Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research. This work was supported by a National Institutes of Health (NIH) grant R35GM142647 (G.M.S.), NIH grant R35GM126914 (L.M.S.), and Jackson Laboratory (A.D.M.). The codeathon which initiated the project was supported by the NIH STRIDES Initiative at the NIH. Peer reviewed