    DART-ID increases single-cell proteome coverage.

    Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those comprised of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because reduced protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates towards improved confidence estimates of peptide-spectrum-matches. When applied to bulk or to single-cell samples, DART-ID increased the number of data points by 30-50% at 1% FDR, and thus decreased missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type specific proteins. The additional datapoints provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net
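
    The idea of folding retention-time evidence into the confidence of a peptide-spectrum match can be illustrated with a small Bayes-rule sketch. This is not the DART-ID implementation: the function names, the normal densities for correct and incorrect matches, and all numeric parameters below are assumptions chosen for illustration only.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def update_pep(pep, rt_observed, rt_expected, sd_correct, rt_null_mean, sd_null):
    """Posterior probability that a match is incorrect, given its observed RT.

    pep is the prior error probability from the spectrum alone.
    Correct matches are assumed (hypothetically) to elute near rt_expected;
    incorrect matches are assumed to follow a broad null distribution.
    """
    like_correct = normal_pdf(rt_observed, rt_expected, sd_correct)
    like_incorrect = normal_pdf(rt_observed, rt_null_mean, sd_null)
    numerator = pep * like_incorrect
    denominator = numerator + (1.0 - pep) * like_correct
    if denominator == 0.0:
        return pep  # no usable RT evidence; keep the prior
    return numerator / denominator


# Same spectral evidence (PEP = 0.2), two different observed retention times.
near = update_pep(0.2, 30.1, 30.0, 0.5, 35.0, 15.0)  # elutes where expected
far = update_pep(0.2, 32.0, 30.0, 0.5, 35.0, 15.0)   # elutes 4 sd away
print(f"near: {near:.4f}, far: {far:.4f}")
```

    Under these assumed densities, a match that elutes close to its expected retention time ends up with a lower error probability than the spectrum alone gave it, while a match that elutes far from it ends up with a higher one.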

    Efficient isolation on Vero.DogSLAMtag cells and full genome characterization of Dolphin Morbillivirus (DMV) by next generation sequencing

    The Dolphin Morbillivirus (DMV) genome from the first Mediterranean epidemic (1990-’92) is the only cetacean Morbillivirus that has been completely sequenced. Here, we report the first application of next generation sequencing (NGS) to morbillivirus infection of aquatic mammals. A viral isolate, representative of the 2006-’08 Mediterranean epidemic (DMV_IZSPLV_2008), efficiently grew on Vero.DogSLAMtag cells and was submitted to whole genome characterization by NGS. The final genome length was 15,673 nucleotides, covering 99.82% of the DMV reference genome. Comparison of DMV_IZSPLV_2008 and 1990-’92 DMV strain sequences revealed 157 nucleotide mutations and 47 amino acid changes. The sequence similarity was 98.7% at the full genome level. Whole-genome phylogeny suggested that the DMV strain circulating during the 2006-’08 epidemics emerged from the 1990-’92 DMV strain. Viral isolation is considered the “gold standard” for morbillivirus diagnostics but efficient propagation of infectious virus is difficult to achieve. The successful cell replication of this strain allowed performing NGS directly from the viral RNA, without prior PCR amplification. We therefore provide to the scientific community a second DMV genome, representative of another major outbreak. Interestingly, genome comparison revealed that the neglected L gene encompasses 74% of the genetic diversity and might serve as “hypervariable” target for strain characterization

    Evaluation of protein surface roughness index using its heat denatured aggregates

    Recent research on the potential of protein surface descriptors to predict protein surface properties has gained significance because such descriptors may provide clues about a protein's functional site. In this direction, the Surface Roughness Index, a surface topological parameter, has shown potential to predict the SCOP family of a protein. The present work builds on these studies: a semi-empirical method for evaluating the Surface Roughness Index of a protein directly from its heat denatured protein aggregates (HDPA) was designed and demonstrated successfully. The method consists of extracting a feature, the Intensity Level Multifractal Dimension (ILMFD), from microscopic images of HDPA, followed by mapping ILMFD to the Surface Roughness Index (SRI) through a recurrent backpropagation network (RBPN). Finally, the SRI for a particular protein was predicted by clustering the decisions obtained by feeding multiple data into the RBPN, both to obtain the general tendency of the decisions and to discard noisy data. The cluster centre of the largest cluster was found to be the best match for the Surface Roughness Index of each protein in our study. The semi-empirical approach adopted in this paper shows a way to evaluate a protein's surface property without depending on its already evaluated structure

    Conservation and co-option in developmental programmes: the importance of homology relationships

    One of the surprising insights gained from research in evolutionary developmental biology (evo-devo) is that increasing diversity in body plans and morphology in organisms across animal phyla is not reflected in similarly dramatic changes at the level of gene composition of their genomes. For instance, simplicity at the tissue level of organization often contrasts with a high degree of genetic complexity. Also intriguing is the observation that the coding regions of several genes of invertebrates show high sequence similarity to those in humans. This lack of change (conservation) indicates that evolutionary novelties may arise more frequently through combinatorial processes, such as changes in gene regulation and the recruitment of novel genes into existing regulatory gene networks (co-option), and less often through adaptive evolutionary processes in the coding portions of a gene. As a consequence, it is of great interest to examine whether the widespread conservation of the genetic machinery implies the same developmental function in a last common ancestor, or whether homologous genes acquired new developmental roles in structures of independent phylogenetic origin. To distinguish between these two possibilities one must refer to current concepts of phylogeny reconstruction and carefully investigate homology relationships. Particularly problematic in terms of homology decisions is the use of gene expression patterns of a given structure. In the future, research on organisms other than the typical model systems will be required since these can provide insights that are not easily obtained from comparisons among only a few distantly related model species

    Segmenting DNA sequence into words based on statistical language model

    This paper presents a novel method to segment/decode DNA sequences based on an n-gram statistical language model. First, by analyzing the genomes of 12 model species, we find that most DNA “words” are 12 to 15 bp long. The bound of the language entropy of DNA sequence is about 1.5674 bits. After building an n-gram biological language model, we design an unsupervised ‘probability approach to word segmentation’ method to segment the DNA sequences. A benchmark for the segmentation method is also proposed. In a cross-segmentation test, we find that different genomes may use similar languages that belong to different branches, much like English and French/Latin. Finally, we present some possible applications of this method
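
    Probability-based word segmentation of this general kind can be sketched with dynamic programming over a word-probability table. This is not the paper's model: it uses a unigram simplification of an n-gram model, and the function name `segment`, the toy 3-letter vocabulary, and the penalty for unknown words are all hypothetical. Only the default maximum word length of 15 is taken from the abstract.

```python
import math


def segment(seq, word_prob, max_len=15, unknown_prob=1e-6):
    """Return the most probable split of seq into words under a unigram model.

    word_prob maps known words to probabilities. Unknown substrings are
    penalised per character (an illustrative choice, not from the paper).
    """
    n = len(seq)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability of seq[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = seq[j:i]
            p = word_prob.get(word, unknown_prob ** len(word))
            score = best[j] + math.log(p)
            if score > best[i]:
                best[i] = score
                back[i] = j
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(seq[back[i]:i])
        i = back[i]
    return words[::-1]


# Toy vocabulary, for illustration only.
toy_vocab = {"ATG": 0.4, "CGT": 0.3, "TAA": 0.3}
print(segment("ATGCGTTAA", toy_vocab))
```

    In an unsupervised setting the word probabilities would themselves be estimated from the genome rather than supplied by hand; the sketch covers only the decoding step.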

    Complete genome sequence and taxonomic position of anguillid herpesvirus 1

    Eel herpesvirus or anguillid herpesvirus 1 (AngHV1) frequently causes disease in freshwater eels. The complete genome sequence of AngHV1 and its taxonomic position within the family Alloherpesviridae were determined. Shotgun sequencing revealed a 249 kbp genome including an 11 kbp terminal direct repeat that contains 7 of the 136 predicted protein-coding open reading frames. Twelve of these genes are conserved among other members of the family Alloherpesviridae and another 28 genes have clear homologues in cyprinid herpesvirus 3. Phylogenetic analyses based on amino acid sequences of five conserved genes, including the ATPase subunit of the terminase, confirm the position of AngHV1 within the family Alloherpesviridae, where it is most closely related to the cyprinid herpesviruses. Our analyses support a recent proposal to subdivide the family Alloherpesviridae into two sister clades, one containing AngHV1 and the cyprinid herpesviruses and the other containing Ictalurid herpesvirus 1 and the ranid herpesviruses

    Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    Recently exciting progress has been made on protein contact prediction, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual networks. This deep neural network allows us to model very complex sequence-contact relationships as well as long-range inter-contact correlation. Our method greatly outperforms existing contact prediction methods and leads to much more accurate contact-assisted protein folding. Tested on three datasets of 579 proteins, the average top L long-range prediction accuracy obtained by our method, the representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints can yield correct folds (i.e., TMscore>0.6) for 203 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively. Further, our contact-assisted models have much better quality than template-based models. Using our predicted contacts as restraints, we can (ab initio) fold 208 of the 398 membrane proteins with TMscore>0.5. By contrast, when the training proteins of our method are used as templates, homology modeling can only do so for 10 of them. One interesting finding is that even if we do not train our prediction models with any membrane proteins, our method works very well on membrane protein prediction. Finally, in a recent blind CAMEO benchmark our method successfully folded 5 test proteins with a novel fold
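
    The "top L/k long-range accuracy" figures quoted above follow a common evaluation convention that can be sketched as below. The abstract does not define the metric, so the details here are assumptions: residue pairs count as long-range when their sequence separation is at least 24, the L/k highest-scoring long-range pairs are kept, and accuracy is the fraction of those that are true contacts. The function name and the toy data are hypothetical.

```python
def top_lk_long_range_accuracy(scores, true_contacts, seq_len, k=1, min_sep=24):
    """Fraction of the top L/k long-range predictions that are true contacts.

    scores: dict mapping (i, j) residue pairs with i < j to predicted scores.
    true_contacts: set of (i, j) pairs with i < j that are in contact.
    min_sep: assumed long-range threshold on |i - j| (not stated in the text).
    """
    long_range = [(score, pair) for pair, score in scores.items()
                  if abs(pair[0] - pair[1]) >= min_sep]
    long_range.sort(reverse=True)    # highest predicted score first
    n_top = max(1, seq_len // k)     # L/k predictions are evaluated
    top = long_range[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)


# Toy example: a 50-residue protein, top L/10 = 5 long-range predictions.
toy_scores = {(0, 30): 0.9, (1, 40): 0.8, (2, 45): 0.7, (3, 49): 0.6,
              (5, 35): 0.5, (10, 12): 0.99, (4, 44): 0.1}
toy_true = {(0, 30), (2, 45), (5, 35), (10, 12)}
accuracy = top_lk_long_range_accuracy(toy_scores, toy_true, seq_len=50, k=10)
print(accuracy)
```

    The pair (10, 12) has the highest score but is excluded because it is short-range, which is why such metrics are reported separately per separation class.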