
    Convolutional LSTM Networks for Subcellular Localization of Proteins

    Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study, we demonstrate that LSTM networks predict the subcellular location of proteins from the protein sequence alone with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve performance by introducing convolutional filters, and we experiment with an attention mechanism that lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanism and show how they can be used to extract biologically relevant knowledge from the LSTM networks.
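The attention mechanism described above weights some positions of a protein more heavily than others. The following is a minimal sketch of one common variant, dot-product attention pooling over per-residue hidden states. The hidden states and the `query` vector are hypothetical stand-ins for learned LSTM outputs and parameters, and the abstract does not say that the authors used this exact form.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Collapse per-residue hidden states into one fixed-size vector.

    hidden: one hidden-state vector per residue (stand-in for LSTM outputs).
    query:  scoring vector; learned in a real model, fixed here (hypothetical).
    Returns the attention-weighted context vector and the per-residue weights.
    """
    # Score each position by its dot product with the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    # Weighted sum of hidden states, one output dimension at a time.
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return context, weights


# Toy example: three residues with 2-dimensional hidden states.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
context, weights = attention_pool(hidden_states, query=[2.0, 0.0])
print([round(w, 3) for w in weights])
```

The weights sum to one and contain one entry per residue. They can therefore be plotted along the sequence, which is the kind of attention visualization the abstract refers to.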

    Subfamily specific conservation profiles for proteins based on n-gram patterns

    Background: A new algorithm has been developed for generating conservation profiles that reflect the evolutionary history of the subfamily associated with a query sequence. It is based on n-gram patterns (NP{n,m}), which are sets of n residues and m wildcards in windows of size n+m. The generation of conservation profiles is treated as a signal-to-noise problem, where the signal is the count of n-gram patterns in target sequences that are similar to the query sequence and the noise is the count over all target sequences. The signal is distinguished from the noise by applying singular value decomposition to sets of target sequences rank-ordered by similarity to the query. Results: The new algorithm was used to construct 4,248 profiles from 120 randomly selected Pfam-A families. These were compared with profiles generated from multiple alignments using the consensus approach. The two profiles were similar whenever the subfamily associated with the query sequence was well represented in the multiple alignment. The new algorithm could construct subfamily-specific conservation profiles for subfamilies with as few as five members, and its speed was comparable to that of the multiple alignment approach. Conclusion: Subfamily-specific conservation profiles can be generated by the new algorithm without a priori knowledge of family relationships or domain architecture. This is useful when the subfamily contains multiple domains with different levels of representation in protein databases. It may also be applicable when the subfamily sample size is too small for the multiple alignment approach.
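The pattern definition and the signal-to-noise framing can be made concrete with a short sketch. It enumerates NP{n,m} patterns and computes a per-pattern ratio of counts in query-like targets (signal) to counts in all targets (noise). Three choices are assumptions rather than details from the abstract: the wildcard placement, counting each pattern once per sequence, and the toy sequences. The SVD step over rank-ordered target sets is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List

WILDCARD = "."


def ngram_patterns(seq: str, n: int, m: int) -> List[str]:
    """Enumerate NP{n,m} patterns: every window of n+m residues with m positions masked.

    Assumption: wildcards may occupy any m positions inside the window.
    """
    width = n + m
    patterns = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for masked in combinations(range(width), m):
            patterns.append("".join(
                WILDCARD if i in masked else residue
                for i, residue in enumerate(window)))
    return patterns


def pattern_counts(sequences: Iterable[str], n: int, m: int) -> Counter:
    """Number of sequences containing each pattern (counted once per sequence; an assumption)."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(set(ngram_patterns(seq, n, m)))
    return counts


def signal_to_noise(similar: List[str], all_targets: List[str],
                    n: int, m: int) -> Dict[str, float]:
    """Per-pattern ratio of signal (query-like targets) to noise (all targets).

    `similar` is expected to be a subset of `all_targets`; patterns absent from
    `all_targets` are skipped to avoid dividing by zero.
    """
    signal = pattern_counts(similar, n, m)
    noise = pattern_counts(all_targets, n, m)
    return {p: signal[p] / noise[p] for p in signal if noise[p]}


ratios = signal_to_noise(["ACDE"], ["ACDE", "ACDF"], n=2, m=1)
print(ratios[".DE"], ratios[".CD"])  # 1.0 0.5
```

A ratio near 1 marks a pattern that occurs almost only in query-like sequences. A lower ratio marks a pattern shared with the wider family.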

    PRED_PPI: a server for predicting protein-protein interactions based on sequence data with probability assignment

    Background: Protein-protein interactions (PPIs) are crucial for almost all cellular processes, including metabolic cycles, DNA transcription and replication, and signaling cascades. Given the importance of PPIs, several methods have been developed to detect them. Because experimental methods are time-consuming and expensive, computational methods that identify PPIs effectively are of great practical significance. Findings: Most previous methods were developed to predict PPIs in only one species and do not provide probability estimates. In this work, a relatively comprehensive prediction system based on a support vector machine (SVM) was developed to predict PPIs in five organisms: humans, yeast, Drosophila, Escherichia coli, and Caenorhabditis elegans. The predictor reports a probability with each prediction, which can be used to assess the confidence of each SVM prediction. Using a probability of 0.5 as the threshold for assigning class labels, the method had an average accuracy for detecting protein interactions of 90.67% for humans, 88.99% for yeast, 90.09% for Drosophila, 92.73% for E. coli, and 97.51% for C. elegans. Moreover, more than 80% of the correctly predicted pairs were predicted with a high probability (≥0.8), indicating that the tool can predict novel PPIs with high confidence. Conclusions: Based on this work, a web-based system, Pred_PPI, was constructed for predicting PPIs in the five organisms. Users can predict novel PPIs and obtain a probability value for each prediction. Pred_PPI is freely available at http://cic.scu.edu.cn/bioinformatics/predict_ppi/default.html.
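The thresholding and confidence reporting described above can be sketched as follows. The sketch assumes the SVM already outputs a probability for each pair. SVM probabilities are commonly obtained by fitting a sigmoid to decision values (Platt scaling), but the abstract does not say how Pred_PPI does it. The `evaluate` function and the toy data are hypothetical.

```python
from typing import List, Tuple


def evaluate(probs: List[float], labels: List[int],
             threshold: float = 0.5, high_conf: float = 0.8) -> Tuple[float, float]:
    """Return (accuracy, share of correct predictions made with high confidence).

    probs:  predicted probability that each protein pair interacts (class 1).
    labels: true classes, 1 = interacting, 0 = non-interacting.
    Assumptions: p >= threshold is labelled as interacting, and the confidence
    of a prediction is the probability assigned to the predicted class.
    """
    correct = 0
    correct_confident = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        confidence = p if predicted == 1 else 1.0 - p
        if predicted == y:
            correct += 1
            if confidence >= high_conf:
                correct_confident += 1
    accuracy = correct / len(labels)
    share = correct_confident / correct if correct else 0.0
    return accuracy, share


# Toy, made-up predictions for four protein pairs.
acc, share = evaluate([0.95, 0.6, 0.1, 0.3], [1, 1, 0, 1])
print(acc, share)
```

Confidence is measured on the predicted class. A pair predicted as non-interacting with p = 0.1 therefore counts as a confident (0.9) prediction.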

    Random Lasing Action from Randomly Assembled ZnS Nanosheets

    Lasing characteristics of randomly assembled ZnS nanosheets are studied at room temperature. Under 266-nm optical excitation, sharp lasing peaks around 332 nm, with linewidths of less than 0.4 nm, are observed in all directions. In addition, the dependence of the lasing threshold intensity on the excitation area is in good agreement with random laser theory. This confirms that the lasing observed in randomly assembled ZnS nanosheets arises from coherent random lasing action.

    Scoring Protein Relationships in Functional Interaction Networks Predicted from Sequence Data

    The abundance of diverse biological data from various sources is a rich source of knowledge with the power to advance our understanding of organisms. Exploiting it requires computational methods that integrate these data effectively and elucidate local and genome-wide functional connections between protein pairs, enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine function, although the functional properties of a protein can often be predicted from the domains it contains alone. Protein sequences and domains can therefore be used to predict pairwise functional relationships between proteins. These predictions in turn contribute to predicting the functions of uncharacterized proteins, ensuring that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with high confidence and coverage, and we use the network to predict functions of uncharacterized proteins.
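The abstract does not state which information-theoretic score the authors use. As a hedged illustration of the general idea only, the sketch below weights each conserved signature (domain) that two proteins share by its self-information, -log2 of the fraction of proteins carrying it. A link supported by a rare shared domain therefore scores higher than one supported by a ubiquitous domain. The protein and domain identifiers are hypothetical.

```python
import math
from typing import Dict, Set


def domain_surprisal(annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Self-information -log2(f) of each domain, where f is the fraction of
    proteins in the proteome that carry it; rarer domains score higher."""
    total = len(annotations)
    counts: Dict[str, int] = {}
    for domains in annotations.values():
        for d in domains:
            counts[d] = counts.get(d, 0) + 1
    return {d: -math.log2(c / total) for d, c in counts.items()}


def pair_score(a: str, b: str, annotations: Dict[str, Set[str]],
               surprisal: Dict[str, float]) -> float:
    """Score a predicted functional link by the total information in shared domains."""
    shared = annotations[a] & annotations[b]
    return sum((surprisal[d] for d in shared), 0.0)


# Hypothetical proteins and domain IDs, for illustration only.
proteome = {"P1": {"D1", "D2"}, "P2": {"D1", "D2"}, "P3": {"D1"}, "P4": {"D3"}}
info = domain_surprisal(proteome)
print(pair_score("P1", "P2", proteome, info), pair_score("P1", "P3", proteome, info))
```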

    Conservation of the Human Integrin-Type Beta-Propeller Domain in Bacteria

    Integrins are heterodimeric cell-surface receptors with key functions in cell-cell and cell-matrix adhesion. Integrin α and β subunits are present throughout the metazoans, but it is unclear whether the subunits predate the origin of multicellular organisms. Several component domains have been detected in bacteria, one of which, a specific 7-bladed β-propeller domain, is a unique feature of the integrin α subunits. Here, we describe a structure-derived motif that incorporates key features of each blade from the X-ray structures of human αIIbβ3 and αVβ3, includes elements of the FG-GAP/Cage and Ca2+-binding motifs, and is specific to the metazoan integrin domains. Separately, we searched all available sequences from bacteria and unicellular eukaryotes for metazoan integrin-type β-propeller domains. Each candidate had to contain seven repeats, corresponding to the seven blades of the β-propeller domain, with the new structure-derived motif present in every repeat. Among 47 available genomes of unicellular eukaryotes, we could not find a single instance of seven repeats with the motif. Several sequences contained three repeats, a predicted transmembrane segment, and a short cytoplasmic motif associated with some integrins, but otherwise differed from the metazoan integrin α subunits. Among the available bacterial sequences, we found five examples containing seven sequential metazoan integrin-specific motifs within the seven repeats. The bacterial motifs differ in having one Ca2+-binding site per repeat, whereas metazoan integrins have three or four sites. The bacterial sequences are more conserved in both motif sequence and loop length, suggesting that their structure is more regular and compact than that of the example structures from human integrins. Although the bacterial examples are not full-length integrins, full-length metazoan-type 7-bladed β-propeller domains are present, sometimes as two tandem copies.
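The search criterion above (seven repeats, with the motif present in every repeat) amounts to a sequence filter. The sketch below is a simplified, hypothetical version. It uses a made-up regular expression in place of the study's structure-derived motif and treats "seven repeats" as seven consecutive motif hits spaced no more than an assumed `max_gap` residues apart.

```python
import re

# Hypothetical stand-in for the structure-derived blade motif; the real motif
# from the study is not reproduced here and is far more specific.
BLADE_MOTIF = re.compile(r"FG.{2,8}GAP")


def longest_motif_run(seq: str, motif: re.Pattern, max_gap: int) -> int:
    """Length of the longest run of non-overlapping motif hits in which
    consecutive hit start positions are at most max_gap residues apart."""
    best = run = 0
    prev = None
    for match in motif.finditer(seq):
        start = match.start()
        run = run + 1 if prev is not None and start - prev <= max_gap else 1
        best = max(best, run)
        prev = start
    return best


def looks_seven_bladed(seq: str, motif: re.Pattern = BLADE_MOTIF,
                       max_gap: int = 80) -> bool:
    """Keep a sequence only if seven sequential repeats each carry the motif."""
    return longest_motif_run(seq, motif, max_gap) >= 7


blade = "FGAAAGAP" + "X" * 40  # toy 48-residue "blade"
print(looks_seven_bladed(blade * 7), looks_seven_bladed(blade * 6))
```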

    Expression and genomic analysis of midasin, a novel and highly conserved AAA protein distantly related to dynein

    BACKGROUND: The largest open reading frame in the Saccharomyces genome encodes midasin (MDN1p, YLR106p), an AAA ATPase of 560 kDa that is essential for cell viability. Orthologs of midasin have been identified in the genome projects for Drosophila, Arabidopsis, and Schizosaccharomyces pombe. RESULTS: Midasin is present as a single-copy gene encoding a well-conserved protein of ~600 kDa in all eukaryotes for which data are available. In humans, the gene maps to 6q15 and encodes a predicted protein of 5596 residues (632 kDa). Sequence alignments of midasin from humans, yeast, Giardia, and Encephalitozoon indicate the following domain structure: an N-terminal domain (35 kDa); an AAA domain containing six tandem AAA protomers (~30 kDa each); a linker domain (260 kDa); an acidic domain (~70 kDa) containing 35–40% aspartate and glutamate; and a carboxy-terminal M-domain (30 kDa) that possesses MIDAS sequence motifs and is homologous to the I-domain of integrins. Expression of hemagglutinin-tagged midasin in yeast yields a polypeptide of the anticipated size that localizes principally to the nucleus. CONCLUSIONS: The highly conserved structure of midasin in eukaryotes, together with its nuclear localization in yeast, suggests that midasin may function as a nuclear chaperone involved in the assembly and disassembly of macromolecular complexes in the nucleus. The AAA domain of midasin is evolutionarily related to that of dynein, but midasin appears to lack a microtubule-binding site.

    A combined HM-PCR/SNuPE method for high sensitive detection of rare DNA methylation

    Background: DNA methylation changes are widely used as early molecular markers in cancer detection. Sensitive detection and classification of rare methylation changes in DNA extracted from circulating body fluids or complex tissue samples is crucial for understanding tumor etiology and for clinical diagnosis and treatment. In this paper, we describe a combined method to monitor the presence of methylated tumor DNA in an excess of unmethylated background DNA from non-tumorous cells. The method combines heavy methyl-PCR (HM-PCR), which preferentially amplifies methylated marker sequences from bisulfite-treated DNA, with a methylation-specific single-nucleotide primer extension (SNuPE). The extension products are monitored by ion-pair reversed-phase high-performance liquid chromatography separation. Results: This combined method allows detection of 14 pg (that is, four to five genomic copies) of methylated chromosomal DNA in a 2000-fold excess (that is, 50 ng) of unmethylated chromosomal background, with an analytical sensitivity of > 90%. We outline a detailed protocol for the combined assay on two known cancer markers (SEPT9 and TMEFF2) and discuss general aspects of assay design and data interpretation. Finally, we provide an application example of rapid testing for tumor methylation in plasma DNA from a small cohort of patients with colorectal cancer. Conclusion: The method allows unambiguous detection of rare DNA methylation, for example in body fluids or DNA isolates from cells or tissues, with very high sensitivity and accuracy. It combines standard technologies, can easily be adapted to any target region of interest, does not require costly reagents, and can be used for routine screening of many samples.
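The statement that 14 pg corresponds to four to five genomic copies follows from dividing the DNA mass by the mass of one genome. The sketch below assumes a human haploid genome of about 3.1 × 10^9 bp and an average mass of about 650 g/mol per base pair. These are standard approximations, not values given in the abstract.

```python
# Assumed approximate constants (not from the study).
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_DA = 650.0           # average mass of one double-stranded base pair, g/mol
HUMAN_HAPLOID_BP = 3.1e9     # approximate human haploid genome size, base pairs


def genome_copies(mass_pg: float, genome_bp: float = HUMAN_HAPLOID_BP) -> float:
    """Convert a DNA mass in picograms to the number of haploid genome copies."""
    grams_per_copy = genome_bp * BP_MASS_DA / AVOGADRO  # about 3.3e-12 g (3.3 pg)
    return mass_pg * 1e-12 / grams_per_copy


print(round(genome_copies(14.0), 1))
```

With these constants one haploid genome weighs about 3.3 pg, so 14 pg corresponds to roughly four copies, in line with the abstract's estimate.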

    Purification of Nanoparticles by Size and Shape

    Producing monodisperse nanoparticles is essential to ensure consistency in biological experiments and to enable a smooth translation into the clinic. Purifying samples into discrete sizes and shapes may not only improve sample quality but also give us the tools to understand which physical properties of nanoparticles are beneficial for a drug delivery vector. In this study, using polymersomes as a model system, we explore four techniques for purifying pre-formed nanoparticles into discrete fractions based on their size, shape, or density. We show that these techniques can successfully separate polymersomes into monodisperse fractions.