42 research outputs found

    Evolutionary conservation of DNA-contact residues in DNA-binding domains

    Abstract
    Background: DNA-binding proteins are central to gene regulation. Identifying DNA-binding domains helps explain the regulatory mechanisms of DNA-binding proteins. In this study, we propose a method that determines whether a domain or protein has DNA-binding capability by considering the evolutionary conservation of its DNA-binding residues.
    Results: Our method achieves high precision and recall for 66 families of DNA-binding domains, with a false positive rate below 5% on 250 non-DNA-binding proteins. Experimental results also show that, by using the evolutionary conservation of DNA-contact residues, our method can distinguish the different DNA-binding behaviors of proteins within the same SCOP family.
    Conclusion: This study demonstrates that DNA-contact residues in DNA-binding domains are evolutionarily conserved. We conclude that members of the same subfamily bind DNA specifically, whereas members of different subfamilies often recognize different DNA targets. We also observe co-evolution between DNA-contact residues and the DNA base pairs they interact with.
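    One common way to quantify how conserved a residue position is uses the Shannon entropy of its column in a multiple sequence alignment. The abstract does not say which conservation measure the method relies on, so the sketch below is an assumption rather than the authors' scoring scheme: it scores each DNA-contact column as log2(20) minus the column entropy, so a fully conserved column receives the maximum score. The function names `column_entropy` and `contact_conservation` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; gap characters '-' are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def contact_conservation(alignment: list[str], contact_positions: list[int]) -> dict[int, float]:
    """Assumed conservation score per DNA-contact column: log2(20) minus its entropy.

    Higher values mean the residue at that aligned position is more conserved.
    """
    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = {}
    for pos in contact_positions:
        column = "".join(seq[pos] for seq in alignment)
        scores[pos] = max_entropy - column_entropy(column)
    return scores


# Hypothetical toy alignment of three domain sequences; columns 0 and 2 are
# treated as DNA-contact positions.
example = contact_conservation(["RKQAS", "RKEAT", "RKQGS"], [0, 2])
```

    In this toy alignment, column 0 (all R) scores the full log2(20), while column 2 (Q, E, Q) scores lower because it is partly variable.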

    Species Identification on A Small Sample Size of RNA Sequences by Combined Method of Noise Filtering with L2-norm

    Abstract. This paper proposes a method that combines noise filtering with an L2-norm distance to classify RNA sequences for species identification, including cases where only a small sample of nucleic acid sequences is available. The method amends and extends the study by Hu et al. (2011) [1]. We verified it in a species identification test on "slipper orchid" samples and a hybrid of them. The results show that, with our method, we can distinguish the paternity of a hybrid among a set of "slipper orchid" samples.
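    The abstract does not specify the sequence features or the exact filter, so the following is only a sketch under stated assumptions: each RNA sequence becomes a vector of k-mer frequencies, a hypothetical noise filter zeroes out k-mers rarer than a threshold, and a query is assigned to the species whose mean filtered profile is nearest in L2 norm. The k-mer representation, the threshold rule, the default parameter values, and all function names are assumptions, not the method of Hu et al. or of this paper.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 2) -> list[float]:
    """Relative frequency of every possible k-mer over the RNA alphabet ACGU."""
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def noise_filter(profile: list[float], threshold: float) -> list[float]:
    """Hypothetical noise filter: zero out k-mers rarer than `threshold`."""
    return [f if f >= threshold else 0.0 for f in profile]


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2-norm) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query: str, references: dict[str, list[str]], k: int = 2, threshold: float = 0.05):
    """Return the species whose mean filtered profile is nearest to `query` in L2 norm."""
    q = noise_filter(kmer_profile(query, k), threshold)
    best_species, best_dist = None, math.inf
    for species, seqs in references.items():
        profiles = [noise_filter(kmer_profile(s, k), threshold) for s in seqs]
        centroid = [sum(col) / len(profiles) for col in zip(*profiles)]
        d = l2_distance(q, centroid)
        if d < best_dist:
            best_species, best_dist = species, d
    return best_species
```

    Averaging each species' profiles into a centroid keeps the comparison cheap when only a few reference sequences per species are available, which is the small-sample setting the abstract targets.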

    Extracting the abstraction pyramid from complex networks

    Abstract
    Background: At present, the organization of system modules is typically limited to either a multilevel hierarchy that describes the "vertical" relationships between modules at different levels (e.g., module A at level two is included in module B at level one), or a single-level graph that represents the "horizontal" relationships among modules (e.g., genetic interactions between module A and module B). Neither type of organization provides the broader and deeper view of a complex system that arises from integrating vertical and horizontal relationships.
    Results: We propose Pyramabs, a complex network analysis tool that integrates vertical and horizontal relationships and extracts information at various granularities to build a pyramid from a complex system of interacting objects. The pyramid depicts the nested structure implied in a complex system and shows the vertical relationships between abstract networks at different levels. At each level, the abstract network of modules, connected by weighted links, represents the modules' horizontal relationships. We first tested Pyramabs on hierarchical random networks to verify that it can recover the module organization pre-embedded in those networks. We then tested it on a protein-protein interaction (PPI) network and a metabolic network. According to Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), the vertical relationships identified from the PPI network and metabolic pathways correctly characterized the inclusion (i.e., part-of) relationship, and the horizontal relationships provided a good indication of the functional closeness between modules. Our experiments demonstrate that Pyramabs can perform knowledge mining in complex systems.
    Conclusions: Networks are a flexible and convenient way to represent interactions in a complex system, and an increasing amount of real-world information is described by complex networks. We treat the analysis of a complex network as an iterative process of extracting meaningful information at multiple granularities from a system of interacting objects. The quality of a network's interpretation depends on the completeness and expressiveness of the extracted knowledge representations. Pyramabs was designed to interpret a complex network by revealing a pyramid of abstractions. The abstraction pyramid is a new knowledge representation that combines vertical and horizontal viewpoints at different degrees of abstraction. Interpretations in this form are more accurate and more meaningful than multilevel dendrograms or single-level graphs. Pyramabs can be accessed at http://140.113.166.165/pyramabs.php/.
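    The horizontal and vertical relationships in an abstraction pyramid can be illustrated with a small sketch. It assumes the nested module partitions are already given (Pyramabs' own module-detection step is not shown) and takes the weight of a horizontal link to be the number of node-level edges running between two modules. That weighting and the names `abstract_network`, `vertical_links`, and `pyramid` are hypothetical.

```python
from collections import Counter


def abstract_network(edges, membership):
    """Collapse a node-level graph into a module-level weighted graph.

    edges: iterable of undirected (u, v) node pairs.
    membership: dict mapping each node to its module label at one level.
    Returns a Counter mapping sorted (module_a, module_b) pairs to the number
    of node-level edges between them, used here as the horizontal link weight.
    """
    weights = Counter()
    for u, v in edges:
        a, b = membership[u], membership[v]
        if a != b:  # edges inside a module are absorbed into that module
            weights[tuple(sorted((a, b)))] += 1
    return weights


def vertical_links(coarse, fine):
    """Map each fine-level module to the coarse-level module that contains it (part-of)."""
    parents = {}
    for node, child in fine.items():
        parent = coarse[node]
        if parents.setdefault(child, parent) != parent:
            raise ValueError(f"module {child!r} is not nested in a single parent")
    return parents


def pyramid(edges, levels):
    """One abstract network per level; `levels` lists membership dicts, coarsest first."""
    return [abstract_network(edges, membership) for membership in levels]
```

    Keeping both outputs lets each level be read horizontally (weighted module graph) while `vertical_links` records how modules nest across levels.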

    Protein structure search and local structure characterization

    Abstract
    Background: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures grows, a greater variety of studies can be conducted more efficiently, including the design of protein structural alphabets. Structural alphabets allow us to characterize the local structures of proteins and to describe the global folding structure of a protein as a one-dimensional (1D) sequence. These 1D sequences can then be compared with standard sequence alignment tools such as BLAST or FASTA to identify structural similarities among proteins.
    Results: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet, and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters define the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared SA-FAST with various search tools in database-scale search tasks and found it highly competitive in all tests conducted. We further evaluated how well our structural alphabet recognizes specific structural domains of EGF and EGF-like proteins; our method recovered more EGF sub-domains with our structural alphabet than with other structural alphabets. SA-FAST is available at http://140.113.166.178/safast/.
    Conclusion: The goal of this project was twofold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Second, we wanted to open the door to researchers who have done substantial work on biological sequences but have yet to enter the field of protein structure research. Our experiments showed that, by transforming structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.
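    The alphabet-construction step can be sketched as plain k-means (Lloyd's algorithm) over fragment descriptors, followed by encoding each fragment as the letter of its nearest centroid. This is a minimal sketch assuming fragments are already fixed-length numeric vectors (for example, backbone angles) and that the alphabet size k is chosen in advance; the self-organizing map and minimum spanning tree step for selecting k, and the TRISUM-169 matrix, are not reproduced. The function names and toy data are hypothetical.

```python
import string


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Lloyd's k-means; centroids start at the first k points so results are reproducible."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        new_centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


def encode(fragments, centroids):
    """Translate each fragment into the letter (A, B, ...) of its nearest centroid."""
    letters = string.ascii_uppercase
    return "".join(
        letters[min(range(len(centroids)), key=lambda i: squared_distance(f, centroids[i]))]
        for f in fragments
    )
```

    For example, with hypothetical 2D descriptors `[(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]` and k = 2, the fragments `(0, 0), (10, 10), (0.5, 0.5)` encode to the 1D string `"ABA"`, which a sequence tool can then align.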

    A Wrapper Approach for Constructive Induction

    Inductive algorithms rely strongly on their representational biases. Representational inadequacy can be mitigated by constructive induction. This paper introduces the notion of a relative gain measure and describes a new constructive induction algorithm (GALA) that is independent of the learning algorithm. GALA generates a small number of new Boolean attributes from existing Boolean, nominal, or real-valued attributes. Unlike most previous work on constructive induction, our method is designed as a preprocessing step applied before standard machine learning algorithms. We present results demonstrating the effectiveness of GALA on both artificial and real domains, for both symbolic and subsymbolic learners. For symbolic learners, we used C4.5 and CN2; for subsymbolic learners, we used the perceptron and backpropagation. In all cases, the GALA preprocessor improved the performance of the learning algorithm.
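    The abstract does not define the relative gain measure, so the sketch below uses an assumed definition purely for illustration: the information gain of a constructed Boolean conjunction minus the larger information gain of its two parent attributes. A positive value means the new attribute tells more about the class than either part alone. The definition and the names `info_gain` and `relative_gain` are hypothetical and may differ from GALA's actual measure; nominal and real-valued attributes would first have to be turned into Boolean tests.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(feature, labels):
    """Information gain of a Boolean feature (list of bools) with respect to the labels."""
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for f, y in zip(feature, labels) if f == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain


def relative_gain(a, b, labels):
    """Assumed relative gain of the new attribute (a AND b) over its two parents."""
    conj = [x and y for x, y in zip(a, b)]
    return info_gain(conj, labels) - max(info_gain(a, labels), info_gain(b, labels))
```

    On a toy target that is exactly `a AND b` (a = [T, T, F, F], b = [T, F, T, F], class = [1, 0, 0, 0]), the conjunction splits the data perfectly, so this assumed relative gain is 0.5 bits.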

    GalaII: Integrating the Construction of Boolean and Prototypical Features

    We examine a new approach to constructive induction that combines Boolean feature construction with prototypical feature construction. This representation language appears to be useful in many naturally occurring domains. Despite its representational power, the results remain reasonably comprehensible, and the learning process is efficient. We present evidence for the effectiveness of this approach on real and artificial domains. Through a series of systematic experiments, we partially characterize the generality of the approach.

    The 3D conformation of the representative segment for each alphabet letter

    Copyright information: Taken from "Protein structure search and local structure characterization", BMC Bioinformatics 2008;9:349 (http://www.biomedcentral.com/1471-2105/9/349). Published online 22 Aug 2008. PMCID: PMC2529324.

    Legislative Documents

    Also variously referred to as: House bills; House documents; House legislative documents; legislative documents; General Court documents