10 research outputs found

    A Predictive Model for Secondary RNA Structure Using Graph Theory and a Neural Network

    Background: Determining the secondary structure of RNA from the primary structure is a challenging computational problem. A number of algorithms have been developed to predict the secondary structure from the primary structure. It is agreed that there is still room for improvement in each of these approaches. In this work we build a predictive model for secondary RNA structure using a graph-theoretic tree representation of secondary RNA structure. We model the bonding of two RNA secondary structures to form a larger secondary structure with a graph operation we call merge. We consider all combinatorial possibilities using all possible tree inputs, both those that are RNA-like in structure and those that are not. The resulting data from each tree merge operation is represented by a vector. We use these vectors as input values for a neural network and train the network to recognize a tree as RNA-like or not, based on the merge data vector. The network estimates the probability of a tree being RNA-like. Results: The network correctly assigned a high probability of RNA-likeness to trees previously identified as RNA-like and a low probability of RNA-likeness to those classified as not RNA-like. We then used the neural network to predict the RNA-likeness of the unclassified trees. Conclusions: There are a number of secondary RNA structure prediction algorithms available online. These programs are based on finding the secondary structure with the lowest total free energy. In this work, we create a predictive tool for secondary RNA structures using graph-theoretic values as input for a neural network. The use of a graph operation to theoretically describe the bonding of secondary RNA is novel and is an entirely different approach to the prediction of secondary RNA structures. Our method correctly predicted trees to be RNA-like or not RNA-like for all known cases. In addition, our results convey a measure of likelihood that a tree is RNA-like or not RNA-like. Given that the majority of secondary RNA folding algorithms return more than one possible outcome, our method provides a means of determining the best or most likely structures among all of the possible outcomes.

    Elliptic Curve Cryptography: Making the Simple Complex

    The fast-paced development of new technologies continues to advance the newest security measures and innovative cryptographic methods. As a result, the intention of this thesis is to perform an in-depth study of the most contemporary procedure available to the general public at the time of this study: ElGamal Elliptic Curve Cryptography. This modern form of encryption merges the notion of public key cryptography with some of the most powerful mathematical procedures known to this field. In order to adequately establish the mathematical foundation for this applied topic, the key topics for upcoming chapters will include Cryptography, Number Theory, Abstract Algebra and elliptic curves. Subsequently, this study will merge these four topics to properly establish the procedure of elliptic curve cryptography. As a supplement to the procedures and computations introduced during the body of this thesis, Appendix A steps through the mathematical processes of the entire ElGamal cryptosystem using code generated in Maple®. Despite increasing popularity, the high level of theoretical mathematics involved in elliptic curve cryptography hinders its rapid adoption. Therefore, this thesis serves as an introductory-level explanation of the mathematical procedures behind the implementation of ElGamal Elliptic Curve Cryptography.

    The Effects of Tabular-Based Content Extraction on Patent Document Clustering

    Data can be represented in many different ways within a particular document or set of documents. Hence, attempts to automatically process the relationships between documents or determine the relevance of certain document objects can be problematic. In this study, we have developed software to automatically catalog objects contained in HTML files for patents granted by the United States Patent and Trademark Office (USPTO). Once these objects are recognized, the software creates metadata that assigns a data type to each document object. Such metadata can be easily processed and analyzed for subsequent text mining tasks. Specifically, document similarity and clustering techniques were applied to a subset of the USPTO document collection. Although our preliminary results demonstrate that tables and numerical data do not provide quantifiable value to a document’s content, the stage for future work in measuring the importance of document objects within a large corpus has been set.

    Building better interdisciplinary scientists: creating graduate level courses to address the communication gap in interdisciplinary research

    Background: The SCALE-IT (Scalable Computing and Leading Edge Innovative Technologies) program at the University of Tennessee-Knoxville is one of an increasing number of programs at institutions across the country that rely on the success of interdisciplinary research. To prepare students for interdisciplinary problem solving, universities typically offer advanced courses or seminars in interdisciplinary topics. While courses like this are ideal for advanced students who have extensive backgrounds in both computational science and domain sciences, most graduate students lack core competency in fields outside of their own disciplines and are thus unprepared to step up to high-level multidisciplinary courses. However, traditional institutional curricula do not provide opportunities for graduate students to develop an appropriate ground-level understanding in disciplines outside of their primary department. To address this issue, the SCALE-IT program has initiated the creation of introductory graduate level courses to overcome the lexical barrier between academic fields. Over the past two years, the SCALE-IT program has developed four successful graduate level courses dedicated to teaching introductory topics in bioinformatics at the University of Tennessee. Results: One course in particular, A Survey of Biology for Computational Researchers, demonstrates the success of this SCALE-IT initiative. This course aims to introduce a survey of biology to graduate students in other computational fields by addressing the crippling language barrier and building a community around six computational topics. The six topics explored are: Genomics, Biochemistry and Protein Biophysics, Cell Biology and Cell Signaling, Immunology, Phylogenetics and Evolution, and Populations Ecology. During the course of the semester, each topic concludes with a guest lecture and open discussion from an expert at the University of Tennessee in the computational domain. Graduate students from across five different academic disciplines are registered in this course for its first semester of instruction. Conclusions: In response to the high level of interest and success in these courses at the University of Tennessee, SCALE-IT is working towards establishing this curriculum as a permanent foundation in graduate interdisciplinary education. The permanent inclusion of graduate level basic courses would greatly enhance the versatility of the interdisciplinary student. In turn, the development of more capable graduate students will directly enable university-level academic fields to make greater strides in advancing the scope of successful interdisciplinary research.

    Computational Ranking of Yerba Mate Small Molecules Based on Their Predicted Contribution to Antibacterial Activity against Methicillin-Resistant Staphylococcus aureus

    The aqueous extract of yerba mate, a South American tea beverage made from Ilex paraguariensis leaves, has demonstrated bactericidal and inhibitory activity against bacterial pathogens, including methicillin-resistant Staphylococcus aureus (MRSA). The gas chromatography-mass spectrometry (GC-MS) analysis of two unique fractions of yerba mate aqueous extract revealed 8 identifiable small molecules in those fractions with antimicrobial activity. For a more comprehensive analysis, a data analysis pipeline was assembled to prioritize compounds for antimicrobial testing against both MRSA and methicillin-sensitive S. aureus using forty-two unique fractions of the tea extract that were generated in duplicate, assayed for activity, and analyzed with GC-MS. As validation of our automated analysis, we checked our predicted active compounds for activity in literature references and used authentic standards to test for antimicrobial activity. 3,4-dihydroxybenzaldehyde showed the most antibacterial activity against MRSA at low concentrations in our bioassays. In addition, quinic acid and quercetin were identified using random forests analysis and 5-hydroxy pipecolic acid was identified using linear discriminant analysis. We also generated a ranked list of unidentified compounds that may contribute to the antimicrobial activity of yerba mate against MRSA. Here we utilized GC-MS data to implement an automated analysis that resulted in a ranked list of compounds that likely contribute to the antimicrobial activity of aqueous yerba mate extract against MRSA.