A data science approach to pattern discovery in complex structures with applications in bioinformatics

Author: Hua Lei
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2016

Pattern discovery aims to find interesting, non-trivial, implicit, previously unknown and potentially useful patterns in data. This dissertation presents a data science approach for discovering patterns or motifs in complex structures, particularly complex RNA structures. RNA secondary and tertiary structure motifs are important elements of RNA molecules, which play multiple vital roles in cells. Much work has been done on RNA motif annotation; however, pattern discovery in RNA structure is less studied. In the first part of this dissertation, an ab initio algorithm named DiscoverR is introduced for pattern discovery in RNA secondary structures. The algorithm represents RNA secondary structures as ordered labeled trees and performs tree pattern discovery using a quadratic-time dynamic programming algorithm. It can identify and extract the largest common substructures from two RNA molecules of different sizes without prior knowledge of the locations and topologies of these substructures. One application of DiscoverR is locating RNA structural elements in genomes. Experimental results show that this tool complements the currently used approaches for mining conserved structural RNAs in the human genome. DiscoverR can also be extended to find repeated regions within a single RNA secondary structure. Specifically, this extended method is used to detect structural repeats in the 3′-untranslated region of a protein kinase gene.

Digital Commons @ New Jersey Institute of Technology (NJIT)
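
DiscoverR represents RNA secondary structures as ordered labeled trees. As a minimal sketch of that representation step only, assuming dot-bracket input without pseudoknots and a hypothetical labeling in which `P` marks a base pair and `U` an unpaired base, the Python below parses a structure into such a tree. It does not implement DiscoverR's quadratic-time pattern discovery; `Node`, `dot_bracket_to_tree` and `to_string` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an ordered labeled tree (hypothetical labeling scheme)."""
    label: str                        # "root", "P" (base pair) or "U" (unpaired base)
    span: tuple[int, int] = (-1, -1)  # 0-based sequence positions covered by this node
    children: list[Node] = field(default_factory=list)


def dot_bracket_to_tree(structure: str) -> Node:
    """Parse a pseudoknot-free dot-bracket string into an ordered labeled tree."""
    root = Node("root", (0, len(structure) - 1))
    stack = [root]  # stack[-1] is the node currently receiving children, in order
    for i, ch in enumerate(structure):
        if ch == "(":
            pair = Node("P", (i, -1))  # closing position is filled in at the matching ')'
            stack[-1].children.append(pair)
            stack.append(pair)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' at position {i}")
            pair = stack.pop()
            pair.span = (pair.span[0], i)
        elif ch == ".":
            stack[-1].children.append(Node("U", (i, i)))
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if len(stack) != 1:
        raise ValueError("unmatched '(' in structure")
    return root


def to_string(node: Node) -> str:
    """Serialize the tree in bracket notation, e.g. root(P(U U))."""
    if not node.children:
        return node.label
    return f"{node.label}({' '.join(to_string(c) for c in node.children)})"


tree = dot_bracket_to_tree("((...)).")
print(to_string(tree))        # root(P(P(U U U)) U)
print(tree.children[0].span)  # (0, 6)
```

Because sibling order is preserved, the 5′-to-3′ arrangement of helices and loops is kept in the tree, which is what makes ordered-tree comparison applicable.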

Transcriptional Basis for Differential Thermosensitivity of Seedlings of Various Tomato Genotypes

Author: Fragkostefanakis Sotirios
Hu Yangjie
Schleiff Enrico
Simm Stefan
Publication venue: MDPI AG
Publication date: 16/06/2020

Publication Server of Greifswald University

Similarity-based approaches to molecular function discovery

Author: Hoksza David
Publication date: 18/05/2021

CU Digital Repository

MultiSETTER: web server for multiple RNA structure comparison

Author: AT Willingham
BS Schuwirth
C Kemena
C Neubauer
CW Wang
D Hoksza
Daniel Svozil
David Hoksza
DG Higgins
DH Mathews
DJ Klein
DK Hendrix
E Capriotti
EP Nawrocki
F Ferre
G He
H Berman
HM Berman
I Tinoco Jr
J Harms
JD Westbrook
MA Huynen
MG Seetin
MN Nguyen
N Saitou
O Dror
P Cech
P Hogeweg
Petr Čech
PW Rose
R Lorenz
RR Rahrig
S Gutmann
S Kirillova
SR Holbrook
TM Schmeing
WN Moss
XJ Lu
YC Liu
YF Chang
Publication venue: Springer Science and Business Media LLC

Crossref

SupeRNAlign: a new tool for flexible superposition of homologous RNA structures and inference of accurate structure-based sequence alignments

Author: Bujnicki JM
Dawson WK
Jabłońska J
Jankowska E
Matelska D
Niedziałek D
Piatkowski P
Walen T
Zyta A
Publication venue: Oxford University Press
Publication date: 20/07/2017

RNA has been found to play an ever-increasing role in a variety of biological processes. The function of most non-coding RNA molecules depends on their structure. Comparing and classifying macromolecular 3D structures is crucial for structure-based function inference; it is used to characterize functional motifs and to predict structure by comparative modeling. However, compared to the numerous methods for protein structure superposition, there are few tools dedicated to superposing RNA 3D structures. Here, we present SupeRNAlign (v1.3.1), a new method for flexible superposition of RNA 3D structures, and SupeRNAlign-Coffee, a workflow that combines SupeRNAlign with T-Coffee to infer structure-based sequence alignments. The methods were benchmarked against eight other methods for RNA structural superposition and alignment. The benchmark included 151 structures from 32 RNA families (1734 pairwise superpositions in total). The accuracy of superpositions was assessed by comparing structure-based sequence alignments to reference alignments from the Rfam database. SupeRNAlign and SupeRNAlign-Coffee achieved significantly higher scores than most of the benchmarked methods: SupeRNAlign generated the most accurate sequence alignments among the structure superposition methods, and SupeRNAlign-Coffee performed best among the sequence alignment methods.

UCL Discovery
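
The SupeRNAlign benchmark scores superpositions by comparing the sequence alignments they induce with Rfam reference alignments. The exact score used by the authors is not stated here, so, as an assumption, the sketch below uses a common pair-based measure: the fraction of residue pairs aligned in the reference that a test alignment also aligns. The function names `aligned_pairs` and `pair_accuracy` are hypothetical.

```python
from __future__ import annotations


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> set[tuple[int, int]]:
    """Residue-index pairs (i, j) that share a column in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: set[tuple[int, int]] = set()
    i = j = 0  # 0-based residue counters for the two ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def pair_accuracy(test: tuple[str, str], reference: tuple[str, str]) -> float:
    """Fraction of reference-aligned residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(*reference)
    if not ref:
        raise ValueError("reference alignment contains no aligned residue pairs")
    return len(aligned_pairs(*test) & ref) / len(ref)


reference = ("ACG-U", "A-GCU")  # hypothetical reference alignment
test = ("ACGU", "AGCU")         # hypothetical alignment induced by a superposition
print(pair_accuracy(test, reference))  # 0.6666666666666666
```

Working on residue indices rather than on column positions makes the score independent of how gaps are placed, so alignments of different lengths can be compared.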

TPS Genes Silencing Alters Constitutive Indirect and Direct Defense in Tomato

Author: Antonio Pietro Garonna
Bossi Simone
Emilio Guerrieri
Giandomenico Corrado
Maffei Massimo Emilio
Mariangela Coppola
Pasquale Cascone
Rosa Rao
Publication venue: MDPI AG
Publication date: 01/01/2018

Institutional Research Information System University of Turin

Computational Methods for Comparative Non-coding RNA Analysis: from Secondary Structures to Tertiary Structures

Author: Ge Ping
Publication venue: University of Central Florida
Publication date: 01/01/2016

Unlike messenger RNAs (mRNAs), whose information is encoded in their primary sequences, non-coding RNAs (ncRNAs) derive their cellular roles from their structures. Studying structural conservation in ncRNAs is therefore important for an in-depth understanding of their functions. In recent years, many comparative computational methods have been proposed to analyze common structural patterns in ncRNAs. However, RNA structural comparison is not a trivial task, and existing approaches still have numerous issues with efficiency and accuracy. In this dissertation, we introduce a suite of novel computational tools that extend the classic models for ncRNA secondary and tertiary structure comparison. For RNA secondary structure analysis, we first developed a computational tool named PhyloRNAalifold, which integrates phylogenetic information into consensus structure folding. The underlying idea of this algorithm is that the importance of a co-varying mutation should be determined by its position on the phylogenetic tree. By assigning high scores to the critical covariations, RNA secondary structure can be predicted more accurately. Beyond structure prediction, we also developed a computational tool named ProbeAlign to improve the efficiency of genome-wide ncRNA screening using high-throughput RNA structural probing data. It treats the chemical reactivities embedded in the probing information as pairing attributes of the search targets, which avoids the time-consuming base-pair matching in secondary structure alignment. Applying ProbeAlign to the FragSeq datasets demonstrates its capability for genome-wide ncRNA analysis. For RNA tertiary structure analysis, we first developed a computational tool named STAR3D to find global conservation in RNA 3D structures. STAR3D aims to find the consensus of stacks by using 2D topology and 3D geometry together.
The loop regions can then be ordered and aligned according to their relative positions in the consensus. This stack-guided alignment method brings a divide-and-conquer strategy to RNA 3D structural alignment, which improves its efficiency dramatically. Furthermore, we clustered all loop regions in non-redundant RNA 3D structures to detect plausible RNA structural motifs de novo. The computational pipeline, named RNAMSC, was extended to handle large-scale PDB datasets, and thorough downstream analysis was performed to ensure that the clustering results are valid and easy to apply in further research. The final results contain many interesting variations of known motifs, such as the GNAA tetraloop, kink-turn, sarcin-ricin and T-loop motifs. We also discovered novel functional motifs that are conserved in a wide range of ncRNAs, including ribosomal RNA, sgRNA, SRP RNA, the GlmS riboswitch and the twister ribozyme.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)
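
PhyloRNAalifold weights co-varying mutations by their position on the phylogenetic tree. The sketch below only illustrates the underlying notion of covariation and is not PhyloRNAalifold's scoring function. Assuming an alignment given as equal-length strings, it counts pairs of sequences in which both columns of a candidate base pair (i, j) have changed while each sequence still forms a canonical (Watson-Crick or G-U) pair; gapped positions never count as canonical. The optional `weight` callable is a hypothetical stand-in for a phylogeny-derived weight.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence

# Watson-Crick and G-U wobble pairs
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def _unit_weight(a: int, b: int) -> float:
    """Default weight: every sequence pair contributes equally."""
    return 1.0


def covariation_score(
    alignment: Sequence[str],
    i: int,
    j: int,
    weight: Optional[Callable[[int, int], float]] = None,
) -> float:
    """Sum of weights over sequence pairs showing a compensatory double change at (i, j)."""
    if weight is None:
        weight = _unit_weight
    score = 0.0
    for a, b in combinations(range(len(alignment)), 2):
        pair_a = (alignment[a][i], alignment[a][j])
        pair_b = (alignment[b][i], alignment[b][j])
        both_canonical = pair_a in CANONICAL and pair_b in CANONICAL
        both_changed = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
        if both_canonical and both_changed:
            score += weight(a, b)
    return score


alignment = ["GAAAC", "CAAAG", "GAAAU", "AAAAU"]  # hypothetical toy alignment
print(covariation_score(alignment, 0, 4))  # 4.0
# Hypothetical weighting that doubles every comparison involving sequence 3
print(covariation_score(alignment, 0, 4, lambda a, b: 2.0 if 3 in (a, b) else 1.0))  # 6.0
```

A double change that preserves pairing (for example G-C to C-G) is strong evidence that the two columns pair; a phylogeny-aware weight can down-weight changes that are shared simply because sequences are closely related.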

Data mining in computational proteomics and genomics

Author: Song Yang
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2015

This dissertation addresses data mining in bioinformatics by investigating two important problems: peak detection and structure matching. Peak detection is useful for biological pattern discovery, while structure matching has many applications in clustering and classification. The first part of this dissertation focuses on elastic peak detection in 2D liquid chromatography-mass spectrometry (LC-MS) data used in proteomics research. These data can be modeled as a time series in which the x-axis represents time points and the y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a sliding time window exceeds a user-determined threshold. The elastic peak detection problem is to locate all peaks across multiple window sizes of interest in the dataset. A new method called PeakID is proposed in this dissertation; it solves the elastic peak detection problem in 2D LC-MS data without yielding any false negatives. PeakID employs a novel data structure, called a Shifted Aggregation Tree (AggTree for short), to find the different peaks in the dataset. The method first constructs an AggTree bottom-up from the dataset and then searches the AggTree top-down for peaks. PeakID uses a state-space algorithm to find the topology and structure of an efficient AggTree. Experimental results demonstrate the superiority of the proposed method over other methods on both synthetic and real-world data. The second part of this dissertation focuses on RNA pseudoknot structure matching and alignment. RNA pseudoknot structures play important roles in many genomic processes. Previous methods for comparative pseudoknot analysis mainly focus on simultaneous folding and alignment of RNA sequences. Little work has been done to align two known RNA secondary structures with pseudoknots while taking into account both the sequence and structure information of the two RNAs.
A new method called RKalign is proposed in this dissertation for aligning two known RNA secondary structures with pseudoknots. RKalign adopts the partition function methodology to calculate, with a dynamic programming algorithm, the posterior log-odds scores of the alignments between bases or base pairs of the two RNAs. These posterior log-odds scores are then used to calculate the expected accuracy of an alignment between the RNAs. The goal is to find an optimal alignment with the maximum expected accuracy, and RKalign employs a greedy algorithm to achieve this goal. The performance of RKalign is investigated and compared with existing tools for RNA structure alignment. An extension of the proposed method to multiple alignment of pseudoknot structures is also discussed. RKalign is implemented in Java and is freely accessible on the Internet. As more pseudoknots are revealed, collected and stored in public databases, a tool like RKalign is anticipated to play a significant role in data comparison, annotation, analysis, and retrieval in these databases.

Digital Commons @ New Jersey Institute of Technology (NJIT)
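
PeakID's AggTree is designed to avoid examining every window. For contrast, a minimal exhaustive baseline for the elastic peak detection problem as defined above can be written with prefix sums: every window of every size of interest is checked, so no peak is missed, at a cost of O(n·|W|) window sums for n time points and |W| window sizes. This is not the AggTree method. Allowing a separate threshold per window size is an assumption of this sketch; the single user-determined threshold in the text corresponds to giving every size the same value. The name `elastic_peaks` is hypothetical.

```python
from itertools import accumulate


def elastic_peaks(intensities, thresholds):
    """Exhaustively report every (start, window_size, window_sum) whose sum exceeds its threshold.

    intensities: intensity values ordered by time point.
    thresholds:  dict mapping window size (in time points) to its threshold.
    """
    prefix = [0.0, *accumulate(intensities)]  # prefix[k] = sum of the first k values
    n = len(intensities)
    peaks = []
    for w, t in sorted(thresholds.items()):
        if w < 1:
            raise ValueError("window sizes must be positive")
        for start in range(n - w + 1):
            total = prefix[start + w] - prefix[start]  # O(1) window sum
            if total > t:  # "exceeds" is taken as strictly greater
                peaks.append((start, w, total))
    return peaks


signal = [1, 5, 2, 0, 7, 1]  # hypothetical intensity trace
print(elastic_peaks(signal, {1: 6, 2: 7, 3: 8}))  # [(4, 1, 7), (4, 2, 8), (2, 3, 9)]
```

Each window sum costs one subtraction, so the baseline is simple and exact; its cost grows with the number of window sizes, which is the overhead a structure such as the AggTree aims to reduce.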

Dissection of Soil Waterlogging Tolerance in Soft Red Winter Wheat using Genomic Approaches

Author: Acuna-Galindo Marlovi Andrea
Publication venue: ScholarWorks@UARK
Publication date: 01/08/2018

Genomic methods, including genome-wide association analysis (GWAS), genomic selection (GS) and RNA-seq, allow faster selection of superior breeding lines and the identification and resolution of candidate genes. A panel of 240 soft red winter wheat (Triticum aestivum L.) cultivars and breeding lines was subjected to soil waterlogging stress over two seasons at Stuttgart, AR and St. Joseph, LA, US. Total concentrations of P, K, Ca, Mg, Mn, Fe, Al, B, Cu, Na, S and Zn were determined in wheat shoots after waterlogging using inductively coupled plasma spectroscopy. The yield components kernel number per spike (KNPS), kernel weight per spike (KWS) and thousand kernel weight (TKW) were measured at plant maturity. Negative correlations of TKW and KWS with aluminum and iron concentrations indicated the impact of elemental toxicity on grain production. A ten-fold cross-validation (CV) analysis with a ridge regression BLUP (RR-BLUP) model found GS prediction accuracies ($r_{gs}$) for micro- and macronutrient concentrations ranging from $r_{gs}$ = 0.06 to 0.52, improving as more site-years were included in the analysis. The ratio of genomic to phenotypic prediction accuracy ($r_{gs}/H^{1/2}$) was greater than 0.50 for eight of the twelve elements, indicating the potential of GS to select for shoot micro- and macronutrient concentrations in the absence of phenotypic data. GWAS identified forty-seven highly significant (p < 0.00001), twenty-three very significant and consistent (p < 0.0005) and eight significant and consistent (p < 0.001) marker-trait associations (MTAs) for the twelve micro- and macronutrients measured. Lastly, RNA-seq was used for transcriptome and gene expression analysis under waterlogged and non-waterlogged conditions in the wheat cultivars ‘Pioneer Brand 26R61’ and ‘AGS 2000’. Around 300 million paired-end reads were generated, covering approximately 16 Gb of the wheat transcriptome.
In total, 64,911 (AGS 2000) and 60,414 (26R61) were obtained, and 58,753 expressed genes were observed across both cultivars and treatments. Overall, the results of this study have enabled, and will continue to enable, genomics-assisted breeding for waterlogging tolerance within the University of Arkansas Wheat Breeding Program by allowing selection of materials with reduced micro- and macronutrient concentrations in new breeding lines in the absence of phenotypic data.

ScholarWorks@UARK

UARK (University of Arkansas)
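
The accuracy measures in the waterlogging study can be made concrete. As a minimal sketch, assuming that $r_{gs}$ is the Pearson correlation between genomic predictions and observed values in held-out cross-validation folds, and that $H$ is a heritability between 0 and 1, the Python below splits lines into ten random folds and computes $r_{gs}/H^{1/2}$. It does not fit the RR-BLUP model itself; the predictions, the heritability value and the function names are hypothetical.

```python
from __future__ import annotations

import math
import random
from statistics import mean


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[f::k] for f in range(k)]


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def relative_accuracy(predicted, observed, heritability: float) -> float:
    """Genomic prediction accuracy scaled by the square root of heritability."""
    if not 0 < heritability <= 1:
        raise ValueError("heritability must be in (0, 1]")
    return pearson_r(predicted, observed) / math.sqrt(heritability)


folds = kfold_indices(240)  # 240 lines, matching the panel size described above
print([len(f) for f in folds])  # [24, 24, 24, 24, 24, 24, 24, 24, 24, 24]
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

In a ten-fold scheme each fold serves once as the test set while a model is trained on the other nine; dividing by $H^{1/2}$ expresses genomic accuracy relative to the accuracy expected from phenotypic selection.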

Linked genetic variants on chromosome 10 control ear morphology and body mass among dog breeds

Author: A McKenna
A Vaysse
AH Freedman
AR Boyko
B Hare
BM Holdt von
BM Waller
C Vilà
CS Ng
D Pillas
D-H Lim
DF Gudbjartsson
DR Bergsma
E Axelsson
EF Schoenmakers
EK Karlsson
Erik Axelsson
F Cruz
F Geller
G Fatemifar
G Rosengren Pielberg
G Wang
Gerli Pielberg
GM Cooper
H Lee
H Li
H Thorvaldsdóttir
H Weissbach
HR Ashar
HR Taal
J-F Pang
JL Stein
K Lindblad-Toh
Kerstin Lindblad-Toh
KW Vance
LN Trut
M Rimbault
Marc P. Hoeppner
Matthew T. Webster
MB Gerstein
MG Grabherr
Michele Perloski
MN Weedon
MP Hoeppner
N Zamani
NB Sutter
Nona Kamgari
P Flicek
P Jones
P Savolainen
R Reeves
S Björnerfeldt
S Purcell
T Hung
Y Kim
ZM Ahmed
Åke Hedhammar
Publication venue: Springer Science and Business Media LLC

Crossref