
2 research outputs found

Data mining in computational proteomics and genomics

Author: Song Yang
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2015

This dissertation addresses data mining in bioinformatics by investigating two important problems: peak detection and structure matching. Peak detection is useful for biological pattern discovery, while structure matching has many applications in clustering and classification.

The first part of this dissertation focuses on elastic peak detection in 2D liquid chromatographic mass spectrometry (LC-MS) data used in proteomics research. These data can be modeled as a time series in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a sliding time window exceeds a user-determined threshold. The elastic peak detection problem is to locate all peaks across multiple window sizes of interest in the dataset. A new method, called PeakID, is proposed in this dissertation; it solves the elastic peak detection problem in 2D LC-MS data without yielding any false negatives. PeakID employs a novel data structure, called a Shifted Aggregation Tree (AggTree for short), to find the different peaks in the dataset. The method works by first constructing an AggTree from the dataset in a bottom-up manner and then searching the AggTree for peaks in a top-down manner. PeakID uses a state-space algorithm to find the topology and structure of an efficient AggTree. Experimental results demonstrate the superiority of the proposed method over other methods on both synthetic and real-world data.

The second part of this dissertation focuses on RNA pseudoknot structure matching and alignment. RNA pseudoknot structures play important roles in many genomic processes. Previous methods for comparative pseudoknot analysis mainly focus on simultaneous folding and alignment of RNA sequences. Little work has been done on aligning two known RNA secondary structures with pseudoknots while taking into account both the sequence and the structure information of the two RNAs. A new method, called RKalign, is proposed in this dissertation for aligning two known RNA secondary structures with pseudoknots. RKalign adopts the partition function methodology to calculate, with a dynamic programming algorithm, the posterior log-odds scores of the alignments between bases or base pairs of the two RNAs. The posterior log-odds scores are then used to calculate the expected accuracy of an alignment between the RNAs. The goal is to find an optimal alignment with the maximum expected accuracy, and RKalign employs a greedy algorithm to achieve this goal. The performance of RKalign is investigated and compared with existing tools for RNA structure alignment. An extension of the proposed method to multiple alignment of pseudoknot structures is also discussed. RKalign is implemented in Java and is freely accessible on the Internet. As more and more pseudoknots are revealed, collected, and stored in public databases, it is anticipated that a tool like RKalign will play a significant role in data comparison, annotation, analysis, and retrieval in these databases.
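
The peak definition in the first part of the abstract (a window whose summed intensity exceeds a threshold, checked across several window sizes) can be illustrated with a brute-force baseline. The sketch below is not PeakID and does not build an AggTree; it only enumerates every window using prefix sums. The function name `elastic_peaks`, the single shared threshold, and the sample signal are assumptions made for illustration.

```python
from itertools import accumulate


def elastic_peaks(intensities, window_sizes, threshold):
    """Brute-force elastic peak detection (illustrative baseline only).

    Returns a list of (start_index, window_size, window_sum) for every
    sliding window whose intensity sum exceeds `threshold`.
    Assumption: one threshold is shared by all window sizes.
    """
    # prefix[i] holds the sum of intensities[0:i], so any window sum
    # is a difference of two prefix values.
    prefix = [0] + list(accumulate(intensities))
    peaks = []
    for size in window_sizes:
        for start in range(len(intensities) - size + 1):
            total = prefix[start + size] - prefix[start]
            if total > threshold:
                peaks.append((start, size, total))
    return peaks


# Hypothetical intensity values sampled at consecutive time points.
signal = [1, 2, 9, 8, 1, 0, 7, 1]
peaks = elastic_peaks(signal, window_sizes=[2, 3], threshold=15)
print(peaks)
```

Because it tests every window, this baseline cannot miss a peak, but its cost grows with the number of time points multiplied by the number of window sizes. Reducing that cost while keeping the no-false-negative guarantee is the role the abstract assigns to the AggTree search.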

Digital Commons @ New Jersey Institute of Technology (NJIT)

Novel features for identifying A-minors in three-dimensional RNA molecules

Authors: Akhila Nagula, Christian Laing, Jason T.L. Wang, Miguel Cervantes-Cervantes, Palak Sheth
Publication venue: Elsevier BV

Crossref