
37 research outputs found

Study on the production and accumulation of betalains in cultured cells of Beta vulgaris

Author: Kalkatawi Samia Jamal
Publication venue: The University of Edinburgh
Publication date: 01/01/1997

Identification of Polyadenylation Sites within Arabidopsis Thaliana

Author: Kalkatawi Manal M.
Publication date: 01/09/2011

Machine Learning (ML) is a field of artificial intelligence focused on the design and implementation of algorithms that enable the creation of models for clustering, classification, prediction, ranking and similar inference tasks based on information contained in data. Many ML algorithms have been successfully utilized in a variety of applications. The problem addressed in this thesis is from the field of bioinformatics and deals with the recognition of polyadenylation (poly(A)) sites in the genomic sequence of the plant Arabidopsis thaliana. During RNA processing, a tail consisting of a number of consecutive adenine (A) nucleotides is added to the terminal nucleotide of the 3’-untranslated region (3’UTR) of the primary RNA. The process in which these A nucleotides are added is called polyadenylation. The location in the genomic DNA sequence that corresponds to the start of the terminal A nucleotides (i.e. to the end of the 3’UTR) is known as a poly(A) site. Recognition of poly(A) sites in DNA sequence is important for better gene annotation and understanding of gene regulation. In this study, we built an artificial neural network (ANN) for the recognition of poly(A) sites in the Arabidopsis thaliana genome. Our study demonstrates that this model achieves improved accuracy compared to the existing predictive models for this purpose. The key factor contributing to the enhanced predictive performance of our ANN model is a distinguishing set of features used in creating the model. These features include a number of relevant physico-chemical characteristics, such as dinucleotide thermodynamic characteristics, electron-ion interaction potential, etc., but also many statistical properties of the DNA sequences from the region surrounding the poly(A) site, such as nucleotide and polynucleotide properties, common motifs, etc.
Our ANN model was compared in performance with several other ML models, as well as with the PAC tool that was specifically developed for poly(A) site recognition in Arabidopsis thaliana and rice. The comparison shows that our model outperforms the other available models and achieves 93% accuracy on average.
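As an illustration of the "nucleotide and polynucleotide properties" mentioned in the abstract, the following is a minimal sketch of one such statistical feature: relative k-mer frequencies in a window around a candidate site. The function names, the window size and the toy sequence are assumptions made for this example; the thesis abstract does not specify how its features were computed.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int = 2) -> Dict[str, float]:
    """Return the relative frequency of every possible k-mer over A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only k-mers made of A/C/G/T are counted; ambiguous bases are ignored.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def window_around(seq: str, site: int, flank: int) -> str:
    """Slice up to `flank` bases on each side of a 0-based candidate site."""
    return seq[max(0, site - flank):site + flank]


# Toy sequence for illustration only, not real Arabidopsis data.
toy = "TTGTAATAAAGTTTCATTGCAAAAAAAA"
features = kmer_frequencies(window_around(toy, site=20, flank=10), k=2)
```

With `k=2` this yields a 16-dimensional vector of dinucleotide frequencies that could be fed to a classifier alongside other features.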

KAUST Digital Archive

INDIGO - INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles.

Author: Alam Intikhab
Antunes André
Ba Alawi Wail
Bajic Vladimir B
Kalkatawi Manal
Kamau Allan Anthony
Stingl Ulrich
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Background: The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results: We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions: We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. 
The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. IA and AAK were supported from the KAUST CBRC Base Fund of VBB. WBa and VBB were supported from the KAUST Base Funds of VBB. US was supported by the KAUST Base Fund of US. This study was partly supported by the Saudi Economic and Development Company (SEDCO) Research Excellence award to US and VBB. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

CiteSeerX

Public Library of Science (PLOS)

Universidade do Minho: RepositoriUM

Crossref

Directory of Open Access Journals

Edge Hill University Research Information Repository

PubMed Central

Untranslated parts of genes interpreted: making heads or tails of high-throughput transcriptomic data via computational methods

Author: Ahmed
Akhtar
Akman
Angelini
Archer
Barrett
Bayerlová
Berkovits
Bicknell
Birol
Bolisetty
Bonfert
Brett
Brockman
Cambon
Cheng
Cui
Curinha
Dassi
David
Deamer
Derti
Dieudonné
Down
Erdman
Erdman
Fu
Garalde
Graber
Granovskaia
Grassi
Gruber
Haas
Harrison
Hashimoto
Havukkala
Hayer
Hoenen
Hollerer
Hoque
Hsin-Sung Yeh
Huber
Jain
Jan
Ji
Johannsen
Kalkatawi
Kanamori-Katayama
Kanitz
Katz
Kim
Laver
Le Pera
Lee
Lee
Legendre
Li
Li
Lianoglou
Loman
Love
Lu
MacDonald
Martin
Mayr
Mercer
Mignone
Mironov
Miura
Modrek
Mortazavi
Müller
Nagalakshmi
Nellore
Ohler
Ozsolak
Park
Pelechano
Pickrell
Plessy
Quick
Rasmussen
Roberts
Robinson
Rojas-Duran
Rot
Routh
Ruan
Salamov
Salisbury
Sandberg
Sharon
Shenker
Shepard
Shiraki
Sigurgeirsson
Smibert
Steijger
Suzuki
Tabaska
Tan
Tian
Trapnell
Tzanis
Valen
Velculescu
Wang
Wang
Wang
Wilkening
Winter
Wu
Wu
Xia
Xie
You
Zawada
Zhang
Zhang
Publication venue: 'Wiley'
Publication date: 20/10/2017

The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by where transcription starts and ends on a genomic locus. The extent of alternative transcription start and alternative poly-adenylation has been revealed by sequencing methods focused on the ends of transcripts, but the application of these methods is not yet widely adopted by the community. In this review we highlight the importance of defining the untranslated parts of transcripts and suggest that computational methods applied to standard high-throughput technologies are a useful alternative to the expertise-demanding 5’ and 3’ sequencing. We present a number of computational approaches for the discovery and quantification of alternative transcription start and poly-adenylation events, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal

Crossref

Birkbeck Institutional Research Online

Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

Author: Arturo Magana-Mora
Manal Kalkatawi
Vladimir B. Bajic
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2017

Background: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. Results: In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. Conclusions: The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/
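Before any classifier can separate true from false PAS, candidate positions have to be located in the sequence. The sketch below shows that preliminary step only, under stated assumptions: it scans for exact motif matches and defaults to the canonical hexamer AATAAA. The abstract does not list the 12 variants, so the variant list is a caller-supplied parameter, and `find_pas_candidates` is a hypothetical helper, not part of Omni-PolyA.

```python
from typing import Iterator, Sequence, Tuple


def find_pas_candidates(
    seq: str, variants: Sequence[str] = ("AATAAA",)
) -> Iterator[Tuple[int, str]]:
    """Yield (0-based position, motif) for every exact motif occurrence.

    `variants` defaults to the canonical hexamer only; pass the full
    variant list of interest explicitly.
    """
    seq = seq.upper()
    for i in range(len(seq)):
        for motif in variants:
            if seq.startswith(motif, i):
                yield i, motif


# Toy sequence for illustration only.
hits = list(find_pas_candidates("CCAATAAAGG"))
```

Each hit is only a candidate: a method such as Omni-PolyA would then decide, from features of the surrounding region, whether it is a functional signal.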

Directory of Open Access Journals

DeepGSR: An optimized deep-learning structure for the recognition of genomic signals and regions

Author: Bajic Vladimir
Jankovic Boris
Kalkatawi Manal
Magana-Mora Arturo

Recognition of different genomic signals and regions (GSRs) in the DNA is helpful in gaining knowledge to understand genome organization and gene regulation as well as gene function. Accurate recognition of GSRs enables better genome and gene annotation. Although many methods have been developed to recognize GSRs, their pure computational identification remains challenging. Moreover, various GSRs usually require a specialized set of features for developing robust recognition models. Recently, deep-learning (DL) methods have been shown to generate more accurate prediction models than the ‘shallow’ methods without the need to develop specialized features for the problems in question. Here, we explore the potential use of DL for the recognition of GSRs. We developed DeepGSR, an optimized DL architecture for the prediction of different types of GSRs. The performance of the DeepGSR structure is evaluated on the recognition of polyadenylation signals (PAS) and translation initiation sites (TIS) of different organisms: human, mouse, bovine and fruit fly. The results show that DeepGSR outperformed the state-of-the-art methods, reducing the classification error rate of the PAS and TIS prediction in the human genome by up to 29% and 86%, respectively. Moreover, the cross-organism and genome-wide analyses we performed confirmed the robustness of DeepGSR and provided new insights into the conservation of examined GSRs across species.

README: DeepGSR: An optimized deep-learning structure for the recognition of genomic signals and regions.
Version 1.1, 13/Dec/2017

WHAT IS IT?
-----------
DeepGSR is a deep-learning model that can be used for the recognition of genomic signals and regions in eukaryotic DNA. It has been applied to polyadenylation signals (PAS) and translation initiation sites (TIS). It uses FASTA-format DNA sequences as input, and the data can be processed using the provided code.

COMMAND LINE VERSION
--------------------
Here we include the source code of DeepGSR, written in Python using the Keras library with the Theano backend.

INSTALLATION
------------
DeepGSR is able to run on any Linux platform. To run DeepGSR, install scikit-learn (http://scikit-learn.org/), Keras (https://keras.io/) and, if you want faster processing using GPUs, CUDA.
The data that were used in the paper are found in the (Data) folder.
There are two types of DeepGSR usage, either testing using pre-trained models or training new models; each of these types is found in a separate folder.
Open a new terminal, then go to the directory that contains the Python code. For example:
    cd Testing/
or
    cd Training/DeepGSR-2DCNN
Running DeepGSR, command line options:
    python CNN_Testing.py -h
or
    python 2DCNN.py -h

EXAMPLE:
--------
Note: all required data is included in this package.
Train DeepGSR on human genome for PAS recognition:
    python 2DCNN.py --inputFile ../../Data/Human/PAS_processed/hs_mixAATAAA_polyA.txt --DataName human_AATAAA --FileName human_AATAAA
Train DeepGSR on human genome for TIS recognition:
    python 2DCNN.py --inputFile ../../Data/Human/TIS_processed/hs_mixATG_TIS.txt --DataName human_ATG --FileName human_ATG
Test DeepGSR on mouse genome using human trained model for PAS recognition:
    python CNN_Testing.py --inputFile ../../Data/Mouse/PAS_processed/mm_mixAATAAA_polyA.txt --inputModel ../human_AATAAA_Model.h5 --DataName mouse_AATAAA --FileName mouse_human_AATAAA
Test DeepGSR on mouse genome using human trained model for TIS recognition:
    python CNN_Testing.py --inputFile ../../Data/Mouse/TIS_processed/mm_mixATG_TIS.txt --inputModel ../human_ATG_Model.h5 --DataName mouse_ATG --FileName mouse_human_ATG

CONTACTS
--------
If you want to report bugs or have general queries email to: [email protected]
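The README states that DeepGSR takes FASTA-format DNA sequences as input. As a rough illustration of what preparing such input for a convolutional model can look like, the sketch below reads FASTA records and one-hot encodes each base. This is an assumption-laden example, not DeepGSR's actual preprocessing: `read_fasta`, `one_hot` and the A/C/G/T channel order are hypothetical choices made here, and the package's own scripts should be used for real runs.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed channel order A, C, G, T; any other symbol maps to all zeros.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as a len(seq) x 4 matrix of 0/1 values."""
    return [list(ONE_HOT.get(base, [0, 0, 0, 0])) for base in seq.upper()]


# Toy records for illustration only.
records = list(read_fasta([">s1", "ACG", "T", ">s2", "NN"]))
encoded = [one_hot(seq) for _, seq in records]
```

To read from disk, pass an open file handle to `read_fasta`, since a file object iterates over its lines.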

ZENODO

FigShare

Additional file 2: of BEACON: automated tool for Bacterial GEnome Annotation ComparisON

Author: Intikhab Alam (40779)
Manal Kalkatawi (494208)
Vladimir Bajic (3479747)

Contains the source code of BEACON in C++ for command line use along with makefile, a ReadMe file and the license text. (TGZ 42 kb)

FigShare

Additional file 3: Table S3. of Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

Author: Arturo Magana-Mora (4358089)
Manal Kalkatawi (494208)
Vladimir Bajic (3479747)

False positive and false negative rates comparison between DPS, DNN, and Omni-PolyA derived by using different feature sets. (PDF 105 kb)

FigShare

Additional file 4: Figure S1. of Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

Author: Arturo Magana-Mora (4358089)
Manal Kalkatawi (494208)
Vladimir Bajic (3479747)

Nucleotide distribution for PAS variants in the PAS-weak category. These plots show the frequency of nucleotides for true PAS sequences in the 10 variants from the PAS-weak category. (PDF 1696 kb)

FigShare

Additional file 1: Table S1. of Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

Author: Arturo Magana-Mora (4358089)
Manal Kalkatawi (494208)
Vladimir Bajic (3479747)

Omni-PolyA feature set. List of the 218 numerical features. (PDF 104 kb)

FigShare