BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics that does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms that generate mathematical representations directly from DNA/RNA or protein sequences and then output numerical (scalar or vector) and visual (graph) characteristics. The binary-encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to obtain an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
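The last step described above (vector data, then a distance matrix, then a UPGMA tree) can be illustrated with a minimal sketch. It is not the BOOL-AN software: the input vectors are hypothetical stand-ins for ICF-derived descriptors, Euclidean distance is an assumed choice among the "different distance matrices" the abstract mentions, and the function names are invented for illustration.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(names, vectors):
    """Build a nested-tuple tree by UPGMA (size-weighted average linkage).

    names   -- list of taxon labels
    vectors -- list of numeric vectors, one per taxon (hypothetical descriptors)
    """
    n = len(names)
    # cluster id -> (subtree, number of leaves in the subtree)
    clusters = {i: (names[i], 1) for i in range(n)}
    # pairwise distances, keyed by (smaller id, larger id)
    dist = {(i, j): euclidean(vectors[i], vectors[j])
            for i, j in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # distance from the merged cluster to every remaining cluster
        merged = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        # drop every distance that involved the two merged clusters
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        dist.update(merged)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Hypothetical descriptor vectors for three taxa.
tree = upgma(["A", "B", "C"], [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
print(tree)
```

Taxa A and B are closest, so they are joined first and C attaches to that pair. Branch lengths are omitted to keep the sketch short.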
Optimizing Splicing Junction Detection in Next Generation Sequencing Data on a Virtual-GRID Infrastructure
RNA-seq, the new protocol for sequencing the messenger RNA in a cell, produces millions of short sequence fragments. Next Generation Sequencing technology allows more accurate analysis but increases the need for computational resources. This paper describes the optimization of an RNA-seq analysis pipeline devoted to detecting splicing variants, aimed at reducing computation time and providing a multi-user/multi-sample environment. This work makes two main contributions. First, we optimized a well-known algorithm called TopHat by parallelizing some sequential mapping steps. Second, we designed and implemented a hybrid virtual GRID infrastructure that efficiently executes multiple instances of TopHat running on different samples or on behalf of different users, thus optimizing the overall execution time and enabling a flexible multi-user environment.
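The idea of running several independent per-sample jobs at once can be sketched with a worker pool. This is only an illustration of the scheduling pattern, not the authors' infrastructure: `map_sample` is a hypothetical placeholder that does not invoke TopHat, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def map_sample(sample_name):
    """Hypothetical stand-in for one per-sample mapping job.

    A real job would launch the mapper for this sample; here we only
    return a label so the scheduling pattern can be shown.
    """
    return f"{sample_name}.mapped"


def run_samples(sample_names, max_workers=4):
    """Run one job per sample concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(map_sample, sample_names))


print(run_samples(["sample1", "sample2", "sample3"]))
```

Because the samples are independent of one another, the jobs need no coordination beyond collecting their results.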
The future of laboratory medicine - A 2014 perspective.
Predicting the future is a difficult task. Not surprisingly, there are many examples of predictions and assumptions that have proved to be wrong. This review surveys the many predictions, beginning in 1887, about the future of laboratory medicine and its sub-specialties such as clinical chemistry and molecular pathology. It provides a commentary on the accuracy of the predictions and offers opinions on emerging technologies, economic factors and social developments that may play a role in shaping the future of laboratory medicine.
RACS: Rapid Analysis of ChIP-Seq data for contig based genomes
Background: Chromatin immunoprecipitation coupled to next generation
sequencing (ChIP-Seq) is a widely used technique to investigate the function of
chromatin-related proteins in a genome-wide manner. ChIP-Seq generates large
quantities of data which can be difficult to process and analyse, particularly
for organisms with contig-based genomes. Contig-based genomes often have poor
annotations for cis-elements, for example enhancers, that are important for
gene expression. Poorly annotated genomes make a comprehensive analysis of
ChIP-Seq data difficult and as such standardized analysis pipelines are
lacking. Methods: We report a computational pipeline that utilizes traditional
High-Performance Computing techniques and open source tools for processing and
analysing data obtained from ChIP-Seq. We applied our computational pipeline
"Rapid Analysis of ChIP-Seq data" (RACS) to ChIP-Seq data that was generated in
the model organism Tetrahymena thermophila, an example of an organism with a
genome that is available in contigs. Results: To test the performance and
efficiency of RACS, we performed control ChIP-Seq experiments allowing us to
rapidly eliminate false positives when analyzing our previously published data
set. Our pipeline segregates the found read accumulations between genic and
intergenic regions and is highly efficient for rapid downstream analyses.
Conclusions: Altogether, the computational pipeline presented in this report is
an efficient and highly reliable tool to analyze genome-wide ChIP-Seq data
generated in model organisms with contig-based genomes.
RACS is an open source computational pipeline available to download from:
https://bitbucket.org/mjponce/racs --or--
https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS
Comment: Submitted to BMC Bioinformatics. Computational pipeline available at
https://bitbucket.org/mjponce/rac
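The step of separating read accumulations into genic and intergenic regions can be sketched as an interval-overlap check per contig. This is a minimal illustration under stated assumptions, not the RACS implementation: the `(contig, start, end)` tuple layout, the half-open coordinates, and the function name `classify_regions` are all hypothetical.

```python
def classify_regions(regions, genes):
    """Split read-accumulation regions into genic and intergenic lists.

    regions, genes -- lists of (contig, start, end) tuples with half-open
    coordinates [start, end). A region counts as genic if it overlaps any
    annotated gene on the same contig.
    """
    genes_by_contig = {}
    for contig, start, end in genes:
        genes_by_contig.setdefault(contig, []).append((start, end))

    genic, intergenic = [], []
    for contig, start, end in regions:
        overlaps = any(start < g_end and g_start < end
                       for g_start, g_end in genes_by_contig.get(contig, []))
        (genic if overlaps else intergenic).append((contig, start, end))
    return genic, intergenic


# Hypothetical annotation and regions on two contigs.
genes = [("contig1", 100, 200)]
regions = [("contig1", 150, 160), ("contig1", 300, 320), ("contig2", 150, 160)]
genic, intergenic = classify_regions(regions, genes)
print(genic)
print(intergenic)
```

Grouping genes by contig first keeps each lookup limited to the contig the region lies on, which matters when a genome is split across many contigs.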
Computing vs. Genetics
This chapter first presents the interrelations between computing and genetics, both of which are based on information and, particularly, self-reproducing artificial systems. It goes on to examine the genetic code from a computational viewpoint. This raises a number of important questions about the genetic code. These questions are stated in the form of an as yet unpublished working hypothesis. This hypothesis suggests that many genetic alterations are caused by the last base of certain codons. If this hypothesis were to be confirmed through experimentation, it would be a significant advance for treating many genetic diseases.