Public Library of Science (PLOS)

Nature Precedings

S-MART, A Software Toolbox to Aid RNA-seq Data Analysis

Author: Arkady B. Khodursky
Hadi Quesneville
Matthias Zytnicki
Publication venue: Public Library of Science
Publication date: 01/01/2011

High-throughput sequencing is now routinely performed in many experiments, but analyzing the millions of sequences generated is often beyond the expertise of wet labs that have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data onto a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool that performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and can thus be used by the entire biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour for most queries, even for gigabytes of data. S-MART can perform the entire analysis of the mapped reads without the need for additional ad hoc scripts. With this tool, biologists can easily perform most analyses of their RNA-Seq data on their own computer, from the mapped data to the discovery of important loci.

HAL Descartes

ProdInra

Spatial patterns of transcriptional activity in the chromosome of Escherichia coli

Author: Ahn Jaeyong
Jeong Kyeong Soo
Khodursky Arkady B
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Although genes on the chromosome are organized in a fixed order, spatial correlations in transcription have not been systematically evaluated. We used a combination of genomic and signal processing techniques to investigate the properties of transcription in the genome of Escherichia coli K12 as a function of the position of genes on the chromosome. RESULTS: Spectral analysis of transcriptional series revealed the existence of statistically significant patterns in the spatial series of transcriptional activity. These patterns could be classified into three categories: short-range, of up to 16 kilobases (kb); medium-range, over 100-125 kb; and long-range, over 600-800 kb. We show that the significant similarities in gene activities extend beyond the length of an operon and that local patterns of coexpression depend on DNA supercoiling. Unlike short-range patterns, the formation of medium- and long-range transcriptional patterns does not strictly depend on the level of DNA supercoiling. The long-range patterns appear to correlate with the distribution of DNA gyrase on the bacterial chromosome. CONCLUSIONS: Localization of structural components in the transcriptional signal revealed an asymmetry in the distribution of transcriptional patterns along the bacterial chromosome. The demonstration that spatial patterns of transcription can be modulated pharmacologically and genetically, along with the identification of molecular correlates of transcriptional patterns, offers, for the first time, strong evidence of physiologically determined higher-order organization of transcription in the bacterial chromosome.
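As a rough illustration of what spectral analysis of a transcriptional series involves, the following Python sketch computes a plain periodogram of expression values ordered by chromosomal position and reports the period with the most power. It assumes evenly spaced genes (positions are indices, not kilobases) and omits the significance testing the abstract mentions; the function names `periodogram` and `dominant_period` are hypothetical, and this is not the authors' pipeline.

```python
import cmath
import math


def periodogram(series):
    """Return (period, power) pairs for Fourier frequencies k = 1 .. n // 2.

    Periods are in units of gene positions (assumes evenly spaced genes).
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # drop the constant (zero-frequency) term
    result = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier coefficient at k cycles per series length
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        result.append((n / k, abs(coeff) ** 2 / n))
    return result


def dominant_period(series):
    """Period (in gene positions) with the largest periodogram power."""
    return max(periodogram(series), key=lambda pair: pair[1])[0]


# Synthetic series: 64 genes in chromosomal order with a 16-gene periodic pattern
signal = [math.sin(2 * math.pi * t / 16) for t in range(64)]
print(dominant_period(signal))  # 16.0
```

On real data, a peak in the periodogram would still need to be tested against a null model (for example, shuffled gene order) before being called significant.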

Reconstruction of Escherichia coli transcriptional regulatory networks via regulon-based associations

Author: Kaveh Mostafa
Khodursky Arkady
Sangurdekar Dipen
Srivastava Poonam
Zare Hossein
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Network reconstruction methods that rely on the covariance of expression of transcription regulators and their targets ignore the fact that the transcription of regulators and that of their targets can be controlled differently and/or independently. Such an oversight would result in many erroneous predictions. However, accurate prediction of gene regulatory interactions can be made possible by modeling and estimating the transcriptional activity of groups of co-regulated genes. RESULTS: Incomplete regulatory connectivity and expression data are used here to construct a consensus network of transcriptional regulation in Escherichia coli (E. coli). The network is updated via a covariance model describing the activity of gene sets controlled by common regulators. The proposed model-selection algorithm was used to annotate the likeliest regulatory interactions in E. coli on the basis of two independent sets of expression data, each containing many microarray experiments under a variety of conditions. The key regulatory predictions were verified by experiment and by a literature survey. In addition, the estimated activity profiles of transcription factors were used to describe their responses to environmental and genetic perturbations as well as to drug treatments. CONCLUSION: Information about the transcriptional activity of documented co-regulated genes (a core regulon) should be sufficient for discovering new target genes whose transcriptional activities significantly co-vary with the activity of the core regulon members. Our ability to derive a highly significant consensus network by applying the regulon-based approach to two very different data sets demonstrates the efficiency of this strategy. We believe that this approach can be used to reconstruct the gene regulatory networks of other organisms for which partial sets of known interactions are available.
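The regulon-based idea, estimating a regulator's activity from its documented targets and then looking for genes that co-vary with it, can be sketched in a few lines of Python. This is a deliberately simplified stand-in: activity is taken as the mean of z-scored core-regulon profiles and candidates are ranked by Pearson correlation, rather than through the covariance model and model-selection algorithm the abstract describes. All function names, gene names and data values are hypothetical.

```python
import math


def zscore(profile):
    """Standardize a profile to mean 0 and sample standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / (n - 1))
    return [(x - mean) / sd for x in profile]


def pearson(a, b):
    """Pearson correlation computed from z-scores (sample standard deviation)."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)


def regulon_activity(core_profiles):
    """Per-condition mean of the z-scored expression profiles of core-regulon genes."""
    scaled = [zscore(p) for p in core_profiles]
    return [sum(values) / len(values) for values in zip(*scaled)]


def rank_candidates(core_profiles, candidates):
    """Rank candidate genes by correlation with the estimated regulon activity."""
    activity = regulon_activity(core_profiles)
    scores = {gene: pearson(profile, activity) for gene, profile in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: two core-regulon genes measured across four conditions
core = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.0]]
candidates = {"geneA": [0.5, 1.0, 1.6, 2.0], "geneB": [4.0, 3.0, 2.0, 1.0]}
for gene, r in rank_candidates(core, candidates):
    print(gene, round(r, 3))
```

The sketch assumes every profile varies across conditions; a constant profile has zero standard deviation and would raise a division error.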

Operon information improves gene expression estimation for cDNA microarrays

Author: Khodursky Arkady B
Martinez-Vaz Betsy
Pan Wei
Xiao Guanghua
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: In prokaryotic genomes, genes are organized in operons, and genes within an operon tend to have similar levels of expression. Because genes within an operon are co-transcribed, borrowing information from other genes in the same operon can improve the estimation of relative transcript levels. Estimating relative transcript abundance is one of the most challenging tasks in experimental genomics because of the high noise level in microarray data. Therefore, techniques that can improve such estimates, and moreover are based on sound biological premises, are expected to benefit the field of microarray data analysis. RESULTS: In this paper, we propose a hierarchical Bayesian model, which relies on borrowing information from other genes within the same operon, to improve the estimation of gene expression levels and, hence, the detection of differentially expressed genes. Simulation studies and the analysis of experimental data demonstrated that the proposed method outperformed other techniques routinely used to estimate transcript levels and detect differentially expressed genes, including the sample mean and the SAM t statistic. The improvement became more pronounced as the noise level in the microarray data increased. CONCLUSION: By borrowing information about the transcriptional activity of genes within classified operons, we improved the estimation of gene expression levels and the detection of differentially expressed genes.
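The idea of borrowing information within an operon can be illustrated with a simple normal-normal shrinkage estimate: each gene's estimate is (n·ȳ/σ² + μ/τ²) / (n/σ² + 1/τ²), where ȳ is the mean of its n replicate log-ratios and μ is the operon-level mean. In the Python sketch below, σ² and τ² are assumed known and μ is a plug-in average of the gene means. This is a simplified empirical-Bayes illustration, not the hierarchical Bayesian model fitted in the paper; the function and gene names are hypothetical.

```python
def shrink_operon(operon, sigma2, tau2):
    """Shrink each gene's mean log-ratio toward its operon's mean.

    operon: dict mapping gene name -> list of replicate log-ratios.
    sigma2: within-gene (measurement) variance, assumed known.
    tau2: between-gene variance within the operon, assumed known.
    """
    gene_means = {gene: sum(v) / len(v) for gene, v in operon.items()}
    # Plug-in estimate of the operon-level mean
    mu = sum(gene_means.values()) / len(gene_means)
    prior_precision = 1.0 / tau2
    shrunk = {}
    for gene, values in operon.items():
        data_precision = len(values) / sigma2
        # Precision-weighted average of the gene's own mean and the operon mean
        shrunk[gene] = (data_precision * gene_means[gene] + prior_precision * mu) / (
            data_precision + prior_precision
        )
    return shrunk


# Hypothetical two-gene operon with two replicates per gene
operon = {"gene1": [2.0, 2.0], "gene2": [0.0, 0.0]}
print(shrink_operon(operon, sigma2=1.0, tau2=1.0))
```

As τ² grows, each estimate approaches the gene's own mean; as τ² shrinks, all genes in the operon converge on the operon mean.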

Genome-wide localization of mobile elements: experimental, statistical and biological considerations

Author: Khodursky Arkady B
Martinez-Vaz Betsy M
Pan Wei
Xie Yang
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The distribution and location of insertion elements in a genome are an excellent tool for tracking the evolution of bacterial strains and a useful molecular marker for distinguishing closely related bacterial isolates. Information about the genomic locations of IS elements is available in public sequence databases. However, the locations of mobile elements may vary from strain to strain and within the population of an individual strain. Tools that allow de novo localization of IS elements and are independent of existing sequence information are essential for mapping insertion elements and for advancing our knowledge of the role such elements play in gene regulation and genome plasticity in bacteria. RESULTS: In this study, we present an efficient and reliable method for linear mapping of mobile elements using whole-genome DNA microarrays. In addition, we describe an algorithm for the analysis of microarray data that can be applied to find DNA sequences physically juxtaposed with a target sequence of interest. This approach was used to map the locations of the IS5 elements in the genome of Escherichia coli K12. All IS5 elements in the E. coli genome known from GenBank sequence data were identified. Furthermore, previously unknown insertion sites were predicted with high sensitivity and specificity. Our analysis predicted two variants of E. coli K-12 MG1655 within a population of this strain. The only significant difference between these two isolates was the presence of an IS5 element upstream of the main flagellar regulator, flhDC. Additional experiments confirmed this prediction and showed that these isolates were phenotypically distinct. The effect of IS5 on the transcriptional activity of motility and chemotaxis genes in E. coli strain MG1655 was examined. Comparative analysis of expression profiles revealed that the presence of IS5 results in a mild enhancement of transcription of the flagellar genes, which translates into a slight increase in motility. CONCLUSION: In summary, this work presents a case study of an experimental and analytical application of DNA microarrays to map insertion elements in bacteria, and it provides insight into biological processes that might otherwise be overlooked by relying solely on available genome sequence data.
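The abstract does not spell out the mapping algorithm, but the general step of turning probe-level enrichment into candidate insertion sites can be sketched as threshold-and-merge: keep probes whose log-ratio exceeds a cutoff, and merge enriched probes lying within a maximum gap of each other into one site. The cutoff, the gap, the probe values and the function name `call_insertion_sites` are hypothetical choices for illustration only.

```python
def call_insertion_sites(probes, threshold, max_gap):
    """Group enriched probes into candidate sites.

    probes: list of (position_bp, log_ratio) tuples sorted by position.
    Returns a list of (start_bp, end_bp, peak_log_ratio) tuples.
    """
    sites = []
    current = None  # [start, end, peak] of the site being built
    for pos, ratio in probes:
        if ratio < threshold:
            continue  # not enriched; skipped probes do not end a site by themselves
        if current is not None and pos - current[1] <= max_gap:
            # Close to the previous enriched probe: extend the current site
            current[1] = pos
            current[2] = max(current[2], ratio)
        else:
            if current is not None:
                sites.append(tuple(current))
            current = [pos, pos, ratio]
    if current is not None:
        sites.append(tuple(current))
    return sites


# Hypothetical probe data: (genomic position in bp, enrichment log-ratio)
probes = [(1000, 0.1), (2000, 2.5), (2500, 3.1), (3000, 0.2), (50000, 2.8)]
print(call_insertion_sites(probes, threshold=2.0, max_gap=1000))
```

Here the gap is measured from the last enriched probe, so a sub-threshold probe between two enriched ones does not split a site as long as they lie within `max_gap` bp of each other.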

RecA can stimulate the relaxation activity of topoisomerase I: Molecular basis of topoisomerase-mediated genome-wide transcriptional responses in Escherichia coli

Author: Hiasa Hiroshi
Jeong Kyeong Soo
Khodursky Arkady B.
Reckinger Amy R.
Publication venue: Oxford University Press
Publication date: 06/12/2006

The superhelicity of the chromosome, which is controlled by DNA topoisomerases, modulates global gene expression. Investigations of transcriptional responses to the modulation of gyrase function have identified two types of topoisomerase-mediated transcriptional responses: (i) steady-state changes elicited by a mutation in gyrase, such as the D82G mutation in GyrA, and (ii) dynamic changes elicited by the inhibition of gyrase. We hypothesize that the steady-state effects are due to changes in the biochemical properties of gyrase, whereas the dynamic effects are due to an imbalance between supercoiling and relaxation activities, which appears to be influenced by RecA activity. Herein, we present biochemical evidence for the hypothesized mechanisms. GyrA D82G gyrase exhibits reduced supercoiling activity. The RecA protein can influence the balance between supercoiling and relaxation activities either by interfering with the activity of DNA gyrase or by facilitating the relaxation reaction. RecA has no effect on the supercoiling activity of gyrase but stimulates the relaxation activity of topoisomerase I. This stimulation is specific and requires the formation of an active RecA filament. These results suggest that the functional interaction between RecA and topoisomerase I is responsible for RecA-mediated modulation of the relaxation-dependent transcriptional activity of the Escherichia coli chromosome.

A Case Study on Choosing Normalization Methods and Test Statistics for Two-Channel Microarray Data

Author: Carlin Bradley P.
Jeong Kyeong S.
Khodursky Arkady
Pan Wei
Xie Yang
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2004

DNA microarray analysis is a biological technology that permits the whole genome to be monitored simultaneously on a single slide. Microarray technology not only opens an exciting research area for biologists but also poses significant new challenges for statisticians. Two very common questions in the analysis of microarray data are, first, should we normalize arrays to remove potential systematic biases, and if so, what normalization method should we use? Second, how should we then implement tests of statistical significance? Straightforward and uniform answers to these questions remain elusive. In this paper, we use a real data example to illustrate a practical approach to addressing these questions. Our data are taken from a DNA–protein binding microarray experiment aimed at furthering our understanding of transcription regulation mechanisms, one of the most important issues in biology. For preprocessing, we suggest looking at descriptive plots first to decide whether preliminary normalization is needed and, if so, how it should be accomplished. For subsequent comparative inference, we recommend an empirical Bayes method (the B statistic), since it performs much better than traditional methods, such as the sample mean (M statistic) and Student's t statistic, and it is also relatively easy to compute and explain compared with the others. The false discovery rate (FDR) is used to evaluate the different methods, and our comparative results support the suggestions above.
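The abstract uses the false discovery rate to compare methods without naming a specific estimator. One standard procedure is Benjamini-Hochberg; the Python sketch below computes BH-adjusted p-values (q-values) that could then be thresholded at a chosen FDR. It does not implement the B statistic, and treating Benjamini-Hochberg as the procedure used in the paper is an assumption; the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        # Enforce monotonicity: a smaller p-value never gets a larger q-value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
discoveries = [i for i, q in enumerate(qvalues) if q <= 0.05]  # FDR level 0.05
print(qvalues, discoveries)
```

Genes whose q-value falls at or below the chosen level are declared discoveries, with the expected proportion of false discoveries among them controlled at that level under the procedure's independence assumptions.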

Persisters: a distinct physiological state of E. coli

Author: Arkady Khodursky
Devang Shah
Kim Lewis
Kristi Kurg
Niilo Kaldalu
Zhigang Zhang
Publication venue: BioMed Central (BMC Microbiology)
Publication date: 01/01/2006

BACKGROUND: Bacterial populations contain persisters, phenotypic variants that constitute approximately 1% of cells in stationary-phase and biofilm cultures. The multidrug tolerance of persisters is largely responsible for the inability of antibiotics to completely eradicate infections. Recent progress in understanding persisters is encouraging, but since their discovery in 1944 the main obstacle to understanding their nature has been our inability to isolate these elusive cells from a wild-type population. RESULTS: We hypothesized that persisters are dormant cells with a low level of translation, and used this to physically sort dim E. coli cells that do not contain sufficient amounts of an unstable GFP expressed from a promoter whose activity depends on the growth rate. The dim cells were tolerant to antibiotics and exhibited a gene expression profile distinctly different from those observed for cells in the exponential or stationary phase. Genes coding for toxin-antitoxin module proteins were expressed in persisters and are likely contributors to this condition. CONCLUSION: We report a method for persister isolation and conclude that these cells represent a distinct state of bacterial physiology.
