98 research outputs found

    A Rapid, Cost-Effective Method of Assembly and Purification of Synthetic DNA Probes >100 bp

    Here we introduce a rapid, cost-effective method of generating molecular DNA probes in just under 15 minutes without the need for expensive, time-consuming gel-extraction steps. As an example, we enzymatically concatenated six variable strands (50 bp) with a common strand sequence (51 bp) in a single pool using Fast-Link DNA ligase to produce 101 bp targets (10 min). Unincorporated species were then filtered out by passing the crude reaction through a size-exclusion column (<5 min). We then compared the full-length product yield of crude and purified samples using HPLC analysis; the results show that our method yields three-quarters that of the crude sample (50% higher than by gel extraction). And while our filtration process substantially reduced the amount of unligated product, higher purity and yield, with an increase in the number of strands per reaction (>12), could be achieved with further optimization. Moreover, for large-scale assays, we envision this method being fully automated with robotics such as the Biomek FX; here, potentially thousands of samples could be pooled, ligated and purified in a 96-, 384- or 1536-well format in just minutes.

    Inference algorithms for gene networks: a statistical mechanics analysis

    The inference of gene regulatory networks from high-throughput gene expression data is one of the major challenges in systems biology. This paper aims at analysing and comparing two different algorithmic approaches. The first approach uses pairwise correlations between regulated and regulating genes; the second one uses message-passing techniques for inferring activating and inhibiting regulatory interactions. The performance of these two algorithms can be analysed theoretically on well-defined test sets, using tools from the statistical physics of disordered systems like the replica method. We find that the second algorithm outperforms the first one, since it takes into account collective effects of multiple regulators.
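The first, correlation-based approach described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: each candidate regulator-target edge is scored by the absolute Pearson correlation of the two genes' expression profiles across samples.

```python
import numpy as np

def correlation_scores(expr: np.ndarray) -> np.ndarray:
    """expr: genes x samples expression matrix.
    Returns a genes x genes matrix of absolute Pearson correlations,
    used as candidate edge scores (self-correlations zeroed out)."""
    scores = np.abs(np.corrcoef(expr))
    np.fill_diagonal(scores, 0.0)
    return scores

# Toy data: gene 1 tracks gene 0 closely; gene 2 is independent noise.
rng = np.random.default_rng(0)
g0 = rng.normal(size=200)
expr = np.vstack([g0, g0 + 0.1 * rng.normal(size=200), rng.normal(size=200)])

scores = correlation_scores(expr)
assert scores[0, 1] > scores[0, 2]  # the correlated pair scores higher
```

A pairwise score like this considers each regulator in isolation, which is exactly why, as the abstract notes, it can miss collective effects of multiple regulators that a message-passing approach captures.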

    Indirect two-sided relative ranking: a robust similarity measure for gene expression data

    Background: There is a large amount of gene expression data that exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights.
    Results: In this paper we present a new method, which we refer to as indirect two-sided relative ranking, for comparing gene expression profiles that is robust to variations in experimental conditions. This method extends the current best approach, which is based on comparing the correlations of the up- and down-regulated genes, by introducing a comparison based on the correlations in rankings across the entire database. Because our method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries.
    Conclusions: We demonstrate the benefit of our proposed indirect method on several datasets. We first evaluate the ability of the indirect method to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). We show that our indirect method is able to significantly improve upon the previous state-of-the-art method, with a substantial improvement in recall at rank 10 of 97.03% and 49.44% on each dataset, respectively. Next, we demonstrate that our indirect method results in improved accuracy for classification in several additional datasets. These datasets demonstrate the use of our indirect method for classifying cancer subtypes, predicting drug sensitivity/resistance, and classifying (related) cell types. Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method in each of these datasets is statistically significant.
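The core robustness idea, comparing rankings rather than raw values, can be illustrated with a minimal sketch (a simplified stand-in, not the authors' indirect two-sided algorithm): each profile is converted to ranks, and the rank vectors are correlated (Spearman correlation), which is invariant to any monotone, experiment-specific distortion of the raw measurements.

```python
import numpy as np

def to_ranks(x: np.ndarray) -> np.ndarray:
    """Map each value to its rank (0 = smallest)."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def rank_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman-style similarity: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(to_ranks(a), to_ranks(b))[0, 1])

# Two profiles that differ only by a monotone (batch-like) distortion
# still have perfect rank similarity.
a = np.array([0.1, 2.0, 0.5, 3.0])
b = np.exp(a) + 7.0  # monotone transform of a
assert abs(rank_similarity(a, b) - 1.0) < 1e-9
```

The published method goes further, comparing rankings indirectly across the entire database rather than between two profiles directly, but the invariance shown here is what lets it bridge vehicle and batch effects.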

    Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery

    Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional, high-noise genomic data is prone to overfitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model intends to predict, which can make the modeling results challenging to interpret. To address these issues, we developed a novel algorithm, the Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the sample number is larger than the transcript number in each iteration of modeling and testing, PPEA reduces the potential risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of the predictive power of individual transcripts in a relatively small number of iterations; (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict; (3) using only the most predictive top-ranked transcripts greatly facilitates development of multiplex assays, such as qRT-PCR, as biomarkers; and (4) more importantly, we were able to demonstrate that a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that PPEA effectively addresses the overfitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
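The central idea, keeping the transcript count below the sample count in every bootstrap round and accumulating a per-transcript predictive score, can be sketched as follows. This is a hypothetical illustration of that idea, not the published PPEA code; the univariate threshold classifier and all parameter choices here are assumptions made for the example.

```python
import numpy as np

def ppea_like_scores(X, y, n_iter=200, seed=0):
    """X: samples x transcripts matrix; y: binary phenotype labels (0/1).
    Returns a per-transcript average out-of-bag accuracy (higher = more
    predictive), built by iterated two-way (sample + transcript) resampling."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    hits, draws = np.zeros(p), np.zeros(p)
    k = min(p, max(1, n // 2))  # keep transcript count below sample count
    for _ in range(n_iter):
        cols = rng.choice(p, size=k, replace=False)   # transcript subsample
        boot = rng.choice(n, size=n, replace=True)    # bootstrap samples
        oob = np.setdiff1d(np.arange(n), boot)        # out-of-bag test set
        if len(oob) == 0:
            continue
        for j in cols:
            # Univariate threshold classifier fit on the bootstrap set;
            # max(...) handles either direction of the effect.
            thr = X[boot, j].mean()
            pred = (X[oob, j] > thr).astype(int)
            acc = max((pred == y[oob]).mean(), (pred != y[oob]).mean())
            hits[j] += acc
            draws[j] += 1
    return hits / np.maximum(draws, 1)

# Simulated data: only transcript 5 carries phenotype signal.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=60)
X = rng.normal(size=(60, 30))
X[:, 5] += 2.0 * y
scores = ppea_like_scores(X, y)
assert scores.argmax() == 5  # the informative transcript ranks first
```

Because each model in the loop sees more samples than transcripts, no single round can overfit badly, and the rank order emerges from repetition rather than from one fragile fit, which is the property the abstract emphasizes.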

    Functional analysis of multiple genomic signatures demonstrates that classification algorithms choose phenotype-related genes

    Gene expression signatures of toxicity and clinical response benefit both safety assessment and clinical practice; however, difficulties in connecting signature genes with the predicted end points have limited their application. The Microarray Quality Control Consortium II (MAQC-II) project generated 262 signatures for ten clinical and three toxicological end points from six gene expression data sets, an unprecedented collection of diverse signatures that has permitted a wide-ranging analysis of the nature of such predictive models. A comprehensive analysis of the genes of these signatures and their nonredundant unions, using ontology enrichment, biological network building and interactome connectivity analyses, demonstrated the link between gene signatures and the biological basis of their predictive power. Different signatures for a given end point were more similar at the level of biological properties and transcriptional control than at the gene level. Signatures tended to be enriched in function and pathway in an end point- and model-specific manner, and showed a topological bias for incoming interactions. Importantly, the level of biological similarity between different signatures for a given end point correlated positively with the accuracy of the signature predictions. These findings will aid the understanding and application of predictive genomic signatures and support their broader use in predictive medicine.

    A tryptophan-rich peptide acts as a transcription activation domain

    Background: Eukaryotic transcription activators normally consist of a sequence-specific DNA-binding domain (DBD) and a transcription activation domain (AD). While many sequence patterns and motifs have been defined for DBDs, ADs do not share easily recognizable motifs or structures.
    Results: We report herein that the N-terminal domain of yeast valyl-tRNA synthetase can function as an AD when fused to a DNA-binding protein, LexA, and turn on reporter genes with distinct LexA-responsive promoters. The transcriptional activity was mainly attributed to a five-residue peptide, WYDWW, near the C-terminus of the N domain. Remarkably, the pentapeptide per se retained much of the transcriptional activity. Mutations which substituted tryptophan residues for both of the non-tryptophan residues in the pentapeptide (resulting in W5) significantly enhanced its activity (~1.8-fold), while mutations which substituted aromatic residues with alanine residues severely impaired its activity. Accordingly, a much more active peptide, heptatryptophan (W7), was produced, which elicited ~3-fold higher activity than that of the native pentapeptide and the N domain. Further study indicated that W7 mediates transcription activation through interacting with the general transcription factor TFIIB.
    Conclusions: Since W7 shares no sequence homology or features with any known transcription activators, it may represent a novel class of AD.

    Analysis and Computational Dissection of Molecular Signature Multiplicity

    Molecular signatures are computational or mathematical models created to diagnose disease and other phenotypes and to predict clinical outcomes and response to treatment. It is widely recognized that molecular signatures constitute one of the most important translational and basic science developments enabled by recent high-throughput molecular assays. A perplexing phenomenon that characterizes high-throughput data analysis is the ubiquitous multiplicity of molecular signatures. Multiplicity is a special form of data analysis instability in which different analysis methods used on the same data, or different samples from the same population, lead to different but apparently maximally predictive signatures. This phenomenon has far-reaching implications for biological discovery and development of next-generation patient diagnostics and personalized treatments. Currently the causes and interpretation of signature multiplicity are unknown, and several, often contradictory, conjectures have been made to explain it. We present a formal characterization of signature multiplicity and a new efficient algorithm that offers theoretical guarantees for extracting the set of maximally predictive and non-redundant signatures independently of the data distribution. The new algorithm identifies exactly the set of optimal signatures in controlled experiments and yields signatures with significantly better predictivity and reproducibility than previous algorithms in human microarray gene expression datasets. Our results shed light on the causes of signature multiplicity, provide computational tools for studying it empirically, and introduce a framework for the in silico bioequivalence of this important new class of diagnostic and personalized medicine modalities.

    Application of Biomarkers in Cancer Risk Management: Evaluation from Stochastic Clonal Evolutionary and Dynamic System Optimization Points of View

    Aside from primary prevention, early detection remains the most effective way to decrease mortality associated with the majority of solid cancers. Previous cancer screening models are largely based on classification of at-risk populations into three conceptually defined groups (normal, cancer without symptoms, and cancer with symptoms). Unfortunately, this approach has achieved limited success in reducing cancer mortality. With advances in molecular biology and genomic technologies, many candidate somatic genetic and epigenetic "biomarkers" have been identified as potential predictors of cancer risk. However, none have yet been validated as robust predictors of progression to cancer or shown to reduce cancer mortality. In this Perspective, we first define the necessary and sufficient conditions for precise prediction of future cancer development and early cancer detection within a simple physical model framework. We then evaluate cancer risk prediction and early detection from a dynamic clonal evolution point of view, examining the implications of dynamic clonal evolution of biomarkers and the application of clonal evolution for cancer risk management in clinical practice. Finally, we propose a framework to guide future collaborative research between mathematical modelers and biomarker researchers to design studies to investigate and model dynamic clonal evolution. This approach will allow optimization of available resources for cancer control and of intervention timing based on molecular biomarkers in predicting cancer among various risk subsets that dynamically evolve over time.

    The population biology and evolutionary significance of Ty elements in Saccharomyces cerevisiae

    The basic structure and properties of Ty elements are considered with special reference to their role as agents of evolutionary change. Ty elements may generate genetic variation for fitness by their action as mutagens, as well as by providing regions of portable homology for recombination. The mutational spectra generated by Ty1 transposition events may, due to their target specificity and gene-regulatory capabilities, possess a higher frequency of adaptively favorable mutations than spectra resulting from other types of mutational processes. Laboratory strains contain between 25 and 35 elements, and in both these and industrial strains the insertions appear quite stable. In contrast, a wide variation in Ty number is seen in wild isolates, with a lower average number per genome. Factors which may determine Ty copy number in populations include transposition rates (dependent on Ty copy number and mating type) and stabilization of Ty elements in the genome, as well as selection for and against Ty insertions in the genome. Although the average effect of Ty transpositions is deleterious, populations initiated with a single clone containing a single Ty element steadily accumulated Ty elements over 1,000 generations. Direct evidence that Ty transposition events can be selectively favored is provided by experiments in which populations containing large amounts of variability in Ty1 copy number were maintained for ∼100 generations in a homogeneous environment. At their termination, the frequency of clones containing 0 Ty elements had decreased to ∼0.0, and the populations had become dominated by a small number of clones containing >0 Ty elements. No such reduction in variability was observed in populations maintained in a structured environment, though changes in Ty number were observed. The implications of genetic (mating type and ploidy) changes and environmental fluctuations for the long-term persistence of Ty elements within the S. cerevisiae species group are discussed.