11,978 research outputs found

    Specific and non specific hybridization of oligonucleotide probes on microarrays

    Gene expression analysis by means of microarrays is based on the sequence-specific binding of mRNA to DNA oligonucleotide probes and its measurement using fluorescent labels. The binding of RNA fragments involving sequences other than the intended target is problematic because it adds a "chemical background" to the signal that is unrelated to the expression level of the target gene. The paper presents a molecular signature of specific and non-specific hybridization with potential consequences for gene expression analysis. We analyzed the signal intensities of perfect match (PM) and mismatch (MM) probes of GeneChip microarrays to specify the effect of specific and non-specific hybridization. We found that these events give rise to different relations between the PM and MM intensities as a function of the middle base of the PMs, namely a triplet-like (C>G=T>A>0) and a duplet-like (C=T>0>G=A) pattern of the PM-MM log-intensity difference upon binding of specific and non-specific RNA fragments, respectively. The systematic behaviour of the intensity difference can be rationalized on the level of base pairings of DNA/RNA oligonucleotide duplexes in the middle of the probe sequence. Non-specific binding is characterized by the reversal of the central Watson-Crick (WC) pairing for each PM/MM probe pair, whereas specific binding refers to the combination of a WC and a self-complementary (SC) pairing in PM and MM probes, respectively. The intensity of complementary MM introduces a systematic source of variation which decreases the precision of expression measures based on the MM intensities.
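
The PM-MM log-intensity difference grouped by middle base, as described in the abstract above, could be tabulated roughly as follows. This is a minimal sketch, not the authors' analysis: it assumes 25-mer probes whose middle base is position 13, and the function name `mean_log_diff_by_middle_base` and the input layout (tuples of PM sequence, PM intensity, MM intensity) are hypothetical.

```python
import math
from collections import defaultdict


def mean_log_diff_by_middle_base(probe_pairs):
    """Average log10(PM) - log10(MM) for each middle base of the PM probe.

    probe_pairs: iterable of (pm_sequence, pm_intensity, mm_intensity).
    The tuple layout is an assumption made for this illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, pm, mm in probe_pairs:
        middle = seq[len(seq) // 2]  # index 12, i.e. base 13 of a 25-mer
        sums[middle] += math.log10(pm) - math.log10(mm)
        counts[middle] += 1
    return {base: sums[base] / counts[base] for base in sums}


# Toy intensities, invented only to show the call; not measured data.
pairs = [
    ("A" * 12 + "C" + "A" * 12, 1000.0, 100.0),
    ("A" * 12 + "C" + "A" * 12, 100.0, 100.0),
    ("A" * 12 + "G" + "A" * 12, 10.0, 100.0),
]
diffs = mean_log_diff_by_middle_base(pairs)
```

Sorting the resulting dictionary by value gives an ordering of the four bases that can be compared against the triplet-like and duplet-like patterns quoted in the abstract.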

    Gene selection algorithms for microarray data based on least squares support vector machine

    BACKGROUND: In discriminant analysis of microarray data, a small number of samples is usually described by a large number of genes. It is not only difficult but also unnecessary to conduct the discriminant analysis with all the genes. Hence, gene selection is usually performed to select important genes. RESULTS: A gene selection method searches for an optimal or near-optimal subset of genes with respect to a given evaluation criterion. In this paper, we propose a new evaluation criterion, named the leave-one-out calculation (LOOC; a list of abbreviations appears just above the list of references) measure. A gene selection method, named the leave-one-out calculation sequential forward selection (LOOCSFS) algorithm, is then presented by combining the LOOC measure with the sequential forward selection scheme. Further, a novel gene selection algorithm, the gradient-based leave-one-out gene selection (GLGS) algorithm, is also proposed. Both gene selection algorithms originate from an efficient and exact calculation of the leave-one-out cross-validation error of the least squares support vector machine (LS-SVM). The proposed approaches are applied to two microarray datasets and compared to other well-known gene selection methods using codes available from the second author. CONCLUSION: The proposed gene selection approaches can provide gene subsets leading to more accurate classification results, while their computational complexity is comparable to the existing methods. The GLGS algorithm can also scale better to datasets with a very large number of genes.
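
The idea of an exact leave-one-out calculation for the LS-SVM combined with sequential forward selection can be sketched as follows. This is an illustration under stated assumptions, not the paper's LOOC measure or the LOOCSFS algorithm: it uses the standard closed-form identity that the leave-one-out residual of sample i equals alpha_i divided by the i-th diagonal entry of the inverse system matrix, a linear kernel, labels in {-1, +1}, and the sum of squared leave-one-out residuals as the selection score. The function names and the parameter `gamma` are hypothetical.

```python
import numpy as np


def lssvm_loo_residuals(X, y, gamma=1.0):
    """Exact leave-one-out residuals y_i - f^(-i)(x_i) of a linear-kernel LS-SVM.

    Solves [[K + I/gamma, 1], [1^T, 0]] [alpha; b] = [y; 0] once and uses
    residual_i = alpha_i / (M^-1)_ii, so no model is retrained.
    """
    n = X.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = X @ X.T + np.eye(n) / gamma
    M[:n, n] = 1.0
    M[n, :n] = 1.0
    M_inv = np.linalg.inv(M)
    alpha = (M_inv @ np.append(y, 0.0))[:n]
    return alpha / np.diag(M_inv)[:n]


def loo_error_count(X, y, gamma=1.0):
    """Number of samples misclassified by their leave-one-out prediction."""
    loo_pred = y - lssvm_loo_residuals(X, y, gamma)
    return int(np.sum(np.sign(loo_pred) != y))


def forward_select(X, y, n_genes, gamma=1.0):
    """Greedy forward selection scored by the sum of squared LOO residuals.

    The scoring rule is an assumption of this sketch, not the LOOC measure.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_genes):
        scores = [
            np.sum(lssvm_loo_residuals(X[:, selected + [j]], y, gamma) ** 2)
            for j in remaining
        ]
        best = remaining[int(np.argmin(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because every candidate gene is scored from a single linear solve rather than n retrainings, the cost of each forward step stays manageable, which is the property the abstract attributes to the exact leave-one-out calculation.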

    Exploring transcriptional signalling mediated by OsWRKY13, a potential regulator of multiple physiological processes in rice

    BACKGROUND: Rice transcription regulator OsWRKY13 influences the functioning of more than 500 genes in multiple signalling pathways, with roles in disease resistance, redox homeostasis, abiotic stress responses, and development. RESULTS: To determine the putative transcriptional regulation mechanism of OsWRKY13, the putative cis-acting elements of OsWRKY13-influenced genes were analyzed using the whole genome expression profiling of OsWRKY13-activated plants generated with the Affymetrix GeneChip Rice Genome Array. At least 39 transcription factor genes were influenced by OsWRKY13, and 30 of them were downregulated. The promoters of OsWRKY13-upregulated genes were overrepresented with W-boxes for WRKY protein binding, whereas the promoters of OsWRKY13-downregulated genes were enriched with cis-elements putatively for binding of MYB and AP2/EREBP types of transcription factors. Consistent with the distinctive distribution of these cis-elements in up- and downregulated genes, nine WRKY genes were influenced by OsWRKY13 and the promoters of five of them were bound by OsWRKY13 in vitro; all seven differentially expressed AP2/EREBP genes and six of the seven differentially expressed MYB genes were suppressed in OsWRKY13-activated plants. A subset of OsWRKY13-influenced WRKY genes were involved in host-pathogen interactions. CONCLUSION: These results suggest that OsWRKY13-mediated signalling pathways are partitioned by different transcription factors. WRKY proteins may play important roles in the monitoring of OsWRKY13-upregulated genes and genes involved in pathogen-induced defence responses, whereas MYB and AP2/EREBP proteins may contribute most to the control of OsWRKY13-downregulated genes. This work was supported by grants from the National Program of High Technology Development of China, the National Program on the Development of Basic Research in China, and the National Natural Science Foundation of China.

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples that drew on ideas from both areas: one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network). These examples may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems. Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201

    Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition

    Sparse representation based classification (SRC) methods have achieved remarkable results. SRC, however, still suffers from requiring enough training samples, insufficient use of test samples, and instability of representation. In this paper, a stable inverse projection representation based classification (IPRC) is presented to tackle these problems by effectively using test samples. An IPR is first proposed and its feasibility and stability are analyzed. A classification criterion named category contribution rate is constructed to match the IPR and complete classification. Moreover, a statistical measure is introduced to quantify the stability of representation-based classification methods. Based on the IPRC technique, a robust tumor recognition framework is presented by interpreting microarray gene expression data, where a two-stage hybrid gene selection method is introduced to select informative genes. Finally, a functional analysis of candidate pathogenicity-related genes is given. Extensive experiments on six public tumor microarray gene expression datasets demonstrate that the proposed technique is competitive with state-of-the-art methods. Comment: 14 pages, 19 figures, 10 tables

    miR-196b target screen reveals mechanisms maintaining leukemia stemness with therapeutic potential.

    We have shown that antagomiR inhibition of the activity of the miRNAs miR-21 and miR-196b is sufficient to ablate MLL-AF9 leukemia stem cells (LSC) in vivo. Here, we used an shRNA screening approach to mimic miRNA activity on experimentally verified miR-196b targets to identify functionally important and therapeutically relevant pathways downstream of oncogenic miRNA in MLL-r AML. We found that Cdkn1b (p27Kip1) is a direct miR-196b target whose repression enhanced an embryonic stem cell–like signature associated with decreased leukemia latency and increased numbers of leukemia stem cells in vivo. Conversely, elevation of p27Kip1 significantly reduced MLL-r leukemia self-renewal, promoted monocytic differentiation of leukemic blasts, and induced cell death. Antagonism of miR-196b activity or pharmacologic inhibition of the Cks1-Skp2–containing SCF E3-ubiquitin ligase complex increased p27Kip1 and inhibited human AML growth. This work illustrates that understanding oncogenic miRNA target pathways can identify actionable targets in leukemia.

    Gene selection and classification for cancer microarray data based on machine learning and similarity measures

    Background: Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes that provide reliable and accurate prediction of disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. Using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions: On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to a random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.