Specific and non specific hybridization of oligonucleotide probes on microarrays
Gene expression analysis by means of microarrays is based on the sequence
specific binding of mRNA to DNA oligonucleotide probes and its measurement
using fluorescent labels. Binding of RNA fragments with sequences other than
the intended target is problematic because it adds a "chemical background" to
the signal that is unrelated to the expression level of the target gene.
The paper presents a molecular signature of specific and non
specific hybridization with potential consequences for gene expression
analysis. We analyzed the signal intensities of perfect match (PM) and mismatch
(MM) probes of GeneChip microarrays to specify the effect of specific and non
specific hybridization. We found that these events give rise to different
relations between the PM and MM intensities as a function of the middle base of
the PMs, namely a triplet- (C>G=T>A>0) and a duplet-like (C=T>0>G=A) pattern of
the PM-MM log-intensity difference upon binding of specific and non specific
RNA fragments, respectively. The systematic behaviour of the intensity
difference can be rationalized on the level of base pairings of DNA/RNA
oligonucleotide duplexes in the middle of the probe sequence. Non-specific
binding is characterized by the reversal of the central Watson Crick (WC)
pairing for each PM/MM probe pair, whereas specific binding refers to the
combination of a WC and a self complementary (SC) pairing in PM and MM probes,
respectively. The intensity of complementary MM introduces a systematic source
of variation which decreases the precision of expression measures based on the
MM intensities.
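The PM-MM log-intensity difference grouped by the middle base of the PM probe, as described in the abstract above, could be tabulated along the following lines. This is only a minimal sketch: the function name, the `(pm_sequence, pm_intensity, mm_intensity)` tuple layout, and the choice of base-10 logarithms are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict


def mean_log_difference_by_middle_base(probe_pairs):
    """Average log10(PM) - log10(MM) for each middle base of the PM probe.

    probe_pairs is an iterable of (pm_sequence, pm_intensity, mm_intensity)
    tuples; this layout is a hypothetical one chosen for the sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pm_sequence, pm_intensity, mm_intensity in probe_pairs:
        # For a 25-mer probe, index 12 is the central (13th) base.
        middle_base = pm_sequence[len(pm_sequence) // 2]
        sums[middle_base] += math.log10(pm_intensity) - math.log10(mm_intensity)
        counts[middle_base] += 1
    return {base: sums[base] / counts[base] for base in sums}


# Toy, made-up intensities (not data from the paper).
toy_pairs = [
    ("A" * 12 + "C" + "A" * 12, 1000.0, 100.0),
    ("A" * 12 + "C" + "A" * 12, 100.0, 100.0),
    ("A" * 12 + "G" + "A" * 12, 10.0, 100.0),
]
averages = mean_log_difference_by_middle_base(toy_pairs)
```

Comparing the sign and ordering of the per-base averages is one way to see whether a set of probes looks more like the triplet-like or the duplet-like pattern.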
Gene selection algorithms for microarray data based on least squares support vector machine
BACKGROUND: In discriminant analysis of microarray data, typically a small number of samples are each described by a large number of genes. It is not only difficult but also unnecessary to conduct the discriminant analysis with all the genes. Hence, gene selection is usually performed to select important genes. RESULTS: A gene selection method searches for an optimal or near-optimal subset of genes with respect to a given evaluation criterion. In this paper, we propose a new evaluation criterion, named the leave-one-out calculation (LOOC) measure. A gene selection method, named the leave-one-out calculation sequential forward selection (LOOCSFS) algorithm, is then presented by combining the LOOC measure with the sequential forward selection scheme. Further, a novel gene selection algorithm, the gradient-based leave-one-out gene selection (GLGS) algorithm, is also proposed. Both gene selection algorithms originate from an efficient and exact calculation of the leave-one-out cross-validation error of the least squares support vector machine (LS-SVM). The proposed approaches are applied to two microarray datasets and compared to other well-known gene selection methods using code available from the second author. CONCLUSION: The proposed gene selection approaches can provide gene subsets leading to more accurate classification results, while their computational complexity is comparable to that of existing methods. The GLGS algorithm also scales better to datasets with a very large number of genes.
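The sequential forward selection scheme mentioned in the abstract above can be sketched generically: start from an empty gene set and repeatedly add the gene that most improves an evaluation criterion. The sketch below is not the LOOCSFS algorithm. As a stand-in for the LOOC measure it uses a brute-force leave-one-out error of a nearest-centroid classifier, and all function names and the toy data are hypothetical.

```python
def loo_error_nearest_centroid(samples, labels, genes):
    """Count leave-one-out errors of a nearest-centroid classifier.

    Stand-in criterion for illustration only; the paper's LOOC measure is
    derived from the LS-SVM and is not reproduced here.
    """
    errors = 0
    for held_out in range(len(samples)):
        centroids = {}
        for label in sorted(set(labels)):
            members = [samples[i] for i in range(len(samples))
                       if i != held_out and labels[i] == label]
            if members:
                centroids[label] = [sum(m[g] for m in members) / len(members)
                                    for g in genes]
        point = [samples[held_out][g] for g in genes]
        # Predict the class whose centroid is closest in squared distance.
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        errors += predicted != labels[held_out]
    return errors


def sequential_forward_selection(samples, labels, n_genes, criterion):
    """Greedily add the gene that minimizes criterion(samples, labels, genes)."""
    selected = []
    remaining = list(range(len(samples[0])))
    while remaining and len(selected) < n_genes:
        best = min(remaining,
                   key=lambda g: criterion(samples, labels, selected + [g]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, made-up expression matrix: rows are samples, columns are genes.
toy_samples = [
    [0.5, 0.0, 0.3], [0.1, 0.1, 0.9], [0.9, 0.2, 0.1],
    [0.4, 1.0, 0.8], [0.6, 1.1, 0.2], [0.2, 0.9, 0.5],
]
toy_labels = [0, 0, 0, 1, 1, 1]
chosen = sequential_forward_selection(
    toy_samples, toy_labels, 1, loo_error_nearest_centroid)
```

Because the criterion is passed in as a function, a closed-form leave-one-out measure could replace the brute-force loop without changing the selection routine.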
Exploring transcriptional signalling mediated by OsWRKY13, a potential regulator of multiple physiological processes in rice
BACKGROUND Rice transcription regulator OsWRKY13 influences the functioning of more than 500 genes in multiple signalling pathways, with roles in disease resistance, redox homeostasis, abiotic stress responses, and development. RESULTS To determine the putative transcriptional regulation mechanism of OsWRKY13, the putative cis-acting elements of OsWRKY13-influenced genes were analyzed using the whole genome expression profiling of OsWRKY13-activated plants generated with the Affymetrix GeneChip Rice Genome Array. At least 39 transcription factor genes were influenced by OsWRKY13, and 30 of them were downregulated. The promoters of OsWRKY13-upregulated genes were overrepresented with W-boxes for WRKY protein binding, whereas the promoters of OsWRKY13-downregulated genes were enriched with cis-elements putatively for binding of MYB and AP2/EREBP types of transcription factors. Consistent with the distinctive distribution of these cis-elements in up- and downregulated genes, nine WRKY genes were influenced by OsWRKY13 and the promoters of five of them were bound by OsWRKY13 in vitro; all seven differentially expressed AP2/EREBP genes and six of the seven differentially expressed MYB genes were suppressed in OsWRKY13-activated plants. A subset of OsWRKY13-influenced WRKY genes were involved in host-pathogen interactions. CONCLUSION These results suggest that OsWRKY13-mediated signalling pathways are partitioned by different transcription factors. WRKY proteins may play important roles in the monitoring of OsWRKY13-upregulated genes and genes involved in pathogen-induced defence responses, whereas MYB and AP2/EREBP proteins may contribute most to the control of OsWRKY13-downregulated genes. This work was supported by grants from the National Program of High Technology Development of China, the National Program on the Development of Basic Research in China, and the National Natural Science Foundation of China.
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
Aptamers in oncology: a diagnostic perspective
Nucleic acid sequences can produce a wide variety of three-dimensional conformations. Some of these structural forms are able to interact with proteins and small molecules with high affinity and specificity. These sequences, comprising either double- or single-stranded oligonucleotides, are called 'aptamers' based on the Latin word aptus, which means 'to fit'. Using an efficient selection process, randomised oligonucleotide libraries can be rapidly screened for aptamers with the appropriate binding characteristics. This technology has spawned the development of a new class of oligonucleotide therapeutic products. However, while interest among pharmaceutical companies continues to grow, with some candidates already in clinical trials and one on the market, there appears to be some reluctance to fully explore the diagnostic potential of this technology. This article will review aptamer developments in diagnostics, compare them with other oligonucleotide therapeutics and highlight both potentials and pitfalls of technological development in this area.
Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition
Sparse representation based classification (SRC) methods have achieved
remarkable results. SRC, however, still suffers from the need for sufficient
training samples, insufficient use of test samples, and instability of representation. In
this paper, a stable inverse projection representation based classification
(IPRC) is presented to tackle these problems by effectively using test samples.
An IPR is firstly proposed and its feasibility and stability are analyzed. A
classification criterion named category contribution rate is constructed to
match the IPR and complete classification. Moreover, a statistical measure is
introduced to quantify the stability of representation-based classification
methods. Based on the IPRC technique, a robust tumor recognition framework is
presented by interpreting microarray gene expression data, where a two-stage
hybrid gene selection method is introduced to select informative genes.
Finally, a functional analysis of candidate pathogenicity-related genes is
given. Extensive experiments on six public tumor microarray gene expression
datasets demonstrate that the proposed technique is competitive with
state-of-the-art methods.
Comment: 14 pages, 19 figures, 10 tables
miR-196b target screen reveals mechanisms maintaining leukemia stemness with therapeutic potential.
We have shown that antagomiR inhibition of miRNA miR-21 and miR-196b activity is sufficient to ablate MLL-AF9 leukemia stem cells (LSC) in vivo. Here, we used an shRNA screening approach to mimic miRNA activity on experimentally verified miR-196b targets to identify functionally important and therapeutically relevant pathways downstream of oncogenic miRNA in MLL-r AML. We found Cdkn1b (p27Kip1) is a direct miR-196b target whose repression enhanced an embryonic stem cell–like signature associated with decreased leukemia latency and increased numbers of leukemia stem cells in vivo. Conversely, elevation of p27Kip1 significantly reduced MLL-r leukemia self-renewal, promoted monocytic differentiation of leukemic blasts, and induced cell death. Antagonism of miR-196b activity or pharmacologic inhibition of the Cks1-Skp2–containing SCF E3-ubiquitin ligase complex increased p27Kip1 and inhibited human AML growth. This work illustrates that understanding oncogenic miRNA target pathways can identify actionable targets in leukemia.
Gene selection and classification for cancer microarray data based on machine learning and similarity measures
BACKGROUND: Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. RESULTS: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. CONCLUSIONS: On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.