122 research outputs found

    Match-Only Integral Distribution (MOID) Algorithm for high-density oligonucleotide array analysis

    BACKGROUND: High-density oligonucleotide arrays have become a valuable tool for high-throughput gene expression profiling. Increasing the array information density and improving the analysis algorithms are two important computational research topics. RESULTS: A new algorithm, Match-Only Integral Distribution (MOID), was developed to analyze high-density oligonucleotide arrays. Using known data from both spiking experiments and no-change experiments performed with Affymetrix GeneChip® arrays, MOID and the Affymetrix algorithm implemented in Microarray Suite 4.0 (MAS4) were compared. While MOID gave similar performance to MAS4 in the spiking experiments, better performance was observed in the no-change experiments. MOID also provides a set of alternative statistical analysis tools to MAS4. Two main features distinguish MOID from MAS4. First, MOID uses continuous P values for the likelihood of gene presence, while MAS4 resorts to discrete absolute calls. Second, MOID uses heuristic confidence intervals for both gene expression levels and fold change values, while MAS4 categorizes the significance of gene expression level changes into discrete fold change calls. CONCLUSIONS: The results show that by using MOID, Affymetrix GeneChip® arrays may need as few as ten probes per gene without compromising analysis accuracy.

    HopPG: Self-Iterative Program Generation for Multi-Hop Question Answering over Heterogeneous Knowledge

    Semantic parsing is an important research branch of knowledge-based question answering. Such methods usually generate executable programs based on the question and then execute them to reason out answers over a knowledge base. This mechanism gives them advantages in both performance and interpretability. However, traditional semantic parsing methods usually generate a complete program before executing it, and therefore struggle with multi-hop question answering over heterogeneous knowledge. First, a complete multi-hop program relies on multiple heterogeneous supporting facts, and it is difficult for models to receive these facts simultaneously. Second, these methods ignore the interaction between the previous-hop execution result and the current-hop program generation. To alleviate these challenges, we propose a self-iterative framework for multi-hop program generation (HopPG) over heterogeneous knowledge, which leverages the previous-hop execution results to retrieve supporting facts and generate subsequent programs iteratively. We evaluate our model on MMQA-T^2. The experimental results show that HopPG outperforms existing semantic-parsing-based baselines, especially on the multi-hop questions.

    Leveraging two-way probe-level block design for identifying differential gene expression with high-density oligonucleotide arrays

    BACKGROUND: To identify differentially expressed genes across experimental conditions in oligonucleotide microarray experiments, existing statistical methods commonly use a summary of probe-level expression data for each probe set and compare replicates of these values across conditions using a form of the t-test or rank sum test. Here we propose the use of a statistical method that takes advantage of the built-in redundancy architecture of high-density oligonucleotide arrays. RESULTS: We employ parametric and nonparametric variants of two-way analysis of variance (ANOVA) on probe-level data to account for probe-level variation, and use the false-discovery rate (FDR) to account for simultaneous testing on thousands of genes (the multiple testing problem). Using publicly available data sets, we systematically compared the performance of parametric two-way ANOVA and the nonparametric Mack-Skillings test to the t-test and Wilcoxon rank-sum test for detecting differentially expressed genes at varying levels of fold change, concentration, and sample size. Using receiver operating characteristic (ROC) curve comparisons, we observed that two-way methods with FDR control on sample sizes with 2–3 replicates exhibit the same high sensitivity and specificity as a t-test with FDR control on sample sizes with 6–9 replicates in detecting at least two-fold change. CONCLUSIONS: Our results suggest that the two-way ANOVA methods using probe-level data are substantially more powerful tests for detecting differential gene expression than corresponding methods for probe-set level data.
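The abstract above mentions FDR control for testing thousands of genes at once but does not say which procedure was used. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    while controlling the false-discovery rate at level alpha.

    Assumption: Benjamini-Hochberg is used here for illustration; the
    abstract does not name the FDR procedure the authors applied.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-gene p-values, for demonstration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, alpha=0.05))
```

Because the rule is step-up, a gene whose p-value misses its own rank threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.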

    Study of gene function based on spatial co-expression in a high-resolution mouse brain atlas

    BACKGROUND: The Allen Brain Atlas (ABA) project systematically profiles three-dimensional high-resolution gene expression in postnatal mouse brains for thousands of genes. By unveiling gene behaviors at both the cellular and molecular levels, ABA is becoming a unique and comprehensive neuroscience data source for decoding enigmatic biological processes in the brain. Given the unprecedented volume and complexity of the in situ hybridization image data, data mining in this area is extremely challenging. Currently, the ABA database mainly serves as an online reference for visual inspection of individual genes; the rich information underlying this large data set is yet to be explored by novel computational tools. In this proof-of-concept study, we tested the hypothesis that genes sharing similar three-dimensional expression profiles in the mouse brain are likely to share similar biological functions. RESULTS: To address the pattern comparison challenge when analyzing the ABA database, we developed a robust image filtering method, dubbed the histogram-row-column (HRC) algorithm. We demonstrated that the HRC algorithm is sensitive enough to identify a manageable number of gene pairs by automatic pattern searching in a large original brain image collection. This tool enables us to quickly identify genes with similar in situ hybridization patterns in a semi-automatic fashion and consequently allows us to discover several gene expression patterns with expression neighborhoods containing genes of similar functional categories. CONCLUSION: Given a query brain image, HRC is a fully automated algorithm that is able to quickly mine a vast number of brain images and identify a manageable subset of genes that potentially share similar spatial co-distribution patterns for further visual inspection. A three-dimensional in situ hybridization pattern, if statistically significant, could serve as a fingerprint of certain gene function. Databases such as ABA provide a valuable data source for characterizing brain-related gene functions when armed with powerful image querying tools like HRC.

    In silico discovery of transcription regulatory elements in Plasmodium falciparum

    BACKGROUND: With the sequence of the Plasmodium falciparum genome and several global mRNA and protein life cycle expression profiling projects now completed, elucidating the underlying networks of transcriptional control important for the progression of the parasite life cycle is highly pertinent to the development of new anti-malarials. To date, relatively little is known regarding the specific mechanisms the parasite employs to regulate gene expression at the mRNA level, with studies of the P. falciparum genome sequence having revealed few cis-regulatory elements and associated transcription factors. Although it is possible the parasite may evoke mechanisms of transcriptional control drastically different from those used by other eukaryotic organisms, the extreme AT-rich nature of P. falciparum intergenic regions (~90% AT) presents significant challenges to in silico cis-regulatory element discovery. RESULTS: We have developed an algorithm called Gene Enrichment Motif Searching (GEMS) that uses a hypergeometric-based scoring function and a position-weight matrix optimization routine to identify with high-confidence regulatory elements in the nucleotide-biased and repeat sequence-rich P. falciparum genome. When applied to promoter regions of genes contained within 21 co-expression gene clusters generated from P. falciparum life cycle microarray data using the semi-supervised clustering algorithm Ontology-based Pattern Identification, GEMS identified 34 putative cis-regulatory elements associated with a variety of parasite processes including sexual development, cell invasion, antigenic variation and protein biosynthesis. Among these candidates were novel motifs, as well as many of the elements for which biological experimental evidence already exists in the Plasmodium literature. To provide evidence for the biological relevance of a cell invasion-related element predicted by GEMS, reporter gene and electrophoretic mobility shift assays were conducted. CONCLUSION: This GEMS analysis demonstrates that in silico regulatory element discovery can be successfully applied to challenging repeat-sequence-rich, base-biased genomes such as that of P. falciparum. The fact that regulatory elements were predicted from a diverse range of functional gene clusters supports the hypothesis that cis-regulatory elements play a role in the transcriptional control of many P. falciparum biological processes. The putative regulatory elements described represent promising candidates for future biological investigation into the underlying transcriptional control mechanisms of gene regulation in malaria parasites.
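The abstract above describes GEMS as using a hypergeometric-based scoring function without giving its exact form. The sketch below is therefore not the GEMS score itself; it shows the generic hypergeometric enrichment p-value such a score is typically built on: the probability of seeing at least `k` motif-containing promoters in a cluster of `n` genes, when `K` of the `N` promoters genome-wide contain the motif. The function name and all parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: total number of promoters considered (population size)
    K: promoters in the population that contain the motif
    n: promoters in the co-expression cluster (sample size)
    k: promoters in the cluster that contain the motif

    Assumption: this is a textbook enrichment test, not the exact
    scoring function used by GEMS.
    """
    upper = min(K, n)      # X cannot exceed either K or n
    total = comb(N, n)     # number of ways to draw the cluster
    # Sum the probability mass for every count from k up to the maximum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 5000 promoters, 200 with the motif,
    # a cluster of 50 genes of which 10 carry the motif.
    print(hypergeom_enrichment_p(5000, 200, 50, 10))
```

A small p-value indicates that the motif occurs in the cluster more often than random sampling from the genome would explain, which is the intuition behind enrichment-based motif scoring.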

    Identification of Small Molecule and Genetic Modulators of AON-Induced Dystrophin Exon Skipping by High-Throughput Screening

    One therapeutic approach to Duchenne Muscular Dystrophy (DMD) recently entering clinical trials aims to convert DMD phenotypes to that of a milder disease variant, Becker Muscular Dystrophy (BMD), by employing antisense oligonucleotides (AONs) targeting splice sites to induce exon skipping and restore partial dystrophin function. To search for small molecule and genetic modulators of AON-dependent and independent exon skipping, we screened ~10,000 known small molecule drugs, >17,000 cDNA clones, and >2,000 kinase-targeted siRNAs against a 5.6 kb luciferase minigene construct encompassing exon 71 to exon 73 of human dystrophin. As a result, we identified several enhancers of exon skipping, acting on both the reporter construct and endogenous dystrophin in mdx cells. Multiple mechanisms of action were identified, including histone deacetylase inhibition, tubulin modulation and pre-mRNA processing. Among others, the nucleolar protein NOL8 and staufen RNA binding protein homolog 2 (Stau2) were found to induce endogenous exon skipping in mdx cells in an AON-dependent fashion. An unexpected but recurrent theme observed in our screening efforts was the apparent link between the inhibition of cell cycle progression and the induction of exon skipping.

    A Systems-Based Analysis of Plasmodium vivax Lifecycle Transcription from Human to Mosquito

    Most of the 250 million malaria cases outside of Africa are caused by the parasite Plasmodium vivax. Although drugs can be used to treat P. vivax malaria, drug resistance is spreading and there is no available vaccine. Because this species cannot be readily grown in the laboratory, there are added challenges to understanding the function of the many hypothetical genes in the genome. We isolated transcriptional messages from parasites growing in human blood and in mosquitoes, labeled the messages, and measured how their levels changed across different parasite growth conditions. The data for 5,419 parasite genes show extensive changes as the parasite moves between human and mosquito and reveal highly expressed genes whose proteins might represent new therapeutic targets for experimental vaccines. We discover sets of genes that are likely to play a role in the earliest stages of hepatocyte infection. We find intriguing differences in the expression patterns of different blood stage parasites that may be related to host-response status.