178 research outputs found

    Beyond the E-value: stratified statistics for protein domain prediction

    E-values have been the dominant statistic for protein sequence analysis for the past two decades: from identifying statistically significant local sequence alignments to evaluating matches to hidden Markov models describing protein domain families. Here we formally show that for "stratified" multiple hypothesis testing problems, controlling the local False Discovery Rate (lFDR) per stratum, or partition, yields the most predictions across the data at any given threshold on the FDR or E-value over all strata combined. For the important problem of protein domain prediction, a key step in characterizing protein structure, function and evolution, we show that stratifying statistical tests by domain family yields excellent results. We develop the first FDR-estimating algorithms for domain prediction, and evaluate how well thresholds based on q-values, E-values and lFDRs perform in domain prediction using five complementary approaches for estimating empirical FDRs in this context. We show that stratified q-value thresholds substantially outperform E-values. Contradicting our theoretical results, q-values also outperform lFDRs; however, our tests reveal a small but coherent subset of domain families, biased towards models for specific repetitive patterns, for which FDRs are greatly underestimated due to weaknesses in random sequence models. lFDR thresholds outperform q-values for the remaining families, which have as-expected noise, suggesting that further improvements in domain predictions can be achieved with improved modeling of random sequences. Overall, our theoretical and empirical findings suggest that the use of stratified q-values and lFDRs could result in improvements in a host of structured multiple hypothesis testing problems arising in bioinformatics, including genome-wide association studies, orthology prediction, motif scanning, and multi-microarray analyses.
    Comment: 31 pages, 8 figures, does not include supplementary file
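As a rough illustration of what "stratifying by domain family" can mean in practice, the sketch below computes q-values separately within each stratum. It uses the standard Benjamini-Hochberg procedure as a stand-in; the abstract does not say which q-value estimator the authors use, and the function names and the `(stratum, p_value)` input shape are assumptions made for this example only.

```python
from collections import defaultdict


def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values for one list of p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def stratified_qvalues(hits):
    """hits: list of (stratum, p_value) pairs (hypothetical input shape).

    Returns one q-value per hit, computed only against the other
    hits in the same stratum (for example, the same domain family).
    """
    by_stratum = defaultdict(list)
    for idx, (stratum, p) in enumerate(hits):
        by_stratum[stratum].append((idx, p))
    result = [None] * len(hits)
    for members in by_stratum.values():
        qs = bh_qvalues([p for _, p in members])
        for (idx, _), q in zip(members, qs):
            result[idx] = q
    return result
```

A threshold such as `q <= 0.05` would then be applied to the per-stratum q-values rather than to a single pooled list.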

    Selection of long oligonucleotides for gene expression microarrays using weighted rank-sum strategy

    Background: The design of long oligonucleotides for spotted DNA microarrays requires detailed attention to ensure their optimal performance in the hybridization process. The main challenge is to select an optimal oligonucleotide element that represents each genetic locus/gene in the genome, is unique, is devoid of internal structures and repetitive sequences, and has a Tm uniform with all other elements on the microarray. Currently, all of the publicly available programs for DNA long oligonucleotide microarray selection utilize various combinations of cutoffs in which each parameter (uniqueness, Tm, and secondary structure) is evaluated and filtered individually. The use of the cutoffs can, however, lead to information loss and to selection of suboptimal oligonucleotides, especially for genomes with extreme distribution of the GC content, a large proportion of repetitive sequences or the presence of large gene families with highly homologous members.
    Results: Here we present the program OligoRankPick, which uses a weighted rank-based strategy to select microarray oligonucleotide elements via an integer weighted linear function. This approach optimizes the selection criteria (weight score) for each gene individually, accommodating variable properties of the DNA sequence along the genome. The designed algorithm was tested using three microbial genomes: Escherichia coli, Saccharomyces cerevisiae and the human malaria parasite species Plasmodium falciparum. In comparison to other published algorithms, OligoRankPick provides significant improvements in oligonucleotide design for all three genomes, with the most significant improvements observed in the microarray design for P. falciparum, whose genome is characterized by large fluctuations of GC content and abundant gene duplications.
    Conclusion: OligoRankPick is an efficient tool for the design of long oligonucleotide DNA microarrays which does not rely on direct oligonucleotide exclusion by parameter cutoffs but instead optimizes all parameters in context of each other. The weighted rank-sum strategy utilized by this algorithm provides high flexibility of oligonucleotide selection which accommodates extreme variability of DNA sequence properties along genomes of many organisms.
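The general idea of ranking candidates per criterion and combining the ranks with integer weights, instead of filtering by hard cutoffs, might be sketched as follows. This is not OligoRankPick's actual scoring function: the criterion names (`cross_hyb`, `tm_dev`, `hairpin`), the weights, and both function names are hypothetical, and each criterion is assumed to be a penalty where lower is better.

```python
def rank_positions(values):
    """Rank 0 = best (smallest value); ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks


def pick_by_weighted_rank_sum(candidates, weights):
    """Return the index of the candidate with the lowest weighted rank sum.

    candidates: list of dicts mapping criterion name -> penalty value.
    weights: dict mapping criterion name -> integer weight.
    """
    totals = [0] * len(candidates)
    for criterion, weight in weights.items():
        ranks = rank_positions([c[criterion] for c in candidates])
        for i, r in enumerate(ranks):
            totals[i] += weight * r
    return min(range(len(candidates)), key=lambda i: totals[i])


# Three hypothetical candidate oligonucleotides for one gene.
candidates = [
    {"cross_hyb": 0.1, "tm_dev": 3.0, "hairpin": 2.0},
    {"cross_hyb": 0.2, "tm_dev": 0.5, "hairpin": 1.0},
    {"cross_hyb": 0.9, "tm_dev": 0.1, "hairpin": 0.0},
]
best_unique = pick_by_weighted_rank_sum(
    candidates, {"cross_hyb": 3, "tm_dev": 1, "hairpin": 1}
)
best_equal = pick_by_weighted_rank_sum(
    candidates, {"cross_hyb": 1, "tm_dev": 1, "hairpin": 1}
)
```

Because no candidate is discarded outright, every gene still gets a pick even when none of its candidates would pass a fixed cutoff; changing the weights changes which trade-off wins.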

    Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains

    Gene expression patterns have been demonstrated to be highly variable between similar cell types; for example, lab strains and wild strains of Saccharomyces cerevisiae cultured under identical growth conditions exhibit a wide range of expression differences. We have used a genome-wide approach to characterize transcriptional differences between strains of Plasmodium falciparum by profiling the transcriptome of the 48 h intraerythrocytic developmental cycle (IDC) for two strains, 3D7 and Dd2, and comparing these results to our prior work using the HB3 strain. These three strains originate from geographically diverse locations and possess distinct drug sensitivity phenotypes. Our goal was to identify transcriptional differences related to phenotypic properties of these strains including immune evasion and drug sensitivity. We find that the highly streamlined transcriptome is remarkably well conserved among all three strains, and differences in gene expression occur mainly in genes coding for surface antigens involved in parasite–host interactions. Our analysis also detects several transcripts that are unique to individual strains and identifies large chromosomal deletions and highly polymorphic regions across strains. The majority of these genes are uncharacterized and have no homology to other species. These tractable transcriptional differences provide important phenotypes for these otherwise highly related strains of Plasmodium.

    Adaptive interfaces for people with special needs

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-02481-8_117
    Proceedings of 10th International Work-Conference on Artificial Neural Networks, IWANN 2009 Workshops, Salamanca, Spain, June 10-12, 2009
    This paper covers those aspects of modern interfaces which expand and enhance the way in which people interact with computers, like multi-touch table systems, presence-detection led displays and interactive virtualized real-life environments. It elaborates on how disabled or conditioned people take great advantage of natural interaction as interfaces adapt to their needs; interfaces which can be focused towards memory, cognitive or physical deficiencies. Applications size-up to serve specific users with customized tools and options, and are aware while taking into account the state and situation of the individual.
    This work has been partly funded by HADA project number TIN2007 – 64718

    Transcriptional profiling defines histone acetylation as a regulator of gene expression during human-to-mosquito transmission of the malaria parasite Plasmodium falciparum

    Transmission of the malaria parasite Plasmodium falciparum from the human to the mosquito is mediated by the intraerythrocytic gametocytes, which, once taken up during a blood meal, become activated to initiate sexual reproduction. Because gametocytes are the only parasite stages able to establish an infection in the mosquito, they are crucial for spreading the tropical disease. During gametocyte maturation, different repertoires of genes are switched on and off in a well-coordinated sequence, pointing to regulatory mechanisms of gene expression. While epigenetic gene control has been studied during erythrocytic schizogony of P. falciparum, little is known about this process during human-to-mosquito transmission of the parasite. To unveil the potential role of histone acetylation during gene expression in gametocytes, we carried out a microarray-based transcriptome analysis on gametocytes treated with the histone deacetylase inhibitor trichostatin A (TSA). TSA-treatment impaired gametocyte maturation and led to histone hyper-acetylation in these stages. Comparative transcriptomics identified 294 transcripts that were more than 2-fold up-regulated during gametocytogenesis following TSA-treatment. In activated gametocytes, which were less sensitive to TSA, the transcript levels of 48 genes were increased. TSA-treatment further led to repression of ~145 genes in immature and mature gametocytes and 7 genes in activated gametocytes. Up-regulated genes are mainly associated with functions in invasion, cytoadherence, and protein export, while down-regulated genes could particularly be assigned to transcription and translation. Chromatin immunoprecipitation demonstrated a link between gene activation and histone acetylation for selected genes. Among the genes up-regulated in TSA-treated mature gametocytes was a gene encoding the ring finger (RING)-domain protein PfRNF1, a putative E3 ligase of the ubiquitin-mediated signaling pathway. Immunochemistry demonstrated PfRNF1 expression mainly in the sexual stages of P. falciparum with peak expression in stage II gametocytes, where the protein localized to the nucleus and cytoplasm. Pfrnf1 promoter and coding regions associated with acetylated histones, and TSA-treatment resulted in increased PfRNF1 levels. Our combined data point to an essential role of histone acetylation for gene regulation in gametocytes, which can be exploited for malaria transmission-blocking interventions.

    Unprecedented plant species loss after a decade in fragmented subtropical Chaco Serrano forests.

    Current biodiversity loss is mostly caused by anthropogenic habitat loss and fragmentation, climate change, and resource exploitation. Measuring the balance of species loss and gain in remaining fragmented landscapes throughout time entails a central research challenge. In 2013 we resurveyed plant species richness in the same plots of a previous sampling conducted in 2003 across 18 forest fragments of different sizes of the Chaco Serrano forest in Argentina. While the area of these forest remnants was kept constant, their surrounding forest cover changed over this time period. We compared plant species richness of both sampling years and calculated the proportion of species loss and gain at forest edges and interiors. As in 2003, we found a positive relationship between fragment area and plant richness in 2013 and both years showed a similar slope. However, we detected a net decrease of 24% in species richness across all forest fragments, implying an unprecedentedly high rate and magnitude of species loss driven mainly by non-woody, short-lived species. There was a higher proportion of lost and gained species at forest edges than in forest interiors. Importantly, fragment area interacted with percent change in surrounding forest cover to explain the proportion of species lost. Small forest fragments showed a relatively constant proportion of species loss regardless of any changes in surrounding forest cover, whereas in larger fragments the proportion of species lost increased when surrounding forest cover decreased. We show that despite preserving fragment area, habitat quality and availability in the surroundings are of fundamental importance in shaping extinction and immigration dynamics of plant species at any given forest remnant. Because the Chaco Serrano forest has already lost 94% of its original cover, we argue that plant extinctions will continue through the coming decades unless active management actions are taken to increase native forest areas.