Using context to improve protein domain identification
Background: Identifying domains in protein sequences is an important step in protein structural and functional annotation. Existing domain recognition methods typically evaluate each domain prediction independently of the rest. However, the majority of proteins are multidomain, and pairwise domain co-occurrences are highly specific and non-transitive.
Results: Here, we demonstrate how to exploit domain co-occurrence to boost weak domain predictions that appear in previously observed combinations, while penalizing higher-confidence domains if such combinations have never been observed. Our framework, Domain Prediction Using Context (dPUC), incorporates pairwise "context" scores between domains, along with traditional domain scores and thresholds, and improves domain prediction across a variety of organisms from bacteria to protozoa and metazoa. Among the genomes we tested, dPUC is most successful at improving predictions for the poorly annotated malaria parasite Plasmodium falciparum, for which over 38% of the genome is currently unannotated. Our approach enables high-confidence annotations in this organism and the identification of orthologs to many core machinery proteins conserved in all eukaryotes, including those involved in ribosomal assembly and other RNA processing events, which surprisingly had not been previously known.
Conclusions: Overall, our results demonstrate that this new context-based approach will provide significant improvements in domain and function prediction, especially for poorly understood genomes for which the need for additional annotations is greatest. Source code for the algorithm is available under a GPL open source license at http://compbio.cs.princeton.edu/dpuc/. Pre-computed results for our test organisms and a web server are also available at that location.
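The idea of combining per-domain scores with pairwise context scores can be illustrated with a toy exhaustive search over candidate domain sets. This is a minimal sketch, not the dPUC implementation: the function name `best_domain_set`, the score values, the `context` table and the fixed penalty for never-observed pairs are all hypothetical, and overlaps between domains and per-family thresholds are ignored.

```python
from itertools import combinations


def best_domain_set(candidates, context, unseen_penalty=-5.0):
    """Choose the subset of candidate domains maximizing domain + context score.

    candidates: dict mapping domain family -> its own (e.g. HMM) score.
    context: dict mapping frozenset({a, b}) -> pairwise context score for
        families observed together in known proteins (hypothetical values).
    unseen_penalty: score assigned to a pair never observed together
        (an assumption for illustration).
    """
    names = sorted(candidates)
    best, best_score = (), 0.0  # predicting no domains scores 0
    # Exhaustive search over all subsets: only feasible for a handful of candidates.
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            score = sum(candidates[d] for d in subset)
            for a, b in combinations(subset, 2):
                score += context.get(frozenset((a, b)), unseen_penalty)
            if score > best_score:
                best, best_score = subset, score
    return set(best), best_score
```

With candidates `{"A": 4.0, "B": -1.0, "C": 3.0}` and a single observed pair `{A, B}` with context score 3.0, the weak domain B is kept because it pairs with A, while C is dropped despite its higher own score, because A and C were never seen together: the selected set is `{"A", "B"}` with total score 6.0.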
Beyond the E-value: stratified statistics for protein domain prediction
E-values have been the dominant statistic for protein sequence analysis for
the past two decades: from identifying statistically significant local sequence
alignments to evaluating matches to hidden Markov models describing protein
domain families. Here we formally show that for "stratified" multiple
hypothesis testing problems, controlling the local False Discovery Rate (lFDR)
per stratum, or partition, yields the most predictions across the data at any
given threshold on the FDR or E-value over all strata combined. For the
important problem of protein domain prediction, a key step in characterizing
protein structure, function and evolution, we show that stratifying statistical
tests by domain family yields excellent results. We develop the first
FDR-estimating algorithms for domain prediction, and evaluate how well
thresholds based on q-values, E-values and lFDRs perform in domain prediction
using five complementary approaches for estimating empirical FDRs in this
context. We show that stratified q-value thresholds substantially outperform
E-values. Contradicting our theoretical results, q-values also outperform
lFDRs; however, our tests reveal a small but coherent subset of domain
families, biased towards models for specific repetitive patterns, for which
FDRs are greatly underestimated due to weaknesses in random sequence models.
Thresholds on lFDRs outperform q-values for the remaining families, which
have as-expected noise, suggesting that further improvements in domain
predictions can be achieved with improved modeling of random sequences.
Overall, our theoretical and empirical findings suggest that the use of
stratified q-values and lFDRs could result in improvements in a host of
structured multiple hypothesis testing problems arising in bioinformatics,
including genome-wide association studies, orthology prediction, motif
scanning, and multi-microarray analyses.
Comment: 31 pages, 8 figures, does not include supplementary file
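Stratifying tests by domain family means running the FDR procedure separately within each family rather than over all tests pooled. A minimal sketch, assuming each test's p-value and family label are known; for simplicity it uses Benjamini-Hochberg adjusted p-values, which coincide with Storey q-values when the null proportion pi0 is fixed at 1. The paper's own FDR and lFDR estimators are not reproduced here, and the function names are hypothetical.

```python
from collections import defaultdict


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values with pi0 = 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def stratified_qvalues(pvalues, strata):
    """Compute q-values separately within each stratum (e.g. domain family)."""
    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)
    q = [0.0] * len(pvalues)
    for idx in groups.values():
        stratum_q = bh_qvalues([pvalues[j] for j in idx])
        for i, qi in zip(idx, stratum_q):
            q[i] = qi
    return q
```

For p-values `[0.01, 0.02, 0.03, 0.04]` in a signal-rich family and `[0.5, 0.6, 0.7, 0.8]` in a family with only noise, the first family yields four predictions at q <= 0.05 when tested on its own but none when the two families are pooled, because pooling dilutes its small p-values with the other family's null tests.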
Selection of long oligonucleotides for gene expression microarrays using weighted rank-sum strategy
Background: The design of long oligonucleotides for spotted DNA microarrays requires detailed attention to ensure their optimal performance in the hybridization process. The main challenge is to select, for each genetic locus or gene in the genome, an optimal oligonucleotide element that is unique, devoid of internal structures and repetitive sequences, and has a Tm uniform with that of all other elements on the microarray. Currently, all publicly available programs for selecting long DNA oligonucleotides for microarrays use various combinations of cutoffs, in which each parameter (uniqueness, Tm, and secondary structure) is evaluated and filtered individually. The use of cutoffs can, however, lead to information loss and to the selection of suboptimal oligonucleotides, especially for genomes with an extreme distribution of GC content, a large proportion of repetitive sequences, or large gene families with highly homologous members.
Results: Here we present the program OligoRankPick, which uses a weighted rank-based strategy to select microarray oligonucleotide elements via an integer weighted linear function. This approach optimizes the selection criteria (weight score) for each gene individually, accommodating variable properties of the DNA sequence along the genome. The algorithm was tested on three microbial genomes: Escherichia coli, Saccharomyces cerevisiae and the human malaria parasite Plasmodium falciparum. In comparison to other published algorithms, OligoRankPick provides significant improvements in oligonucleotide design for all three genomes, with the most significant improvements observed in the microarray design for P. falciparum, whose genome is characterized by large fluctuations in GC content and abundant gene duplications.
Conclusion: OligoRankPick is an efficient tool for the design of long oligonucleotide DNA microarrays that does not rely on direct oligonucleotide exclusion by parameter cutoffs but instead optimizes all parameters in the context of each other. The weighted rank-sum strategy used by this algorithm provides high flexibility in oligonucleotide selection, accommodating the extreme variability of DNA sequence properties along the genomes of many organisms.
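The contrast between cutoff filtering and weighted rank-sum selection can be sketched for a single gene: each candidate oligo is ranked on every criterion, and the candidate with the lowest weighted sum of ranks is chosen, so no candidate is discarded outright for failing one parameter. This is a minimal sketch under stated assumptions, not OligoRankPick itself: the feature names `cross_hyb`, `tm` and `dg`, the weights (3, 2, 1) and the tie handling are hypothetical.

```python
def rank(values, reverse=False):
    """1-based ranks (1 = best); tied values share the best rank among them."""
    ordered = sorted(values, reverse=reverse)
    return [ordered.index(v) + 1 for v in values]


def pick_oligo(candidates, target_tm, weights=(3, 2, 1)):
    """Select one oligo for a gene by weighted rank-sum (hypothetical features).

    candidates: list of dicts with keys
        'cross_hyb': similarity to other genes, lower is better (uniqueness),
        'tm': melting temperature, closest to target_tm is better,
        'dg': hairpin free energy in kcal/mol, higher (less negative) is better.
    weights: integer weights for (uniqueness, Tm, structure); assumed values.
    """
    r_uniq = rank([c["cross_hyb"] for c in candidates])
    r_tm = rank([abs(c["tm"] - target_tm) for c in candidates])
    r_dg = rank([c["dg"] for c in candidates], reverse=True)
    w_u, w_t, w_s = weights
    scores = [w_u * u + w_t * t + w_s * s for u, t, s in zip(r_uniq, r_tm, r_dg)]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores
```

With three candidates where the most unique oligo also has the best Tm but the worst secondary structure, the default weights still select it, whereas a hard cutoff on free energy would have removed it before any comparison. Raising the structure weight changes the choice, which is the per-gene flexibility the weighting is meant to provide.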
Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains
Gene expression patterns have been shown to be highly variable between similar cell types; for example, laboratory and wild strains of Saccharomyces cerevisiae cultured under identical growth conditions exhibit a wide range of expression differences. We used a genome-wide approach to characterize transcriptional differences between strains of Plasmodium falciparum by profiling the transcriptome of the 48 h intraerythrocytic developmental cycle (IDC) for two strains, 3D7 and Dd2, and comparing these results to our prior work using the HB3 strain. These three strains originate from geographically diverse locations and possess distinct drug sensitivity phenotypes. Our goal was to identify transcriptional differences related to phenotypic properties of these strains, including immune evasion and drug sensitivity. We find that the highly streamlined transcriptome is remarkably well conserved among all three strains, and differences in gene expression occur mainly in genes coding for surface antigens involved in parasite–host interactions. Our analysis also detects several transcripts that are unique to individual strains and identifies large chromosomal deletions and highly polymorphic regions across strains. The majority of these genes are uncharacterized and have no homology to other species. These tractable transcriptional differences provide important phenotypes for these otherwise highly related strains of Plasmodium.
Adaptive interfaces for people with special needs
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-02481-8_117
Proceedings of the 10th International Work-Conference on Artificial Neural Networks, IWANN 2009 Workshops, Salamanca, Spain, June 10-12, 2009
This paper covers those aspects of modern interfaces that expand and enhance the way people interact with computers, such as multi-touch table systems, presence-detection LED displays and interactive virtualized real-life environments. It elaborates on how people with disabilities or other conditions benefit greatly from natural interaction as interfaces adapt to their needs; such interfaces can be focused on memory, cognitive or physical deficiencies. Applications scale to serve specific users with customized tools and options, and remain aware of the state and situation of the individual.
This work has been partly funded by the HADA project, number TIN2007-64718.
Transcriptional profiling defines histone acetylation as a regulator of gene expression during human-to-mosquito transmission of the malaria parasite Plasmodium falciparum
Transmission of the malaria parasite Plasmodium falciparum from the human to the mosquito is mediated by the intraerythrocytic gametocytes, which, once taken up during a blood meal, become activated to initiate sexual reproduction. Because gametocytes are the only parasite stages able to establish an infection in the mosquito, they are crucial for spreading the tropical disease. During gametocyte maturation, different repertoires of genes are switched on and off in a well-coordinated sequence, pointing to regulatory mechanisms of gene expression. While epigenetic gene control has been studied during erythrocytic schizogony of P. falciparum, little is known about this process during human-to-mosquito transmission of the parasite. To unveil the potential role of histone acetylation in gene expression in gametocytes, we carried out a microarray-based transcriptome analysis of gametocytes treated with the histone deacetylase inhibitor trichostatin A (TSA). TSA treatment impaired gametocyte maturation and led to histone hyperacetylation in these stages. Comparative transcriptomics identified 294 transcripts that were up-regulated more than 2-fold during gametocytogenesis following TSA treatment. In activated gametocytes, which were less sensitive to TSA, the transcript levels of 48 genes were increased. TSA treatment further led to repression of ~145 genes in immature and mature gametocytes and 7 genes in activated gametocytes. Up-regulated genes are mainly associated with functions in invasion, cytoadherence, and protein export, while down-regulated genes could particularly be assigned to transcription and translation. Chromatin immunoprecipitation demonstrated a link between gene activation and histone acetylation for selected genes. Among the genes up-regulated in TSA-treated mature gametocytes was a gene encoding the ring finger (RING)-domain protein PfRNF1, a putative E3 ligase of the ubiquitin-mediated signaling pathway.
Immunochemistry demonstrated PfRNF1 expression mainly in the sexual stages of P. falciparum, with peak expression in stage II gametocytes, where the protein localized to the nucleus and cytoplasm. The Pfrnf1 promoter and coding regions were associated with acetylated histones, and TSA treatment resulted in increased PfRNF1 levels. Our combined data point to an essential role of histone acetylation in gene regulation in gametocytes, which can be exploited for malaria transmission-blocking interventions.
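The "more than 2-fold" criterion is a ratio threshold on expression levels, equivalent to |log2(treated/control)| > 1. A minimal sketch of that classification, assuming one normalized, positive expression value per gene and condition; the function name and gene IDs are hypothetical, and a real microarray analysis would also use replicates and a statistical test rather than the ratio alone.

```python
import math


def classify_fold_change(control, treated, min_fold=2.0):
    """Label each gene 'up', 'down' or 'unchanged' by fold change.

    control, treated: dicts mapping gene ID -> normalized, positive expression.
    A gene is 'up' if treated/control exceeds min_fold and 'down' if
    control/treated exceeds min_fold, i.e. |log2 ratio| > log2(min_fold).
    """
    cutoff = math.log2(min_fold)
    labels = {}
    for gene, c in control.items():
        lfc = math.log2(treated[gene] / c)
        if lfc > cutoff:
            labels[gene] = "up"
        elif lfc < -cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels
```

Note that a gene exactly doubling (log2 ratio of 1.0) is labeled unchanged here, matching the strict "more than 2-fold" wording.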
Unprecedented plant species loss after a decade in fragmented subtropical Chaco Serrano forests.
Current biodiversity loss is mostly caused by anthropogenic habitat loss and fragmentation, climate change, and resource exploitation. Measuring the balance of species loss and gain in remaining fragmented landscapes over time is a central research challenge. In 2013 we resurveyed plant species richness in the same plots sampled in 2003 across 18 forest fragments of different sizes in the Chaco Serrano forest in Argentina. While the area of these forest remnants remained constant, their surrounding forest cover changed over this time period. We compared plant species richness between the two sampling years and calculated the proportion of species lost and gained at forest edges and interiors. As in 2003, we found a positive relationship between fragment area and plant richness in 2013, with a similar slope in both years. However, we detected a net decrease of 24% in species richness across all forest fragments, implying an unprecedentedly high rate and magnitude of species loss, driven mainly by non-woody, short-lived species. The proportion of lost and gained species was higher at forest edges than in forest interiors. Importantly, fragment area interacted with the percent change in surrounding forest cover to explain the proportion of species lost. Small forest fragments showed a relatively constant proportion of species loss regardless of changes in surrounding forest cover, whereas in larger fragments the proportion of species lost increased when surrounding forest cover decreased. We show that even when fragment area is preserved, habitat quality and availability in the surroundings are of fundamental importance in shaping the extinction and immigration dynamics of plant species in any given forest remnant.
Because the Chaco Serrano forest has already lost 94% of its original cover, we argue that plant extinctions will continue through the coming decades unless active management actions are taken to increase native forest areas.
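The Chaco Serrano abstract reports proportions of species lost and gained and a net change in richness between the 2003 and 2013 surveys. A minimal sketch of that bookkeeping for a single plot, assuming the species recorded in each survey are available as sets; the function name `turnover` and the use of the earlier richness as the denominator for all three proportions are assumptions, not necessarily the study's definitions.

```python
def turnover(before, after):
    """Proportions of species lost and gained, and net richness change.

    before, after: sets of species recorded in the earlier and later survey.
    All proportions are relative to the earlier richness (an assumption).
    """
    n0 = len(before)
    if n0 == 0:
        raise ValueError("earlier survey recorded no species")
    lost = len(before - after) / n0    # present before, absent after
    gained = len(after - before) / n0  # absent before, present after
    net = (len(after) - n0) / n0       # equals gained - lost
    return lost, gained, net
```

With this choice of denominator the net change is simply the gain proportion minus the loss proportion, so a net decrease such as the reported 24% can hide substantial turnover in both directions.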