167 research outputs found
Parametric inference of recombination in HIV genomes
Recombination is an important event in the evolution of HIV. It affects the
global spread of the pandemic as well as evolutionary escape from host immune
response and from drug therapy within single patients. Comprehensive
computational methods are needed for detecting recombinant sequences in large
databases, and for inferring the parental sequences.
We present a hidden Markov model to annotate a query sequence as a
recombinant of a given set of aligned sequences. Parametric inference is used
to determine all optimal annotations for all parameters of the model. We show
that the inferred annotations recover most features of established hand-curated
annotations. Thus, parametric analysis of the hidden Markov model is feasible
for HIV full-length genomes, and it improves the detection and annotation of
recombinant forms.
All computational results, reference alignments, and C++ source code are
available at http://bio.math.berkeley.edu/recombination/.
Comment: 20 pages, 5 figures
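The annotation step described above can be illustrated with a minimal sketch of Viterbi decoding in a simple "jumping" hidden Markov model. This is not the authors' C++ implementation, and it does not perform parametric inference: the two parameters are fixed rather than explored over all values. The function name `annotate_recombinant`, the parameters `switch_prob` and `mismatch_prob`, and the simplified match/mismatch emission model are all assumptions made for illustration.

```python
import math


def annotate_recombinant(query, references, switch_prob=0.01, mismatch_prob=0.05):
    """Return, for each alignment column, the index of the most likely parent.

    Hidden state = which aligned reference the query is copied from.
    All names and default values here are illustrative assumptions.
    """
    n, length = len(references), len(query)
    if n < 2:
        raise ValueError("need at least two reference sequences")
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob / (n - 1))
    log_match = math.log(1.0 - mismatch_prob)
    log_mismatch = math.log(mismatch_prob)

    def emit(k, j):
        # Simplified emission: the query either copies reference k or not.
        return log_match if references[k][j] == query[j] else log_mismatch

    def trans(p, k):
        return log_stay if p == k else log_switch

    # Uniform prior over parents at the first column.
    score = [math.log(1.0 / n) + emit(k, 0) for k in range(n)]
    back = []
    for j in range(1, length):
        new_score, pointers = [], []
        for k in range(n):
            best_prev = max(range(n), key=lambda p: score[p] + trans(p, k))
            new_score.append(score[best_prev] + trans(best_prev, k) + emit(k, j))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back the single best path for these fixed parameters.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    refs = ["AAAAAAAAAA", "CCCCCCCCCC"]
    print(annotate_recombinant("AAAAACCCCC", refs))
```

Rerunning the sketch with different values of `switch_prob` and `mismatch_prob` gives a rough sense of how the annotation depends on the parameters, which is the dependence that parametric inference characterizes exhaustively.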
Crafty Sailors, Unruly Seas: Margaret Cohen’s Oceanic History of the Novel
The Novel and the Sea by Margaret Cohen. Translation/Transnation, edited by Emily Apter. Princeton, NJ: Princeton University Press, 2010. Pp. xiii + 306, 30 illustrations. $39.50 cloth
Parametric Alignment of Drosophila Genomes
The classic algorithms of Needleman–Wunsch and Smith–Waterman find a
maximum a posteriori probability alignment for a pair hidden Markov model
(PHMM). In order to process large genomes that have undergone complex genome
rearrangements, almost all existing whole genome alignment methods apply fast
heuristics to divide genomes into small pieces which are suitable for
Needleman–Wunsch alignment. In these alignment methods, it is standard
practice to fix the parameters and to produce a single alignment for subsequent
analysis by biologists.
Our main result is the construction of a whole genome parametric alignment of
Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment
resolves the issue of robustness to changes in parameters by finding all
optimal alignments for all possible parameters in a PHMM. Our alignment draws
on existing heuristics for dividing whole genomes into small pieces for
alignment, and it relies on advances we have made in computing convex polytopes
that allow us to parametrically align non-coding regions using biologically
realistic models. We demonstrate the utility of our parametric alignment for
biological inference by showing that cis-regulatory elements are more conserved
between Drosophila melanogaster and Drosophila pseudoobscura than previously
thought. We also show how whole genome parametric alignment can be used to
quantitatively assess the dependence of branch length estimates on alignment
parameters.
The alignment polytopes, software, and supplementary material can be
downloaded at http://bio.math.berkeley.edu/parametric/.
Comment: 19 pages, 3 figures
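For readers unfamiliar with the fixed-parameter baseline that parametric alignment generalizes, the following is a minimal sketch of Needleman–Wunsch global alignment with a linear gap penalty. It is not the authors' software and it does not compute alignment polytopes. The scoring values `match`, `mismatch`, and `gap` are illustrative assumptions, chosen only to show that a single parameter choice yields a single alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Scoring parameters are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back one optimal alignment (diagonal moves preferred on ties).
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Changing `gap` or `mismatch` can change which alignment is optimal. A parametric alignment records every alignment that is optimal for some parameter choice, rather than the one returned here.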
Gene tree reconciliation: new developments in Bayesian concordance analysis with BUCKy
When phylogenetic trees inferred from different genes are incongruent, several methods are available to reconcile gene trees and extract the shared phylogenetic information from the sequence data. Bayesian Concordance Analysis, implemented in BUCKy, aims to extract the vertical signal and to infer clusters of genes that share the same tree topology. The new version of BUCKy includes a quartet-based estimate of the species tree with branch lengths in coalescent units
Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns
BACKGROUND: The signals that determine the specificity and efficiency of splicing are multiple and complex, and are not fully understood. Among other factors, the relative contributions of different mechanisms appear to depend on intron size inasmuch as long introns might hinder the activity of the spliceosome through interference with the proper positioning of the intron-exon junctions. Indeed, it has been shown that the information content of splice sites positively correlates with intron length in the nematode, Drosophila, and fungi. We explored the connections between the length of vertebrate introns, the strength of splice sites, exonic splicing signals, and evolution of flanking exons. RESULTS: A compensatory relationship is shown to exist between different types of signals, namely, the splice sites and the exonic splicing enhancers (ESEs). In the range of relatively short introns (approximately, < 1.5 kilobases in length), the enhancement of the splicing signals for longer introns was manifest in the increased concentration of ESEs. In contrast, for longer introns, this effect was not detectable, and instead, an increase in the strength of the donor and acceptor splice sites was observed. Conceivably, accumulation of A-rich ESE motifs beyond a certain limit is incompatible with functional constraints operating at the level of protein sequence evolution, which leads to compensation in the form of evolution of the splice sites themselves toward greater strength. In addition, however, a correlation between sequence conservation in the exon ends and intron length, particularly, in synonymous positions, was observed throughout the entire length range of introns. Thus, splicing signals other than the currently defined ESEs, i.e., potential new classes of ESEs, might exist in exon sequences, particularly, those that flank long introns. 
CONCLUSION: Several weak but statistically significant correlations were observed between vertebrate intron length, splice site strength, and potential exonic splicing signals. Taken together, these findings attest to a compensatory relationship between splice sites and exonic splicing signals, depending on intron length
Detectability of Varied Hybridization Scenarios using Genome-Scale Hybrid Detection Methods
Hybridization events complicate the accurate reconstruction of phylogenies,
as they lead to patterns of genetic heritability that are unexpected under
traditional, bifurcating models of species trees. This has led to the
development of methods to infer these varied hybridization events, both methods
that reconstruct networks directly, and summary methods that predict individual
hybridization events. However, a lack of empirical comparisons between methods,
especially for large networks with varied hybridization scenarios,
hinders their practical use. Here, we provide a comprehensive review of popular
summary methods: TICR, MSCquartets, HyDe, Patterson's D-Statistic (ABBA-BABA),
D3, and Dp. TICR and MSCquartets are based on quartet concordance factors
gathered from gene tree topologies, whereas Patterson's D-Statistic, D3, and Dp use
site pattern frequencies to identify hybridization events. We then use
simulated data to address questions of method accuracy and ideal use scenarios
by testing methods against complex networks which depict gene flow events that
differ in depth (timing), quantity (single vs. multiple, overlapping
hybridizations), and rate of gene flow. We find that deeper or multiple
hybridization events may introduce noise and weaken the signal of
hybridization, leading to higher false negative rates across methods. Despite
some forms of hybridization eluding quartet-based detection methods,
MSCquartets displays high precision in most scenarios. Although HyDe yields
high false negative rates when tested on hybridizations involving ghost
lineages, it is the only method able to separate hybrid from parent
signals. Lastly, we test the methods on ultraconserved elements from the bee
subfamily Nomiinae, finding the possibility of hybridization events between
clades which correspond to regions of poor support in the species tree
estimated in the original study
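As an illustration of the site pattern approach mentioned above, the following minimal sketch computes Patterson's D-Statistic, D = (ABBA - BABA) / (ABBA + BABA), from four aligned sequences. It is not the implementation evaluated in the study. The function name `patterson_d` and the one-sequence-per-taxon input are assumptions for illustration, and the sketch omits the significance testing that a real analysis would need.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Return (D, abba, baba) for four aligned sequences.

    The outgroup allele is treated as ancestral (A) and the P3 allele,
    when it differs from the outgroup, as derived (B).
    This is an illustrative sketch with hypothetical names.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if c == o:
            continue  # P3 must carry the derived allele
        if a == o and b == c:
            abba += 1  # pattern A B B A
        elif a == c and b == o:
            baba += 1  # pattern B A B A
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba), abba, baba


if __name__ == "__main__":
    # Two ABBA sites, one BABA site, one uninformative site.
    print(patterson_d("ACTA", "GTAA", "GTTA", "ACAA"))
```

Under a tree with no gene flow, ABBA and BABA patterns are expected in equal numbers, so D near zero is the null expectation. A clear excess of one pattern suggests gene flow between P3 and either P1 or P2.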
Reflective Diary for Professional Development of Novice Teachers
Many starting teachers of computer science have great professional skill but often lack pedagogical training. Since providing expert mentorship directly during their lessons would be quite costly, institutions usually offer separate teacher training sessions for novice instructors. However, the reflection on teaching performed with a significant delay after the taught lesson limits the possible impact on teachers. To bridge this gap, we introduced a weekly semi-structured reflective practice to supplement the teacher training sessions at our faculty. We created a paper diary that guides the starting teachers through the process of reflection. Over the course of the semester, the diary poses questions of increasing complexity while also functioning as a reference to the topics covered in teacher training. Piloting the diary on a group of 25 novice teaching assistants resulted in overwhelmingly positive responses and provided the teacher training sessions with valuable input for discussion. The diary also turned out to be applicable in a broader context: it was appreciated and used by several experienced university teachers from multiple faculties and even some high-school teachers. The diary is freely available online, including source and print versions
Introducing Solid Foods to Infants in the Asia Pacific Region
For infants’ optimal growth and development, the introduction of nutritionally suitable solid foods at the appropriate time is essential. However, less attention has been paid to this stage of infant life than to breastfeeding initiation and duration. The practice of introducing solid foods, including the types of foods given to infants, in the Asia Pacific region was reviewed. In total, nine studies using the same questionnaire on infant feeding practices were analysed to gain a better understanding of trends in the introduction of solid foods in this region. All studies showed a less than optimal duration of exclusive breastfeeding, indicating that solid foods were introduced earlier than recommended by the WHO. Most mothers used rice or rice products as the first feed. In many studies, the timing of introducing solid foods was associated with breastfeeding duration. Compared with the Recommended Nutrient Intakes for infants aged above six months, rice/rice products are of lower energy density and have insufficient micronutrients unless they have been fortified. Although the timing of introducing solid foods to infants is important in terms of preventing later health problems, the quality of the foods should also be considered. Recommendations to improve the introduction of solid foods include discouraging prelacteal feeding, facilitating breastfeeding education, and providing better information on healthier food choices for infants
A Genome-Wide Map of Conserved MicroRNA Targets in C. elegans
Background: Metazoan miRNAs regulate protein-coding genes by binding the 3′ UTR of cognate mRNAs. Identifying targets for the 115 known C. elegans miRNAs is essential for understanding their function. Results: By using a new version of PicTar and sequence alignments of three nematodes, we predict that miRNAs regulate at least 10% of C. elegans genes through conserved interactions. We have developed a new experimental pipeline to assay 3′ UTR-mediated posttranscriptional gene regulation via an endogenous reporter expression system amenable to high-throughput cloning, demonstrating the utility of this system using one of the most intensely studied miRNAs, let-7. Our expression analyses uncover several new potential let-7 targets and suggest a new let-7 activity in head muscle and neurons. To explore genome-wide trends in miRNA function, we analyzed functional categories of predicted target genes, finding that one-third of C. elegans miRNA target gene sets are enriched for specific functional annotations. We have also integrated miRNA target predictions with other functional genomic data from C. elegans. Conclusions: At least 10% of C. elegans genes are predicted miRNA targets, and a number of nematode miRNAs seem to regulate biological processes by targeting functionally related genes. We have also developed and successfully utilized an in vivo system for testing miRNA target predictions in likely endogenous expression domains. The thousands of genome-wide miRNA target predictions for nematodes, humans, and flies are available from the PicTar website and are linked to an accessible graphical network-browsing tool allowing exploration of miRNA target predictions in the context of various functional genomic data resources
RNA-Seq gene expression estimation with read mapping uncertainty
Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically
- …