439 research outputs found

    Lassoing and corraling rooted phylogenetic trees

    The construction of a dendrogram on a set of individuals is a key component of a genome-wide association study. However, even with modern sequencing technologies, the distances on the individuals required for the construction of such a structure may not always be reliable, making it tempting to exclude them from an analysis. This, in turn, results in an input set for dendrogram construction that consists of only partial distance information, which raises the following fundamental question: for what subset of its leaf set can we uniquely reconstruct the dendrogram from the distances that it induces on that subset? By formalizing a dendrogram in terms of an edge-weighted, rooted phylogenetic tree on a pre-given finite set X with |X|>2 whose edge-weighting is equidistant, and a set of partial distances on X in terms of a set L of 2-subsets of X, we investigate this problem in terms of when such a tree is lassoed, that is, uniquely determined by the elements in L. For this we consider four different formalizations of the idea of "uniquely determining", giving rise to four distinct types of lassos. We present characterizations for all of them in terms of the child-edge graphs of the interior vertices of such a tree. Our characterizations imply in particular that, if the tree in question is binary, then all four types of lasso must coincide.
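
    As a rough illustration of the setting (not the paper's characterization of lassos), the sketch below stores distances on 2-subsets of X, checks the three-point condition that distances induced by an equidistant tree satisfy, and restricts the distances to a chosen set of 2-subsets. The leaf names, distance values, and function names are all hypothetical.

```python
from itertools import combinations

def is_ultrametric(dist, labels, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances agree."""
    for a, b, c in combinations(labels, 3):
        d_ab = dist[frozenset((a, b))]
        d_ac = dist[frozenset((a, c))]
        d_bc = dist[frozenset((b, c))]
        _, middle, largest = sorted((d_ab, d_ac, d_bc))
        if abs(largest - middle) > tol:
            return False
    return True

def restrict(dist, pairs):
    """Keep only the distances on the listed 2-subsets (the partial information L)."""
    return {frozenset(p): dist[frozenset(p)] for p in pairs}

# Hypothetical equidistant tree with topology ((a,b),(c,d)).
EXAMPLE_LABELS = ["a", "b", "c", "d"]
EXAMPLE_DIST = {
    frozenset(("a", "b")): 2.0,
    frozenset(("c", "d")): 4.0,
    frozenset(("a", "c")): 6.0,
    frozenset(("a", "d")): 6.0,
    frozenset(("b", "c")): 6.0,
    frozenset(("b", "d")): 6.0,
}

if __name__ == "__main__":
    print(is_ultrametric(EXAMPLE_DIST, EXAMPLE_LABELS))
    print(restrict(EXAMPLE_DIST, [("a", "b"), ("b", "c")]))
```

    Whether a given restriction still determines the tree is exactly the question the abstract studies; this sketch only sets up the data and does not decide it.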

    Hot topics, urgent priorities, and ensuring success for racial/ethnic minority young investigators in academic pediatrics.

    Background: The number of racial/ethnic minority children will exceed the number of white children in the USA by 2018. Although 38% of Americans are minorities, only 12% of pediatricians, 5% of medical-school faculty, and 3% of medical-school professors are minorities. Furthermore, only 5% of all R01 applications for National Institutes of Health grants are from African-American, Latino, and American Indian investigators. Prompted by the persistent lack of diversity in the pediatric and biomedical research workforces, the Academic Pediatric Association Research in Academic Pediatrics Initiative on Diversity (RAPID) was initiated in 2012. RAPID targets applicants who are members of an underrepresented minority group (URM), disabled, or from a socially, culturally, economically, or educationally disadvantaged background. The program, which consists of both a research project and career and leadership development activities, includes an annual career-development and leadership conference that is open to any resident, fellow, or junior faculty member from a URM, disabled, or disadvantaged background who is interested in a career in academic general pediatrics. Methods: As part of the annual RAPID conference, a Hot Topic Session is held in which the young investigators spend several hours developing a list of hot topics on the most useful faculty and career-development issues. These hot topics are then posed in the form of six "burning questions" to the RAPID National Advisory Committee (composed of accomplished, nationally recognized senior investigators who are seasoned mentors), the RAPID Director and Co-Director, and the keynote speaker. Results/conclusions: The six compelling questions posed by the 10 young investigators, along with the responses of the senior conference leadership, provide a unique resource and "survival guide" for ensuring the academic success and optimal career development of young investigators in academic pediatrics from diverse backgrounds.
    A rich conversation ensued on the topics addressed: negotiating for protected research time, career trajectories as academic institutions move away from an emphasis on tenure-track positions, how "non-academic" products fit into career development, racism and discrimination in academic medicine and how to address them, coping with isolation as a minority faculty member, and how best to mentor the next generation of academic physicians.

    Evolution of Genes Neighborhood Within Reconciled Phylogenies: An Ensemble Approach

    Context: The reconstruction of evolutionary scenarios for whole genomes in terms of genome rearrangements is a fundamental problem in evolutionary and comparative genomics. The DeCo algorithm, recently introduced by Bérard et al., computes parsimonious evolutionary scenarios for gene adjacencies from pairs of reconciled gene trees. However, as with many combinatorial optimization algorithms, there can exist many co-optimal, or slightly sub-optimal, evolutionary scenarios that deserve to be considered. Contribution: We extend the DeCo algorithm to sample evolutionary scenarios from the whole solution space under the Boltzmann distribution, and also to compute Boltzmann probabilities for specific ancestral adjacencies. Results: We apply our algorithms to a dataset of mammalian gene trees and adjacencies, and observe a significant reduction in the number of syntenic conflicts in the resulting ancestral gene adjacencies.
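
    To make the Boltzmann distribution concrete, here is a minimal sketch that assumes the candidate scenarios have already been enumerated with their parsimony costs. This is a simplification for illustration: the abstract describes an extension of the DeCo algorithm, which this code does not reproduce. The function names and the `kT` parameter name are hypothetical.

```python
import math
import random

def boltzmann_probabilities(costs, kT=1.0):
    """Probability of each scenario, proportional to exp(-cost / kT)."""
    lowest = min(costs)  # shift by the minimum for numerical stability
    weights = [math.exp(-(c - lowest) / kT) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]

def sample_scenario(costs, kT=1.0, rng=None):
    """Draw the index of one scenario under the Boltzmann distribution."""
    rng = rng or random.Random()
    probs = boltzmann_probabilities(costs, kT)
    return rng.choices(range(len(costs)), weights=probs, k=1)[0]

if __name__ == "__main__":
    example_costs = [3, 3, 4]  # two co-optimal scenarios, one slightly sub-optimal
    print(boltzmann_probabilities(example_costs, kT=1.0))
    print(sample_scenario(example_costs, kT=1.0, rng=random.Random(0)))
```

    A small `kT` concentrates the probability on the co-optimal scenarios, while a large `kT` spreads it across sub-optimal ones as well.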

    Likelihood Geometry

    We study the critical points of monomial functions over an algebraic subset of the probability simplex. The number of critical points on the Zariski closure is a topological invariant of that embedded projective variety, known as its maximum likelihood degree. We present an introduction to this theory and its statistical motivations. Many favorite objects from combinatorial algebraic geometry are featured: toric varieties, A-discriminants, hyperplane arrangements, Grassmannians, and determinantal varieties. Several new results are included, especially on the likelihood correspondence and its bidegree. These notes were written for the second author's lectures at the CIME-CIRM summer course on Combinatorial Algebraic Geometry at Levico Terme in June 2013. Comment: 45 pages; minor changes and additions
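
    As a small worked example of the objects involved, consider the simplest case where the model is the whole probability simplex. The symbols u (a vector of observed counts) and p (a point of the simplex) are notation chosen here for illustration and do not appear in the abstract.

```latex
% Monomial (likelihood) function for counts u = (u_0, ..., u_n):
\[
  \ell_u(p) \;=\; p_0^{u_0}\, p_1^{u_1} \cdots p_n^{u_n},
  \qquad p_0 + p_1 + \cdots + p_n = 1 .
\]
% Critical points of \log \ell_u on the constraint, via a Lagrange multiplier \lambda:
\[
  \frac{\partial}{\partial p_i}\Bigl(\sum_{j=0}^{n} u_j \log p_j
    - \lambda \Bigl(\sum_{j=0}^{n} p_j - 1\Bigr)\Bigr)
  \;=\; \frac{u_i}{p_i} - \lambda \;=\; 0
  \quad\Longrightarrow\quad
  \hat{p}_i \;=\; \frac{u_i}{u_0 + u_1 + \cdots + u_n} .
\]
% There is exactly one critical point, so this model has maximum likelihood degree 1.
```

    For a proper algebraic subset of the simplex the same equations are restricted to that subset, and the number of critical points for generic u can be larger than 1.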

    Shape-based peak identification for ChIP-Seq

    We present a new algorithm for the identification of bound regions from ChIP-seq experiments. Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We demonstrate the accuracy of our method on existing datasets, and we show that it can discover previously missed regions and can more clearly discriminate between multiple binding events. The software T-PIC (Tree shape Peak Identification for ChIP-Seq) is available at http://math.berkeley.edu/~vhower/tpic.html. Comment: 12 pages, 6 figures
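
    The notion of persistence mentioned above can be sketched for a one-dimensional coverage signal as follows. This is a generic superlevel-set persistence computation written for illustration, not the tree-shape statistic used by T-PIC; the function name and the example coverage values are hypothetical.

```python
def peak_persistence(coverage):
    """Return (peak_index, persistence) pairs, most persistent first.

    Positions are added from highest to lowest coverage. When two neighbouring
    components meet, the one with the lower peak dies; its persistence is its
    peak height minus the coverage at the merge point.
    """
    n = len(coverage)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: -coverage[i])
    parent = [None] * n   # union-find forest; None means "not added yet"
    peak = {}             # component root -> index of its highest position
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                if coverage[peak[ri]] < coverage[peak[rj]]:
                    young, old = ri, rj
                else:
                    young, old = rj, ri
                if coverage[peak[young]] > coverage[i]:
                    pairs.append((peak[young], coverage[peak[young]] - coverage[i]))
                parent[young] = old
    root = find(order[0])
    # The highest peak never dies; measure it against the lowest coverage.
    pairs.append((peak[root], coverage[peak[root]] - min(coverage)))
    return sorted(pairs, key=lambda p: -p[1])

if __name__ == "__main__":
    print(peak_persistence([0, 3, 1, 5, 2, 4, 0]))
```

    Peaks with small persistence are the ones most plausibly due to noise, which is the intuition behind using persistence for robust peak calling.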

    Viral population estimation using pyrosequencing

    The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies. Comment: 23 pages, 13 figures
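
    The final EM step can be illustrated with a deliberately simplified mixture model. The sketch assumes each read is summarized only by the set of haplotypes it is compatible with, and it ignores sequencing errors and read positions; that summary, the function name, and the example data are assumptions made here, not details taken from the paper.

```python
def em_haplotype_frequencies(compat, n_haplotypes, n_iter=200):
    """Estimate haplotype frequencies from read-to-haplotype compatibility sets.

    compat: list of sets; compat[r] holds the indices of the haplotypes that
    read r is compatible with.
    """
    freqs = [1.0 / n_haplotypes] * n_haplotypes
    for _ in range(n_iter):
        counts = [0.0] * n_haplotypes
        # E-step: split each read among its compatible haplotypes in
        # proportion to the current frequency estimates.
        for haps in compat:
            total = sum(freqs[h] for h in haps)
            if total == 0.0:
                continue
            for h in haps:
                counts[h] += freqs[h] / total
        # M-step: renormalize the expected counts into frequencies.
        assigned = sum(counts)
        if assigned == 0.0:
            break
        freqs = [c / assigned for c in counts]
    return freqs

if __name__ == "__main__":
    # Hypothetical data: 6 reads match only haplotype 0, 2 match only
    # haplotype 1, and 2 are ambiguous between the two.
    reads = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 2
    print(em_haplotype_frequencies(reads, n_haplotypes=2))
```

    In this toy example the ambiguous reads carry no information about the split, so the estimate converges to the proportions among the unambiguous reads.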

    Comparison of Pattern Detection Methods in Microarray Time Series of the Segmentation Clock

    While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data is still limited. Identifying subtle patterns of interest in large amounts of data (tens of thousands of profiles) associated with a certain level of noise remains a challenge. A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. A method related to Fourier analysis, the Lomb-Scargle periodogram, was used to detect periodic profiles in the dataset, leading to the identification of a novel set of cyclic genes associated with the segmentation clock. Here, we applied four distinct mathematical methods to the same microarray time series dataset to identify significant patterns in gene expression profiles. These methods, called phase consistency, address reduction, cyclohedron test, and stable persistence, are based on different conceptual frameworks that are either hypothesis-driven or data-driven. Some of the methods, unlike Fourier transforms, do not depend on the assumption that the pattern of interest is periodic. Remarkably, these methods blindly identified the expression profiles of known cyclic genes as the most significant patterns in the dataset. Many candidate genes predicted by more than one approach appeared to be true positive cyclic genes and will be of particular interest for future research. In addition, these methods predicted novel candidate cyclic genes that were consistent with previous biological knowledge and experimental validation in mouse embryos.
    Our results demonstrate the utility of these novel pattern detection strategies, notably for detection of periodic profiles, and suggest that combining several distinct mathematical approaches to analyze microarray datasets is a valuable strategy for identifying genes that exhibit novel, interesting transcriptional patterns.
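
    For reference, the Lomb-Scargle periodogram named above as the earlier baseline can be sketched in a few lines. This is the textbook formula for unevenly sampled data, written in plain Python for illustration; it is not one of the four methods compared in the abstract, and the function name, sampling times, and test signal are hypothetical.

```python
import math

def lomb_scargle(times, values, angular_freqs):
    """Classical normalized Lomb-Scargle power at each angular frequency."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    power = []
    for w in angular_freqs:
        # The time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2 * w * t) for t in times)
        c2 = sum(math.cos(2 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum((v - mean) * c for v, c in zip(values, cos_terms))
        ys = sum((v - mean) * s for v, s in zip(values, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        p = 0.0
        if cc > 0.0:
            p += yc * yc / cc
        if ss > 0.0:
            p += ys * ys / ss
        power.append(p / (2.0 * var))
    return power

if __name__ == "__main__":
    # Hypothetical unevenly sampled profile with a true period of 4 time units.
    times = [0.0, 0.7, 1.1, 2.3, 3.0, 3.6, 4.9, 5.2, 6.8, 7.1,
             8.4, 9.0, 9.9, 11.2, 12.5, 13.1, 14.7, 15.3, 16.6, 17.8]
    values = [math.sin(2 * math.pi * t / 4.0) for t in times]
    periods = [2.0, 4.0, 8.0]
    powers = lomb_scargle(times, values, [2 * math.pi / p for p in periods])
    print(max(zip(powers, periods))[1])
```

    Because the periodogram scores each profile against sinusoids, it presupposes periodicity, which is the assumption some of the four methods in the abstract avoid.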