521 research outputs found

    Convex Rank Tests and Semigraphoids

    Convex rank tests are partitions of the symmetric group which have desirable geometric properties. The statistical tests defined by such partitions involve counting all permutations in the equivalence classes. Each class consists of the linear extensions of a partially ordered set specified by data. Our methods refine existing rank tests of non-parametric statistics, such as the sign test and the runs test, and are useful for exploratory analysis of ordinal data. We establish a bijection between convex rank tests and probabilistic conditional independence structures known as semigraphoids. The subclass of submodular rank tests is derived from faces of the cone of submodular functions, or from Minkowski summands of the permutohedron. We enumerate all small instances of such rank tests. Of particular interest are graphical tests, which correspond to both graphical models and to graph associahedra
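The abstract above says each class of a convex rank test consists of the linear extensions of a partially ordered set. As a minimal sketch of what counting those extensions means, the brute-force function below enumerates all permutations and keeps the ones that respect a list of order constraints. The function name and the encoding of the poset as `(a, b)` pairs are assumptions made for illustration, not the authors' method, and the approach is only practical for small `n`.

```python
from itertools import permutations


def count_linear_extensions(n, relations):
    """Count the permutations of range(n) in which a comes before b
    for every pair (a, b) in `relations` (hypothetical poset encoding)."""
    count = 0
    for perm in permutations(range(n)):
        # position[item] is the index at which item appears in perm
        position = {item: index for index, item in enumerate(perm)}
        if all(position[a] < position[b] for a, b in relations):
            count += 1
    return count
```

For example, with three items and the single constraint that item 0 precedes item 1, the function counts 3 of the 6 permutations.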

    Lassoing and corraling rooted phylogenetic trees

    The construction of a dendrogram on a set of individuals is a key component of a genomewide association study. However, even with modern sequencing technologies, the distances on the individuals required for the construction of such a structure may not always be reliable, making it tempting to exclude them from an analysis. This, in turn, results in an input set for dendrogram construction that consists of only partial distance information, which raises the following fundamental question: for what subset of its leaf set can we uniquely reconstruct the dendrogram from the distances that it induces on that subset? By formalizing a dendrogram in terms of an edge-weighted, rooted phylogenetic tree on a pre-given finite set X with |X|>2 whose edge-weighting is equidistant and a set of partial distances on X in terms of a set L of 2-subsets of X, we investigate this problem in terms of when such a tree is lassoed, that is, uniquely determined by the elements in L. For this we consider four different formalizations of the idea of "uniquely determining" giving rise to four distinct types of lassos. We present characterizations for all of them in terms of the child-edge graphs of the interior vertices of such a tree. Our characterizations imply in particular that in case the tree in question is binary then all four types of lasso must coincide.

    The Mystery of Two Straight Lines in Bacterial Genome Statistics. Release 2007

    In special coordinates (codon position-specific nucleotide frequencies) bacterial genomes form two straight lines in 9-dimensional space: one line for eubacterial genomes, another for archaeal genomes. All 348 distinct bacterial genomes available in GenBank in April 2007 belong to these lines with high accuracy. The main challenge now is to explain the observed high accuracy. The new phenomenon of complementary symmetry for codon position-specific nucleotide frequencies is observed. The results of analysis of several codon usage models are presented. We demonstrate that the mean-field approximation, which is also known as the context-free model, the complete independence model, or the Segre variety, can serve as a reasonable approximation to the real codon usage. The first two principal components of codon usage correlate strongly with genomic G+C content and the optimal growth temperature, respectively. The variation of codon usage along the third component is related to the curvature of the mean-field approximation. The first three eigenvalues in codon usage PCA explain 59.1%, 7.8% and 4.7% of variation. The codon usage of eubacterial and archaeal genomes is clearly distributed along two third-order curves with genomic G+C content as a parameter. Comment: Significantly extended version with new data for all the 348 distinct bacterial genomes available in GenBank in April 2007.
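To make the coordinates in the abstract above concrete, here is a minimal sketch of how codon position-specific nucleotide frequencies and the mean-field (complete independence) codon frequencies could be computed from an in-frame coding sequence. The function names and the choice to skip codons containing ambiguous bases are assumptions for illustration, not the paper's pipeline. Each position contributes four frequencies that sum to one, so the twelve numbers have nine free coordinates, which presumably is the 9-dimensional space the abstract refers to.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def position_frequencies(seq):
    """Return a list of three dicts: freqs[k][base] is the frequency of
    `base` at codon position k (0, 1, 2) over the in-frame codons of seq."""
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Assumption: codons containing anything other than A, C, G, T are skipped.
    codons = [c for c in codons if all(base in BASES for base in c)]
    if not codons:
        raise ValueError("sequence contains no complete unambiguous codon")
    freqs = []
    for k in range(3):
        counts = Counter(codon[k] for codon in codons)
        freqs.append({base: counts[base] / len(codons) for base in BASES})
    return freqs


def mean_field_codon_frequencies(freqs):
    """Approximate each codon frequency by the product of its three
    position-specific nucleotide frequencies (independence model)."""
    return {
        a + b + c: freqs[0][a] * freqs[1][b] * freqs[2][c]
        for a, b, c in product(BASES, repeat=3)
    }
```

For the toy sequence `"ATGGCCATG"` (codons ATG, GCC, ATG), the mean-field frequency of ATG is (2/3)·(2/3)·(2/3) = 8/27, whereas its observed frequency is 2/3; the gap between the two is what the independence model ignores.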

    Likelihood Geometry

    We study the critical points of monomial functions over an algebraic subset of the probability simplex. The number of critical points on the Zariski closure is a topological invariant of that embedded projective variety, known as its maximum likelihood degree. We present an introduction to this theory and its statistical motivations. Many favorite objects from combinatorial algebraic geometry are featured: toric varieties, A-discriminants, hyperplane arrangements, Grassmannians, and determinantal varieties. Several new results are included, especially on the likelihood correspondence and its bidegree. These notes were written for the second author's lectures at the CIME-CIRM summer course on Combinatorial Algebraic Geometry at Levico Terme in June 2013. Comment: 45 pages; minor changes and additions.

    Hot topics, urgent priorities, and ensuring success for racial/ethnic minority young investigators in academic pediatrics.

    Background: The number of racial/ethnic minority children will exceed the number of white children in the USA by 2018. Although 38% of Americans are minorities, only 12% of pediatricians, 5% of medical-school faculty, and 3% of medical-school professors are minorities. Furthermore, only 5% of all R01 applications for National Institutes of Health grants are from African-American, Latino, and American Indian investigators. Prompted by the persistent lack of diversity in the pediatric and biomedical research workforces, the Academic Pediatric Association Research in Academic Pediatrics Initiative on Diversity (RAPID) was initiated in 2012. RAPID targets applicants who are members of an underrepresented minority group (URM), disabled, or from a socially, culturally, economically, or educationally disadvantaged background. The program, which consists of both a research project and career and leadership development activities, includes an annual career-development and leadership conference which is open to any resident, fellow, or junior faculty member from a URM, disabled, or disadvantaged background who is interested in a career in academic general pediatrics. Methods: As part of the annual RAPID conference, a Hot Topic Session is held in which the young investigators spend several hours developing a list of hot topics on the most useful faculty and career-development issues. These hot topics are then posed in the form of six "burning questions" to the RAPID National Advisory Committee (composed of accomplished, nationally recognized senior investigators who are seasoned mentors), the RAPID Director and Co-Director, and the keynote speaker. Results/conclusions: The six compelling questions posed by the 10 young investigators, along with the responses of the senior conference leadership, provide a unique resource and "survival guide" for ensuring the academic success and optimal career development of young investigators in academic pediatrics from diverse backgrounds. A rich conversation ensued on the topics addressed, consisting of negotiating for protected research time, career trajectories as academic institutions move away from an emphasis on tenure-track positions, how "non-academic" products fit into career development, racism and discrimination in academic medicine and how to address them, coping with isolation as a minority faculty member, and how best to mentor the next generation of academic physicians.

    Recognizing Treelike k-Dissimilarities

    A k-dissimilarity D on a finite set X, |X| >= k, is a map from the set of size k subsets of X to the real numbers. Such maps naturally arise from edge-weighted trees T with leaf-set X: Given a subset Y of X of size k, D(Y) is defined to be the total length of the smallest subtree of T with leaf-set Y. In case k = 2, it is well-known that 2-dissimilarities arising in this way can be characterized by the so-called "4-point condition". However, in case k > 2 Pachter and Speyer recently posed the following question: Given an arbitrary k-dissimilarity, how do we test whether this map comes from a tree? In this paper, we provide an answer to this question, showing that for k >= 3 a k-dissimilarity on a set X arises from a tree if and only if its restriction to every 2k-element subset of X arises from some tree, and that 2k is the least possible subset size to ensure that this is the case. As a corollary, we show that there exists a polynomial-time algorithm to determine when a k-dissimilarity arises from a tree. We also give a 6-point condition for determining when a 3-dissimilarity arises from a tree, that is similar to the aforementioned 4-point condition. Comment: 18 pages, 4 figures.
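The classical 4-point condition mentioned in the abstract above can be sketched in a few lines for the case k = 2: for every four elements, the two largest of the three pairwise sums d(w,x)+d(y,z), d(w,y)+d(x,z), d(w,z)+d(x,y) must be equal. The function name, the dictionary encoding of the dissimilarity, and the tolerance are assumptions for illustration. The sketch checks only quadruples of distinct elements and assumes the input is already a metric; it does not cover the paper's results for k >= 3.

```python
from itertools import combinations


def satisfies_four_point(dist, points, tol=1e-9):
    """Check the 4-point condition on all quadruples of distinct points.

    `dist` maps frozenset({x, y}) to the dissimilarity of x and y
    (hypothetical encoding chosen for this sketch)."""
    def d(a, b):
        return dist[frozenset((a, b))]

    for w, x, y, z in combinations(points, 4):
        sums = sorted([d(w, x) + d(y, z),
                       d(w, y) + d(x, z),
                       d(w, z) + d(x, y)])
        # The two largest sums must coincide (up to the tolerance).
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Leaf distances of a quartet tree ab|cd with all edge lengths equal to 1.
tree_like = {
    frozenset("ab"): 2, frozenset("cd"): 2,
    frozenset("ac"): 3, frozenset("ad"): 3,
    frozenset("bc"): 3, frozenset("bd"): 3,
}

# Same values except d(b, c), which breaks the equality of the two largest sums.
not_tree_like = dict(tree_like)
not_tree_like[frozenset("bc")] = 4
```

Here `satisfies_four_point(tree_like, "abcd")` holds because the three sums are 4, 6 and 6, while the modified dissimilarity gives 4, 6 and 7 and fails the check.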

    Shape-based peak identification for ChIP-Seq

    We present a new algorithm for the identification of bound regions from ChIP-seq experiments. Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We demonstrate the accuracy of our method on existing datasets, and we show that it can discover previously missed regions and can more clearly discriminate between multiple binding events. The software T-PIC (Tree shape Peak Identification for ChIP-Seq) is available at http://math.berkeley.edu/~vhower/tpic.html Comment: 12 pages, 6 figures.
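The abstract above says the method is inspired by persistence in topological data analysis. The sketch below is not the T-PIC algorithm, whose tree-shape statistics are not described here; it only illustrates the general notion of persistence for peaks in a one-dimensional coverage signal. Positions are processed from highest to lowest, and when two peaks merge, the lower one "dies" with persistence equal to its height minus the merge height. The function name and the convention of giving the global maximum a persistence of max minus min are assumptions.

```python
def peak_persistence(signal):
    """Return (peak_index, persistence) pairs for a 1-D signal,
    sorted from most to least persistent (zero-persistence peaks omitted)."""
    if not signal:
        return []
    order = sorted(range(len(signal)), key=lambda i: signal[i], reverse=True)
    parent = {}  # union-find parent over the indices processed so far
    peak = {}    # component root -> index of the highest point in it
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if j not in parent:
                continue  # neighbour is lower and not yet processed
            a, b = find(i), find(j)
            if a == b:
                continue
            # Keep the component with the higher peak as the survivor `a`.
            if signal[peak[a]] < signal[peak[b]]:
                a, b = b, a
            persistence = signal[peak[b]] - signal[i]
            if persistence > 0:
                pairs.append((peak[b], persistence))
            parent[b] = a
    # The component of the global maximum never merges into a higher one.
    top = peak[find(order[0])]
    pairs.append((top, signal[top] - min(signal)))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs
```

On a noisy coverage track, small bumps get small persistence values, so a threshold on persistence separates them from prominent peaks without assuming a parametric noise model.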

    Comparing theory and non-theory based implementation approaches to improving referral practices in cancer genetics: A cluster randomised trial protocol

    © 2019 The Author(s). Background: Lynch syndrome (LS) is an inherited, cancer predisposition syndrome associated with an increased risk of colorectal, endometrial and other cancer types. Identifying individuals with LS allows access to cancer risk management strategies proven to reduce cancer incidence and improve survival. However, LS is underdiagnosed and genetic referral rates are poor. Improving LS referral is complex, and requires multisystem behaviour change. Although barriers have been identified, evidence-based strategies to facilitate behaviour change are lacking. The aim of this study is to compare the effectiveness of a theory-based implementation approach against a non-theory based approach for improving detection of LS amongst Australian patients with colorectal cancer (CRC). Methods: A two-arm parallel cluster randomised trial design will be used to compare two identical, structured implementation approaches, distinguished only by the use of theory to identify barriers and design targeted intervention strategies, to improve LS referral practices in eight large Australian hospital networks. Each hospital network will be randomly allocated to a trial arm, with stratification by state. A trained healthcare professional will lead the following phases at each site: (1) undertake baseline clinical practice audits, (2) form multidisciplinary Implementation Teams, (3) identify target behaviours for practice change, (4) identify barriers to change, (5) generate intervention strategies, (6) support staff to implement interventions and (7) evaluate the effectiveness of the intervention using post-implementation clinical data. The theoretical and non-theoretical components of each trial arm will be distinguished in phases 4-5. 
Study outcomes include an LS referral process map for each hospital network, with evaluation of the proportion of patients with risk-appropriate completion of the LS referral pathway within 2 months of CRC resection pre and post implementation. Discussion: This trial will determine the more effective approach for improving the detection of LS amongst patients with CRC, whilst also advancing understanding of the impact of theory-based implementation approaches in complex health systems and the feasibility of training healthcare professionals to use them. Insights gained will guide the development of future interventions to improve LS identification on a larger scale and across different contexts, as well as efforts to address the gap between evidence and practice in the rapidly evolving field of genomic research. Trial registration: ANZCTR, ACTRN12618001072202. Registered on 27 June 2018.