131 research outputs found

    Methods for Joint Normalization and Comparison of Hi-C data

    The development of chromatin conformation capture technology has opened new avenues of study into the 3D structure and function of the genome. Chromatin structure is known to influence gene regulation, and differences in structure are now emerging as a mechanism of regulation in, e.g., cell differentiation and disease vs. normal states. Hi-C sequencing technology now provides a way to study the 3D interactions of the chromatin over the whole genome. However, like all sequencing technologies, Hi-C suffers from several forms of bias stemming from both the technology and the DNA sequence itself. Several normalization methods have been developed for normalizing individual Hi-C datasets, but little work has been done on developing joint normalization methods for comparing two or more Hi-C datasets. To make full use of Hi-C data, joint normalization and statistical comparison techniques are needed to carry out experiments that identify regions where chromatin structure differs between conditions. We develop methods for the joint normalization and comparison of two Hi-C datasets, which we then extend to more complex experimental designs. Our normalization method is novel in that it makes use of the distance-dependent nature of chromatin interactions. Our modification of the Minus vs. Average (MA) plot to the Minus vs. Distance (MD) plot allows for a nonparametric, data-driven normalization technique using loess smoothing. Additionally, we present a simple statistical method using Z-scores for detecting differentially interacting regions between two datasets. Our initial method was published as the Bioconductor R package HiCcompare [http://bioconductor.org/packages/HiCcompare/](http://bioconductor.org/packages/HiCcompare/). We then further extended our normalization and comparison method for use in complex Hi-C experiments with more than two datasets and optional covariates. We extended the normalization method to jointly normalize any number of Hi-C datasets by using a cyclic loess procedure on the MD plot. The cyclic loess normalization technique can remove between-dataset biases efficiently and effectively even when several datasets are analyzed at one time. Our comparison method implements a generalized linear model-based approach for comparing complex Hi-C experiments, which may have more than two groups and additional covariates. The extended methods are also available as a Bioconductor R package [http://bioconductor.org/packages/multiHiCcompare/](http://bioconductor.org/packages/multiHiCcompare/). Finally, we demonstrate the use of HiCcompare and multiHiCcompare in several test cases on real data, in addition to comparing them to other similar methods (https://doi.org/10.1002/cpbi.76).
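
As a rough illustration of the MD-plot idea described in this abstract, the Python sketch below computes M (the log2 ratio of interaction frequencies between two datasets) for each bin pair, centres M within each genomic distance D, and converts the adjusted values to Z-scores. It rests on several assumptions: the per-distance mean is only a crude stand-in for the loess fit named in the abstract, the function names `md_normalize` and `z_scores` are hypothetical, and none of this reproduces the HiCcompare package API.

```python
import numpy as np

def md_normalize(if1, if2, distance):
    """Return (M, adjusted M) for paired interaction frequencies.

    M = log2(IF2 / IF1). The adjustment subtracts the mean of M at each
    distance, a simplified substitute for a loess curve fitted on the MD plot.
    Assumes all interaction frequencies are strictly positive.
    """
    if1 = np.asarray(if1, dtype=float)
    if2 = np.asarray(if2, dtype=float)
    distance = np.asarray(distance)
    m = np.log2(if2 / if1)
    m_adj = np.empty_like(m)
    for d in np.unique(distance):
        mask = distance == d
        # Remove the distance-specific offset between the two datasets.
        m_adj[mask] = m[mask] - m[mask].mean()
    return m, m_adj

def z_scores(m_adj):
    """Standardize adjusted M values (undefined if they are all identical)."""
    m_adj = np.asarray(m_adj, dtype=float)
    return (m_adj - m_adj.mean()) / m_adj.std()

# Toy data: four bin pairs, two at distance 0 and two at distance 1.
m, m_adj = md_normalize([8, 8, 4, 4], [16, 32, 4, 8], [0, 0, 1, 1])
z = z_scores(m_adj)
```

Bin pairs with large absolute Z-scores would be the candidates for differential interaction; any threshold is a choice left to the analyst.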

    MetroSets: Visualizing Sets as Metro Maps

    We propose MetroSets, a new, flexible online tool for visualizing set systems using the metro map metaphor. We model a given set system as a hypergraph $\mathcal{H} = (V, \mathcal{S})$, consisting of a set $V$ of vertices and a set $\mathcal{S}$, which contains subsets of $V$ called hyperedges. Our system then computes a metro map representation of $\mathcal{H}$, where each hyperedge $E$ in $\mathcal{S}$ corresponds to a metro line and each vertex corresponds to a metro station. Vertices that appear in two or more hyperedges are drawn as interchanges in the metro map, connecting the different sets. MetroSets is based on a modular 4-step pipeline which constructs and optimizes a path-based hypergraph support, which is then drawn and schematized using metro map layout algorithms. We propose and implement multiple algorithms for each step of the MetroSets pipeline and provide a functional prototype with easy-to-use preset configurations. Furthermore, using several real-world datasets, we perform an extensive quantitative evaluation of the impact of different pipeline stages on desirable properties of the generated maps, such as octolinearity, monotonicity, and edge uniformity. Comment: 19 pages; accepted for IEEE INFOVIS 2020; for associated live system, see http://metrosets.ac.tuwien.ac.a
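
The hypergraph model in this abstract can be made concrete with a small Python sketch. It represents a set system as a mapping from hyperedge (metro line) names to vertex sets and reports the interchanges, that is, the vertices appearing in two or more hyperedges. The function name `interchanges` and the toy data are hypothetical, and the sketch covers only this modelling step, not the MetroSets layout pipeline.

```python
from collections import defaultdict

def interchanges(hyperedges):
    """Map each vertex that lies in >= 2 hyperedges to the sorted lines through it."""
    lines_at = defaultdict(set)
    for line, vertices in hyperedges.items():
        for v in vertices:
            lines_at[v].add(line)
    # Keep only stations shared by at least two metro lines.
    return {v: sorted(lines) for v, lines in lines_at.items() if len(lines) >= 2}

# Toy set system: three sets (lines) over the vertices a..e.
set_system = {
    "red": {"a", "b", "c"},
    "blue": {"c", "d"},
    "green": {"b", "c", "e"},
}
shared = interchanges(set_system)
```

Here `b` is shared by the green and red lines and `c` by all three, so both would be drawn as interchange stations.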

    Euler diagrams drawn with ellipses area‑proportionally (Edeap)

    Background: Area-proportional Euler diagrams are frequently used to visualize data from microarray experiments, but are also applied to a wide variety of other data from biosciences, social networks and other domains. Results: This paper details Edeap, a new simple, scalable method for drawing area-proportional Euler diagrams with ellipses. We use a search-based technique optimizing a multi-criteria objective function that includes measures for both area accuracy and usability, and which can be extended to further user-defined criteria. The Edeap software is available for use on the web, and the code is open source. In addition to describing our system, we present the first extensive evaluation of software for producing area-proportional Euler diagrams, comparing Edeap to the current state-of-the-art circle-based method, venneuler, and an alternative ellipse-based method, eulerr. Conclusions: Our evaluation, using data from the Gene Ontology database via GoMiner, Twitter data from the SNAP database, and randomly generated data sets, shows an ordering for accuracy (from best to worst) of eulerr, followed by Edeap and then venneuler. In terms of runtime, the results are reversed, with venneuler being the fastest, followed by Edeap and finally eulerr. Regarding scalability, eulerr cannot draw non-trivial diagrams beyond 11 sets, whereas no such limitation is present in Edeap or venneuler, both of which draw diagrams up to the tested limit of 20 sets.

    hzAnalyzer: detection, quantification, and visualization of contiguous homozygosity in high-density genotyping datasets

    The analysis of contiguous homozygosity (runs of homozygous loci) in human genotyping datasets is critical in the search for causal disease variants in monogenic disorders, studies of population history and the identification of targets of natural selection. Here, we report methods for extracting homozygous segments from high-density genotyping datasets, quantifying their local genomic structure, identifying outstanding regions within the genome and visualizing results for comparative analysis between population samples.
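
To illustrate what extracting homozygous segments can mean in practice, the following Python sketch scans an ordered list of two-allele genotype calls and returns the runs of consecutive homozygous loci. It is a minimal sketch under stated assumptions: the function name `homozygous_runs` and the `min_length` threshold are hypothetical, missing calls and genotyping errors are ignored, and it is not the hzAnalyzer algorithm.

```python
def homozygous_runs(genotypes, min_length=3):
    """Return (start, end) index pairs, end exclusive, of homozygous runs.

    `genotypes` is an ordered sequence of two-character calls such as "AA" or "AG".
    Runs shorter than `min_length` loci are discarded.
    """
    runs = []
    start = None
    for i, (a, b) in enumerate(genotypes):
        if a == b:
            # Homozygous call: open a run if none is in progress.
            if start is None:
                start = i
        else:
            # Heterozygous call: close the current run, if any.
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes)))
    return runs

calls = ["AA", "GG", "CC", "AG", "TT", "TT", "CT", "GG", "GG", "GG", "AA"]
segments = homozygous_runs(calls)
```

With the default threshold, the two-locus run at positions 4 and 5 is dropped, leaving the segments starting at loci 0 and 7.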

    KelpFusion: A Hybrid Set Visualization Technique

    Essays on multidimensional poverty measurement and the dependence among well-being dimensions

    Evaluating the welfare of nations is high on the research agenda of economists, practitioners and policy-makers. The literature of the last decades triggered a multivariate perception of well-being, which is suggested to go beyond GDP, and created a need for more complex approaches to evaluate welfare as well as poverty. The first essay investigates the approaches to multivariate poverty measurement and focuses on the composite index approach and the steps involved in it. An important aspect of the multivariate perspective on well-being is the dependence among the underlying indicators. There is growing evidence in the literature that well-being dimensions are interrelated. This dependence among attributes matters for multidimensional poverty measurement, since income is no longer the only indicator to be considered. However, the reviewed approaches to multivariate poverty measurement do not commonly capture this interdependence. The second essay suggests a copula function as a flexible tool to estimate the dependence among welfare variables. Moreover, it proposes to incorporate the evaluated dependence in the composite indicator. The trade-off among attributes, which is established via the weighting of dimensions, is identified as a possible channel to include the interdependence in the composite indicator. The third essay of this dissertation defines bivariate and multivariate copula-based measures of dependence and applies them using recent data from the EU-SILC. The results suggest that key dimensions of well-being, i.e. income, education and health, are positively interdependent. In addition, the strength of pairwise and multivariate dependence was reinforced in the post-crises period in some European countries. Finally, the last essay proposes a new class of copula-based multidimensional poverty indices by innovating over the weighting approach. The weighting scheme proposed in this dissertation incorporates the estimated copula-based dependence and contains necessary normative controls to be chosen by the practitioner. The findings of the last essay suggest that overall poverty is driven not only by the individual shortfalls, but also by the degree of interdependence among well-being indicators. Considering the proposed copula-based weighting scheme and the proposal of the new class of copula-based poverty indices, this dissertation contributes to multivariate poverty measurement by suggesting a channel for including the dependence structure in composite indicators. The proposed copula-based methodology will advance multidimensional poverty analysis and poverty-reducing policy, which can be designed to address the problem of interdependence of individual achievements.
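
The abstract does not say which copula-based dependence measures the dissertation uses. As one familiar example of the general idea, the Python sketch below computes Spearman's rho, which depends only on the ranks of the data (the empirical copula) and not on the marginal distributions. The choice of measure, the function names `pseudo_observations` and `spearman_rho`, and the toy figures are all assumptions made for illustration.

```python
import numpy as np

def pseudo_observations(x):
    """Rescale ranks to (0, 1); assumes there are no tied values."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n
    return ranks / (len(x) + 1)

def spearman_rho(x, y):
    """Pearson correlation of the pseudo-observations, i.e. Spearman's rho."""
    u = pseudo_observations(x)
    v = pseudo_observations(y)
    return float(np.corrcoef(u, v)[0, 1])

# Toy data standing in for two well-being dimensions of five individuals.
income = [1, 2, 3, 4, 5]
health = [2, 1, 4, 3, 5]
rho = spearman_rho(income, health)
```

A value near 1 indicates that individuals ranked high in one dimension tend to rank high in the other, which is the kind of positive interdependence the abstract reports.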
