Liam tackles complex multimodal single-cell data integration challenges
Multi-omics characterization of single cells holds outstanding potential for profiling the gene regulatory states of thousands of cells, along with their dynamics and relations. How to integrate multimodal data is an open problem, especially when combining data from multiple sources or conditions that contain both biological and technical variation. We introduce liam, a flexible model for the simultaneous horizontal and vertical integration of paired single-cell multimodal data. Liam learns a joint low-dimensional representation of two concurrently measured modalities, which proves beneficial when the modalities differ in information content or quality. Its integration accounts for complex batch effects using a tuneable combination of conditional and adversarial training, and it can be optimized using replicate information while retaining selected biological variation. We demonstrate liam’s superior performance on multiple multimodal data sets, including Multiome and CITE-seq data. Detailed benchmarking experiments illustrate the complexities and challenges that remain for integration and for the meaningful assessment of its success.
Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes
Here we present an exome-wide rare-variant association study of 30 biomarkers in 191,640 individuals in the UK Biobank. We perform gene-based association tests for separate functional variant categories to increase interpretability, and identify 201 significant gene-biomarker associations, including novel associations such as GIGYF1 with diabetes markers. In addition to gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing and the binding of RNA-binding proteins. For these tests we present a powerful and computationally efficient combination of the likelihood-ratio and score tests that found 32% more associations than the score test alone. Kernel-based tests identified 12–31% more associations than their gene-based collapsing counterparts, with large overlaps between the two, and had advantages in the presence of gain-of-function missense variants. We introduce local collapsing by amino acid position for missense variants, use this approach to identify potential novel gain-of-function variants in PIEZO1, and interpret a position-specific association of ABCA1 variants with the inflammation marker CRP. Our results show the benefits of separately investigating different functional mechanisms when performing rare-variant association tests, and highlight the strengths of biomarker panels for large biobanks.
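To make the idea of a gene-based collapsing (burden) test concrete, the sketch below collapses the rare variants of one functional category in one gene into a per-individual allele count and applies a score test for association with a quantitative biomarker. This is a generic textbook construction, not the authors' implementation: it assumes a linear model with no covariates, genotypes coded as minor-allele counts (0, 1 or 2), and variants already filtered to a single functional category. The function name `burden_score_test` is hypothetical.

```python
import math

import numpy as np


def burden_score_test(genotypes: np.ndarray, phenotype: np.ndarray) -> tuple[float, float]:
    """Score test for a gene-based collapsing (burden) model (illustrative sketch).

    genotypes: (n_individuals, n_variants) minor-allele counts (0, 1 or 2) for
               the rare variants of one functional category in one gene.
    phenotype: (n_individuals,) quantitative biomarker values.

    Assumes a linear model without covariates. Returns (statistic, p_value);
    the statistic is asymptotically chi-squared with 1 degree of freedom.
    """
    # Collapse: each individual's burden is their total count of qualifying alleles.
    burden = genotypes.sum(axis=1).astype(float)

    # Residuals under the null model (intercept only) and the centred burden.
    y_res = phenotype - phenotype.mean()
    x_c = burden - burden.mean()

    sigma2 = float(np.mean(y_res ** 2))  # null-model residual variance (MLE)
    u = float(x_c @ y_res)               # score for the burden effect
    var_u = sigma2 * float(x_c @ x_c)    # variance of the score under the null

    if var_u < 1e-12:
        # No variation in burden (e.g. no carriers) or in the phenotype.
        return 0.0, 1.0

    stat = u * u / var_u
    # Survival function of chi-squared(1): P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Toy data: four individuals, two rare variants in one gene.
    g = np.array([[0, 0], [0, 0], [1, 0], [1, 1]])
    y = np.array([1.0, 1.0, 2.0, 4.0])
    print(burden_score_test(g, y))
```

For the toy data the statistic equals n·r² (4 × 16/16.5 ≈ 3.88), just above the 5% critical value of a chi-squared(1) distribution. Because collapsing sums alleles before testing, variants with opposite effect directions can cancel out; kernel-based (variance-component) tests instead aggregate weighted squared per-variant scores, which is one reason they can behave differently from burden tests.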
Relative abundance of desert tortoises on the Nevada Test Site
Seven hundred fifty-nine transects with a total length of 1,191 km were walked during 1981–1986 to determine the distribution and relative abundance of desert tortoises (Gopherus agassizii) on the Nevada Test Site (NTS). The abundance of tortoises on NTS was low to very low relative to other populations in the Mojave Desert. Tortoise sign was found from 880 to 1,570 m elevation and was more abundant above 1,200 m than has previously been reported for Nevada. Tortoises on NTS were more abundant on the upper alluvial fans and slopes of mountains than in valley bottoms. They were also more common on or near limestone and dolomite mountains than on mountains of volcanic origin.
Intricacies of single-cell multi-omics data integration
A wealth of single-cell protocols makes it possible to characterize different molecular layers at unprecedented resolution. Integrating the resulting multimodal single-cell data to find cell-to-cell correspondences remains a challenge. We argue that data integration needs to happen at a meaningful biological level of abstraction, and that it is necessary to consider the inherent discrepancies between modalities to strike a balance between biological discovery and noise removal. A survey of current methods reveals that a distinction between technical and biological origins of presumed unwanted variation between datasets is not yet commonly considered. The increasing availability of paired multimodal data will aid the development of improved methods by providing a ground truth on cell-to-cell matches.