8 research outputs found

    Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs

    Single-cell RNA-seq datasets are growing in size and complexity, enabling the study of changes in cellular composition across biological and clinical contexts. Scalable dimensionality reduction techniques are needed to disentangle the biological variation in these data while accounting for technical and biological confounders. In this work, we extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model (GPLVM), to scale to massive single-cell datasets while explicitly accounting for technical and biological confounders. The key idea is to use an augmented kernel that preserves the factorisability of the lower bound, allowing for fast stochastic variational inference. We demonstrate its ability to reconstruct the latent signatures of innate immunity recovered in Kumasaka et al. (2021) with 9x lower training time. We further analyze a COVID dataset and demonstrate, across a cohort of 130 individuals, that this framework enables data integration while capturing interpretable signatures of infection. Specifically, we explore COVID severity as a latent dimension to refine patient stratification and capture disease-specific gene expression.
    Comment: Machine Learning and Computational Biology Symposium (Oral), 202
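    The augmented-kernel idea can be illustrated with a minimal sketch. Assumptions: an RBF kernel over latent coordinates combined multiplicatively with a linear kernel over one-hot confounder covariates; all function names are hypothetical and this is not the paper's implementation, only an illustration of how a product kernel keeps the two sources of structure separable.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    # Squared-exponential kernel over latent coordinates.
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-0.5 * sq / lengthscale**2)

def augmented_kernel(X_latent, C_confounder, lengthscale=1.0):
    # Illustrative augmented kernel: element-wise product of a latent-space
    # kernel and a linear kernel over (one-hot) confounder covariates, so
    # cells from different confounder groups get zero cross-covariance while
    # the overall structure stays factorisable.
    return rbf_kernel(X_latent, lengthscale) * (C_confounder @ C_confounder.T)
```

    In this toy form, the confounder term simply gates the latent-space covariance, which is the structural property that makes fast stochastic variational inference possible.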

    Bridging the gaps in statistical models of protein alignment

    SUMMARY: Sequences of proteins evolve by accumulating substitutions together with insertions and deletions (indels) of amino acids. However, it remains common practice to disconnect substitutions and indels, inferring approximate models for each separately in order to quantify sequence relationships. Although this approach brings computational convenience (which remains its primary motivation), there is a dearth of attempts to unify and model them systematically and together. To overcome this gap, this article demonstrates how a complete statistical model quantifying the evolution of pairs of aligned proteins can be constructed using a time-parameterized substitution matrix and a time-parameterized alignment state machine. Methods to derive all parameters of such a model from any benchmark collection of aligned protein sequences are described here. This has not only allowed us to generate a unified statistical model for each of the nine widely used substitution matrices (PAM, JTT, BLOSUM, JO, WAG, VTML, LG, MIQS and PFASUM), but has also resulted in a new unified model, MMLSUM. Our underlying methodology measures the Shannon information content of using each model to explain losslessly any given collection of alignments, which has allowed us to quantify the performance of all the above models on six comprehensive alignment benchmarks. Our results show that MMLSUM delivers a clear overall best performance, followed by PFASUM, VTML, BLOSUM and MIQS, respectively, amongst the top five. We further analyze the statistical properties of the MMLSUM model and contrast it with others. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
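    A time-parameterized substitution matrix is conventionally obtained by exponentiating an instantaneous rate matrix, P(t) = exp(Qt). A toy three-state sketch of this standard construction (real amino-acid models such as those above use a 20x20 matrix; the rate values here are invented for illustration, and this is not the article's estimation procedure):

```python
import numpy as np

# Toy instantaneous rate matrix Q: rows sum to zero, off-diagonals >= 0.
Q = np.array([[-0.3,  0.2,  0.1],
              [ 0.1, -0.2,  0.1],
              [ 0.2,  0.2, -0.4]])

def substitution_matrix(Q, t):
    # P(t) = exp(Q t), computed via eigendecomposition: each row of P(t)
    # gives the substitution probabilities after evolutionary time t.
    vals, vecs = np.linalg.eig(Q * t)
    return (vecs @ np.diag(np.exp(vals)) @ np.linalg.inv(vecs)).real
```

    Because Q is a proper rate matrix, P(t) is row-stochastic for every t and satisfies the Chapman-Kolmogorov property P(s+t) = P(s)P(t), which is what makes the "time-parameterized" family internally consistent.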

    On the reliability and the limits of inference of amino acid sequence alignments

    MOTIVATION: Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn from them? Using techniques not previously applied to these questions, we weight every possible sequence alignment by its posterior probability to derive a formal mathematical expectation, and we develop an efficient algorithm to compute the distance between alternative alignments, allowing quantitative comparison of sequence-based alignments with corresponding reference structure alignments. RESULTS: By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments as a function of (Markov time of) sequence divergence. Our results clearly demarcate the ‘daylight’, ‘twilight’ and ‘midnight’ zones for interpreting residue–residue correspondences from sequence information alone. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
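    The expectation being described has a simple shape: a distance between a candidate alignment and a reference, averaged under posterior weights. A minimal sketch, assuming alignments are represented as sets of residue-pair correspondences, posterior probabilities are supplied as inputs (the paper computes them efficiently; this sketch does not), and all names are hypothetical:

```python
def pair_distance(a, b):
    # Symmetric-difference distance between two alignments, each given as a
    # collection of (i, j) residue correspondences.
    return len(set(a) ^ set(b))

def expected_distance(candidates, posterior, reference):
    # E[d] = sum over candidate alignments a of P(a | sequences) * d(a, reference).
    # Enumerating candidates explicitly is exponential in general; the point
    # here is only the form of the expectation, not an efficient algorithm.
    return sum(p * pair_distance(a, reference)
               for a, p in zip(candidates, posterior))
```

    Small expected distance to the structure-based reference corresponds to the ‘daylight’ zone; as divergence grows and posterior mass spreads over many dissimilar alignments, the expectation grows toward the ‘midnight’ zone.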

    Stratification of amyotrophic lateral sclerosis patients: a crowdsourcing approach

    Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease whose substantial heterogeneity in clinical presentation urgently requires better stratification of patients for the development of drug trials and for clinical care. In this study we explored stratification through a crowdsourcing approach, the DREAM Prize4Life ALS Stratification Challenge. Using data from >10,000 patients in ALS clinical trials and 1479 patients from community-based patient registers, more than 30 teams developed new machine learning and clustering approaches, outperforming the best current predictions of disease outcome. We propose a new method to integrate and analyze patient clusters across methods, revealing a clear pattern of consistent and clinically relevant sub-groups of patients that also enabled the reliable classification of new patients. Our analyses reveal novel insights into ALS and describe for the first time the potential of crowdsourcing to uncover hidden patient sub-populations and to accelerate disease understanding and therapeutic development.