Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction.
Tumor heterogeneity is a limiting factor in cancer treatment and in the discovery of biomarkers to personalize it. We describe a computational purification tool, ISOpure, to directly address the effects of variable normal tissue contamination in clinical tumor specimens. ISOpure uses a set of tumor expression profiles and a panel of healthy tissue expression profiles to generate a purified cancer profile for each tumor sample and an estimate of the proportion of RNA originating from cancerous cells. Applying ISOpure before identifying gene signatures leads to significant improvements in the prediction of prognosis and other clinical variables in lung and prostate cancer.
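The purification idea can be illustrated with a deliberately simplified sketch. It is not the ISOpure algorithm: it assumes that each tumor profile is a linear mix of one cancer profile and the mean of the healthy profiles, and that the cancer fraction is already known, whereas ISOpure estimates that fraction itself. The function name `purify_profile` and the numbers are hypothetical.

```python
def purify_profile(tumor, normal_profiles, cancer_fraction):
    """Subtract an assumed normal-tissue contribution from one tumor profile.

    Assumption (not from the paper): for every gene,
    tumor = f * cancer + (1 - f) * mean(normal_profiles).
    """
    if not 0.0 < cancer_fraction <= 1.0:
        raise ValueError("cancer_fraction must be in (0, 1]")
    n_normals = len(normal_profiles)
    purified = []
    for gene_idx, observed in enumerate(tumor):
        normal_mean = sum(p[gene_idx] for p in normal_profiles) / n_normals
        cancer = (observed - (1.0 - cancer_fraction) * normal_mean) / cancer_fraction
        purified.append(max(cancer, 0.0))  # expression values cannot be negative
    return purified


# Made-up two-gene example: two healthy profiles and one contaminated tumor.
normals = [[2.0, 4.0], [4.0, 8.0]]
tumor = [7.2, 3.0]
purified = purify_profile(tumor, normals, cancer_fraction=0.6)
```

With these made-up values the recovered cancer profile is approximately `[10.0, 1.0]`, which is the profile that was mixed with the healthy mean to build the example.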
Statistical methods for clinical genome interpretation with specific application to inherited cardiac conditions
Background: While next-generation sequencing has enabled us to rapidly identify sequence variants, clinical application is limited by our ability to determine which rare variants impact disease risk.
Aim: To develop computational methods to identify clinically important variants.
Methods and Results:
(1) I built a disease-specific variant classifier for inherited cardiac conditions (ICCs), which outperforms genome-wide tools across a wide range of benchmarks. It discriminates pathogenic variants from benign variants with global accuracy improved by 4-24% over existing tools. Variants classified with >90% confidence are significantly associated with both disease status and clinical outcomes.
(2) To better interpret missense variants, I examined evolutionarily equivalent residues across protein domain families to identify positions intolerant of variation. Homologous residue constraint is a strong predictor of variant pathogenicity. It can identify a subset of de novo missense variants whose impact on developmental disorders is comparable to that of protein-truncating variants. Independently of existing approaches, it can also improve the prioritisation of disease-relevant genes for both developmental disorders and inherited hypertrophic cardiomyopathy.
(3) TTN-truncating variants are known to cause dilated cardiomyopathy (DCM), but the effect of missense variants is poorly understood. Using the approach in (2), I studied the role of TTN missense variants in DCM. The prioritised residues are enriched for known pathogenic variants, including the two known to cause DCM and others involved in skeletal myopathies. I also found a significant association between constrained variants of TTN I-set domains and DCM in a case-control burden test of Caucasian samples (OR=3.2, 95%CI=1.3-9.4). Within subsets of DCM, the association is replicated in alcoholic cardiomyopathy.
(4) Finally, I developed a tool to annotate 5’UTR variants that create or disrupt upstream open reading frames (uORFs). Its utility is demonstrated by detecting high-impact uORF-disturbing variants in ClinVar, gnomAD and Genomics England.
Conclusion:
These studies established broadly applicable methods and improved understanding of ICCs.
PCFGaze: Physics-Consistent Feature for Appearance-based Gaze Estimation
Although recent deep learning based gaze estimation approaches have achieved
much improvement, we still know little about how gaze features are connected to
the physics of gaze. In this paper, we try to answer this question by analyzing
the gaze feature manifold. Our analysis revealed the insight that the geodesic
distance between gaze features is consistent with the gaze differences between
samples. According to this finding, we construct the Physics-Consistent
Feature (PCF) in an analytical way, which connects gaze feature to the physical
definition of gaze. We further propose the PCFGaze framework that directly
optimizes gaze feature space by the guidance of PCF. Experimental results
demonstrate that the proposed framework alleviates the overfitting problem and
significantly improves cross-domain gaze estimation accuracy without extra
training data. This insight into gaze features has the potential to benefit other
regression tasks with physical meanings.
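As a rough illustration of what a "gaze difference between samples" can mean, the sketch below computes the angle between two gaze directions given as (pitch, yaw) pairs in radians. The pitch/yaw-to-vector convention and the function names are assumptions made for this example; the abstract does not specify them, and this is not the PCF construction itself.

```python
import math


def gaze_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a 3D unit gaze vector.

    The sign convention here is an assumption; datasets differ.
    """
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_difference_deg(gaze_a, gaze_b):
    """Angle in degrees between two gaze directions given as (pitch, yaw)."""
    va = gaze_vector(*gaze_a)
    vb = gaze_vector(*gaze_b)
    dot = sum(a * b for a, b in zip(va, vb))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


# Looking straight ahead versus a quarter turn in yaw.
difference = angular_difference_deg((0.0, 0.0), (0.0, math.pi / 2))
```

A feature space that is consistent with the physics of gaze, in the sense described above, would place samples so that feature distances track this kind of angular difference.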
Human-machine Interactive Tissue Prototype Learning for Label-efficient Histopathology Image Segmentation
Recently, deep neural networks have greatly advanced histopathology image
segmentation but usually require abundant annotated data. However, due to the
gigapixel scale of whole slide images and pathologists' heavy daily workload,
obtaining pixel-level labels for supervised learning in clinical practice is
often infeasible. Alternatively, weakly-supervised segmentation methods have
been explored with less laborious image-level labels, but their performance is
unsatisfactory due to the lack of dense supervision. Inspired by the recent
success of self-supervised learning methods, we present a label-efficient
tissue prototype dictionary building pipeline and propose to use the obtained
prototypes to guide histopathology image segmentation. Particularly, taking
advantage of self-supervised contrastive learning, an encoder is trained to
project the unlabeled histopathology image patches into a discriminative
embedding space where these patches are clustered to identify the tissue
prototypes through efficient visual examination by pathologists. Then, the encoder is
used to map the images into the embedding space and generate pixel-level pseudo
tissue masks by querying the tissue prototype dictionary. Finally, the pseudo
masks are used to train a segmentation network with dense supervision for
better performance. Experiments on two public datasets demonstrate that our
human-machine interactive tissue prototype learning method can achieve
segmentation performance comparable to the fully-supervised baselines with less
annotation burden and outperform other weakly-supervised methods. Code will be
available upon publication.
Comment: IPMI2023 camera ready
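The step of querying a prototype dictionary to obtain pseudo labels can be sketched as a nearest-prototype lookup. This is a minimal illustration under assumptions: cosine similarity as the matching rule, two-dimensional embeddings, and the tissue names `tumor` and `stroma` are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_prototypes(embeddings, prototypes):
    """Assign each embedding the label of its most similar prototype."""
    labels = []
    for embedding in embeddings:
        best = max(
            prototypes,
            key=lambda name: cosine_similarity(embedding, prototypes[name]),
        )
        labels.append(best)
    return labels


# Hypothetical dictionary with one prototype vector per tissue type.
prototypes = {"tumor": [1.0, 0.0], "stroma": [0.0, 1.0]}
pseudo_labels = query_prototypes([[0.9, 0.1], [0.2, 0.8]], prototypes)
```

Applied to every location of an image, labels of this kind would form the pseudo mask that is then used to train the segmentation network.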
Active Learning for Graphs with Noisy Structures
Graph Neural Networks (GNNs) have seen significant success in tasks such as
node classification, largely contingent upon the availability of sufficient
labeled nodes. Yet, the excessive cost of labeling large-scale graphs has led to a
focus on active learning on graphs, which aims for effective data selection to
maximize downstream model performance. Notably, most existing methods assume
reliable graph topology, while real-world scenarios often present noisy graphs.
Given this, designing a successful active learning framework for noisy graphs
is highly needed but challenging, as selecting data for labeling and obtaining
a clean graph are two tasks naturally interdependent: selecting high-quality
data requires clean graph structure while cleaning noisy graph structure
requires sufficient labeled data. Considering the complexity mentioned above,
we propose an active learning framework, GALClean, which iteratively performs
data selection and graph purification together, using the best information
learned from the prior iteration. Importantly, we summarize GALClean as an
instance of the
Expectation-Maximization algorithm, which provides a theoretical understanding
of its design and mechanisms. This theory naturally leads to an enhanced
version, GALClean+. Extensive experiments have demonstrated the effectiveness
and robustness of our proposed method across various types and levels of noisy
graphs.