A Simple FPTAS for Counting Edge Covers
An edge cover of a graph is a set of edges such that every vertex has at
least one incident edge in the set. Previously, an approximation algorithm for
counting edge covers was known only for 3-regular graphs, and it was randomized.
We design a very simple deterministic fully polynomial-time approximation scheme
(FPTAS) for counting the number of edge covers of any graph. Our main technique is
correlation decay, which is a powerful tool for designing FPTASs for counting
problems. To obtain an FPTAS for general graphs without a degree bound, we
make use of a stronger notion called computationally efficient correlation
decay, which was introduced in [Li, Lu, Yin SODA 2012]. Comment: To appear in SODA 201
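To make the counted object concrete, the following minimal sketch enumerates edge covers by brute force. It only illustrates the definition above; it is not the FPTAS or the correlation decay technique of the paper, and it runs in time exponential in the number of edges. The function name and graph encoding (vertices numbered from 0, edges as pairs) are assumptions made for this example.

```python
from itertools import combinations


def count_edge_covers(num_vertices, edges):
    """Count edge subsets in which every vertex has at least one incident edge.

    Brute force over all 2^|E| subsets: suitable only for tiny graphs.
    """
    count = 0
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            covered = set()
            for u, v in subset:
                covered.add(u)
                covered.add(v)
            # The subset is an edge cover when all vertices are touched.
            if len(covered) == num_vertices:
                count += 1
    return count


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(count_edge_covers(3, triangle))
```

For a triangle, any two edges already touch all three vertices, so the covers are the three 2-edge subsets plus the full edge set.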
Extracting fetal heart beats from maternal abdominal recordings: Selection of the optimal principal components
This study presents a systematic comparison of different approaches to the automated selection of the principal components (PC) which optimise the detection of maternal and fetal heart beats from non-invasive maternal abdominal recordings. A public database of 75 4-channel non-invasive maternal abdominal recordings was used for training the algorithm. Four methods were developed and assessed to determine the optimal PC: (1) power spectral distribution, (2) root mean square, (3) sample entropy, and (4) QRS template. The sensitivity of the algorithm's performance to large-amplitude noise removal (by wavelet de-noising) and to maternal beat cancellation methods was also assessed. The accuracy of maternal and fetal beat detection was assessed against reference annotations and quantified using the detection accuracy score F1 [2*PPV*Se / (PPV + Se)], sensitivity (Se), and positive predictive value (PPV). The best performing implementation was assessed on a test dataset of 100 recordings, and the agreement between the computed and the reference fetal heart rate (fHR) and fetal RR (fRR) time series was quantified. The best performance for detecting maternal beats (F1 99.3%, Se 99.0%, PPV 99.7%) was obtained when using the QRS template method to select the optimal maternal PC and applying wavelet de-noising. The best performance for detecting fetal beats (F1 89.8%, Se 89.3%, PPV 90.5%) was obtained when the optimal fetal PC was selected using the sample entropy method and a fixed-length time window was used for the cancellation of the maternal beats. The performance on the test dataset was 142.7 beats²/min² for fHR and 19.9 ms for fRR, ranking 14th and 17th, respectively (out of 29), when compared to the other algorithms presented at the Physionet Challenge 2013.
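As a small worked example of the F1 formula quoted in the abstract, the sketch below computes Se, PPV, and F1 from beat-detection counts. The function names and the use of true-positive, false-positive, and false-negative counts are assumptions for illustration; the abstract does not describe how detected beats were matched to the reference annotations.

```python
def f1_score(ppv, se):
    """F1 = 2 * PPV * Se / (PPV + Se), with both inputs as fractions in [0, 1]."""
    if ppv + se == 0:
        return 0.0
    return 2 * ppv * se / (ppv + se)


def detection_scores(tp, fp, fn):
    """Return (Se, PPV, F1) from counts of matched and unmatched beats.

    tp: detected beats that match a reference beat
    fp: detected beats with no matching reference beat
    fn: reference beats that were not detected
    """
    se = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return se, ppv, f1_score(ppv, se)


if __name__ == "__main__":
    # Maternal beat figures reported in the abstract: Se 99.0%, PPV 99.7%.
    print(round(100 * f1_score(0.997, 0.990), 1))
```

Plugging in the reported maternal values reproduces the reported F1 of 99.3% to one decimal place.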
Modelling arterial pressure waveforms using Gaussian functions and two-stage particle swarm optimizer
Changes in arterial pressure waveform characteristics have been accepted as risk indicators of cardiovascular diseases. Waveform modelling using Gaussian functions has been used to decompose arterial pressure pulses into different numbers of subwaves and hence quantify waveform characteristics. However, the fitting accuracy and computational efficiency of current modelling approaches need to be improved. This study aimed to develop a novel two-stage particle swarm optimizer (TSPSO) to determine optimal parameters of Gaussian functions. The evaluation was performed on carotid and radial artery pressure waveforms (CAPW and RAPW), which were simultaneously recorded from twenty normal volunteers. The fitting accuracy and computational efficiency of our TSPSO were compared with three published optimization methods: the Nelder-Mead, the modified PSO (MPSO), and the dynamic multiswarm particle swarm optimizer (DMS-PSO). The results showed that TSPSO achieved the best fitting accuracy with a mean absolute error (MAE) of 1.1% for CAPW and 1.0% for RAPW, in comparison with 4.2% and 4.1% for Nelder-Mead, 2.0% and 1.9% for MPSO, and 1.2% and 1.1% for DMS-PSO. In addition, to achieve a target MAE of 2.0%, the computation time of TSPSO was only 1.5 s, which was only 20% and 30% of that for MPSO and DMS-PSO, respectively.
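The objective that such an optimizer minimises can be sketched as follows: a pulse is modelled as a sum of Gaussian subwaves, and the fit is scored by a percentage MAE. This is a hedged illustration only. The parameterisation (amplitude, centre, width per subwave), the normalisation of MAE by the measured pulse amplitude, and all function names are assumptions, since the abstract does not define them, and the TSPSO search itself is not reproduced here.

```python
import math


def gaussian_sum(t, params):
    """Evaluate a sum of Gaussian subwaves at time t.

    params is a list of (amplitude, centre, width) tuples, one per subwave.
    """
    return sum(
        a * math.exp(-((t - mu) ** 2) / (2 * sigma ** 2))
        for a, mu, sigma in params
    )


def mae_percent(times, measured, params):
    """Mean absolute error between measured and modelled waveform.

    Expressed as a percentage of the measured peak-to-trough amplitude
    (an assumed normalisation, not taken from the paper).
    """
    amplitude = max(measured) - min(measured)
    total = sum(abs(m - gaussian_sum(t, params)) for t, m in zip(times, measured))
    return 100.0 * total / (len(measured) * amplitude)


if __name__ == "__main__":
    times = [i / 100 for i in range(100)]
    true_params = [(1.0, 0.25, 0.08), (0.5, 0.55, 0.12)]  # synthetic, two subwaves
    measured = [gaussian_sum(t, true_params) for t in times]
    guess = [(0.9, 0.27, 0.08), (0.5, 0.55, 0.12)]
    print(mae_percent(times, measured, guess))
```

An optimizer such as a particle swarm would search over `params` to drive this score down; here the score is zero when the candidate parameters equal those that generated the synthetic waveform.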
Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting
We show that label noise exists in adversarial training. Such label noise is
due to the mismatch between the true label distribution of adversarial examples
and the label inherited from clean examples - the true label distribution is
distorted by the adversarial perturbation, but is neglected by the common
practice that inherits labels from clean examples. Recognizing label noise
sheds light on the prevalence of robust overfitting in adversarial training,
and explains its intriguing dependence on perturbation radius and data quality.
Also, our label noise perspective aligns well with our observations of the
epoch-wise double descent in adversarial training. Guided by our analyses, we
propose a method to automatically calibrate the label to address the label
noise and robust overfitting. Our method achieves consistent performance
improvements across various models and datasets without introducing new
hyper-parameters or additional tuning. Comment: NeurIPS 2022 (Oral); A previous version of this paper (v1) used the
title `Double Descent in Adversarial Training: An Implicit Label Noise
Perspective
PerPAS: Topology-Based Single Sample Pathway Analysis Method
Identification of intracellular pathways that play key roles in cancer progression and drug resistance is a prerequisite for developing targeted cancer treatments. The era of personalized medicine calls for computational methods that can function with one sample or a very small set of samples. Developing such methods is challenging because standard statistical approaches rest on several limiting assumptions, such as a minimum number of samples, that prevent their application as n approaches one. We have developed a novel pathway analysis method called PerPAS to estimate pathway activity at a single sample level by integrating pathway topology and transcriptomics data. In addition, PerPAS is able to identify altered pathways between cancer and control samples as well as to identify key nodes that contribute to the pathway activity. In our case study using breast cancer data, we show that PerPAS can identify highly altered pathways that are associated with patient survival. PerPAS identified four pathways that were associated with patient survival and were successfully validated in three independent breast cancer cohorts. In comparison to two other pathway analysis methods that function at a single sample level, PerPAS had superior performance in both synthetic and breast cancer expression datasets. PerPAS is a free R package (http://csbi.ltdk.helsinki.fi/pub/czliu/perpas/). Peer reviewed
Computational Integrative Analysis of Biological Networks in Cancer
Cancer is one of the most lethal diseases. By 2030, deaths caused by cancers are estimated to reach 13 million per year worldwide. Cancer is a collection of related diseases distinguished by uncontrolled cell division that is driven by genomic alterations. Cancer is heterogeneous and shows an extraordinary genomic diversity between patients with transcriptionally and histologically similar cancer subtypes, and even between tumors from the same anatomical position. The heterogeneity poses great challenges in understanding cancer mechanisms and drug resistance; this understanding is critical for precise prognosis and improved treatments.
Emergence of high-throughput technologies, such as microarrays and next-generation sequencing, has motivated the investigation of cancer cells on a genome-wide scale. Over the last decade, an unprecedented amount of high-throughput data has been generated. The challenge is to turn such a vast amount of raw data into clinically valuable information to benefit cancer patients. Single omics data have failed to fully uncover mechanisms behind cancer phenotypes. Accordingly, integrative approaches have been introduced to systematically analyze and interpret multi-omics data, among which network-based integrative approaches have achieved substantial advances in basic biological studies and cancer treatments.
In this thesis, the development and application of network-based integrative methods are included to address challenges in analyzing cancer samples. Two novel methods are introduced to integrate disparate omics data and biological networks at the single-patient level: PerPAS, which takes pathway topology into account and integrates gene expression and clinical data with pathway information; and DERA, which elevates gene expression analysis to the network level and identifies network-based biomarkers that provide functional interpretation. The performance of both methods was demonstrated using biological experiment data, and the results were validated in independent cohorts.
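The application part of this thesis focuses on understanding cancer mechanisms and identifying clinical biomarkers in breast cancer and diffuse large B-cell lymphoma using PerPAS, DERA, and an existing method, SPIA. Our experimental results provided insights into underlying cancer mechanisms and potential prognostic biomarkers for breast cancer, and identified therapeutic targets for diffuse large B-cell lymphoma. The potential of the therapeutic targets was verified in in vitro experiments.
Cancer is a complex disease and one of the most lethal diseases today. It is estimated that, twenty years from now, 13 million people worldwide will die of cancer each year. Cancer is a heterogeneous disease that exhibits great genomic diversity. Genomic samples taken from different patients but belonging to similar subgroups show marked differences, and even genomic samples taken from the same location in the same patient differ. Understanding the mechanisms and progression of cancer is necessary for more precise diagnosis and treatment.
The emergence of high-throughput technologies has spurred the development of systems analysis and computational tools. However, data from a single platform are insufficient to fully reveal cancer mechanisms, so understanding these mechanisms has remained a great challenge. The emergence of network-based integrative methods has advanced basic biological research and the diagnosis and treatment of patients. This thesis comprises two parts: the development and the application of integrative methods. On the development side, we created new integrative methods to meet the challenges of data integration and to answer questions in cancer research. The two newly developed integrative methods are: 1) PerPAS, a personalized-medicine analysis tool that supports the analysis of single patient samples and can integrate signaling pathway and gene expression data; and 2) DERA, a tool that integrates cellular networks and gene expression data, which elevates the analysis of gene expression data to the network level and can analyze single samples. The usability of both methods has been demonstrated in applications to biological data, and the findings were validated with independent data.
The application part focuses on the comprehensive integrative analysis of mRNA, miRNA, and signaling pathway data, and on identifying new therapeutic targets in diffuse large B-cell lymphoma. Applying this approach, we found several targets that regulate cellular pathways important for clinical survival. The reliability of these targets has been verified experimentally.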