A User-Guided Bayesian Framework for Ensemble Feature Selection in Life Science Applications (UBayFS)
Feature selection represents a measure to reduce the complexity of
high-dimensional datasets and gain insights into the systematic variation in
the data. This aspect is of specific importance in domains that rely on model
interpretability, such as life sciences. We propose UBayFS, an ensemble feature
selection technique embedded in a Bayesian statistical framework. Our approach
considers two sources of information: data and domain knowledge. We build a
meta-model from an ensemble of elementary feature selectors and aggregate this
information in a multinomial likelihood. The user guides UBayFS by weighting
features and penalizing specific feature blocks or combinations, implemented
via a Dirichlet-type prior distribution and a regularization term. In a
quantitative evaluation, we demonstrate that our framework (a) allows for a
balanced trade-off between user knowledge and data observations, and (b)
achieves competitive performance with state-of-the-art methods.
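The combination of ensemble votes (data) and user weights (domain knowledge) described above can be illustrated with a conjugate Dirichlet-multinomial update. This is only a minimal sketch under stated assumptions, not the UBayFS implementation: the function name, the example votes, and the prior weights are hypothetical, and the regularization term for feature blocks or combinations is omitted.

```python
def posterior_feature_weights(selections, prior_weights):
    """Posterior mean of feature importances under a Dirichlet prior.

    selections: list of sets; each set holds the feature indices picked
        by one elementary feature selector (hypothetical input format).
    prior_weights: positive Dirichlet parameters, one per feature,
        encoding the user's prior belief in each feature.
    """
    n_features = len(prior_weights)
    # Multinomial-style counts: how often each feature was selected.
    counts = [0] * n_features
    for chosen in selections:
        for j in chosen:
            counts[j] += 1
    # Conjugate update: posterior parameter = prior parameter + count.
    posterior = [a + c for a, c in zip(prior_weights, counts)]
    total = sum(posterior)
    return [p / total for p in posterior]


# Three elementary selectors over four features (illustrative values).
votes = [{0, 1}, {0, 2}, {0, 1}]
# The user up-weights feature 3, which the data never selects.
prior = [1.0, 1.0, 1.0, 3.0]
weights = posterior_feature_weights(votes, prior)
```

In this toy setting, the up-weighted feature 3 ends with the same posterior weight as feature 1, which two of the three selectors chose. This shows how the prior can balance user knowledge against data observations.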
Principal component-based image segmentation: a new approach to outline in vitro cell colonies
The in vitro clonogenic assay is a technique to study the ability of a cell
to form a colony in a culture dish. By optical imaging, dishes with stained
colonies can be scanned and assessed digitally. Identification, segmentation
and counting of stained colonies play a vital part in high-throughput screening
and quantitative assessment of biological assays. Image processing of such
pictured/scanned assays can be affected by image/scan acquisition artifacts
like background noise and spatially varying illumination, and contaminants in
the suspension medium. Although existing approaches tackle these issues, the
segmentation quality requires further improvement, particularly on noisy and
low contrast images. In this work, we present an objective and versatile
machine learning procedure to amend these issues by characterizing, extracting
and segmenting inquired colonies using principal component analysis, k-means
clustering and a modified watershed segmentation algorithm. The intention is to
automatically identify visible colonies through spatial texture assessment and
accordingly discriminate them from background in preparation for successive
segmentation. The proposed segmentation algorithm yielded a similar quality as
manual counting by human observers. High F1 scores (>0.9) and low
root-mean-square errors (around 14%) underlined good agreement with ground
truth data. Moreover, it outperformed a recent state-of-the-art method. The
methodology will be an important tool in future cancer research applications.
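For readers unfamiliar with the F1 score quoted above, the following sketch shows how it is computed from detection counts. The counts are invented for illustration and are not results from the paper; how detected colonies are matched to ground-truth colonies is also left open here.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for detected colonies."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    if detected == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical dish: 90 colonies found correctly, 5 spurious, 10 missed.
score = f1_score(true_positives=90, false_positives=5, false_negatives=10)
```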
A Gaussian Sliding Windows Regression Model for Hydrological Inference
Statistical models are an essential tool to model, forecast and understand
the hydrological processes in watersheds. In particular, the modeling of time
lags between rainfall occurrence and subsequent changes in streamflow is of
high practical importance. Since water can take a variety of flowpaths to
generate streamflow, a series of distinct runoff pulses from different
flowpaths may combine to create the observed streamflow time series. Current
state-of-the-art models are not able to sufficiently confront
the problem complexity with interpretable parametrization, which would allow
insights into the dynamics of the distinct flowpaths for hydrological
inference. The proposed Gaussian Sliding Windows Regression Model targets this
problem by combining the concept of multiple windows sliding along the time
axis with multiple linear regression. The window kernels, which indicate the
weights applied to different time lags, are implemented via Gaussian-shaped
kernels. As a result, each window can represent one flowpath and, thus, offers
the potential for straightforward process inference. Experiments on simulated
and real-world scenarios underline that the proposed model achieves accurate
parameter estimates and competitive predictive performance, while fostering
explainable and interpretable hydrological modeling.
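The forward model described above, several Gaussian-shaped windows over time lags combined linearly, can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the function names, the normalization of each kernel to unit sum, and all parameter values are hypothetical, and parameter fitting is not shown.

```python
import math


def gaussian_kernel(center, width, max_lag):
    """Gaussian-shaped weights over lags 0..max_lag, normalized to sum to 1."""
    raw = [math.exp(-0.5 * ((lag - center) / width) ** 2)
           for lag in range(max_lag + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def predict_streamflow(rain, windows, max_lag):
    """Sum of regression coefficients times kernel-weighted lagged rainfall.

    windows: list of (beta, center, width) tuples, one per assumed flowpath.
    """
    kernels = [(beta, gaussian_kernel(center, width, max_lag))
               for beta, center, width in windows]
    flow = []
    for t in range(len(rain)):
        value = 0.0
        for beta, kernel in kernels:
            for lag, weight in enumerate(kernel):
                if t - lag >= 0:  # ignore lags reaching before the record
                    value += beta * weight * rain[t - lag]
        flow.append(value)
    return flow


# A single rainfall pulse routed through a fast and a slow flowpath.
rain = [0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
windows = [(0.6, 1.0, 0.5), (0.3, 5.0, 1.5)]
flow = predict_streamflow(rain, windows, max_lag=8)
```

Each tuple in `windows` corresponds to one window: its center is the typical delay of that flowpath, its width is the spread of the delay, and its coefficient is the share of rainfall routed through it. The parameters can therefore be read directly as process descriptors.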
Towards Understanding the Survival of Patients with High-Grade Gastroenteropancreatic Neuroendocrine Neoplasms: An Investigation of Ensemble Feature Selection in the Prediction of Overall Survival
Determining the most informative features for predicting the overall survival of patients diagnosed with high-grade gastroenteropancreatic neuroendocrine neoplasms is crucial to improve individual treatment plans for patients, as well as the biological understanding of the disease. Recently developed ensemble feature selectors like the Repeated Elastic Net Technique for Feature Selection (RENT) and the User-Guided Bayesian Framework for Feature Selection (UBayFS) allow the user to identify such features in datasets with low sample sizes. While RENT is purely data-driven, UBayFS is capable of integrating expert knowledge a priori in the feature selection process. In this work, we compare both feature selectors on a dataset comprising 63 patients and 134 features from multiple sources, including basic patient characteristics, baseline blood values, tumor histology, imaging, and treatment information. Our experiments involve data-driven and expert-driven setups, as well as combinations of both. We use findings from clinical literature as a source of expert knowledge. Our results demonstrate that both feature selectors allow accurate predictions, and that expert knowledge has a stabilizing effect on the feature set, while the impact on predictive performance is limited. The features WHO Performance Status, Albumin, Platelets, Ki-67, Tumor Morphology, Total MTV, Total TLG, and SUVmax are the most stable and predictive features in our study.
RENT—Repeated Elastic Net Technique for Feature Selection
Optimal Bayesian experimental design for crossover semiconductor lifetime studies
Stefan Schrunner, B.Sc. Alpen Adria Universität Klagenfurt, Master's thesis, 2016. (VLID) 241284