Hellinger Distance Trees for Imbalanced Streams
Classifiers trained on data sets possessing an imbalanced class distribution
are known to exhibit poor generalisation performance. This is known as the
imbalanced learning problem. The problem becomes particularly acute when we
consider incremental classifiers operating on imbalanced data streams,
especially when the learning objective is rare class identification. As
accuracy may give a misleading impression of performance on imbalanced data,
existing stream classifiers based on accuracy can suffer from poor minority class
performance on imbalanced streams, resulting in low minority class recall. In
this paper we address this deficiency by proposing the use of the Hellinger
distance measure as a Very Fast Decision Tree (VFDT) split criterion. We
demonstrate that using Hellinger distance yields a statistically significant
improvement in recall on imbalanced data streams, with an acceptable increase in
the false positive rate.
Comment: 6 pages, 2 figures, to be published in Proceedings of the 22nd International
Conference on Pattern Recognition (ICPR) 201
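As a rough illustration of why Hellinger distance suits imbalanced data, the sketch below scores a candidate split using the common Hellinger-distance-tree formulation. It compares, branch by branch, the fraction of all minority examples and the fraction of all majority examples sent down each branch. Because each class is normalised by its own total, the score does not depend on the class ratio. The function name `hellinger_split_value` and the assumption that per-branch class counts come from a leaf's sufficient statistics are hypothetical. The Hoeffding-bound machinery a VFDT uses to decide *when* to split is not shown.

```python
import math
from typing import Sequence


def hellinger_split_value(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Hellinger distance between the two class-conditional branch distributions.

    pos_counts[j] / neg_counts[j] are the numbers of minority / majority
    examples that a candidate split would route to branch j (assumed to be
    taken from the leaf's running counts). Larger values mean the split
    separates the classes better; the maximum is sqrt(2).
    """
    if len(pos_counts) != len(neg_counts):
        raise ValueError("need one count per branch for each class")
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # a split cannot separate classes if one class is absent
    acc = 0.0
    for p, n in zip(pos_counts, neg_counts):
        # Each class is normalised by its own total, so the class ratio cancels out.
        acc += (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
    return math.sqrt(acc)
```

In a tree learner one would evaluate this for every candidate attribute and prefer the largest value. Multiplying all majority counts by a constant (e.g. 990 vs. 99 negatives) leaves the score unchanged, which is the skew-insensitivity property that makes it attractive here.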
Microbial community pattern detection in human body habitats via ensemble clustering framework
The human habitat is a host where microbial species evolve, function, and
continue to evolve. Elucidating how microbial communities respond to human
habitats is a fundamental and critical task, as establishing baselines of the human
microbiome is essential to understanding its role in human disease and health.
However, current studies usually overlook the complex and interconnected
landscape of the human microbiome and are limited to particular body habitats
and to learning models built on a specific criterion. As a result, these methods
cannot effectively capture the underlying real-world microbial patterns. To obtain a
comprehensive view, we propose a novel ensemble clustering framework to mine
the structure of microbial community patterns in large-scale metagenomic data.
Specifically, we first build a microbial similarity network by integrating
1920 metagenomic samples from three body habitats of healthy adults. Then a
novel ensemble model based on symmetric Nonnegative Matrix Factorization (NMF) is
proposed and applied to the network to detect clustering patterns. Extensive
experiments are conducted to evaluate the effectiveness of our model in
deriving microbial communities with respect to body habitat and host gender. From
the clustering results, we observed that body habitat exhibits strongly bound but
non-unique microbial structural patterns. Meanwhile, the human microbiome shows
different degrees of structural variation across body habitats and host gender. In
summary, our ensemble clustering framework can efficiently explore integrated
clustering results to accurately identify microbial communities and provide a
comprehensive view of a set of microbial communities. These trends depict an
integrated biography of microbial communities, which offers new insight
towards uncovering the pathogenic model of the human microbiome.
Comment: BMC Systems Biology 201
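The core step of detecting clusters in a similarity network with symmetric NMF can be illustrated with a minimal sketch. It assumes a small dense nonnegative similarity matrix `A`, given as a list of lists, and finds a nonnegative `H` with `A ≈ H Hᵀ`. It uses the widely used damped multiplicative update for symmetric NMF, and each node is assigned to the column of `H` with its largest membership. The helper names (`symmetric_nmf`, `cluster_labels`) and parameters (`beta`, `iters`, `seed`) are hypothetical. This is only a single base clustering, not the paper's ensemble model, which combines multiple results.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: a is n x m, b is m x p."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def symmetric_nmf(A: Matrix, k: int, iters: int = 1000, beta: float = 0.5,
                  seed: int = 0, eps: float = 1e-12) -> Matrix:
    """Approximately minimise ||A - H H^T||_F^2 over nonnegative n x k matrices H.

    Damped multiplicative update:
        H <- H * (1 - beta + beta * (A H) / (H H^T H))
    Entries stay nonnegative as long as A is nonnegative.
    """
    n = len(A)
    rng = random.Random(seed)  # fixed seed for a reproducible random start
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        AH = matmul(A, H)
        HHtH = matmul(H, matmul(transpose(H), H))
        H = [
            [H[i][j] * (1 - beta + beta * AH[i][j] / (HHtH[i][j] + eps))
             for j in range(k)]
            for i in range(n)
        ]
    return H


def cluster_labels(H: Matrix) -> List[int]:
    """Assign each node to the cluster (column of H) with its largest membership."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in H]
```

For example, on a toy six-node network made of two fully connected groups with no links between them, `cluster_labels(symmetric_nmf(A, 2))` should place nodes 0 to 2 in one cluster and nodes 3 to 5 in the other. The multiplicative form is chosen because it needs no step size and preserves nonnegativity automatically.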
Optimal greenhouse cultivation control: survey and perspectives
A survey is presented of the literature on greenhouse climate control, positioning the various solutions and paradigms in the framework of optimal control. A separation of timescales allows the economic optimal control problem of greenhouse cultivation to be split into an off-line problem at the tactical level and an on-line problem at the operational level. This paradigm is used to classify the literature into three categories: focus on operational control, focus on the tactical level, and truly integrated control. Integrated optimal control warrants the best economic result and provides a systematic way to design control systems for the innovative greenhouses of the future. Research issues and perspectives are listed as well.
BaseFs - Basically Available, Soft State, Eventually Consistent Filesystem for Cluster Management
A peer-to-peer distributed filesystem for community cloud management. https://github.com/glic3rinu/basef
Stratified decision forests for accurate anatomical landmark localization in cardiac images
Accurate localization of anatomical landmarks is an important step in medical imaging, as it provides useful prior information for subsequent image analysis and acquisition methods. It is particularly useful for initializing automatic image analysis tools (e.g., segmentation and registration) and for detecting scan planes in automated image acquisition. Landmark localization has commonly been performed using learning-based approaches, such as classifier and/or regressor models. However, trained models may not generalize well to heterogeneous datasets when the images contain large differences due to size, pose and shape variations of organs. To learn more data-adaptive and patient-specific models, we propose a novel stratification-based training model and demonstrate its use in a decision forest. The proposed approach does not require any additional training information compared to the standard model training procedure and can be easily integrated into any decision tree framework. The proposed method is evaluated on 1080 3D high-resolution and 90 multi-stack 2D cardiac cine MR images. The experiments show that the proposed method achieves state-of-the-art landmark localization accuracy and outperforms standard regression- and classification-based approaches. Additionally, the proposed method is used in a multi-atlas segmentation to create a fully automatic segmentation pipeline, and the results show that it achieves state-of-the-art segmentation accuracy.
- …