18,355 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
StackInsights: Cognitive Learning for Hybrid Cloud Readiness
Hybrid cloud is an integrated cloud computing environment utilizing a mix of
public cloud, private cloud, and on-premise traditional IT infrastructures.
Workload awareness, defined as a detailed full range understanding of each
individual workload, is essential in implementing the hybrid cloud. While it is
critical to perform an accurate analysis to determine which workloads are
appropriate for on-premise deployment versus which workloads can be migrated to
a cloud off-premise, the assessment is mainly performed by rule or policy based
approaches. In this paper, we introduce StackInsights, a novel cognitive system
to automatically analyze and predict the cloud readiness of workloads for an
enterprise. Our system harnesses the critical metrics across the entire stack:
1) infrastructure metrics, 2) data relevance metrics, and 3) application
taxonomy, to identify workloads that have characteristics of a) low sensitivity
with respect to business security, criticality and compliance, and b) low
response time requirements and access patterns. Since the capture of the data
relevance metrics involves an intrusive and in-depth scanning of the content of
storage objects, a machine learning model is applied to perform the business
relevance classification by learning from the meta level metrics harnessed
across stack. In contrast to traditional methods, StackInsights significantly
reduces the total time for hybrid cloud readiness assessment by orders of
magnitude
Visual Integration of Data and Model Space in Ensemble Learning
Ensembles of classifier models typically deliver superior performance and can
outperform single classifier models given a dataset and classification task at
hand. However, the gain in performance comes together with the lack in
comprehensibility, posing a challenge to understand how each model affects the
classification outputs and where the errors come from. We propose a tight
visual integration of the data and the model space for exploring and combining
classifier models. We introduce a workflow that builds upon the visual
integration and enables the effective exploration of classification outputs and
models. We then present a use case in which we start with an ensemble
automatically selected by a standard ensemble selection algorithm, and show how
we can manipulate models and alternative combinations.Comment: 8 pages, 7 picture
Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees
This paper investigates an important problem in stream mining, i.e.,
classification under streaming emerging new classes or SENC. The common
approach is to treat it as a classification problem and solve it using either a
supervised learner or a semi-supervised learner. We propose an alternative
approach by using unsupervised learning as the basis to solve this problem. The
SENC problem can be decomposed into three sub problems: detecting emerging new
classes, classifying for known classes, and updating models to enable
classification of instances of the new class and detection of more emerging new
classes. The proposed method employs completely random trees which have been
shown to work well in unsupervised learning and supervised learning
independently in the literature. This is the first time, as far as we know,
that completely random trees are used as a single common core to solve all
three sub problems: unsupervised learning, supervised learning and model update
in data streams. We show that the proposed unsupervised-learning-focused method
often achieves significantly better outcomes than existing
classification-focused methods
Classification of Carpiodes Using Fourier Descriptors: A Content Based Image Retrieval Approach
Taxonomic classification has always been important to the study of any biological system. Many biological species will go unclassified and become lost forever at the current rate of classification. The current state of computer technology makes image storage and retrieval possible on a global level. As a result, computer-aided taxonomy is now possible. Content based image retrieval techniques utilize visual features of the image for classification. By utilizing image content and computer technology, the gap between taxonomic classification and species destruction is shrinking. This content based study utilizes the Fourier Descriptors of fifteen known landmark features on three Carpiodes species: C.carpio, C.velifer, and C.cyprinus. Classification analysis involves both unsupervised and supervised machine learning algorithms. Fourier Descriptors of the fifteen known landmarks provide for strong classification power on image data. Feature reduction analysis indicates feature reduction is possible. This proves useful for increasing generalization power of classification
- …