48,022 research outputs found
Embedding Feature Selection for Large-scale Hierarchical Classification
Large-scale Hierarchical Classification (HC) involves datasets consisting of
thousands of classes and millions of training instances with high-dimensional
features posing several big data challenges. Feature selection that aims to
select the subset of discriminant features is an effective strategy to deal
with large-scale HC problem. It speeds up the training process, reduces the
prediction time and minimizes the memory requirements by compressing the total
size of learned model weight vectors. Majority of the studies have also shown
feature selection to be competent and successful in improving the
classification accuracy by removing irrelevant features. In this work, we
investigate various filter-based feature selection methods for dimensionality
reduction to solve the large-scale HC problem. Our experimental evaluation on
text and image datasets with varying distribution of features, classes and
instances shows upto 3x order of speed-up on massive datasets and upto 45% less
memory requirements for storing the weight vectors of learned model without any
significant loss (improvement for some datasets) in the classification
accuracy. Source Code: https://cs.gmu.edu/~mlbio/featureselection.Comment: IEEE International Conference on Big Data (IEEE BigData 2016
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation
Remote sensing (RS) image retrieval is of great significant for geological
information mining. Over the past two decades, a large amount of research on
this task has been carried out, which mainly focuses on the following three
core issues: feature extraction, similarity metric and relevance feedback. Due
to the complexity and multiformity of ground objects in high-resolution remote
sensing (HRRS) images, there is still room for improvement in the current
retrieval approaches. In this paper, we analyze the three core issues of RS
image retrieval and provide a comprehensive review on existing methods.
Furthermore, for the goal to advance the state-of-the-art in HRRS image
retrieval, we focus on the feature extraction issue and delve how to use
powerful deep representations to address this task. We conduct systematic
investigation on evaluating correlative factors that may affect the performance
of deep features. By optimizing each factor, we acquire remarkable retrieval
results on publicly available HRRS datasets. Finally, we explain the
experimental phenomenon in detail and draw conclusions according to our
analysis. Our work can serve as a guiding role for the research of
content-based RS image retrieval
- …