33,944 research outputs found
Recommended from our members
A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Spatial Random Sampling: A Structure-Preserving Data Sketching Tool
Random column sampling is not guaranteed to yield data sketches that preserve
the underlying structures of the data and may not sample sufficiently from
less-populated data clusters. Also, adaptive sampling can often provide
accurate low rank approximations, yet may fall short of producing descriptive
data sketches, especially when the cluster centers are linearly dependent.
Motivated by that, this paper introduces a novel randomized column sampling
tool dubbed Spatial Random Sampling (SRS), in which data points are sampled
based on their proximity to randomly sampled points on the unit sphere. The
most compelling feature of SRS is that the corresponding probability of
sampling from a given data cluster is proportional to the surface area the
cluster occupies on the unit sphere, independently from the size of the cluster
population. Although it is fully randomized, SRS is shown to provide
descriptive and balanced data representations. The proposed idea addresses a
pressing need in data science and holds potential to inspire many novel
approaches for analysis of big data
Metaheuristic design of feedforward neural networks: a review of two decades of research
Over the past two decades, the feedforward neural network (FNN) optimization has been a key interest among the researchers and practitioners of multiple disciplines. The FNN optimization is often viewed from the various perspectives: the optimization of weights, network architecture, activation nodes, learning parameters, learning environment, etc. Researchers adopted such different viewpoints mainly to improve the FNN's generalization ability. The gradient-descent algorithm such as backpropagation has been widely applied to optimize the FNNs. Its success is evident from the FNN's application to numerous real-world problems. However, due to the limitations of the gradient-based optimization methods, the metaheuristic algorithms including the evolutionary algorithms, swarm intelligence, etc., are still being widely explored by the researchers aiming to obtain generalized FNN for a given problem. This article attempts to summarize a broad spectrum of FNN optimization methodologies including conventional and metaheuristic approaches. This article also tries to connect various research directions emerged out of the FNN optimization practices, such as evolving neural network (NN), cooperative coevolution NN, complex-valued NN, deep learning, extreme learning machine, quantum NN, etc. Additionally, it provides interesting research challenges for future research to cope-up with the present information processing era
Learning a Complete Image Indexing Pipeline
To work at scale, a complete image indexing system comprises two components:
An inverted file index to restrict the actual search to only a subset that
should contain most of the items relevant to the query; An approximate distance
computation mechanism to rapidly scan these lists. While supervised deep
learning has recently enabled improvements to the latter, the former continues
to be based on unsupervised clustering in the literature. In this work, we
propose a first system that learns both components within a unifying neural
framework of structured binary encoding
- …