A Survey on Feature Selection Algorithms
One major component of machine learning is feature analysis, which mainly comprises two processes: feature selection and feature extraction. Because of its applications in areas including data mining, soft computing and big data analysis, feature selection has gained considerable importance. This paper introduces the concept of feature selection and its main approaches. It surveys historical developments in feature selection for both supervised and unsupervised methods, and it summarizes recent state-of-the-art feature selection algorithms, including their hybridizations.
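As a minimal illustration of the filter family of approaches that the survey introduces, the sketch below ranks features by absolute Pearson correlation with the target (a supervised filter) and drops near-constant features by variance (an unsupervised filter). The function names filter_select and variance_select, the scoring rules and the toy data are illustrative assumptions, not methods taken from the paper.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def filter_select(X, y, k):
    """Supervised filter: indices of the k features most correlated (in absolute value) with y."""
    columns = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


def variance_select(X, threshold):
    """Unsupervised filter: indices of features whose variance exceeds threshold."""
    keep = []
    for j, col in enumerate(zip(*X)):
        mean = sum(col) / len(col)
        if sum((v - mean) ** 2 for v in col) / len(col) > threshold:
            keep.append(j)
    return keep


X = [[1, 0.5, 3], [2, 0.5, 1], [3, 0.5, 4], [4, 0.5, 1]]
y = [1, 2, 3, 4]
print(filter_select(X, y, 2))   # → [0, 2]
print(variance_select(X, 0.0))  # → [0, 2]
```

Filters like these score each feature independently of any classifier, which makes them cheap; wrapper and embedded methods trade that speed for scores that reflect how features behave together inside a model.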
DOI: 10.17762/ijritcc2321-8169.16043
An investigation of multilevel refinement in routing and location problems
Multilevel refinement is a collaborative hierarchical solution technique. It aims to enhance the solution of optimisation problems by improving the asymptotic convergence in the quality of solutions produced by its underlying local search heuristics, the convergence rate of these heuristics, or both. To these ends, the central methodologies of the multilevel technique are: filtering solutions from the search space (via coarsening); reducing the amount of problem detail considered at each level of the solution process; and giving the underlying local search heuristics a mechanism for efficiently making large moves around the search space. The neighbourhoods reached by these moves are typically inaccessible when the local search heuristics are applied to the un-coarsened problem. Together, these methodologies meet the technique's aims: as the multilevel technique iteratively coarsens, extends and refines a problem, it reduces the chance that the local search heuristic becomes trapped in poor-quality local optima.
The research presented in this thesis investigates the application of multilevel refinement to classes of location and routing problems and develops numerous multilevel algorithms. Some of these algorithms are collaborative techniques for metaheuristics and others are collaborative techniques for local search heuristics. Additionally, new methods of coarsening for location and routing problems and enhancements to the multilevel technique are developed. The thesis demonstrates that the multilevel technique is suited to a wide array of problems. By extending the investigation across both routing and location problems, the research was able to draw generalisations about the multilevel technique's suitability for these and similar types of problems.
Finally, results on a number of well-known benchmark suites for location and routing problems are presented, comparing equivalent single-level and multilevel algorithms. These results show that the multilevel technique provides significant gains over its single-level counterparts. In all cases, the multilevel algorithm improved the asymptotic convergence in the quality of solutions produced by the standard (single-level) local search heuristics or metaheuristics. The multilevel technique did not improve the convergence rate of the single-level local search heuristics in every case; however, for large-scale problems the multilevel variants scaled better than the single-level techniques. The research also showed that, for sufficiently large problems, the multilevel technique improved the asymptotic convergence in solution quality quickly enough that the multilevel algorithms produced better results than the single-level versions without refining the solution down to the most detailed level.
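The coarsen, extend and refine cycle described above can be sketched for the travelling salesman problem, a classic routing problem. This is a minimal sketch under stated assumptions, not the thesis's algorithms. Coarsening greedily joins each chain of cities to its nearest unmatched chain, fixing the joining edge. The coarsest chains are concatenated into an initial tour. Refinement is a 2-opt local search that may not remove edges still fixed at the current level. Because the tour is stored as a flat list of cities, extending it to a finer level simply unfixes that level's edges. The function names and parameters (multilevel_tsp, min_chains, seed) are hypothetical.

```python
import math
import random


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def tour_length(pts, tour):
    return sum(dist(pts[tour[k]], pts[tour[(k + 1) % len(tour)]]) for k in range(len(tour)))


def coarsen(pts, chains, rng):
    """One coarsening step: join each chain to its nearest unmatched chain (hypothetical rule)."""
    order = list(range(len(chains)))
    rng.shuffle(order)
    matched, merged = set(), []
    for a in order:
        if a in matched:
            continue
        matched.add(a)
        best = None
        for b in order:
            if b in matched:
                continue
            # Try both orientations of each chain; keep the shortest joining edge.
            for ca in (chains[a], chains[a][::-1]):
                for cb in (chains[b], chains[b][::-1]):
                    d = dist(pts[ca[-1]], pts[cb[0]])
                    if best is None or d < best[0]:
                        best = (d, b, ca + cb)
        if best is None:
            merged.append(chains[a])  # no partner left: carry the chain over unchanged
        else:
            matched.add(best[1])
            merged.append(best[2])
    return merged


def fixed_edges(chains):
    """Edges inside a chain are fixed at that level."""
    return {frozenset(e) for c in chains for e in zip(c, c[1:])}


def two_opt(pts, tour, fixed):
    """First-improvement 2-opt that never removes a fixed edge; modifies tour in place."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if frozenset((a, b)) in fixed or frozenset((c, d)) in fixed:
                    continue
                delta = (dist(pts[a], pts[c]) + dist(pts[b], pts[d])
                         - dist(pts[a], pts[b]) - dist(pts[c], pts[d]))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


def multilevel_tsp(pts, min_chains=2, seed=0):
    rng = random.Random(seed)
    levels = [[[i] for i in range(len(pts))]]  # level 0: every city is its own chain
    while len(levels[-1]) > min_chains:
        levels.append(coarsen(pts, levels[-1], rng))
    # Coarsest level: concatenating the few remaining chains already gives a tour.
    tour = [city for chain in levels[-1] for city in chain]
    # Refine from the coarsest level down; each finer level has fewer fixed edges.
    for chains in reversed(levels):
        two_opt(pts, tour, fixed_edges(chains))
    return tour


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(60)]
tour = multilevel_tsp(pts)
print(round(tour_length(pts, tour), 3))
```

Because 2-opt reversals keep segments intact, a fixed edge is never broken. Early refinement passes therefore move whole chains at once, which corresponds to the "large moves" the abstract mentions. The final level-0 pass is ordinary, unconstrained 2-opt.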
Current Studies and Applications of Krill Herd and Gravitational Search Algorithms in Healthcare
Nature-Inspired Computing (NIC) is a relatively young field that seeks new computing methods by studying how natural phenomena work, in order to solve complicated problems in many contexts. As a consequence, ground-breaking research has been conducted in a variety of domains, including artificial immune systems, neural networks, swarm intelligence and evolutionary computing. NIC techniques are used in biology, physics, engineering, economics and management. Meta-heuristic algorithms are successful, efficient and robust in real-world classification, optimization, forecasting and clustering, as well as in engineering and scientific problems. Two active NIC paradigms are the gravitational search algorithm (GSA) and the krill herd algorithm (KH). This publication gives a worldwide and historical review of studies that use KH and GSA in medicine and healthcare. Comprehensive surveys exist for several nature-inspired algorithms, including KH and GSA, but no survey of KH and GSA specifically in the healthcare field has been undertaken. This article therefore thoroughly reviews the various versions of the KH and GSA algorithms and their healthcare applications, to help researchers use them in diverse domains or hybridize them with other popular algorithms. It also provides an in-depth examination of KH and GSA in terms of application, modification and hybridization. The goal of the study is to offer a perspective on GSA and KH, particularly for academics interested in investigating the capabilities and performance of these algorithms in the healthcare and medical domains.
Comment: 35 pages
Edited nearest neighbour for selecting keyframe summaries of egocentric videos
A keyframe summary of a video must be concise, comprehensive and diverse. Current video summarisation methods may not be able to enforce diversity in the summary if the events have highly similar visual content, as is the case with egocentric videos. We cast the problem of selecting a keyframe summary as a problem of prototype (instance) selection for the nearest neighbour classifier (1-NN). Assuming that the video is already segmented into events of interest (classes) and represented as a dataset in some feature space, we propose a Greedy Tabu Selector algorithm (GTS), which picks one frame to represent each class. An experiment with the UT (Egocentric) video database and seven feature representations illustrates the proposed keyframe summarisation method. GTS gives a better match to the user ground truth than the closest-to-centroid baseline summarisation method. The best results were obtained with feature spaces derived from a convolutional neural network (CNN).
Funding: Leverhulme Trust, UK (RPG-2015-188); Sao Paulo Research Foundation - FAPESP (2016/06441-7).
Affiliations: Bangor Univ, Sch Comp Sci, Dean St, Bangor LL57 1UT, Gwynedd, Wales; Fed Univ Sao Paulo UNIFESP, Inst Sci & Technol, BR-12247014 Sao Jose Dos Campos, SP, Brazil.
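To make the prototype-selection framing concrete, the sketch below implements the closest-to-centroid baseline named in the abstract. It scores a set of prototypes (one frame per event) by the resubstitution accuracy of the 1-NN classifier that uses only those prototypes. It then improves the set with a simple swap search that keeps a tabu list of frames already used. The swap-and-tabu rule, the accuracy criterion, the toy data and all function names are assumptions for illustration. This is not the authors' GTS algorithm, whose details the abstract does not give.

```python
import math
from collections import defaultdict


def closest_to_centroid(X, y):
    """Baseline: for each class (event), pick the frame nearest to the class mean."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    protos = {}
    for label, idx in by_class.items():
        centroid = tuple(sum(X[i][k] for i in idx) / len(idx) for k in range(len(X[0])))
        protos[label] = min(idx, key=lambda i: math.dist(X[i], centroid))
    return protos


def one_nn_accuracy(X, y, protos):
    """Fraction of frames whose nearest prototype carries their own class label."""
    hits = 0
    for xi, yi in zip(X, y):
        nearest = min(protos, key=lambda label: math.dist(xi, X[protos[label]]))
        hits += nearest == yi
    return hits / len(X)


def greedy_tabu_select(X, y, iters=10):
    """Hypothetical greedy/tabu search: repeatedly take the best non-tabu single swap."""
    current = closest_to_centroid(X, y)
    best, best_acc = dict(current), one_nn_accuracy(X, y, current)
    tabu = set(current.values())  # frames that have already served as prototypes
    for _ in range(iters):
        move = None
        for label in current:
            for i, yi in enumerate(y):
                if yi != label or i in tabu:
                    continue
                trial = {**current, label: i}
                acc = one_nn_accuracy(X, y, trial)
                if move is None or acc > move[0]:
                    move = (acc, trial, i)
        if move is None:  # every candidate frame is tabu
            break
        acc, current, i = move  # accept the best move even if it is worse (tabu search)
        tabu.add(i)
        if acc > best_acc:
            best, best_acc = dict(current), acc
    return best, best_acc


X = [(0.0, 0.0), (0.2, 0.1), (1.5, 1.4), (3.0, 3.0), (3.1, 2.8), (1.6, 1.6)]
y = ["event A", "event A", "event A", "event B", "event B", "event B"]
summary, acc = greedy_tabu_select(X, y)
print(summary, acc)
```

The search keeps exactly one prototype per class, so the summary length always equals the number of events. The search also remembers the best set seen, so it can never return something worse than the baseline.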
On the role of metaheuristic optimization in bioinformatics
Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted the most interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, protein structure prediction, phylogenetic inference and different string problems. References to other relevant optimization problems are also given, including those related to medical imaging and gene selection for classification. From this analysis, the paper draws insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.
A Survey on Evolutionary Computation for Computer Vision and Image Analysis: Past, Present, and Future Trends
Computer vision (CV) is a large and important field in artificial intelligence, covering a wide range of applications. Image analysis is a major task in CV, aiming to extract, analyse and understand the visual content of images. However, image-related tasks are very challenging due to many factors, e.g., high variation across images, high dimensionality, the need for domain expertise, and image distortions. Evolutionary computation (EC) approaches have been widely used for image analysis with significant achievements. However, there is no comprehensive survey of existing EC approaches to image analysis. To fill this gap, this paper provides a comprehensive survey covering all essential EC approaches to important image analysis tasks, including edge detection, image segmentation, image feature analysis, image classification, object detection, and others. This survey aims to provide a better understanding of evolutionary computer vision (ECV) by discussing the contributions of different approaches and exploring how and why EC is used for CV and image analysis. The applications, challenges, issues, and trends associated with this research field are also discussed and summarised to provide further guidelines and opportunities for future research.
Improving the efficiency of Bayesian Network Based EDAs and their application in Bioinformatics
Estimation of distribution algorithms (EDAs) are a relatively new class of stochastic optimizers that have received a lot of attention during the last decade. In each generation, EDAs build probabilistic models of promising solutions to an optimization problem to guide the search process, and new sets of solutions are obtained by sampling the corresponding probability distributions. Using this approach, EDAs can provide the user with a set of models that reveal the dependencies between the variables of an optimization problem while solving it. To solve a complex problem, it is necessary to use a probabilistic model that is able to capture these dependencies. Bayesian networks are commonly used for modeling multiple dependencies between variables. However, learning Bayesian networks, especially for large problems with a high degree of dependency among their variables, is computationally very expensive, which makes it the bottleneck of EDAs. Introducing efficient Bayesian network learning algorithms into EDAs therefore seems necessary in order to use them for large problems. In this dissertation, after comparing several Bayesian network learning algorithms, we propose an algorithm called CMSS-BOA, which uses a recently introduced heuristic called max-min parents and children (MMPC) to constrain the model search space. This algorithm does not impose a fixed, small upper bound on the order of interaction between variables and is able to solve problems with large numbers of variables efficiently. We compare the efficiency of CMSS-BOA with the standard Bayesian network based EDA on several benchmark problems, and finally we use it to build a predictor of glycation sites in mammalian proteins.
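The generational loop described above (sample solutions from a model, select the promising ones, re-estimate the model) can be sketched with the simplest EDA, the univariate marginal distribution algorithm (UMDA), on the OneMax toy problem. UMDA is swapped in here purely for brevity. Its model treats every variable as independent, so it does not capture the variable dependencies that the Bayesian-network model in BOA or CMSS-BOA is designed to learn. The parameter names, defaults and marginal clamping are illustrative assumptions.

```python
import random


def onemax(bits):
    """Toy fitness: number of ones in the bit string."""
    return sum(bits)


def umda(fitness, n_vars, pop_size=100, n_select=50, generations=50, seed=0):
    rng = random.Random(seed)
    probs = [0.5] * n_vars  # model: one independent Bernoulli marginal per variable
    lo, hi = 1 / n_vars, 1 - 1 / n_vars  # clamp marginals so no bit is fixed for good
    best = None
    for _ in range(generations):
        # 1. Sample a new population from the current probabilistic model.
        pop = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
        # 2. Select the promising solutions (truncation selection).
        pop.sort(key=fitness, reverse=True)
        elite = pop[:n_select]
        if best is None or fitness(elite[0]) > fitness(best):
            best = elite[0]
        # 3. Re-estimate the model from the selected solutions.
        probs = [min(max(sum(ind[k] for ind in elite) / n_select, lo), hi)
                 for k in range(n_vars)]
    return best, probs


best, probs = umda(onemax, n_vars=30)
print(onemax(best), [round(p, 2) for p in probs[:5]])
```

In a Bayesian-network EDA, step 3 would instead learn a network structure and its conditional probabilities from the elite set. That structure-learning step is the expensive part that CMSS-BOA constrains with MMPC.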
Adaptive Feature Engineering Modeling for Ultrasound Image Classification for Decision Support
Ultrasonography is considered a relatively safe option for diagnosing benign and malignant cancer lesions because it uses low-energy sound waves. However, visual interpretation of ultrasound images is time-consuming and usually produces many false alarms due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, these have not solved the problem, owing to the complex nature of the images and the exponential growth of biomedical datasets. Second, the target class in real-world biomedical datasets, which is the focus of interest of a biopsy, is usually significantly underrepresented compared to the non-target class. This makes it difficult to train standard classification models such as Support Vector Machines (SVM), Decision Trees and Nearest Neighbor techniques on biomedical datasets, because they assume an equal class distribution or equal misclassification costs. Resampling techniques, either oversampling the minority class or under-sampling the majority class, have been proposed to mitigate the class imbalance problem, but with minimal success. We propose to resolve the class imbalance problem with a novel data-adaptive feature engineering model for extracting, selecting and transforming textural features into a feature space that is inherently relevant to the application domain.
We hypothesize that maximizing the variance and preserving as much variability as possible in well-engineered features before applying a classifier model will improve the differentiation of thyroid nodules (benign or malignant) through effective model building. We propose a hybrid approach that applies regression and rule-based techniques to build our feature engineering model and a Bayesian classifier, respectively.
In the feature engineering model, we transformed image pixel intensity values into a high-dimensional structured dataset and fitted a regression model to estimate the relevant kernel parameters for the proposed filter method. We adopted an Elastic Net regularization path to control the maximum log-likelihood estimation of the regression model. Finally, we applied Bayesian network inference to estimate a subset of the textural features with significant conditional dependency in the classification of the thyroid lesion. This is done to establish the conditional influence between the textural features and the random factors generated through our feature engineering model, and to evaluate the success criterion of our approach.
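The abstract does not state the exact objective. For reference, the standard elastic-net penalized log-likelihood, in the parameterization used by the glmnet package, is shown below. It assumes a regression model with intercept β₀, coefficient vector β, log-likelihood ℓ and N observations. The regularization path is the sequence of solutions computed over a decreasing grid of λ values. Whether the authors used exactly this form is an assumption.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\,\ell(\beta_0, \beta)
  + \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right],
\qquad \lambda \in \{\lambda_{\max}, \dots, \lambda_{\min}\},\quad 0 \le \alpha \le 1
```

Setting α = 1 gives the lasso, and α = 0 gives ridge regression. The ℓ1 term drives some coefficients exactly to zero, which is what makes the path useful for selecting features.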
The proposed approach was tested and evaluated on a public thyroid cancer ultrasound diagnostic dataset. The results showed a significant overall improvement in classification accuracy and area under the curve when the proposed feature engineering model was applied to the data. An accuracy of 96.00%, with a sensitivity of 99.64% and a specificity of 90.23%, was achieved for a filter size of 13 × 13.
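For reference, the reported metrics are conventionally defined from the confusion-matrix counts (TP, TN, FP, FN), taking malignant nodules as the positive class. The abstract does not state which class is positive, so that choice is an assumption.

```latex
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
```

Under this reading, a sensitivity of 99.64% means almost no malignant nodules were missed. A specificity of 90.23% means that roughly one benign nodule in ten was flagged as malignant.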