MultiMiTar: A Novel Multi Objective Optimization based miRNA-Target Prediction Method

Author: A Grimson
A Krek
CC Chang
D Betel
F Xiao
GL Papadopoulos
K Deb
M Kertesz
M Maragkakis
M Peter
M Selbach
M Sturm
M Yousef
P Alexiou
R Friedman
Ramkrishna Mitra
S Bandyopadhyay
S Bandyopadhyay
S Wu
Sanghamitra Bandyopadhyay
SD Hsu
SM Johnson
T Ishida
Timothy Ravasi
V Rusinov
VN Vapnik
VN Vapnik
X Wang
Publication venue: Public Library of Science
Publication date: 15/09/2011

BACKGROUND: Machine learning based miRNA-target prediction algorithms often fail to achieve balanced prediction accuracy in terms of both sensitivity and specificity. The causes are the lack of a gold standard of negative examples, the lack of relevant features specific to the context of the miRNA-targeting site, and the lack of an efficient feature selection process. Moreover, none of the sequence-, structure- or machine learning based algorithms distributes the true positive predictions preferentially at the top of the ranked list, which makes them unreliable for biologists. In addition, these algorithms fail to achieve a reasonable combination of precision and recall for target transcripts that are translationally repressed at the protein level.

METHODOLOGY/PRINCIPAL FINDINGS: In this article, we introduce MultiMiTar, an efficient miRNA-target prediction system: a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic feature selection technique. The robust performance of the proposed method results mainly from the use of high-quality negative examples and the selection of biologically relevant features specific to the context of the miRNA-targeting site. The features are selected with a novel feature selection technique, AMOSA-SVM, which integrates the multiobjective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) with SVM.

CONCLUSIONS/SIGNIFICANCE: On a completely independent test data set, MultiMiTar achieves a much higher Matthews correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 than the other target prediction methods, whose MCC and ACA values range from -0.269 to 0.155 and from 0.321 to 0.582, respectively. Moreover, it shows a more balanced result in terms of precision and sensitivity (recall) on the translationally repressed data set than all the other existing methods. An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list, which makes MultiMiTar reliable for biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm, and the software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
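
The two headline metrics above can be computed from a binary confusion matrix. The sketch below assumes the standard definitions: MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and ACA taken as the mean of sensitivity and specificity, i.e. the mean of the per-class accuracies for the positive and negative class. The function names mcc and aca are illustrative and are not part of MultiMiTar.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def aca(tp: int, tn: int, fp: int, fn: int) -> float:
    """Average class-wise accuracy: mean of sensitivity and specificity.

    Assumes at least one positive and one negative example exist.
    """
    sensitivity = tp / (tp + fn)  # accuracy on the positive class
    specificity = tn / (tn + fp)  # accuracy on the negative class
    return (sensitivity + specificity) / 2
```

For example, 40 true positives, 40 true negatives, 10 false positives and 10 false negatives give an MCC of 0.6 and an ACA of 0.8. Because MCC ranges from -1 to 1, with 0 meaning no better than random, a negative value such as the -0.269 reported for one competing tool means its predictions were negatively correlated with the true labels on that test set.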

En-PaFlower: An Ensemble Approach using PSO and Flower Pollination Algorithm for Cancer Diagnosis

Author: Mahapatra Satyasundara
Senapati Sudhir Kumar
Shrivastava Manish
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/08/2023

Machine learning is now used across many sectors and provides consistently precise predictions. A machine learning system learns effectively because its training dataset contains examples of previously completed tasks, and researchers have shown that, once trained on the necessary data, machine learning algorithms can carry out the whole task autonomously. In recent years, cancer has become a major cause of the worldwide increase in mortality. Early detection of cancer improves the chance of a complete recovery, and machine learning (ML) plays a significant role here. Microarray datasets for cancer diagnosis and prognosis are available alongside biopsy data. Microarray data are important for diagnosing and classifying cancers, but they are massive, and analysing such large datasets can be challenging. Feature selection is therefore crucial: feature selection algorithms choose the relevant features that help build a more precise classification model, and machine learning supplies the classification techniques. More accurate classification of diseases in turn aids disease prevention. This work aims to synthesize existing knowledge on cancer diagnosis using machine learning techniques into a compact report. It proposes En-PaFlower, an ensemble-based machine learning model that uses Particle Swarm Optimization (PSO) as the feature selection algorithm, the Flower Pollination Algorithm (FPA) as the optimization algorithm, and majority voting. The performance of the proposed algorithm is evaluated on three different types of cancer datasets using evaluation metrics such as accuracy, precision, recall, specificity and F1-score. The empirical analysis shows that the proposed methodology achieves its highest accuracy, 95.65%.
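
PSO-based feature selection is commonly implemented as a binary PSO in which each particle is a 0/1 mask over the features. The sketch below is a generic binary PSO with a sigmoid transfer function, offered as an illustration under assumptions: it is not the En-PaFlower pipeline (the Flower Pollination Algorithm and majority-voting stages are not shown), and the parameter values (w, c1, c2, v_max) and the toy_fitness function are hypothetical. In practice the fitness would be something like the cross-validated accuracy of a classifier trained on the selected microarray genes.

```python
import math
import random
from typing import Callable, List

Mask = List[int]


def binary_pso(fitness: Callable[[Mask], float], n_features: int,
               n_particles: int = 20, n_iter: int = 50,
               w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               v_max: float = 4.0, seed: int = 0) -> Mask:
    """Maximise `fitness` over 0/1 feature masks with a sigmoid-transfer binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best masks
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # global best mask

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid transfer: velocity -> probability that bit d is 1.
                prob_one = 1 / (1 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest


# Hypothetical toy fitness: features 0, 2 and 5 are "relevant"; each selected
# relevant feature scores +1 and each other selected feature scores -0.5.
RELEVANT = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[j] for j in RELEVANT)
    return hits - 0.5 * (sum(mask) - hits)
```

With eight features, toy_fitness is maximised by the mask [1, 0, 1, 0, 0, 1, 0, 0], so binary_pso(toy_fitness, 8) should move toward that mask; with a real classifier-based fitness, each evaluation is far more expensive, which is why particle and iteration counts are usually kept small.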

International Journal on Recent and Innovation Trends in Computing and Communication

Neuropsychological predictors of conversion from mild cognitive impairment to Alzheimer’s disease: a feature selection ensemble combining stability and predictability

Author: A Ben
A Kalousis
AL Blum
AL Spedding
Alexandre de Mendonça
Alzheimer Association
American Psychiatric Association
AV Carreiro
B Seijo-Pardo
BC Dickerson
C Bastin
C Cabral
C Salvatore
D Silva
DE Barnes
Dina Silva
DMW Powers
E Grober
E Moradi
F Portet
Francisco L. Ferreira
G Zhao
H Amieva
I Guyon
I Guyon
I Kononenko
J Demsar
J Li
J Maroco
J Ye
JL Lustgarten
L Nanni
L Vandewater
M Guerreiro
M Irish
M Prince
M Prince
Manuela Guerreiro
MJ Summers
N Meinshausen
NM Samtani
NV Chawla
OM Doyle
P Johnson
P Langley
P Scheltens
P Willett
P Yang
RC Petersen
RE Schapire
S Belleville
S Nogueira
Sandra Cardoso
Sara C. Madeira
SF Eskildsen
SG Mueller
SI Dimitriadis
SJ Lee
T Hastie
T Pereira
Telma Pereira
V Bolón-canedo
Y Saeys
Z-H Zhou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Background: Predicting progression from Mild Cognitive Impairment (MCI) to Alzheimer’s Disease (AD) is a major open issue in AD-related research. Neuropsychological assessment has proven useful in identifying MCI patients who are likely to convert to dementia. However, the large battery of neuropsychological tests (NPTs) performed in clinical practice and the limited number of training examples pose a challenge to machine learning when learning prognostic models. In this context, it is paramount to pursue approaches that effectively seek reduced sets of relevant features. Subsets of NPTs from which prognostic models can be learnt should be not only good predictors but also stable, promoting generalizable and explainable models.

Methods: We propose a feature selection (FS) ensemble combining stability and predictability to choose the most relevant NPTs for prognostic prediction in AD. First, we combine the outcomes of multiple (filter and embedded) FS methods. Then, we use a wrapper-based approach that optimizes both stability and predictability to determine the number of selected features. We use two large prospective studies (ADNI and the Portuguese Cognitive Complaints Cohort, CCC) to evaluate the approach and assess the predictive value of a large number of NPTs.

Results: The best subsets include approximately 30 (of the original 79) and 20 (of the original 40) features for the ADNI and CCC data, respectively, yielding stability above 0.89 and 0.95 and AUC above 0.87 and 0.82. Most NPTs selected by the proposed feature selection ensemble have been identified in the literature as strong predictors of conversion from MCI to AD.

Conclusions: The FS ensemble approach was able to 1) identify subsets of stable and relevant predictors from a consensus of multiple FS methods using baseline NPTs and 2) learn reliable prognostic models of conversion from MCI to AD using these subsets of features. The machine learning models learnt from these features outperformed the models trained without FS and achieved competitive results compared with commonly used FS algorithms. Furthermore, because the selected features are derived from a consensus of methods, they are more robust, and users are released from choosing the most appropriate FS method for their classification task.

Funding: PTDC/EEI-SII/1937/2014; SFRH/BD/95846/2013; SFRH/BD/118872/2016
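
The two-stage idea in Methods (a consensus over several FS methods, then a choice of subset size that trades stability against predictive power) can be sketched as follows. The aggregation rule (mean rank), the stability measure (mean pairwise Jaccard similarity between subsets selected on resamples of the data) and the weighted-sum criterion with weight alpha are all assumptions for illustration; the paper's exact stability index and wrapper objective may differ.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence, Set


def consensus_ranking(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Aggregate feature rankings (best first) from several FS methods by mean rank.

    Assumes every ranking contains the same features; ties break alphabetically.
    """
    positions = [{f: r for r, f in enumerate(ranking)} for ranking in rankings]
    return sorted(rankings[0], key=lambda f: (mean(p[f] for p in positions), f))


def jaccard_stability(subsets: Sequence[Set[str]]) -> float:
    """Mean pairwise Jaccard similarity of non-empty feature subsets (1.0 = identical).

    Requires at least two subsets, e.g. the top-k features selected on resamples.
    """
    return mean(len(a & b) / len(a | b) for a, b in combinations(subsets, 2))


def choose_k(stability: Dict[int, float], auc: Dict[int, float],
             alpha: float = 0.5) -> int:
    """Return the subset size k maximising alpha*stability + (1 - alpha)*AUC."""
    return max(stability, key=lambda k: alpha * stability[k] + (1 - alpha) * auc[k])
```

For instance, the rankings ["a", "b", "c"], ["b", "a", "c"] and ["a", "c", "b"] aggregate to ["a", "b", "c"], and three subsets {a, b}, {a, b}, {a, c} have a stability of 5/9.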

Large-dimensionality small-instance set feature selection: a hybrid bio-inspired heuristic approach

Author: Grosan C
Snasel V
Zawbaa H
Publication venue: Elsevier BV
Publication date: 01/03/2018

Selecting a representative set of features is still a crucial and challenging problem in machine learning. The complexity of the problem increases in either of the following situations: a very large number of attributes (large dimensionality), or a very small number of instances or time points (small-instance set). The first situation poses problems for machine learning algorithms because the search space for selecting a combination of relevant features becomes impossible to explore in reasonable time and with reasonable computational resources. The second poses the problem of having insufficient data (too few examples) to learn from. In this work, we address both issues at the same time. The methods we propose are heuristics inspired by nature (in particular, by biology). We propose a hybrid of two methods that can learn well from fewer examples and make a fair selection of features from a very large set, while ensuring high classification accuracy. The methods used are antlion optimization (ALO), grey wolf optimization (GWO), and a combination of the two (ALO-GWO). We test their performance on datasets with almost 50,000 features and fewer than 200 instances. The results look promising when compared with other methods such as genetic algorithms (GA) and particle swarm optimization (PSO).
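
As a rough illustration of one of the two heuristics, the sketch below applies the standard Grey Wolf Optimizer update (the three best wolves act as leaders alpha, beta and delta; coefficient a decreases linearly from 2 to 0) to feature selection. Binarising continuous positions with a 0.5 threshold is an assumption, the cost function toy_cost is hypothetical, and neither antlion optimization nor the ALO-GWO hybrid from the paper is reproduced here.

```python
import random
from typing import Callable, List

Mask = List[int]


def gwo_feature_selection(cost: Callable[[Mask], float], n_features: int,
                          n_wolves: int = 10, n_iter: int = 50,
                          seed: int = 0) -> Mask:
    """Minimise `cost` over 0/1 feature masks with a continuous Grey Wolf Optimizer.

    Wolf positions live in [0, 1]^d; feature d is selected when coordinate d > 0.5.
    Requires n_iter >= 1 and n_wolves >= 3.
    """
    rng = random.Random(seed)

    def to_mask(x: List[float]) -> Mask:
        return [1 if v > 0.5 else 0 for v in x]

    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    best_x: List[float] = []
    best_cost = float("inf")
    for t in range(n_iter):
        # Rank the pack by cost; the three best wolves are alpha, beta and delta.
        ranked = sorted(wolves, key=lambda x: cost(to_mask(x)))
        leaders = [x[:] for x in ranked[:3]]
        alpha_cost = cost(to_mask(leaders[0]))
        if alpha_cost < best_cost:
            best_x, best_cost = leaders[0], alpha_cost
        a = 2 - 2 * t / n_iter  # decreases linearly from 2 towards 0
        for i, x in enumerate(wolves):
            new_x = []
            for d in range(n_features):
                moves = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    moves.append(leader[d] - A * D)
                # Average the three leader-guided moves and clip to [0, 1].
                new_x.append(min(1.0, max(0.0, sum(moves) / 3)))
            wolves[i] = new_x
    return to_mask(best_x)


# Hypothetical toy cost: features 1 and 3 are relevant; each missed relevant
# feature costs 1.0 and every selected feature costs 0.1.
RELEVANT = {1, 3}


def toy_cost(mask: Mask) -> float:
    missed = sum(1 - mask[j] for j in RELEVANT)
    return missed + 0.1 * sum(mask)
```

A realistic cost would typically combine a classification error term with a penalty on subset size; the size penalty matters most in the setting described above, with tens of thousands of features and only a few hundred instances.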

Evaluating the impact of topological protein features on the negative examples selection

Author: D. Malchiodi
M. Frasca
P. Boldi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/11/2018

Supervised machine learning methods, when applied to the problem of automated protein-function prediction (AFP), require both positive examples (proteins known to possess a given function) and negative examples (proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely record the functions a protein does not possess. Negative selection, which consists of identifying informative negative examples, is therefore a central and challenging problem in AFP. Several heuristics have been proposed over the years to solve it; nevertheless, despite their effectiveness, to the best of our knowledge no previous work has studied which protein features are most relevant to this task, that is, which protein features help most in discriminating reliable from unreliable negatives.

Feature Selection from Clinical Surveys Using Semantic Textual Similarity

Author: Warner Benjamin
Publication venue: Washington University Open Scholarship
Publication date: 14/05/2023

Survey data collected from human subjects can contain a large number of features but comparatively few examples. Under these conditions, machine learning models that attempt to predict outcomes from survey data can overfit and generalize poorly. One remedy is feature selection, which attempts to select an optimal subset of features to learn from. A relatively unexplored source of information in the feature selection process is the textual names of the features, which may semantically indicate which features are relevant to a target outcome. The relationships between feature names and target names can be evaluated with large language models (LLMs) such as ClinicalBERT to produce semantic textual similarity (STS) scores, which can then be used to select features. This thesis introduces two new variations of the minimal-redundancy-maximal-relevance (mRMR) algorithm that integrate STS into the selection. The performance of STS as a feature selection metric is evaluated on preliminary survey data collected as part of a clinical study on persistent post-surgical pain (PPSP). The results suggest that features selected with STS can yield higher-performing models than those selected with the baseline mRMR algorithm.
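
The classic mRMR criterion (difference form) greedily adds the feature whose relevance to the target, minus its mean redundancy with the features already selected, is largest. The sketch below takes the relevance and redundancy scores as inputs, so relevance could be mutual information with the target or, in the spirit of this thesis, STS scores between feature names and the target name. How the thesis's two variations actually combine STS with mRMR is not reproduced here; this is a generic sketch.

```python
from typing import List, Sequence


def mrmr(relevance: Sequence[float],
         redundancy: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedy mRMR (difference form): return up to k feature indices.

    relevance[j]     -- score of feature j against the target (e.g. MI or STS)
    redundancy[i][j] -- pairwise redundancy between features i and j
    Ties are broken in favour of the lower feature index.
    """
    selected: List[int] = []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < k:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]  # first pick: relevance only
            mean_red = sum(redundancy[j][s] for s in selected) / len(selected)
            return relevance[j] - mean_red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With relevance [0.9, 0.85, 0.5] and features 0 and 1 highly redundant (0.8) but each only weakly redundant with feature 2 (0.1), mrmr picks feature 0 and then feature 2, skipping the near-duplicate feature 1.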

Feature selection algorithms: a survey and experimental evaluation

Author: Belanche Muñoz Luis Antonio
Molina Luis
Nebot Castells M. Àngela
Publication date: 01/01/2003

In view of the substantial number of existing feature selection algorithms, criteria are needed to decide which algorithm to use in a given situation. This work reviews several fundamental algorithms found in the literature and assesses their performance in a controlled scenario. A scoring measure ranks the algorithms by taking into account the amount of relevance, irrelevance and redundancy in sample data sets. This measure computes the degree of matching between the output of an algorithm and the known optimal solution. Sample size effects are also studied.

Machine learning for automatic prediction of the quality of electrophysiological recordings

Author: AB Wiltschko
BT Priest
C Mathes
CG Galizia
Dominique Martinez
F Franke
H Lei
Jean-Pierre Rospars
Johannes Reisert
M Asmild
MS Lewicki
R Friedrich
R Kohavi
S Panzeri
S Takahashi
SB Wilson
Shereen Elbanna
Sylvia Anton
T Nowotny
Thomas Nowotny
Y Saeys
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

The quality of electrophysiological recordings varies considerably owing to technical and biological variability, and neuroscientists inevitably have to select “good” recordings for further analyses. This procedure is time-consuming and prone to selection biases. Here, we investigate replacing human decisions with a machine learning approach. We define 16 features, such as spike height and width, select the most informative ones using a wrapper method, and train a classifier to reproduce the judgement of one of our expert electrophysiologists. Generalisation performance is then assessed on unseen data classified by the same or by another expert. We observe that the learning machine can be as consistent in its judgements as individual experts are with one another, if not more so. Best performance is achieved with a limited number of informative features, and the optimal feature set differs from one data set to another. With 80–90% correct judgements, the performance of the system is very promising within each expert's data sets, but judgements are less reliable when it is used across sets of recordings from different experts. We conclude that the proposed approach is relevant to the selection of electrophysiological recordings, provided that parameters are adjusted to different types of experiments and to individual experimenters.
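
The abstract does not name the wrapper method used; one common choice is greedy sequential forward selection, sketched below as an assumption. The score callback stands in for a classifier's cross-validated agreement with the expert's labels, and the feature names and weights in the example are hypothetical.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence


def forward_selection(features: Sequence[str],
                      score: Callable[[FrozenSet[str]], float],
                      max_features: Optional[int] = None) -> List[str]:
    """Greedy sequential forward selection (a wrapper method).

    Repeatedly adds the candidate feature whose inclusion gives the highest
    `score`, and stops when no candidate improves on the current best score
    or when `max_features` features have been selected.
    """
    selected: List[str] = []
    best_score = float("-inf")
    limit = len(features) if max_features is None else max_features
    while len(selected) < limit:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Evaluate every one-feature extension of the current subset.
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no remaining feature improves the wrapper score
        selected.append(top_feature)
        best_score = top_score
    return selected


# Hypothetical per-feature contributions standing in for a classifier's
# cross-validated accuracy; the feature names are illustrative only.
WEIGHTS = {"spike_height": 0.3, "spike_width": 0.2, "noise_level": -0.05}


def toy_score(subset: FrozenSet[str]) -> float:
    return 0.5 + sum(WEIGHTS[f] for f in subset)
```

Here forward_selection(list(WEIGHTS), toy_score) returns ["spike_height", "spike_width"] and stops before adding noise_level, because that feature lowers the score. This stopping rule naturally favours small feature sets.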
