Predicting Pancreatic Cancer Using Support Vector Machine
This report presents an approach to predicting pancreatic cancer using the Support Vector Machine (SVM) classification algorithm. The research objective of this project is to predict pancreatic cancer from genomic data alone, from clinical data alone, and from a combination of genomic and clinical data. We used real genomic data with 22,763 samples and 154 features per sample. We also created synthetic clinical data with 400 samples and 7 features per sample in order to measure prediction accuracy on clinical data alone. To validate the hypothesis, we combined the synthetic clinical data with a subset of features from the real genomic data. In our results, prediction accuracy, precision, and recall with genomic data alone were 80.77%, 20%, and 4%, respectively. With synthetic clinical data alone they were 93.33%, 95%, and 30%, and with the combination of real genomic and synthetic clinical data they were 90.83%, 10%, and 5%. The combination decreased accuracy relative to clinical data alone, since the genomic data is weakly correlated. Thus we conclude that the combination of genomic and clinical data does not improve pancreatic cancer prediction accuracy. A dataset with more significant genomic features might help to predict pancreatic cancer more accurately.
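The abstract reports accuracy alongside much lower precision and recall. The sketch below shows how the three metrics are computed from binary predictions and how accuracy can stay high while recall is poor when positives are rare. The labels are made up for illustration and are not the report's data; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when nothing is predicted/labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up example: 100 samples, 20 of them positive (assumption, not real data).
y_true = [1] * 20 + [0] * 80
# The classifier finds 1 of the 20 positives and raises 4 false alarms.
y_pred = [1] * 1 + [0] * 19 + [1] * 4 + [0] * 76

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here most negatives are classified correctly, so accuracy looks reasonable even though almost all positive cases are missed.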
Exploring the similarity of medical imaging classification problems
Supervised learning is ubiquitous in medical image analysis. In this paper we
consider the problem of meta-learning -- predicting which methods will perform
well in an unseen classification problem, given previous experience with other
classification problems. We investigate the first step of such an approach: how
to quantify the similarity of different classification problems. We
characterize datasets sampled from six classification problems by performance
ranks of simple classifiers, and define the similarity by the inverse of
Euclidean distance in this meta-feature space. We visualize the similarities in
a 2D space, where meaningful clusters start to emerge, and show that the
proposed representation can be used to classify datasets according to their
origin with 89.3% accuracy. These findings, together with the observations of
recent trends in machine learning, suggest that meta-learning could be a
valuable tool for the medical imaging community.
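The meta-feature idea described in this abstract can be sketched as follows: each dataset is represented by the ranks of a few classifiers' scores, and similarity is the inverse of the Euclidean distance between rank vectors. This is only an illustration under assumptions: the scores are invented, the tie handling (average rank) and the small `eps` term that avoids division by zero are choices made here, not details from the paper.

```python
import math


def performance_ranks(scores):
    """Rank classifiers by score (1 = best); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of classifiers tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def similarity(ranks_a, ranks_b, eps=1e-9):
    """Inverse Euclidean distance; eps is an assumption to avoid dividing by zero."""
    return 1.0 / (math.dist(ranks_a, ranks_b) + eps)


# Invented accuracies of four simple classifiers on three datasets.
dataset_a = performance_ranks([0.90, 0.85, 0.70, 0.60])
dataset_b = performance_ranks([0.88, 0.80, 0.75, 0.50])
dataset_c = performance_ranks([0.55, 0.60, 0.80, 0.90])

print(similarity(dataset_a, dataset_b) > similarity(dataset_a, dataset_c))
```

Datasets on which the same classifiers do well end up close together in rank space, regardless of the absolute accuracy values.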
VEWS: A Wikipedia Vandal Early Warning System
We study the problem of detecting vandals on Wikipedia before any human or
known vandalism detection system reports flagging potential vandals so that
such users can be presented early to Wikipedia administrators. We leverage
multiple classical ML approaches, but develop 3 novel sets of features. Our
Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing
patterns as features to classify some users as vandals. Our Wikipedia
Transition Probability Matrix (WTPM) approach uses a set of features derived
from a transition probability matrix and then reduces it via a neural net
auto-encoder to classify some users as vandals. The VEWS approach merges the
previous two approaches. Without using any information (e.g. reverts) provided
by other users, these algorithms each have over 85% classification accuracy.
Moreover, when temporal recency is considered, accuracy goes to almost 90%. We
carry out detailed experiments on a new data set we have created consisting of
about 33K Wikipedia users (including both a black list and a white list of
editors) and containing 770K edits. We describe specific behaviors that
distinguish between vandals and non-vandals. We show that VEWS beats ClueBot NG
and STiki, the best known algorithms today for vandalism detection. Moreover,
VEWS detects far more vandals than ClueBot NG and on average, detects them 2.39
edits before ClueBot NG when both detect the vandal. However, we show that the
combination of VEWS and ClueBot NG can give a fully automated vandal early
warning system with even higher accuracy. Comment: To appear in Proceedings of the 21st ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD 2015).
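As a rough illustration of the transition probability matrix mentioned in the abstract, the sketch below counts transitions between consecutive edits of one user and normalizes each row into probabilities. The state names (`article`, `talk`, `user`) and the edit sequence are hypothetical; the paper's actual feature set and its auto-encoder reduction step are not reproduced here.

```python
from collections import Counter


def transition_matrix(sequence, states):
    """Row-normalized transition probabilities between consecutive items."""
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = []
    for src in states:
        total = sum(counts[(src, dst)] for dst in states)
        # A state that is never left gets an all-zero row.
        row = [counts[(src, dst)] / total if total else 0.0 for dst in states]
        matrix.append(row)
    return matrix


# Hypothetical page types edited by one user, in chronological order.
states = ["article", "talk", "user"]
edits = ["article", "article", "talk", "article", "user", "article"]

tpm = transition_matrix(edits, states)
# Flatten the matrix into a feature vector that a classifier could consume.
features = [p for row in tpm for p in row]
print(features)
```

Each user yields one fixed-length vector, which is what makes the matrix usable as classifier input.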