Predicting Pancreatic Cancer Using Support Vector Machine

Author: Bodkhe Akshay
Publication venue: SJSU ScholarWorks
Publication date: 26/05/2017

This report presents an approach to predicting pancreatic cancer using the Support Vector Machine (SVM) classification algorithm. The research objective of this project is to predict pancreatic cancer from genomic data alone, from clinical data alone, and from a combination of genomic and clinical data. We used real genomic data with 22,763 samples and 154 features per sample. We also created synthetic clinical data with 400 samples and 7 features per sample in order to measure prediction accuracy on clinical data alone. To validate the hypothesis, we combined the synthetic clinical data with a subset of features from the real genomic data. In our results, prediction accuracy, precision, and recall with genomic data alone are 80.77%, 20%, and 4%, respectively. With synthetic clinical data alone they are 93.33%, 95%, and 30%, while for the combination of real genomic and synthetic clinical data they are 90.83%, 10%, and 5%. Combining real genomic with synthetic clinical data decreased accuracy relative to clinical data alone, since the genomic data is weakly correlated. We therefore conclude that combining genomic and clinical data does not improve pancreatic cancer prediction accuracy. A dataset with more significant genomic features might help to predict pancreatic cancer more accurately.
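The abstract reports accuracy, precision, and recall side by side, and the three can diverge sharply when the positive class is rare. The following is a minimal sketch of how these metrics are computed from confusion-matrix counts. The function name and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the report.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts (NOT from the report): a rare positive class where
# most positives are missed, so accuracy stays high while recall is low.
acc, prec, rec = classification_metrics(tp=1, fp=4, fn=24, tn=91)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%}")
```

With counts like these, a classifier that mostly predicts the negative class still scores well on accuracy, which is why recall is worth reading alongside it.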

SJSU ScholarWorks

An experiment with association rules and classification: post-bagging and conviction

Author: A. Jorge
B. Liu
B. Liu
D. Meretakis
I. Kononenko
I.H. Witten
K. Ali
L. Breiman
M.J. Zaki
P. Domingos
R. Ihaka
R.J. Bayardo
T. Hastie
T.K. Ho
U.M. Fayyad
V. Jovanoski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

In this paper we study a new technique we call post-bagging, which consists in resampling parts of a classification model rather than the data. We do this with a particular kind of model: large sets of classification association rules, in combination with ordinary best-rule and weighted-voting approaches. We empirically evaluate the effects of the technique in terms of classification accuracy. We also discuss the predictive power of different metrics used for association rule mining, such as confidence, lift, conviction and χ². We conclude that, for the described experimental conditions, post-bagging improves classification results and that the best metric is conviction. Funding: Programa de Financiamento Plurianual de Unidades de I & D; Comunidade Europeia (CE), Fundo Europeu de Desenvolvimento Regional (FEDER); Fundação para a Ciência e a Tecnologia (FCT), POSI/SRI/39630/2001/Class Project.
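For readers unfamiliar with the rule metrics named in the abstract, here is a minimal sketch of confidence, lift, and conviction for a rule A → C, using their standard textbook definitions over transaction counts. The function name and the example counts are hypothetical; the paper's own implementation is not shown in this listing.

```python
import math


def rule_metrics(n, n_a, n_c, n_ac):
    """Confidence, lift, and conviction for a rule A -> C.

    n    : total number of transactions
    n_a  : transactions containing the antecedent A
    n_c  : transactions containing the consequent C
    n_ac : transactions containing both A and C
    """
    supp_c = n_c / n
    confidence = n_ac / n_a
    lift = confidence / supp_c
    # Conviction is unbounded when the rule never fails (confidence == 1).
    if confidence == 1.0:
        conviction = math.inf
    else:
        conviction = (1.0 - supp_c) / (1.0 - confidence)
    return confidence, lift, conviction


# Hypothetical counts for illustration only.
conf, lift, conv = rule_metrics(n=100, n_a=40, n_c=50, n_ac=30)
print(conf, lift, conv)
```

Unlike lift, conviction is asymmetric in A and C: it compares how often the rule would be wrong if A and C were independent with how often it is actually wrong.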

CiteSeerX

FEATURE ANALYSIS OF HYPERSPECTRAL IMAGES FOR PLANT CLASSIFICATION

Author: Bin Ghaith Alsuwaidi Ali Rashed Saeed A
Publication date: 31/12/2018

A Global Discretization Approach to Handle Numerical Attributes as Preprocessing

Author: Wu Xun
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2015

Discretization is a common technique for handling numerical attributes in data mining: it divides continuous values into several intervals by defining multiple thresholds. Decision tree learning algorithms, such as C4.5 and random forests, deal with numerical attributes by applying a discretization technique and transforming them into nominal attributes based on an impurity-based criterion, such as information gain or Gini gain. However, a considerable number of distinct values inevitably fall into the same interval after discretization, so information carried by the original continuous values is lost. In this thesis, we propose a global discretization method that keeps the information in the original numerical attributes by expanding them into multiple nominal ones, one for each candidate cut-point value. The discretized data set, which includes only nominal attributes, is derived from the original data set. We analyzed the problem by applying two decision tree learning algorithms (C4.5 and random forests) to each of twelve pairs of data sets (original and discretized) and evaluating the performance (prediction accuracy rate) of the resulting classification models in Weka Experimenter. This is followed by two separate Wilcoxon tests (one per learning algorithm) to decide whether the difference between the paired data sets is statistically significant. Results of both tests indicate that there is no clear difference in performance between the discretized data sets and the original ones. In some cases, however, the discretized models of both classifiers slightly outperform their paired original models.
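The expansion step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: candidate cut points are assumed to be the midpoints between adjacent distinct sorted values (a common choice in decision tree learners), and the function names, column naming, and "yes"/"no" labels are hypothetical.

```python
def candidate_cut_points(values):
    """Midpoints between adjacent distinct sorted values (assumed rule)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def expand_numeric_attribute(name, values):
    """Expand one numeric attribute into one nominal attribute per cut point.

    Each new attribute records whether the original value is at or below
    the cut point, so no two distinct original values end up with the same
    combination of nominal values.
    """
    columns = {}
    for cut in candidate_cut_points(values):
        columns[f"{name}<={cut}"] = [
            "yes" if v <= cut else "no" for v in values
        ]
    return columns


# Hypothetical attribute values for illustration only.
expanded = expand_numeric_attribute("temp", [1.0, 3.0, 3.0, 7.0])
for column, labels in expanded.items():
    print(column, labels)
```

The trade-off is width: an attribute with k distinct values becomes k - 1 nominal attributes, which is the price paid for not merging distinct values into a single interval.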

BAC: A bagged associative classifier for big data frameworks

Author: Apiletti Daniele
Garza Paolo
Venturini Luca
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Big Data frameworks allow powerful distributed computations, extending the results achievable on a single machine. In this work, we present a novel distributed associative classifier, named BAC, based on ensemble techniques. Ensembles are a popular approach that builds several models on different subsets of the original dataset and then combines them by voting to produce a single classification outcome. Preliminary experiments on Apache Spark show that the proposed ensemble classifier achieves quality comparable to the single-machine version on popular real-world datasets and overcomes the single-machine version's scalability limits on large synthetic datasets.
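The ensemble idea in the abstract (several models trained on different subsets, combined by voting) can be sketched on a single machine as below. This is a generic bagging sketch, not BAC itself: it does not use Apache Spark or an associative classifier, and all function names, the `sample_fraction` parameter, and the placeholder base learner are assumptions made for illustration.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the given predictions."""
    return Counter(predictions).most_common(1)[0][0]


def train_bagged(train_fn, rows, n_models, sample_fraction, seed=0):
    """Train n_models base models, each on a bootstrap sample of rows.

    train_fn takes a list of (features, label) rows and returns a
    callable model mapping features to a label.
    """
    rng = random.Random(seed)
    size = max(1, int(len(rows) * sample_fraction))
    return [
        train_fn([rng.choice(rows) for _ in range(size)])
        for _ in range(n_models)
    ]


def predict_bagged(models, features):
    """Combine the base models' predictions by majority vote."""
    return majority_vote([model(features) for model in models])


def train_majority_class(rows):
    """Placeholder base learner: always predicts its sample's top label."""
    label = majority_vote([y for _, y in rows])
    return lambda features: label


# Hypothetical toy data for illustration only.
data = [((0,), "neg")] * 8 + [((1,), "pos")] * 2
models = train_bagged(train_majority_class, data, n_models=5, sample_fraction=0.5)
print(predict_bagged(models, (0,)))
```

In a distributed setting, each base model could be trained on a separate partition of the data, which is the property that lets an ensemble scale beyond one machine.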

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Meta-Analysis of Vaterite Secondary Data Revealed the Synthesis Conditions for Polymorphic Control

Author: Afzal Waheed
Carballo-Meilan Maria
Mcdonald Lewis
Pragot Wanawan
Saleemi Ali Nauman
Starnawski Lukasz Michal
Publication date: 06/10/2023

Acknowledgements: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Peer reviewed. Postprint.