Search CORE


A difference boosting neural network for automated star-galaxy classification

Author: A. Kembhavi
K. B. Joseph
N. S. Philip
Y. Wadadekar
Publication venue: EDP Sciences
Publication date: 01/01/2002

In this paper we describe the use of a new artificial neural network, called the difference boosting neural network (DBNN), for automated classification problems in astronomical data analysis. We illustrate the capabilities of the network by applying it to star-galaxy classification using recently released deep imaging data. We have compared our results with the classification made by the widely used Source Extractor (SExtractor) package. We show that while the performance of the DBNN in star-galaxy classification is comparable to that of SExtractor, it has the advantage of significantly higher speed and flexibility during both training and classification. Comment: 9 pages, 1 figure, 7 tables; accepted for publication in Astronomy and Astrophysics.

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

HAL-INSU

CERN Document Server
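
Because the DBNN is a Bayesian classifier, a plain histogram-based naive Bayes classifier gives a feel for the kind of computation involved in star-galaxy separation. This is a swapped-in baseline, not the DBNN: the difference-boosting step that gives the network its name is omitted, and the helper names, the two shape features, and the toy catalogue are all hypothetical.

```python
from collections import defaultdict


def bin_index(value, lo, hi, n_bins):
    """Map a feature value to a histogram bin, clamping out-of-range values."""
    if value <= lo:
        return 0
    if value >= hi:
        return n_bins - 1
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


class HistogramNaiveBayes:
    """Plain naive Bayes with histogram likelihoods (a baseline, NOT the DBNN).

    The difference-boosting update that gives the DBNN its name is omitted.
    """

    def __init__(self, ranges, n_bins=10, smoothing=1.0):
        self.ranges = ranges        # one (lo, hi) pair per feature
        self.n_bins = n_bins
        self.smoothing = smoothing  # Laplace smoothing keeps empty bins non-zero

    def fit(self, xs, labels):
        self.total = len(labels)
        self.class_counts = defaultdict(int)
        # counts[label][feature][bin] = training samples falling in that bin
        self.counts = defaultdict(lambda: [[0] * self.n_bins for _ in self.ranges])
        for x, y in zip(xs, labels):
            self.class_counts[y] += 1
            for f, (value, (lo, hi)) in enumerate(zip(x, self.ranges)):
                self.counts[y][f][bin_index(value, lo, hi, self.n_bins)] += 1
        return self

    def posterior(self, x):
        """Return P(class | x) for every class seen during training."""
        scores = {}
        for y, n_y in self.class_counts.items():
            score = n_y / self.total  # prior P(y)
            for f, (value, (lo, hi)) in enumerate(zip(x, self.ranges)):
                b = bin_index(value, lo, hi, self.n_bins)
                # Naive assumption: features are independent given the class,
                # so the per-feature likelihoods P(bin | y) simply multiply.
                score *= (self.counts[y][f][b] + self.smoothing) / (
                    n_y + self.smoothing * self.n_bins
                )
            scores[y] = score
        total = sum(scores.values())
        return {y: s / total for y, s in scores.items()}

    def predict(self, x):
        post = self.posterior(x)
        return max(post, key=post.get)


# Hypothetical toy catalogue: two made-up shape features scaled to [0, 1],
# where point-like stars have small values and extended galaxies large ones.
train_x = [[0.10, 0.20], [0.15, 0.10], [0.20, 0.15],
           [0.80, 0.70], [0.70, 0.90], [0.90, 0.80]]
train_y = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]

model = HistogramNaiveBayes(ranges=[(0.0, 1.0), (0.0, 1.0)], n_bins=5).fit(train_x, train_y)
print(model.predict([0.12, 0.18]))  # star
print(model.predict([0.85, 0.75]))  # galaxy
```

Training here is a single counting pass over the data, which is one reason Bayesian classifiers of this family tend to be fast to train.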

Predicting Pancreatic Cancer Using Support Vector Machine

Author: Bodkhe Akshay
Publication venue: SJSU ScholarWorks
Publication date: 26/05/2017

This report presents an approach to predicting pancreatic cancer using the Support Vector Machine (SVM) classification algorithm. The research objective of this project is to predict pancreatic cancer from genomic data alone, from clinical data alone, and from a combination of genomic and clinical data. We used real genomic data with 22,763 samples and 154 features per sample. We also created synthetic clinical data with 400 samples and 7 features per sample in order to measure prediction accuracy on clinical data alone. To validate the hypothesis, we combined the synthetic clinical data with a subset of features from the real genomic data. With genomic data alone, we observed a prediction accuracy, precision, and recall of 80.77%, 20%, and 4%, respectively. With synthetic clinical data alone, they were 93.33%, 95%, and 30%. For the combination of real genomic and synthetic clinical data, they were 90.83%, 10%, and 5%. Combining real genomic and synthetic clinical data decreased accuracy relative to clinical data alone, because the genomic data is only weakly correlated. We therefore conclude that combining genomic and clinical data does not improve pancreatic cancer prediction accuracy. A dataset with more significant genomic features might allow pancreatic cancer to be predicted more accurately.

SJSU ScholarWorks
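
The reported figures pair a fairly high accuracy with very low precision and recall (for example 80.77%, 20%, and 4% on genomic data alone). The sketch below shows how the three metrics are computed from a binary confusion matrix, and how accuracy can stay high when positives are rare. The labels are hypothetical and are not the report's data; nothing here claims the report's dataset is imbalanced.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_precision_recall(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against division by zero when nothing is predicted / present as positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels: 10 positives among 100 samples. The classifier flags
# 5 samples as positive, only 1 of them correctly.
y_true = [1] * 10 + [0] * 90
y_pred = [1] + [0] * 9 + [1] * 4 + [0] * 86

acc, prec, rec = accuracy_precision_recall(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
# accuracy=0.87 precision=0.20 recall=0.10
```

Most of the 87% accuracy here comes from the 86 correctly rejected negatives, which is why precision and recall are the more informative numbers for a rare positive class.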

COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

Author: Basilico Justin D.
Dixon Kevin R.
Kegelmeyer W. Philip
Kolda Tamara G.
Munson M. Arthur
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when the data is too large to fit on a single machine. For the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5 GB and 50 GB compressed) show that COMET compares favorably, in both accuracy and training time, with learning on a subsample of the data using a serial algorithm. Finally, we propose a new Gaussian approach to lazy ensemble evaluation that dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by a factor of 100 or more.

arXiv.org e-Print Archive

CiteSeerX
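
Lazy ensemble evaluation means not every member of a mega-ensemble has to vote on every data point. The Gaussian approach from the abstract is not reproduced here; as a simpler, swapped-in illustration, the sketch below stops a majority vote as soon as the members not yet evaluated can no longer change the winner. Unlike a statistical stopping rule, this exact rule never changes the prediction the full ensemble would make, though it may stop later. The function name and the stub "trees" are hypothetical.

```python
from collections import Counter


def lazy_majority_vote(members, x):
    """Evaluate ensemble members one at a time and stop once the vote is decided.

    Returns (predicted_label, number_of_members_evaluated). When it stops early,
    the result always matches what a full majority vote would return.
    """
    votes = Counter()
    n = len(members)
    for evaluated, member in enumerate(members, start=1):
        votes[member(x)] += 1
        remaining = n - evaluated
        ranked = votes.most_common(2)
        leader, leader_votes = ranked[0]
        runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
        # Even if every unevaluated member voted for the runner-up (or for a
        # class not seen yet), the leader would still win outright.
        if leader_votes > runner_up_votes + remaining:
            return leader, evaluated
    # No early decision: use the full vote (ties go to the label seen first,
    # following Counter.most_common ordering).
    return votes.most_common(1)[0][0], n


# Hypothetical ensemble of 10 stub "trees": 7 vote for class 1, 3 for class 0.
ensemble = [lambda x: 1] * 7 + [lambda x: 0] * 3
print(lazy_majority_vote(ensemble, x=[0.0]))  # (1, 6)
```

In this toy case only 6 of the 10 members are evaluated; in a real ensemble the savings depend on how lopsided the vote is for each data point.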

Hierarchic Bayesian models for kernel learning

Author: Girolami M.
Rogers S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2005

Integrating diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can improve performance over that obtained from any single data source. We present a Bayesian hierarchical model that enables kernel learning, and we present effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method.

CiteSeerX

Enlighten
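
The central idea in this abstract is a combination of base kernels, one per data source. The sketch below shows only the combination step, with hand-picked non-negative weights, and plugs the combined kernel into kernel ridge regression. That estimator is a swapped-in stand-in, not the paper's hierarchical Bayesian model or its variational Bayes estimators, which would infer the weights from data. All helper names, the weights, and the toy data are hypothetical.

```python
import math


def rbf_on(indices, gamma):
    """Hypothetical helper: RBF kernel on a subset of features (one 'data source')."""
    def kernel(x, z):
        return math.exp(-gamma * sum((x[i] - z[i]) ** 2 for i in indices))
    return kernel


def combine(kernels, weights):
    """Weighted sum of base kernels; non-negative weights keep the result a valid kernel."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")

    def kernel(x, z):
        return sum(w * k(x, z) for w, k in zip(weights, kernels))
    return kernel


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_kernel_ridge(kernel, xs, ys, lam):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, predict sum_i alpha_i k(x_i, x)."""
    n = len(xs)
    gram = [[kernel(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, ys)
    return lambda x: sum(a * kernel(xi, x) for a, xi in zip(alpha, xs))


# Toy data: feature 0 and feature 1 stand in for two hypothetical data sources.
xs = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 0.0]]
ys = [0.0, 1.0, 0.0, 1.0]

# Hand-picked weights; a kernel-learning model would instead infer them from data.
combined = combine([rbf_on([0], gamma=1.0), rbf_on([1], gamma=1.0)], [0.5, 0.5])
predict = fit_kernel_ridge(combined, xs, ys, lam=1e-6)
print([round(predict(x), 3) for x in xs])
```

Restricting the weights to be non-negative matters: a non-negative sum of positive semi-definite kernels is itself positive semi-definite, so the combined Gram matrix stays a valid kernel matrix.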