Search CORE


Stacking for machine learning redshifts applied to SDSS galaxies

Author: Hoyle Ben
Paech Kerstin
Rau Markus Michael
Seitz Stella
Weller Jochen
Zitlau Roman
Publication venue: Oxford University Press (OUP)
Publication date: 16/06/2016

We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. In stacking, the photometric redshift estimate output by a base algorithm is fed back into the same algorithm as an additional input feature in a subsequent learning round. We show that all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method both to unsupervised machine learning techniques based on self-organising maps (SOMs) and to supervised machine learning methods based on decision trees. We explore a range of stacking architectures, varying the number of layers and the number of base learners per layer. Finally, we test whether stacking remains effective when the base algorithm is already a successful one, such as AdaBoost. We observe a significant improvement of between 1.9% and 21% on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When applied to strong learning algorithms (such as AdaBoost), the relative improvement shrinks but remains positive, between 0.4% and 2.5% for the explored metrics, and comes at almost no additional computational cost.
Comment: 13 pages, 3 tables, 7 figures; version accepted by MNRAS, minor text updates. Results and conclusions unchanged.

arXiv.org e-Print Archive

MPG.PuRe
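The stacking round described in the abstract above can be illustrated with a short sketch. It is not the authors' code: a pure-Python k-nearest-neighbour regressor (the hypothetical `knn_predict`) stands in for the base algorithm, and the training-set predictions are generated out-of-fold (hypothetical `out_of_fold_predictions`, with an assumed 5 folds) so that the appended feature is not a memorised copy of the target. The data are made-up toy values.

```python
# Sketch only: hypothetical helper names, kNN as an assumed stand-in base learner.
import math
from typing import List, Sequence, Tuple

Row = Sequence[float]


def knn_predict(train_x: Sequence[Row], train_y: Sequence[float],
                query: Row, k: int = 3) -> float:
    """Base learner: mean target of the k nearest training rows (Euclidean)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def out_of_fold_predictions(x: Sequence[Row], y: Sequence[float],
                            k: int = 3, n_folds: int = 5) -> List[float]:
    """Predict every training row with a model that never saw that row."""
    preds = [0.0] * len(x)
    for fold in range(n_folds):
        kept = [i for i in range(len(x)) if i % n_folds != fold]
        kept_x = [x[i] for i in kept]
        kept_y = [y[i] for i in kept]
        # Rows with i % n_folds == fold are held out for this fold.
        for i in range(fold, len(x), n_folds):
            preds[i] = knn_predict(kept_x, kept_y, x[i], k)
    return preds


def stack_one_layer(train_x: Sequence[Row], train_y: Sequence[float],
                    test_x: Sequence[Row], k: int = 3
                    ) -> Tuple[List[List[float]], List[List[float]]]:
    """One stacking round: append the base prediction as an extra feature."""
    train_pred = out_of_fold_predictions(train_x, train_y, k)
    test_pred = [knn_predict(train_x, train_y, q, k) for q in test_x]
    new_train = [list(row) + [p] for row, p in zip(train_x, train_pred)]
    new_test = [list(row) + [p] for row, p in zip(test_x, test_pred)]
    return new_train, new_test


if __name__ == "__main__":
    # Hypothetical toy data: two 'colour' features and a target 'redshift'.
    train_x = [[i / 10, (i % 7) / 10] for i in range(40)]
    train_y = [0.05 * a + 0.02 * b for a, b in train_x]
    test_x = [[1.25, 0.3], [2.55, 0.1]]
    layer1_train, layer1_test = stack_one_layer(train_x, train_y, test_x)
    # Retrain the same base learner on the augmented features.
    print([round(knn_predict(layer1_train, train_y, q), 4) for q in layer1_test])
```

Further layers would repeat `stack_one_layer` on the already augmented features. Using out-of-fold predictions for the training rows is a design choice of this sketch, common in stacked generalisation, rather than a detail taken from the paper.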

Industry-scale application and evaluation of deep learning for drug target prediction

Author: Ashby Thomas J.
Böhm Stanislav
Ceulemans Hugo
Chen Hongming
Chupakhin Vladimir
Cima Vojtěch
Engkvist Ola
Golib-Dzib Jose-Felipe
Greene Nigel
Hochreiter Sepp
Jeliazkova Nina
Klambauer Günter
Martinovič Jan
Mayr Andreas
Sturm Noe
Van Thanh Le
Vander Aa Tom
Vandriessche Yves
Wegner Joerg
Publication venue: Springer Nature
Publication date: 05/06/2019

Artificial intelligence (AI) is undergoing a revolution thanks to breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent work on publicly available pharmaceutical data showed that AI methods are highly promising for drug target prediction. However, the quality of public data might differ from that of industry data due to different labs reporting measurements, different measurement techniques, fewer samples, and less diverse and specialized assays. As part of a European-funded project (ExCAPE) that brought together expertise from the pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models obtained from public data can be transferred to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed maintain their predictive power to a large degree when applied to industry data. Moreover, we observed that models derived by deep learning outperformed comparable models trained with other machine learning algorithms when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study evaluating the potential of machine learning, and especially deep learning, directly in industry-scale settings and investigating the transferability of publicly learned target prediction models to industrial bioactivity prediction pipelines.
Web of Science121art. no. 2

DSpace at VSB Technical University of Ostrava

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Encog: Library of Interchangeable Machine Learning Models for Java and C#

Author: Heaton Jeff
Publication venue
Publication date: 15/06/2015

This paper introduces the Encog library for Java and C#, a scalable, adaptable, multiplatform machine learning framework that was first released in 2008. Encog allows a variety of machine learning models to be applied to datasets for regression, classification, and clustering. The supported machine learning models can be used interchangeably with minimal recoding. Encog uses efficient multithreaded code to reduce training time by exploiting modern multicore processors. The current version of Encog can be downloaded from http://www.encog.org

arXiv.org e-Print Archive

CiteSeerX

Knowledge representation issues in control knowledge learning

Author: Aler Ricardo
Borrajo Millán Daniel
Isasi Pedro
Publication venue: Morgan Kaufmann
Publication date: 01/01/2000

Seventeenth International Conference on Machine Learning. Stanford, CA, USA, 29 June-2 July, 2000
Knowledge representation is a key issue for any machine learning task. There have already been many comparative studies of knowledge representation with respect to machine learning in classification tasks. However, apart from some work on reinforcement learning techniques in relation to state representation, very few studies have concentrated on the effect of knowledge representation on machine learning applied to problem solving and, more specifically, to planning. In this paper, we present an experimental comparative study of the effect of changing the input representation of planning domain knowledge on control knowledge learning. We show results in two classical domains using three different machine learning systems that have previously proven effective at learning planning control knowledge: a pure EBL (explanation-based learning) mechanism, a combination of EBL and induction (HAMLET), and a Genetic Programming based system (EVOCK).
Published

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Anomaly detection for machine learning redshifts applied to SDSS galaxies

Author: Bonnett Christopher
Hoyle Ben
Paech Kerstin
Rau Markus Michael
Seitz Stella
Weller Jochen
Publication venue: Oxford University Press (OUP)
Publication date: 01/10/2015

We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million 'clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 'anomalous' galaxies whose spectroscopic redshift measurements are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and the preprocessed 'anomaly-removed' sample, and measure redshift statistics on a clean validation sample generated without any preprocessing. For each of the machine learning routines explored, training on the anomaly-removed sample rather than on the contaminated sample improves all measured statistics, by up to 80%. We further describe a method to estimate the contamination fraction of a base data sample.
Comment: 13 pages, 8 figures, 1 table, minor text updates to match MNRAS accepted version

arXiv.org e-Print Archive

MPG.PuRe
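The Elliptical Envelope step in "Anomaly detection for machine learning redshifts applied to SDSS galaxies" can be approximated as follows. This is a simplified sketch, not the paper's pipeline: the actual Elliptical Envelope (for example, scikit-learn's `EllipticEnvelope`) fits a robust Minimum Covariance Determinant estimate, whereas this version uses the plain sample mean and covariance of two hypothetical features and flags the `contamination` fraction of points with the largest Mahalanobis distance. All function names are hypothetical.

```python
# Simplified sketch: NON-robust covariance, 2-D only, hypothetical helper names.
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mean_and_cov(points: Sequence[Point]):
    """Plain (non-robust) sample mean and covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p: Point, mean: Point, cov) -> float:
    """Squared Mahalanobis distance; assumes a non-degenerate covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # inv([[sxx, sxy], [sxy, syy]]) = [[syy, -sxy], [-sxy, sxx]] / det
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def flag_anomalies(points: Sequence[Point], contamination: float = 0.05) -> List[bool]:
    """Flag roughly the `contamination` fraction of points farthest from the centre."""
    mean, cov = mean_and_cov(points)
    d2 = [mahalanobis_sq(p, mean, cov) for p in points]
    n_flag = int(round(contamination * len(points)))
    if n_flag == 0:
        return [False] * len(points)
    cutoff = sorted(d2, reverse=True)[n_flag - 1]
    # Ties at the cutoff can flag slightly more than n_flag points.
    return [d >= cutoff for d in d2]


if __name__ == "__main__":
    clean = [(0.1 * i, 0.1 * ((7 * i) % 10)) for i in range(20)]
    contaminated = clean + [(10.0, 10.0)]  # one hypothetical bad training example
    flags = flag_anomalies(contaminated, contamination=0.05)
    kept = [p for p, bad in zip(contaminated, flags) if not bad]
    print(sum(flags), len(kept))
```

Because the non-robust covariance is itself inflated by the outliers, it can mask them in heavily contaminated samples; avoiding that masking is what the robust estimate behind Elliptical Envelope is for.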

Feature importance for machine learning redshifts applied to SDSS galaxies

Author: Hoyle Ben
Rau Markus Michael
Seitz Stella
Weller Jochen
Zitlau Roman
Publication venue: Oxford University Press (OUP)
Publication date: 10/03/2015

We present an analysis of feature importance selection applied to photometric redshift estimation using the machine learning architecture Decision Trees with the ensemble learning routine AdaBoost (hereafter RDF). We select a list of 85 easily measured (or derived) photometric quantities (or 'features') and spectroscopic redshifts for almost two million galaxies from the Sloan Digital Sky Survey Data Release 10. After identifying which features have the most predictive power, we use standard artificial Neural Networks (aNN) to show that adding these features to the standard magnitudes and colours improves the machine learning redshift estimate by 18% and decreases the catastrophic outlier rate by 32%. We further compare the redshift estimate using RDF with those from two different aNNs, and with photometric redshifts available from the SDSS. We find that the RDF requires orders of magnitude less computation time than the aNNs to obtain a machine learning redshift, while reducing the catastrophic outlier rate by up to 43% and the redshift error by up to 25%. When compared to the SDSS photometric redshifts, the RDF machine learning redshifts decrease both the standard deviation of residuals scaled by 1/(1+z), by 36% from 0.066 to 0.041, and the fraction of catastrophic outliers, by 57% from 2.32% to 0.99%.
Comment: 10 pages, 4 figures, updated to match version accepted in MNRAS

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe
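The two summary statistics quoted in the feature importance abstract (the standard deviation of residuals scaled by 1/(1+z), and the catastrophic outlier fraction) can be computed as in the sketch below. Two assumptions are not stated in the abstract: the residual is scaled by the spectroscopic redshift, and a catastrophic outlier is taken to be a galaxy with |z_phot - z_spec|/(1 + z_spec) > 0.15, a commonly used cut that may differ from the paper's definition. The function names are hypothetical.

```python
# Sketch only: hypothetical function names; the 0.15 outlier cut is an assumption.
import statistics
from typing import List, Sequence


def scaled_residuals(z_phot: Sequence[float], z_spec: Sequence[float]) -> List[float]:
    """Residuals (z_phot - z_spec) scaled by 1/(1 + z_spec)."""
    return [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]


def sigma_scaled(z_phot: Sequence[float], z_spec: Sequence[float]) -> float:
    """Population standard deviation of the scaled residuals."""
    return statistics.pstdev(scaled_residuals(z_phot, z_spec))


def catastrophic_fraction(z_phot: Sequence[float], z_spec: Sequence[float],
                          threshold: float = 0.15) -> float:
    """Fraction of galaxies whose |scaled residual| exceeds `threshold` (assumed 0.15)."""
    res = scaled_residuals(z_phot, z_spec)
    return sum(1 for r in res if abs(r) > threshold) / len(res)


if __name__ == "__main__":
    z_spec = [0.1, 0.2, 0.3, 0.4]
    z_phot = [0.1, 0.2, 0.3, 0.9]  # hypothetical estimates; the last one is badly off
    print(catastrophic_fraction(z_phot, z_spec))  # prints 0.25
    print(round(sigma_scaled(z_phot, z_spec), 4))
```

The population standard deviation is used here for simplicity; with millions of galaxies the difference from the sample standard deviation is negligible.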