Estimating labels from label proportions
Consider the following problem: given sets of unlabeled observations, each set with known label proportions, predict the labels of another set of observations, also with known label proportions. This problem appears in areas like e-commerce, spam filtering and improper content detection. We present consistent estimators which can reconstruct the correct labels with high probability in a uniform convergence sense. Experiments show that our method works well in practice.
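As a rough, illustrative sketch of this setting (not the estimator from the abstract), the binary case can be approached by noting that each bag's mean is a proportion-weighted mix of the two class-conditional means; solving that linear system and labeling by nearest estimated mean recovers labels on well-separated data. The function names, toy data, and the nearest-mean decision rule are all assumptions:

```python
import numpy as np

def estimate_class_means(bag_means, proportions):
    """Solve bag_mean_i = p_i * mu_pos + (1 - p_i) * mu_neg
    for the two class-conditional means (binary case, >= 2 bags)."""
    A = np.column_stack([proportions, 1.0 - np.asarray(proportions)])
    sol, *_ = np.linalg.lstsq(A, np.asarray(bag_means), rcond=None)
    return sol[0], sol[1]  # estimated mu_pos, mu_neg

def label_by_nearest_mean(X, mu_pos, mu_neg):
    """Assign +1 / -1 by distance to the estimated class means."""
    d_pos = np.linalg.norm(X - mu_pos, axis=1)
    d_neg = np.linalg.norm(X - mu_neg, axis=1)
    return np.where(d_pos < d_neg, 1, -1)

# Toy data: two bags with known positive proportions 0.8 and 0.3.
rng = np.random.default_rng(0)
mu_p, mu_n = np.array([2.0, 2.0]), np.array([-2.0, -2.0])
props, bag_means = [0.8, 0.3], []
for p in props:
    n_pos = int(100 * p)
    bag = np.vstack([rng.normal(mu_p, 1.0, (n_pos, 2)),
                     rng.normal(mu_n, 1.0, (100 - n_pos, 2))])
    bag_means.append(bag.mean(axis=0))
est_p, est_n = estimate_class_means(bag_means, props)

# Label a fresh set of observations using the estimated means.
X_test = np.vstack([rng.normal(mu_p, 1.0, (50, 2)),
                    rng.normal(mu_n, 1.0, (50, 2))])
y_true = np.array([1] * 50 + [-1] * 50)
acc = (label_by_nearest_mean(X_test, est_p, est_n) == y_true).mean()
```

With only bag means and proportions as supervision, the individual labels are recovered accurately when the classes are well separated.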
RIDI: Robust IMU Double Integration
This paper proposes a novel data-driven approach for inertial navigation,
which learns to estimate trajectories of natural human motions just from an
inertial measurement unit (IMU) in every smartphone. The key observation is
that human motions are repetitive and consist of a few major modes (e.g.,
standing, walking, or turning). Our algorithm regresses a velocity vector from
the history of linear accelerations and angular velocities, then corrects
low-frequency bias in the linear accelerations, which are integrated twice to
estimate positions. We have acquired training data with ground-truth motions
across multiple human subjects and multiple phone placements (e.g., in a bag or
a hand). The qualitatively and quantitatively evaluations have demonstrated
that our algorithm has surprisingly shown comparable results to full Visual
Inertial navigation. To our knowledge, this paper is the first to integrate
sophisticated machine learning techniques with inertial navigation, potentially
opening up a new line of research in the domain of data-driven inertial
navigation. We will publicly share our code and data to facilitate further
research.
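The core numerical step, correcting a low-frequency bias and then integrating twice, can be sketched as below. This is a simplification under strong assumptions (a constant bias, a regressed velocity supplied externally, no orientation handling), not the paper's full learned pipeline:

```python
import numpy as np

def double_integrate(acc, dt):
    """Integrate accelerations twice (starting from rest) to get
    velocities and positions."""
    vel = np.cumsum(acc, axis=0) * dt
    pos = np.cumsum(vel, axis=0) * dt
    return vel, pos

def bias_corrected_positions(acc, regressed_vel_end, dt):
    """Estimate a constant accelerometer bias so the integrated velocity
    matches the regressed end velocity, then re-integrate for positions."""
    vel, _ = double_integrate(acc, dt)
    bias = (vel[-1] - regressed_vel_end) / (len(acc) * dt)
    _, pos = double_integrate(acc - bias, dt)
    return pos

# Toy demo: constant true acceleration plus a constant sensor bias.
dt, n = 0.01, 500
true_acc = np.tile([0.2, 0.0], (n, 1))
measured = true_acc + np.array([0.05, -0.03])   # biased IMU readings
true_vel, true_pos = double_integrate(true_acc, dt)
pos = bias_corrected_positions(measured, true_vel[-1], dt)
```

Without the correction, the bias grows quadratically in the position estimate; anchoring the integrated velocity to a regressed velocity removes it.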
Multivariate dynamic kernels for financial time series forecasting
The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44781-0_40

We propose a forecasting procedure based on multivariate dynamic kernels, capable of integrating information measured at different frequencies and at irregular time intervals in financial markets. A data compression step redefines the original financial time series into temporal data blocks, analyzing the temporal information of multiple time intervals. The analysis is done through multivariate dynamic kernels within support vector regression. We also propose two kernels for financial time series that are computationally efficient without sacrificing accuracy. The efficacy of the methodology is demonstrated by empirical experiments on forecasting the challenging S&P500 market.
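A compact sketch of the block-compression-plus-kernel idea, using kernel ridge regression in place of support vector regression and a plain RBF kernel rather than the paper's dynamic kernels; all names, parameters, and the sine toy series are illustrative assumptions:

```python
import numpy as np

def block_series(x, block):
    """Compress a series into non-overlapping temporal blocks (rows)."""
    n = len(x) // block
    return x[: n * block].reshape(n, block)

def rbf_kernel(A, B, gamma=0.5):
    """RBF kernel between sets of block vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit_predict(K_train, y, K_test, lam=1e-3):
    """Closed-form kernel ridge regression: alpha = (K + lam*I)^-1 y."""
    alpha = np.linalg.solve(K_train + lam * np.eye(len(y)), y)
    return K_test @ alpha

# Toy demo: predict the start of the next block from the current block.
t = np.linspace(0, 20, 400)
x = np.sin(t)
blocks = block_series(x, 8)
X, y = blocks[:-1], blocks[1:, 0]
K = rbf_kernel(X[:-5], X[:-5])
K_test = rbf_kernel(X[-5:], X[:-5])
pred = kernel_ridge_fit_predict(K, y[:-5], K_test)
```

The same block-kernel structure accommodates blocks built from multiple series at mixed frequencies, which is where the multivariate dynamic kernels come in.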
Deep Learning for Forecasting Stock Returns in the Cross-Section
Many studies have been undertaken by using machine learning techniques,
including neural networks, to predict stock returns. Recently, a method known
as deep learning, which achieves high performance mainly in image recognition
and speech recognition, has attracted attention in the machine learning field.
This paper implements deep learning to predict one-month-ahead stock returns in
the cross-section in the Japanese stock market and investigates the performance
of the method. Our results show that deep neural networks generally outperform
shallow neural networks, and the best networks also outperform representative
machine learning models. These results indicate that deep learning shows
promise as a skillful machine learning method to predict stock returns in the
cross-section.

Comment: 12 pages, 2 figures, 8 tables, accepted at PAKDD 201
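A minimal stand-in for this setup: a small feedforward network regressing next-period returns on per-stock features, trained on synthetic data (the paper's deeper architectures, real factor data, and the Japanese market universe are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic cross-section: 25 features per stock, nonlinear "return" target.
X = rng.normal(size=(500, 25))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1]

# One-hidden-layer network trained by full-batch gradient descent on
# squared loss (a shallow stand-in for the deeper nets in the paper).
W1 = rng.normal(scale=0.3, size=(25, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.3, size=(16, 1));  b2 = np.zeros(1)
lr = 0.05
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)
    err = (H @ W2 + b2).ravel() - y
    gW2 = H.T @ err[:, None] / len(y)
    gb2 = np.array([err.mean()])
    dH = err[:, None] @ W2.T * (1.0 - H ** 2)   # backprop through tanh
    gW1 = X.T @ dH / len(y)
    gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

pred = (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
mse = ((pred - y) ** 2).mean()   # far below var(y) once training succeeds
```

In the cross-sectional use, predictions like `pred` would be ranked each month to form long-short portfolios.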
Robust artificial neural networks and outlier detection. Technical report
Large outliers break down linear and nonlinear regression models. Robust
regression methods allow one to filter out the outliers when building a model.
By replacing the traditional least squares criterion with the least trimmed
squares criterion, in which half of the data is treated as potential outliers, one
can fit accurate regression models to strongly contaminated data.
High-breakdown methods have become very well established in linear regression,
but have only recently been applied to non-linear regression. In this
work, we examine the problem of fitting artificial neural networks to
contaminated data using the least trimmed squares criterion. We introduce a
penalized least trimmed squares criterion which prevents unnecessary removal of
valid data. Training of ANNs leads to a challenging non-smooth global
optimization problem. We compare the efficiency of several derivative-free
optimization methods in solving it, and show that our approach identifies the
outliers correctly when ANNs are used for nonlinear regression.
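The trimmed criterion itself is easy to state: sum the h smallest squared residuals, and penalize each trimmed point to discourage removing valid data. The exact penalty form in the report may differ; `lam` and the toy residuals below are assumptions:

```python
import numpy as np

def penalized_lts(residuals, h, lam=0.0):
    """Penalized least trimmed squares criterion: sum of the h smallest
    squared residuals, plus a penalty lam per trimmed point."""
    r2 = np.sort(np.asarray(residuals) ** 2)
    return r2[:h].sum() + lam * (len(r2) - h)

# Toy demo: four small residuals and two gross outliers.
res = np.array([0.1, -0.2, 0.05, 8.0, -9.0, 0.15])
trimmed = penalized_lts(res, h=4)    # outliers excluded from the loss
full = (res ** 2).sum()              # ordinary least squares loss
```

The outliers dominate the ordinary squared loss but leave the trimmed criterion nearly untouched, which is what makes the fitted model robust.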
A framework for space-efficient string kernels
String kernels are typically used to compare genome-scale sequences whose
length makes alignment impractical, yet their computation is based on data
structures that are either space-inefficient, or incur large slowdowns. We show
that a number of exact string kernels, like the k-mer kernel, the substring
kernels, a number of length-weighted kernels, the minimal absent words kernel,
and kernels with Markovian corrections, can all be computed in time and
in bits of space in addition to the input, using just a
data structure on the Burrows-Wheeler transform of the
input strings, which takes time per element in its output. The same
bounds hold for a number of measures of compositional complexity based on
multiple values of k, like the k-mer profile and the k-th order empirical
entropy, and for calibrating the value of k using the data.
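For concreteness, the exact k-mer (spectrum) kernel mentioned above is the inner product of k-mer count vectors; a naive hash-based version, without the space-efficient BWT machinery that is the point of the paper, looks like this:

```python
from collections import Counter

def kmer_counts(s, k):
    """Count all length-k substrings (the k-mer spectrum) of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def kmer_kernel(s, t, k):
    """Exact k-mer kernel: inner product of k-mer count vectors."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(cs[m] * ct[m] for m in cs if m in ct)

sim_self = kmer_kernel("ACGTACGT", "ACGTACGT", 3)   # high self-similarity
sim_other = kmer_kernel("ACGTACGT", "TTTTTTTT", 3)  # no shared 3-mers
```

The hash table here grows with the number of distinct k-mers, which is exactly the space cost the BWT-based approach avoids for genome-scale inputs.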
VIP-STB farm: scale-up village to county/province level to support science and technology at backyard (STB) program.
In this paper, we introduce a new concept in VIP-STB, a project funded through Agri-Tech in China: Newton Network+ (ATCNN), for developing feasible solutions towards scaling up STB from village level to county/province level via generic models and systems. There are three tasks in this project: normalized difference vegetation index (NDVI) estimation, wheat density estimation, and household-based small farms (HBSF) engagement. In the first task, several machine learning models are evaluated for NDVI estimation. In the second task, crop density/population is predicted by conventional image processing techniques. In the third task, integrated software based on Python and Twilio is developed to improve communication services and engagement for HBSFs and to provide technical capabilities. The objectives and strategy for VIP-STB are described, experimental results on each task are presented, and details of each implemented model are provided with guidance for future development.
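For the NDVI task, the index itself is the standard normalized difference of near-infrared and red reflectance; a minimal sketch (the band values below are made-up reflectances, not data from the project):

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

# Dense vegetation reflects strongly in near-infrared and absorbs red,
# so it scores high; bare soil scores near zero.
veg = ndvi(0.5, 0.08)
soil = ndvi(0.3, 0.25)
```

Estimating this index from other inputs (rather than computing it from the two bands directly) is what the machine learning models in the first task are evaluated on.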
Predicting sentence translation quality using extrinsic and language independent features
We develop a top performing model for automatic, accurate, and language independent prediction of sentence-level statistical machine translation (SMT) quality with or without looking at the translation outputs.
We derive various feature functions measuring the closeness of a given test sentence to the training data and
the difficulty of translating the sentence.
We describe "mono" feature functions that are based on statistics of only one side of the parallel
training corpora and "duo" feature functions that incorporate statistics involving both source and
target sides of the training data.
Overall, we describe novel, language independent, and SMT system extrinsic features for predicting the SMT performance, which also rank high during feature ranking evaluations.
We experiment with different learning settings, with or without looking at the translations, which help differentiate the contribution of different feature sets.
We apply partial least squares and feature subset selection, both of which improve the results and we present ranking of the top features selected for each learning setting, providing an exhaustive analysis of the extrinsic features used.
We show that by just looking at the test source sentences and not using the translation outputs at all, we can
achieve better performance than a baseline system using SMT model dependent features that generated the
translations.
Furthermore, our prediction system is able to achieve the nd best performance overall according to the official
results of the Quality Estimation Task (QET) challenge when also looking at the translation outputs.
Our representation and features achieve the top performance in QET among the models using the SVR learning model.
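As an illustration of a mono-style feature measuring the closeness of a test sentence to the training data, one can compute source-side n-gram coverage. The function name, whitespace tokenization, and toy corpus are assumptions, not the paper's exact feature set:

```python
def ngram_coverage(sentence, train_sentences, n=2):
    """Mono-style feature: fraction of the test sentence's n-grams that
    also appear on the source side of the training corpus."""
    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    train_ngrams = set()
    for s in train_sentences:
        train_ngrams |= ngrams(s.split(), n)
    test = ngrams(sentence.split(), n)
    return len(test & train_ngrams) / max(len(test), 1)

train = ["the cat sat on the mat", "a dog sat on the rug"]
easy = ngram_coverage("the cat sat on the rug", train)     # fully covered
hard = ngram_coverage("quantum flux capacitors hum quietly", train)  # unseen
```

High coverage suggests the SMT system has seen similar material and will translate well, which is why such features predict quality without looking at the translation output.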