Predicting Future Instance Segmentation by Forecasting Convolutional Features
Anticipating future events is an important prerequisite towards intelligent
behavior. Video forecasting has been studied as a proxy task towards this goal.
Recent work has shown that to predict semantic segmentation of future frames,
forecasting at the semantic level is more effective than forecasting RGB frames
and then segmenting these. In this paper we consider the more challenging
problem of future instance segmentation, which additionally segments out
individual objects. To deal with a varying number of output labels per image,
we develop a predictive model in the space of fixed-sized convolutional
features of the Mask R-CNN instance segmentation model. We apply the
"detection head" of Mask R-CNN on the predicted features to produce the instance
segmentation of future frames. Experiments show that this approach
significantly improves over strong baselines based on optical flow and
repurposed instance segmentation architectures.
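The core idea admits a compact sketch: a small convolutional network maps the features of several past frames to the features of a future frame, and a frozen, pre-trained detection head is then run on the prediction. The module sizes below are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class FeatureForecaster(nn.Module):
    def __init__(self, channels=256, n_past=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels * n_past, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, past_feats):        # list of (B, C, H, W) feature maps
        x = torch.cat(past_feats, dim=1)  # stack past frames along channels
        return self.net(x)                # predicted future feature map

forecaster = FeatureForecaster()
past = [torch.randn(1, 256, 64, 64) for _ in range(4)]
future_feat = forecaster(past)            # (1, 256, 64, 64)
# In the paper's setting, a pre-trained Mask R-CNN head would now be applied
# to future_feat to obtain the instance segmentation of the future frame.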
Multitask and transfer learning for multi-aspect data
Supervised learning aims to learn functional relationships between inputs and outputs. Multitask learning tackles supervised learning tasks by performing them simultaneously to exploit commonalities between them. In this thesis, we focus on the problem of eliminating negative transfer in order to achieve better performance in multitask learning. We start by considering a general scenario in which the relationship between tasks is unknown. We then narrow our analysis to the case where data are characterised by a combination of underlying aspects, e.g., a dataset of images of faces, where each face is determined by a person's facial structure, the emotion being expressed, and the lighting conditions. In machine learning there have been numerous efforts based on multilinear models to decouple these aspects, but these have primarily used techniques from the field of unsupervised learning. In this thesis we take inspiration from these approaches and hypothesize that supervised learning methods can also benefit from exploiting these aspects. The contributions of this thesis are as follows:
1. A multitask learning and transfer learning method that avoids negative transfer when there is no prescribed information about the relationships between tasks.
2. A multitask learning approach that takes advantage of a lack of overlapping features between known groups of tasks associated with different aspects.
3. A framework which extends multitask learning using multilinear algebra, with the aim of learning tasks associated with a combination of elements from different aspects.
4. A novel convex relaxation approach that can be applied both to the suggested framework and, more generally, to any tensor recovery problem.
Through theoretical validation and experiments on both synthetic and real-world datasets, we show that the proposed approaches allow fast and reliable inferences. Furthermore, when performing learning tasks on an aspect of interest, accounting for secondary aspects leads to significantly more accurate results than using traditional approaches.
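As a generic illustration of the setting (a textbook hard-parameter-sharing setup, not one of the thesis's own methods), the sketch below shares a representation across tasks; that shared component is exactly the channel through which both useful and negative transfer occur.

import torch
import torch.nn as nn

class MultitaskNet(nn.Module):
    def __init__(self, in_dim, hidden, task_dims):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_dims)

    def forward(self, x):
        z = self.shared(x)               # representation shared across tasks
        return [head(z) for head in self.heads]

net = MultitaskNet(in_dim=10, hidden=32, task_dims=[1, 1, 3])
outs = net(torch.randn(8, 10))           # one output per task
# Joint training sums the per-task losses; negative transfer appears when
# one task's gradients degrade another task's performance through `shared`.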
Zero-Shot Hashing via Transferring Supervised Knowledge
Hashing has shown its efficiency and effectiveness in facilitating
large-scale multimedia applications. Supervised knowledge (e.g., semantic
labels or pair-wise relationships) associated with data is capable of significantly
improving the quality of hash codes and hash functions. However, confronted
with the rapid growth of newly-emerging concepts and multimedia data on the
Web, existing supervised hashing approaches may easily suffer from scarce
and unreliable supervised information due to the high cost of manual
labelling. In this paper, we propose a novel hashing scheme, termed
\emph{zero-shot hashing} (ZSH), which compresses images of "unseen" categories
to binary codes with hash functions learned from limited training data of
"seen" categories. Specifically, we project independent data labels i.e.
0/1-form label vectors) into semantic embedding space, where semantic
relationships among all the labels can be precisely characterized, and thus
supervised knowledge from seen classes can be transferred to unseen ones. Moreover, in order
to cope with the semantic shift problem, we rotate the embedded space to more
suitably align the embedded semantics with the low-level visual feature space,
thereby alleviating the influence of the semantic gap. Meanwhile, to exert
positive effects on learning high-quality hash functions, we further propose to
preserve the local structure of the data and the discrete nature of the binary codes.
In addition, we develop an efficient alternating algorithm to solve the ZSH model.
Extensive experiments conducted on various real-life datasets show the superior
zero-shot image retrieval performance of ZSH as compared to several
state-of-the-art hashing methods.
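A rough numpy sketch of the pipeline, under illustrative assumptions (random stand-ins for visual features and label embeddings, linear hash functions, and a simple alternating loop rather than the paper's full discrete optimization):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 64))   # visual features of "seen" images (n x d)
E = rng.standard_normal((100, 32))   # semantic embeddings of their labels (n x k)

R = np.eye(32)                       # rotation of the embedded space
for _ in range(5):                   # alternate projection and rotation
    W = np.linalg.lstsq(X, E @ R, rcond=None)[0]  # d x k hash projection
    U, _, Vt = np.linalg.svd(E.T @ (X @ W))       # orthogonal Procrustes step
    R = U @ Vt                                    # align semantics with features

codes = np.sign(X @ W)               # n x k binary (+-1) hash codes
# Unseen-category images reuse W: their codes come purely from hash
# functions learned on seen categories, as in the zero-shot setting above.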
Emotion Recognition by Two View SVM_2K Classifier on Dynamic Facial Expression Features
A novel emotion recognition system has been proposed for classifying facial expressions in videos. Firstly, two types of basic facial appearance descriptors were extracted. The first descriptor, called Motion History Histogram (MHH), was used to detect temporal changes at each pixel of the face. The second descriptor, a Histogram of Local Binary Patterns (LBP), was applied to each frame of the video and was used to capture local textural patterns. Secondly, based on these two basic descriptors, two new dynamic facial expression features called MHH_EOH and LBP_MCF were proposed. These two features incorporate both dynamic and local information. Finally, the Two View SVM_2K classifier was built to integrate these two dynamic features in an efficient way. The experimental results showed that this method outperformed the baseline results set by the FERA'11 challenge.
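As an illustration of one of the two views, the sketch below computes a per-frame uniform-LBP histogram with scikit-image; the P and R parameters are assumptions, and the MHH-based view and SVM_2K training itself are not shown.

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(frame, P=8, R=1):
    """Uniform-LBP histogram of one grayscale frame."""
    lbp = local_binary_pattern(frame, P, R, method="uniform")
    n_bins = P + 2                               # uniform patterns plus "other"
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    return hist

video = (np.random.rand(30, 64, 64) * 255).astype(np.uint8)  # 30 dummy frames
feats = np.stack([lbp_histogram(f) for f in video])          # (30, 10) LBP view
# The MHH-based view would be computed from pixel-level temporal changes;
# SVM_2K then learns a joint classifier over the two views.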
Tensor completion in hierarchical tensor representations
Compressed sensing extends from the recovery of sparse vectors from
undersampled measurements via efficient algorithms to the recovery of matrices
of low rank from incomplete information. Here we consider a further extension
to the reconstruction of tensors of low multi-linear rank in recently
introduced hierarchical tensor formats from a small number of measurements.
Hierarchical tensors are a flexible generalization of the well-known Tucker
representation, with the advantage that the number of degrees of freedom
of a low-rank tensor does not scale exponentially with the order of the tensor.
While corresponding tensor decompositions can be computed efficiently via
successive applications of (matrix) singular value decompositions, some
important properties of the singular value decomposition do not extend from the
matrix to the tensor case. This results in major computational and theoretical
difficulties in designing and analyzing algorithms for low rank tensor
recovery. For instance, a canonical analogue of the tensor nuclear norm is
NP-hard to compute in general, which is in stark contrast to the matrix case.
In this book chapter we consider versions of iterative hard thresholding
schemes adapted to hierarchical tensor formats. One variant builds on methods
from Riemannian optimization and uses a retraction mapping from the tangent
space of the manifold of low rank tensors back to this manifold. We provide
first partial convergence results based on a tensor version of the restricted
isometry property (TRIP) of the measurement map. Moreover, an estimate of the
number of measurements is provided that ensures the TRIP of a given tensor rank
with high probability for Gaussian measurement maps.
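The matrix case already conveys the structure of such an iterative hard thresholding scheme: a gradient step on the data-fit term followed by projection onto the rank-r set via a truncated SVD. The sizes below are toy assumptions; the hierarchical tensor variants replace the SVD truncation with a hierarchical SVD or a retraction onto the low-rank manifold.

import numpy as np

def truncate(X, r):                            # projection onto rank-r matrices
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
n, r, m = 20, 2, 240                           # toy sizes
X_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
A = rng.standard_normal((m, n * n)) / np.sqrt(m)   # Gaussian measurement map
y = A @ X_true.ravel()                         # incomplete linear measurements

X = np.zeros((n, n))
for _ in range(500):
    resid = A @ X.ravel() - y
    if np.linalg.norm(resid) < 1e-10:
        break
    grad = (A.T @ resid).reshape(n, n)
    step = (grad ** 2).sum() / ((A @ grad.ravel()) ** 2).sum()  # normalized step
    X = truncate(X - step * grad, r)           # hard thresholding to rank r

print(np.linalg.norm(X - X_true) / np.linalg.norm(X_true))  # small if recovered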
Semi-convolutional operators for instance segmentation
Object detection and instance segmentation are dominated by region-based methods such as Mask R-CNN. However, there is a growing interest in reducing these problems to pixel labeling tasks, as the latter could be more efficient, could be integrated seamlessly in image-to-image network architectures as used in many other tasks, and could be more accurate for objects that are not well approximated by bounding boxes. In this paper we show theoretically and empirically that constructing dense pixel embeddings that can separate object instances cannot be easily achieved using convolutional operators. At the same time, we show that simple modifications, which we call semi-convolutional, have a much better chance of succeeding at this task. We use the latter to show a connection to Hough voting as well as to a variant of the bilateral kernel that is spatially steered by a convolutional network. We demonstrate that these operators can also be used to improve approaches such as Mask R-CNN, achieving better segmentation of complex biological shapes and PASCAL VOC categories than Mask R-CNN alone.
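In its simplest form, a semi-convolutional operator mixes pixel coordinates into a convolutional embedding, so that pixels of identical appearance in different instances can still receive distinct embeddings. A minimal sketch with illustrative shapes:

import torch
import torch.nn as nn

class SemiConvEmbedding(nn.Module):
    def __init__(self, in_ch, emb_dim=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, emb_dim, 3, padding=1)

    def forward(self, x):
        f = self.conv(x)                                   # (B, 2, H, W)
        B, _, H, W = f.shape
        ys, xs = torch.meshgrid(torch.arange(H, dtype=f.dtype),
                                torch.arange(W, dtype=f.dtype), indexing="ij")
        coords = torch.stack([xs, ys]).unsqueeze(0).expand(B, -1, -1, -1)
        return f + coords                                  # u(p) = f(p) + p

emb = SemiConvEmbedding(in_ch=3)(torch.randn(1, 3, 8, 8))  # (1, 2, 8, 8)
# Pixels of one instance are trained to agree on u(p), which forces f(p) to
# point from p toward an instance-specific location (a Hough-like vote).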
Clinically Applicable Segmentation of Head and Neck Anatomy for Radiotherapy: Deep Learning Algorithm Development and Validation Study
BACKGROUND: Over half a million individuals are diagnosed with head and neck cancer each year globally. Radiotherapy is an important curative treatment for this disease, but it requires time-consuming manual delineation of radiosensitive organs at risk. This planning process can delay treatment while also introducing interoperator variability, resulting in downstream radiation dose differences. Although auto-segmentation algorithms offer a potentially time-saving solution, the challenges in defining, quantifying, and achieving expert performance remain. OBJECTIVE: Adopting a deep learning approach, we aim to demonstrate a 3D U-Net architecture that achieves expert-level performance in delineating 21 distinct head and neck organs at risk commonly segmented in clinical practice. METHODS: The model was trained on a data set of 663 deidentified computed tomography scans acquired in routine clinical practice, with both segmentations taken from clinical practice and segmentations created by experienced radiographers as part of this research, all in accordance with consensus organ at risk definitions. RESULTS: We demonstrated the model's clinical applicability by assessing its performance on a test set of 21 computed tomography scans from clinical practice, each with 21 organs at risk segmented by 2 independent experts. We also introduced the surface Dice similarity coefficient, a new metric for the comparison of organ delineation, to quantify the deviation between organ at risk surface contours rather than volumes, better reflecting the clinical task of correcting errors in automated organ segmentations. The model's generalizability was then demonstrated on 2 distinct open-source data sets, representing centers and countries different from those used in model training. CONCLUSIONS: Deep learning is an effective and clinically applicable technique for the segmentation of the head and neck anatomy for radiotherapy. With appropriate validation studies and regulatory approvals, this system could improve the efficiency, consistency, and safety of radiotherapy pathways.
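The surface Dice idea can be sketched for 2D binary masks: the score is the fraction of each mask's boundary lying within a tolerance of the other mask's boundary. The study defines the metric on 3D organ surfaces with a tolerance in millimetres; this toy 2D version only illustrates the principle.

import numpy as np
from scipy import ndimage

def surface_dice(a, b, tol=1.0):
    border_a = a ^ ndimage.binary_erosion(a)       # boundary pixels of a
    border_b = b ^ ndimage.binary_erosion(b)
    dist_to_b = ndimage.distance_transform_edt(~border_b)
    dist_to_a = ndimage.distance_transform_edt(~border_a)
    close_a = (dist_to_b[border_a] <= tol).sum()   # a's boundary near b's
    close_b = (dist_to_a[border_b] <= tol).sum()
    return (close_a + close_b) / (border_a.sum() + border_b.sum())

a = np.zeros((32, 32), bool); a[8:24, 8:24] = True
b = np.zeros((32, 32), bool); b[9:25, 8:24] = True  # shifted by one pixel
print(surface_dice(a, b, tol=1.0))                  # close to 1.0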
Quantum circuit optimization with AlphaTensor
A key challenge in realizing fault-tolerant quantum computers is circuit optimization. Focusing on the most expensive gates in fault-tolerant quantum computation (namely, the T gates), we address the problem of T-count optimization, i.e., minimizing the number of T gates that are needed to implement a given circuit. To achieve this, we develop AlphaTensor-Quantum, a method based on deep reinforcement learning that exploits the relationship between optimizing T-count and tensor decomposition. Unlike existing methods for T-count optimization, AlphaTensor-Quantum can incorporate domain-specific knowledge about quantum computation and leverage gadgets, which significantly reduces the T-count of the optimized circuits. AlphaTensor-Quantum outperforms the existing methods for T-count optimization on a set of arithmetic benchmarks (even when compared without making use of gadgets). Remarkably, it discovers an efficient algorithm akin to Karatsuba's method for multiplication in finite fields. AlphaTensor-Quantum also finds the best human-designed solutions for relevant arithmetic computations used in Shor's algorithm and for quantum chemistry simulation, thus demonstrating that it can save hundreds of hours of research by optimizing relevant quantum circuits in a fully automated way.
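The flavor of the T-count/tensor-decomposition connection can be seen in a toy GF(2) computation: each symmetric rank-one term corresponds to one T gate, and duplicated terms cancel mod 2, so a shorter decomposition of the same tensor means fewer T gates. AlphaTensor-Quantum searches for such low-rank decompositions with reinforcement learning; the vectors below are made up, not a real circuit's.

import numpy as np

def rank_one(a):                          # symmetric rank-one term a (x) a (x) a
    return np.einsum("i,j,k->ijk", a, a, a)

a, b, c = (np.array(v) for v in ([1, 0, 1], [0, 1, 1], [1, 1, 0]))
T = sum(map(rank_one, [a, b, c, c])) % 2  # naive decomposition: 4 T gates
T_opt = sum(map(rank_one, [a, b])) % 2    # the duplicated term cancels mod 2
assert np.array_equal(T, T_opt)           # same tensor, only 2 T gates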
Clinically applicable deep learning for diagnosis and referral in retinal disease
The volume and complexity of diagnostic imaging is increasing at a pace faster than the availability of human expertise to interpret it. Artificial intelligence has shown great promise in classifying two-dimensional photographs of some common diseases and typically relies on databases of millions of annotated images. Until now, the challenge of reaching the performance of expert clinicians in a real-world clinical pathway with three-dimensional diagnostic scans has remained unsolved. Here, we apply a novel deep learning architecture to a clinically heterogeneous set of three-dimensional optical coherence tomography scans from patients referred to a major eye hospital. We demonstrate performance in making a referral recommendation that reaches or exceeds that of experts on a range of sight-threatening retinal diseases after training on only 14,884 scans. Moreover, we demonstrate that the tissue segmentations produced by our architecture act as a device-independent representation; referral accuracy is maintained when using tissue segmentations from a different type of device. Our work removes previous barriers to wider clinical use without prohibitive training data requirements across multiple pathologies in a real-world setting.
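Schematically, the two-stage design can be sketched as follows (layer sizes, tissue and decision counts are placeholders, and the real networks are far deeper): a segmentation network maps a raw scan to a tissue-segmentation map, and a separate classifier maps that map to a referral decision. Because only stage one depends on the scanner, retraining it suffices for a new device.

import torch
import torch.nn as nn

n_tissues, n_decisions = 15, 4

segmenter = nn.Sequential(                 # stage 1: raw scan -> tissue map
    nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv3d(8, n_tissues, 1),
)
classifier = nn.Sequential(                # stage 2: tissue map -> referral
    nn.Conv3d(n_tissues, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, n_decisions),
)

scan = torch.randn(1, 1, 16, 32, 32)       # toy OCT volume
tissue_map = segmenter(scan).softmax(dim=1)
referral_logits = classifier(tissue_map)   # one score per referral decision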
Recurrent instance segmentation
Instance segmentation is the problem of detecting and delineating each distinct object of interest appearing in an image. Current instance segmentation approaches consist of ensembles of modules that are trained independently of each other, thus missing opportunities for joint learning. Here we propose a new instance segmentation paradigm consisting of an end-to-end method that learns how to segment instances sequentially. The model is based on a recurrent neural network that sequentially finds objects and their segmentations one at a time. This net is provided with a spatial memory that keeps track of what pixels have been explained and allows occlusion handling. In order to train the model we designed a principled loss function that accurately represents the properties of the instance segmentation problem. In the experiments carried out, we found that our method outperforms recent approaches on multiple-person segmentation, and all state-of-the-art approaches on the Plant Phenotyping dataset for leaf counting.
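A minimal sketch of the sequential paradigm (illustrative shapes and modules, not the paper's architecture): at each step the model sees the image together with a spatial memory of already-explained pixels, emits one instance mask and a stop score, and the memory is updated with the new mask.

import torch
import torch.nn as nn

class RecurrentSegmenter(nn.Module):
    def __init__(self, in_ch=3, hidden=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch + 1, hidden, 3, padding=1), nn.ReLU(),
        )
        self.mask_head = nn.Conv2d(hidden, 1, 1)
        self.stop_head = nn.Linear(hidden, 1)

    def forward(self, img, max_steps=5):
        B, _, H, W = img.shape
        memory = img.new_zeros((B, 1, H, W))
        masks = []
        for _ in range(max_steps):
            h = self.body(torch.cat([img, memory], dim=1))
            mask = torch.sigmoid(self.mask_head(h))
            stop = torch.sigmoid(self.stop_head(h.mean(dim=(2, 3))))
            masks.append(mask)
            memory = torch.maximum(memory, mask)   # mark explained pixels
            if (stop > 0.5).all():                 # model decides to halt
                break
        return masks

masks = RecurrentSegmenter()(torch.randn(2, 3, 32, 32))
# Training matches predicted masks to ground-truth instances (e.g., via a
# Hungarian assignment) so that the order of discovery is not penalized.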
