10 research outputs found

OPML: A One-Pass Closed-Form Solution for Online Metric Learning

Author: Gao Yang
Huo Jing
Li Wenbin
Shi Yinghuan
Wang Lei
Zhou Luping
Publication date: 28/09/2016

To achieve a low computational cost when performing online metric learning on large-scale data, we present a one-pass closed-form solution, named OPML, in this paper. The proposed OPML first adopts a one-pass triplet construction strategy, which aims to use only a very small number of triplets to approximate the representation ability of the full set of triplets obtained by batch-manner methods. Then, OPML employs a closed-form solution to update the metric for newly arriving samples, which leads to low space (i.e., O(d)) and time (i.e., O(d^2)) complexity, where d is the feature dimensionality. In addition, an extension of OPML (namely COPML) is further proposed to enhance robustness when, in real cases, the first several samples come from the same class (i.e., the cold start problem). In the experiments, we systematically evaluate our methods (OPML and COPML) on three typical tasks, including UCI data classification, face verification, and abnormal event detection in videos, in order to assess the proposed methods across different sample numbers, feature dimensionalities, and feature extraction approaches (i.e., hand-crafted and deeply learned). The results show that OPML and COPML obtain promising performance at a very low computational cost. The effectiveness of COPML under the cold start setting is also experimentally verified. Comment: 12 page
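The abstract does not spell out the one-pass triplet construction, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that the stream yields (feature vector, label) pairs, that only the most recent sample of each class is stored, and that each new sample is paired with the stored sample of its own class and the most recently stored sample of another class. The function name `one_pass_triplets` is hypothetical, and the closed-form metric update itself is not shown.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Vector = List[float]
Triplet = Tuple[Vector, Vector, Vector]  # (anchor, same-class, other-class)


def one_pass_triplets(
    stream: Iterable[Tuple[Vector, Hashable]]
) -> Iterator[Triplet]:
    """Yield at most one triplet per incoming sample, in a single pass."""
    latest: Dict[Hashable, Vector] = {}  # most recent sample seen per class
    for x, label in stream:
        positive = latest.get(label)
        # Assumption: take the most recently stored sample of any other class.
        negative = next(
            (v for k, v in reversed(list(latest.items())) if k != label),
            None,
        )
        if positive is not None and negative is not None:
            yield (x, positive, negative)
        # Re-insert so that dictionary order tracks recency.
        latest.pop(label, None)
        latest[label] = x
```

Under these assumptions no triplet can be formed while every sample seen so far belongs to one class, which is one way to picture the cold start problem that COPML is said to address.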

arXiv.org e-Print Archive

Research Online

Prediction of combustion state through a semi-supervised learning model and flame imaging

Author: Abdurakipov
Adewole
Akintayo
Bai
Ballester
Bielza
Chen
Chen
Duan
González-Cencerrado
González-Cencerrado
Gu
Han
Han
Hao
He
Hernández
Hinton
Jiao
Larochelle
Laurens
Lei
Li
Liu
Liu
Liukkonen
Lu
Lunderman
Lyu
Qian
Qiu
Rasmussen
Smrekar
Sun
Sun
Tang
Toth
Vincent
Wang
Wang
Wang
Wang
Wang
Wu
Yan
Zhai
Zhang
Zhou
Zhou
Zhou
Zhu
Ögren
Publication venue: Elsevier BV
Publication date: 30/11/2020

Accurate prediction of combustion state is crucial for an in-depth understanding of furnace performance and for optimizing operating conditions. Traditional data-driven approaches such as artificial neural networks and support vector machines rely on distinct features, which require prior knowledge for feature extraction, and suffer from poor generalization to unseen combustion states. Therefore, it is necessary to develop an advanced and accurate prediction model to resolve these limitations. This study presents a novel semi-supervised learning model integrating a denoising autoencoder (DAE), a generative adversarial network (GAN) and a Gaussian process classifier (GPC). The DAE network is established to extract representative features of flame images, and the network is trained through the adversarial learning mechanism of the GAN. The structural similarity (SSIM) metric is introduced as a novel loss function to improve the feature learning ability of the DAE network. The extracted features are then fed into the GPC to predict the seen and unseen combustion states. The effectiveness of the proposed semi-supervised learning model, i.e., DAE-GAN-GPC, was evaluated on flame images from a 4.2 MW heavy oil-fired boiler furnace captured under different combustion states. An average prediction accuracy of 99.83% was achieved for the seen combustion states. The new (unseen) states were predicted accurately by the proposed model by fine-tuning the GPC without retraining the DAE-GAN, and an average prediction accuracy of 98.36% was achieved for the unseen states. A comparative study was also carried out with other deep neural networks and classifiers. The results suggest that the proposed model provides better prediction accuracy and robustness than other traditional prediction models.
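To make the idea of an SSIM-based loss concrete, here is a minimal sketch that computes the standard SSIM formula over a whole image treated as a single window and turns it into a loss as 1 - SSIM. The abstract does not say how the authors window or weight SSIM, so the single global window, the function names, and the constants K1 = 0.01 and K2 = 0.03 (the commonly used defaults) are assumptions.

```python
from typing import Sequence


def global_ssim(
    x: Sequence[float], y: Sequence[float], data_range: float = 1.0
) -> float:
    """SSIM of two flattened images, using one global window (assumption)."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilising constants with the commonly used K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(
    x: Sequence[float], y: Sequence[float], data_range: float = 1.0
) -> float:
    """Loss that is 0 for identical images and grows as similarity drops."""
    return 1.0 - global_ssim(x, y, data_range)
```

Practical implementations usually average SSIM over local sliding windows; the global version is used here only to keep the example short.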

Crossref

Kent Academic Repository

A Survey on Metric Learning for Feature Vectors and Structured Data

Author: Bellet Aurélien
Habrard Amaury
Sebban Marc
Publication date: 01/01/2013

The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years. This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come. Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new method
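For readers unfamiliar with the Mahalanobis framework mentioned above, the distance being learned has the form d_M(x, y) = sqrt((x - y)^T M (x - y)), where M is a positive semidefinite matrix. The sketch below only evaluates this distance for a given M; it does not learn M, and the function name `mahalanobis` is chosen here for illustration.

```python
import math
from typing import Sequence


def mahalanobis(
    x: Sequence[float], y: Sequence[float], M: Sequence[Sequence[float]]
) -> float:
    """Return sqrt((x - y)^T M (x - y)) for a positive semidefinite M."""
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff, one row at a time.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in M]
    squared = sum(d_i * v_i for d_i, v_i in zip(diff, m_diff))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))
```

With M set to the identity matrix this reduces to the Euclidean distance; metric learning amounts to choosing M so that similar pairs end up closer than dissimilar ones.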

arXiv.org e-Print Archive

HAL-UJM

Applications of deep learning and statistical methods for a systems understanding of convergence in immune repertoires

Author: Moghimi Pejvak Abbas Zadeh
Publication venue: Birkbeck, University of London

Deep learning and adaptive immune receptor repertoire (AIRR) biology are two emerging fields that are highly compatible, owing to the inherent complexity of the immune system and the enormous amount of data produced in AIRR-sequencing research, combined with the revolutionary success of deep learning technology in making predictions about high-dimensional complex systems and data. We took steps towards the effective utilisation of deep learning and statistical methods in repertoire immunology by tackling one of the central problems in immunology, i.e. immune repertoire convergence. First, we took part in developing and testing an array of summary statistics for immune repertoires, to gain insights into their descriptive features and to give us the ability to compare repertoires. We collected the deepest sequencing datasets to address whether the population-wide genomic convergence of immunoglobulin molecules can be predicted. The immunoglobulin molecules were labelled with their “degree of commonality” (DoC), defined as the number of times an immunoglobulin V3J clonotype is observed in a population, where a V3J clonotype is defined by its V and J genes and CDR3 sequence. We developed various bespoke data analytics methods, informed at different stages by the summary statistics we had previously implemented. Importantly, we demonstrated that machine learning (ML) predictions for immune repertoires could lead to misleadingly positive outcomes if data is processed inappropriately due to “data leakage” and addressed this issue by implementing a leak-free data processing pipeline. Here, data leakage refers to immunoglobulin sequences with the same clonotype definition spreading across the train-validation-test splits in the ML task. We designed a multitude of bespoke deep neural network architectures, implemented under various modelling approaches, including a customised squeeze-and-excitation temporal convolutional neural network (SE-TCN) and a Transformer model.
Unsurprisingly, given the continuous spectrum of DoCs, regression modelling proved to be the best approach, both in the granularity of predictions and in error distribution. Finally, we report that our SE-TCN architecture under the regression modelling framework achieves state-of-the-art performance, with an overall mean absolute error (MAE) of 0.083 and per-DoC error distributions with reasonably small standard deviations.
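The abstract does not describe how the leak-free pipeline is built. One common way to avoid this kind of leakage, sketched below, is to assign every sequence to a split based on a hash of its clonotype key, so that all sequences sharing a V gene, J gene and CDR3 land in the same split. The record field names (`v_gene`, `j_gene`, `cdr3`), the function names and the split fractions are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Clonotype = Tuple[str, str, str]  # (V gene, J gene, CDR3 sequence)


def assign_split(
    clonotype: Clonotype, val_frac: float = 0.1, test_frac: float = 0.1
) -> str:
    """Map a clonotype deterministically to train, validation or test."""
    key = "|".join(clonotype).encode("utf-8")
    # Turn the first 8 bytes of the digest into a number in [0, 1).
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "validation"
    return "train"


def leak_free_split(
    records: Iterable[Dict[str, str]]
) -> Dict[str, List[Dict[str, str]]]:
    """Group records so that no clonotype appears in more than one split."""
    splits: Dict[str, List[Dict[str, str]]] = {
        "train": [], "validation": [], "test": []
    }
    for rec in records:
        clonotype = (rec["v_gene"], rec["j_gene"], rec["cdr3"])
        splits[assign_split(clonotype)].append(rec)
    return splits
```

Because the assignment depends only on the clonotype key, repeated observations of the same clonotype can never straddle the train, validation and test sets, although the realised split sizes will only approximate the requested fractions.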

Birkbeck Institutional Research Online