463 research outputs found
Detach-ROCKET: Sequential feature selection for time series classification with random convolutional kernels
Time series classification is essential in many fields, such as medicine,
finance, environmental science, and manufacturing, enabling tasks like disease
diagnosis, anomaly detection, and stock price prediction. Machine learning
models like Recurrent Neural Networks and InceptionTime, while successful in
numerous applications, can face scalability limitations due to intensive
training requirements. To address this, random convolutional kernel models such
as Rocket and its derivatives have emerged, simplifying training and achieving
state-of-the-art performance by utilizing a large number of randomly generated
features from time series data. However, due to their random nature, most of
the generated features are redundant or non-informative, adding unnecessary
computational load and compromising generalization. Here, we introduce
Sequential Feature Detachment (SFD) as a method to identify and prune these
non-essential features. SFD uses model coefficients to estimate feature
importance and, unlike previous algorithms, can handle large feature sets
without the need for complex hyperparameter tuning. Testing on the UCR archive
demonstrates that SFD can produce models with of the original features
while improving the accuracy on the test set. We also present an
end-to-end procedure for determining an optimal balance between the number of
features and model accuracy, called Detach-ROCKET. When applied to the largest
binary UCR dataset, Detach-ROCKET is capable of reduce model size by
and increases test accuracy by .Comment: 13 pages, 4 figures, 1 tabl
Recommended from our members
Data Summarizations for Scalable, Robust and Privacy-Aware Learning in High Dimensions
The advent of large-scale datasets has offered unprecedented amounts of information for building statistically powerful machines, but, at the same time, also introduced a remarkable computational challenge: how can we efficiently process massive data? This thesis presents a suite of data reduction methods that make learning algorithms scale on large datasets, via extracting a succinct model-specific representation that summarizes the
full data collection—a coreset. Our frameworks support by design datasets of arbitrary dimensionality, and can be used for general purpose Bayesian inference under real-world constraints, including privacy preservation and robustness to outliers, encompassing diverse uncertainty-aware data analysis tasks, such as density estimation, classification
and regression.
We motivate the necessity for novel data reduction techniques in the first place by developing a reidentification attack on coarsened representations of private behavioural data. Analysing longitudinal records of human mobility, we detect privacy-revealing structural patterns, that remain preserved in reduced graph representations of individuals’ information with manageable size. These unique patterns enable mounting linkage attacks via structural similarity computations on longitudinal mobility traces, revealing an overlooked, yet existing, privacy threat.
We then propose a scalable variational inference scheme for approximating posteriors on large datasets via learnable weighted pseudodata, termed pseudocoresets. We show that the use of pseudodata enables overcoming the constraints on minimum summary size for given approximation quality, that are imposed on all existing Bayesian coreset constructions due to data dimensionality. Moreover, it allows us to develop a scheme for pseudocoresets-based summarization that satisfies the standard framework of differential privacy by construction; in this way, we can release reduced size privacy-preserving representations for sensitive datasets that are amenable to arbitrary post-processing.
Subsequently, we consider summarizations for large-scale Bayesian inference in scenarios when observed datapoints depart from the statistical assumptions of our model. Using robust divergences, we develop a method for constructing coresets resilient to model misspecification. Crucially, this method is able to automatically discard outliers from the generated data summaries. Thus we deliver robustified scalable representations
for inference, that are suitable for applications involving contaminated and unreliable data sources.
We demonstrate the performance of proposed summarization techniques on multiple parametric statistical models, and diverse simulated and real-world datasets, from music genre features to hospital readmission records, considering a wide range of data dimensionalities.Nokia Bell Labs,
Lundgren Fund,
Darwin College, University of Cambridge
Department of Computer Science & Technology, University of Cambridg
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward
Channel-wise early stopping without a validation set via NNK polytope interpolation
State-of-the-art neural network architectures continue to scale in size and deliver impressive generalization results, although this comes at the expense of limited interpretability. In particular, a key challenge is to determine when to stop training the model, as this has a significant impact on generalization. Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels, where analyzing intermediate data representations and the model's evolution can be challenging owing to the curse of dimensionality. We present channel-wise DeepNNK (CW-DeepNNK), a novel channel-wise generalization estimate based on non-negative kernel regression (NNK) graphs with which we perform local polytope interpolation on low-dimensional channels. This method leads to instance-based interpretability of both the learned data representations and the relationship between channels. Motivated by our observations, we use CW-DeepNNK to propose a novel early stopping criterion that (i) does not require a validation set, (ii) is based on a task performance metric, and (iii) allows stopping to be reached at different points for each channel. Our experiments demonstrate that our proposed method has advantages as compared to the standard criterion based on validation set performance.Peer ReviewedPostprint (author's final draft
Entwicklung einer Fully-Convolutional-Netzwerkarchitektur fĂĽr die Detektion von defekten LED-Chips in Photolumineszenzbildern
Nowadays, light-emitting diodes (LEDs) can be found in a large variety of applications, from standard LEDs in domestic lighting solutions to advanced chip designs in automobiles, smart watches and video walls. The advances in chip design also affect the test processes, where the execution of certain contact measurements is exacerbated by ever decreasing chip dimensions or even rendered impossible due to the chip design. As an instance, wafer probing determines the electrical and optical properties of all LED chips on a wafer by contacting each and every chip with a prober needle. Chip designs without a contact pad on the surface, however, elude wafer probing and while electrical and optical properties can be determined by sample measurements, defective LED chips are distributed randomly over the wafer. Here, advanced data analysis methods provide a new approach to gather defect information from already available non-contact measurements. Photoluminescence measurements, for example, record a brightness image of an LED wafer, where conspicuous brightness values indicate defective chips. To extract these defect information from photoluminescence images, a computer-vision algorithm is required that transforms photoluminescence images into defect maps. In other words, each and every pixel of a photoluminescence image must be classifed into a class category via semantic segmentation, where so-called fully-convolutional-network algorithms represent the state-of-the-art method. However, the aforementioned task poses several challenges: on the one hand, each pixel in a photoluminescence image represents an LED chip and thus, pixel-fine output resolution is required. On the other hand, photoluminescence images show a variety of brightness values from wafer to wafer in addition to local areas of differing brightness. Additionally, clusters of defective chips assume various shapes, sizes and brightness gradients and thus, the algorithm must reliably recognise objects at multiple scales. Finally, not all salient brightness values correspond to defective LED chips, requiring the algorithm to distinguish salient brightness values corresponding to measurement artefacts, non-defect structures and defects, respectively.
In this dissertation, a novel fully-convolutional-network architecture was developed that allows the accurate segmentation of defective LED chips in highly variable photoluminescence wafer images. For this purpose, the basic fully-convolutional-network architecture was modifed with regard to the given application and advanced architectural concepts were incorporated so as to enable a pixel-fine output resolution and a reliable segmentation of multiple scaled defect structures. Altogether, the developed dense ASPP Vaughan architecture achieved a pixel accuracy of 97.5 %, mean pixel accuracy of 96.2% and defect-class accuracy of 92.0 %, trained on a dataset of 136 input-label pairs and hereby showed that fully-convolutional-network algorithms can be a valuable contribution to data analysis in industrial manufacturing.Leuchtdioden (LEDs) werden heutzutage in einer Vielzahl von Anwendungen verbaut, angefangen bei Standard-LEDs in der Hausbeleuchtung bis hin zu technisch fortgeschrittenen Chip-Designs in Automobilen, Smartwatches und Videowänden. Die Weiterentwicklungen im Chip-Design beeinflussen auch die Testprozesse: Hierbei wird die Durchführung bestimmter Kontaktmessungen durch zunehmend verringerte Chip-Dimensionen entweder erschwert oder ist aufgrund des Chip-Designs unmöglich. Die sogenannteWafer-Prober-Messung beispielsweise ermittelt die elektrischen und optischen Eigenschaften aller LED-Chips auf einem Wafer, indem jeder einzelne Chip mit einer Messnadel kontaktiert und vermessen wird; Chip-Designs ohne Kontaktpad auf der Oberfläche können daher nicht durch die Wafer-Prober-Messung charakterisiert werden. Während die elektrischen und optischen Chip-Eigenschaften auch mittels Stichprobenmessungen bestimmt werden können, verteilen sich defekte LED-Chips zufällig über die Waferfläche. Fortgeschrittene Datenanalysemethoden ermöglichen hierbei einen neuen Ansatz, Defektinformationen aus bereits vorhandenen, berührungslosen Messungen zu gewinnen. Photolumineszenzmessungen, beispielsweise, erfassen ein Helligkeitsbild des LEDWafers, in dem auffällige Helligkeitswerte auf defekte LED-Chips hinweisen. Ein Bildverarbeitungsalgorithmus, der diese Defektinformationen aus Photolumineszenzbildern extrahiert und ein Defektabbild erstellt, muss hierzu jeden einzelnen Bildpunkt mittels semantischer Segmentation klassifizieren, eine Technik bei der sogenannte Fully-Convolutional-Netzwerke den Stand der Technik darstellen. Die beschriebene Aufgabe wird jedoch durch mehrere Faktoren erschwert: Einerseits entspricht jeder Bildpunkt eines Photolumineszenzbildes einem LED-Chip, so dass eine bildpunktfeine Auflösung der Netzwerkausgabe notwendig ist. Andererseits weisen Photolumineszenzbilder sowohl stark variierende Helligkeitswerte von Wafer zu Wafer als auch lokal begrenzte Helligkeitsabweichungen auf. Zusätzlich nehmen Defektanhäufungen unterschiedliche Formen, Größen und Helligkeitsgradienten an, weswegen der Algorithmus Objekte verschiedener Abmessungen zuverlässig erkennen können muss. Schlussendlich weisen nicht alle auffälligen Helligkeitswerte auf defekte LED-Chips hin, so dass der Algorithmus in der Lage sein muss zu unterscheiden, ob auffällige Helligkeitswerte mit Messartefakten, defekten LED-Chips oder defektfreien Strukturen korrelieren.
In dieser Dissertation wurde eine neuartige Fully-Convolutional-Netzwerkarchitektur entwickelt, die die akkurate Segmentierung defekter LED-Chips in stark variierenden Photolumineszenzbildern von LED-Wafern ermöglicht. Zu diesem Zweck wurde die klassische Fully-Convolutional-Netzwerkarchitektur hinsichtlich der beschriebenen Anwendung angepasst und fortgeschrittene architektonische Konzepte eingearbeitet, um eine bildpunktfeine Ausgabeauflösung und eine zuverlässige Sementierung verschieden großer Defektstrukturen umzusetzen. Insgesamt erzielt die entwickelte dense-ASPP-Vaughan-Architektur eine Pixelgenauigkeit von 97,5 %, durchschnittliche Pixelgenauigkeit von 96,2% und eine Defektklassengenauigkeit von 92,0 %, trainiert mit einem Datensatz von 136 Bildern. Hiermit konnte gezeigt werden, dass Fully-Convolutional-Netzwerke eine wertvolle Erweiterung der Datenanalysemethoden sein können, die in der industriellen Fertigung eingesetzt werden
SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces
Batch Bayesian optimisation and Bayesian quadrature have been shown to be
sample-efficient methods of performing optimisation and quadrature where
expensive-to-evaluate objective functions can be queried in parallel. However,
current methods do not scale to large batch sizes -- a frequent desideratum in
practice (e.g. drug discovery or simulation-based inference). We present a
novel algorithm, SOBER, which permits scalable and diversified batch global
optimisation and quadrature with arbitrary acquisition functions and kernels
over discrete and mixed spaces. The key to our approach is to reformulate batch
selection for global optimisation as a quadrature problem, which relaxes
acquisition function maximisation (non-convex) to kernel recombination
(convex). Bridging global optimisation and quadrature can efficiently solve
both tasks by balancing the merits of exploitative Bayesian optimisation and
explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive
baselines on 12 synthetic and diverse real-world tasks.Comment: 34 pages, 12 figure
Dataset Distillation with Convexified Implicit Gradients
We propose a new dataset distillation algorithm using reparameterization and
convexification of implicit gradients (RCIG), that substantially improves the
state-of-the-art. To this end, we first formulate dataset distillation as a
bi-level optimization problem. Then, we show how implicit gradients can be
effectively used to compute meta-gradient updates. We further equip the
algorithm with a convexified approximation that corresponds to learning on top
of a frozen finite-width neural tangent kernel. Finally, we improve bias in
implicit gradients by parameterizing the neural network to enable analytical
computation of final-layer parameters given the body parameters. RCIG
establishes the new state-of-the-art on a diverse series of dataset
distillation tasks. Notably, with one image per class, on resized ImageNet,
RCIG sees on average a 108% improvement over the previous state-of-the-art
distillation algorithm. Similarly, we observed a 66% gain over SOTA on
Tiny-ImageNet and 37% on CIFAR-100
Meta Learning for Graph Neural Networks
Deep learning has enabled incredible advances in pattern recognition such as the fields of computer vision and natural language processing. One of the most successful areas of deep learning is Convolutional Neural Networks (CNNs). CNNs have helped improve performance on many difficult video and image understanding tasks but are restricted to dense gridded structures. Further, designing their architectures can be challenging, even for image classification problems. The recently introduced graph CNNs can work on both dense gridded structures as well as generic graphs. Graph CNNs have been performing at par with traditional CNNs on tasks such as point cloud classification and segmentation, protein classification and image classification, while reducing the complexity of the network.
Graph CNNs provide an extra challenge in designing architectures due to more complex weight and filter visualization of generic graphs. Designing neural network architectures, yielding optimal performance, is a laborious and rigorous process. Hyperparameter tuning is essential for achieving state of the art results using specific architectures. Using a rich suite of predefined mutations, evolutionary algorithms have had success in delivering a high-quality population from a low-quality starter population. This thesis research formulates the graph CNN architecture design as an evolutionary search problem to generate a high-quality population of graph CNN model architectures for classification tasks on benchmark datasets
Doctor of Philosophy
dissertationWith the tremendous growth of data produced in the recent years, it is impossible to identify patterns or test hypotheses without reducing data size. Data mining is an area of science that extracts useful information from the data by discovering patterns and structures present in the data. In this dissertation, we will largely focus on clustering which is often the first step in any exploratory data mining task, where items that are similar to each other are grouped together, making downstream data analysis robust. Different clustering techniques have different strengths, and the resulting groupings provide different perspectives on the data. Due to the unsupervised nature i.e., the lack of domain experts who can label the data, validation of results is very difficult. While there are measures that compute "goodness" scores for clustering solutions as a whole, there are few methods that validate the assignment of individual data items to their clusters. To address these challenges we focus on developing a framework that can generate, compare, combine, and evaluate different solutions to make more robust and significant statements about the data. In the first part of this dissertation, we present fast and efficient techniques to generate and combine different clustering solutions. We build on some recent ideas on efficient representations of clusters of partitions to develop a well founded metric that is spatially aware to compare clusterings. With the ability to compare clusterings, we describe a heuristic to combine different solutions to produce a single high quality clustering. We also introduce a Markov chain Monte Carlo approach to sample different clusterings from the entire landscape to provide the users with a variety of choices. In the second part of this dissertation, we build certificates for individual data items and study their influence on effective data reduction. We present a geometric approach by defining regions of influence for data items and clusters and use this to develop adaptive sampling techniques to speedup machine learning algorithms. This dissertation is therefore a systematic approach to study the landscape of clusterings in an attempt to provide a better understanding of the data
- …