34 research outputs found
Discriminative Learning of Similarity and Group Equivariant Representations
One of the most fundamental problems in machine learning is to compare
examples: given a pair of objects, we want to return a value that indicates
their degree of (dis)similarity. Similarity is often task specific, and pre-defined
distances can perform poorly, leading to work in metric learning. However,
being able to learn a similarity-sensitive distance function also presupposes
access to a rich, discriminative representation for the objects at hand. In
this dissertation we present contributions towards both ends. In the first part
of the thesis, assuming good representations for the data, we present a
formulation for metric learning that makes a more direct attempt to optimize
for the k-NN accuracy as compared to prior work. We also present extensions of
this formulation to metric learning for k-NN regression, asymmetric similarity
learning, and discriminative learning of Hamming distance. In the second part,
we consider a situation where we are on a limited computational budget, i.e.,
optimizing over a space of possible metrics would be infeasible, but access to
a label-aware distance metric is still desirable. We present a simple and
computationally inexpensive approach for estimating a well-motivated metric
that relies only on gradient estimates, discussing theoretical and experimental
results. In the final part, we address representational issues, considering
group equivariant convolutional neural networks (GCNNs). Equivariance to
symmetry transformations is explicitly encoded in GCNNs; a classical CNN being
the simplest example. In particular, we present an SO(3)-equivariant neural
network architecture for spherical data that operates entirely in Fourier
space, while also providing a formalism for the design of fully Fourier neural
networks that are equivariant to the action of any continuous compact group.
Comment: PhD thesis
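The remark that a classical CNN is the simplest example of an equivariant network can be illustrated with a small check. The following is a minimal sketch, not taken from the thesis: it assumes a 1-D signal with periodic boundary conditions, and the helper names `circular_conv` and `shift` are hypothetical. It shows that shifting the input and then convolving gives the same result as convolving and then shifting, which is equivariance to cyclic translations.

```python
def circular_conv(signal, kernel):
    """Circular cross-correlation of a 1-D signal with a kernel.

    Periodic boundaries are an assumption made here so that
    translations form a group (the cyclic group Z_n).
    """
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, t):
    """Cyclically shift a signal by t positions (the group action)."""
    n = len(signal)
    return [signal[(i - t) % n] for i in range(n)]


# Arbitrary illustrative values, not data from the thesis.
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
w = [0.5, -1.0, 2.0]

# Equivariance: conv(shift(x)) should equal shift(conv(x)).
lhs = circular_conv(shift(x, 2), w)
rhs = shift(circular_conv(x, w), 2)
is_equivariant = all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
```

Group equivariant networks generalize this property from translations to other symmetry groups, such as the rotations in SO(3) mentioned in the abstract; the sketch covers only the translation case.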
Deep representations of structures in the 3D-world
This thesis demonstrates a collection of neural network tools that leverage the structures and symmetries of the 3D-world. We have explored various aspects of a vision system ranging from relative pose estimation to 3D-part decomposition from 2D images. For any vision system, it is crucially important to understand and to resolve visual ambiguities in 3D arising from imaging methods. This thesis has shown that leveraging prior knowledge about the structures and the symmetries of the 3D-world in neural network architectures brings about better representations for ambiguous situations. It helps solve problems that are inherently ill-posed.
Advanced Methods of Power Load Forecasting
This reprint introduces advanced prediction models focused on power load forecasting. It covers models based on artificial intelligence as well as more traditional approaches, demonstrating their practical potential to improve prediction in this field. On the artificial intelligence side, LSTM neural networks, LSTM networks with a SESDA architecture, and even LSTM-CNN models are used. On the traditional side, multiple seasonal Holt-Winters models with discrete seasonality and the application of the Prophet method to demand forecasting are presented. These models are applied in different circumstances and show highly positive results. This reprint is intended for both researchers in energy management and those in forecasting, especially power load
Richer object representations for object class detection in challenging real world images
Object class detection in real world images has been a synonym for object localization for the longest time. State-of-the-art detection methods, inspired by renowned detection benchmarks, typically target 2D bounding box localization of objects. At the same time, due to rapid technological and scientific advances, high-level vision applications, aiming at understanding the visual world as a whole, are coming into focus. The diversity of the visual world challenges these applications in terms of representational complexity, robust inference and training data. As objects play a central role in any vision system, it has been argued that richer object representations, providing a higher level of detail than modern detection methods, are a promising direction towards understanding visual scenes. Besides bridging the gap between object class detection and high-level tasks, richer object representations also lead to more natural object descriptions, bringing computer vision closer to human perception. Inspired by these prospects, this thesis explores four different directions towards richer object representations, namely, 3D object representations, fine-grained representations, occlusion representations, as well as understanding convnet representations. Moreover, this thesis illustrates that richer object representations can facilitate high-level applications, providing detailed and natural object descriptions. In addition, the presented representations attain high performance rates, at least on par or often superior to state-of-the-art methods.