In recent years, terms such as artificial intelligence, machine learning, deep learning, computer vision, and many others have become increasingly prevalent in daily life.
These technologies, which can be broadly defined as “learning methodologies”, i.e., algorithmic systems capable of extracting information (knowledge) from data, were initially confined to laboratories and research centers.
However, thanks to the growing interest in such methodologies, they have gradually permeated corporate environments and our daily lives, becoming essential components in mobile phones, robots, drones, and IoT systems.
Generally speaking, we can refer to this category of systems as embedded devices.
The spread of such learning methodologies, powered by neural networks (models) combined with specialized hardware such as data centers and clusters of graphics processing units (GPUs), has led to the development of increasingly powerful and computationally demanding solutions.
This trend may be summarized as “the bigger the model, the better the performance”.
In parallel with this progress toward maximizing model performance, researchers and scientists all around the world have developed innovative solutions and operations (layers) to improve the learning capability of such architectures.
Motivated by the proliferation of embedded devices, characterized by limited computational resources and the need to perform on-board/on-device operations to preserve data privacy and ensure accurate responses within limited time frames, this Thesis aims to investigate, both mathematically and practically, less-explored research areas related to the efficiency of such models for computer vision tasks.
More specifically, we will theoretically analyze the behavior of fundamental components of neural network learning mechanisms, focusing on the layers and elements that characterize the learning procedure, such as self-attention, knowledge distillation, and optimizers.
These features, which are essential for both the structure and the learning phase of neural networks, will be crucial in the subsequent stages of this Thesis.
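To give a concrete flavor of one such component, we briefly recall the standard scaled dot-product self-attention of Vaswani et al. (2017); this is the well-known general formulation, reported here only as a recap, not the specific variants analyzed later in this Thesis:

```latex
% Standard scaled dot-product self-attention (Vaswani et al., 2017):
% Q, K, V are the query, key, and value matrices obtained by linearly
% projecting the input tokens, and d_k is the key dimensionality.
\[
  \operatorname{Attention}(Q, K, V) =
    \operatorname{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
\]
```

Notably, the $QK^{\top}$ product grows quadratically with the number of input tokens, which is precisely the kind of computational cost that efficiency-oriented designs seek to reduce.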
In particular, we will develop computationally efficient solutions in the fields of perception and security, studying efficient techniques for well-known tasks such as monocular depth estimation, 3D mesh reconstruction, and deepfake detection.
Additionally, we will examine key elements of neural network efficiency, such as inference time and energy consumption, and their trade-off with estimation performance.
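As a minimal sketch of how one of these elements, inference time, can be measured in practice (assuming PyTorch and a placeholder convolutional model; this is purely illustrative and not the measurement protocol adopted later in this Thesis):

```python
import time
import torch

# Placeholder model: a small convolutional backbone standing in for any
# perception network (e.g., a monocular depth estimator).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, kernel_size=3, padding=1),
).eval()

dummy_input = torch.randn(1, 3, 224, 224)  # a single RGB frame

with torch.no_grad():
    # Warm-up iterations so one-time costs (allocations, kernel
    # selection) are not counted in the measurement.
    for _ in range(10):
        model(dummy_input)

    # Average latency over repeated forward passes.
    n_runs = 100
    start = time.perf_counter()
    for _ in range(n_runs):
        model(dummy_input)
    elapsed = time.perf_counter() - start

print(f"Average inference time: {1000 * elapsed / n_runs:.2f} ms")
```

On embedded hardware, a loop of this kind is typically paired with energy readings (e.g., from on-board power sensors) to expose the accuracy/latency/energy trade-off mentioned above.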
More precisely, in contrast to heavy deep learning models, the underlying idea of this Thesis is to develop methodologies that are not only able to “learn” a given task but also to “smartly learn” it, i.e., solutions capable of learning the desired task while ensuring good performance, with limited inference and training times, so that they can be practically deployed on embedded devices.
Moreover, alongside these studies, which will be referred to as primary in the rest of the manuscript, and to provide a comprehensive perspective on some of the analyzed tasks, we will also investigate side challenges that emerged from the primary research; such studies will be identified as secondary throughout the manuscript.
In conclusion, the purpose of this Thesis is to examine less-explored research areas related to the efficiency of neural network architectures and their applications, with the goal of providing an in-depth view of some open issues, proposing potential solutions, and offering the reader valuable hints for further pushing the boundaries of such research fields.