19 research outputs found
Development of an R package to learn supervised classification techniques
This TFG aims to develop a custom R package for teaching supervised classification algorithms, starting
with the identification of requirements, including algorithms, data structures, and libraries. A strong
theoretical foundation is essential for effective package design. Documentation will explain each function's
purpose, accompanied by the necessary supporting material.
The package will include R scripts and data files in organized directories, complemented by a user
manual for easy installation and usage, even for beginners. Built entirely from scratch without external
dependencies, it's optimized for accuracy and performance.
In conclusion, this TFG provides a roadmap for creating an R package to teach supervised classification
algorithms, benefiting researchers and practitioners dealing with real-world challenges.
Grado en Ingeniería Informática
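Although the TFG's package is to be written in R, the flavour of a from-scratch supervised classifier such a teaching package might cover can be sketched in Python. This is an illustrative stand-in with no external dependencies, not the package's actual code: a minimal k-nearest-neighbours classifier.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify point x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x.
    neighbours = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    # Majority vote over the k closest labels.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Tiny made-up training set: two well-separated classes.
X = [(1, 1), (1, 2), (8, 8), (9, 8)]
y = ["a", "a", "b", "b"]
```

A query near the first cluster, e.g. `knn_predict(X, y, (2, 1))`, votes with the two "a" points and one "b" point and returns `"a"`.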
Learning features for offline handwritten signature verification
Handwritten signatures are the most socially and legally accepted means of identifying a person. Over the last few decades, several researchers have approached the problem of automating their recognition, using a variety of techniques from machine learning and pattern recognition. In particular, most of the research effort has been devoted to obtaining good feature representations for signatures, by designing new feature extractors as well as experimenting with feature extractors developed for other purposes. To this end, researchers have drawn on insights from graphology, computer vision, and signal processing, among other areas. In spite of the advancements in the field, building classifiers that can separate genuine signatures from skilled forgeries (forgeries made targeting a particular individual) is still an open research problem.
In this thesis, we propose to address this problem from another perspective, by learning the feature representations directly from signature images. The hypothesis is that, in the absence of a good model of the data generation process, it is better to learn the features from data. As a first contribution, we propose a method to learn Writer-Independent features using a surrogate objective, followed by training Writer-Dependent classifiers using the learned features. Furthermore, we define an extension that allows leveraging the knowledge of skilled forgeries (from a subset of users) in the feature learning process. We observed that such features generalize well to new users, obtaining state-of-the-art results on four widely used datasets in the literature.
As a second contribution, we investigate three issues in signature verification systems: (i) learning a fixed-size vector representation for signatures of varied size; (ii) analyzing the impact of the resolution of the scanned signatures on system performance; and (iii) analyzing how features generalize to new operating conditions, with and without fine-tuning. We propose methods to handle signatures of varied size, and our experiments show results comparable to the state of the art while removing the requirement that all input images have the same size.
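One standard way to obtain a fixed-size vector from inputs of varied size, the first issue listed above, is spatially adaptive average pooling. The sketch below is a minimal, dependency-free illustration of that general idea, not the thesis's actual method; it assumes the input grid is at least as large as the output grid.

```python
def fixed_size_pool(feature_map, out_h=2, out_w=2):
    """Average-pool a 2D map of any size (h >= out_h, w >= out_w) into an
    out_h x out_w grid, flattened to a fixed-length vector."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(out_h):
        for j in range(out_w):
            # Bin boundaries scale with the input size, so any input maps
            # to the same output length.
            r0, r1 = i * h // out_h, (i + 1) * h // out_h
            c0, c1 = j * w // out_w, (j + 1) * w // out_w
            cells = [feature_map[r][c]
                     for r in range(r0, r1) for c in range(c0, c1)]
            pooled.append(sum(cells) / len(cells))
    return pooled
```

A 2x2 map and an 8x6 map both pool to a length-4 vector, which is what lets downstream classifiers accept signatures of varied size.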
As a third contribution, we propose to formulate the problem of signature verification as a meta-learning problem. This formulation also learns directly from signature images, and allows the direct optimization of the objective (separating genuine signatures from skilled forgeries), instead of relying on surrogate objectives for learning the features. Furthermore, we show that this method naturally extends to formulating the adaptation (training) for new users as one-class classification.
As a fourth contribution, we analyze the limitations of these systems in an Adversarial Machine Learning setting, where an active adversary attempts to disrupt the system. We characterize new threats posed by Adversarial Examples on a taxonomy of threats to biometric systems, and conduct extensive experiments to evaluate the success of attacks under different scenarios of the attacker's goals and knowledge of the system under attack. We observed that both systems that rely on handcrafted features and those using learned features are susceptible to adversarial attacks in a wide range of scenarios, including partial-knowledge scenarios where the attacker does not have full access to the trained classifiers. While some defenses proposed in the literature increase the robustness of the systems, this research highlights the scenarios where such systems are still vulnerable.
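Adversarial examples of the kind studied above are commonly generated with gradient-sign methods. The following is a hedged sketch on a hypothetical linear verifier (not any system from the thesis): the gradient of a linear score with respect to the input is just the weight vector, so the attack steps each feature against it.

```python
def score(x, w, b):
    """Hypothetical linear verifier: positive score means 'genuine'."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b

def fgsm_perturb(x, w, y, eps=0.1):
    """FGSM-style step: move each feature by eps against the true label y
    (+1 genuine, -1 forgery). For score(x) = w.x + b the gradient w.r.t.
    x is w, so the signed step per feature is -y * eps * sign(w_i)."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - y * eps * sign(wi) for xi, wi in zip(x, w)]

w, b = [1.0, -2.0], 0.0
x_genuine = [1.0, 0.0]                       # score = 1.0, accepted
x_adv = fgsm_perturb(x_genuine, w, y=+1)     # score drops to 0.7
```

Even this tiny perturbation (0.1 per feature) measurably lowers the genuine score, which is the mechanism behind the partial-knowledge attacks evaluated in the thesis.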
Interpretable Deep Learning: Beyond Feature-Importance with Concept-based Explanations
Deep Neural Network (DNN) models are challenging to interpret because of their highly complex and non-linear nature. This lack of interpretability (1) inhibits adoption within safety-critical applications, (2) makes it challenging to debug existing models, and (3) prevents us from extracting valuable knowledge. Explainable AI (XAI) research aims to increase the transparency of DNN model behaviour to improve interpretability. Feature importance explanations are the most popular interpretability approaches. They show the importance of each input feature (e.g., pixel, patch, word vector) to the model's prediction. However, we hypothesise that feature importance explanations have two main shortcomings concerning their inability to describe the complexity of a DNN's behaviour with sufficient (1) fidelity and (2) richness. Fidelity and richness are essential because different tasks, users, and data types require specific levels of trust and understanding.
The goal of this thesis is to showcase the shortcomings of feature importance explanations and to develop explanation techniques that describe the DNN behaviour with greater richness. We design an adversarial explanation attack to highlight the infidelity and inadequacy of feature importance explanations. Our attack modifies the parameters of a pre-trained model. It uses fairness as a proxy measure for the fidelity of an explanation method to demonstrate that the apparent importance of a feature does not reveal anything reliable about the fairness of a model. Hence, regulators or auditors should not rely on feature importance explanations to measure or enforce standards of fairness.
As one solution, we formulate five different levels of the semantic richness of explanations to evaluate explanations and propose two function decomposition frameworks (DGINN and CME) to extract explanations from DNNs at a semantically higher level than feature importance explanations. Concept-based approaches provide explanations in terms of atomic human-understandable units (e.g., wheel or door) rather than individual raw features (e.g., pixels or characters). Our function decomposition frameworks can extract specific class representations from 5% of the network parameters and concept representations with an average-per-concept F1 score of 86%. Finally, the CME framework makes it possible to compare concept-based explanations, contributing to the scientific rigour of evaluating interpretability methods.
The author gratefully acknowledges the generous sponsorship of the Engineering and Physical Sciences Research Council (EPSRC), the Department of Computer Science and Technology at the University of Cambridge, and Tenyks, Inc.
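For context, the feature-importance explanations this thesis critiques typically reduce to per-feature gradient magnitudes of the model output. A toy numerical-gradient version is sketched below; it is illustrative only and unrelated to the DGINN and CME frameworks proposed above.

```python
def feature_importance(f, x, h=1e-5):
    """Gradient-magnitude saliency: score input feature i by |df/dx_i|,
    estimated with central finite differences."""
    scores = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        scores.append(abs(f(xp) - f(xm)) / (2 * h))
    return scores

# Toy 'model': a linear function whose true sensitivities are 3 and 1.
toy_model = lambda v: 3 * v[0] - v[1]
saliency = feature_importance(toy_model, [0.0, 0.0])
```

Such a score ranks feature 0 as three times more important than feature 1; the thesis's adversarial-attack results show why such rankings alone should not be trusted as evidence about model behaviour such as fairness.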
Energy Data Analytics for Smart Meter Data
The principal advantage of smart electricity meters is their ability to transfer digitized electricity consumption data to remote processing systems. The data collected by these devices make the realization of many novel use cases possible, providing benefits to electricity providers and customers alike. This book includes 14 research articles that explore and exploit the information content of smart meter data, and provides insights into the realization of new digital solutions and services that support the transition towards a sustainable energy system. This volume has been edited by Andreas Reinhardt, head of the Energy Informatics research group at Technische Universität Clausthal, Germany, and Lucas Pereira, research fellow at Técnico Lisboa, Portugal.
GPT-4 Technical Report
We report the development of GPT-4, a large-scale, multimodal model which can
accept image and text inputs and produce text outputs. While less capable than
humans in many real-world scenarios, GPT-4 exhibits human-level performance on
various professional and academic benchmarks, including passing a simulated bar
exam with a score around the top 10% of test takers. GPT-4 is a
Transformer-based model pre-trained to predict the next token in a document.
The post-training alignment process results in improved performance on measures
of factuality and adherence to desired behavior. A core component of this
project was developing infrastructure and optimization methods that behave
predictably across a wide range of scales. This allowed us to accurately
predict some aspects of GPT-4's performance based on models trained with no
more than 1/1,000th the compute of GPT-4.
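The next-token-prediction objective mentioned above can be illustrated at a vastly reduced scale by a count-based bigram model. This toy stand-in bears no relation to GPT-4's actual Transformer architecture; it only shows the shape of the task: given the current token, predict the most likely next one.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-token frequencies over a token stream: a toy stand-in
    for the next-token-prediction objective used to pre-train LLMs."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    """Greedy decoding: return the most frequent continuation."""
    return counts[token].most_common(1)[0][0]

model = train_bigram("a b a b a c".split())
```

Here `"a"` is followed by `"b"` twice and `"c"` once, so the model predicts `"b"`; real models replace these counts with a learned distribution over a large vocabulary.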
Cheminformatics and artificial intelligence for accelerating agrochemical discovery
The global cost-benefit analysis of pesticide use over the last 30 years has been characterized by a significant increase during the period from 1990 to 2007, followed by a decline. This observation can be attributed to several factors including, but not limited to, pest resistance, lack of novelty with respect to modes of action or classes of chemistry, and regulatory action. Due to current and projected increases in the global population, it is evident that the demand for food, and consequently the usage of pesticides to improve yields, will increase. Addressing these challenges and needs while promoting new crop protection agents through an increasingly stringent regulatory landscape requires the development and integration of infrastructures for innovative, cost- and time-effective discovery and development of novel and sustainable molecules. Significant advances in artificial intelligence (AI) and cheminformatics over the last two decades have improved the decision-making power of research scientists in the discovery of bioactive molecules. AI- and cheminformatics-driven molecule discovery offers the opportunity of moving experiments from the greenhouse to a virtual environment where thousands to billions of molecules can be investigated at a rapid pace, providing unbiased hypotheses for lead generation and optimization, and effective suggestions for compound synthesis and testing. To date, this is illustrated to a far lesser extent in the publicly available agrochemical research literature compared to drug discovery. In this review, we provide an overview of the crop protection discovery pipeline and how traditional, cheminformatics, and AI technologies can help to address the needs and challenges of agrochemical discovery towards rapidly developing novel and more sustainable products.
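One concrete cheminformatics primitive behind the virtual screening described above is fingerprint similarity: candidate molecules are ranked against a query by Tanimoto (Jaccard) similarity of their bit fingerprints. The sketch below uses made-up fingerprints represented as sets of on-bit indices; it illustrates the general technique, not any specific pipeline from the review.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as
    sets of on-bit indices: |A & B| / |A | B|."""
    a, b = set(fp_a), set(fp_b)
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0

# Hypothetical query and library fingerprints (bit indices are made up).
query = {1, 4, 9, 16}
library = {"mol_a": {1, 4, 9, 25}, "mol_b": {2, 3, 5}}
ranked = sorted(library, key=lambda m: tanimoto(query, library[m]),
                reverse=True)
```

`mol_a` shares three of five total bits with the query (similarity 0.6) and ranks first; at scale, this kind of screen is what lets billions of molecules be triaged before any greenhouse experiment.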
Proceedings of the 19th Sound and Music Computing Conference
Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France).
https://smc22.grame.f