
    Motivic Pattern Classification of Music Audio Signals Combining Residual and LSTM Networks

    Motivic pattern classification from music audio recordings is a challenging task, all the more so for a cappella flamenco cantes, which are characterized by complex melodic variations, pitch instability, timbre changes, extreme vibrato oscillations, microtonal ornamentations, and noisy recording conditions. Convolutional Neural Networks (CNNs) have proven to be very effective in image classification, and recent work in large-scale audio classification has shown that CNN architectures originally developed for image problems can be applied successfully to audio event recognition and classification with little or no modification. In this paper, CNN architectures are tested on a more nuanced problem: flamenco cantes intra-style classification using small motivic patterns. A new architecture is proposed that uses residual CNNs as feature extractors and a bidirectional LSTM layer to exploit the sequential nature of musical audio data. We present a full end-to-end pipeline for music audio classification that includes a sequential pattern mining technique and a contour simplification method to extract relevant motifs from audio recordings. Mel-spectrograms of the extracted motifs are then used as input to the different architectures tested. We investigate the usefulness of motivic patterns for the automatic classification of music recordings and the effect of audio length and corpus size on overall classification accuracy. Results show a relative accuracy improvement of up to 20.4% when CNN architectures are trained on acoustic representations of motivic patterns.
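    The residual-feature-extractor-plus-bidirectional-recurrence idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions are invented, the residual block is reduced to two dense projections with a skip connection, and a tanh RNN stands in for the LSTM.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    T, F, H = 128, 40, 16  # time frames, mel bands, hidden size (assumed values)

    def residual_block(x, w1, w2):
        """Two projections with a skip connection (stand-in for a ResNet block)."""
        h = np.maximum(x @ w1, 0.0)   # ReLU
        return x + h @ w2             # residual addition

    def bidirectional_rnn(x, w_in, w_rec):
        """Simplified tanh RNN run forward and backward, final states
        concatenated (stand-in for the bidirectional LSTM layer)."""
        def run(seq):
            h = np.zeros(H)
            for t in range(seq.shape[0]):
                h = np.tanh(seq[t] @ w_in + h @ w_rec)
            return h
        return np.concatenate([run(x), run(x[::-1])])

    mel = rng.standard_normal((T, F))  # mel-spectrogram of one extracted motif
    w1 = rng.standard_normal((F, F)) * 0.1
    w2 = rng.standard_normal((F, F)) * 0.1
    features = residual_block(mel, w1, w2)              # (T, F) frame features
    w_in = rng.standard_normal((F, H)) * 0.1
    w_rec = rng.standard_normal((H, H)) * 0.1
    summary = bidirectional_rnn(features, w_in, w_rec)  # (2H,) motif embedding
    w_out = rng.standard_normal((2 * H, 4)) * 0.1       # 4 styles, hypothetical
    logits = summary @ w_out
    print(logits.shape)  # (4,)
    ```

    The point of the structure is the division of labor: the convolutional/residual stage turns each spectrogram frame into a feature vector, and the bidirectional pass summarizes the whole variable-length motif in both temporal directions before classification.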

    Crop conditional Convolutional Neural Networks for massive multi-crop plant disease classification over cell phone acquired images taken on real field conditions

    Convolutional Neural Networks (CNNs) have demonstrated their capabilities in the agronomical field, especially for assessing visual symptoms in plants. As these models grow both in the number of training images and in the number of supported crops and diseases, a dichotomy arises: (1) generate smaller, crop-specific models, or (2) generate a single multi-crop model, a much more complex task (especially at early disease stages) but one that benefits from the variability of the entire multi-crop image dataset to enrich the learning of image feature descriptions. In this work, we first introduce a challenging dataset of more than one hundred thousand images taken by cell phone under real field conditions. The dataset contains almost equally distributed disease stages of seventeen diseases across five crops (wheat, barley, corn, rice, and rapeseed), and several diseases can be present in the same picture. Applying existing state-of-the-art deep neural network methods to validate the two hypothesized approaches, we obtained a balanced accuracy of BAC = 0.92 with the smaller crop-specific models and BAC = 0.93 with a single multi-crop model. We then propose three CNN architectures that incorporate contextual non-image metadata, such as crop information, into an image-based CNN. This combines the advantage of learning from the entire multi-crop dataset with a reduction in the complexity of the disease classification task. The crop-conditional plant disease classification network that incorporates the contextual information by concatenation at the embedding-vector level obtains a balanced accuracy of 0.98, improving on all previous methods and removing 71% of the misclassifications of the former methods.
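    The best-performing variant, concatenation of the crop metadata at the embedding-vector level, can be sketched in a few lines of NumPy. Dimensions, the crop index ordering, and the dense head are assumptions for illustration; the real backbone is a full CNN.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N_CROPS, N_DISEASES, EMB = 5, 17, 32  # five crops, seventeen diseases

    def classify(image_embedding, crop_id, w, b):
        """Concatenate a one-hot crop vector with the CNN image embedding,
        then apply a dense softmax classification head."""
        crop = np.zeros(N_CROPS)
        crop[crop_id] = 1.0
        z = np.concatenate([image_embedding, crop])  # (EMB + N_CROPS,)
        logits = z @ w + b
        e = np.exp(logits - logits.max())
        return e / e.sum()                           # disease probabilities

    emb = rng.standard_normal(EMB)  # stands in for the CNN backbone output
    w = rng.standard_normal((EMB + N_CROPS, N_DISEASES)) * 0.1
    b = np.zeros(N_DISEASES)
    p = classify(emb, crop_id=2, w=w, b=b)  # crop index is hypothetical
    print(p.shape)  # (17,)
    ```

    Conditioning at the embedding level lets the convolutional layers be trained on all crops jointly, while the head can still use the crop identity to rule out diseases that never occur on that crop.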

    Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB

    Hyperspectral signal reconstruction aims at recovering the original spectral input that produced a certain trichromatic (RGB) response from a capturing device or observer. Given the heavily underconstrained, non-linear nature of the problem, traditional techniques leverage different statistical properties of the spectral signal to build informative priors from real-world object reflectances for constructing such an RGB-to-spectral-signal mapping. However, most of them treat each sample independently and thus do not benefit from the contextual information that the spatial dimensions can provide. We pose hyperspectral natural image reconstruction as an image-to-image mapping learning problem and apply a conditional generative adversarial framework to help capture spatial semantics. This is the first time Convolutional Neural Networks (and, in particular, Generative Adversarial Networks) are used to solve this task. Quantitative evaluation shows a Root Mean Squared Error (RMSE) drop of 44.7% and a relative RMSE drop of 47.0% on the ICVL natural hyperspectral image dataset.
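    The conditional GAN setup can be sketched at the shape level: a generator maps the RGB image to a spectral cube, and a discriminator scores the (RGB, spectral) pair jointly, which is what ties the adversarial signal to spatial context. The sketch below is a toy, not the paper's networks: the generator is a per-pixel linear map, the discriminator a pooled logistic score, and the image size and band count (31 is common in this literature, but assumed here) are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    H, W, BANDS = 8, 8, 31  # tiny image and band count for illustration

    def generator(rgb, wg):
        """Per-pixel linear map RGB -> BANDS spectral bands (stand-in for
        the convolutional generator, which also uses spatial context)."""
        return rgb @ wg  # (H, W, BANDS)

    def discriminator(rgb, spec, wd):
        """Scores an (RGB, spectral) pair in (0, 1); conditioning the critic
        on the RGB input is what makes the GAN 'conditional'."""
        pair = np.concatenate([rgb, spec], axis=-1)        # (H, W, 3 + BANDS)
        pooled = pair.reshape(-1, 3 + BANDS).mean(axis=0)  # global pooling
        return 1.0 / (1.0 + np.exp(-pooled @ wd))          # sigmoid

    rgb = rng.random((H, W, 3))
    wg = rng.standard_normal((3, BANDS)) * 0.1
    wd = rng.standard_normal(3 + BANDS) * 0.1
    fake_spectral = generator(rgb, wg)
    score = discriminator(rgb, fake_spectral, wd)
    print(fake_spectral.shape)  # (8, 8, 31)
    ```

    In training, the generator would be pushed both to fool this discriminator and to minimize a per-pixel reconstruction error against ground-truth spectra.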

    Rhetorical Pattern Finding

    In this paper, we research rhetorical patterns from a musicological and computational standpoint. First, a theoretical examination of what constitutes a rhetorical pattern is conducted. Out of that examination, which includes primary sources and the study of the main composers, a formal definition of rhetorical patterns is proposed. Among the rhetorical figures, a set of imitative rhetorical figures is selected for our study, namely epizeuxis, palilogy, synonymia, and polyptoton. Next, we design a computational model of the selected rhetorical patterns to automatically find those patterns in a corpus consisting of masses by the Renaissance composer Tomás Luis de Victoria. In order to have a ground truth with which to test our model, a group of experts manually annotated the rhetorical patterns. To deal with the problem of reaching a consensus on the annotations, a four-round Delphi method was followed by the annotators. The rhetorical patterns found by the annotators and by the algorithm are compared and their differences discussed. The algorithm recovers almost all the patterns annotated by the experts (recall: 98.11%) along with some additional patterns (precision: 71.73%). These additional patterns correspond to rhetorical patterns nested within other rhetorical patterns, which the annotators overlooked on the basis of their contextual knowledge. These results raise the question of how to integrate that contextual knowledge into the computational model.
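    To make the pattern-finding task concrete, here is a toy matcher for the simplest of the selected figures, epizeuxis (the immediate restatement of a motif). This is an illustrative sketch under the assumption that the score is reduced to a pitch sequence; it is not the paper's algorithm, which works from a formal definition over the full corpus.

    ```python
    def find_epizeuxis(notes, min_len=2):
        """Find immediate repetitions of a pitch motif (epizeuxis-like).
        `notes` is a list of MIDI pitches; returns (start, length) pairs
        where a motif of `length` notes is immediately restated."""
        hits = []
        n = len(notes)
        for length in range(min_len, n // 2 + 1):
            for start in range(n - 2 * length + 1):
                first = notes[start:start + length]
                second = notes[start + length:start + 2 * length]
                if first == second:
                    hits.append((start, length))
        return hits

    # Hypothetical melody: the motif 60-62-64 is stated twice in a row.
    melody = [60, 62, 64, 60, 62, 64, 65, 67]
    print(find_epizeuxis(melody))  # [(0, 3)]
    ```

    Even this toy version shows why precision suffers relative to human annotators: a mechanical matcher reports every repetition that satisfies the definition, including patterns nested inside larger ones that an expert would discount on contextual grounds.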

    A Probabilistic Model and Capturing Device for Remote Simultaneous Estimation of Spectral Emissivity and Temperature of Hot Emissive Materials

    Estimating the temperature of hot emissive samples (e.g., liquid slag) in harsh industrial environments such as steelmaking plants is a crucial yet challenging task, typically addressed by methods that require physical contact. Current remote methods require information on the emissivity of the sample. However, the spectral emissivity depends on the sample composition and on the temperature itself, and it is hardly measurable except under controlled laboratory procedures. In this work, we present a portable device and an associated probabilistic model that can simultaneously produce quasi-real-time estimates of the temperature and spectral emissivity of hot samples in the [0.2, 12.0] μm range at distances of up to 20 m. The model is robust against variable atmospheric conditions, and the device comes with a quick calibration procedure that allows for in-field deployment in rough industrial environments, thus enabling in-line measurements. We validate the temperature and emissivity estimates from our device against laboratory equipment under controlled conditions in the [550, 850] °C temperature range for two solid samples with well-characterized spectral emissivities: alumina (α-Al2O3) and hexagonal boron nitride (h-BN). The analysis of the results yields Root Mean Squared Errors of 32.3 °C and 5.7 °C, respectively, and well-correlated spectral emissivities.

    This work was supported in part by the Basque Government (Hazitek AURRERA B: Advanced and Useful REdesign of CSP process for new steel gRAdes) under Grant ZE-2017/00009
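    The core ambiguity the paper resolves is that measured spectral radiance is the product of an unknown emissivity and the Planck blackbody curve at an unknown temperature, L(λ) = ε(λ) · B(λ, T). A minimal sketch of one way to break that ambiguity: grid-search over T, compute the emissivity each candidate would imply, and prefer the candidate whose implied emissivity is physically valid and smooth. The smoothness criterion is a toy stand-in for the paper's probabilistic model, and the grey-body sample is synthetic.

    ```python
    import numpy as np

    # Physical constants (SI units)
    H_PLANCK, C_LIGHT, K_B = 6.62607015e-34, 2.99792458e8, 1.380649e-23

    def planck(lam, T):
        """Blackbody spectral radiance B(lambda, T) in W / (m^2 sr m)."""
        a = 2.0 * H_PLANCK * C_LIGHT**2 / lam**5
        return a / (np.exp(H_PLANCK * C_LIGHT / (lam * K_B * T)) - 1.0)

    def estimate(lam, measured, T_grid):
        """For each candidate T, the implied emissivity is measured / B.
        Keep candidates with emissivity in (0, 1] and pick the one whose
        implied emissivity varies least across wavelength."""
        best_T, best_eps, best_score = None, None, np.inf
        for T in T_grid:
            eps = measured / planck(lam, T)
            if np.any(eps > 1.0) or np.any(eps <= 0.0):
                continue  # physically invalid emissivity
            score = np.sum(np.diff(eps) ** 2)  # smoothness penalty
            if score < best_score:
                best_T, best_eps, best_score = T, eps, score
        return best_T, best_eps

    lam = np.linspace(2e-6, 12e-6, 50)   # 2-12 um band (illustrative)
    true_T, true_eps = 1100.0, 0.85      # synthetic grey body at 1100 K
    measured = true_eps * planck(lam, true_T)
    T_hat, eps_hat = estimate(lam, measured, np.arange(900.0, 1300.0, 5.0))
    print(T_hat)  # 1100.0
    ```

    A real instrument must additionally model atmospheric transmission and sensor calibration, which is where the paper's probabilistic treatment and quick calibration procedure come in.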