82 research outputs found

A Novel Architecture for Parking Management in Smart Cities

Author: Giuffrè Tullio
Siniscalchi Sabato Marco
Tesoriere Giovanni
Publication date: 03/10/2012

Parking is becoming an expensive resource in almost any major city in the world. Current technically advanced solutions for parking management are concerned with the application of secured wireless networks and sensor communication for parking reservation. Moreover, new rules concerning financial transactions in mobile payment allow the definition of new intelligent frameworks that enable convenient management of public parking in urban areas. The paper discusses the conceptual architecture of IPA (Intelligent Parking Assistant), which aims at overcoming current parking management solutions and thereby becoming a leading paradigm for the so-called "smart cities".

Archivio istituzionale della ricerca - Università di Palermo

A Novel Architecture of Parking Management for Smart Cities

Author: Giuffrè Tullio
Siniscalchi Sabato Marco
Tesoriere Giovanni
Publication venue: Published by Elsevier Ltd.
Publication date: 03/10/2012

Parking is becoming an expensive resource in almost any major city in the world. Current technically advanced solutions for parking management are concerned with the application of secured wireless networks and sensor communication for parking reservation. Moreover, new rules concerning financial transactions in mobile payment allow the definition of new intelligent frameworks that enable convenient management of public parking in urban areas. The paper discusses the conceptual architecture of IPA (Intelligent Parking Assistant), which aims at overcoming current parking management solutions and thereby becoming a leading paradigm for the so-called “smart cities”.

Elsevier - Publisher Connector

Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints

Author: Lee Chin-Hui
Siniscalchi Sabato Marco
Yen Hao
Publication date: 15/09/2023

We propose a first step toward multilingual end-to-end automatic speech recognition (ASR) by integrating knowledge about speech articulators. The key idea is to leverage a rich set of fundamental units that can be defined "universally" across all spoken languages, referred to as speech attributes, namely manner and place of articulation. Specifically, several deterministic attribute-to-phoneme mapping matrices are constructed based on the predefined universal attribute inventory; these matrices project the knowledge-rich articulatory attribute logits into output phoneme logits. The mapping imposes knowledge-based constraints that limit inconsistency with acoustic-phonetic evidence in the integrated prediction. Combined with phoneme recognition, our phone recognizer is able to infer from both attribute and phoneme information. The proposed joint multilingual model is evaluated through phoneme recognition. In multilingual experiments over 6 languages on the benchmark datasets LibriSpeech and CommonVoice, we find that our proposed solution outperforms conventional multilingual approaches with a relative improvement of 6.85% on average, and it also demonstrates much better performance than a monolingual model. Further analysis conclusively demonstrates that the proposed solution eliminates phoneme predictions that are inconsistent with attributes.

arXiv.org e-Print Archive
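
The attribute-to-phoneme projection described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the four-phoneme toy inventory, the names `PHONEME_ATTRS`, `mapping_matrix`, and `project_attribute_logits`, and the choice to sum the manner and place projections are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy inventory (assumption; the paper uses a universal inventory).
PHONEMES = ["p", "m", "s", "n"]
MANNER = ["stop", "nasal", "fricative"]
PLACE = ["bilabial", "alveolar"]

# Each phoneme is described by one manner and one place of articulation.
PHONEME_ATTRS = {
    "p": ("stop", "bilabial"),
    "m": ("nasal", "bilabial"),
    "s": ("fricative", "alveolar"),
    "n": ("nasal", "alveolar"),
}


def mapping_matrix(attributes, slot):
    """Build a deterministic 0/1 matrix of shape (num_attributes, num_phonemes)."""
    matrix = np.zeros((len(attributes), len(PHONEMES)))
    for col, phoneme in enumerate(PHONEMES):
        row = attributes.index(PHONEME_ATTRS[phoneme][slot])
        matrix[row, col] = 1.0
    return matrix


MANNER_TO_PHONEME = mapping_matrix(MANNER, slot=0)
PLACE_TO_PHONEME = mapping_matrix(PLACE, slot=1)


def project_attribute_logits(manner_logits, place_logits):
    """Project attribute logits into phoneme space (summing is an assumption)."""
    return manner_logits @ MANNER_TO_PHONEME + place_logits @ PLACE_TO_PHONEME


# Strong evidence for "nasal" and "bilabial" favours the phoneme "m".
scores = project_attribute_logits(np.array([0.0, 5.0, 0.0]), np.array([5.0, 0.0]))
print(PHONEMES[int(np.argmax(scores))])
```

Because each column of a mapping matrix has a single non-zero entry, a phoneme can only score highly when its own manner and place both score highly, which is one simple way to read the "knowledge-based constraints" mentioned in the abstract.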

S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

Author: Adiban Mohammad
Salvi Giampiero
Siniscalchi Sabato Marco
Stefanov Kalin
Publication date: 13/07/2023

We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE at modeling still images with a parsimonious representation, combined with the ST-PixelCNN's ability at handling spatiotemporal information, S-HR-VQVAE can better deal with chief challenges in video prediction. These include learning spatiotemporal information, handling high dimensional data, combating blurry prediction, and implicit modeling of physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably against top video prediction techniques both in quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and ST-PixelCNN parameters.
Comment: 14 pages, 7 figures, 3 tables. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence on 2023-07-1

arXiv.org e-Print Archive
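
As a rough illustration of the "residual" idea in the model name above, the sketch below shows generic residual vector quantization: each layer quantizes whatever the previous layers failed to represent. It is not the HR-VQVAE architecture itself; the function name `residual_quantize` and the tiny hand-written codebooks are assumptions for the example, and no encoder, decoder, or training step is modelled.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Quantize x layer by layer, each layer encoding the remaining residual.

    codebooks is a list of arrays of shape (num_codes, dim). Returns the chosen
    code index per layer and the summed reconstruction.
    """
    residual = np.asarray(x, dtype=float).copy()
    reconstruction = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Nearest code to the current residual (squared Euclidean distance).
        distances = np.sum((codebook - residual) ** 2, axis=1)
        best = int(np.argmin(distances))
        indices.append(best)
        reconstruction += codebook[best]
        residual -= codebook[best]
    return indices, reconstruction


# Hypothetical two-layer codebooks: a coarse layer and a finer one.
coarse = np.array([[0.0, 0.0], [1.0, 1.0]])
fine = np.array([[0.0, 0.0], [0.25, 0.25]])
codes, approx = residual_quantize([1.2, 1.3], [coarse, fine])
print(codes, approx)
```

Each extra layer can only refine the approximation left by the earlier ones, so a few small codebooks can stand in for one much larger codebook.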

Una estrategia de procesamiento automático del habla basada en la detección de atributos

Author: Lee Chin-Hui
Siniscalchi Sabato Marco
Publication venue: Editorial CSIC
Publication date: 30/06/2014

State-of-the-art automatic speech and speaker recognition systems are often built with a pattern matching framework that has proven to achieve low recognition error rates for a variety of resource-rich tasks when the volume of speech and text examples to build statistical acoustic and language models is plentiful, and the speaker, acoustics and language conditions follow a rigid protocol. However, because of the “blackbox” top-down knowledge integration approach, such systems cannot easily leverage a rich set of knowledge sources already available in the literature on speech, acoustics and languages. In this paper, we present a bottom-up approach to knowledge integration, called automatic speech attribute transcription (ASAT), which is intended to be “knowledge-rich”, so that new and existing knowledge sources can be verified and integrated into current spoken language systems to improve recognition accuracy and system robustness. Since the ASAT framework offers a “divide-and-conquer” strategy and a “plug-and-play” game plan, it will facilitate a cooperative speech processing community that every researcher can contribute to, with a view to improving speech processing capabilities which are currently not easily accessible to researchers in the speech science community.

Crossref

Loquens (E-Journal)

Maximum a Posteriori Adaptation of Network Parameters in Deep Models

Author: Chen I. F.
Huang Z.
Lee C. H.
Li L.
Siniscalchi Sabato Marco
Wu J.
Publication date: 01/01/2015

We present a Bayesian approach to adapting parameters of a well-trained context-dependent, deep-neural-network, hidden Markov model (CD-DNN-HMM) to improve automatic speech recognition performance. Given an abundance of DNN parameters but only a limited amount of data, the effectiveness of the adapted DNN model can often be compromised. We formulate maximum a posteriori (MAP) adaptation of parameters of a specially designed CD-DNN-HMM with an augmented linear hidden network connected to the output tied states, or senones, and compare it to the previously proposed feature-space MAP linear regression. Experimental evidence on the 20,000-word open vocabulary Wall Street Journal task demonstrates the feasibility of the proposed framework. In supervised adaptation, the proposed MAP adaptation approach provides more than 10% relative error reduction and consistently outperforms the conventional transformation-based methods. Furthermore, we present an initial attempt to generate hierarchical priors to improve adaptation efficiency and effectiveness with limited adaptation data by exploiting similarities among senones.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Palermo
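
To make the MAP idea in the abstract above concrete, here is a minimal sketch of the textbook MAP estimate of a Gaussian mean under a conjugate Gaussian prior. This is a deliberately simpler setting than the paper's adaptation of linear hidden network parameters; the function name `map_adapt_mean` and the prior weight `tau` are assumptions introduced for the example.

```python
import numpy as np


def map_adapt_mean(prior_mean, data, tau):
    """MAP estimate of a Gaussian mean with a conjugate Gaussian prior.

    prior_mean: mean of the prior (e.g. from a well-trained model).
    data: adaptation samples, shape (n, dim).
    tau: prior weight; larger values keep the estimate closer to prior_mean.
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    n = data.shape[0]
    return (tau * np.asarray(prior_mean, dtype=float) + data.sum(axis=0)) / (tau + n)


prior = np.zeros(2)
samples = np.array([[2.0, 2.0], [4.0, 4.0]])
print(map_adapt_mean(prior, samples, tau=2.0))
```

With little adaptation data the estimate stays near the prior mean, and as the number of samples grows it moves toward the sample mean, which is the behaviour that makes MAP attractive when data are limited.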

Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification

Author: Hu Hu
Lee Chin-Hui
Siniscalchi Sabato Marco
Wang Yannan
Publication date: 01/01/2020

In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification by leveraging neural label embedding (NLE) and relational teacher student learning (RTSL). Taking into account the structural relationships between acoustic scene classes, our proposed framework captures such relationships, which are intrinsically device-independent. In the training stage, transferable knowledge is condensed in NLE from the source domain. Next, in the adaptation stage, a novel RTSL strategy is adopted to learn adapted target models without using the paired source-target data often required in conventional teacher student learning. The proposed framework is evaluated on the DCASE 2018 Task1b data set. Experimental results based on AlexNet-L deep classification models confirm the effectiveness of our proposed approach for mismatch situations. NLE-alone adaptation compares favourably with the conventional device adaptation and teacher student based adaptation techniques. NLE with RTSL further improves the classification accuracy.
Comment: Accepted by Interspeech 202

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression

Author: Chin-Hui Lee
Jun Du
Jun Qi
Sabato Marco Siniscalchi
Xiaoli Ma
Publication venue: Institute of Electrical and Electronics Engineers Inc.
Publication date: 01/01/2020

In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for deep neural network (DNN) based vector-to-vector regression. The goal of this work is two-fold: (i) presenting performance bounds of MAE, and (ii) demonstrating new properties of MAE that make it more appropriate than mean squared error (MSE) as a loss function for DNN based vector-to-vector regression. First, we show that a generalized upper bound for DNN-based vector-to-vector regression can be ensured by leveraging the known Lipschitz continuity property of MAE. Next, we derive a new generalized upper bound in the presence of additive noise. Finally, in contrast to conventional MSE commonly adopted to approximate Gaussian errors for regression, we show that MAE can be interpreted as an error modeled by a Laplacian distribution. Speech enhancement experiments are conducted to corroborate our proposed theorems and validate the performance advantages of MAE over MSE for DNN based regression.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Palermo
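
The link between the loss function and the assumed error distribution mentioned in the abstract above can be written out with the standard negative log-likelihoods. This is a textbook derivation rather than the paper's own theorems; the symbols $e_i$ (the $i$-th regression error), $N$ (number of errors), $b$ (Laplacian scale), and $\sigma$ (Gaussian standard deviation) are notation assumed here.

```latex
% Laplacian errors: p(e_i) = \frac{1}{2b} \exp\!\left(-\frac{|e_i|}{b}\right)
-\sum_{i=1}^{N} \log p(e_i) = N \log(2b) + \frac{1}{b} \sum_{i=1}^{N} |e_i|

% Gaussian errors: p(e_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{e_i^2}{2\sigma^2}\right)
-\sum_{i=1}^{N} \log p(e_i) = \frac{N}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2} \sum_{i=1}^{N} e_i^2
```

For fixed $b$, minimizing the first expression is equivalent to minimizing the MAE $\frac{1}{N}\sum_i |e_i|$; for fixed $\sigma$, minimizing the second is equivalent to minimizing the MSE $\frac{1}{N}\sum_i e_i^2$.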