Machine learning in solar physics
The application of machine learning in solar physics has the potential to
greatly enhance our understanding of the complex processes that take place in
the atmosphere of the Sun. By using techniques such as deep learning, we are
now in the position to analyze large amounts of data from solar observations
and identify patterns and trends that may not have been apparent using
traditional methods. This can help us improve our understanding of explosive
events like solar flares, which can have a strong effect on Earth's
environment. Predicting hazardous events on Earth is becoming crucial for our
technological society. Machine learning can also improve our understanding of
the inner workings of the Sun itself by allowing us to go deeper into the data
and to propose more complex models to explain them. Additionally, the use of
machine learning can help to automate the analysis of solar data, reducing the
need for manual labor and increasing the efficiency of research in this field.
Comment: 100 pages, 13 figures, 286 references, accepted for publication as a
Living Review in Solar Physics (LRSP)
An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings
We performed an experimental review of current diarization systems for the
conversational telephone speech (CTS) domain. In detail, we considered a total
of eight different algorithms belonging to clustering-based, end-to-end neural
diarization (EEND), and speech separation guided diarization (SSGD) paradigms.
We studied the inference-time computational requirements and diarization
accuracy on four CTS datasets with different characteristics and languages. We
found that, among all methods considered, EEND-vector clustering (EEND-VC)
offers the best trade-off in terms of computing requirements and performance.
More generally, EEND models were found to be lighter and faster at
inference compared to clustering-based methods. However, they also require a
large amount of diarization-oriented annotated data. In particular, EEND-VC
performance in our experiments degraded when the dataset size was reduced,
whereas self-attentive EEND (SA-EEND) was less affected. We also found that
SA-EEND gives less consistent results among all the datasets compared to
EEND-VC, with its performance degrading on long conversations with high speech
sparsity. Clustering-based diarization systems, and in particular VBx, instead
have more consistent performance compared to SA-EEND but are outperformed by
EEND-VC. The gap with respect to the latter narrows when overlap-aware
clustering methods are considered. SSGD is the most computationally demanding
method, but it could be convenient if speech recognition has to be performed.
Its performance is close to SA-EEND but degrades significantly when the
training and inference data characteristics are less matched.
Comment: 52 pages, 10 figures
Oscillatory mechanisms of conscious perception and attention
Although the prominent role of neural oscillations in perception and cognition has been continuously investigated, some critical questions remain unanswered. My PhD thesis was aimed at addressing some of them.
First, can we dissociate oscillatory underpinnings of perceptual accuracy and subjective awareness? The current work strongly suggests that this dissociation can be drawn: while fluctuations in alpha-amplitude determine perceptual bias and metacognitive abilities, the speed of alpha activity (i.e., alpha-frequency) dictates sensory sampling, shaping perceptual accuracy.
Second, how are these oscillatory mechanisms integrated during attention? The obtained results indicate that a top-down visuospatial mechanism modulates neural assemblies in visual areas via oscillatory re-alignment and coherence in the alpha/beta range within the fronto-parietal brain network. These perceptual predictions are reflected in the retinotopically distributed posterior alpha-amplitude, while perceptual accuracy is explained by the higher alpha-frequency at the to-be-attended location. Finally, sensory input, elaborated via fast gamma oscillations, is linked to specific phases of this slower activity via oscillatory nesting, enabling integration of the feedback-modulated oscillatory activity with sensory information.
Third, how can we relate this oscillatory activity to other neural markers of behaviour (i.e., event-related potentials)? The obtained results favour the oscillatory model of ERP genesis, where alpha-frequency shapes the latency of early evoked-potentials, namely P1, with both neural indices being related to perceptual accuracy. On the other hand, alpha-amplitude dictates the amplitude of later P3 evoked-response, whereas both indices shape subjective awareness.
Crucially, by combining different methodological approaches, including neurostimulation (TMS) and neuroimaging (EEG), the current work identified these oscillatory-behavior links as causal and not just as co-occurring events. It also aimed to improve the use of the TMS-EEG approach by explaining inter-individual differences in stimulation outcomes, which could prove crucial for how we design entrainment experiments and interpret their results in both research and clinical settings.
Automating Fault Detection and Quality Control in PCBs: A Machine Learning Approach to Handle Imbalanced Data
Printed Circuit Boards (PCBs) are fundamental to the operation of a wide array of electronic devices, from consumer electronics to sophisticated industrial machinery. Given this pivotal role, quality control and fault detection are especially significant, as they are essential for ensuring the devices' long-term reliability and efficiency. To address this, the thesis explores advancements in fault detection and quality control methods for PCBs, with a focus on Machine Learning (ML) and Deep Learning (DL) techniques. The study begins with an in-depth review of traditional approaches like visual and X-ray inspections, then delves into modern, data-driven methods, such as automated anomaly detection in PCB manufacturing using tabular datasets. The core of the thesis is divided into three specific tasks: firstly, applying ML and DL models for anomaly detection in PCBs, particularly focusing on solder-pasting issues and the challenges posed by imbalanced datasets; secondly, predicting human inspection labels through specially designed tabular models like TabNet; and thirdly, implementing multi-classification methods to automate repair labeling on PCBs. The study is structured to offer a comprehensive view, beginning with background information, followed by the methodology and results of each task, and concluding with a summary and directions for future research. Through this systematic approach, the research not only provides new insights into the capabilities and limitations of existing fault detection techniques but also sets the stage for more intelligent and efficient systems in PCB manufacturing and quality control
Machine learning for the sustainable energy transition: a data-driven perspective along the value chain from manufacturing to energy conversion
According to the special report Global Warming of 1.5 °C of the IPCC, climate action is not only necessary but more than ever urgent. The world is witnessing rising sea levels, heat waves, events of flooding, droughts, and desertification resulting in the loss of lives and damage to livelihoods, especially in countries of the Global South. To mitigate climate change and commit to the Paris Agreement, it is of the utmost importance to reduce greenhouse gas emissions coming from the most emitting sector, namely the energy sector. To this end, large-scale penetration of renewable energy systems into the energy market is crucial for the energy transition toward a sustainable future by replacing fossil fuels and improving access to energy with socio-economic benefits. With the advent of Industry 4.0, Internet of Things technologies have been increasingly applied to the energy sector, introducing the concept of the smart grid or, more generally, the Internet of Energy. These paradigms are steering the energy sector towards more efficient, reliable, flexible, resilient, safe, and sustainable solutions with huge potential environmental and social benefits. To realize these concepts, new information technologies are required, and among the most promising possibilities are Artificial Intelligence and Machine Learning, which in many countries have already revolutionized the energy industry. This thesis presents different Machine Learning algorithms and methods for the implementation of new strategies to make renewable energy systems more efficient and reliable. It presents various learning algorithms, highlighting their advantages and limits, and evaluating their application for different tasks in the energy context. In addition, different techniques are presented for the preprocessing and cleaning of time series, nowadays collected by sensor networks mounted on every renewable energy system.
With the possibility to install large numbers of sensors that collect vast amounts of time series, it is vital to detect and remove irrelevant, redundant, or noisy features, and alleviate the curse of dimensionality, thus improving the interpretability of predictive models, speeding up their learning process, and enhancing their generalization properties. Therefore, this thesis discusses the importance of dimensionality reduction in sensor networks mounted on renewable energy systems and, to this end, presents two novel unsupervised algorithms. The first approach maps time series in the network domain through visibility graphs and uses a community detection algorithm to identify clusters of similar time series and select representative parameters. This method can group both homogeneous and heterogeneous physical parameters, even when related to different functional areas of a system. The second approach proposes the Combined Predictive Power Score, a method for feature selection with a multivariate formulation that explores multiple subsets of expanding variables and identifies the combination of features with the highest predictive power over specified target variables. This method proposes a selection algorithm for the optimal combination of variables that converges to the smallest set of predictors with the highest predictive power. Once the combination of variables is identified, the most relevant parameters in a sensor network can be selected to perform dimensionality reduction. Data-driven methods open the possibility to support strategic decision-making, resulting in a reduction of Operation & Maintenance costs, machine faults, repair stops, and spare parts inventory size. Therefore, this thesis presents two approaches in the context of predictive maintenance to improve the lifetime and efficiency of the equipment, based on anomaly detection algorithms.
The first approach proposes an anomaly detection model based on Principal Component Analysis that is robust to false alarms, can isolate anomalous conditions, and can anticipate equipment failures. The second approach has at its core a neural architecture, namely a Graph Convolutional Autoencoder, which models the sensor network as a dynamical functional graph by simultaneously considering the information content of individual sensor measurements (graph node features) and the nonlinear correlations existing between all pairs of sensors (graph edges). The proposed neural architecture can capture hidden anomalies even when the turbine continues to deliver the power requested by the grid and can anticipate equipment failures. Since the model is unsupervised and completely data-driven, this approach can be applied to any wind turbine equipped with a SCADA system. When it comes to renewable energies, the unschedulable uncertainty due to their intermittent nature represents an obstacle to the reliability and stability of energy grids, especially when dealing with large-scale integration. Nevertheless, these challenges can be alleviated if the natural sources or the power output of renewable energy systems can be forecasted accurately, allowing power system operators to plan optimal power management strategies to balance the dispatch between intermittent power generations and the load demand. To this end, this thesis proposes a multi-modal spatio-temporal neural network for multi-horizon wind power forecasting. In particular, the model combines high-resolution Numerical Weather Prediction forecast maps with turbine-level SCADA data and explores how meteorological variables on different spatial scales together with the turbines' internal operating conditions impact wind power forecasts. The world is undergoing a third energy transition with the main goal to tackle global climate change through decarbonization of the energy supply and consumption patterns. 
This is not only possible thanks to global cooperation and agreements between parties, power generation systems advancements, and Internet of Things and Artificial Intelligence technologies but also necessary to prevent the severe and irreversible consequences of climate change that are threatening life on the planet as we know it. This thesis is intended as a reference for researchers who want to contribute to the sustainable energy transition and are approaching the field of Artificial Intelligence in the context of renewable energy systems.
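The abstract above says that time series are mapped into the network domain through visibility graphs. As a rough illustration of that mapping only, the following minimal sketch builds a natural visibility graph, in which two samples are linked when the straight line between them passes above every sample in between. The abstract does not say which visibility-graph variant the thesis uses, so the natural-visibility criterion, the function name `natural_visibility_edges`, and the brute-force search are all assumptions made for this example; the community detection step is not shown.

```python
def natural_visibility_edges(series):
    """Return the edges (a, b), a < b, of the natural visibility graph.

    Samples a and b are connected when every intermediate sample c lies
    strictly below the straight line joining (a, series[a]) and
    (b, series[b]). Hypothetical helper, not taken from the thesis.
    """
    n = len(series)
    edges = []
    for a in range(n - 1):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # Height of the line a-b at position c, compared with series[c].
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.append((a, b))
    return edges


# A peak in the middle blocks the view between the two outer samples.
blocked = natural_visibility_edges([1.0, 3.0, 1.0])
# A dip in the middle does not.
open_view = natural_visibility_edges([3.0, 1.0, 2.0])
```

The resulting edge list can be handed to any graph library for clustering. The pairwise check is cubic in the series length in the worst case, which is acceptable for a sketch but not for long sensor records.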
Leveraging Optical Communication Fiber and AI for Distributed Water Pipe Leak Detection
Detecting leaks in water networks is a costly challenge. This article
introduces a practical solution: the integration of an optical network with water
networks for efficient leak detection. Our approach uses a fiber-optic cable to
measure vibrations, enabling accurate leak identification and localization by
an intelligent algorithm. We also propose a method to assess leak severity for
prioritized repairs. Our solution detects even small leaks with flow rates as
low as 0.027 L/s. It offers a cost-effective way to improve leak detection,
enhance water management, and increase operational efficiency.
Comment: Accepted
Adaptive swarm optimisation assisted surrogate model for pipeline leak detection and characterisation.
Pipelines are often subject to leakage due to ageing, corrosion and weld defects. It is difficult to avoid pipeline leakage as the sources of leaks are diverse. Various pipeline leakage detection methods, including fibre optic, pressure point analysis and numerical modelling, have been proposed during the last decades. One major issue of these methods is distinguishing the leak signal without giving false alarms. Considering that the data obtained by these traditional methods are digital in nature, machine learning models have been adopted to improve the accuracy of pipeline leakage detection. However, most of these methods rely on a large training dataset for accurate model training, and such experimental data are difficult to obtain. Some of the reasons include the huge cost of an experimental setup for data collection to cover all possible scenarios, poor accessibility to the remote pipeline, and labour-intensive experiments. Moreover, datasets constructed from data acquired in laboratory or field tests are usually imbalanced, as leakage data samples are generated from artificial leaks. Computational fluid dynamics (CFD) offers the benefit of detailed and accurate pipeline leakage modelling, which may be difficult to obtain experimentally or with an analytical approach. However, CFD simulation is typically time-consuming and computationally expensive, limiting its applicability in real-time settings. In order to alleviate the high computational cost of CFD modelling, this study proposed a novel data sampling optimisation algorithm, called Adaptive Particle Swarm Optimisation Assisted Surrogate Model (PSOASM), to select simulation scenarios systematically in an adaptive and optimised manner. The algorithm was designed to place a new sample in a poorly sampled region or regions in the parameter space of parametrised leakage scenarios, which uniform sampling methods may easily miss.
This was achieved using two criteria: population density of the training dataset and model prediction fitness value. The model prediction fitness value was used to enhance the global exploration capability of the surrogate model, while the population density of training data samples is beneficial to the local accuracy of the surrogate model. The proposed PSOASM was compared with four conventional sequential sampling approaches and tested on six commonly used benchmark functions in the literature. Different machine learning algorithms are explored with the developed model. The effect of the initial sample size on surrogate model performance was evaluated. Next, pipeline leakage detection analysis - with much emphasis on a multiphase flow system - was investigated in order to find the flow field parameters that provide pertinent indicators in pipeline leakage detection and characterisation. Plausible leak scenarios which may occur in the field were performed for the gas-liquid pipeline using a three-dimensional RANS CFD model. The perturbation of the pertinent flow field indicators for different leak scenarios is reported, which is expected to help in improving the understanding of multiphase flow behaviour induced by leaks. The results of the simulations were validated against the latest experimental and numerical data reported in the literature. The proposed surrogate model was later applied to pipeline leak detection and characterisation. The CFD modelling results showed that fluid flow parameters are pertinent indicators in pipeline leak detection. It was observed that upstream pipeline pressure could serve as a critical indicator for detecting leakage, even if the leak size is small. In contrast, the downstream flow rate is a dominant leakage indicator if the flow rate monitoring is chosen for leak detection. 
The results also reveal that when two leaks of different sizes co-occur in a single pipe, detecting the small leak becomes difficult if its size is below 25% of the large leak size. However, in the event of a double leak with equal dimensions, the leak closer to the pipe upstream is easier to detect. The results from all the analyses demonstrate the PSOASM algorithm's superiority over the well-known sequential sampling schemes employed for evaluation. The test results show that the PSOASM algorithm can be applied for pipeline leak detection with limited training datasets and provides a general framework for improving computational efficiency using adaptive surrogate modelling in various real-life applications
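The abstract above describes placing each new simulation sample in a poorly sampled region of the parameter space, using the population density of the training data as one of two criteria. The sketch below illustrates only that density idea with a plain maximin rule: among a pool of candidate points, pick the one farthest from its nearest existing sample. It is a simplified stand-in and not the PSOASM algorithm. There is no particle swarm search and no model-fitness criterion here, and the function name `least_sampled_candidate` and the candidate pool are assumptions made for illustration.

```python
import math


def least_sampled_candidate(samples, candidates):
    """Pick the candidate whose nearest existing sample is farthest away.

    samples and candidates are sequences of equal-length coordinate tuples.
    This maximin rule is a hypothetical proxy for "low population density";
    it is not the criterion defined in the thesis.
    """
    if not samples or not candidates:
        raise ValueError("samples and candidates must both be non-empty")

    def nearest_distance(point):
        # Euclidean distance from the candidate to its closest sample.
        return min(math.dist(point, s) for s in samples)

    return max(candidates, key=nearest_distance)


existing = [(0.0, 0.0), (1.0, 1.0)]
pool = [(0.1, 0.1), (0.5, 0.5), (0.0, 1.0)]
next_point = least_sampled_candidate(existing, pool)
```

In an adaptive loop, the chosen point would be simulated, appended to the existing samples, and the selection repeated until the simulation budget is spent.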
iPhonMatchNet: Zero-Shot User-Defined Keyword Spotting Using Implicit Acoustic Echo Cancellation
In response to the increasing interest in human-machine communication across
various domains, this paper introduces a novel approach called iPhonMatchNet,
which addresses the challenge of barge-in scenarios, wherein user speech
overlaps with device playback audio, thereby creating a self-referencing
problem. The proposed model leverages implicit acoustic echo cancellation
(iAEC) techniques to increase the efficiency of user-defined keyword spotting
models, achieving a remarkable 95% reduction in mean absolute error with a
minimal increase in model size (0.13%) compared to the baseline model,
PhonMatchNet. We also present an efficient model structure and demonstrate its
capability to learn iAEC functionality without requiring a clean signal. The
findings of our study indicate that the proposed model achieves competitive
performance in real-world deployment conditions of smart devices.
Comment: Submitted to ICASSP 202
Speech recognition systems and Russian pronunciation variation in the context of VoiceInteraction
The present thesis describes the work performed during an internship for the master's degree in Linguistics at VoiceInteraction, an international Artificial Intelligence (AI) company specializing in the development of speech processing technologies. The goal of the internship was to study the phonetic characteristics of the Russian language through four main tasks: description of the phonetic-phonological inventory; validation of transcriptions of broadcast news; validation of a previously created lexicon composed of the ten thousand (10,000) most frequently observed words in a text corpus crawled from the websites of Russian newspapers of record; and integration of filled pauses into the Automatic Speech Recognizer (ASR).
Initially, audio and text broadcast news media were collected from Russian-speaking regions (European Russia, Belarus, and the Caucasus region), featuring different varieties of Russian. The extracted data and the company's existing data were used to train the acoustic, pronunciation, and language models. The audio data was automatically processed in a proprietary platform and then revised by human annotators. Transcriptions produced automatically and reviewed by annotators were analyzed, and the most common errors were extracted to provide feedback to the community of annotators.
The validation of transcriptions, along with the annotation of all disfluencies (which had previously been left out), decreased the Word Error Rate (WER) in most cases. In some cases (in European Russian transcriptions), WER increased: the models were not sufficiently effective at identifying the correct, potentially problematic, words. Audio with overlapped speech, disfluencies, and acoustic events can also affect the WER. Because the model used to recognize the other varieties of Russian was trained only on European Russian, WER was high for Belarus and the Caucasus region.
The Russian phonetic-phonological inventory was characterized and pronunciation rules for internal and external sandhi phenomena were constructed in order to validate the lexicon: the ten thousand most frequently observed words in a text corpus crawled from the websites of Russian newspapers of record were revised and modified to extract linguistic patterns for use in a statistical grapheme-to-phone (G2P) model.
Two evaluations were conducted, one before and one after the modifications to the lexicon. Preliminary results without training the model show no significant change: 19.85% WER before the modifications and 19.97% WER after, a difference of 0.12 percentage points. However, we observed a slight improvement in the recognition of the most frequent words. In the future, we aim to extend the analysis to all 400,000 entries of the lexicon, analyze the types of errors produced, decrease the WER, and analyze the acoustic models as well.
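For readers unfamiliar with the metric reported above, WER is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition. The function name `word_error_rate` and the whitespace tokenization are assumptions for this example, and it is not the scoring tool used in the thesis, which the abstract does not name.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Substitutions, deletions and insertions each cost 1. Tokens are split
    on whitespace, which is a simplification chosen for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
example_wer = word_error_rate("a b c d", "a x c")

# Change between the two evaluations reported in the abstract,
# expressed in percentage points.
wer_change = round(19.97 - 19.85, 2)
```

Because the two reported rates are themselves percentages, their difference is best read in percentage points; relative to the 19.85% baseline it is a change of well under one percent.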
In this work, we also studied filled pauses, since we believe that research on filled pauses in Russian can improve VoiceInteraction's recognition system by reducing processing time and increasing quality. Filled pauses are marked in the transcriptions with “%”. In Russian, according to the literature (Ten, 2015; Harlamova, 2008; Bogdanova-Beglarian & Baeva, 2018), these are %a [a], %am [am], %@ [ə], %@m [əm], %e [e], %ɨ [ɨ], %m [m], and %n [n]. In the speech data, two more filled pauses were found, namely %na [na] and %mna [mna], which, as far as we know, have not yet been referenced in the literature.
Finally, the work performed during the internship contributed to a European project, Artificial Intelligence and Advanced Data Analysis for Authority Agencies (AIDA). The main goal of this project is to build a solution capable of automating the processing of the large amounts of data that Law Enforcement Agencies (LEAs) have to analyze when investigating terrorism and cybercrime, using pioneering machine learning and artificial intelligence methods. VoiceInteraction's main contribution to the project was to apply ASR and validate the Russian transcriptions (religion-related content). To that end, all the tasks performed during the thesis were highly relevant and were applied within the scope of the AIDA project.
Transcription analysis results from the AIDA project showed a high Out-of-Vocabulary (OOV) rate and a high substitution (SUBS) rate. Since the language model used in this project was adapted for broadcast content, the religion-related words were left out. Also, function words were incorrectly recognized, in most cases because of coarticulation with the previous or the following word.
This thesis describes the work carried out during an internship in computational linguistics at VoiceInteraction, a speech processing technology company. Since its founding, the company has developed its own technology in several areas of computational speech processing, including speech synthesis, natural language processing, and automatic speech recognition, the last of which is the company's main business area. VoiceInteraction's automatic speech recognition technology uses hybrid models in combination with deep neural networks (DNNs), which, according to Lüscher et al. (2019), perform better than end-to-end models alone.
The main objective of the internship was to study the phonetics of Russian through four tasks: creating the phonetic-phonological inventory; validating the transcriptions of news broadcasts; validating the previously created lexicon; and integrating filled pauses into the system.
Initially, the main media sources (audio and text) were collected, covering different varieties of Russian, namely those of European Russia, Belarus, and the Central Caucasus. In European Russia, Russian is the official language; in Belarus, it is one of the country's official languages; and in the Central Caucasus region, it is used as a lingua franca, since it was spoken in the Soviet Union and is still spoken in the post-Soviet regions today. The aim was to cover as much of the Russian language as possible, but so far it has only been possible to collect data for the varieties mentioned. The data extracted so far, together with the data the company already held, were used to train the acoustic, pronunciation, and language models.
To process the audio data, the recordings were loaded into the company's proprietary platform, Calligraphus, which not only provides a transcription interface for human annotators but also suggests an automatic transcription of the same content in order to reduce the annotators' effort. The transcriptions were then analyzed to ensure that the annotation scheme created by VoiceInteraction had been followed, marking all speech disfluencies (phenomena characteristic of speech editing), such as prolongations, filled pauses, and repetitions, among others, and transcribing the speech as faithfully as possible. Subsequently, the systematic errors were analyzed and exported, both to provide guidance and suggestions for improvement to the human annotators and to improve the performance of the recognition system.
After validating the transcriptions and annotating all the disfluencies (which had previously been left out), we observed a decrease in WER in most cases, as expected. In some cases, however, we observed an increase in WER. Despite the corrections made to the analyzed files, the models were not sufficiently effective at recognizing the correct, potentially problematic, words.
The high WER in audio containing political debates is related to a higher frequency of overlapping speech and disfluencies (e.g., filled pauses and prolongations). The model used to recognize all the varieties was trained only on European Russian, and so a high WER was also observed for the varieties of Belarus and the Caucasus region.
From a perspective based on data collected by the company, the Russian phonetic-phonological inventory was also characterized and described, and pronunciation rules were constructed for internal and external sandhi phenomena (Shcherba, 1957; Litnevskaya, 2006; Lekant, 2007; Popov, 2014). The company was already using, through a statistical G2P specific to Russian, a phonetic inventory for Russian that matched the literature cited above, but it had not yet been validated. It was possible to verify and correct it on the basis of the characterization of the phones in the Russian lexicon and of ecological data obtained from Russian speakers in a variety of communicative situations. Validating the phonetic-phonological inventory in turn allowed the Russian lexicon to be validated. The lexicon was built on a set of features (e.g., the grapheme in unstressed position is pronounced as the phone [I] and in stressed position as [i]; the grapheme in word-final position is pronounced as [- voiced], i.e., [f]; among other features) and was organized by frequency of use. In total, the ten thousand (10,000) most frequent Russian words were checked, based on statistics from the analysis of content in a repository of news articles previously collected from Russian-language newspapers of record.
The recognition system was evaluated before and after the modification of the ten thousand most frequent words in the lexicon: 19.85% WER before the modifications and 19.97% WER after, a difference of 0.12 percentage points. The preliminary results, obtained without training the model, show no significant change; however, we observed a slight improvement in the recognition of the most frequent words, such as function words, acronyms, verbs, and nouns, among others. Based on these results and on the rules created from the correction of the ten thousand words, we intend in the future to extend them to the entire lexicon, which comprises four hundred thousand (400,000) entries. After the validation of the transcriptions and the lexicon, it was also possible, drawing on the literature, to analyze Russian filled pauses for integration into the recognition system. The interest in including pauses in the automatic recognizer stems mainly from the fact that these mechanisms are difficult to identify automatically and can be substituted or can affect adjacent sequences. According to the company's annotation scheme, filled pauses are marked in the transcription with the percent sign (%). The Russian filled pauses found in the literature were %a [a], %am [am] (Rose, 1998; Ten, 2015), %@ [ə], %@m [əm] (Bogdanova-Beglarian & Baeva, 2018), %e [e], %ɨ [ɨ], %m [m], and %n [n] (Harlamova, 2008). In the audio data available on the platform, two further filled pauses were found in addition to those mentioned, namely %na [na] and %mna [mna], which, as far as we know, have not yet been described in the literature. At present, all the filled pauses mentioned are already part of the automatic speech recognition models for Russian.
The work carried out during the internship, that is, the validation of the company's existing data, was applied to the European project AIDA (Artificial Intelligence and Advanced Data Analysis for Authority Agencies). The main objective of this project is to create a solution capable of detecting possible cybercrime and terrorism offences using machine learning methods. VoiceInteraction's main contribution to the project was applying ASR and validating the Russian transcriptions (religion-related content). To that end, all the tasks carried out during the thesis were highly relevant and were applied within the AIDA project.
The results of validating the project's transcriptions showed a high Out-of-Vocabulary (OOV) rate and a high substitution (SUBS) rate. Since the language model used in this project had been adapted to news content, the religion-related words were not in it. In addition, function words were incorrectly recognized, in most cases because of coarticulation with the preceding or following word.