Speaker recognition for door opening systems
Dual-degree master's thesis with UTFPR - Universidade Tecnológica Federal do Paraná. Besides being an important communication tool, the voice can also serve identification purposes, since it carries an individual signature for each person. Speaker recognition technologies can use this signature as an authentication method for access to environments. This work explores the development and testing of machine learning and deep learning models, specifically the GMM, VGG-M, and ResNet50 models, for speaker recognition access control, with the goal of building a system that grants access to CeDRI's laboratory. The models were evaluated on their performance in recognizing speakers from audio samples, with emphasis on the Equal Error Rate (EER) metric to determine their effectiveness. They were first trained and tested on public datasets with 1251 to 6112 speakers and then fine-tuned on private datasets with 32 speakers from CeDRI's laboratory. In this study, we compared the performance of the ResNet50, VGG-M, and GMM models for speaker verification. After conducting experiments on our private datasets, we found that the ResNet50 model outperformed the others, achieving the lowest EER of 0.7% on the Framed Silence Removed dataset. On the same dataset, the VGG-M model achieved an EER of 5%, and the GMM model achieved an EER of 2.13%. Our best model did not reach the current state of the art of 2.87% on the VoxCeleb1 verification dataset; however, our best ResNet50 implementation achieved an EER of 5.96% while being trained on only a small fraction of the data typically used. This result indicates that our model is robust and efficient, and that it leaves a significant margin for improvement. This thesis provides insights into the capabilities of these models in a real-world application, with the aim of deploying the system on a platform for practical use in laboratory access authorization.
The results of this study contribute to the field of biometric security by
demonstrating the potential of speaker recognition systems in controlled environments.
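The abstract above evaluates all three models with the Equal Error Rate. As a minimal sketch of how that metric is computed (the scores, labels, and threshold sweep below are illustrative, not taken from the thesis):

```python
# Hedged sketch: EER is the operating point where the false-acceptance
# rate (impostors accepted) equals the false-rejection rate (genuine
# speakers rejected). The score/label values here are made up.

def equal_error_rate(scores, labels):
    """EER given similarity scores and labels (1 = same speaker, 0 = different)."""
    pos = sum(labels)
    neg = len(labels) - pos
    best = (1.0, 1.0)  # (|FAR - FRR| gap, EER estimate)
    # Sweep every observed score as the decision threshold.
    for t in sorted(scores):
        fa = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        fr = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t)
        far, frr = fa / neg, fr / pos
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   1,   0,   1,   0,   0,   0]
print(equal_error_rate(scores, labels))  # -> 0.25
```

A lower EER means the acceptance and rejection error curves cross at a smaller error, which is why the 0.7% ResNet50 result above dominates the 5% and 2.13% figures.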
Towards Robust Plant Disease Diagnosis with Hard-sample Re-mining Strategy
With rich annotation information, object detection-based automated plant
disease diagnosis systems (e.g., YOLO-based systems) often provide advantages
over classification-based systems (e.g., EfficientNet-based), such as the
ability to detect disease locations and superior classification performance.
One drawback of these detection systems is dealing with unannotated healthy
data with no real symptoms present. In practice, healthy plant data appear to
be very similar to many disease data. Thus, those models often produce
mis-detected boxes on healthy images. In addition, labeling new data for
detection models is typically time-consuming. Hard-sample mining (HSM) is a
common technique for re-training a model by using the mis-detected boxes as new
training samples. However, blindly selecting an arbitrary amount of hard-sample
for re-training will result in the degradation of diagnostic performance for
other diseases due to the high similarity between disease and healthy data. In
this paper, we propose a simple but effective training strategy called
hard-sample re-mining (HSReM), which is designed to enhance the diagnostic
performance of healthy data and simultaneously improve the performance of
disease data by strategically selecting hard-sample training images at an
appropriate level. Experiments based on two practical in-field eight-class
cucumber and ten-class tomato datasets (42.7K and 35.6K images) show that our
HSReM training strategy leads to a substantial improvement in the overall
diagnostic performance on large-scale unseen data. Specifically, the object
detection model trained using the HSReM strategy not only achieved superior
results as compared to the classification-based state-of-the-art
EfficientNetV2-Large model and the original object detection model, but also
outperformed the model using the HSM strategy.
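The core idea above is to re-mine hard samples "at an appropriate level" rather than blindly. A minimal sketch of that selection step, assuming hypothetical detection records and an illustrative per-class cap (neither is specified by the paper):

```python
# Hedged sketch in the spirit of hard-sample re-mining: mis-detected
# boxes on healthy (symptom-free) images become new negative training
# samples, but a per-class cap keeps the re-training set from being
# dominated by one disease class. All values below are hypothetical.

def select_hard_samples(detections, cap_per_class=2, min_conf=0.5):
    """detections: (image_id, predicted_class, confidence) tuples from
    healthy images, i.e. every detection is a false positive."""
    # Most confident mistakes first -- they are the "hardest" samples.
    ranked = sorted(detections, key=lambda d: -d[2])
    picked, counts = [], {}
    for img, cls, conf in ranked:
        if conf < min_conf:
            continue
        if counts.get(cls, 0) < cap_per_class:  # strategic cap, not blind HSM
            picked.append((img, cls, conf))
            counts[cls] = counts.get(cls, 0) + 1
    return picked

dets = [("img1", "mildew", 0.9), ("img2", "mildew", 0.8),
        ("img3", "mildew", 0.7), ("img4", "blight", 0.6),
        ("img5", "blight", 0.4)]
print(len(select_hard_samples(dets)))  # -> 3 (cap and confidence filter applied)
```

Plain HSM would take all five boxes; the cap is one simple way to express the paper's point that an arbitrary amount of hard samples degrades performance on the disease classes.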
Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
Learning-based pattern classifiers, including deep networks, have shown
impressive performance in several application domains, ranging from computer
vision to cybersecurity. However, it has also been shown that adversarial input
perturbations carefully crafted either at training or at test time can easily
subvert their predictions. The vulnerability of machine learning to such wild
patterns (also referred to as adversarial examples), along with the design of
suitable countermeasures, have been investigated in the research field of
adversarial machine learning. In this work, we provide a thorough overview of
the evolution of this research area over the last ten years and beyond,
starting from pioneering, earlier work on the security of non-deep learning
algorithms up to more recent work aimed to understand the security properties
of deep learning algorithms, in the context of computer vision and
cybersecurity tasks. We report interesting connections between these
apparently-different lines of work, highlighting common misconceptions related
to the security evaluation of machine-learning algorithms. We review the main
threat models and attacks defined to this end, and discuss the main limitations
of current work, along with the corresponding future challenges towards the
design of more secure learning algorithms.
Comment: Accepted for publication on Pattern Recognition, 201
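The test-time perturbations surveyed above are often illustrated with a fast-gradient-sign step. As a minimal sketch against a toy linear classifier (the weights, inputs, and step size are invented for illustration, not drawn from the survey):

```python
# Hedged sketch of an FGSM-style test-time attack on a linear model
# score(x) = w.x + b with true label y in {-1, +1}. For this model the
# loss gradient w.r.t. each input feature has the sign of -y * w_i.

def fgsm_perturb(x, w, b, y, eps=0.25):
    """One fast-gradient-sign step: nudge every feature by +/- eps
    in the direction that increases the loss (hurts the true class)."""
    return [xi + eps * (1 if -y * wi > 0 else -1 if -y * wi < 0 else 0)
            for xi, wi in zip(x, w)]

def score(x, w, b):
    return sum(xi * wi for xi, wi in zip(x, w)) + b

w, b = [1.0, -2.0], 0.0
x, y = [1.0, 0.2], 1               # correctly classified: score = 0.6 > 0
x_adv = fgsm_perturb(x, w, b, y)   # small, bounded perturbation
print(score(x, w, b) > 0, score(x_adv, w, b) > 0)  # -> True False
```

The flipped prediction under an eps-bounded perturbation is exactly the "wild pattern" vulnerability the survey traces across a decade of work.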
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
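The review names log-mel spectra as the dominant feature representation. A minimal NumPy-only sketch of that pipeline, using typical parameter values (frame size, hop, and mel-band count are common defaults, not ones prescribed by the article):

```python
# Hedged sketch of the log-mel feature pipeline: frame the waveform,
# take per-frame power spectra, project onto a triangular mel
# filterbank, then log-compress. Parameters are illustrative.
import numpy as np

def log_mel(signal, sr=16000, n_fft=512, n_mels=8):
    hop = n_fft // 2
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (frames, bins)
    mel = lambda f: 2595 * np.log10(1 + f / 700)         # Hz -> mel
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)        # mel -> Hz
    edges = imel(np.linspace(0, mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))              # triangular filters
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return np.log(power @ fb.T + 1e-10)                  # (frames, n_mels)

t = np.arange(16000) / 16000
feats = log_mel(np.sin(2 * np.pi * 440 * t))  # 1 s of a 440 Hz tone
print(feats.shape)  # -> (61, 8)
```

The resulting (time, mel-band) matrix is what convolutional or LSTM models described in the review typically consume, while "raw waveform" models skip this stage entirely.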
Implicit Smartphone User Authentication with Sensors and Contextual Machine Learning
Authentication of smartphone users is important because a lot of sensitive
data is stored in the smartphone and the smartphone is also used to access
various cloud data and services. However, smartphones are easily stolen or
co-opted by an attacker. Beyond the initial login, it is highly desirable to
re-authenticate end-users who are continuing to access security-critical
services and data. Hence, this paper proposes a novel authentication system for
implicit, continuous authentication of the smartphone user based on behavioral
characteristics, by leveraging the sensors already ubiquitously built into
smartphones. We propose novel context-based authentication models to
differentiate the legitimate smartphone owner versus other users. We
systematically show how to achieve high authentication accuracy with different
design alternatives in sensor and feature selection, machine learning
techniques, context detection and multiple devices. Our system can achieve
excellent authentication performance with 98.1% accuracy with negligible system
overhead and less than 2.4% battery consumption.
Comment: Published at the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) 2017. arXiv admin note: substantial text overlap with arXiv:1703.0352
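The paper's implicit authentication compares sensor-derived behavior against a model of the legitimate owner. As a deliberately simplified sketch of that idea (a nearest-mean profile with a distance threshold stands in for the paper's learned context-specific models; all feature values are invented):

```python
# Hedged sketch of implicit, continuous authentication: windows of
# motion-sensor features are compared against the owner's profile.
# A real system would use the trained classifiers described in the
# paper; this mean-profile check is only illustrative.

def build_profile(samples):
    """Mean feature vector of the legitimate owner's sensor windows."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def authenticate(profile, features, threshold=1.0):
    """Accept when the current sensor window is close to the profile."""
    dist = sum((p - f) ** 2 for p, f in zip(profile, features)) ** 0.5
    return dist <= threshold

# Hypothetical 3-feature windows, e.g. accelerometer statistics.
owner_windows = [[0.1, 1.2, 9.8], [0.2, 1.0, 9.7], [0.0, 1.1, 9.9]]
profile = build_profile(owner_windows)
print(authenticate(profile, [0.1, 1.1, 9.8]),   # owner-like motion
      authenticate(profile, [3.0, 4.0, 2.0]))   # very different motion
```

Because the check runs on every new sensor window, it re-authenticates the user continuously after the initial login, which is the scenario the abstract motivates.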
Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus
The free-text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free-text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.