Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion
There are growing implications surrounding generative AI in the speech domain
that enable voice cloning and real-time voice conversion from one individual to
another. This technology poses a significant ethical threat and could lead to
breaches of privacy and misrepresentation, thus there is an urgent need for
real-time detection of AI-generated speech for DeepFake Voice Conversion. To
address the above emerging issues, the DEEP-VOICE dataset is generated in this
study, comprised of real human speech from eight well-known figures and their
speech converted to one another using Retrieval-based Voice Conversion.
The task is framed as a binary classification problem of whether the speech is
real or AI-generated. Statistical analysis of temporal audio features through
t-testing reveals significantly different distributions between the two
classes. Hyperparameter
optimisation is implemented for machine learning models to identify the source
of speech. Following the training of 208 individual machine learning models
over 10-fold cross validation, it is found that the Extreme Gradient Boosting
model can achieve an average classification accuracy of 99.3% and can classify
speech in real-time, at around 0.004 milliseconds given one second of speech.
All data generated for this study is released publicly for future research on
AI speech detection
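The t-testing step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which t-test variant was used, so Welch's unequal-variance test is assumed here, and the feature values are made up purely for illustration. The function name `welch_t` is hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test.

    Assumption: Welch's variant is used; the abstract only says "t-testing".
    """
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variances (n - 1 denominator), each divided by its sample size.
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up values of one hypothetical temporal audio feature per class.
real = [0.10, 0.12, 0.11, 0.13, 0.09]
fake = [0.20, 0.22, 0.19, 0.21, 0.23]

t, df = welch_t(real, fake)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t value relative to the degrees of freedom suggests that the feature is distributed differently in real and AI-generated speech. Turning it into a p-value would require a t-distribution CDF, which is left out of this sketch.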
British Sign Language Recognition via Late Fusion of Computer Vision and Leap Motion with Transfer Learning to American Sign Language
In this work, we show that a late fusion approach to multimodality in sign language recognition improves the overall ability of the model in comparison to the singular approaches of image classification (88.14%) and Leap Motion data classification (72.73%). With a large synchronous dataset of 18 BSL gestures collected from multiple subjects, two deep neural networks are benchmarked and compared to derive a best topology for each. The Vision model is implemented by a Convolutional Neural Network and optimised Artificial Neural Network, and the Leap Motion model is implemented by an evolutionary search of Artificial Neural Network topology. Next, the two best networks are fused for synchronised processing, which results in a better overall result (94.44%) as complementary features are learnt in addition to the original task. The hypothesis is further supported by application of the three models to a set of completely unseen data where a multimodality approach achieves the best results relative to the single sensor method. When transfer learning with the weights trained via British Sign Language, all three models outperform standard random weight distribution when classifying American Sign Language (ASL), and the best model overall for ASL classification was the transfer learning multimodality approach, which scored 82.55% accuracy
A Deep Evolutionary Approach to Bioinspired Classifier Optimisation for Brain-Machine Interaction
This study suggests a new approach to EEG data classification by exploring the idea of using evolutionary computation to both select useful discriminative EEG features and optimise the topology of Artificial Neural Networks. An evolutionary algorithm is applied to select the most informative features from an initial set of 2550 EEG statistical features. Optimisation of a Multilayer Perceptron (MLP) is performed with an evolutionary approach before classification to estimate the best hyperparameters of the network. Deep learning and tuning with Long Short-Term Memory (LSTM) are also explored, and Adaptive Boosting of the two types of models is tested for each problem. Three experiments are provided for comparison using different classifiers: One for attention state classification, one for emotional sentiment classification, and a third experiment in which the goal is to guess the number a subject is thinking of. The obtained results show that an Adaptive Boosted LSTM can achieve an accuracy of 84.44%, 97.06%, and 9.94% on the attentional, emotional, and number datasets, respectively. An evolutionary-optimised MLP achieves results close to the Adaptive Boosted LSTM for the two first experiments and significantly higher for the number-guessing experiment with an Adaptive Boosted DEvo MLP reaching 31.35%, while being significantly quicker to train and classify. In particular, the accuracy of the nonboosted DEvo MLP was of 79.81%, 96.11%, and 27.07% in the same benchmarks. Two datasets for the experiments were gathered using a Muse EEG headband with four electrodes corresponding to TP9, AF7, AF8, and TP10 locations of the international EEG placement standard. The EEG MindBigData digits dataset was gathered from the TP9, FP1, FP2, and TP10 locations
Cross-domain MLP and CNN Transfer Learning for Biological Signal Processing: EEG and EMG
In this work, we show the success of unsupervised transfer learning between Electroencephalographic (brainwave) classification and Electromyographic (muscular wave) domains with both MLP and CNN methods. To achieve this, signals are measured from both the brain and forearm muscles: EMG data is gathered from a 4-class gesture classification experiment via the Myo Armband, and a 3-class mental state EEG dataset is acquired via the Muse EEG Headband. A hyperheuristic multi-objective evolutionary search method is used to find the best network hyperparameters. We then use this optimised topology of deep neural network to classify both EMG and EEG signals, attaining results of 84.76% and 62.37% accuracy, respectively. Next, when pre-trained weights from the EMG classification model are used for initial distribution rather than random weight initialisation for EEG classification, 93.82% (+29.95) accuracy is reached. When EEG pre-trained weights are used for initial weight distribution for EMG, 85.12% (+0.36) accuracy is achieved. When the EMG network attempts to classify EEG, it outperforms the EEG network even without any training (+30.25% to 82.39% at epoch 0), and similarly the EEG network attempting to classify EMG data outperforms the EMG network (+2.38% at epoch 0). All transfer networks achieve higher pre-training abilities, curves, and asymptotes, indicating that knowledge transfer is possible between the two signal domains. In a second experiment with CNN transfer learning, the same datasets are projected as 2D images and the same learning process is carried out. In the CNN experiment, EMG to EEG transfer learning is found to be successful but not vice versa, although EEG to EMG transfer learning did exhibit a higher starting classification accuracy.
The significance of this work is due to the successful transfer of ability between models trained on two different biological signal domains, reducing the need for building more computationally complex models in future research
Thumbs up, thumbs down: non-verbal human-robot interaction through real-time EMG classification via inductive and supervised transductive transfer learning
In this study, we present a transfer learning method for gesture classification via an inductive and supervised transductive approach with an electromyographic dataset gathered via the Myo armband. A ternary gesture classification problem is presented by states of ‘thumbs up’, ‘thumbs down’, and ‘relax’ in order to communicate in the affirmative or negative in a non-verbal fashion to a machine. Of the nine statistical learning paradigms benchmarked over 10-fold cross validation (with three methods of feature selection), an ensemble of Random Forest and Support Vector Machine through voting achieves the best score of 91.74% with a rule-based feature selection method. When new subjects are considered, this machine learning approach fails to generalise to new data, and thus the processes of Inductive and Supervised Transductive Transfer Learning are introduced with a short calibration exercise (15 s). The failure to generalise shows that 5 s of data per class is the strongest for classification (versus one through seven seconds), with an accuracy of only 55%, but when a short 5 s per class calibration task is introduced via the suggested transfer method, a Random Forest can then classify unseen data from the calibrated subject at an accuracy of around 97%, outperforming the 83% accuracy boasted by the proprietary Myo system. Finally, a preliminary application is presented through social interaction with a humanoid Pepper robot, where the use of our approach and a most-common-class metaclassifier achieves 100% accuracy for all trials of a ‘20 Questions’ game
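The most-common-class metaclassifier mentioned above can be sketched as a majority vote over a run of per-window predictions. How the authors window the EMG stream and break ties is not stated in the abstract, so the function name, the label strings, and the tie-breaking rule below are assumptions.

```python
from collections import Counter


def most_common_class(predictions):
    """Return the label that occurs most often in a sequence of predictions.

    Assumption: ties are resolved in favour of the label seen first, which is
    how Counter.most_common orders equal counts in Python 3.7+.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-window outputs of a base classifier during one answer.
window_predictions = ["thumbs_up", "relax", "thumbs_up", "thumbs_down", "thumbs_up"]
print(most_common_class(window_predictions))
```

Voting over several windows smooths out isolated misclassifications by the base model, which is one plausible reason such a metaclassifier helps in an interactive setting.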
Towards AI-based interactive game intervention to monitor concentration levels in children with attention deficit
Preliminary results of a new approach to neurocognitive training on academic engagement and monitoring of attention levels in children with learning difficulties are presented. Machine Learning (ML) techniques and a Brain-Computer Interface (BCI) are used to develop an interactive AI-based game for educational therapy to monitor the progress of children’s concentration levels during specific cognitive tasks. Our approach acquires brainwave data from children using electroencephalography (EEG) to classify concentration levels through model calibration. The real-time brainwave patterns are inputs to our game interface to monitor concentration levels. When concentration drops, the educational game can adapt to the user by changing the difficulty of the training or by providing new visual or auditory stimuli in order to reduce the attention loss. To understand concentration level patterns, we collected brainwave data from children at various primary schools in Brazil who have intellectual disabilities, e.g. autism spectrum disorder and attention deficit hyperactivity disorder. Preliminary results show that we successfully benchmarked (96%) the brainwave patterns acquired by using various classical ML techniques. The result obtained through the automatic classification of brainwaves will be fundamental to further develop our full approach. Positive feedback from questionnaires was obtained for both the AI-based game and the engagement and motivation during the training sessions
Country-level pandemic risk and preparedness classification based on COVID-19 data: A machine learning approach
In this work we present a three-stage Machine Learning strategy for country-level risk classification based on countries that are reporting COVID-19 information. A K% binning discretisation (K = 25) is used to create four risk groups of countries based on the risk of transmission (coronavirus cases per million population), risk of mortality (coronavirus deaths per million population), and risk of inability to test (coronavirus tests per million population). The four risk groups produced by K% binning are labelled as ‘low’, ‘medium-low’, ‘medium-high’, and ‘high’. Coronavirus-related data are then removed and the attributes for prediction of the three types of risk are given as the geopolitical and demographic data describing each country. Thus, the calculation of class label is based on coronavirus data but the input attributes are country-level information regardless of coronavirus data. The three four-class classification problems are then explored and benchmarked through leave-one-country-out cross validation to find the strongest model, producing a Stack of Gradient Boosting and Decision Tree algorithms for risk of transmission, a Stack of Support Vector Machine and Extra Trees for risk of mortality, and a Gradient Boosting algorithm for the risk of inability to test. It is noted that high risk for inability to test is often coupled with low risks for transmission and mortality, therefore the risk of inability to test should be interpreted first, before consideration is given to the predicted transmission and mortality risks. Finally, the approach is applied to more recent risk levels using data from September 2020, and weaker results are noted because the growth of international collaboration reduces the useful knowledge carried by country-level attributes, which suggests that similar machine learning approaches are more useful before a situation has fully unfolded
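The K% binning step (K = 25) can be sketched as an equal-frequency split by rank, which yields four groups of roughly equal size. This reading of "K% binning" is an assumption, as are the function name and the example figures. The sketch also assumes that larger values mean higher risk, which would need to be inverted for tests per million.

```python
def k_percent_bins(values, k=25,
                   labels=("low", "medium-low", "medium-high", "high")):
    """Assign each value to a rank-based bin covering k% of the data.

    Assumption: "K% binning" means equal-frequency binning, so K = 25
    produces 100 / 25 = 4 groups.
    """
    n_bins = 100 // k
    if n_bins != len(labels):
        raise ValueError("number of labels must equal 100 // k")
    # Indices of the values sorted from smallest to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [None] * len(values)
    for rank, idx in enumerate(order):
        # Map the rank (0 .. n-1) onto a bin index (0 .. n_bins-1).
        result[idx] = labels[rank * n_bins // len(values)]
    return result


# Made-up cases-per-million figures for eight hypothetical countries.
cases_per_million = [5, 100, 20, 300, 50, 10, 700, 150]
print(k_percent_bins(cases_per_million))
```

The resulting labels would serve as the class targets, while the model inputs are the geopolitical and demographic attributes described in the abstract.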
Chatbot Interaction with Artificial Intelligence: human data augmentation with T5 and language transformer ensemble for text classification
In this work we present the Chatbot Interaction with Artificial Intelligence (CI-AI) framework as an approach to the training of a transformer based chatbot-like architecture for task classification with a focus on natural human interaction with a machine as opposed to interfaces, code, or formal commands. The intelligent system augments human-sourced data via artificial paraphrasing in order to generate a large set of training data for further classical, attention, and language transformation-based learning approaches for Natural Language Processing (NLP). Human beings are asked to paraphrase commands and questions for task identification for further execution of algorithms as skills. The commands and questions are split into training and validation sets. A total of 483 responses were recorded. Secondly, the training set is paraphrased by the T5 model in order to augment it with further data. Seven state-of-the-art transformer-based text classification algorithms (BERT, DistilBERT, RoBERTa, DistilRoBERTa, XLM, XLM-RoBERTa, and XLNet) are benchmarked for both sets after fine-tuning on the training data for two epochs. We find that all models are improved when training data is augmented by the T5 model, with an average increase of classification accuracy by 4.01%. The best result was the RoBERTa model trained on T5 augmented data which achieved 98.96% classification accuracy. Finally, we found that an ensemble of the five best-performing transformer models via Logistic Regression of output label predictions led to an accuracy of 99.59% on the dataset of human responses. A highly-performing model allows the intelligent system to interpret human commands at the social-interaction level through a chatbot-like interface (e.g. “Robot, can we have a conversation?”) and allows for better accessibility to AI by non-technical users