675 research outputs found

    Utilising Tree-Based Ensemble Learning for Speaker Segmentation

    Part 2: Learning-Ensemble Learning. International audience. In audio and speech processing, accurate detection of the change points between multiple speakers in speech segments is an important stage for several applications such as speaker identification and tracking. Bayesian Information Criterion (BIC)-based approaches are the most traditionally used, as they have proved very effective for this task. The main criticism levelled against BIC-based approaches is the use of a penalty parameter in the BIC function. Using this parameter means that fine tuning is required for each variation of the acoustic conditions. When tuned for a certain condition, the model becomes biased towards the data used for training, which limits its generalisation ability. In this paper, we propose a BIC-based tuning-free approach for speaker segmentation through the use of ensemble-based learning. A forest of segmentation trees is constructed in which each tree is trained using a sampled version of the speech segment. During the tree construction process, a set of randomly selected points in the input sequence is examined as potential segmentation points. The point that yields the highest ΔBIC is chosen and the same process is repeated for the resultant left and right segments. The tree is constructed so that each node corresponds to the highest ΔBIC with the associated point index. After building the forest and using all trees, the accumulated ΔBIC for each point is calculated and the positions of the local maxima are considered as speaker change points. The proposed approach is tested on artificially created conversations from the TIMIT database. The proposed approach shows very accurate results, comparable to those achieved by state-of-the-art methods, with a 9% (absolute) higher F1 than the standard ΔBIC with an optimally tuned penalty parameter.
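
The split-selection step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the ΔBIC formula, so the code below assumes the textbook single-Gaussian form on one-dimensional features; the names `delta_bic`, `best_split` and the penalty weight `lam` are hypothetical, and `lam` stands for the tuned penalty parameter that the paper aims to avoid. It is not the authors' implementation.

```python
import math

def variance(xs):
    """Maximum-likelihood variance, floored to keep the log finite."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-12)

def delta_bic(xs, i, lam=1.0):
    """Textbook Delta-BIC for splitting xs at index i (assumed 1-D Gaussian model).

    Positive values favour two separate segments over a single one.
    """
    n, n1, n2 = len(xs), i, len(xs) - i
    # A 1-D Gaussian has 2 free parameters (mean, variance): penalty = lam * 0.5 * 2 * log(n).
    penalty = lam * 0.5 * 2 * math.log(n)
    gain = 0.5 * (n * math.log(variance(xs))
                  - n1 * math.log(variance(xs[:i]))
                  - n2 * math.log(variance(xs[i:])))
    return gain - penalty

def best_split(xs, candidates, lam=1.0):
    """Return the candidate index with the highest Delta-BIC."""
    return max(candidates, key=lambda i: delta_bic(xs, i, lam))

# Synthetic sequence: two "speakers" with different means, change at index 100.
signal = [(-1) ** k for k in range(100)] + [5 + (-1) ** k for k in range(100)]
split = best_split(signal, range(2, len(signal) - 1))
```

In the approach the abstract describes, the candidates would be a random subset of points, the search would recurse into the left and right segments, and the ΔBIC values would be accumulated over many trees; this sketch covers only a single split.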

    Optimal feature selection and machine learning for high-level audio classification : a random forests approach

    Content-related information, metadata, and semantics can be extracted from the soundtracks of multimedia files. Speech recognition, music information retrieval and environmental sound detection techniques have developed into a fairly mature technology, enabling a final text mining process to obtain semantics for the audio scene. An efficient speech, music and environmental sound classification system, which correctly identifies these three types of audio signals and feeds them into dedicated recognisers, is a critical pre-processing stage for such a content analysis system. The performance and computational efficiency of such a system depend predominantly on the selected features. This thesis presents a detailed study to identify suitable classification features and to associate a suitable machine learning technique with the intended classification task. In particular, a systematic feature selection procedure is developed that employs the random forests classifier to rank the features according to their importance and reduces the dimensionality of the feature space accordingly. This new technique avoids the trial-and-error approach used by many researchers. The implemented feature selection produces results related to individual classification tasks, unlike the commonly used approaches based on statistical distance criteria, which do not consider the intended classification task; this makes it more suitable for supervised learning with specific purposes. A final collective decision-making stage is employed to combine the patterns of multiple class detectors into one, producing a single classification result for each input frame. The performance of the proposed feature selection technique has been compared with the techniques proposed by the MPEG-7 standard for extracting the reduced feature space. The results show a significant improvement in the resulting classification accuracy; at the same time, the feature space is simplified and the computational overhead reduced.
The proposed feature selection and machine learning technique enables the use of only 30 of the 47 features without degrading the classification accuracy, and the classification accuracy drops by only 1.7% when just 10 features are used. The validation also shows good performance, and the last stage of collective decision making was able to improve the classification result even after selecting only a small number of classification features. The work represents a successful attempt to determine audio feature importance and to classify audio content into speech, music and environmental sound using a selected feature subset. The result shows a high degree of accuracy when random forests are used for both feature importance ranking and audio content classification.

    Cluster validity in clustering methods

    Robust cepstral feature for bird sound classification

    Birds are excellent environmental indicators and may indicate the sustainability of the ecosystem; birds may be used to provide provisioning, regulating, and supporting services. Therefore, research related to birdlife conservation always takes centre stage. Due to the airborne nature of birds and the dense nature of the tropical forest, identifying birds through audio may be a better solution than visual identification. The goal of this study is to find the most appropriate cepstral features for classifying bird sounds more accurately. Fifteen (15) endemic Bornean bird sounds have been selected and segmented using an automated energy-based algorithm. Three (3) types of cepstral features are extracted: linear prediction cepstrum coefficients (LPCC), mel frequency cepstral coefficients (MFCC), and gammatone frequency cepstral coefficients (GTCC). Each is used separately for classification with a support vector machine (SVM). A comparison of their prediction results demonstrates that the model utilising GTCC features, with 93.3% accuracy, outperforms the models utilising MFCC and LPCC features. This demonstrates the robustness of GTCC for bird sound classification. The result is significant for the advancement of bird sound classification research, which has been shown to have many applications, such as in eco-tourism and wildlife management.

    Statistical Measures to Determine Optimal Structure of Decision Tree: One versus One Support Vector Machine

    In this paper, a one versus one optimal decision tree support vector machine (OvO-ODT SVM) framework is proposed to solve multi-class problems, where the optimal structure of the decision tree is determined using statistical measures, i.e., information gain, gini index, and chi-square. The performance of the proposed OvO-ODT SVM is evaluated in terms of classification accuracy and computation time. It is also shown that the proposed OvO-ODT SVM, using any of the three measures, is more efficient in terms of time complexity for both the training and testing phases than the conventional OvO and the support vector machine binary decision tree (SVM-BDT). Experiments on University of California, Irvine (UCI) repository datasets illustrate that the ten-fold cross-validation accuracy of the proposed framework is comparable to or better than that of the conventional OvO and SVM-BDT for most of the datasets. Moreover, the proposed framework outperforms the conventional OvO and SVM-BDT on all the datasets in terms of both training and testing time. Defence Science Journal, 2010, 60(4), pp.399-404, DOI:http://dx.doi.org/10.14429/dsj.60.50
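
Two of the statistical measures named in this abstract, information gain and the gini index, can be sketched in a few lines. The abstract does not say how the measures are applied to choose the tree structure, so the code below shows only the standard definitions over lists of class labels; the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A balanced two-class node that a split separates perfectly.
parent = ["a", "a", "b", "b"]
gain = information_gain(parent, ["a", "a"], ["b", "b"])
```

A split that separates the classes perfectly gains the full parent entropy, while a split that leaves both children as mixed as the parent gains nothing.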

    Transparent Authentication Utilising Gait Recognition

    Securing smartphones has become increasingly necessary due to their massive popularity, their significant storage, and their access to sensitive information. The gatekeeper for securing the device is user authentication. Amongst the many solutions proposed, gait recognition has been suggested as a reliable yet non-intrusive authentication approach, enabling both security and usability. While several studies have explored mobile-based gait recognition, they have been mainly preliminary, with various methodological restrictions that have limited the number of participants, samples, and types of features; in addition, prior studies have depended on limited datasets, controlled experimental environments, and many activities. They suffered from the absence of real-world datasets, which led to individuals being verified incorrectly. This thesis has sought to overcome these weaknesses and provide a comprehensive evaluation, including an analysis of smartphone-based motion sensors (accelerometer and gyroscope) and an understanding of the variability of feature vectors during differing activities across a multi-day collection involving 60 participants. This was framed as two experiments involving five types of activities: standard, fast, with a bag, downstairs, and upstairs walking. The first experiment explores classification performance in order to understand whether a single classifier or a multi-algorithmic approach would provide a better level of performance. The second experiment investigates the feature vector (comprising a possible 304 unique features) to understand how its composition affects performance; for comparison, a more specific set of minimal features is also considered. Performance on the controlled dataset exceeded that of prior work using same-day and cross-day methodologies (e.g., for the regular walk activity, the best results were EERs of 0.70% and 6.30% for the same-day and cross-day scenarios respectively).
Moreover, the multi-algorithmic approach achieved a significant improvement over the single classifier approach and is thus a more practical way of managing the problem of feature vector variability. An activity recognition model was applied to the real-life gait dataset, which contains a larger number of gait samples collected from 44 users (7-10 days for each user). A human physical motion activity identification model was built to classify a given individual's activity signal into the predefined class it belongs to. As such, the thesis implemented a novel real-world gait recognition system that recognises the subject using a smartphone-based real-world dataset. It also investigates whether these authentication technologies can recognise the genuine user and reject an imposter. The results of the real-dataset experiment offer a promising level of security, particularly when majority voting techniques were applied. In addition, the proposed multi-algorithmic approach seems to be more reliable and tends to perform relatively well in practice on real live user data, yielding an improved multi-activity model with regard to the security and transparency of the system within a smartphone. Overall, results from the experimentation have shown an EER of 7.45% for a single classifier (all-activities dataset). The multi-algorithmic approach achieved EERs of 5.31%, 6.43% and 5.87% for normal, fast, and normal and fast walk respectively using both accelerometer and gyroscope-based features, showing a significant improvement over the single classifier approach. Ultimately, the evaluation of the smartphone-based gait authentication system over a long period of time under realistic scenarios has revealed that it could provide a secure and appropriate activity identification and user authentication system.
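
The results above are reported as equal error rates (EER). As a reading aid, the sketch below shows one common way to estimate an EER from match scores: sweep a decision threshold and take the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest. The thesis abstract does not describe its scoring procedure, so the score lists, the "accept if score >= threshold" rule, and the function names are assumptions.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when a sample is accepted if its score >= threshold."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Estimate the EER as the mean of FAR and FRR where their gap is smallest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Hypothetical similarity scores: one impostor scores above one genuine sample.
genuine_scores = [0.9, 0.8, 0.7, 0.6]
impostor_scores = [0.1, 0.2, 0.3, 0.65]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With these hypothetical scores the estimate is 0.25, that is, a 25% EER; perfectly separated score lists give 0.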

    Involving machine learning techniques in heart disease diagnosis: a performance analysis

    Artificial intelligence is a rapidly growing science that has become an essential part of many domains, including the medical domain. Many artificial intelligence applications can therefore be seen in the medical domain at various levels, where they are employed to enhance early diagnosis and prediction and to reduce the risks associated with many diseases, including heart disease. In this article, machine learning techniques (logistic regression, random forest, artificial neural network, support vector machines, and k-nearest neighbors) are utilized to diagnose heart disease from the Cleveland Clinic dataset, obtained from the University of California Irvine (UCI) machine learning repository and the Kaggle platform, and the performance of these techniques is compared. In addition, the article reviews literature on machine learning and deep learning techniques that aim to provide reasonable solutions for monitoring, detecting, diagnosing, and predicting heart disease, and on how these technologies assist in making health decisions. Ten studies published between 2017 and 2022 are selected and summarized by the authors. After a series of tests, the best performance in diagnosing heart disease is achieved by support vector machines, with a diagnostic accuracy of 96%. The article concludes that these techniques play a significant and influential role in assisting physicians and health care workers in analyzing heart patients' data, making health decisions, and saving patients' lives.