1,049 research outputs found

    Boosting parallel perceptrons for label noise reduction in classification problems

    The final publication is available at Springer via http://dx.doi.org/10.1007/11499305_60

    Proceedings of the First International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2005, Las Palmas, Canary Islands, Spain, June 15-18, 2005.

    Boosting combines an ensemble of weak learners to construct a new weighted classifier that is often more accurate than any of its components. Because each learner's training set depends on the performance of the previous members of the ensemble, the construction successively focuses on the patterns that are hardest to classify. This degrades boosting's results in the presence of malicious noise such as mislabeled training examples. To detect and avoid such noisy examples during learning, we propose the use of Parallel Perceptrons. Among other things, these novel machines allow margins to be defined naturally for hidden unit activations. We use these margins to detect which patterns may have an incorrect label and which are safe, in the sense of being well represented in the training sample by many other similar patterns. We reduce the weights of the former, as candidates for noisy examples, and increase the weights of the latter, as support for the overall detection procedure.

    With partial support of Spain's CICyT, TIC 01–572, TIN 2004–0767.
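    The margin-based reweighting idea can be illustrated with a small sketch. It is not the authors' procedure: it assumes a parallel perceptron whose output is the sign of the sum of its perceptrons' signed outputs, a fixed margin gamma, and illustrative multiplicative factors shrink and grow, and it treats "all activations clear the margin" as a stand-in for the paper's notions of confident and safe patterns. It shows only the noise-handling adjustment that would sit on top of an ordinary boosting update.

```python
from typing import List, Sequence

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of one perceptron's weight vector and a pattern."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(v: float) -> int:
    return 1 if v >= 0 else -1

def pp_output(weights: List[List[float]], x: Sequence[float]) -> int:
    """Assumed parallel perceptron output: sign of the sum of the individual outputs."""
    return sign(sum(sign(dot(w, x)) for w in weights))

def reweight(weights, X, y, dist, gamma=0.5, shrink=0.5, grow=1.5):
    """Return a renormalized distribution over training patterns.

    gamma, shrink and grow are illustrative values, not taken from the paper.
    """
    new = []
    for x, label, d in zip(X, y, dist):
        acts = [dot(w, x) for w in weights]
        confident = all(abs(a) >= gamma for a in acts)  # every activation clears the margin
        if confident and pp_output(weights, x) != label:
            d *= shrink  # confidently contradicted by its label: candidate noisy example
        elif confident:
            d *= grow    # confidently and correctly classified: treated as safe
        new.append(d)    # patterns near the decision boundary keep their weight
    total = sum(new)
    return [d / total for d in new]
```

    With three perceptrons and patterns [2, 0] labeled +1, [2, 0] labeled -1, and [0.1, 0] labeled +1 under a uniform distribution, the first weight grows, the second (likely mislabeled) shrinks, and the third, which lies inside the margin, is left unchanged before renormalization.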

    A neural network approach to audio-assisted movie dialogue detection

    A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function specifies whether an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks are tested, including multilayer perceptrons, voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based multilayer perceptrons. Experiments validate the feasibility of the approach using ground-truth indicator functions determined by human observers on 6 different movies. A total of 41 dialogue instances and 20 non-dialogue instances are employed. The average detection accuracy is high, ranging from 84.78% ± 5.499% to 91.43% ± 4.239%.
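    The two input features can be computed as in the following sketch. It assumes indicator functions sampled as equal-length 0/1 sequences, a biased cross-correlation estimate over lags -(n-1) to n-1, and a cross-power spectral density obtained as the discrete Fourier transform of that cross-correlation (Wiener-Khinchin); the paper's exact estimators and normalization may differ.

```python
import cmath
from typing import List, Sequence

def cross_correlation(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Biased estimate r[lag] = (1/n) * sum_t a[t] * b[t + lag], lags -(n-1)..(n-1)."""
    n = len(a)
    out = []
    for lag in range(-(n - 1), n):
        s = 0.0
        for t in range(n):
            if 0 <= t + lag < n:
                s += a[t] * b[t + lag]
        out.append(s / n)
    return out

def cpsd_magnitude(r: Sequence[float]) -> List[float]:
    """Magnitude of the DFT of the cross-correlation sequence.

    Starting the lag axis at -(n-1) instead of 0 only adds a linear phase,
    so the magnitude does not depend on where the lag origin is placed.
    """
    m = len(r)
    return [abs(sum(r[k] * cmath.exp(-2j * cmath.pi * f * k / m) for k in range(m)))
            for f in range(m)]
```

    For example, if actor B's indicator is actor A's delayed by two samples (a = [1, 1, 0, 0], b = [0, 0, 1, 1]), the cross-correlation peaks at lag 2, which is the kind of turn-taking pattern a dialogue classifier could exploit.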

    Diagnosis of Parkinson’s Disease by Boosted Neural Networks

    A boosting-by-filtering technique for back-propagation neural network systems, combined with a majority voting scheme, is presented in this paper. Previous research on predicting the presence of Parkinson's Disease has shown accuracy rates of up to 92.9% [1], but at the cost of reduced prediction accuracy on the minority class. The neural network system boosted by filtering presented here shows a significant increase in robustness, and majority voting over the parallel networks yields recognition rates above 90% on a Parkinson's Disease data set with an imbalanced 3:1 class distribution.
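    The following sketch shows the classic three-learner form of boosting by filtering with a majority vote. To keep it short and self-contained it swaps in a one-feature decision stump for the paper's back-propagation networks, assumes labels in {-1, +1}, and draws examples from a finite pool with a seeded random generator; the helper names are hypothetical. It also fills the second learner's set with an equal split of misclassified and correctly classified examples rather than a per-draw coin flip.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[float, int]          # (feature, label in {-1, +1})
Classifier = Callable[[float], int]

def train_stump(data: List[Example]) -> Classifier:
    """Weak learner (stand-in for a neural network): best threshold and polarity."""
    best = None
    for thr in sorted({x for x, _ in data}):
        for pol in (1, -1):
            err = sum(1 for x, y in data if (pol if x >= thr else -pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    _, thr, pol = best
    return lambda x: pol if x >= thr else -pol

def sample_where(rng: random.Random, pool: List[Example],
                 keep: Callable[[Example], bool], n: int,
                 max_draws: int = 10_000) -> List[Example]:
    """Filter a stream of random draws, keeping up to n examples that satisfy keep."""
    out: List[Example] = []
    for _ in range(max_draws):
        if len(out) == n:
            break
        ex = rng.choice(pool)
        if keep(ex):
            out.append(ex)
    return out

def boost_by_filtering(pool: List[Example], n: int, seed: int = 0) -> Classifier:
    rng = random.Random(seed)
    # Learner 1: an unfiltered sample.
    h1 = train_stump([rng.choice(pool) for _ in range(n)])
    # Learner 2: about half of its examples are ones h1 gets wrong.
    wrong = sample_where(rng, pool, lambda e: h1(e[0]) != e[1], n // 2)
    right = sample_where(rng, pool, lambda e: h1(e[0]) == e[1], n - len(wrong))
    h2 = train_stump(wrong + right)
    # Learner 3: only examples on which h1 and h2 disagree.
    disagree = sample_where(rng, pool, lambda e: h1(e[0]) != h2(e[0]), n)
    h3 = train_stump(disagree) if disagree else h1  # if they never disagree, h3 is irrelevant
    # Majority vote: the sum of three +/-1 votes is never zero.
    return lambda x: 1 if h1(x) + h2(x) + h3(x) > 0 else -1
```

    Because each later learner only sees examples the earlier ones find difficult, the majority vote can correct individual mistakes; the same structure applies when the stumps are replaced by networks.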

    Development of an R package to learn supervised classification techniques

    This bachelor's thesis (TFG) aims to develop a custom R package for teaching supervised classification algorithms, starting with the identification of requirements, including algorithms, data structures, and libraries. A strong theoretical foundation is essential for effective package design. The documentation will explain the purpose of each function and will be accompanied by the necessary supporting paperwork. The package will organize its R scripts and data files in directories and will be complemented by a user manual that makes installation and usage easy, even for beginners. Built entirely from scratch without external dependencies, it is optimized for accuracy and performance. In conclusion, this TFG provides a roadmap for creating an R package to teach supervised classification algorithms, benefiting researchers and practitioners dealing with real-world challenges.

    Grado en Ingeniería Informática (Bachelor's Degree in Computer Engineering)

    Diagnosis of Parkinson’s Disease using Principal Component Analysis and Boosting Committee Machines

    Parkinson's disease (PD) has become one of the most common degenerative disorders of the central nervous system. In this study, our main goal was to discriminate between healthy people and people with Parkinson's disease. To achieve this, we used artificial neural networks and a dataset taken from the University of California, Irvine machine learning database, containing 48 normal and 147 PD cases. We examine the performance of back-propagation neural network systems combined with a majority voting scheme. To train the networks, we used the boosting-by-filtering technique with seven committee machines, and principal component analysis for data reduction. The experimental results show that the combination of these methods achieves very good results, with a correct positive value of 92% in the classification of PD.
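    The data-reduction step can be sketched with principal component analysis computed by power iteration and deflation on the sample covariance matrix. This is a generic PCA sketch, not the authors' code: the number of retained components k, the iteration count, and the random start vector are assumptions, and the sign of each component (and hence of the scores) is arbitrary.

```python
import random
from typing import List

Matrix = List[List[float]]

def pca_scores(X: Matrix, k: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Project the rows of X onto their top-k principal components."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]           # center each feature
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]   # sample covariance
           for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]                     # random start vector
        for _ in range(iters):                                          # power iteration
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(c * c for c in w) ** 0.5
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))  # eigenvalue estimate
        comps.append(v)
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]       # deflate found component
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in Xc]
```

    For points lying on the line y = 2x, a single component captures all the variance and each point's score is its position along that line, so the reduced features could then be passed to the committee machines.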

    Doctor of Philosophy

    Machine learning is the science of building predictive models from data that automatically improve based on past experience. To learn these models, traditional learning algorithms require labeled data. They also require that the entire dataset fit in the memory of a single machine. Labeled data are available or can be acquired for small and moderately sized datasets, but curating large datasets can be prohibitively expensive. Similarly, massive datasets are usually too large to fit into the memory of a single machine. An alternative is to distribute the dataset over multiple machines. Distributed learning, however, poses new challenges, as most existing machine learning techniques are inherently sequential. Additionally, these distributed approaches must be designed with the resource limitations of real-world settings in mind, chief among them intermachine communication. With the advent of big datasets, machine learning algorithms face new challenges. Their design is no longer limited to minimizing some loss function but must also consider other resources that are critical when learning at scale. In this thesis, we explore different models and measures for learning with limited resources that have a budget. What budgetary constraints are posed by modern datasets? Can we reuse or combine existing machine learning paradigms to address these challenges at scale? How do the cost metrics change when we shift to distributed models of learning? These are some of the questions investigated in this thesis. The answers to these questions hold the key to addressing some of the challenges faced when learning on massive datasets.

    In the first part of this thesis, we present three different budgeted scenarios that deal with scarcity of labeled data and limited computational resources. The goal is to leverage information transferred from related domains to learn under budgetary constraints. Our proposed techniques comprise semi-supervised transfer, online transfer, and active transfer. In the second part of this thesis, we study distributed learning with limited communication. We present initial sampling-based results and propose communication protocols for learning distributed linear classifiers.
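    To make "learning distributed linear classifiers with limited communication" concrete, the sketch below shows a simple one-shot baseline: each node trains a perceptron on its own partition and sends a single weight vector, and a coordinator averages them. This is an illustrative assumption, not one of the thesis's protocols; it is included only to show how communication can be counted in words sent, here k(d+1) values for k nodes and d features, rather than the full dataset. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (features, label in {-1, +1})

def perceptron(data: Sequence[Example], d: int, epochs: int = 20) -> List[float]:
    """Local training on one node; the last weight is the bias."""
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        for x, y in data:
            xb = list(x) + [1.0]
            if y * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:  # mistake-driven update
                w = [wi + y * xi for wi, xi in zip(w, xb)]
    return w

def one_shot_average(partitions: Sequence[Sequence[Example]], d: int):
    """Each node sends one weight vector; the coordinator averages them.

    Returns the averaged weights and the number of floats communicated.
    """
    local = [perceptron(p, d) for p in partitions]
    avg = [sum(ws) / len(local) for ws in zip(*local)]
    words_sent = len(local) * (d + 1)
    return avg, words_sent

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0 else -1
```

    Averaging is cheap in communication but carries no general accuracy guarantee, which is one reason more careful protocols, such as the sampling-based and interactive schemes the thesis studies, are of interest.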

    An Introduction to Machine Learning (2nd Edition)
