
    A Convex Relaxation for Weakly Supervised Classifiers

    This paper introduces a general multi-class approach to weakly supervised classification. Inferring the labels and learning the parameters of the model are usually done jointly through a block-coordinate descent algorithm such as expectation-maximization (EM), which may lead to local minima. To avoid this problem, we propose a cost function based on a convex relaxation of the soft-max loss. We then propose an algorithm specifically designed to efficiently solve the corresponding semidefinite program (SDP). Empirically, our method compares favorably to standard ones on different datasets for multiple instance learning and semi-supervised learning, as well as on clustering tasks. Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
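
    The block-coordinate alternation that the paper contrasts with its convex relaxation can be illustrated with a minimal sketch (a hypothetical EM-style baseline, not the authors' SDP method; the data names X_lab, y_lab, and X_unlab are assumptions):

        # Hypothetical sketch of the EM-style block-coordinate baseline the paper
        # contrasts with its convex relaxation; not the authors' SDP method.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def block_coordinate_baseline(X_lab, y_lab, X_unlab, n_iter=10):
            """Alternate between (1) inferring labels for the weakly supervised points
            from the current model and (2) refitting the model on all points."""
            clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
            y_unlab = clf.predict(X_unlab)                      # initial label guess
            for _ in range(n_iter):
                X_all = np.vstack([X_lab, X_unlab])
                y_all = np.concatenate([y_lab, y_unlab])
                clf = LogisticRegression(max_iter=1000).fit(X_all, y_all)  # M-like step
                y_new = clf.predict(X_unlab)                               # E-like step
                if np.array_equal(y_new, y_unlab):              # fixed point, possibly a local minimum
                    break
                y_unlab = y_new
            return clf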

    Mengenal Machine Learning Dengan Teknik Supervised Dan Unsupervised Learning Menggunakan Python

    Abstract: Machine learning is a system that can learn on its own to decide something without having to be repeatedly programmed by humans, so that computers become smarter by learning from the experience contained in their data. Based on the learning technique, supervised learning uses a labeled dataset (training data), while unsupervised learning draws conclusions from the dataset without labels. The input dataset is used by the machine learning model to produce the correct analysis. The problem addressed is the iris flower (iris tectorum), which comes in various colors and has sepals and petals that indicate its species; an appropriate method is needed to group these flowers into the species iris-setosa, iris-versicolor, or iris-virginica. The solution uses Python, which provides the algorithms and libraries needed to build machine learning programs. For the supervised learning technique the KNN Classifier algorithm was chosen, and for the unsupervised learning technique the DBSCAN Clustering algorithm was chosen. The results show that Python provides a complete set of libraries (NumPy, Pandas, matplotlib, sklearn) for machine learning programming: the KNN algorithm is invoked with "from sklearn import neighbors" (a supervised technique), and DBSCAN with "from sklearn.cluster import DBSCAN" (an unsupervised learning technique). Python produces output consistent with the input dataset, yielding decisions in the form of classification or clustering.   Keywords: DBSCAN, KNN, machine learning, python.
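
    A minimal sketch of the two techniques named in the abstract, using scikit-learn's bundled Iris data; the specific parameter choices (n_neighbors=5, eps=0.5) are illustrative assumptions, not values from the paper:

        # Illustrative sketch: supervised KNN classification and unsupervised
        # DBSCAN clustering on the Iris dataset, as described in the abstract.
        from sklearn.datasets import load_iris
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.cluster import DBSCAN
        from sklearn.preprocessing import StandardScaler

        X, y = load_iris(return_X_y=True)

        # Supervised: KNN classifier trained on labeled sepal/petal measurements.
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
        knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)       # illustrative k
        print("KNN test accuracy:", knn.score(X_te, y_te))

        # Unsupervised: DBSCAN groups the same measurements without using labels.
        X_scaled = StandardScaler().fit_transform(X)
        clusters = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)  # illustrative eps
        print("DBSCAN cluster labels found:", set(clusters))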

    Generalized FLIC: Learning with Misclassification for Binary Classifiers

    This work formally introduces a generalized fuzzy logic and interval clustering (FLIC) technique which, when integrated with existing supervised learning algorithms, improves their performance. FLIC was first integrated with a neural network to improve its performance in drug discovery using high-throughput screening (HTS). This research focuses strictly on binary classification problems and generalizes FLIC so that it can be incorporated into other machine learning algorithms. In most binary classification problems, the class boundary is not linear, which poses a major problem when the number of outliers is high and degrades the performance of the supervised learning function. FLIC identifies these misclassifications before the training set is introduced to the learning algorithm, allowing the supervised learner to learn more efficiently since it is now aware of them. Although the proposed method performs well on most binary classification problems, it does especially well on datasets with high class asymmetry. The method has been tested on four well-known datasets, three from the UCI Machine Learning Repository and one from BigML, with three well-known supervised learning techniques: Decision Tree, Logistic Regression, and Naive Bayes. The experimental results show a significant improvement in performance. The paper begins with a formal introduction to the core idea this research is based upon, then discusses other methods that have either inspired this research or been referred to in order to formalize the techniques. Subsequent sections discuss the methodology and the algorithm, followed by the results and conclusion.
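
    The abstract does not describe FLIC itself in detail, so the following is only a hedged sketch of the general pattern it refers to, flagging training points whose labels disagree with their neighborhood before fitting a standard classifier; it is not the paper's FLIC algorithm, and the down-weighting step is an assumption:

        # Hedged sketch of the general pattern (flag likely-mislabeled points before
        # training); NOT the FLIC algorithm, whose details the abstract omits.
        # y is assumed to hold integer class labels (0/1).
        import numpy as np
        from sklearn.neighbors import NearestNeighbors
        from sklearn.linear_model import LogisticRegression

        def flag_suspect_labels(X, y, k=10):
            """Mark a point as suspect when the majority of its k nearest neighbors
            (excluding the point itself) carry a different label."""
            nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
            _, idx = nn.kneighbors(X)
            neighbor_labels = y[idx[:, 1:]]                            # drop self
            majority = np.array([np.bincount(row).argmax() for row in neighbor_labels])
            return majority != y                                       # boolean mask

        def fit_with_flags(X, y):
            suspect = flag_suspect_labels(X, y)
            weights = np.where(suspect, 0.1, 1.0)                      # down-weight suspect points
            return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)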

    Self-Taught Anomaly Detection With Hybrid Unsupervised/Supervised Machine Learning in Optical Networks

    This paper proposes a self-taught anomaly detection framework for optical networks. The proposed framework makes use of a hybrid unsupervised and supervised machine learning scheme. First, it employs an unsupervised data clustering module (DCM) to analyze the patterns of monitoring data. The DCM enables a self-learning capability that eliminates the requirement of prior knowledge of abnormal network behaviors and therefore can potentially detect unforeseen anomalies. Second, we introduce a self-taught mechanism that transfers the patterns learned by the DCM to a supervised data regression and classification module (DRCM). The DRCM, whose complexity is mainly related to the scale of the applied supervised learning model, can potentially facilitate more scalable and time-efficient online anomaly detection by avoiding excessive traversal of the original dataset. We designed the DCM and DRCM based on a density-based clustering algorithm and a deep neural network structure, respectively. Evaluations with experimental data from two use cases (i.e., single-point detection and end-to-end detection) demonstrate that up to 99% anomaly detection accuracy can be achieved with a false positive rate below 1%.
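
    A hedged, minimal sketch of the two-stage idea described above: an unsupervised clustering stage whose assignments become training targets for a supervised model used online. DBSCAN and a small MLP are stand-ins here, not the authors' exact DCM/DRCM design, and all parameters are illustrative:

        # Hedged sketch of the self-taught pipeline: cluster monitoring data without
        # labels, then train a supervised model on the cluster assignments so online
        # detection does not need to traverse the original dataset.
        # Stand-in components only; not the paper's exact DCM/DRCM design.
        from sklearn.cluster import DBSCAN
        from sklearn.neural_network import MLPClassifier
        from sklearn.preprocessing import StandardScaler

        def self_taught_detector(X_monitoring):
            X = StandardScaler().fit_transform(X_monitoring)

            # Stage 1 (DCM-like): density-based clustering; DBSCAN marks sparse
            # points as -1, treated here as the "anomaly-like" pattern.
            cluster_labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)   # illustrative parameters
            pseudo_labels = (cluster_labels == -1).astype(int)                # 1 = anomaly-like

            # Stage 2 (DRCM-like): transfer the learned patterns to a compact
            # supervised model used for fast online classification.
            clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500)
            clf.fit(X, pseudo_labels)
            return clf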

    From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has limited resolution for discriminating bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms to FAME data or to 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data: 16S rRNA gene sequence data is used for phylogenetic tree inference, and the corresponding binary tree splits are learned from FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it is better at distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for discriminating bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context.
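
    A hedged sketch of the core 'phylogenetic learning' idea described above, training one Random Forest per binary split of a given class tree; the nested-tuple hierarchy and the labels sp_A..sp_D are toy assumptions standing in for a real 16S rRNA phylogeny:

        # Hedged sketch: fit one binary Random Forest per internal node of a given
        # class hierarchy. X is an (n_samples, n_features) FAME-like array and y
        # holds species labels as strings; the tree is a toy stand-in.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def collect_leaves(node):
            if not isinstance(node, tuple):
                return [node]
            left, right = node
            return collect_leaves(left) + collect_leaves(right)

        def train_tree_classifiers(tree, X, y, models=None):
            """Recursively fit a left-vs-right Random Forest at every internal node."""
            if models is None:
                models = {}
            if not isinstance(tree, tuple):
                return models
            left, right = tree
            left_set, right_set = collect_leaves(left), collect_leaves(right)
            mask = np.isin(y, left_set + right_set)             # samples under this node
            side = np.isin(y[mask], right_set).astype(int)      # 0 = left branch, 1 = right branch
            models[(tuple(left_set), tuple(right_set))] = \
                RandomForestClassifier(n_estimators=100).fit(X[mask], side)
            train_tree_classifiers(left, X, y, models)
            train_tree_classifiers(right, X, y, models)
            return models

        # Example toy hierarchy over four hypothetical species labels:
        # models = train_tree_classifiers((("sp_A", "sp_B"), ("sp_C", "sp_D")), X, y)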

    QubitHD: A Stochastic Acceleration Method for HD Computing-Based Machine Learning

    Machine Learning algorithms based on Brain-inspired Hyperdimensional (HD) computing imitate cognition by exploiting statistical properties of high-dimensional vector spaces. HD computing is a promising solution for achieving high energy-efficiency in different machine learning tasks, such as classification, semi-supervised learning, and clustering. A weakness of existing HD computing-based ML algorithms is that they have to be binarized to achieve very high energy-efficiency, while binarized models reach lower classification accuracies. To resolve this trade-off between energy-efficiency and classification accuracy, we propose the QubitHD algorithm. It stochastically binarizes HD-based algorithms while maintaining classification accuracies comparable to their non-binarized counterparts. The FPGA implementation of QubitHD provides a 65% improvement in energy-efficiency and a 95% improvement in training time compared with state-of-the-art HD-based ML algorithms. It also outperforms state-of-the-art low-cost classifiers (such as Binarized Neural Networks) in speed and energy-efficiency by an order of magnitude during training and inference. Comment: 8 pages, 7 figures, 3 tables.
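
    The stochastic binarization step at the heart of this description can be sketched as generic probabilistic rounding of a high-dimensional vector (an assumption about the general technique, not QubitHD's FPGA implementation):

        # Hedged sketch of stochastic binarization of a high-dimensional (HD) vector:
        # each component is rounded to +1/-1 with a probability proportional to its
        # value, so the expectation of the binarized vector matches the original
        # (up to scaling). Generic idea only, not the QubitHD implementation.
        import numpy as np

        def stochastic_binarize(v, rng=None):
            rng = np.random.default_rng() if rng is None else rng
            scale = np.max(np.abs(v)) or 1.0
            p_plus = (v / scale + 1.0) / 2.0                    # map [-scale, scale] -> [0, 1]
            return np.where(rng.random(v.shape) < p_plus, 1, -1)

        # Example: binarize a random 10,000-dimensional class vector.
        hd_vector = np.random.default_rng(0).standard_normal(10_000)
        binary_vector = stochastic_binarize(hd_vector)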

    COVID-19: Symptoms Clustering and Severity Classification Using Machine Learning Approach

    COVID-19 is an extremely contagious illness that causes conditions ranging from the common cold to more chronic illness or even death. The constant emergence of new COVID-19 variants makes it important to identify the symptoms of COVID-19 in order to contain the infection. Clustering and classification in machine learning are in mainstream use across many areas of research, especially in recent years, to generate useful knowledge on the COVID-19 outbreak. Many researchers have shared their COVID-19 data in public databases and many studies have been carried out; however, the merit of such a dataset is unknown, and analyses need to be carried out to check its reliability. The dataset used in this work was sourced from the Kaggle website. The data were obtained through a survey of participants of various genders and ages who had been to at least ten countries. There are four levels of severity based on COVID-19 symptoms, developed in accordance with World Health Organization (WHO) and Indian Ministry of Health and Family Welfare recommendations. This paper presents an inquiry into the dataset utilising supervised and unsupervised machine learning approaches in order to better comprehend it. The analysis of the severity groups based on COVID-19 symptoms with supervised learning techniques employed a total of seven classifiers, namely K-NN, Linear SVM, Naive Bayes, Decision Tree (J48), AdaBoost, Bagging, and Stacking. For the unsupervised learning techniques, the clustering algorithms utilized were Simple K-Means and Expectation-Maximization. From the results obtained with both supervised and unsupervised learning techniques, we observed relatively poor classification and clustering results. The findings for the dataset analysed in this study do not appear to give correct results for the symptoms categorized against the severity levels, which raises concerns about the validity and reliability of the dataset.
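
    The kind of side-by-side evaluation described above can be sketched with a generic scikit-learn workflow; DecisionTreeClassifier stands in for J48 and GaussianMixture for Expectation-Maximization, and the CSV file name and column names are placeholders, not the Kaggle dataset's actual schema:

        # Hedged sketch of the evaluation workflow described in the abstract, with
        # scikit-learn stand-ins (DecisionTreeClassifier for J48, GaussianMixture for
        # EM clustering). File and column names below are placeholders.
        import pandas as pd
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import LinearSVC
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.cluster import KMeans
        from sklearn.mixture import GaussianMixture
        from sklearn.metrics import adjusted_rand_score

        df = pd.read_csv("covid_symptoms.csv")              # placeholder filename
        X = df.drop(columns=["Severity"]).values            # placeholder target column
        y = df["Severity"].values

        # Supervised: cross-validated accuracy for a few of the classifiers named above.
        for name, clf in [("K-NN", KNeighborsClassifier()),
                          ("Linear SVM", LinearSVC(max_iter=5000)),
                          ("Naive Bayes", GaussianNB()),
                          ("Decision Tree", DecisionTreeClassifier())]:
            print(name, cross_val_score(clf, X, y, cv=5).mean())

        # Unsupervised: compare clusterings against the four severity levels.
        for name, labels in [("K-Means", KMeans(n_clusters=4, n_init=10).fit_predict(X)),
                             ("EM (GMM)", GaussianMixture(n_components=4).fit(X).predict(X))]:
            print(name, "ARI vs. severity:", adjusted_rand_score(y, labels))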

    Zero initialized active learning with spectral clustering using Hungarian method

    Supervised machine learning tasks often require a large number of labeled training examples to set up a model, and prediction, for example classification, is then carried out based on this model. Nowadays a tremendous amount of data is available on the web or in data warehouses, although only a portion of it is annotated, and the labeling process can be tedious, expensive, and time consuming. Active learning tries to overcome this problem by reducing the labeling cost through allowing the learning system to iteratively select the data from which it learns. In a special case of active learning, the process starts from a zero-initialized scenario, where the labeled training dataset is empty and therefore only unsupervised methods can be applied. In this paper a novel query strategy framework, called the Clustering Based Balanced Sampling Framework (CBBSF), is presented for this problem; it not only selects the initial labeled training dataset but also selects items uniformly among the categories to obtain a balanced labeled training set. The framework includes an assignment technique to implicitly determine the class membership probabilities. The assignment solution is updated during CBBSF iterations, so it simulates supervised machine learning more accurately as the process progresses. The proposed Spectral Clustering Based Sampling (SCBS) query strategy realizes the CBBSF framework and is therefore applicable in the special zero-initialized situation. This selection approach uses ClusterGAN (Clustering using Generative Adversarial Networks) integrated into the spectral clustering algorithm and then selects an unlabeled instance depending on the class membership probabilities. Global and local versions of SCBS were developed; furthermore, most-confident and minimal-entropy measures were calculated, so four different SCBS variants were examined in total. Experimental evaluation was conducted on the MNIST dataset, and the results showed that SCBS outperforms the state-of-the-art zero-initialized active learning query strategies.
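
    A hedged sketch of the zero-initialized selection idea described above, using clustering alone to pick a balanced first batch of instances to label; plain spectral clustering stands in for the paper's ClusterGAN-integrated SCBS strategy, and the 'most confident' rule shown is only one illustrative variant:

        # Hedged sketch of zero-initialized, cluster-balanced selection: with no
        # labels available, cluster the pool and query one representative per cluster.
        # Plain spectral clustering stands in for the paper's ClusterGAN-based SCBS.
        import numpy as np
        from sklearn.cluster import SpectralClustering
        from sklearn.metrics import pairwise_distances_argmin_min

        def initial_query_indices(X_pool, n_classes):
            sc = SpectralClustering(n_clusters=n_classes, affinity="nearest_neighbors",
                                    n_neighbors=10, assign_labels="kmeans", random_state=0)
            clusters = sc.fit_predict(X_pool)
            queries = []
            for c in range(n_classes):
                members = np.where(clusters == c)[0]
                centroid = X_pool[members].mean(axis=0, keepdims=True)
                # "Most confident" variant: the member closest to its cluster centroid.
                closest, _ = pairwise_distances_argmin_min(centroid, X_pool[members])
                queries.append(members[closest[0]])
            return queries   # one index per cluster, forming a balanced batch to hand to an oracle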