
    Active Learning with Semi-Supervised Support Vector Machines

    A significant problem in many machine learning tasks is that gathering enough labeled data to train the learning algorithm to a reasonable level of performance is time-consuming and costly. In practice, a small amount of labeled data is often available, and more unlabeled data could be labeled on demand at a cost. If the labeled data is obtained by a process outside the learner's control, the learner is passive. If the learner picks the data to be labeled, this becomes active learning. The advantage is that the learner can pick data that provides specific information and so speeds up the learning process. Support Vector Machines (SVMs) have many properties that make them an attractive learning algorithm for real-world applications, including classification tasks. Some researchers have proposed algorithms for active learning with SVMs, i.e., algorithms for choosing the next unlabeled instance to be labeled. Their approach is supervised in nature, since it does not consider all unlabeled instances when looking for the next instance. In this thesis, we propose three new algorithms for active learning with SVMs in a semi-supervised setting that takes advantage of the presence of all unlabeled points. By reducing the number of experiments needed, the suggested approaches might yield considerable savings in classification problems where obtaining training data for a classifier is expensive.
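    The abstract contrasts its semi-supervised algorithms with earlier supervised schemes that choose the next unlabeled instance using only the current SVM. As an illustration of such a baseline (not the thesis's own method), the sketch below assumes one common supervised criterion: query the unlabeled point closest to the separating hyperplane. The linear SVM is trained with a Pegasos-style subgradient method only to keep the example dependency-free; the function names and the parameters `lam` and `epochs` are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Hypothetical helper: linear SVM via Pegasos-style stochastic subgradient
    descent on the L2-regularized hinge loss. Labels must be -1 or +1.
    A constant 1.0 is appended to each input so the bias is learned as an
    ordinary (and, for simplicity, regularized) weight."""
    data = [(list(x) + [1.0], yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            # Regularization step: shrink all weights toward zero.
            w = [(1 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only when the example violates the margin.
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w

def decision(w, x):
    """Signed SVM score w . [x, 1]; its sign is the predicted label."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def query_most_uncertain(w, unlabeled):
    """Index of the unlabeled point nearest the hyperplane (smallest |score|)."""
    return min(range(len(unlabeled)), key=lambda i: abs(decision(w, unlabeled[i])))

# Toy usage: four labeled points separated along the first feature.
X_lab = [(-2, 0), (2, 0), (-2, 1), (2, 1)]
y_lab = [-1, 1, -1, 1]
pool = [(0.1, 0), (3, 0), (-3, 1)]
w = train_linear_svm(X_lab, y_lab)
print(query_most_uncertain(w, pool))  # the point near the boundary, index 0
```

    A semi-supervised variant, as the thesis proposes, would additionally use the structure of the whole unlabeled pool when scoring candidates rather than only the current hyperplane.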

    Neural networks and support vector machines based bio-activity classification

    Classifying compounds into their respective biological activity classes is important in drug discovery, particularly for early-phase virtual filtering and screening of compounds. In this work, two types of neural networks, a multilayer perceptron (MLP) and a radial basis function (RBF) network, and support vector machines (SVM) were employed to classify three types of biologically active enzyme inhibitors. Both networks were trained with the backpropagation learning method on chemical compounds whose inhibitory activity was previously known. A group of topological indices, selected with the help of principal component analysis (PCA), was used as descriptors. A comparison of the three classification methods shows that both neural networks outperform the SVM.

    Non-linear Machine Learning with Active Sampling for MOX Drift Compensation

    Abstract: Metal oxide (MOX) gas detectors based on SnO2 provide low-cost solutions for real-time sensing of complex gas mixtures in indoor ambient monitoring. Although highly sensitive under ideal conditions, MOX detectors may have poor long-term response accuracy because environmental factors (humidity and temperature) and sensor aging cause calibration drift. Finding a simple and efficient way to correct such calibration drift has been the subject of numerous studies but remains an open problem. In this work, we present an efficient approach to MOX calibration using active and transfer sampling techniques coupled with non-linear machine learning algorithms, namely neural networks, extreme gradient boosting (XGBoost), and radial kernel support vector machines (SVM). Using UCI's HT detectors dataset, the study evaluates methods for active sampling, assesses suitable neural network architectures, and compares the performance of neural networks, XGBoost, and radial kernel SVM in classifying gas mixtures (banana and wine odours, clean air) in the presence of humidity and temperature changes. The results show high classification accuracy (above 90%) and confirm that active sampling can provide a suitable solution. Index Terms: Neural Networks, Extreme Gradient Boosting, XGBoost, Support Vector Machines, Non-Linear Learning Methods, Machine Learning

    Active learning of compounds activity: towards scientifically sound simulation of drug candidates identification

    Abstract. Virtual screening is one of the vital elements of the modern drug design process. It aims to identify potential drug candidates in large datasets of chemical compounds. Many machine learning (ML) methods have been proposed to improve the efficiency and accuracy of this procedure, with Support Vector Machines among the most popular. Most commonly, performance in this task is evaluated offline, where the model is tested after training on a randomly chosen subset of the data. This is in stark contrast to the practice of drug candidate selection, where a researcher iteratively chooses the next batch of compounds to test. This paper proposes to frame the problem as an active learning process, in which new drug candidates are sought by exploring the compound space while simultaneously exploiting current knowledge. We introduce a proof of concept for simulating and evaluating such a pipeline, together with novel solutions based on mixing clustering with a greedy k-batch active learning strategy.
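    The abstract names, without detailing, a strategy that mixes clustering with greedy k-batch selection. One plausible reading, assumed here purely for illustration and not the paper's actual algorithm, is a greedy batch that exploits model scores while spreading picks across clusters to explore the compound space. `greedy_cluster_batch`, its arguments, and the two-pass fill rule are all hypothetical.

```python
def greedy_cluster_batch(scores, clusters, k):
    """Hypothetical sketch: choose k compound indices for the next test batch.

    scores[i]   -- model-predicted activity of compound i (higher = more promising)
    clusters[i] -- cluster id of compound i (from any clustering of descriptors)

    Pass 1 (exploitation spread across clusters): walk compounds from best to
    worst score and take the best one from each cluster not yet represented.
    Pass 2: if there are fewer clusters than k, fill the batch by score alone.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen, used_clusters = [], set()
    for i in order:
        if len(chosen) == k:
            break
        if clusters[i] not in used_clusters:
            chosen.append(i)
            used_clusters.add(clusters[i])
    picked = set(chosen)
    for i in order:
        if len(chosen) == k:
            break
        if i not in picked:
            chosen.append(i)
            picked.add(i)
    return chosen

# Compounds 0 and 1 share cluster 0, so the batch of 3 skips compound 1.
print(greedy_cluster_batch([0.9, 0.8, 0.7, 0.1], [0, 0, 1, 2], 3))  # [0, 2, 3]
```

    The design trades some predicted activity (compound 1 outranks compound 3) for diversity, which is one simple way to balance exploration against exploitation within a single batch.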

    Bankruptcy prediction of engineering companies in the EU using classification methods

    This article focuses on the binary classification of 902 small- and medium-sized engineering companies active in the EU, together with an additional 51 companies that went bankrupt in 2014. For classification, the basic statistical method of logistic regression was selected, together with representatives of machine learning (support vector machines and the classification tree method), to construct bankruptcy prediction models. Different settings were tested for each method. Furthermore, the models were estimated both on the complete data and on identified artificial factors. To evaluate prediction quality, we examine not only total accuracy together with type I and II errors but also the area under the ROC curve. The results clearly show that increasing distance to bankruptcy decreases the predictive ability of all models. The classification tree method leads to rather simple models. The best classification results were achieved by logistic regression based on artificial factors; moreover, this procedure gives good and stable results regardless of the other settings. Artificial factors also seem to be suitable variables for support vector machine models, whereas classification trees achieved better results on the original data.
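    The evaluation criteria mentioned above can be computed directly from labels, hard predictions, and model scores. The sketch below assumes a common convention in bankruptcy prediction (positive class = bankrupt; a type I error is a bankrupt firm predicted healthy, a type II error a healthy firm predicted bankrupt); the article may define them differently. AUC is computed as the probability that a randomly chosen bankrupt firm receives a higher score than a randomly chosen healthy one, with ties counted as one half. Function names are hypothetical.

```python
def error_rates(y_true, y_pred, positive=1):
    """Return (type_I_rate, type_II_rate, accuracy).
    Assumed convention: type I = positive (bankrupt) firm missed,
    type II = negative (healthy) firm flagged as bankrupt."""
    pos_preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg_preds = [p for t, p in zip(y_true, y_pred) if t != positive]
    type_i = sum(p != positive for p in pos_preds) / len(pos_preds)
    type_ii = sum(p == positive for p in neg_preds) / len(neg_preds)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return type_i, type_ii, accuracy

def roc_auc(y_true, scores, positive=1):
    """AUC via pairwise comparison (Mann-Whitney form); O(n_pos * n_neg),
    which is fine for a few hundred firms."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy usage: 1 = bankrupt, 0 = healthy; predict bankrupt when score >= 0.5.
y = [1, 1, 0, 0, 0]
s = [0.9, 0.4, 0.5, 0.2, 0.1]
pred = [1 if v >= 0.5 else 0 for v in s]
print(error_rates(y, pred))  # (0.5, 0.333..., 0.6)
print(roc_auc(y, s))         # 5 of 6 pairs ranked correctly: 0.833...
```

    Reporting AUC alongside the error rates matters here because, with only 51 bankrupt firms out of 953, total accuracy alone can look high even for a model that misses most bankruptcies.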

    Using machine learning to predict potential online gambling addicts.

    Betting addicts on gambling websites are difficult to identify because online gambling is by nature different from offline gambling. This thesis attempts to identify potential gambling addicts on an online gambling website, X, using machine learning models. The models are based on users' usage history on the website. Usage data for each user is collected from the site using JavaScript, then analyzed and stored in a database. Support Vector Machine models are then trained on the data of users who are, by definition, problem gamblers. The system then makes a prediction for all active users based on their recent usage history. The final result is an automated system for daily learning and prediction that flags potential problem gamblers who show early signs of gambling addiction.