
    Non-linear Machine Learning with Active Sampling for MOX Drift Compensation

    Abstract: Metal oxide (MOX) gas detectors based on SnO2 provide low-cost solutions for real-time sensing of complex gas mixtures in indoor ambient monitoring. Although highly sensitive under ideal conditions, MOX detectors may have poor long-term response accuracy because environmental factors (humidity and temperature) and sensor aging cause calibration drift. Finding a simple and efficient way to correct such drift has been the subject of numerous studies but remains an open problem. In this work, we present an efficient approach to MOX calibration using active and transfer sampling techniques coupled with non-linear machine learning algorithms, namely neural networks, extreme gradient boosting (XGBoost) and radial kernel support vector machines (SVM). Applied to UCI’s HT detectors dataset, the study evaluates methods for active sampling, assesses suitable neural network architectures, and compares the performance of neural networks, XGBoost and radial kernel SVM in classifying gas mixtures (banana and wine odours, clean air) in the presence of humidity and temperature changes. The results show high classification accuracy (above 90%) and confirm that active sampling can provide a suitable solution. Index Terms: Neural Networks, Extreme Gradient Boosting, XGBoost, Support Vector Machines, Non-Linear Learning Methods, Machine Learning

    Using machine learning to predict potential online gambling addicts.

    Gambling addicts on gambling websites are difficult to identify because online gambling differs by nature from in-person gambling. This thesis attempts to identify potential gambling addicts on an online gambling website X using machine learning models. The models are based on users’ usage history on the website. Usage data is collected for each user from the site using JavaScript, then analyzed and stored in a database. Machine learning models are then trained using Support Vector Machines on the data of users who are by definition problem gamblers. The system then makes a prediction for all active users based on their recent usage history. The final result is an automated system for daily learning and prediction of potential problem gamblers who show early signs of gambling addiction.

    Bankruptcy prediction of engineering companies in the EU using classification methods

    This article focuses on the binary classification of 902 small- and medium-sized engineering companies active in the EU, together with an additional 51 companies that went bankrupt in 2014. To construct models for bankruptcy prediction, the basic statistical method of logistic regression was selected, together with two representatives of machine learning (support vector machines and classification trees). Different settings were tested for each method. Furthermore, the models were estimated both on the complete data and on identified artificial factors. To evaluate the quality of prediction, we observe not only the total accuracy and the type I and II errors but also the area under the ROC curve. The results clearly show that increasing distance to bankruptcy decreases the predictive ability of all models. The classification tree method leads to rather simple models. The best classification results were achieved by logistic regression based on artificial factors. Moreover, this procedure provides good and stable results regardless of other settings. Artificial factors also seem to be suitable variables for support vector machine models, but classification trees achieved better results using the original data.
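
The evaluation criteria named in this abstract (total accuracy, type I and type II errors, area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels and the scores are hypothetical, and the convention used here (1 = bankrupt, type I = false positive rate, type II = false negative rate) is an assumption; some bankruptcy studies use the reverse naming.

```python
from typing import Dict, Sequence


def confusion_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and error rates for 0/1 labels (assumed: 1 = bankrupt)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Assumed convention: type I = healthy firm flagged as bankrupt.
        "type_I": fp / (fp + tn) if fp + tn else 0.0,
        # Assumed convention: type II = bankrupt firm predicted healthy.
        "type_II": fn / (fn + tp) if fn + tp else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, purely illustrative (not taken from the study).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
rates = confusion_rates(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

With heavily imbalanced classes such as 902 active versus 51 bankrupt companies, total accuracy alone can look good for a model that rarely predicts bankruptcy, which is one reason to report the two error types and the AUC alongside it.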

    Active learning of compounds activity : towards scientifically sound simulation of drug candidates identification

    Abstract. Virtual screening is one of the vital elements of the modern drug design process. It aims to identify potential drug candidates in large datasets of chemical compounds. Many machine learning (ML) methods have been proposed to improve the efficiency and accuracy of this procedure, with Support Vector Machines among the most popular. Most commonly, performance in this task is evaluated offline, where a model is tested after training on a randomly chosen subset of the data. This is in stark contrast to the practice of drug candidate selection, where a researcher iteratively chooses the next batches of compounds to test. This paper proposes to frame the problem as an active learning process, in which we search for new drug candidates by exploring the compound space while simultaneously exploiting current knowledge. We introduce a proof of concept for the simulation and evaluation of such a pipeline, together with novel solutions based on mixing clustering with a greedy k-batch active learning strategy.
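
One way to picture a greedy k-batch selection that is mixed with clustering is sketched below. The abstract does not specify the authors' procedure, so this is only an illustration under stated assumptions: `scores` is a hypothetical per-compound priority (for example a model's predicted activity or uncertainty), `clusters` is a hypothetical precomputed cluster label per compound, and the round-robin rule of taking at most one compound per cluster in each pass is an assumption made here.

```python
from typing import List, Sequence


def greedy_k_batch(scores: Sequence[float], clusters: Sequence[int], k: int) -> List[int]:
    """Pick up to k candidate indices greedily by score, spread across clusters.

    Candidates are visited from highest to lowest score. In each pass at most
    one candidate per cluster is taken; skipped candidates are reconsidered in
    the next pass, so high scores are exploited while clusters are explored.
    """
    remaining = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    while remaining and len(chosen) < k:
        seen_clusters = set()
        leftover = []
        for i in remaining:
            if len(chosen) < k and clusters[i] not in seen_clusters:
                chosen.append(i)
                seen_clusters.add(clusters[i])
            else:
                leftover.append(i)
        remaining = leftover
    return chosen


# Toy example: five compounds in three clusters (illustrative values only).
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
clusters = [0, 0, 1, 1, 2]
batch = greedy_k_batch(scores, clusters, k=3)
```

In a simulated screening loop, the selected batch would be "tested" by revealing its labels, the model retrained, and the selection repeated, which mirrors the iterative practice the abstract describes.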

    Utilizing Import Vector Machines to Identify Dangerous Pro-active Traffic Conditions

    Traffic accidents have become a severe problem in metropolises as traffic flow has grown. This paper explores the theory and application of a recently developed machine learning technique, Import Vector Machines (IVMs), in real-time crash risk analysis, an active research topic aimed at reducing traffic accidents. Historical crash data and corresponding traffic data from the Shanghai Urban Expressway System were collected and matched. Traffic conditions are labelled as dangerous (i.e. probably leading to a crash) or safe (i.e. a normal traffic condition) based on 5-minute measurements of average speed, volume and occupancy. The IVM algorithm is trained to build the classifier, and its performance is compared to that of the popular and successfully applied Support Vector Machines (SVMs). The main findings indicate that IVMs can be successfully employed in real-time identification of dangerous pro-active traffic conditions. Furthermore, similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM, and its classification rates are similar to those of SVMs. This gives the IVM a computational advantage over the SVM, especially when the training data set is large. Comment: 6 pages, 3 figures, 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)

    A Novel Hybrid Classification Approach for Sentiment Analysis of Text Document

    Sentiment analysis is a popular and highly active research area in natural language processing. It assigns a negative or positive polarity to one or more entities using different natural language processing tools, and the reported performance of sentiment classifiers ranges from high to low. Our approach focuses on analysing the sentiment expressed in product reviews using original text search techniques. These reviews can be classified as positive or negative based on certain aspects in relation to a term-based query. In this paper, we use two machine learning methods for classification, Support Vector Machines (SVM) and Random Forest, and we introduce a novel hybrid approach to classify Amazon product reviews. This is useful for consumers who want to research the sentiment around products before purchase, and for companies that want to monitor public sentiment towards their brands. The results show that the proposed method outperforms both individual classifiers on this Amazon dataset.