335 research outputs found

    Active Learning with Semi-Supervised Support Vector Machines

    A significant problem in many machine learning tasks is that gathering the labeled data needed to train a learning algorithm to a reasonable level of performance is time-consuming and costly. In practice, a small amount of labeled data is often available, and more unlabeled data can be labeled on demand at a cost. If the labeled data is obtained by a process outside the learner's control, the learner is passive. If the learner picks the data to be labeled, the setting becomes active learning. This has the advantage that the learner can pick data to gain specific information that will speed up the learning process. Support Vector Machines (SVMs) have many properties that make them attractive as a learning algorithm for real-world applications, including classification tasks. Some researchers have proposed algorithms for active learning with SVMs, i.e., algorithms for choosing the next unlabeled instance to obtain a label for. Their approach is supervised in nature, since it does not consider all unlabeled instances when looking for the next instance. In this thesis, we propose three new algorithms for applying active learning to SVMs in a semi-supervised setting, which takes advantage of the presence of all unlabeled points. By reducing the number of experiments needed, the suggested approaches might yield considerable savings in classification problems where obtaining training data for a classifier is expensive.
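
To make the idea of "choosing the next unlabeled instance" concrete, here is a minimal sketch of one common selection rule: query the unlabeled point closest to the current separating hyperplane. This is an illustrative assumption, not necessarily the rule used in the thesis or in the prior work it mentions. The names `decision_value` and `pick_query` are hypothetical, and the hyperplane `(w, b)` is assumed to come from an SVM that has already been trained.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b of a point under a linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def pick_query(w: Sequence[float], b: float, unlabeled: List[Sequence[float]]) -> int:
    """Return the index of the unlabeled point with the smallest |w.x + b|.

    Assumption: points near the hyperplane are the ones the current model
    is least certain about, so labeling them is expected to be most useful.
    """
    if not unlabeled:
        raise ValueError("the unlabeled pool is empty")
    return min(
        range(len(unlabeled)),
        key=lambda i: abs(decision_value(w, b, unlabeled[i])),
    )


# Hypothetical hyperplane and pool, for illustration only.
w = [1.0, -1.0]
b = 0.0
pool = [[3.0, 0.0], [0.5, 0.4], [-2.0, 1.0]]
query_index = pick_query(w, b, pool)  # index of the point to send for labeling
```

A rule like this looks at each candidate only through the current supervised model. A semi-supervised variant, as the abstract describes, would also let the remaining unlabeled points influence the model itself.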

    Optimization meets Machine Learning: An Exact Algorithm for Semi-Supervised Support Vector Machines

    Support vector machines (SVMs) are well-studied supervised learning models for binary classification. In many applications, large numbers of samples can be obtained cheaply and easily. Manually labeling these instances, however, is often a costly and error-prone process. Semi-supervised support vector machines (S3VMs) extend the well-known SVM classifiers to the semi-supervised approach, aiming at maximizing the margin between samples in the presence of unlabeled data. By leveraging both labeled and unlabeled data, S3VMs attempt to achieve better accuracy and robustness compared to traditional SVMs. Unfortunately, the resulting optimization problem is non-convex and hence difficult to solve exactly. In this paper, we present a new branch-and-cut approach for S3VMs using semidefinite programming (SDP) relaxations. We apply optimality-based bound tightening to bound the feasible set. Box constraints allow us to include valid inequalities, strengthening the lower bound. The resulting SDP relaxation provides bounds significantly stronger than the ones available in the literature. For the upper bound, we instead define a local search exploiting the solution of the SDP relaxation. Computational results highlight the efficiency of the algorithm, showing its capability to solve instances with a number of data points 10 times larger than the ones solved in the literature.
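
For orientation, a common textbook form of the S3VM problem is sketched below. It is an assumption for illustration: the paper's exact model (for example, its choice of loss) may differ. Here $\mathcal{L}$ and $\mathcal{U}$ denote the labeled and unlabeled index sets, and $C_l$, $C_u$ are penalty weights; all of these symbols are introduced here and do not come from the abstract.

```latex
\[
\min_{w,\,b}\;\; \frac{1}{2}\lVert w\rVert^2
  + C_l \sum_{i \in \mathcal{L}} \max\bigl(0,\; 1 - y_i\,(w^\top x_i + b)\bigr)
  + C_u \sum_{j \in \mathcal{U}} \max\bigl(0,\; 1 - \lvert w^\top x_j + b\rvert\bigr)
\]
```

The last term penalizes unlabeled points that fall inside the margin on either side. The absolute value makes that term non-convex in $(w, b)$, which is the source of the difficulty the abstract mentions.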

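    Analisis Sentimen Berbasis Fitur pada Ulasan Online dengan Metode Semi-supervised Support Vector Machines (S3VMs) [Feature-Based Sentiment Analysis of Online Reviews Using Semi-supervised Support Vector Machines (S3VMs)]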

    Online review sites let internet users post reviews about a given aspect. The sentiment contained in a collection of reviews about a product is useful and influences the decisions of individuals and organizations. Within a single opinion, a reviewer may give both positive and negative assessments. This is because the target of an opinion is often not the product as a whole but a part of the product, called a feature, which has strengths and weaknesses in the reviewer's view. In this final project, we study how to determine the sentiment of an opinion about a mobile phone product based on the product's features. The opinion data used in this final project are in English and were taken from www.cnet.com. Two processes are carried out: (1) extraction of product features from opinions, and (2) identification of the sentiment for each product feature. Feature extraction is performed by searching for phrases that match a dependencies relation template, followed by feature filtering. For sentiment identification, the positive and negative probability values and the target class label from the data preparation step serve as input parameters to the S3VMs classifier. In the S3VMs experiments, some of the data are treated as unlabeled data. The evaluation yields an F1-measure for sentiment identification of 86% for the positive class and 70% for the negative class. Feature identification achieves an accuracy of 82%. Keywords: reviews, sentiment, product features, S3VMs, feature-based opinio

    Mixed-Integer Quadratic Optimization and Iterative Clustering Techniques for Semi-Supervised Support Vector Machines

    Among the most famous algorithms for solving classification problems are support vector machines (SVMs), which find a separating hyperplane for a set of labeled data points. In some applications, however, labels are only available for a subset of points. Furthermore, this subset can be non-representative, e.g., due to self-selection in a survey. Semi-supervised SVMs tackle the setting of labeled and unlabeled data and can often improve the reliability of the results. Moreover, additional information about the size of the classes can be available from undisclosed sources. We propose a mixed-integer quadratic optimization (MIQP) model that covers the setting of labeled and unlabeled data points as well as the overall number of points in each class. Since the MIQP's solution time grows rapidly as the number of variables increases, we introduce an iterative clustering approach to reduce the model's size. Moreover, we present an update rule for the required big-M values, prove the correctness of the iterative clustering method, and derive tailored dimension-reduction and warm-starting techniques. Our numerical results show that our approach achieves accuracy and precision similar to those of the MIQP formulation but at much lower computational cost. Thus, we can solve larger problems. With respect to the original SVM formulation, we observe that our approach has even better accuracy and precision for biased samples. Comment: 33 pages, 18 figures
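
To illustrate where big-M values enter such a model, the following is a hedged sketch of a generic big-M formulation. It is not the paper's exact model. The binary variable $z_j$ assigns unlabeled point $j$ to a side of the hyperplane, and $\tau$ is a hypothetical symbol for the known number of unlabeled points in the positive class. The sets $\mathcal{L}$, $\mathcal{U}$, the slack variables $\xi_i$, and the weight $C$ are likewise assumptions introduced here.

```latex
\[
\begin{aligned}
\min_{w,\,b,\,\xi,\,z}\quad & \frac{1}{2}\lVert w\rVert^2 + C \sum_{i \in \mathcal{L}} \xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0, && i \in \mathcal{L},\\
& w^\top x_j + b \le M z_j, && j \in \mathcal{U},\\
& w^\top x_j + b \ge -M\,(1 - z_j), && j \in \mathcal{U},\\
& \sum_{j \in \mathcal{U}} z_j = \tau, \qquad z_j \in \{0,1\}, && j \in \mathcal{U}.
\end{aligned}
\]
```

With $z_j = 1$ the two big-M constraints force $0 \le w^\top x_j + b \le M$, and with $z_j = 0$ they force $-M \le w^\top x_j + b \le 0$. The constant $M$ must therefore be large enough not to cut off valid solutions, yet a needlessly large value weakens the relaxation, which is why an update rule for it matters.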

    Detecting genuine multipartite entanglement via machine learning

    In recent years, supervised and semi-supervised machine learning methods such as neural networks, support vector machines (SVM), and semi-supervised support vector machines (S4VM) have been widely used in quantum entanglement and quantum steering verification problems. However, few studies have focused on detecting genuine multipartite entanglement based on machine learning. Here, we investigate supervised and semi-supervised machine learning for detecting genuine multipartite entanglement of three-qubit states. We randomly generate three-qubit density matrices and train an SVM to detect genuine multipartite entangled states. Moreover, we improve the training method of S4VM by optimizing the grouping of prediction samples and then performing iterative predictions. Numerical simulations confirm that this method can significantly improve the prediction accuracy. Comment: 9 pages, 8 figures

    Applicability of semi-supervised learning assumptions for gene ontology terms prediction

    Gene Ontology (GO) is one of the most important resources in bioinformatics, aiming to provide a unified framework for the biological annotation of genes and proteins across all species. Predicting GO terms is an essential task for bioinformatics, but the number of available labelled proteins is in several cases insufficient for training reliable machine learning classifiers. Semi-supervised learning methods arise as a powerful solution that exploits the information contained in unlabelled data in order to improve the estimations of traditional supervised approaches. However, semi-supervised learning methods have to make strong assumptions about the nature of the training data, and thus the performance of the predictor is highly dependent on these assumptions. This paper presents an analysis of the applicability of semi-supervised learning assumptions to the specific task of GO terms prediction, focused on providing criteria for choosing the most suitable tools for specific GO terms. The results show that semi-supervised approaches significantly outperform the traditional supervised methods and that the highest performances are reached when applying the cluster assumption. In addition, it is experimentally demonstrated that the cluster and manifold assumptions are complementary to each other, and an analysis is provided of which GO terms are more likely to be correctly predicted with each assumption. Postprint (published version)

    Deep Generative Models for Reject Inference in Credit Scoring

    Credit scoring models based on accepted applications may be biased, and their consequences can have a statistical and economic impact. Reject inference is the process of attempting to infer the creditworthiness status of the rejected applications. In this research, we use deep generative models to develop two new semi-supervised Bayesian models for reject inference in credit scoring, in which we model the data generating process to be dependent on a Gaussian mixture. The goal is to improve the classification accuracy in credit scoring models by adding rejected applications. Our proposed models infer the unknown creditworthiness of the rejected applications by exact enumeration of the two possible outcomes of the loan (default or non-default). The efficient stochastic gradient optimization technique used in deep generative models makes our models suitable for large data sets. Finally, the experiments in this research show that our proposed models perform better than classical and alternative machine learning models for reject inference in credit scoring.
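
The "exact enumeration of the two possible outcomes" can be sketched as follows, assuming the standard construction from semi-supervised deep generative models: the objective for an unlabeled (rejected) application is the classifier-weighted sum of the per-outcome bounds plus the entropy of the classifier's distribution. Whether the paper uses exactly this form is an assumption. The function name and arguments are hypothetical, and the per-outcome bounds are passed in as plain numbers instead of being computed by a network.

```python
import math


def unlabeled_bound(q_default: float, bound_default: float, bound_non_default: float) -> float:
    """Bound for an unlabeled sample, enumerating both loan outcomes.

    q_default         : classifier probability that the loan defaults
    bound_default     : lower bound on log p(x, y=default)
    bound_non_default : lower bound on log p(x, y=non-default)
    """
    if not 0.0 <= q_default <= 1.0:
        raise ValueError("q_default must be a probability")
    probs = [q_default, 1.0 - q_default]
    bounds = [bound_default, bound_non_default]
    # Expectation of the labeled bounds under the classifier distribution.
    expected = sum(p * b for p, b in zip(probs, bounds))
    # Entropy of the two-outcome distribution (0 * log 0 treated as 0).
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return expected + entropy


# Hypothetical numbers, for illustration only.
value = unlabeled_bound(0.5, -1.0, -3.0)
```

Because a loan has only two outcomes, the sum over labels is exact and cheap, so no sampling over the unknown label is needed.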