57 research outputs found

    Prediction of lung tumor types based on protein attributes by machine learning algorithms

    Robustness and Regularization of Support Vector Machines

    We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. This equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise while also controlling overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.
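
The abstract above does not spell out the formulation. As an illustration only, one form that an equivalence of this kind can take is sketched below. The notation is assumed rather than taken from the listing: training pairs $(x_i, y_i)$, perturbations $\delta_i$, a budget $c$, a norm $\|\cdot\|$ with dual norm $\|\cdot\|^{*}$, and non-separable training data.

```latex
% Assumed notation: m training pairs (x_i, y_i), perturbations \delta_i,
% total perturbation budget c, dual norm \|\cdot\|^{*}.
\[
\min_{w,b}\;
\max_{\sum_{i=1}^{m}\|\delta_i\|\le c}\;
\sum_{i=1}^{m}\max\bigl[\,1-y_i\bigl(\langle w,\,x_i-\delta_i\rangle+b\bigr),\,0\,\bigr]
\;=\;
\min_{w,b}\;
c\,\|w\|^{*}+\sum_{i=1}^{m}\max\bigl[\,1-y_i\bigl(\langle w,\,x_i\rangle+b\bigr),\,0\,\bigr]
\]
```

Read this way, the regularization weight $c$ plays the role of the total perturbation budget granted to the adversary, which is the sense in which regularization "builds in protection against noise".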

    A New Credit Scoring Model For Vehicle Leasing Company

    Small and medium-sized enterprises are among the businesses affected by the spread of the coronavirus. The pandemic situation in Indonesia has caused great hardship for these companies. To prevent losses during the current pandemic, PT XYZ decided to build a credit scoring model to predict the risk posed by its prospective customers. The model will consist of two types. The first is an expert-system assessment, or scorecard. The data obtained from the expert system will then be fed into machine learning using statistical methods to obtain a credit scoring model. The CRISP-DM framework will be used to guide the development process and ensure a reliable model output.

    Ontology-aided business document classification

    An Analysis of Predicting Job Titles Using Job Descriptions

    A job title is an all-encompassing, very short description that conveys all of the pertinent information relating to a job. The job title typically encapsulates, and should encapsulate, the domain, role and level of responsibility of any given job. Significant value is attached to job titles both internally within organisational structures and by individual job holders. Organisations map out all employees in an organogram on the basis of job titles. This has a bearing on issues like salary, level and scale of responsibility, employee selection and so on. Employees draw value from their own job titles as a means of self-identity, and this can have a significant impact on their engagement and motivation. Classification of job titles based upon the details of the job is, however, a subjective human resources exercise which risks bias and inconsistency. I am instead proposing that job title classification can be performed as a systematic, algorithm-based process by applying standard Natural Language Processing (NLP) together with supervised machine learning. In this paper, data (job descriptions) labelled with job titles was collected from a popular national job postings website (www.irishjobs.ie). The data went through several standard text pre-processing transformations, which are detailed below, in order to reduce the dimensionality of the corpus. Feature engineering was used to create data models of selected keyword sets characteristic of each job title, generated on the basis of term frequency. The Random Forest and Support Vector Machine (SVM) supervised learning algorithms were used to build prediction models for the Top 30 most frequently occurring job titles. The most successful model was the SVM with a linear kernel, which had an Accuracy of 71%, Macro Average Precision of 70%, Macro Average Recall of 67% and a Macro Average F-Score of 66%. The Random Forest model performed less well, with an Accuracy of 58%, Macro Average Precision of 56%, Macro Average Recall of 55% and Macro Average F-Score of 56%. The data model described here and the prediction performance obtained indicate that several particularities of the problem, namely its high dimensionality and the complexity of the feature engineering required to generate a data model with the correct keywords for each job, lead to data models that cannot provide optimal performance even when using powerful Machine Learning (ML) algorithms. The data model design can be improved using a wider data set (built from job descriptions collected from a variety of websites), thus optimising the set of keywords describing each job title. More complex and computationally expensive algorithms based on deep learning may also provide more refined and more accurate predictive models. No research was found during this study which specifically examined the classification of job titles using machine learning. However, other relevant literature on text classification via supervised learning was reviewed; it was useful in designing the models and was applied to this domain. While supervised ML techniques are commonly applied to text classification, including sentiment analysis, no similar study described in the literature addresses the link between job titles and the corresponding required skills. Nevertheless, the work presented here describes a valid and practical approach to answering the proposed research question within the constraints of a limited data model and basic ML algorithms. Such an approach may prove a working base for designing future models for artificial intelligence applications.
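
The macro-averaged figures quoted above can look surprising: the SVM's macro F-Score (66%) is lower than both its macro precision and macro recall. A minimal sketch of how macro averaging is commonly computed shows why this can happen, assuming each metric is calculated per class and then averaged with equal weight. The abstract does not confirm the author's exact procedure; the function name `macro_scores` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Assumption: each metric is computed per class and then averaged
    with equal weight per class (the usual meaning of "macro average").
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels, not data from the study.
acc, p, r, f = macro_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(f"accuracy={acc:.3f} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Because the macro F-Score here is the mean of per-class F1 values rather than the harmonic mean of macro precision and macro recall, it can fall below both of them, as it does in this toy example.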

    AN EXPERIMENTAL STUDY OF FACE RECOGNITION METHOD

    The increased use of face recognition techniques has led to the development of improved methods with higher accuracy and efficiency. Currently, there are various face recognition techniques based on different algorithms. In this study, a new face recognition method is proposed based on the idea of using wavelet operators to create a spectral graph wavelet transform (SGWT). The proposed idea relies on the spectral graph wavelet kernel procedure. In the proposed method, feature extraction is based on transformation into the SGWT by means of the spatial domain. For recognition, the feature vectors computed from selected training samples are used for classification. The face image is decomposed using the SGWT. The system identifies the test image by calculating the Euclidean distance. Finally, the study conducted an experiment using the ORL face database. The results indicate that recognition accuracy is higher with the proposed system and can be further improved by increasing the number of training images. Overall, the results show that the proposed method performs well in terms of face recognition accuracy.
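
The matching step described above, identifying a test image by Euclidean distance, can be sketched as a nearest-neighbour lookup over stored feature vectors. This is a minimal illustration under stated assumptions, not the study's implementation: feature extraction (the SGWT step) is assumed to have already produced fixed-length vectors, and the function name `nearest_identity` and the gallery contents are hypothetical.

```python
import math


def nearest_identity(test_vec, gallery):
    """Return (label, distance) of the gallery entry closest to test_vec.

    gallery is a list of (label, feature_vector) pairs; all vectors are
    assumed to have the same length as test_vec.
    """
    best_label, best_dist = None, math.inf
    for label, vec in gallery:
        dist = math.dist(test_vec, vec)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical two-dimensional feature vectors for two subjects.
gallery = [("subject_1", [0.0, 0.0]), ("subject_2", [3.0, 4.0])]
label, dist = nearest_identity([3.0, 3.0], gallery)
print(label, dist)
```

With more training images per subject, the gallery simply holds several vectors under the same label, which is one way a larger training set could improve the match.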

    Tea Leaf Pest Detection Using Support Vector Machine (SVM) METHOD IN PTPN IV Unit Bah Butong

    Indonesia is one of the largest tea-producing countries in the world. The Ministry of Trade recorded the value of tea exports in 2017 at 1,826.8 million US dollars. The quality of the tea produced must therefore be considered at every stage, from planting tea plants and picking tea leaves to processing the leaves into ready-to-consume tea. So far, farmers have picked tea leaves based only on the scheduled picking time for each planting block. When the time to pick a block arrives, the whole block is picked. Weather is one of the factors that make picking times uncertain. This study identified pests on tea leaves using digital image processing. The first stage starts with image acquisition and preprocessing. From the results, the statistical characteristics of each image are extracted. The data resulting from the training images are stored in a database. The training image data are used as a reference for identifying types of pests using a Support Vector Machine.