23,903 research outputs found

    Automatic Content-Based Image Categorization

    This thesis deals with automatic content-based image classification. Its main goal is to implement an application that performs this task automatically. The solution is a configurable system that extracts local image features and builds a visual vocabulary with the k-means method. A Bag of Words representation serves as the global feature describing each image. Support Vector Machines, the final component of the system, perform the classification based on this representation. The last chapter presents the results of experiments with this system.
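The pipeline this abstract describes (local descriptors, a k-means visual vocabulary, then a Bag of Words histogram per image) can be sketched in a few lines. This is a minimal illustration only, not the thesis's implementation: the 2-D toy descriptors, the fixed initial centroids, and the function names `kmeans` and `bow_histogram` are all assumptions made for the example, and the resulting histograms would still need to be passed to an SVM.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep empty ones in place.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids


def bow_histogram(descriptors, vocabulary):
    """L1-normalised count of descriptors assigned to each visual word."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest(d, vocabulary)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


# Toy 2-D "descriptors" (assumed); real local features have far more dimensions.
training = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
vocabulary = kmeans(training, centroids=[(0.0, 0.0), (10.0, 10.0)])
histogram = bow_histogram([(0.1, 0.2), (9.8, 10.4), (10.1, 10.9)], vocabulary)
```

Each image is thus reduced to a fixed-length vector whose length equals the vocabulary size, regardless of how many local features the image produced.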

    Designing Semantic Kernels as Implicit Superconcept Expansions

    Recently, there has been increased interest in exploiting background knowledge in text mining tasks, especially text classification. At the same time, kernel-based learning algorithms such as Support Vector Machines have become a dominant paradigm in the text mining community. Among other reasons, this is due to their ability to achieve more accurate learning results by replacing the standard linear (bag-of-words) kernel with customized kernel functions that incorporate additional a priori knowledge. In this paper we propose a new approach to the design of ‘semantic smoothing kernels’ by means of an implicit superconcept expansion using well-known measures of term similarity. The experimental evaluation on two different datasets indicates that our approach consistently improves performance in situations where (i) training data is scarce or (ii) the bag-of-words representation is too sparse to build stable models when using the linear kernel.
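To make the idea of a semantic smoothing kernel concrete, the sketch below adds a linear kernel over superconcepts to the ordinary bag-of-words kernel. It is an explicit toy version under stated assumptions, not the paper's method: the paper performs the expansion implicitly and derives weights from term similarity measures, whereas the vocabulary, the concepts, and the matrix `P` here are hypothetical.

```python
def dot(u, v):
    """Standard inner product, i.e. the linear (bag-of-words) kernel."""
    return sum(a * b for a, b in zip(u, v))


def expand(doc, term_to_concepts):
    """Project a term-count vector onto superconcepts: one weight per concept."""
    n_concepts = len(term_to_concepts[0])
    return [
        sum(doc[t] * term_to_concepts[t][c] for t in range(len(doc)))
        for c in range(n_concepts)
    ]


def smoothed_kernel(x, y, term_to_concepts):
    """Linear kernel on terms plus a linear kernel on the superconcept projection."""
    return dot(x, y) + dot(expand(x, term_to_concepts), expand(y, term_to_concepts))


# Hypothetical vocabulary: ["beef", "pork", "car"]; hypothetical concepts: ["meat", "vehicle"].
P = [
    [1.0, 0.0],  # beef -> meat
    [1.0, 0.0],  # pork -> meat
    [0.0, 1.0],  # car  -> vehicle
]
doc_a = [1, 0, 0]  # mentions only "beef"
doc_b = [0, 1, 0]  # mentions only "pork"
doc_c = [0, 0, 1]  # mentions only "car"

linear_ab = dot(doc_a, doc_b)                   # no shared terms
smoothed_ab = smoothed_kernel(doc_a, doc_b, P)  # shared superconcept "meat"
smoothed_ac = smoothed_kernel(doc_a, doc_c, P)  # no shared superconcept
```

Two documents with no terms in common get a linear-kernel value of zero, but a positive smoothed value when their terms share a superconcept, which is why this kind of kernel can help when the bag-of-words representation is sparse.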

    Well-Known brands recognition by automated classifiers using local and global features

    From color and type to patterns and illustrations, brands strive to be recognizable and to convey their values and personality. Patterns and color are key elements here, as they can play a vital role in brand recognition. The images used for brand classification were handpicked and collectively named HKDataset. We explored various feature extractors for classification and used an automated classifier, a linear SVM, tuning the model parameters to improve accuracy. We observed that Support Vector Machines perform better when GIST descriptors are combined with Bag of SIFT features. We hope to apply deep learning and other sophisticated classifiers to a much-expanded set of brand categories in the future.

    Kernels for Protein Homology Detection

    Determining protein sequence similarity is an important task for protein classification and homology detection, and it is typically performed using sequence alignment algorithms. Fast and accurate alignment-free kernel-based classifiers exist that treat protein sequences as a “bag of words”. Kernels implicitly map the sequences to a high-dimensional feature space and can be thought of as an inner product between two vectors in that space. This allows an algorithm that can be expressed purely in terms of inner products to be ‘kernelised’, so that the algorithm implicitly operates in the kernel’s feature space. A weighted string kernel, where the weighting is derived using probabilistic methods, is implemented using a binary data representation, and the results are reported. Alternative forms of data representation, such as Ising and frequency forms, are implemented and the results discussed. These results are then used to inform the development of a variety of novel kernels for protein sequence comparison. Alternative forms of classifier are investigated, such as nearest neighbour, support vector machines, and multiple kernel learning. A kernelised Gaussian classifier is derived and tested, which is informative as it returns a score related to the probability of a sequence belonging to a particular class. Support vector machines are tested with the introduced kernels, and the results compared to the alternative classifiers. As similarity can be thought of as having different components, such as composition and position, multiple kernel learning is investigated with the novel kernels developed here. The results show that a support vector machine, using either single or multiple kernels, is the best classifier for remote protein homology detection out of all the classifiers tested in this thesis. EPSR
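The statement that a kernel is an inner product in a feature space can be illustrated with the spectrum kernel, a standard alignment-free string kernel that treats a sequence as a bag of k-mers. This is offered as a generic example under that assumption; it is not the weighted string kernel developed in the thesis, and the toy sequences are made up.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k):
    """Feature map: how often each length-k substring occurs in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Inner product of the two k-mer count vectors, computed without alignment."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


def normalized_kernel(a, b, k=2):
    """Cosine-style normalisation so that identical sequences score 1.0."""
    return spectrum_kernel(a, b, k) / sqrt(
        spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k)
    )


# Toy sequences (assumed): "MKVL" has 2-mers MK, KV, VL; "KVLA" has KV, VL, LA.
shared = spectrum_kernel("MKVL", "KVLA", k=2)
similarity = normalized_kernel("MKVL", "KVLA", k=2)
```

Because the function only ever returns inner products, any algorithm written in terms of inner products, such as a support vector machine, can use it in place of explicit feature vectors.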

    Sentiment analysis and classification of Indian farmers’ protest using twitter data

    Protests are an integral part of democracy and an important means for citizens to convey their demands and/or dissatisfaction to the government. As citizens become more aware of their rights, there has been an increasing number of protests all over the world for various reasons. With the advancement of technology, there has also been an exponential rise in the use of social media to exchange information and ideas. In this research, we gathered data from the microblogging website Twitter concerning the farmers’ protest to understand the sentiments that the public shared at an international level. We used models to categorize and analyze the sentiments based on a collection of around 20,000 tweets on the protest. We conducted our analysis using Bag of Words and TF-IDF and found that Bag of Words performed better than TF-IDF. In addition, we used Naive Bayes, Decision Trees, Random Forests, and Support Vector Machines and found that Random Forest had the highest classification accuracy.
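The two text representations compared in this abstract differ only in how term counts are weighted. The sketch below shows both side by side, assuming the unsmoothed definition idf = log(N / df); libraries often use smoothed variants, and the two example documents are invented for illustration rather than taken from the study's data.

```python
from collections import Counter
from math import log


def bag_of_words(tokens, vocabulary):
    """Raw term counts in vocabulary order."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(corpus_tokens, vocabulary):
    """Term counts scaled by log(N / document frequency) for each term."""
    n_docs = len(corpus_tokens)
    # Document frequency: in how many documents each term appears.
    df = [sum(term in set(doc) for doc in corpus_tokens) for term in vocabulary]
    idf = [log(n_docs / d) if d else 0.0 for d in df]
    return [
        [count * weight for count, weight in zip(bag_of_words(doc, vocabulary), idf)]
        for doc in corpus_tokens
    ]


# Hypothetical tokenised documents.
corpus = [
    ["farmers", "protest", "peaceful"],
    ["farmers", "protest", "violent"],
]
vocabulary = sorted({token for doc in corpus for token in doc})
bow_vectors = [bag_of_words(doc, vocabulary) for doc in corpus]
tfidf_vectors = tf_idf(corpus, vocabulary)
```

Terms that occur in every document ("farmers", "protest") keep their counts under Bag of Words but are weighted to zero under this TF-IDF definition, so the two representations can lead a classifier to different decisions.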

    Using the fisher vector approach for cold identification

    In this paper, we present a computational paralinguistic method for assessing whether a person has an upper respiratory tract infection (i.e. a cold) using their speech. A system that can accurately detect a cold can be helpful for predicting its propagation. For this purpose, we use Mel-frequency Cepstral Coefficients (MFCC) extracted from the utterances as audio-signal representations, which allow us to fit a generative Gaussian Mixture Model (GMM) that serves to produce an encoding based on the Fisher Vector (FV) approach. We use the URTIC dataset provided by the organizers of the ComParE Challenge 2017 of the Interspeech Conference. Classification is done by a linear-kernel Support Vector Machine (SVM); owing to the high class imbalance in the training dataset, we opt for undersampling the majority class, that is, reducing its number of samples to that of the minority class. We find that applying Power Normalization (PN) and Principal Component Analysis (PCA) to the Fisher vector features is an effective strategy for improving classification performance. We obtain better performance than that of the Bag-of-Audio-Words approach reported in the challenge paper.
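Two of the steps named in this abstract are simple enough to sketch directly: power normalization of a feature vector and undersampling of the majority class. The code below is illustrative only; the exponent `alpha=0.5` is a commonly used value assumed here rather than one stated in the abstract, and the function names, seed, and labels are hypothetical.

```python
import random
from math import copysign


def power_normalize(vector, alpha=0.5):
    """Signed power: sign(v) * |v| ** alpha, applied to each component."""
    return [copysign(abs(v) ** alpha, v) for v in vector]


def undersample(samples, labels, seed=0):
    """Randomly keep as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        balanced.extend((sample, label) for sample in rng.sample(group, target))
    return balanced


normalized = power_normalize([4.0, -9.0, 0.0])

# Hypothetical imbalanced training set: 8 "healthy" samples and 2 "cold" samples.
samples = list(range(10))
labels = ["healthy"] * 8 + ["cold"] * 2
balanced = undersample(samples, labels)
```

Power normalization compresses large components while preserving sign, and undersampling leaves both classes with the same number of training samples at the cost of discarding majority-class data.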

    Vehicle make and model recognition using bag of expressions

    This article belongs to the Section Intelligent Sensors. Vehicle make and model recognition (VMMR) is a key task for automated vehicular surveillance (AVS) and various intelligent transport system (ITS) applications. In this paper, we propose and study the suitability of the bag of expressions (BoE) approach for VMMR-based applications. The method includes neighborhood information in addition to visual words. BoE extends the capabilities of the bag of words (BoW) approach, including occlusion handling, scale invariance and view independence. The proposed approach extracts features using a combination of different keypoint detectors and a Histogram of Oriented Gradients (HOG) descriptor. An optimized dictionary of expressions is formed using visual words acquired through k-means clustering. The histogram of expressions is created by counting the occurrences of each expression in the image. For classification, multiclass linear support vector machines (SVM) are trained on the BoE-based feature representation. The approach has been evaluated with cross-validation tests on the publicly available National Taiwan Ocean University-Make and Model Recognition (NTOU-MMR) dataset, and experimental results show that it outperforms recent approaches for VMMR. With multiclass linear SVM classification, promising average accuracy and processing speed are obtained using a combination of keypoint detectors with HOG-based BoE description, making it applicable to real-time VMMR systems. Muhammad Haroon Yousaf received funding from the Higher Education Commission, Pakistan for Swarm Robotics Lab under the National Centre for Robotics and Automation (NCRA). The authors also acknowledge support from the Directorate of ASR&TD, University of Engineering and Technology Taxila, Pakistan.

    Machine learning approach to thermite weld defects detection and classification.

    Master's Degree. University of KwaZulu-Natal, Durban. The defects formed during the thermite welding process between two sections of rail require the welded joints to be inspected for quality purposes. The commonly used non-destructive inspection method is Radiography testing. However, the detection and classification of the various defects in the resulting radiography images remains a costly, lengthy and subjective process, as it is conducted entirely manually by trained experts. It has been shown that most rail breaks are caused by a crack that initiated from an undetected weld joint defect. To meet the requirements of modern technologies, the railway industry has a strong demand for an automated detection and classification model. This work presents a method based on image processing and machine learning techniques to automatically detect and classify welding defects. Radiography images are first enhanced using the Contrast Limited Adaptive Histogram Equalisation method; thereafter, the Chan-Vese Active Contour Model is applied to the enhanced images to segment and extract the weld joint as the Region of Interest from the image background. A comparative investigation between the Local Binary Patterns descriptor and the Bag of Visual Words approach with the Speeded Up Robust Features descriptor was carried out for extracting features from the weld joint images. The effectiveness of these feature extractors was evaluated using the Support Vector Machines, K-Nearest Neighbours and Naive Bayes classifiers. The study’s experimental results showed that the Bag of Visual Words approach, when used with the Support Vector Machines classifier, achieves the best overall classification accuracy of 94.66%. The proposed method can be extended to other industries where Radiography testing is used as the inspection tool.
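Of the feature extractors compared in this abstract, the Local Binary Patterns descriptor is the simplest to show in code. The sketch below implements the basic 3x3 variant on a grayscale image given as nested lists; the neighbour ordering, the `>=` comparison, and the function names are assumptions made for illustration, and the thesis may use a different LBP variant.

```python
# Neighbour offsets (row, col), clockwise from the top-left pixel (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit code: bit i is set when neighbour i is at least as bright as the centre."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all pixels that have a full neighbourhood."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Toy 3x3 patch: bright corners, dark edges, mid-grey centre.
patch = [
    [9, 1, 9],
    [1, 5, 1],
    [9, 1, 9],
]
code = lbp_code(patch, 1, 1)
histogram = lbp_histogram(patch)
```

The histogram of codes, rather than the codes themselves, is what would be handed to a classifier such as an SVM, since it summarises local texture independently of position.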