
    Creating classification models from textual descriptions of companies using Crunchbase

    This paper compares different models for multilabel text classification, using information collected from Crunchbase, a large database that holds information about more than 600,000 companies. Each company is labeled with one or more categories from a subset of 46 possible categories, and the proposed models predict the categories based solely on the company's textual description. A number of natural language processing strategies have been tested for feature extraction, including stemming, lemmatization, and part-of-speech tags. The dataset is highly unbalanced: the frequency of each category ranges from 0.7% to 28%. Our findings reveal that the description text of each company contains features that make it possible to predict its area of activity, expressed by its corresponding categories, with about 70% precision and 42% recall. In a second set of experiments, a multiclass problem that attempts to find the most probable category, we obtained about 67% accuracy using SVM and Fuzzy Fingerprints. The resulting models may constitute an important asset for the automatic classification of texts, not only company descriptions but also other texts, such as web pages, blogs, and news pages.
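The precision and recall figures above compare each company's predicted category set with its true category set. The abstract does not say which averaging was used, so the sketch below assumes micro-averaging (pooling true positives, false positives, and false negatives over all companies); the function name and the category labels are hypothetical. It also illustrates how a conservative classifier can show high precision but low recall.

```python
def micro_precision_recall(true_sets, pred_sets):
    """Micro-averaged precision and recall for multilabel predictions.

    Assumption: each element is a set of category names for one company.
    """
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        tp += len(true & pred)   # categories correctly predicted
        fp += len(pred - true)   # categories predicted but not assigned
        fn += len(true - pred)   # assigned categories that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: every prediction is correct, but half the labels are missed.
truth = [{"software", "fintech", "mobile"}, {"health"}]
preds = [{"software"}, {"health"}]
print(micro_precision_recall(truth, preds))  # (1.0, 0.5)
```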

    A Novel Progressive Multi-label Classifier for Class-Incremental Data

    In this paper, we design a progressive learning algorithm for multi-label classification that learns new labels while retaining the knowledge of previously learned labels. New output neurons corresponding to new labels are added, and the neural network connections and parameters are automatically restructured as if the label had been introduced from the beginning. This work is the first of its kind among multi-label classifiers for class-incremental learning. It is useful for real-world applications such as robotics, where data arrive as a stream and the number of labels is often unknown in advance. Based on the Extreme Learning Machine framework, a novel universal classifier with plug-and-play capabilities for progressive multi-label classification is developed. Experimental results on various synthetic and real benchmark datasets validate the efficiency and effectiveness of the proposed algorithm.
    Comment: 5 pages, 3 figures, 4 tables
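To illustrate why an Extreme Learning Machine (ELM) lends itself to adding labels, here is a minimal batch sketch, not the authors' progressive (streaming) algorithm. In an ELM the hidden layer is random and fixed, and the output weights are the least-squares solution beta = pinv(H) @ Y. Each output column is solved independently, so appending a column for a new label gives exactly the weights that training with that label from the start would have produced. The class and method names are hypothetical, and the sketch assumes the new label is annotated on the same training samples.

```python
import numpy as np


class ProgressiveELM:
    """Hypothetical batch ELM sketch for multi-label output with label addition."""

    def __init__(self, n_features, n_hidden=50, seed=0):
        rng = np.random.default_rng(seed)
        # Random, fixed input weights and biases: the defining trait of an ELM.
        self.W = rng.normal(size=(n_features, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.beta = np.zeros((n_hidden, 0))  # one output column per label

    def _hidden(self, X):
        # Sigmoid hidden-layer activations H.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, Y):
        # Least-squares output weights; keep pinv(H) for later label additions.
        self.H_pinv = np.linalg.pinv(self._hidden(X))
        self.beta = self.H_pinv @ Y
        return self

    def add_label(self, y_new):
        # Assumes fit() was called and y_new covers the same training samples.
        # Adding an output neuron = appending one independently solved column.
        col = self.H_pinv @ np.asarray(y_new, dtype=float).reshape(-1, 1)
        self.beta = np.hstack([self.beta, col])

    def predict(self, X, threshold=0.5):
        # A label is switched on when its output score reaches the threshold.
        return (self._hidden(X) @ self.beta >= threshold).astype(int)
```

Handling genuinely streaming data would additionally require a recursive update of the output weights as new samples arrive, which this sketch omits.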

    A Multilabel Approach for Fault Detection and Classification of Transmission Lines using Binary Relevance

    In contemporary automation systems, fault detection and classification on electrical transmission lines in grid systems are given top priority. The broad application of Machine Learning (ML) methods has enabled the replacement of conventional methods of fault identification and classification. These methods are more effective because they can identify faults early using large quantities of sensor data. However, detecting simultaneous failures remains difficult in the presence of noise and multiple concurrent faults on the transmission lines. This study contributes a novel method for concurrently detecting and classifying several faults using a multilabel classification approach based on binary relevance classifiers, and examines the performance of the proposed binary relevance multilabel detection and classification models. The faults in the dataset were collected under both ideal and problematic conditions. The effectiveness of the suggested method is determined on the detection and classification of a variety of multilabel fault types.
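Binary relevance decomposes a multilabel problem into one independent binary classifier per label, so a sample can trigger several labels at once (for example, two simultaneous faults). The abstract does not name the base classifier; the sketch below assumes a plain logistic regression per label trained by batch gradient descent. The class name, the two-feature inputs, and the fault labels are hypothetical.

```python
import numpy as np


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit one binary logistic-regression classifier by batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))    # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)      # log-loss gradient step
    return w


class BinaryRelevance:
    """Hypothetical binary relevance wrapper: one independent model per label."""

    def fit(self, X, Y):
        # Column j of Y holds the 0/1 indicator for label (fault type) j.
        self.weights = [train_logistic(X, Y[:, j]) for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        # Each label is decided independently, so several can be active together.
        return np.column_stack([(Xb @ w >= 0).astype(int) for w in self.weights])


# Hypothetical data: label 0 fires when feature 0 > 0, label 1 when feature 1 > 0.
X = np.array([[2.0, 2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, -2.0]])
Y = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
model = BinaryRelevance().fit(X, Y)
print(model.predict(np.array([[3.0, 1.0], [-1.0, 4.0]])))  # [[1 1] [0 1]]
```

The design choice is simplicity: labels are treated as independent, which makes the approach easy to scale to new fault types but ignores correlations between faults.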