Creating classification models from textual descriptions of companies using crunchbase
This paper compares different models for multilabel text classification, using information collected from Crunchbase, a large database that holds information about more than 600,000 companies. Each company is labeled with one or more categories from a subset of 46 possible categories, and the proposed models predict the categories based solely on the company's textual description. A number of natural language processing strategies have been tested for feature extraction, including stemming, lemmatization, and part-of-speech tags. The dataset is highly unbalanced: the frequency of each category ranges from 0.7% to 28%. Our findings reveal that the description text of each company contains features that allow its area of activity, expressed by its corresponding categories, to be predicted with about 70% precision and 42% recall. In a second set of experiments, framed as a multiclass problem that attempts to find the most probable category, we obtained about 67% accuracy using SVM and Fuzzy Fingerprints. The resulting models may constitute an important asset for the automatic classification of texts, not only company descriptions but also other texts, such as web pages, blogs, and news pages.
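The abstract reports precision and recall for the multilabel setting but does not say how they were averaged across the 46 categories. The sketch below assumes micro-averaging, one common choice for imbalanced multilabel data, in which true positives, false positives and false negatives are pooled over all companies and categories. The function name and the category names in the example are illustrative only.

```python
def micro_precision_recall(gold_sets, pred_sets):
    """Micro-averaged precision and recall over per-item label sets (assumed metric)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)   # categories predicted and actually assigned
        fp += len(pred - gold)   # categories predicted but not assigned
        fn += len(gold - pred)   # assigned categories the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical companies: each gold set holds its assigned categories.
gold = [{"Software", "Internet"}, {"Health Care", "Biotechnology"}]
pred = [{"Software"}, {"Biotechnology"}]
print(micro_precision_recall(gold, pred))  # (1.0, 0.5)
```

In this toy case every predicted category is correct but half of the assigned ones are missed, which is one way a gap between precision and recall, like the one reported above, can arise.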
A Novel Progressive Multi-label Classifier for Class-incremental Data
In this paper, a progressive learning algorithm for multi-label classification is designed that learns new labels while retaining the knowledge of previous labels. New output neurons corresponding to new labels are added, and the neural network connections and parameters are automatically restructured as if the label had been introduced from the beginning. This work is the first of its kind among multi-label classifiers for class-incremental learning. It is useful for real-world applications such as robotics, where streaming data are available and the number of labels is often unknown. Based on the Extreme Learning Machine framework, a novel universal classifier with plug-and-play capabilities for progressive multi-label classification is developed. Experimental results on various synthetic and real benchmark datasets validate the efficiency and effectiveness of the proposed algorithm.
Comment: 5 pages, 3 figures, 4 tables
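The idea of adding output neurons "as if the label had been introduced from the beginning" can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an online-sequential ELM (OS-ELM) style learner whose output weights are updated by recursive least squares one sample at a time, and it assumes 0/1 targets in which samples seen before a label existed count as negatives for that label. The class `ProgressiveELM` and its methods are hypothetical names.

```python
import math
import random
from typing import List


class ProgressiveELM:
    """Hypothetical OS-ELM-style sketch (not the paper's exact algorithm):
    a fixed random hidden layer, output weights learned by recursive least
    squares one sample at a time, and an output layer that can grow."""

    def __init__(self, n_inputs: int, n_hidden: int, reg: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        # Random, never-trained input weights and biases: the defining ELM trait.
        self.W = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        # P tracks (reg*I + sum of h h^T)^-1; it depends only on inputs, never on labels.
        self.P = [[1.0 / reg if i == j else 0.0 for j in range(n_hidden)] for i in range(n_hidden)]
        # beta[i][k] is the weight from hidden unit i to output neuron (label) k.
        self.beta: List[List[float]] = [[] for _ in range(n_hidden)]

    def hidden(self, x: List[float]) -> List[float]:
        # Sigmoid activations of the fixed random hidden layer.
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bi)))
                for row, bi in zip(self.W, self.b)]

    def add_label(self) -> None:
        # New output neuron with zero weights. Under the 0/1-target assumption every
        # earlier sample was a negative for this label, and recursive least squares
        # keeps a column at zero while its targets are zero, so this matches a model
        # that had the label from the beginning.
        for row in self.beta:
            row.append(0.0)

    def partial_fit(self, x: List[float], targets: List[int]) -> None:
        # targets holds one 0/1 value per currently known label.
        h = self.hidden(x)
        n = len(h)
        Ph = [sum(self.P[i][j] * h[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(h[i] * Ph[i] for i in range(n))
        # Sherman-Morrison rank-one update of P (P stays symmetric).
        self.P = [[self.P[i][j] - Ph[i] * Ph[j] / denom for j in range(n)] for i in range(n)]
        gain = [v / denom for v in Ph]  # equals the updated P times h
        for k, t in enumerate(targets):
            err = t - sum(h[i] * self.beta[i][k] for i in range(n))
            for i in range(n):
                self.beta[i][k] += gain[i] * err

    def predict(self, x: List[float], threshold: float = 0.5) -> List[int]:
        # One independently thresholded score per label (multi-label output).
        h = self.hidden(x)
        n_labels = len(self.beta[0])
        return [int(sum(h[i] * self.beta[i][k] for i in range(len(h))) > threshold)
                for k in range(n_labels)]
```

Because `P` depends only on the hidden activations, and a zero-initialised column stays at zero while its targets are zero, a model that receives a label mid-stream through `add_label` ends up with the same weights as one that knew the label from the start, under the assumptions stated above.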
A Multilabel Approach for Fault Detection and Classification of Transmission Lines using Binary Relevance
In contemporary automation systems, fault detection and classification of electrical transmission lines in grid systems are given top priority. The broad application of Machine Learning (ML) methods has enabled the substitution of conventional methods of fault identification and classification. These methods are more effective and can identify faults early using a significant quantity of sensor data. However, detecting simultaneous failures is difficult in the presence of noise and multiple faults in the transmission lines. This study contributes a novel way of concurrently detecting and classifying several faults using a multilabel classification approach based on binary relevance classifiers. The performance of the proposed binary relevance multilabel detection and classification models is examined. Faults in the dataset are collected under both ideal and problematic conditions. The effectiveness of the suggested method is determined by detecting and classifying a variety of multilabel fault types.
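Binary relevance decomposes a multilabel problem into one independent binary problem per label, so simultaneous faults are simply several labels predicted as 1 for the same sample. A minimal pure-Python sketch follows; the perceptron base learner, the two-feature toy data and the label meanings are hypothetical stand-ins, not the classifiers or sensor measurements used in the study.

```python
from typing import Callable, List


class Perceptron:
    """Tiny linear base learner, used here only for illustration."""

    def __init__(self, epochs: int = 20, lr: float = 1.0):
        self.epochs = epochs
        self.lr = lr
        self.w: List[float] = []
        self.b = 0.0

    def fit(self, X: List[List[float]], y: List[int]) -> "Perceptron":
        self.w = [0.0] * len(X[0])
        self.b = 0.0
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                err = target - self.predict_one(x)
                if err:  # update weights only on mistakes
                    self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
                    self.b += self.lr * err
        return self

    def predict_one(self, x: List[float]) -> int:
        return int(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b > 0)


class BinaryRelevance:
    """Binary relevance: train one independent binary classifier per label."""

    def __init__(self, make_learner: Callable[[], Perceptron]):
        self.make_learner = make_learner
        self.learners: List[Perceptron] = []

    def fit(self, X: List[List[float]], Y: List[List[int]]) -> "BinaryRelevance":
        n_labels = len(Y[0])
        # Column k of Y becomes the target vector of the k-th classifier.
        self.learners = [self.make_learner().fit(X, [row[k] for row in Y])
                         for k in range(n_labels)]
        return self

    def predict(self, X: List[List[float]]) -> List[List[int]]:
        return [[m.predict_one(x) for m in self.learners] for x in X]


# Hypothetical data: two features, two fault labels (fault type A, fault type B).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
Y = [[1, 1], [1, 0], [0, 1], [0, 0]]
model = BinaryRelevance(Perceptron).fit(X, Y)
print(model.predict(X))  # [[1, 1], [1, 0], [0, 1], [0, 0]]
```

Because each label has its own classifier, a sample can receive any combination of labels, such as `[1, 1]` for two concurrent faults. The design trade-off is that binary relevance ignores correlations between labels.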
- …