Weakly-Supervised Neural Text Classification
Deep neural networks are gaining increasing popularity for the classic text
classification task, due to their strong expressive power and reduced need
for feature engineering. Despite such attractiveness, neural text
classification models suffer from the lack of training data in many real-world
applications. Although many semi-supervised and weakly-supervised text
classification models exist, they cannot be easily applied to deep neural
models and support only limited types of supervision. In this paper, we
propose a weakly-supervised method that addresses the lack of training data in
neural text classification. Our method consists of two modules: (1) a
pseudo-document generator that leverages seed information to generate
pseudo-labeled documents for model pre-training, and (2) a self-training module
that bootstraps on real unlabeled data for model refinement. Our method has the
flexibility to handle different types of weak supervision and can be easily
integrated into existing deep neural models for text classification. We have
performed extensive experiments on three real-world datasets from different
domains. The results demonstrate that our proposed method achieves inspiring
performance without requiring excessive training data and outperforms baseline
methods significantly.
Comment: CIKM 2018 Full Pape
Context and Keyword Extraction in Plain Text Using a Graph Representation
Document indexing is an essential task performed by archivists or automatic
indexing tools. For a query to retrieve the relevant documents, the keywords
describing each document have to be chosen carefully. Archivists have to identify the
right topic of a document before starting to extract the keywords. For an
archivist indexing specialized documents, experience plays an important role.
But indexing documents on different topics is much harder. This article
proposes an innovative method for an indexing support system. This system takes
as input an ontology and a plain text document and provides as output
contextualized keywords of the document. The method has been evaluated by
exploiting Wikipedia's category links as a termino-ontological resource.
A Robust Deep Model for Improved Categorization of Legal Documents for Predictive Analytics
Predictive legal analytics is a technology used to predict the chances of successful and unsuccessful outcomes in a particular case. It relies on automated document classification to help legal experts classify court documents, so that the details of specific legal factors can be retrieved from legal judgments and understood for accurate document analysis. However, extracting these factors from legal text documents is a time-consuming process. To facilitate the task of classifying documents, a robust method named Distributed Stochastic Keyword Extraction based Ensemble Theil-Sen Regressive Deep Belief Reweight Boost Classification (DSKE-TRDBRBC) is proposed. The DSKE-TRDBRBC technique consists of two major processes: keyword extraction and classification. First, the t-distributed stochastic neighbor embedding technique is applied in DSKE-TRDBRBC for keyword extraction, which in turn reduces the time needed for document classification. After that, the Ensemble Theil-Sen Regressive Deep Belief Reweight Boosting technique is applied for document classification. The ensemble boosting algorithm initially constructs a set of Theil-Sen Regressive Deep Belief neural networks to classify the input legal documents. The results of the Deep Belief neural networks are then combined to build a strong classifier by reducing the error, which improves the classification accuracy. The proposed method is experimentally evaluated with various metrics such as F-measure, recall, accuracy, precision, and computational time. The experimental results quantitatively confirm that the proposed DSKE-TRDBRBC technique achieves better accuracy with the lowest computation time compared to the conventional approaches.
ServeNet: A Deep Neural Network for Web Services Classification
Automated service classification plays a crucial role in service discovery,
selection, and composition. Machine learning has been widely used for service
classification in recent years. However, the performance of conventional
machine learning methods highly depends on the quality of manual feature
engineering. In this paper, we present a novel deep neural network to
automatically abstract low-level representations of both service name and
service description into high-level merged features without feature engineering
or a length limitation, and then predict the service classification across 50
service categories. To demonstrate the effectiveness of our approach, we
conduct a comprehensive experimental study by comparing 10 machine learning
methods on 10,000 real-world web services. The results show that the proposed
deep neural network can achieve higher classification accuracy and is more
robust than other machine learning methods.
Comment: Accepted by ICWS'2