Search CORE

11,461 research outputs found

Weakly-Supervised Neural Text Classification

Author: Han Jiawei
Meng Yu
Shen Jiaming
Zhang Chao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/09/2018
Field of study

Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural text classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.Comment: CIKM 2018 Full Pape

arXiv.org e-Print Archive

Crossref

Context and Keyword Extraction in Plain Text Using a Graph Representation

Author: Chahine Carlo Abi
Chaignaud Nathalie
Kotowicz Jean-Philippe
Pécuchet Jean-Pierre
Publication venue
Publication date: 30/11/2008
Field of study

Document indexation is an essential task achieved by archivists or automatic indexing tools. To retrieve relevant documents to a query, keywords describing this document have to be carefully chosen. Archivists have to find out the right topic of a document before starting to extract the keywords. For an archivist indexing specialized documents, experience plays an important role. But indexing documents on different topics is much harder. This article proposes an innovative method for an indexing support system. This system takes as input an ontology and a plain text document and provides as output contextualized keywords of the document. The method has been evaluated by exploiting Wikipedia's category links as a termino-ontological resources

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

A Robust Deep Model for Improved Categorization of Legal Documents for Predictive Analytics

Author: Mohan Divya
Nair Latha Ravindran
Publication venue: Auricle Global Society of Education and Research
Publication date: 11/03/2023
Field of study

Predictive legal analytics is a technology used to predict the chances of successful and unsuccessful outcomes in a particular case. Predictive legal analytics is performed through automated document classification for facilitating legal experts in their classification of court documents to retrieve and understand the details of specific legal factors from legal judgments for accurate document analysis. However, extracting these factors from legal texts document is a time-consuming process. In order to facilitate the task of classifying documents, a robust method namely Distributed Stochastic Keyword Extraction based Ensemble Theil-Sen Regressive Deep Belief Reweight Boost Classification (DSKE-TRDBRBC) is proposed. The DSKE-TRDBRBC technique consists of two major processes namely Keyword Extraction and Classification. At first, the t-distributed stochastic neighbor embedding technique is applied to DSKE-TRDBRBC for keyword extraction. This in turn minimizes the time consumption for document classification. After that, the Ensemble Theil-Sen Regressive Deep Belief Reweight Boosting technique is applied for document classification. The Ensemble boosting algorithm initially constructs’ set of Theil-Sen Regressive Deep Belief neural networks to classify the input legal documents. Then the results of the Deep Belief neural network are combined to built a strong classifier by reducing the error. This aids in improving the classification accuracy. The proposed method is experimentally evaluated with various metrics such as F-measure , recall, accuracy, precision, , and computational time. The experimental results quantitatively confirm that the proposed DSKE-TRDBRBC technique achieves better accuracy with lowest computation time as compared to the conventional approaches

International Journal on Recent and Innovation Trends in Computing and Communication

ServeNet: A Deep Neural Network for Web Services Classification

Author: Grolinger Katarina
Li Zhi
Liao Zhifang
Liu Peng
Qamar Nafees
Wang Weiru
Yang Yilong
Publication venue
Publication date: 06/08/2020
Field of study

Automated service classification plays a crucial role in service discovery, selection, and composition. Machine learning has been widely used for service classification in recent years. However, the performance of conventional machine learning methods highly depends on the quality of manual feature engineering. In this paper, we present a novel deep neural network to automatically abstract low-level representation of both service name and service description to high-level merged features without feature engineering and the length limitation, and then predict service classification on 50 service categories. To demonstrate the effectiveness of our approach, we conduct a comprehensive experimental study by comparing 10 machine learning methods on 10,000 real-world web services. The result shows that the proposed deep neural network can achieve higher accuracy in classification and more robust than other machine learning methods.Comment: Accepted by ICWS'2

arXiv.org e-Print Archive

Crossref