3,289 research outputs found
Cross-language Text Classification with Convolutional Neural Networks From Scratch
Cross language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the labeling cost of training classification model for each individual language. The novel approach by using Convolutional Neural Networks for multilingual language classification is proposed in this article. It learns representation of knowledge gained from languages. Moreover, current method works for new individual language, which was not used in training. The results of empirical study on large dataset of 21 languages demonstrate robustness and competitiveness of the presented approach
Weak signal identification with semantic web mining
We investigate an automated identification of weak signals according to Ansoff to improve strategic planning and technological forecasting. Literature shows that weak signals can be found in the organization's environment and that they appear in different contexts. We use internet information to represent organization's environment and we select these websites that are related to a given hypothesis. In contrast to related research, a methodology is provided that uses latent semantic indexing (LSI) for the identification of weak signals. This improves existing knowledge based approaches because LSI considers the aspects of meaning and thus, it is able to identify similar textual patterns in different contexts. A new weak signal maximization approach is introduced that replaces the commonly used prediction modeling approach in LSI. It enables to calculate the largest number of relevant weak signals represented by singular value decomposition (SVD) dimensions. A case study identifies and analyses weak signals to predict trends in the field of on-site medical oxygen production. This supports the planning of research and development (R&D) for a medical oxygen supplier. As a result, it is shown that the proposed methodology enables organizations to identify weak signals from the internet for a given hypothesis. This helps strategic planners to react ahead of time
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
Deep Learning for Technical Document Classification
In large technology companies, the requirements for managing and organizing
technical documents created by engineers and managers have increased
dramatically in recent years, which has led to a higher demand for more
scalable, accurate, and automated document classification. Prior studies have
only focused on processing text for classification, whereas technical documents
often contain multimodal information. To leverage multimodal information for
document classification to improve the model performance, this paper presents a
novel multimodal deep learning architecture, TechDoc, which utilizes three
types of information, including natural language texts and descriptive images
within documents and the associations among the documents. The architecture
synthesizes the convolutional neural network, recurrent neural network, and
graph neural network through an integrated training process. We applied the
architecture to a large multimodal technical document database and trained the
model for classifying documents based on the hierarchical International Patent
Classification system. Our results show that TechDoc presents a greater
classification accuracy than the unimodal methods and other state-of-the-art
benchmarks. The trained model can potentially be scaled to millions of
real-world multimodal technical documents, which is useful for data and
knowledge management in large technology companies and organizations.Comment: 16 pages, 8 figures, 9 table
Patent Classification Using Ontology-Based Patent Network Analysis
Patent management is increasingly important for organizations to sustain their competitive advantage. The classification of patents is essential for patent management and industrial analysis. In this study, we propose a novel patent network-based classification method to analyze query patents and predict their classes. The proposed patent network, which contains various types of nodes that represent different features extracted from patent documents, is constructed based on the relationship metrics derived from patent metadata. The novel approach analyzes reachable nodes in the patent ontology network to calculate their relevance to query patents, after which it uses the modified k-nearest neighbor classifier to classify query patents. We evaluate the performance of the proposed approach on a test dataset of patent documents obtained from the United States Patent and Trademark Office (USPTO), and compare it with the performance of the three conventional methods. The results demonstrate that the proposed patent network-based approach outperforms the conventional approaches
- …