Learning Object Categories From Internet Image Searches
In this paper, we describe a simple approach to learning models of visual object categories from images gathered from Internet image search engines. The images for a given keyword are typically highly variable, with a large fraction being unrelated to the query term, and thus pose a challenging environment from which to learn. By training our models directly from Internet images, we remove the need to laboriously compile training data sets, required by most other recognition approaches; this opens up the possibility of learning object category models “on-the-fly.” We describe two simple approaches, derived from the probabilistic latent semantic analysis (pLSA) technique for text document analysis, that can be used to automatically learn object models from these data. We show two applications of the learned model: first, to rerank the images returned by the search engine, thus improving the quality of the search engine; and second, to recognize objects in other image data sets.
Weakly-Supervised Neural Text Classification
Deep neural networks are gaining increasing popularity for the classic text
classification task, due to their strong expressive power and reduced need
for feature engineering. Despite such attractiveness, neural text
classification models suffer from the lack of training data in many real-world
applications. Although many semi-supervised and weakly-supervised text
classification models exist, they cannot be easily applied to deep neural
models and support only limited supervision types. In this paper, we
propose a weakly-supervised method that addresses the lack of training data in
neural text classification. Our method consists of two modules: (1) a
pseudo-document generator that leverages seed information to generate
pseudo-labeled documents for model pre-training, and (2) a self-training module
that bootstraps on real unlabeled data for model refinement. Our method has the
flexibility to handle different types of weak supervision and can be easily
integrated into existing deep neural models for text classification. We have
performed extensive experiments on three real-world datasets from different
domains. The results demonstrate that our proposed method achieves inspiring
performance without requiring excessive training data and outperforms baseline
methods significantly. Comment: CIKM 2018 Full Pape
ServeNet: A Deep Neural Network for Web Services Classification
Automated service classification plays a crucial role in service discovery,
selection, and composition. Machine learning has been widely used for service
classification in recent years. However, the performance of conventional
machine learning methods highly depends on the quality of manual feature
engineering. In this paper, we present a novel deep neural network that
automatically abstracts low-level representations of both the service name and
the service description into high-level merged features, without feature
engineering or a length limitation, and then predicts the service
classification over 50 service categories. To demonstrate the effectiveness of our approach, we
conduct a comprehensive experimental study by comparing 10 machine learning
methods on 10,000 real-world web services. The result shows that the proposed
deep neural network can achieve higher accuracy in classification and is more
robust than other machine learning methods. Comment: Accepted by ICWS'2
What Works Better? A Study of Classifying Requirements
Classifying requirements into functional requirements (FR) and non-functional
ones (NFR) is an important task in requirements engineering. However, automated
classification of requirements written in natural language is not
straightforward, due to the variability of natural language and the absence of
a controlled vocabulary. This paper investigates how automated classification
of requirements into FR and NFR can be improved and how well several machine
learning approaches work in this context. We contribute an approach for
preprocessing requirements that standardizes and normalizes requirements before
applying classification algorithms. Further, we report on how well several
existing machine learning methods perform for automated classification of NFRs
into sub-categories such as usability, availability, or performance. Our study
is performed on 625 requirements provided by the OpenScience tera-PROMISE
repository. We found that our preprocessing improved the performance of an
existing classification method. We further found significant differences in the
performance of approaches such as Latent Dirichlet Allocation, Biterm Topic
Modeling, or Naive Bayes for the sub-classification of NFRs. Comment: 7 pages, the 25th IEEE International Conference on Requirements
Engineering (RE'17
A Review on Web Page Classification
With the growth in digital documents on the World Wide Web, and in the number of web pages and blogs that serve as common sources of news about current events, aggregating and categorizing information from these sources is a daunting task, as the volume of digital documents available online is growing exponentially. Accurate classification of such documents into their respective categories brings several benefits, such as providing tools that help people find, filter, and analyze digital information on the web. Accurate classification depends on the quality of the training dataset, which in turn depends on the preprocessing techniques. Existing literature on web page classification has identified that better document representation techniques would reduce training and testing time and improve the classification accuracy, precision, and recall of the classifier. In this paper, we give an overview of web page classification with an in-depth study of the web classification process, while raising awareness of the need for an adequate document representation technique, as this helps capture the semantics of documents and also helps reduce the problem of high dimensionality.
Bibliometric Survey on Incremental Learning in Text Classification Algorithms for False Information Detection
False information, or misinformation, on the web has severe effects on people, business, and society as a whole. Detection of misinformation has therefore become an active research topic. Detecting misinformation in textual articles is directly connected to the text classification problem. With the massive and dynamic generation of unstructured textual documents on the web, incremental learning in text classification has gained popularity. This survey explores recent advancements in incremental learning in text classification, reviews the research publications in the area from the Scopus, Web of Science, Google Scholar, and IEEE databases, and performs quantitative analysis using methods such as publication statistics, collaboration degree, research network analysis, and citation analysis. The study gives researchers insight into the current status of research on incremental learning in text classification through a literature survey, and helps them learn about the applications and techniques recently used in the field.