81,326 research outputs found

    Morphological Analysis as Classification: an Inductive-Learning Approach

    Full text link
    Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language engineering tasks. The traditional approach to performing morphological analysis is to combine a morpheme lexicon, sets of (linguistic) rules, and heuristics to find a most probable analysis. In contrast we present an inductive learning approach in which morphological analysis is reformulated as a segmentation task. We report on a number of experiments in which five inductive learning algorithms are applied to three variations of the task of morphological analysis. Results show (i) that the generalisation performance of the algorithms is good, and (ii) that the lazy learning algorithm IB1-IG performs best on all three tasks. We conclude that lazy learning of morphological analysis as a classification task is indeed a viable approach; moreover, it has the strong advantages over the traditional approach of avoiding the knowledge-acquisition bottleneck, being fast and deterministic in learning and processing, and being language-independent.Comment: 11 pages, 5 encapsulated postscript figures, uses non-standard NeMLaP proceedings style nemlap.sty; inputs ipamacs (international phonetic alphabet) and epsf macro

    Machine Learning in Automated Text Categorization

    Full text link
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey

    A bagging SVM to learn from positive and unlabeled examples

    Full text link
    We consider the problem of learning a binary classifier from a training set of positive and unlabeled examples, both in the inductive and in the transductive setting. This problem, often referred to as \emph{PU learning}, differs from the standard supervised classification problem by the lack of negative examples in the training set. It corresponds to an ubiquitous situation in many applications such as information retrieval or gene ranking, when we have identified a set of data of interest sharing a particular property, and we wish to automatically retrieve additional data sharing the same property among a large and easily available pool of unlabeled data. We propose a conceptually simple method, akin to bagging, to approach both inductive and transductive PU learning problems, by converting them into series of supervised binary classification problems discriminating the known positive examples from random subsamples of the unlabeled set. We empirically demonstrate the relevance of the method on simulated and real data, where it performs at least as well as existing methods while being faster

    Adversarial Attack and Defense on Graph Data: A Survey

    Full text link
    Deep neural networks (DNNs) have been widely applied to various applications including image classification, text generation, audio recognition, and graph data analysis. However, recent studies have shown that DNNs are vulnerable to adversarial attacks. Though there are several works studying adversarial attack and defense strategies on domains such as images and natural language processing, it is still difficult to directly transfer the learned knowledge to graph structure data due to its representation challenges. Given the importance of graph analysis, an increasing number of works start to analyze the robustness of machine learning models on graph data. Nevertheless, current studies considering adversarial behaviors on graph data usually focus on specific types of attacks with certain assumptions. In addition, each work proposes its own mathematical formulation which makes the comparison among different methods difficult. Therefore, in this paper, we aim to survey existing adversarial learning strategies on graph data and first provide a unified formulation for adversarial learning on graph data which covers most adversarial learning studies on graph. Moreover, we also compare different attacks and defenses on graph data and discuss their corresponding contributions and limitations. In this work, we systemically organize the considered works based on the features of each topic. This survey not only serves as a reference for the research community, but also brings a clear image researchers outside this research domain. Besides, we also create an online resource and keep updating the relevant papers during the last two years. More details of the comparisons of various studies based on this survey are open-sourced at https://github.com/YingtongDou/graph-adversarial-learning-literature.Comment: In submission to Journal. For more open-source and up-to-date information, please check our Github repository: https://github.com/YingtongDou/graph-adversarial-learning-literatur

    Optimizing E-Commerce Product Classification Using Transfer Learning

    Get PDF
    The global e-commerce market is snowballing at a rate of 23% per year. In 2017, retail e-commerce users were 1.66 billion and sales worldwide amounted to 2.3 trillion US dollars, and e-retail revenues are projected to grow to 4.88 trillion USD in 2021. With the immense popularity that e-commerce has gained over past few years comes the responsibility to deliver relevant results to provide rich user experience. In order to do this, it is essential that the products on the ecommerce website be organized correctly into their respective categories. Misclassification of products leads to irrelevant results for users which not just reflects badly on the website, it could also lead to lost customers. With ecommerce sites nowadays providing their portal as a platform for third party merchants to sell their products as well, maintaining a consistency in product categorization becomes difficult. Therefore, automating this process could be of great utilization. This task of automation done on the basis of text could lead to discrepancies since the website itself, its various merchants, and users, all could use different terminologies for a product and its category. Thus, using images becomes a plausible solution for this problem. Dealing with images can best be done using deep learning in the form of convolutional neural networks. This is a computationally expensive task, and in order to keep the accuracy of a traditional convolutional neural network while reducing the hours it takes for the model to train, this project aims at using a technique called transfer learning. Transfer learning refers to sharing the knowledge gained from one task for another where new model does not need to be trained from scratch in order to reduce the time it takes for training. This project aims at using various product images belonging to five categories from an ecommerce platform and developing an algorithm that can accurately classify products in their respective categories while taking as less time as possible. The goal is to first test the performance of transfer learning against traditional convolutional networks. Then the next step is to apply transfer learning to the downloaded dataset and assess its performance on the accuracy and time taken to classify test data that the model has never seen before

    A Comparative Study of the Application of Different Learning Techniques to Natural Language Interfaces

    Full text link
    In this paper we present first results from a comparative study. Its aim is to test the feasibility of different inductive learning techniques to perform the automatic acquisition of linguistic knowledge within a natural language database interface. In our interface architecture the machine learning module replaces an elaborate semantic analysis component. The learning module learns the correct mapping of a user's input to the corresponding database command based on a collection of past input data. We use an existing interface to a production planning and control system as evaluation and compare the results achieved by different instance-based and model-based learning algorithms.Comment: 10 pages, to appear CoNLL9
    corecore