2,787 research outputs found

    Introduction to the CoNLL-2000 Shared Task: Chunking

    We describe the CoNLL-2000 shared task: dividing text into syntactically related, non-overlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that took part in the shared task, and briefly discuss their performance.
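
    As a concrete illustration of the task (the example below is mine, not the paper's), CoNLL-2000 represents chunks with per-word BIO tags; a minimal sketch that groups tagged words back into chunks:

        # Minimal sketch of CoNLL-2000-style text chunking: each word carries
        # a BIO tag (B-NP opens a noun-phrase chunk, I-NP continues it, O is
        # outside any chunk). The sentence and tags are illustrative.
        def tags_to_chunks(words, tags):
            """Group (word, BIO tag) pairs into non-overlapping chunks."""
            chunks, current = [], None
            for word, tag in zip(words, tags):
                if tag.startswith("B-"):                # a new chunk begins
                    if current:
                        chunks.append(current)
                    current = (tag[2:], [word])
                elif tag.startswith("I-") and current:  # continue the open chunk
                    current[1].append(word)
                else:                                   # O closes any open chunk
                    if current:
                        chunks.append(current)
                    current = None
            if current:
                chunks.append(current)
            return [(label, " ".join(ws)) for label, ws in chunks]

        words = ["He", "reckons", "the", "current", "account", "deficit"]
        tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"]
        print(tags_to_chunks(words, tags))
        # [('NP', 'He'), ('VP', 'reckons'), ('NP', 'the current account deficit')]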

    A Survey on Potential of the Support Vector Machines in Solving Classification and Regression Problems

    Kernel methods and support vector machines (SVMs) have become among the most popular paradigms for learning from examples. SVM approaches are used in several application areas, for instance handwritten character recognition, text categorization, face detection, pharmaceutical data analysis, and drug design. Adapted SVMs have also been proposed for time-series forecasting and, in computational neuroscience, as a tool for detecting symmetry when eye movement is connected with attention and visual perception. The aim of the paper is to investigate the potential of SVMs for solving classification and regression tasks, and to analyze the computational complexity of the methodologies proposed for the various sub-problems that arise.
    Keywords: Support Vector Machines, Kernel-Based Methods, Supervised Learning, Regression, Classification
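
    For concreteness, a minimal sketch of the two task families the survey covers; scikit-learn is an assumed toolkit here, not one the paper prescribes:

        # Kernel SVMs for the two task families the survey addresses, using
        # scikit-learn (an assumption; the paper is methodology-oriented).
        from sklearn.datasets import make_classification, make_regression
        from sklearn.svm import SVC, SVR

        # Classification with an RBF kernel.
        Xc, yc = make_classification(n_samples=200, n_features=10, random_state=0)
        clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(Xc, yc)
        print("classification accuracy:", clf.score(Xc, yc))

        # Epsilon-insensitive regression with the same kernel machinery.
        Xr, yr = make_regression(n_samples=200, n_features=10, noise=5.0,
                                 random_state=0)
        reg = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(Xr, yr)
        print("regression R^2:", reg.score(Xr, yr))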

    Distributed Support Vector Machine Learning

    Support Vector Machines (SVMs) are used in a growing number of applications. A fundamental constraint on SVM learning is management of the training set, because the computational cost grows as the square of the training-set size. Training sets of about 1,000 examples (500 positives and 500 negatives, for example) can typically be handled on a PC without hard-drive thrashing; training sets of 10,000, however, simply cannot be managed with PC-based resources. For this reason, most SVM implementations must resort to some kind of chunking process that trains on parts of the data at a time (10 chunks of 1,000, for example, to learn the 10,000). Sequential and multi-threaded chunking methods provide a way to run the SVM on large data sets while retaining accuracy. The multi-threaded distributed SVM described in this thesis is implemented using Java RMI and has been developed to run on a network of multi-core/multi-processor computers.
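
    A rough sketch of the sequential chunking idea the abstract describes, with scikit-learn standing in for the thesis's Java RMI implementation: train on one chunk, carry only its support vectors forward, and fold in the next chunk:

        # Sequential chunking sketch: instead of solving the quadratic problem
        # on all N examples at once, train on one chunk, keep only its support
        # vectors, and merge them with the next chunk. scikit-learn stands in
        # for the Java RMI implementation described in the thesis.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=10_000, n_features=20,
                                   random_state=0)

        chunk_size = 1_000
        carry_X = np.empty((0, X.shape[1]))
        carry_y = np.empty(0, dtype=y.dtype)

        for start in range(0, len(X), chunk_size):
            chunk_X = np.vstack([carry_X, X[start:start + chunk_size]])
            chunk_y = np.concatenate([carry_y, y[start:start + chunk_size]])
            svm = SVC(kernel="rbf", gamma="scale").fit(chunk_X, chunk_y)
            # Only the support vectors define the decision boundary, so they
            # are all that is carried into the next round of training.
            carry_X = svm.support_vectors_
            carry_y = chunk_y[svm.support_]

        print("final model trained;", len(carry_X), "support vectors retained")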

    Memory-Based Shallow Parsing

    We present memory-based learning approaches to shallow parsing and apply them to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing, and full parsing. We use feature selection techniques and system combination methods to improve the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification, while its application to recognizing embedded structures leaves room for improvement.
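
    Memory-based learning stores training instances verbatim and classifies new cases by their nearest stored neighbors; a toy sketch of base noun phrase tagging in that spirit (the two-sentence corpus and window features below are invented for illustration):

        # Toy sketch of memory-based base-NP chunking: store windowed training
        # instances and tag new words by their nearest stored neighbor (k-NN),
        # which is the essence of memory-based learning. The tiny "corpus" and
        # the word/POS window features are invented for illustration.
        from sklearn.feature_extraction import DictVectorizer
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline

        train = [  # (word, POS tag, chunk tag) triples
            [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "O")],
            [("a", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("barked", "VBD", "O")],
        ]

        def features(sent, i):
            """A small window of word and POS features around position i."""
            word, pos, _ = sent[i]
            prev_pos = sent[i - 1][1] if i > 0 else "<s>"
            return {"word": word, "pos": pos, "prev_pos": prev_pos}

        X = [features(s, i) for s in train for i in range(len(s))]
        y = [tag for s in train for _, _, tag in s]

        # k=1 nearest neighbor over one-hot features: pure instance memory.
        chunker = make_pipeline(DictVectorizer(),
                                KNeighborsClassifier(n_neighbors=1))
        chunker.fit(X, y)

        test = [("the", "DT", "?"), ("dog", "NN", "?"), ("sat", "VBD", "?")]
        print(chunker.predict([features(test, i) for i in range(len(test))]))
        # expected: ['B-NP' 'I-NP' 'O']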