52 research outputs found

Learning to simplify sentences with quasi-synchronous grammar and integer programming

Author: Lapata M.
Woodsend K.
Publication date: 01/01/2011

Using Interior Point Methods for Large-scale Support Vector Machine training

Author: Woodsend Kristian
Publication venue: The University of Edinburgh
Publication date: 01/01/2010

Support Vector Machines (SVMs) are powerful machine learning techniques for classification and regression, but the training stage involves a convex quadratic optimization program that is most often computationally expensive. Traditionally, active-set methods have been used rather than interior point methods, due to the Hessian in the standard dual formulation being completely dense. But as active-set methods are essentially sequential, they may not be adequate for machine learning challenges of the future. Additionally, training time may be limited, or data may grow so large that cluster-computing approaches need to be considered. Interior point methods have the potential to answer these concerns directly. They scale efficiently, they can provide good early approximations, and they are suitable for parallel and multi-core environments. To apply them to SVM training, it is necessary to address directly the most computationally expensive aspect of the algorithm. We therefore present an exact reformulation of the standard linear SVM training optimization problem that exploits separability of terms in the objective. By so doing, per-iteration computational complexity is reduced from O(n^3) to O(n). We show how this reformulation can be applied to many machine learning problems in the SVM family. Implementation issues relating to specializing the algorithm are explored through extensive numerical experiments. They show that the performance of our algorithm for large dense or noisy data sets is consistent and highly competitive, and in some cases can outperform all other approaches by a large margin. Unlike active-set methods, performance is largely unaffected by noisy data. We also show how, by exploiting the block structure of the augmented system matrix, a hybrid MPI/OpenMP implementation of the algorithm enables data and linear algebra computations to be efficiently partitioned amongst parallel processing nodes in a clustered computing environment.
The applicability of our technique is extended to nonlinear SVMs by low-rank approximation of the kernel matrix. We develop a heuristic designed to represent clusters using a small number of features. Additionally, an early approximation scheme reduces the number of samples that need to be considered. Both elements improve the computational efficiency of the training phase. Taken as a whole, this thesis shows that with suitable problem formulation and efficient implementation techniques, interior point methods are a viable optimization technology to apply to large-scale SVM training, and are able to provide state-of-the-art performance.

Edinburgh Research Archive

Automatic Generation of Story Highlights

Author: Lapata Mirella
Woodsend Kristian
Publication date: 01/01/2010

In this paper we present a joint content selection and compression model for single-document summarization. The model operates over a phrase-based representation of the source document which we obtain by merging information from PCFG parse trees and dependency graphs. Using an integer linear programming formulation, the model learns to select and combine phrases subject to length, coverage and grammar constraints. We evaluate the approach on the task of generating “story highlights”—a small number of brief, self-contained sentences that allow readers to quickly gather information on news stories. Experimental results show that the model’s output is comparable to human-written highlights in terms of both grammaticality and content.

CiteSeerX

Edinburgh Research Explorer

Hybrid MPI/OpenMP Parallel Linear Support Vector Machine Training

Author: Gondzio Jacek
Woodsend Kristian
Publication venue: 'American Physical Society (APS)'
Publication date: 01/08/2009

Edinburgh Research Explorer

Multiple aspect summarization using integer linear programming

Author: Lapata Mirella
Woodsend Kristian
Publication date: 01/01/2012

Multi-document summarization involves many aspects of content selection and surface realization. The summaries must be informative, succinct, grammatical, and obey stylistic writing conventions. We present a method where such individual aspects are learned separately from data (without any hand-engineering) but optimized jointly using an integer linear programme. The ILP framework allows us to combine the decisions of the expert learners and to select and rewrite source content through a mixture of objective setting, soft and hard constraints. Experimental results on the TAC-08 data set show that our model achieves state-of-the-art performance using ROUGE and significantly improves the informativeness of the summaries.

CiteSeerX

Edinburgh Research Explorer

Title Generation with Quasi-Synchronous Grammar

Author: Feng Yansong
Lapata Mirella
Woodsend Kristian
Publication date: 01/01/2010

The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.

CiteSeerX

Edinburgh Research Explorer

An ontology enhanced parallel SVM for scalable spam filter training

Author: Bauer
Blanco
Blanzieri
Blei
Breiman
Cao
Caruana
Chawla
Colas
Cristianini
Dean
Do
Gansterer
Godwin Caruana
Graf
Hall
Huang
Kearns
Kim
Maozhen Li
Mei
Platt
Suykens
Taura
Vapnik
Wang
Woodsend
Yang Liu
Zanghirati
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/05/2013

This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.
Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy level of the parallel SVM beyond that of the original sequential counterpart.

Crossref

Brunel University Research Archive

Automated Identification from Dental Data (AutoIDD): A New Development in Digital Forensics

Author: Brown Nathan L.
Manica Scheila
Mossey Peter A.
Reesu Gowri Vijay
Revie Gavin F.
Woodsend Brenainn
Publication venue: 'Elsevier BV'
Publication date: 01/04/2020

University of Dundee Online Publications

Development of intra-oral automated landmark recognition (ALR) for dental and occlusal outcome measurements

Author: Aziz Azad
El-Angbawi Ahmed
Koufoudaki Eirini
Lin Ping
McIntyre Grant
Mossey Peter A.
Reesu Gowri Vijay
Semb Gunvor
Shaw William
Woodsend Brenainn
Publication venue: 'Oxford University Press (OUP)'
Publication date: 18/12/2020

BACKGROUND: Previous studies embracing digital technology and automated methods of scoring dental arch relationships have shown that such technology is valid and accurate. To date, however, there is no published literature on artificial intelligence and machine learning to completely automate the process of dental landmark recognition. OBJECTIVES: This study aimed to develop and evaluate a fully automated system and software tool for the identification of landmarks on human teeth using geometric computing, image segmenting, and machine learning technology. METHODS: Two hundred and thirty-nine digital models were used in the automated landmark recognition (ALR) validation phase, 161 of which were digital models from cleft palate subjects aged 5 years. These were manually annotated to facilitate qualitative validation. Additionally, landmarks were placed on 20 adult digital models manually by 3 independent observers. The same models were subjected to scoring using the ALR software and the differences (in mm) were calculated. All the teeth from the 239 models were evaluated for correct recognition by the ALR with a breakdown to find which stages of the process caused the errors. RESULTS: The results revealed that 1526 out of 1915 teeth (79.7%) were correctly identified, and the accuracy validation gave 95% confidence intervals for the geometric mean error of [0.285, 0.317] for the humans and [0.269, 0.325] for ALR, a negligible difference. CONCLUSIONS/IMPLICATIONS: It is anticipated that the ALR software tool will have applications throughout clinical dentistry and anthropology, and in research will constitute an accurate and objective tool for handling large datasets without the need for time-intensive employment of experts to place landmarks manually.

PubMed Central

University of Dundee Online Publications