618 research outputs found

    Learning Machine Translation


    Estimation of Stochastic Attribute-Value Grammars using an Informative Sample

    We argue that some of the computational complexity associated with estimation of stochastic attribute-value grammars can be reduced by training upon an informative subset of the full training set. Results using the parsed Wall Street Journal corpus show that in some circumstances, it is possible to obtain better estimation results using an informative sample than when training upon all the available material. Further experimentation demonstrates that with unlexicalised models, a Gaussian prior can reduce overfitting. However, when models are lexicalised and contain overlapping features, overfitting does not seem to be a problem, and a Gaussian prior makes minimal difference to performance. Our approach is applicable in situations where the number of parses in the training set is infeasibly large, or where recovering those parses from a packed representation is itself computationally expensive. Comment: 6 pages, 2 figures. Coling 2000, Saarbrücken, Germany. pp 586--59
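    For context on how a Gaussian prior enters this kind of estimation: for log-linear (maximum-entropy) models of the sort used for stochastic attribute-value grammars, the prior adds an L2 penalty to the log-likelihood, shrinking feature weights towards zero. The sketch below is a minimal illustration of that general idea, not the paper's estimator; the function name, step size, toy counts, and variance value are all assumptions.

```python
import numpy as np

def penalized_gradient(weights, empirical_counts, expected_counts, sigma2=1.0):
    """Gradient of a log-linear model's log-likelihood with a zero-mean
    Gaussian prior on the feature weights (hypothetical helper).

    empirical_counts: feature counts observed in the training parses
    expected_counts:  feature counts expected under the current model
    sigma2:           prior variance; smaller values shrink weights harder
    """
    # Without the prior the gradient is (empirical - expected); the Gaussian
    # prior adds a -w / sigma^2 term, which discourages large weights and so
    # limits overfitting.
    return (empirical_counts - expected_counts) - weights / sigma2

# One illustrative gradient-ascent step on the penalised objective.
w = np.zeros(5)
empirical = np.array([3.0, 1.0, 0.0, 2.0, 1.0])
expected = np.array([2.5, 1.2, 0.3, 1.8, 1.2])
w += 0.1 * penalized_gradient(w, empirical, expected, sigma2=0.5)
```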

    Learning unification-based grammars using the Spoken English Corpus

    This paper describes a grammar learning system that combines model-based and data-driven learning within a single framework. Our results from learning grammars using the Spoken English Corpus (SEC) suggest that combined model-based and data-driven learning can produce a more plausible grammar than is the case when using either learning style in isolation. Comment: 10 pages

    Scalable distributed event detection for Twitter

    Social media streams, such as Twitter, have shown themselves to be useful sources of real-time information about what is happening in the world. Automatic detection and tracking of events identified in these streams have a variety of real-world applications, e.g. identifying and automatically reporting road accidents to emergency services. However, to be useful, events need to be identified within the stream with very low latency. This is challenging due to the high volume of posts within these social streams. In this paper, we propose a novel event detection approach that can both effectively detect events within social streams like Twitter and scale to thousands of posts every second. Through experimentation on a large Twitter dataset, we show that our approach can process the equivalent of the full Twitter Firehose stream while maintaining event detection accuracy and outperforming an alternative distributed event detection system.
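    As a rough sketch of the scaling idea only (the paper's actual architecture, features, and similarity measure are not reproduced here), incoming posts can be hash-partitioned across parallel workers, each of which flags posts with no sufficiently similar recent neighbour as candidate new events. The worker count, routing rule, and similarity threshold below are hypothetical.

```python
from collections import Counter
import math

NUM_WORKERS = 8  # illustrative; a real deployment would size this to the stream rate

def route(tweet_id: int) -> int:
    """Assign a tweet to a worker partition so detection can run in parallel."""
    return tweet_id % NUM_WORKERS

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term vectors."""
    numerator = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return numerator / norm if norm else 0.0

def is_candidate_event(tweet_terms: Counter, recent: list[Counter], threshold: float = 0.5) -> bool:
    """A post with no sufficiently similar recent neighbour may signal a new event."""
    return all(cosine(tweet_terms, r) < threshold for r in recent)
```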

    Can twitter replace newswire for breaking news?

    Twitter is often considered to be a useful source of real-time news, potentially replacing newswire for this purpose. But is this true? In this paper, we examine the extent to which news reporting in newswire and Twitter overlaps, and whether Twitter often reports news faster than traditional newswire providers. In particular, we analyse 77 days' worth of tweets and newswire articles with respect to both manually identified major news events and larger volumes of automatically identified news events. Our results indicate that Twitter reports the same events as newswire providers, in addition to a long tail of minor events ignored by mainstream media. However, contrary to popular belief, neither stream leads the other when dealing with major news events, indicating that the value that Twitter can bring in a news setting comes predominantly from increased event coverage, not timeliness of reporting.

    Stream-based Translation Models for Statistical Machine Translation

    Typical statistical machine translation systems are trained with static parallel corpora. Here we account for scenarios with a continuous incoming stream of parallel training data. Such scenarios include daily governmental proceedings, sustained output from translation agencies, or crowd-sourced translations. We show that incorporating recent sentence pairs from the stream improves performance compared with a static baseline. Since frequent batch retraining is computationally demanding, we introduce a fast incremental alternative using an online version of the EM algorithm. To bound our memory requirements, we use a novel data structure and an associated training regime. When compared to frequent batch retraining, our online, time- and space-bounded model achieves the same performance with significantly less computational overhead.
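    The incremental idea can be pictured as stepwise online EM: rather than re-estimating from scratch, expected counts from newly arrived sentence pairs are blended into the running sufficient statistics with a decaying learning rate. The sketch below applies this to an IBM-Model-1-style lexical table t(f|e); the class name, interpolation schedule, and plain dictionary (the paper additionally bounds memory with a dedicated data structure) are assumptions for illustration.

```python
from collections import defaultdict

class OnlineLexicalTable:
    """Stepwise online-EM sketch for lexical translation probabilities t(f|e)."""

    def __init__(self, alpha=0.7):
        self.alpha = alpha          # step-size decay exponent (assumed value)
        self.counts = defaultdict(lambda: defaultdict(float))
        self.updates = 0

    def update(self, batch_counts):
        """Blend expected counts from a new mini-batch into the running totals."""
        self.updates += 1
        eta = (self.updates + 2) ** -self.alpha   # decaying stepwise-EM learning rate
        for e, row in batch_counts.items():
            for f, c in row.items():
                self.counts[e][f] = (1 - eta) * self.counts[e][f] + eta * c

    def prob(self, f, e):
        """Current estimate of t(f|e), normalised over the stored counts."""
        row = self.counts[e]
        total = sum(row.values())
        return row[f] / total if total else 0.0
```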