5 research outputs found

Dagstuhl Annual Report January - December 2011

Author: Reinhard Wilhelm
Publication venue: Dagstuhl Publications. Dagstuhl Activity Reports
Publication date: 01/01/2011

The International Conference and Research Center for Computer Science is a non-profit organization. Its objective is to promote world-class research in computer science and to host research seminars which enable new ideas to be showcased, problems to be discussed and the course to be set for future development in this field. The work being done to run this informatics center is documented in this report for the business year 2011.

Dagstuhl Research Online Publication Server

GATE Teamware: a web-based, collaborative text annotation framework

Author: Angus Roberts
C. Müller
D. Ferrucci
Genevieve Gorrell
Hamish Cunningham
Ian Roberts
J. Carletta
J. Wiebe
Kalina Bontcheva
N. Ide
Niraj Aswani
Valentin Tablan
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A Novel and Domain-Specific Document Clustering and Topic Aggregation Toolset for a News Organisation

Author: Claire McMahon
Publication venue: Dublin Institute of Technology
Publication date: 30/09/2015

Large collections of documents are becoming increasingly common in the news gathering industry. A review of the literature shows there is a growing interest in data-driven journalism and specifically that the journalism profession needs better tools to understand and develop actionable knowledge from large document sets. On a daily basis, journalists are tasked with searching a diverse range of document sets including news gathering services, emails, freedom of information requests, court records, government reports, press releases and many other types of generally unstructured documents. Document clustering techniques can help address problems of understanding the ever-expanding quantities of documents available to journalists by finding patterns within documents. These patterns can be used to develop useful and actionable knowledge which can contribute to journalism. News articles in particular are fertile ground for document clustering principles. Term weighting schemes assign importance to terms within a document and are central to the study of document clustering methods. This study contributes a review of the dominant and most commonly used term frequency weighting functions put forward in research, establishes the merits and limitations of each approach, and proposes modifications to develop a news-centric document clustering and topic aggregation approach. Experimentation was conducted on a large unstructured collection of newspaper articles from the Irish Times to establish whether the newly proposed news-centric term weighting and document similarity approach improves document clustering accuracy and topic aggregation capabilities for news articles when compared to the traditional term weighting approach.
Whilst the experimentation shows that the developed approach is promising when compared to the manual document clustering effort undertaken by the three journalist expert users, it also highlights the challenges of natural language processing and document clustering methods in general. The results may suggest that a blended approach of complementing automated methods with human-level supervision and guidance may yield the best results.

Arrow@TUDublin

Challenges in Document Mining (Dagstuhl Seminar 11171)

Author: Hamish Cunningham
Norbert Fuhr
Benno M. Stein
Publication venue: Dagstuhl Reports. Dagstuhl Reports, Volume 1, Issue 4
Publication date: 01/01/2011

This report documents the programme and outcomes of the Dagstuhl Seminar 11171 "Challenges in Document Mining". Our starting point was the observation that document mining techniques are often applied in an isolated manner, with the consequence that their potential is still to be fully realised. The goal of the seminar was to analyze this untapped potential. To this end, researchers from the main areas of document mining were invited to present their views, to synthesise an understanding of where and how the latest disciplinary achievements can be combined, and to develop a more integrative view on the state of the art and the prospects for future progress.

Dagstuhl Research Online Publication Server

Executive Summary (Digital Object Identifier 10.4230/DagRep.1.4.65; edited in cooperation with Melikka Khosh Niat)

Author: Benno Stein
Hamish Cunningham
Norbert Fuhr

This report documents the programme and outcomes of the Dagstuhl Seminar 11171 "Challenges in Document Mining". Our starting point was the observation that document mining techniques are often applied in an isolated manner, with the consequence that their potential is still to be fully realised. The goal of the seminar was to analyze this untapped potential. To this end, researchers from the main areas of document mining were invited to present their views, to synthesise an understanding of where and how the latest disciplinary achievements can be combined, and to develop a more integrative view on the state of the art and the prospects for future progress.

CiteSeerX