34 research outputs found

    Reading News Data

    Get PDF
    Information overflow is both inspiring and depressing. Inspiring is more or less the easy access to various information and communication resources, which in turn facilitate the exchange of ideas, knowledge and creativity. However, the impossible fathomability of infinite information space creates a feeling of depression and anxiety. Thanks to digital technology, the speed at which information is generated and distributed significantly exceeds the speed at which it can be perceived. The more the hypertext ocean of the web is filled with content, the more impossible it is to be ‘tamed’ by human senses. In the ambivalent nature of information super-abundance, technological optimism and technological pessimism constantly compete. In fact, the two perspectives on the role of technology in the world of people have always been in conflict, but are now strongly intensified with the evolution and spread of the internet.1 This text looks at a conditional technological optimism, aiming not at postulating utopian aspirations, but at illustrating how scientific and technical elites do not lose their desire to overcome depressing complexity by seeking bold optimization solutions. The focus of this paper is on an innovative technological system for processing online news, namely the publicly available Europe Media Monitor (EMM).2 The interest in this monitoring tool is multifaceted. In most general terms it is interesting to trace the technological solutions with which computer science specialists are trying to discipline the information flow. However, EMM is also interesting as a powerful tool for understanding reality on the basis of statistically processed news databases. This study provides examples of how EMM-enabled media content processing options may be used as a basis for further detailed analysis. Of importance are also the social and institutional intentions behind the development of such a system: what are the motives and the uses associated with such an intersection between news and intelligent software

    Interim research assessment 2003-2005 - Computer Science

    Get PDF
    This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. The report also provides information for others interested in our research activities

    Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2). 29 November 2012, Lisbon, Portugal

    Get PDF
    Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2), held in Lisbon, Portugal on 29 November 2012

    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Get PDF
    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologie

    Proceedings

    Get PDF
    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Semi-Supervised Learning For Identifying Opinions In Web Content

    Get PDF
    Thesis (Ph.D.) - Indiana University, Information Science, 2011Opinions published on the World Wide Web (Web) offer opportunities for detecting personal attitudes regarding topics, products, and services. The opinion detection literature indicates that both a large body of opinions and a wide variety of opinion features are essential for capturing subtle opinion information. Although a large amount of opinion-labeled data is preferable for opinion detection systems, opinion-labeled data is often limited, especially at sub-document levels, and manual annotation is tedious, expensive and error-prone. This shortage of opinion-labeled data is less challenging in some domains (e.g., movie reviews) than in others (e.g., blog posts). While a simple method for improving accuracy in challenging domains is to borrow opinion-labeled data from a non-target data domain, this approach often fails because of the domain transfer problem: Opinion detection strategies designed for one data domain generally do not perform well in another domain. However, while it is difficult to obtain opinion-labeled data, unlabeled user-generated opinion data are readily available. Semi-supervised learning (SSL) requires only limited labeled data to automatically label unlabeled data and has achieved promising results in various natural language processing (NLP) tasks, including traditional topic classification; but SSL has been applied in only a few opinion detection studies. This study investigates application of four different SSL algorithms in three types of Web content: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. SSL algorithms are also evaluated for their effectiveness in sparse data situations and domain adaptation. Research findings suggest that, when there is limited labeled data, SSL is a promising approach for opinion detection in Web content. Although the contributions of SSL varied across data domains, significant improvement was demonstrated for the most challenging data domain--the blogosphere--when a domain transfer-based SSL strategy was implemented
    corecore