7,718 research outputs found

    Experiments in terabyte searching, genomic retrieval and novelty detection for TREC 2004

    Get PDF
    In TREC2004, Dublin City University took part in three tracks, Terabyte (in collaboration with University College Dublin), Genomic and Novelty. In this paper we will discuss each track separately and present separate conclusions from this work. In addition, we present a general description of a text retrieval engine that we have developed in the last year to support our experiments into large scale, distributed information retrieval, which underlies all of the track experiments described in this document

    Retrieval Models for Genre Classification

    Get PDF
    Genre provides a characterization of a document with respect to its form or functional trait. Genre is orthogonal to topic, rendering genre information a powerful filter technology for information seekers in digital libraries. However, an efficient means for genre classification is an open and controversially discussed issue. This paper gives an overview and presents new results related to automatic genre classification of text documents. We present a comprehensive survey which contrasts the genre retrieval models that have been developed for Web and non-Web corpora. With the concept of genre-specific core vocabularies the paper provides an original contribution related to computational aspects and classification performance of genre retrieval models: we show how such vocabularies are acquired automatically and introduce new concentration measures that quantify the vocabulary distribution in a sensible way. Based on these findings we construct lightweight genre retrieval models and evaluate their discriminative power and computational efficiency. The presented concepts go beyond the existing utilization of vocabulary-centered, genre-revealing features and open new possibilities for the construction of genre classifiers that operate in real-time

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    Trends and concerns in digital cartography

    Get PDF
    CISRG discussion paper ;

    SOME APPROACHES TO TEXT MINING AND THEIR POTENTIAL FOR SEMANTIC WEB APPLICATIONS

    Get PDF
    In this paper we describe some approaches to text mining, which are supported by an original software system developed in Java for support of information retrieval and text mining (JBowl), as well as its possible use in a distributed environment. The system JBowl1 is being developed as an open source software with the intention to provide an easily extensible, modular framework for pre-processing, indexing and further exploration of large text collections. The overall architecture of the system is described, followed by some typical use case scenarios, which have been used in some previous projects. Then, basic principles and technologies used for service-oriented computing, web services and semantic web services are presented. We further discuss how the JBowl system can be adopted into a distributed environment via technologies available already and what benefits can bring such an adaptation. This is in particular important in the context of a new integrated EU-funded project KP-Lab2 (Knowledge Practices Laboratory) that is briefly presented as well as the role of the proposed text mining services, which are currently being designed and developed there

    Re-classification: some warnings and a proposal

    Get PDF
    • 

    corecore