
    Part of Speech Based Term Weighting for Information Retrieval

    Automatic language processing tools typically assign to terms so-called weights corresponding to the contribution of terms to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a term is in general, based on the POS contexts in which it generally occurs in language. We suggest five different computations of POS-based term weights by extending existing statistical approximations of term information measures. We apply these POS-based term weights to information retrieval by integrating them into the model that matches documents to queries. Experiments with two TREC collections and 300 queries, using TF-IDF and BM25 as baselines, show that integrating our POS-based term weights into retrieval always leads to gains (up to +33.7% from the baseline). Additional experiments with a different retrieval model as baseline (Language Model with Dirichlet priors smoothing) and our best performing POS-based term weight show consistent retrieval gains across the whole smoothing range of the baseline.

    Which User Interaction for Cross-Language Information Retrieval? Design Issues and Reflections

    A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. This paper presents three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for rare languages, and shows how the user interaction design evolved depending on the results of usability tests. The first test was instrumental in identifying weaknesses in both functionalities and interface; the second was run to determine whether query translation should be shown; the final test was a global assessment and focussed on user satisfaction criteria. Lessons were learned at every stage of the process, leading to a much more informed view of what a cross-language retrieval system should offer to users.

    LifeLogging: personal big data

    We have recently observed a convergence of technologies to foster the emergence of lifelogging as a mainstream activity. Computer storage has become significantly cheaper, and advancements in sensing technology allow for the efficient sensing of personal activities, locations and the environment. This is best seen in the growing popularity of the quantified self movement, in which life activities are tracked using wearable sensors in the hope of better understanding human performance in a variety of tasks. This review aims to provide a comprehensive summary of lifelogging, to cover its research history, current technologies, and applications. Thus far, most lifelogging research has focused predominantly on visual lifelogging in order to capture details of life activities, hence we maintain this focus in this review. However, we also reflect on the challenges lifelogging poses to an information retrieval scientist. This review is a suitable reference for those seeking an information retrieval scientist’s perspective on lifelogging and the quantified self.

    Knowledge representation and text mining in biomedical, healthcare, and political domains

    Knowledge representation and text mining can be employed to discover new knowledge and develop services by using the massive amounts of text gathered by modern information systems. The applied methods should take into account the domain-specific nature of knowledge. This thesis explores knowledge representation and text mining in three application domains. Biomolecular events can be described very precisely and concisely with appropriate representation schemes. Protein–protein interactions are commonly modelled in biological databases as binary relationships, whereas the complex relationships used in text mining are rich in information. The experimental results of this thesis show that complex relationships can be reduced to binary relationships and that it is possible to reconstruct complex relationships from mixtures of linguistically similar relationships. This encourages the extraction of complex relationships from the scientific literature even if binary relationships are required by the application at hand. The experimental results on cross-validation schemes for pair-input data help to understand how existing knowledge regarding dependent instances (such as those concerning protein–protein pairs) can be leveraged to improve the generalisation performance estimates of learned models. Healthcare documents and news articles contain knowledge that is more difficult to model than biomolecular events and tend to have larger vocabularies than biomedical scientific articles. This thesis describes an ontology that models patient education documents and their content in order to improve the availability and quality of such documents. The experimental results of this thesis also show that the Recall-Oriented Understudy for Gisting Evaluation measures are a viable option for the automatic evaluation of textual patient record summarisation methods and that the area under the receiver operating characteristic curve can be used in a large-scale sentiment analysis. The sentiment analysis of Reuters news corpora suggests that the Western mainstream media portrays China negatively in politics-related articles but not in general, which provides new evidence to consider in the debate over the image of China in the Western media.

    Personalisation and recommender systems in digital libraries

    Widespread use of the Internet has resulted in digital libraries that are increasingly used by diverse communities of users for diverse purposes and in which sharing and collaboration have become important social elements. As such libraries become commonplace, as their contents and services become more varied, and as their patrons become more experienced with computer technology, users will expect more sophisticated services from these libraries. A simple search function, normally an integral part of any digital library, increasingly leads to user frustration as user needs become more complex and as the volume of managed information increases. Proactive digital libraries, where the library evolves from being passive and untailored, are seen as offering great potential for addressing and overcoming these issues and include techniques such as personalisation and recommender systems. In this paper, following on from the DELOS/NSF Working Group on Personalisation and Recommender Systems for Digital Libraries, which met and reported during 2003, we present some background material on the scope of personalisation and recommender systems in digital libraries. We then outline the working group’s vision for the evolution of digital libraries and the role that personalisation and recommender systems will play, and we present a series of research challenges and specific recommendations and research priorities for the field.

    Report on the Finnish Language

    Language-centric AI is already ubiquitous, and language technology is at its core. As was stated in the report The Finnish Language in the Digital Age (Koskenniemi et al., 2012): “If there is adequate language technology available, it will be able to ensure the survival of languages with small populations of speakers.” During the last ten years, digitalisation has changed the way we communicate and interact in the world, creating an increasing demand for language-based AI services. New skills are needed to be able to cope in the digital world, so digital education and media awareness are now taught in elementary schools. Digital skills are considered new citizen skills. To provide language-based services to an increasing number of users, we need applications that are built on AI, as well as to provide routine services to special groups and to meet accessibility requirements. The still small number of existing applications and services is partly due to the lack of language resources. The small size of the Finnish market area has also contributed, as large corporations have primarily focused on English with only some support for Finnish in high-demand products in the Finnish market. In the field of language technology, the Finnish language is still only moderately equipped with products, technologies and resources. There are applications and tools for speech synthesis, speech recognition, information retrieval, spelling correction and grammar checking. There are also a few applications for automatically translating language. The situation has improved during the last 10 years, but support for automated translation still leaves ample room for improvement, and the general support for spoken language is modest in industry applications, although some recent research results are encouraging.

    Which user interaction for cross-language information retrieval? Design issues and reflections

    A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. The authors present three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for low-density languages, and show how the user-interaction design evolved depending on the results of usability tests. The first test was instrumental in identifying weaknesses in both functionalities and interface; the second was run to determine whether query translation should be shown; the final test was a global assessment and focused on user satisfaction criteria. Lessons were learned at every stage of the process, leading to a much more informed view of what a cross-language retrieval system should offer to users.