
    Applying digital content management to support localisation

    The retrieval and presentation of digital content such as that on the World Wide Web (WWW) is a substantial area of research. While recent years have seen a huge expansion in the size of web-based archives that can be searched efficiently by commercial search engines, the presentation of potentially relevant content is still limited to ranked document lists represented by simple text snippets or image keyframe surrogates. There is growing interest in techniques to personalise the presentation of content to improve the richness and effectiveness of the user experience. One of the most significant challenges to achieving this is the increasingly multilingual nature of this data, and the need to provide suitably localised responses to users based on this content. The Digital Content Management (DCM) track of the Centre for Next Generation Localisation (CNGL) seeks to develop technologies that support advanced personalised access to and presentation of information by combining elements from the existing research areas of Adaptive Hypermedia and Information Retrieval. The combination of these technologies is intended to produce significant improvements in the way users access information. We review the key features of these technologies, introduce early ideas for how they can support localisation and localised content, and conclude with some impressions of future directions in DCM.

    Cultural Heritage & Built Environment Scoping Report

    This report presents the findings of a scoping study that explores engagement between a heritage institution and its local community. The report addresses this topic by considering the opportunities and limitations of urban screens in forming new audiences for heritage institutions, specifically through a case study of the BBC Big Screens. The literature suggests that urban screens have the potential to form new types of audiences for heritage institutions, yet the processes for achieving this are rarely described. This report proposes that understanding these processes may help address issues of measuring engagement associated with urban screens and contribute to assessing the value of urban screens for communities and heritage institutions. The key themes of participation, site and value are explored through a literature review and then used to structure the analysis and discussion of the case study. Further questions for future study are described.

    Doctor of Philosophy

    Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve the usefulness of free-text querying and text processing, and to demonstrate the advantages of using these methods for clinical research, specifically cohort identification and enhancement. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified, or when the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. Inclusion of free-text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance; an ensemble method was not successful. The addition of free-text search compared to structured data search alone increased cohort size in all cases, with dramatic increases in some, and improved the representation of patients in subpopulations that might otherwise have been underrepresented. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free-text search. A novel information extraction algorithm, Regular Expression Discovery for Extraction (REDEx), is developed and evaluated for cohort enrichment. The REDEx algorithm is demonstrated to accurately extract information from free-text clinical narratives. Temporal expressions as well as bodyweight-related measures are extracted, and these extracted values identify additional patients and additional measurement occurrences that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts. We developed automated query expansion methods that greatly improve the performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrate that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions that are otherwise often missed can be detected. Finally, we developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding information and observations beyond what is available in structured data alone.
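
    To make the flavour of this kind of extraction concrete, the sketch below pulls bodyweight measures out of free-text notes with a single regular expression. The pattern, function name, and sample note are invented for illustration; REDEx learns such patterns from annotated examples rather than relying on a hand-written one.

        import re

        # A hypothetical hand-written pattern standing in for a learned one:
        # matches phrases like "Wt 82.5 kg" or "weight: 180 lbs".
        WEIGHT_PATTERN = re.compile(
            r"(?:weight|wt)\s*[:=]?\s*(\d{2,3}(?:\.\d+)?)\s*(kg|lbs?)",
            re.IGNORECASE,
        )

        def extract_weights(note):
            """Return (value, unit) pairs found in a free-text clinical note."""
            return [(float(value), unit.lower())
                    for value, unit in WEIGHT_PATTERN.findall(note)]

        print(extract_weights("Vitals: Wt 82.5 kg, BP 120/80."))  # [(82.5, 'kg')]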

    Mnews: A Study of Multilingual News Search Interfaces

    With the global expansion of the Internet and the World Wide Web, users are becoming increasingly diverse, particularly in terms of language. In fact, the number of polyglot Web users across the globe has increased dramatically. Even such multilingual users, however, often suffer from unbalanced and fragmented news information, as traditional news access systems seldom allow users to simultaneously search for or compare news in different languages, even though prior research has shown that multilingual users make significant use of each of their languages when searching for information online. Relatively little human-centered research has been conducted to understand and support multilingual user abilities and preferences. In particular, in the fields of cross-language and multilingual search, the majority of research has focused on improving retrieval and translation accuracy, while paying comparably little attention to multilingual user interaction. The research presented in this thesis provides the first large-scale investigations of multilingual news consumption and querying/search result selection behaviors, as well as a detailed comparative analysis of polyglots’ preferences and behaviors with respect to different multilingual news search interfaces on desktop and mobile platforms. Through four phases of user studies, including surveys, interviews, and task-based studies conducted via crowdsourcing and laboratory experiments, this thesis presents the first human-centered studies of multilingual news access, aiming to drive the development of personalized multilingual news access systems that better support each individual user.

    Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher

    The advent of Large Language Models (LLMs) has shown the potential to improve relevance and provide direct answers in web search. However, challenges arise in validating the reliability of generated results and the credibility of contributing sources, due to the limitations of traditional information retrieval algorithms and the LLM hallucination problem. Aiming to create a "PageRank" for the LLM era, we strive to transform the LLM into a relevant, responsible, and trustworthy searcher. We propose a novel generative retrieval framework that leverages the knowledge of LLMs to foster a direct link between queries and online sources. This framework consists of three core modules, a Generator, a Validator, and an Optimizer, which focus respectively on generating trustworthy online sources, verifying source reliability, and refining unreliable sources. Extensive experiments and evaluations highlight our method's superior relevance, responsibility, and trustworthiness against various SOTA methods. (Comment: 14 pages, 4 figures, under peer review)
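
    A minimal sketch of how the three modules could fit together is shown below. The prompts, function names, and the toy stand-in for the model are all assumptions made for illustration; the paper's actual prompting and verification steps are not reproduced here.

        def generate_sources(llm, query):
            """Generator: ask the LLM to propose candidate online sources."""
            return llm(f"List URLs of sources that answer: {query}")

        def validate_source(source, query):
            """Validator: check source reliability. Stubbed here; a real check
            would fetch the page and verify it supports the answer."""
            return source.startswith("https://")

        def optimize_source(llm, source, query):
            """Optimizer: refine an unreliable source into a better one."""
            return llm(f"Suggest a more reliable source than {source} for: {query}")

        def search(llm, query):
            """Run each candidate through Generator -> Validator -> Optimizer."""
            results = []
            for source in generate_sources(llm, query):
                if not validate_source(source, query):
                    source = optimize_source(llm, source, query)
                results.append(source)
            return results

        # A toy stand-in for a real model, so the sketch runs end to end.
        def fake_llm(prompt):
            if prompt.startswith("List"):
                return ["https://example.org/a", "http://example.org/b"]
            return "https://example.org/replacement"

        print(search(fake_llm, "Who invented PageRank?"))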

    eCPD Programme - Enhanced Learning.

    This collection of papers, edited by Kevin Donovan, was produced by the Association for Learning Technology (ALT) for LSIS. The papers are based on the summaries used by presenters during workshops at the 2009 launch of the eCPD Programme.

    Data-driven prototyping via natural-language-based GUI retrieval

    Rapid GUI prototyping has evolved into a widely applied technique in the early stages of software development to facilitate the clarification and refinement of requirements. High-fidelity GUI prototyping in particular has been shown to enable productive discussions with customers and to mitigate potential misunderstandings; however, these benefits come at the cost of development that is expensive, time-consuming, and dependent on experience. In this work, we present RaWi, a data-driven GUI prototyping approach that retrieves GUIs for reuse from a large-scale, semi-automatically created GUI repository for mobile apps on the basis of Natural Language (NL) searches, facilitating GUI prototyping and improving its productivity by leveraging the vast GUI prototyping knowledge embodied in the repository. Retrieved GUIs can be directly reused and adapted in the graphical editor of RaWi. Moreover, we present a comprehensive evaluation methodology that enables (i) the systematic evaluation of NL-based GUI ranking methods through a novel high-quality gold standard, with an in-depth evaluation of traditional IR and state-of-the-art BERT-based models for GUI ranking, and (ii) the assessment of GUI prototyping productivity through an extensive user study in a practical GUI prototyping environment.
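
    As a rough illustration of the NL-based GUI ranking task, the sketch below scores a toy repository of screens by TF-IDF similarity between a query and each screen's visible text. This corresponds to a traditional IR baseline rather than RaWi's actual pipeline, and the three screen descriptions are invented for the example.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        # Invented screen texts standing in for a large GUI repository;
        # each string concatenates the labels visible on one screen.
        gui_texts = [
            "login email password sign in forgot password",
            "shopping cart checkout total price remove item",
            "profile settings notifications privacy logout",
        ]

        vectorizer = TfidfVectorizer()
        gui_matrix = vectorizer.fit_transform(gui_texts)

        def rank_guis(query):
            """Return screen indices ordered by descending query similarity."""
            scores = cosine_similarity(vectorizer.transform([query]), gui_matrix)[0]
            return sorted(range(len(gui_texts)), key=lambda i: -scores[i])

        print(rank_guis("sign in with email and password"))  # login screen ranks first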

    Understanding the use of Virtual Reality in Marketing: a text mining-based review

    The current study highlights the most relevant studies of simulated realities, with special attention to VR and marketing, showing how the literature has evolved over time and discussing the findings. A text-mining approach using a Bayesian statistical topic model, latent Dirichlet allocation, is employed to conduct a comprehensive analysis of 150 articles from 115 journals, all indexed in Web of Science. The findings reveal seven relevant topics, as well as the number of articles published over time, the authors most cited in VR papers, and the leading journals in each topic. The article also provides theoretical and practical implications and suggestions for further research.
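
    For readers unfamiliar with the technique, the sketch below fits latent Dirichlet allocation to a toy corpus to show the mechanics of the topic-modelling step. The snippets are invented, and the toy run asks for two topics, whereas the study fit the model to 150 full articles and arrived at seven.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        # Invented abstract-like snippets standing in for the 150 articles.
        docs = [
            "virtual reality retail store consumer immersion experience",
            "vr advertising brand presence and telepresence effects",
            "head mounted display tourism destination marketing",
        ]

        counts = CountVectorizer(stop_words="english")
        X = counts.fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

        # Topics are usually labelled by their highest-weight terms.
        terms = counts.get_feature_names_out()
        for k, weights in enumerate(lda.components_):
            top = [terms[i] for i in weights.argsort()[-4:][::-1]]
            print(f"topic {k}: {', '.join(top)}")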

    Evaluating Generative Ad Hoc Information Retrieval

    Recent advances in large language models have enabled the development of viable generative information retrieval systems. A generative retrieval system returns grounded generated text in response to an information need instead of the traditional document ranking. Quantifying the utility of these types of responses is essential for evaluating generative retrieval systems. As the established evaluation methodology for ranking-based ad hoc retrieval appears unsuitable for generative retrieval, new approaches for reliable, repeatable, and reproducible experimentation are required. In this paper, we survey the relevant information retrieval and natural language processing literature, identify search tasks and system architectures in generative retrieval, develop a corresponding user model, and study its operationalization. This theoretical analysis provides a foundation and new insights for the evaluation of generative ad hoc retrieval systems. (Comment: 14 pages, 5 figures, 1 table)
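
    As one concrete illustration of the measurement problem, the toy metric below scores a generated response by the fraction of annotated relevant facts ("nuggets") it covers. It is meant only to suggest what quantifying the utility of a grounded response might look like; it is not the user model the paper develops.

        def nugget_recall(response, nuggets):
            """Fraction of annotated relevant facts the response covers."""
            text = response.lower()
            covered = sum(1 for nugget in nuggets if nugget.lower() in text)
            return covered / len(nuggets) if nuggets else 0.0

        answer = "Paris is the capital of France and sits on the Seine."
        print(nugget_recall(answer, [
            "capital of France", "on the Seine", "population of about 2 million",
        ]))  # two of three nuggets covered: prints 0.666...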