33 research outputs found

    Evaluating Generative Ad Hoc Information Retrieval

    Recent advances in large language models have enabled the development of viable generative information retrieval systems. A generative retrieval system returns a grounded generated text in response to an information need instead of the traditional document ranking. Quantifying the utility of these types of responses is essential for evaluating generative retrieval systems. As the established evaluation methodology for ranking-based ad hoc retrieval may seem unsuitable for generative retrieval, new approaches for reliable, repeatable, and reproducible experimentation are required. In this paper, we survey the relevant information retrieval and natural language processing literature, identify search tasks and system architectures in generative retrieval, develop a corresponding user model, and study its operationalization. This theoretical analysis provides a foundation and new insights for the evaluation of generative ad hoc retrieval systems. Comment: 14 pages, 5 figures, 1 table

    Missed urinary tract infection in patients with chronic recalcitrant LUTS and recurrent cystitis

    Background: MSU culture and urinary dipsticks are discredited as diagnostic methods for urinary tract infection (UTI), despite being commonly used to exclude UTI in patients with lower urinary tract symptoms (LUTS). The phenotype of painful LUTS has been recast as Interstitial Cystitis (IC) or Bladder Pain Syndrome (BPS) because infection has been excluded on the evidence of these methods. Given that these all-important tests have been found insensitive and misleading, there is justification in re-examining IC/BPS to ascertain whether we have been mistaken. I studied patients with “Chronic recalcitrant bladder pain and recurrent cystitis” (abbreviated “painful LUTS”) who had been diagnosed with IC/BPS in order to re-assess their pathophysiology.// Aim: I characterised these patients using the scientific method of consilience, which scrutinised them from unrelated perspectives. These studies implied that infection was a most probable aetiological factor. Therefore, I moved on to test infection as a causal factor using Pearl’s three rungs of causation: correlation, intervention and the counterfactual.// Methods: Data on quality of life and disease experience were obtained. Symptoms and pathophysiological variables in 146 patients presenting with painful LUTS were studied. To achieve Pearl’s specifications, an observational study examined intervention and a cross-over study analysed the counterfactual of arbitrary treatment cessation. The evolution of treatment of these patients, using first generation, narrow spectrum urinary agents in protracted courses, is reported. Since protracted antibiotic exposure is feared as a cause of antimicrobial resistance (AMR), I measured this in order to round off my findings.// Results: The consilience studies incriminated UTI in the aetiology of painful LUTS. It is also clear that the patients suffer terribly, and this is aggravated by professional scepticism catalysed by a misinterpretation of urinalysis data. 
Antibiotic intervention demonstrated a regression in all disease indicators but there was resurgence of symptoms and signs during trials without treatment. The data on AMR demonstrated a rise in resistance in response to a first prescription without this increasing with persistence of the antibiotic regimen.// Conclusion: These data imply that IC/BPS (painful LUTS) is caused by a treatable urinary tract infection and are sufficient to merit an RCT. Whilst treatment requires protracted exposure to antibiotics, my data on AMR amongst these patients are surprisingly reassuring. This requires further exploration. Contemporaneous to this thesis, others have published definitive data that refute urine culture and dipstick analysis.

    On the enhancement of Big Data Pipelines through Data Preparation, Data Quality, and the distribution of Optimisation Problems

    Nowadays, data are fundamental for companies, providing operational support by facilitating daily transactions. Data have also become the cornerstone of strategic decision-making processes in businesses. For this purpose, numerous techniques make it possible to extract knowledge and value from data. For example, optimisation algorithms excel at supporting decision-making processes to improve the use of resources, time and costs in the organisation. In the current industrial context, organisations usually rely on business processes to orchestrate their daily activities while collecting large amounts of information from heterogeneous sources. Therefore, the support of Big Data technologies (which are based on distributed environments) is required given the volume, variety and speed of data. Then, in order to extract value from the data, a set of techniques or activities is applied in an orderly way and at different stages. This set of techniques or activities, which facilitate the acquisition, preparation, and analysis of data, is known in the literature as Big Data pipelines. In this thesis, the improvement of three stages of the Big Data pipelines is tackled: Data Preparation, Data Quality assessment, and Data Analysis. These improvements can be addressed from an individual perspective, by focussing on each stage, or from a more complex and global perspective, implying the coordination of these stages to create data workflows. The first stage to improve is Data Preparation, by supporting the preparation of data with complex structures (i.e., data with various levels of nested structures, such as arrays). Shortcomings have been found in the literature and current technologies for transforming complex data in a simple way. Therefore, this thesis aims to improve the Data Preparation stage through Domain-Specific Languages (DSLs). Specifically, two DSLs are proposed for different use cases. 
While one of them is a general-purpose Data Transformation language, the other is a DSL aimed at extracting event logs in a standard format for process mining algorithms. The second area for improvement is related to the assessment of Data Quality. Depending on the type of Data Analysis algorithm, poor-quality data can seriously skew the results. Optimisation algorithms are a clear example. If the data are not sufficiently accurate and complete, the search space can be severely affected. Therefore, this thesis formulates a methodology for modelling Data Quality rules adjusted to the context of use, as well as a tool that facilitates the automation of their assessment. This makes it possible to discard the data that do not meet the quality criteria defined by the organisation. In addition, the proposal includes a framework that helps to select actions to improve the usability of the data. The third and last proposal involves the Data Analysis stage. In this case, this thesis faces the challenge of supporting the use of optimisation problems in Big Data pipelines. There is a lack of methodological solutions that allow computing exhaustive optimisation problems in distributed environments (i.e., those optimisation problems that guarantee the finding of an optimal solution by exploring the whole search space). The resolution of this type of problem in the Big Data context is computationally complex, and can be NP-complete. This is caused by two different factors. On the one hand, the search space can increase significantly as the amount of data to be processed by the optimisation algorithms increases. This challenge is addressed through a technique to generate and group problems with distributed data. On the other hand, processing optimisation problems with complex models and large search spaces in distributed environments is not trivial. Therefore, a proposal is presented for a particular case in this type of scenario. 
As a result, this thesis develops methodologies that have been published in scientific journals and conferences. The methodologies have been implemented in software tools that are integrated with the Apache Spark data processing engine. The solutions have been validated through tests and use cases with real datasets.

    Analysis of Family-Health-Related Topics on Wikipedia

    New concepts, terms, and topics always emerge; and meanings of existing terms and topics keep changing all the time. These phenomena occur more frequently on social media than on conventional media because social media allows a huge number of users to generate information online. Retrieving relevant results in different time periods of a fast-changing topic becomes one of the most difficult challenges in the information retrieval field. Among numerous topics discussed on social media, health-related topics are a major category which attracts increasing attention from the general public. This study investigated and explored the evolution patterns of family-health-related topics on Wikipedia. Three family-health-related topics (Child Maltreatment, Family Planning, and Women’s Health) were selected from the World Health Organization Website and their associated entries were retrieved on Wikipedia. Historical numeric and text data of the entries from 2010 to 2017 were collected from a Wikipedia data dump and the Wikipedia Web pages. Four periods were defined: 2010 to 2011, 2012 to 2013, 2014 to 2015, and 2016 to 2017. Coding, subject analysis, descriptive statistical analysis, inferential statistical analysis, SOM approach, and n-gram approach were employed to explore the internal characteristics and external popularity evolutions of the topics. The findings illustrate that the external popularities of the family-health-related topics declined from 2010 to 2017, although their content on Wikipedia kept increasing. The emerged entries had three features: specialization, summarization, and internationalization. The subjects derived from the entries became increasingly diverse during the investigated periods. Meanwhile, the developing trajectories of the subjects varied from one to another. According to the developing trajectories, the subjects were grouped into three categories: growing subject, diminishing subject, and fluctuating subject. 
The popularities of the topics among the Wikipedia viewers were consistent, while those among the editors were not. For each topic, its popularity trend among the editors and the viewers was inconsistent. Child Maltreatment was the most popular among the three topics, Women’s Health was the second most popular, while Family Planning was the least popular among the three. The implications of this study include: (1) helping health professionals and general users get a more comprehensive understanding of the investigated topics; (2) contributing to the developments of health ontologies and consumer health vocabularies; (3) assisting Website designers in organizing online health information and helping them identify popular family-health-related topics; (4) providing a new approach for query recommendation in information retrieval systems; (5) supporting temporal information retrieval by presenting the temporal changes of family-health-related topics; and (6) providing a new combination of data collection and analysis methods for researchers.

    The impact of information quality awareness on users' behaviors toward information quality practices

    Healthcare organizations rely more on electronic information to optimize most of their processes. Additional information sources and more diverse information increase the relevance and importance of information quality (IQ). The quality of information needs to be improved to support a more efficient and reliable utilization of information systems (IS). This improvement can only be achieved through the implementation of initiatives followed by most users across the organization. The purpose of this study is to develop a model related to how awareness of IS users about IQ issues would affect their actual practices toward IQ initiatives. It is posited that users’ motivation is influenced by their awareness of beneficial and problematic situations generated by IQ. The motivation that users may have regarding IQ impact will influence their behavior regarding IQ practices. Social influences and facilitating conditions are considered as moderators of the interaction between intention and actual users’ behavior.

    Information Quality in Secondary Use of EHR Data : A Case Study of Quality Management in a Norwegian Hospital

    The motivation for undertaking this study relates to my experiences from practice in a public hospital, where I have observed variations in reaching organizational goals of quality management informed by electronic health records (EHR) data. For example, while some departments and units have long-time traditions in meeting the quality goals that are set locally, regionally, or nationally, other departments and units struggle to meet the same quality goals. Thus, generating actionable information by reusing routinely collected EHR data does not necessarily lead to action in response to the information. This process of generating information from existing EHR data, and communicating and using such information for organizational purposes, may be challenging in a highly complex environment such as health care organizations. Within this process, information quality (IQ) may influence actors’ perceptions of action possibilities the information offers, thus influencing the actual use of the information required to reach organizational goals. EHR data can be used for clinical purposes at the point-of-care (i.e., primary use) and reused for purposes that do not involve patient treatment directly (i.e., secondary use). Examples of such secondary use include quality management, research, and policy development. Though it is widely accepted that IQ influences the use of EHR systems and the information generated by EHR systems, research on the implications of IQ on health care processes is limited: the current literature focuses on defining and assessing IQ in primary use of EHR data, whereas the role of IQ in secondary use of EHR data remains unclear. Thus, this dissertation investigates the role of IQ in secondary use of EHR data in an organizational context. This dissertation addresses this practical and theoretical challenge by focusing on the overall research objective of understanding the role of IQ in secondary use of EHR data. 
To address this research objective, this dissertation explores the following research questions: RQ1. How do human actors influence the transformation of IQ while generating, communicating, and using information in secondary use of EHR data? RQ2. What are the underlying generative mechanisms through which IQ transforms in the process of secondary use of EHR data?

    Data, Technology, and People: Demystifying Master Data Management

    With the amount of data constantly increasing, better practices are needed to manage it. Master data management (MDM) is an organizationally horizontal flow of activities aimed at managing core business data (i.e., master data). MDM differs from traditional data management practices as an organization-wide function. The idea of managing an organization’s most important data is impossible to achieve if MDM is simply treated as a data management practice or a technology-driven phenomenon. Establishing an MDM function involves introducing changes to an organization, which can relate to people and their ways of working, or technology and how it is used. If only a certain aspect is emphasized, the function will not deliver desired results. The object of this thesis is to study MDM not as a straightforward IT project, but as a complicated and multi-dimensional function. The goal is to understand the factors that should be taken into account in the development of an MDM function. The empirical part of this study is an ethnographic case study in a public sector organization, where MDM development was in early phases when the observation began. Altogether, the two data collection periods lasted for 32 months and during this time, two MDM development projects were carried out, and MDM development became rooted as part of the organization’s routine operations. MDM development was analyzed as an ensemble that includes social and material components. Its theorization begins with understanding the role of master data in an organization’s information landscape and continues to examine the different views of MDM. Theories of change assist in understanding how change should be observed, understood, and managed. The study indicates that MDM affects multiple levels of an organization. Many organizational factors influence its development, and extensive dependencies exist between these factors. Especially in terms of ownership, other roles and responsibilities assume key positions. 
By understanding these factors and their roles in MDM development, it is easier to manage them. The study sheds light on the strong alignment between the complex concept of MDM and the organization. MDM literature is scarce and literature on public sector MDM is almost nonexistent. This dissertation contributes to research by widening the understanding of MDM in the public sector context, and by presenting a framework for establishing an MDM function as an organizational function that is closely linked with technology.