Evaluating Generative Ad Hoc Information Retrieval
Recent advances in large language models have enabled the development of
viable generative information retrieval systems. A generative retrieval system
returns a grounded generated text in response to an information need instead of
the traditional document ranking. Quantifying the utility of these types of
responses is essential for evaluating generative retrieval systems. As the
established evaluation methodology for ranking-based ad hoc retrieval may seem
unsuitable for generative retrieval, new approaches for reliable, repeatable,
and reproducible experimentation are required. In this paper, we survey the
relevant information retrieval and natural language processing literature,
identify search tasks and system architectures in generative retrieval, develop
a corresponding user model, and study its operationalization. This theoretical
analysis provides a foundation and new insights for the evaluation of
generative ad hoc retrieval systems.
Comment: 14 pages, 5 figures, 1 table
Missed urinary tract infection in patients with chronic recalcitrant LUTS and recurrent cystitis
Background:
MSU culture and urinary dipsticks as diagnostic methods for urinary tract infection (UTI) are discredited, despite being commonly used to exclude UTI in patients with lower urinary tract symptoms (LUTS). The phenotype of painful LUTS has been recast as Interstitial Cystitis (IC) or Bladder Pain Syndrome (BPS) because infection had been excluded on the evidence of these methods. Given that these all-important tests have been found insensitive and misleading, there is justification in re-examining IC/BPS to ascertain whether we have been mistaken. I studied patients with "chronic recalcitrant bladder pain and recurrent cystitis" (abbreviated "painful LUTS") who had been diagnosed with IC/BPS in order to re-assess their pathophysiology.
Aim:
I characterised these patients using the scientific method of consilience, which scrutinised them from unrelated perspectives. These studies implied that infection was the most probable aetiological factor. Therefore, I moved on to test infection as a causal factor using Pearl's three rungs of causation: correlation, intervention, and the counterfactual.
Methods:
Data on quality of life and disease experience were obtained. Symptoms and pathophysiological variables in 146 patients presenting with painful LUTS were studied. To meet Pearl's specifications, an observational study examined intervention and a cross-over study analysed the counterfactual of arbitrary treatment cessation. The evolution of treatment of these patients, using first-generation, narrow-spectrum urinary agents in protracted courses, is reported. Since protracted antibiotic exposure is feared as a cause of antimicrobial resistance (AMR), I measured this in order to round off my findings.
Results:
The consilience studies incriminated UTI in the aetiology of painful LUTS. It is also clear that the patients suffer terribly, and this is aggravated by professional scepticism catalysed by a misinterpretation of urinalysis data. Antibiotic intervention produced a regression in all disease indicators, but there was a resurgence of symptoms and signs during trials without treatment. The data on AMR demonstrated a rise in resistance in response to a first prescription, without this increasing with persistence of the antibiotic regimen.
Conclusion:
These data imply that IC/BPS (painful LUTS) is caused by a treatable urinary tract infection and are sufficient to merit an RCT. Whilst treatment requires protracted exposure to antibiotics, my data on AMR amongst these patients are surprisingly reassuring. This requires further exploration. Contemporaneous to this thesis, others have published definitive data that discredit urine culture and dipstick analysis.
On the enhancement of Big Data Pipelines through Data Preparation, Data Quality, and the distribution of Optimisation Problems
Nowadays, data are fundamental for companies, providing operational support by facilitating daily
transactions. Data has also become the cornerstone of strategic decision-making processes in
businesses. For this purpose, there are numerous techniques that allow knowledge and value to be
extracted from data. For example, optimisation algorithms excel at supporting decision-making
processes to improve the use of resources, time and costs in the organisation. In the current
industrial context, organisations usually rely on business processes to orchestrate their daily
activities while collecting large amounts of information from heterogeneous sources. Therefore,
the support of Big Data technologies (which are based on distributed environments) is required
given the volume, variety and speed of data. Then, in order to extract value from the data, a set
of techniques or activities is applied in an orderly way and at different stages. This set of
techniques or activities, which facilitate the acquisition, preparation, and analysis of data, is known
in the literature as Big Data pipelines.
In this thesis, the improvement of three stages of the Big Data pipelines is tackled: Data
Preparation, Data Quality assessment, and Data Analysis. These improvements can be
addressed from an individual perspective, by focussing on each stage, or from a more complex
and global perspective, implying the coordination of these stages to create data workflows.
The first stage to improve is the Data Preparation by supporting the preparation of data with
complex structures (i.e., data with various levels of nested structures, such as arrays).
Shortcomings have been found in the literature and in current technologies when it comes to
transforming complex data in a simple way. Therefore, this thesis aims to improve the Data Preparation stage through
Domain-Specific Languages (DSLs). Specifically, two DSLs are proposed for different use cases.
While one of them is a general-purpose Data Transformation language, the other is a DSL aimed
at extracting event logs in a standard format for process mining algorithms.
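The thesis itself defines its DSLs elsewhere; as a rough illustration of the kind of transformation a general-purpose Data Transformation language over nested structures must express, the following sketch flattens records containing nested arrays and objects into flat column-value pairs. The function name and the path notation are inventions for this example, not the thesis's DSL.

```python
# Illustrative sketch only: flatten a record with nested dicts and arrays
# into a flat column -> value map, the kind of "complex to simple"
# transformation the Data Preparation DSLs target.
def flatten(record, parent_key=""):
    """Recursively flatten nested dicts/lists into a flat column->value map."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    flat.update(flatten(item, f"{full_key}[{i}]"))
                else:
                    flat[f"{full_key}[{i}]"] = item
        else:
            flat[full_key] = value
    return flat

order = {"id": 1, "lines": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}
flat_order = flatten(order)
# {'id': 1, 'lines[0].sku': 'A', 'lines[0].qty': 2,
#  'lines[1].sku': 'B', 'lines[1].qty': 1}
```

A DSL would let analysts declare such mappings instead of hand-writing recursive code, which is precisely the gap the thesis identifies.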
The second area for improvement is related to the assessment of Data Quality. Depending on the
type of Data Analysis algorithm, poor-quality data can seriously skew the results. Optimisation
algorithms are a clear example: if the data are not sufficiently accurate and complete, the search
space can be severely affected. Therefore, this thesis formulates a methodology for modelling
Data Quality rules adjusted to the context of use, as well as a tool that facilitates the automation
of their assessment. This allows organisations to discard data that do not meet the quality criteria defined
by the organisation. In addition, the proposal includes a framework that helps to select actions to
improve the usability of the data.
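The thesis's methodology and tool for context-specific Data Quality rules are described only at a high level here; a minimal sketch of the underlying idea (evaluate declared rules per record and discard records that fail) could look as follows. The rule names and thresholds are invented for illustration.

```python
# Hedged sketch: context-specific Data Quality rules applied to records,
# splitting them into usable data and discarded data. Rule names and
# example records are hypothetical, not taken from the thesis.
def assess(records, rules):
    """Split records into usable and discarded sets according to DQ rules.

    Each entry in the result is (record, list_of_failed_rule_names).
    """
    usable, discarded = [], []
    for rec in records:
        failures = [name for name, check in rules.items() if not check(rec)]
        (discarded if failures else usable).append((rec, failures))
    return usable, discarded

# Example rules covering completeness and accuracy in some business context.
rules = {
    "city_present": lambda r: bool(r.get("city")),
    "qty_non_negative": lambda r: r.get("qty", 0) >= 0,
}
records = [{"city": "Seville", "qty": 3}, {"city": "", "qty": -1}]
usable, discarded = assess(records, rules)
# usable contains the first record; the second fails both rules
```

The thesis's contribution is the methodology for modelling such rules per context of use and automating their assessment, not any particular rule set.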
The third and last proposal involves the Data Analysis stage. In this case, this thesis faces the
challenge of supporting the use of optimisation problems in Big Data pipelines. There is a lack of
methodological solutions that allow computing exhaustive optimisation problems in distributed
environments (i.e., those optimisation problems that guarantee the finding of an optimal solution
by exploring the whole search space). The resolution of this type of problem in the Big Data
context is computationally complex, and can be NP-complete. This is caused by two different
factors. On the one hand, the search space can increase significantly as the amount of data to
be processed by the optimisation algorithms increases. This challenge is addressed through a
technique to generate and group problems with distributed data. On the other hand, processing
optimisation problems with complex models and large search spaces in distributed environments
is not trivial. Therefore, a proposal is presented for a particular case in this type of scenario.
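The generate-and-group idea can be sketched, under the assumption that it resembles a map-reduce decomposition: partition the whole search space into independent sub-problems, solve each one exhaustively (the distributable "map" step), and reduce the partial optima. The knapsack-style objective and all names here are illustrative only, not the thesis's formulation.

```python
from itertools import product

# Hedged sketch of exhaustive optimisation by decomposition: every candidate
# solution is generated, grouped into chunks, each chunk is solved
# exhaustively, and the partial optima are reduced to the global optimum.
def solve_chunk(chunk, objective):
    """Exhaustively evaluate one partition of the search space."""
    return max(chunk, key=objective)

def distributed_exhaustive(n_vars, n_chunks, objective):
    space = list(product([0, 1], repeat=n_vars))       # whole search space
    chunks = [space[i::n_chunks] for i in range(n_chunks)]  # grouped sub-problems
    partial = [solve_chunk(c, objective) for c in chunks]   # map (distributable)
    return max(partial, key=objective)                 # reduce

# Toy knapsack: pick items maximising value under a weight capacity.
weights, values, cap = [2, 3, 4], [3, 4, 6], 5

def objective(selection):
    if sum(w for w, s in zip(weights, selection) if s) > cap:
        return float("-inf")  # infeasible selections score worst
    return sum(v for v, s in zip(values, selection) if s)

best = distributed_exhaustive(3, 4, objective)  # (1, 1, 0), value 7
```

Because every chunk is explored completely, optimality is guaranteed, which is the defining property of the exhaustive problems the thesis addresses; in a real pipeline the per-chunk work would run on Spark executors rather than in a local loop.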
As a result, this thesis develops methodologies that have been published in scientific journals and
conferences. The methodologies have been implemented in software tools that are integrated with
the Apache Spark data processing engine. The solutions have been validated through tests and use cases with real datasets.
Analysis of Family-Health-Related Topics on Wikipedia
New concepts, terms, and topics always emerge; and meanings of existing terms and topics keep changing all the time. These phenomena occur more frequently on social media than on conventional media because social media allows a huge number of users to generate information online. Retrieving relevant results in different time periods of a fast-changing topic becomes one of the most difficult challenges in the information retrieval field. Among numerous topics discussed on social media, health-related topics are a major category which attracts increasing attention from the general public.
This study investigated and explored the evolution patterns of family-health-related topics on Wikipedia. Three family-health-related topics (Child Maltreatment, Family Planning, and Women's Health) were selected from the World Health Organization Website and their associated entries were retrieved on Wikipedia. Historical numeric and text data of the entries from 2010 to 2017 were collected from a Wikipedia data dump and the Wikipedia Web pages. Four periods were defined: 2010 to 2011, 2012 to 2013, 2014 to 2015, and 2016 to 2017. Coding, subject analysis, descriptive statistical analysis, inferential statistical analysis, the SOM approach, and the n-gram approach were employed to explore the internal characteristics and external popularity evolutions of the topics.
The findings illustrate that the external popularities of the family-health-related topics declined from 2010 to 2017, although their content on Wikipedia kept increasing. The newly emerged entries had three features: specialization, summarization, and internationalization. The subjects derived from the entries became increasingly diverse during the investigated periods. Meanwhile, the developing trajectories of the subjects varied from one to another. According to the developing trajectories, the subjects were grouped into three categories: growing subjects, diminishing subjects, and fluctuating subjects. The popularities of the topics were consistent among the Wikipedia viewers but not among the editors; for each topic, the popularity trend among the editors diverged from that among the viewers. Child Maltreatment was the most popular of the three topics, Women's Health was the second most popular, and Family Planning was the least popular.
The implications of this study include: (1) helping health professionals and general users get a more comprehensive understanding of the investigated topics; (2) contributing to the development of health ontologies and consumer health vocabularies; (3) assisting Website designers in organizing online health information and helping them identify popular family-health-related topics; (4) providing a new approach for query recommendation in information retrieval systems; (5) supporting temporal information retrieval by presenting the temporal changes of family-health-related topics; and (6) providing a new combination of data collection and analysis methods for researchers.
The impact of information quality awareness on users' behaviors toward information quality practices
Healthcare organizations rely increasingly on electronic information to optimize most of their processes. Additional information sources and more diverse information increase the relevance and importance of information quality (IQ). The quality of information needs to be improved to support a more efficient and reliable utilization of information systems (IS). This improvement can only be achieved through the implementation of initiatives followed by most users across the organization. The purpose of this study is to develop a model of how the awareness of IS users about IQ issues affects their actual practices toward IQ initiatives. It is posited that users' motivation is influenced by their awareness of beneficial and problematic situations generated by IQ. The motivation that users have regarding the impact of IQ will in turn influence their behavior regarding IQ practices. Social influences and facilitating conditions are considered moderators of the interaction between intention and actual user behavior.
Information Quality in Secondary Use of EHR Data : A Case Study of Quality Management in a Norwegian Hospital
The motivation for undertaking this study relates to my experiences from practice in a public hospital, where I have observed variations in reaching organizational goals of quality management informed by electronic health record (EHR) data. For example, while some departments and units have long-standing traditions of meeting the quality goals that are set locally, regionally, or nationally, other departments and units struggle to meet the same quality goals. Thus, generating actionable information by reusing routinely collected EHR data does not necessarily lead to action in response to the information. This process of generating information from existing EHR data, and communicating and using such information for organizational purposes, may be challenging in a highly complex environment such as a health care organization. Within this process, information quality (IQ) may influence actors' perceptions of the action possibilities the information offers, thus influencing the actual use of the information required to reach organizational goals.
EHR data can be used for clinical purposes at the point of care (i.e., primary use) and reused for purposes that do not involve patient treatment directly (i.e., secondary use). Examples of such secondary use include quality management, research, and policy development. Though it is widely accepted that IQ influences the use of EHR systems and the information generated by EHR systems, research on the implications of IQ for health care processes is limited: the focus of the current literature is on defining and assessing IQ in primary use of EHR data, whereas the role of IQ in secondary use of EHR data remains unclear. Thus, this dissertation investigates the role of IQ in secondary use of EHR data in an organizational context.
This dissertation addresses this practical and theoretical challenge by focusing on the overall research objective of understanding the role of IQ in secondary use of EHR data. To address this research objective, this dissertation explores the following research questions:
RQ1. How do human actors influence the transformation of IQ while generating, communicating, and using information in secondary use of EHR data?
RQ2. What are the underlying generative mechanisms through which IQ transforms in the process of secondary use of EHR data?
Data, Technology, and People: Demystifying Master Data Management
With the amount of data constantly increasing, better practices are needed to manage it. Master data management (MDM) is an organizationally horizontal flow of activities aimed at managing core business data (i.e., master data). MDM differs from traditional data management practices as an organization-wide function. The idea of managing an organization's most important data is impossible to achieve if MDM is simply treated as a data management practice or a technology-driven phenomenon. Establishing an MDM function involves introducing changes to an organization, which can relate to people and their ways of working, or to technology and how it is used. If only a certain aspect is emphasized, the function will not deliver the desired results.
The object of this thesis is to study MDM not as a straightforward IT project, but as a complicated and multi-dimensional function. The goal is to understand the factors that should be taken into account in the development of an MDM function. The empirical part of this study is an ethnographic case study in a public sector organization, where MDM development was in its early phases when the observation began. Altogether, the two data collection periods lasted for 32 months; during this time, two MDM development projects were carried out, and MDM development became rooted as part of the organization's routine operations.
MDM development was analyzed as an ensemble that includes social and material components. Its theorization begins with understanding the role of master data in an organization's information landscape and continues by examining the different views of MDM. Theories of change assist in understanding how change should be observed, understood, and managed.
The study indicates that MDM affects multiple levels of an organization. Many organizational factors influence its development, and extensive dependencies exist between these factors. Roles and responsibilities, especially ownership, assume key positions.
By understanding these factors and their roles in MDM development, it is easier to manage them. The study sheds light on the strong alignment between the complex concept of MDM and the organization. MDM literature is scarce, and literature on public sector MDM is almost nonexistent. This dissertation contributes to research by widening the understanding of MDM in the public sector context, and by presenting a framework for establishing an MDM function as an organizational function that is closely linked with technology.
Developing a data quality scorecard that measures data quality in a data warehouse
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
The main purpose of this thesis is to develop a data quality scorecard (DQS) that aligns the data quality needs of the Data Warehouse (DW) stakeholder group with selected data quality dimensions. To comprehend the research domain, a general and systematic literature review (SLR) was carried out, after which the research scope was established. Using Design Science Research (DSR) as the methodology to structure the research, three iterations were carried out to achieve the research aim highlighted in this thesis. In the first iteration, following the DSR paradigm, the artefact was built from the results of the general and systematic literature review conducted, and a data quality scorecard (DQS) was conceptualised. The results of the SLR and the recommendations for designing an effective scorecard provided the input for the development of the DQS. Using a System Usability Scale (SUS) to validate the usability of the DQS, the results of the first iteration suggest that the DW stakeholders found the DQS useful. The second iteration evaluated the DQS further through a run-through in the FMCG domain, followed by semi-structured interviews. The thematic analysis of the semi-structured interviews demonstrated that the stakeholder participants found the DQS to be transparent, an additional reporting tool, well integrated, easy to use, and consistent, and that it increases confidence in the data. However, the timeliness data dimension was found to be redundant, necessitating a modification to the DQS. The third iteration followed similar steps to the second, but applied the modified DQS in the oil and gas domain. The results from the third iteration suggest that the DQS is a useful tool that is easy to use on a daily basis.
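The abstract does not specify how the DQS aggregates dimension scores; one plausible minimal sketch, assuming a weighted aggregate of per-dimension scores aligned with stakeholder concerns, is shown below. The dimension names, weights, and scores are illustrative inventions, not values from the thesis (note the thesis in fact dropped the timeliness dimension after its second iteration).

```python
# Hedged sketch of a data quality scorecard: per-dimension quality scores
# in [0, 1], weighted by stakeholder priority and combined into one figure.
# All names and numbers here are hypothetical examples.
def scorecard(scores, weights):
    """Weighted aggregate of per-dimension quality scores."""
    total = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total

scores = {"completeness": 0.92, "accuracy": 0.88, "consistency": 0.95}
weights = {"completeness": 3, "accuracy": 2, "consistency": 1}
overall = scorecard(scores, weights)  # ~0.912
```

Dropping a dimension found redundant, as happened with timeliness, amounts to removing its entry from the weights, which is one reason a weighted formulation is a natural fit for an iteratively refined scorecard.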
The research contributes to theory by demonstrating a novel approach to DQS design. This was achieved by ensuring that the design of the DQS aligns with the data quality concern areas of the DW stakeholders and the data quality dimensions. Further, this research lays a good foundation for the future by establishing a DQS model that can be used as a base for further development.