1,504 research outputs found
Information Discovery on Electronic Health Records Using Authority Flow Techniques
Background: As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs.
Methods: We built two ranking systems. The traditional BM25 system exploits the EHRs' content without regard to association among entities within. The Clinical ObjectRank (CO) system exploits the entities' associations in EHRs using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset of the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, sensitivity and specificity were measured by two physicians for a set of 11 queries related to congenital cardiac disease.
Results: Our pilot evaluation showed that CO outperforms BM25 in terms of sensitivity (65% vs. 38%) by 71% on average, while maintaining the specificity (64% vs. 61%). The evaluation was done by two physicians.
Conclusions: Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
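The sensitivity and specificity figures reported above can be related to raw judgement counts with a few lines of Python. This is only a minimal sketch: the function names and the per-query counts are hypothetical, and only the two averages (65% and 38%) come from the abstract. The last line shows that the ratio of those averages matches the reported relative improvement of roughly 71%.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of relevant results that the system returned."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of irrelevant results that the system correctly left out."""
    return true_neg / (true_neg + false_pos)


def relative_gain(new: float, baseline: float) -> float:
    """Improvement of `new` over `baseline`, as a fraction of `baseline`."""
    return (new - baseline) / baseline


# Hypothetical counts for a single query (not taken from the study).
print(sensitivity(true_pos=13, false_neg=7))
print(specificity(true_neg=16, false_pos=9))

# Averages reported in the abstract: CO 65% vs. BM25 38% sensitivity.
print(round(relative_gain(0.65, 0.38), 2))  # prints 0.71
```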
Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering
We introduce the concept of "thinking assistants", an approach that
encourages users to engage in deep reflection and critical thinking through
brainstorming and thought-provoking queries. We instantiate one such thinking
assistant, Gradschool.chat, as a virtual assistant tailored to assist
prospective graduate students. We posit that thinking assistants are
particularly relevant to situations like applying to graduate school, a phase
often characterized by the challenges of academic preparation and the
development of a unique research identity. In such situations, students often
lack direct mentorship from professors, or may feel hesitant to approach
faculty with their queries, making thinking assistants particularly useful.
Leveraging a Large Language Model (LLM), Gradschool.chat is a demonstration
system built as a thinking assistant for working with specific professors in
the field of human-computer interaction (HCI). It was designed through training
on information specific to these professors and a validation process in
collaboration with these academics. This technical report delineates the
system's architecture and offers a preliminary analysis of our deployment
study. Additionally, this report covers the spectrum of questions posed to our
chatbots by users. The system recorded 223 conversations, with participants
responding positively to approximately 65% of responses. Our findings indicate
that users who discuss and brainstorm their research interests with
Gradschool.chat engage more deeply, often interacting with the chatbot twice as
long compared to those who only pose questions about professors
COMPUTING APPROXIMATE CUSTOMIZED RANKING
As the amount of information grows and as users become more
sophisticated, ranking techniques become important building blocks
to meet user needs when answering queries. PageRank is one of the
most successful link-based ranking methods, which iteratively
computes the importance scores for web pages based on the importance scores of incoming pages. Due to its success, PageRank has been applied in a number of applications that require customization.
We address the scalability challenges for two types of customized
ranking. The first challenge is to compute the ranking of a
subgraph. Various Web applications focus on identifying a
subgraph, such as focused crawlers and localized search engines.
The second challenge is to compute online personalized ranking.
Personalized search improves the quality of search results for each
user. The user needs are represented by a personalized set of pages
or personalized link importance in an entity relationship graph.
This requires an efficient online computation.
To solve the subgraph ranking problem efficiently, we estimate the
ranking scores for a subgraph. We propose a framework of an exact
solution (IdealRank) and an approximate solution (ApproxRank) for
computing ranking on a subgraph. Both IdealRank and ApproxRank
represent the set of external pages with a single external node
and modify the PageRank-style transition matrix with respect to that node. The IdealRank algorithm assumes that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge to the true PageRank scores. Since the PageRank-style scores of external pages may not typically be available, we propose the ApproxRank algorithm to estimate scores for the subgraph. We analyze the distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a
constant factor of the distance of the external pages. We demonstrate with real and synthetic data that ApproxRank provides a good approximation to PageRank for a variety of subgraphs.
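The idea of folding all external pages into one external node can be illustrated with a small Python sketch. It is not the authors' IdealRank or ApproxRank implementation: the node label `EXT`, the helper names, the toy graph, and in particular the simplifying assumption that every external page carries equal weight are all introduced here for illustration. The sketch also assumes every page has at least one outlink and no duplicate links.

```python
EXT = "EXT"  # hypothetical label for the single collapsed external node


def pagerank(trans, teleport, damping=0.85, iters=100):
    """Power iteration; trans[u][v] = P(u -> v), teleport[v] = jump probability."""
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {v: (1.0 - damping) * teleport[v] for v in teleport}
        for u, row in trans.items():
            for v, p in row.items():
                nxt[v] += damping * rank[u] * p
        rank = nxt
    return rank


def full_transitions(graph):
    """Uniform random-surfer transitions on the whole graph."""
    return {u: {v: 1.0 / len(outs) for v in outs} for u, outs in graph.items()}


def collapse(graph, local):
    """Build transitions over the local pages plus one external node.

    Assumption (ours): all external pages share the external mass equally.
    """
    n = len(graph)
    external = [u for u in graph if u not in local]
    trans = {u: {} for u in local}
    trans[EXT] = {}
    for u, outs in graph.items():
        src = u if u in local else EXT
        weight = 1.0 if u in local else 1.0 / len(external)
        for v in outs:
            dst = v if v in local else EXT
            trans[src][dst] = trans[src].get(dst, 0.0) + weight / len(outs)
    teleport = {u: 1.0 / n for u in local}
    teleport[EXT] = len(external) / n
    return trans, teleport


# Toy graph: a, b, c form the subgraph of interest; x and y are external.
graph = {
    "a": ["b", "c"],
    "b": ["c", "x"],
    "c": ["a"],
    "x": ["a", "y"],
    "y": ["x", "b"],
}
local = {"a", "b", "c"}

exact = pagerank(full_transitions(graph), {u: 1.0 / len(graph) for u in graph})
approx = pagerank(*collapse(graph, local))
for page in sorted(local):
    print(page, round(exact[page], 4), round(approx[page], 4))
```

When the external pages really do have equal scores, the collapsed chain reproduces the global scores of the local pages; otherwise the gap depends on how far the external scores are from uniform, which is the kind of distance the abstract bounds.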
We consider online personalization using ObjectRank, an
authority-flow-based ranking for entity-relationship graphs. We formalize the concept of an aggregate surfer on a data graph; the surfer's behavior is controlled by multiple personalized rankings. We prove a linearity
theorem over these rankings which can be used as a tool to scale
this type of personalization. DataApprox uses a repository of precomputed rankings for a given set of link-weight assignments. We define DataApprox as an optimization problem; it selects a subset of the precomputed rankings from the repository and produces a weighted combination of these rankings. We analyze the distance between the DataApprox scores and the real authority flow ranking scores and show that DataApprox has a theoretical bound. Our experiments on the DBLP data graph show that DataApprox performs well in practice and allows fast and accurate personalized authority flow ranking
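Why precomputed rankings can be reused is easiest to see through the classical linearity of personalized PageRank in its preference (teleport) vector. The sketch below demonstrates that property only; the theorem proved in this work concerns an aggregate surfer over multiple personalized rankings and may take a different form. The graph, the function names, and the fixed weights 0.3 and 0.7 are hypothetical, and DataApprox chooses its weights by optimization rather than taking them as given.

```python
def personalized_rank(graph, teleport, damping=0.85, iters=50):
    """Power iteration for PageRank with a custom teleport distribution.

    Assumes every node has at least one outlink.
    """
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {v: (1.0 - damping) * teleport[v] for v in graph}
        for u, outs in graph.items():
            share = damping * rank[u] / len(outs)
            for v in outs:
                nxt[v] += share
        rank = nxt
    return rank


def combine(vectors, weights):
    """Weighted sum of score vectors that share the same keys."""
    return {v: sum(w * vec[v] for vec, w in zip(vectors, weights))
            for v in vectors[0]}


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": ["a"]}
pref_a = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
pref_c = {"a": 0.0, "b": 0.0, "c": 1.0, "d": 0.0}
weights = [0.3, 0.7]

# Rank computed directly for the mixed preference ...
direct = personalized_rank(graph, combine([pref_a, pref_c], weights))
# ... versus a blend of two rankings that could have been precomputed.
blended = combine(
    [personalized_rank(graph, pref_a), personalized_rank(graph, pref_c)],
    weights,
)
max_diff = max(abs(direct[v] - blended[v]) for v in graph)
print(max_diff)
```

The two results agree up to floating-point error, so a query-time ranking can be assembled from stored rankings without rerunning the iteration.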
Gauging the Path of Private Canadian Pensions: 2010 Update on the State of Defined Benefit and Defined Contribution Pension Plans
The issue of under-funded defined benefit (DB) pension plans has become one of the most perplexing financial issues facing business executives, legislators and Canadian pensioners who are or will in the future be reliant on pension income as an important component of their overall retirement incomes. In 2004, CGA-Canada issued a comprehensive paper on defined benefit pension plans titled “Addressing the Pensions Dilemma in Canada”. The goal of that release was to advance understanding of DB pension plans and to impart a reasonable estimate of the standing of DB pension plans at December 31, 2003. In 2009, this analysis was further advanced by examining the funding status of private DB pension plans at December 31, 2008. The results of the analysis show that funding deficits have intensified with funding ratios eroding to unsustainable levels. The vast majority (92%) of private DB pension plans were in a deficit position as at December 31, 2008. The average funding ratio has decreased from 112% to 77% on a ‘without indexation’ basis and from 71% to 57% on a ‘with indexation’ basis. The aggregate funding shortfall is expected to exceed $350 billion.
Keywords: defined benefit pension plan, defined contribution pension plan, funding position of pension plans, household savings, retirement savings, retirement income, household finance, pension accounting
How Far Will You Go? Characterizing Online Search Stopping Behaviors Using Information Scent and Need for Cognition
This research sought to explain online searchers' stopping behaviors when interacting with search engine result pages (SERPs) using the theories of Information Scent and Need for Cognition (NFC). Specifically, the problems addressed were how: (1) information scent level, operationalized as the number of relevant documents on the first SERP, (2) information scent pattern, operationalized as the distribution of relevant and non-relevant results on the first SERP, and (3) NFC, a person's tendency to engage in and enjoy effortful cognitive activities measured by the Need for Cognition scale, impacted a person's search stopping behaviors. The two search stopping behaviors that were examined were query stopping, or the point at which a person decides to issue a new query, and task stopping, or the point at which a person decides to end the search task. A laboratory experiment was conducted with 48 participants, who were asked to gather information for six open-ended search tasks. Participants were interviewed about their search stopping behaviors at the end of the study using recordings of their search processes to stimulate recall. The results showed significant effects of Information Scent and NFC on search stopping behaviors. When there were more relevant results on the first SERP, participants examined more documents and explored deeper in the search results list. Participants' behaviors were also affected by the distribution of relevant results on the first SERP: when relevant results were found at the top of the SERP, participants left the SERP after viewing only the first few results. When participants encountered relevant results dispersed across the first SERP at the start of a search task, participants issued more queries subsequently to solve the search task. Participants with lower NFC searched deeper but reformulated queries less frequently during a task. 
Moreover, the time participants with lower NFC spent evaluating search results was more variable depending on the number of relevant results displayed on the first SERP than the time spent by higher NFC participants. Finally, participants reported that they tended to examine results beyond the first SERP when they conducted people, product, image and literature searches in daily life.
Diverse and proportional size-l object summaries for keyword search
The abundance and ubiquity of graphs (e.g., Online Social Networks such as Google+ and Facebook; bibliographic graphs such as DBLP) necessitate effective and efficient search over them. Given a set of keywords that can identify a Data Subject (DS), a recently proposed relational keyword search paradigm produces, as a query result, a set of Object Summaries (OSs). An OS is a tree structure rooted at the DS node (i.e., a tuple containing the keywords) with surrounding nodes that summarize all data held on the graph about the DS. OS snippets, denoted as size-l OSs, have also been investigated. Size-l OSs are partial OSs containing l nodes such that the summation of their importance scores results in the maximum possible total score. However, the set of nodes that maximizes the total importance score may result in an uninformative size-l OS, as very important nodes may be repeated in it, dominating other representative information. In view of this limitation, in this paper we investigate the effective and efficient generation of two novel types of OS snippets, i.e., diverse and proportional size-l OSs, denoted as DSize-l and PSize-l OSs. Namely, apart from the importance of each node, we also consider its frequency in the OS and its repetitions in the snippets. We conduct an extensive evaluation on two real graphs (DBLP and Google+). We verify effectiveness by collecting user feedback, e.g., by asking DBLP authors (i.e., the DSs themselves) to evaluate our results. In addition, we verify the efficiency of our algorithms and evaluate the quality of the snippets that they produce.
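The effect of repeated high-importance nodes, and of penalizing them, can be seen in a small greedy sketch in Python. This is a hypothetical heuristic written for illustration, not the DSize-l or PSize-l algorithm from the paper: the function name, the toy tree, the scores, and the penalty `importance / (1 + repeats)` are all assumptions, and a greedy choice does not guarantee the maximum total score.

```python
def greedy_snippet(children, importance, label, root, l, diverse=True):
    """Grow a connected snippet of at most l nodes, starting from the root.

    With diverse=True, a node's score is divided by (1 + number of already
    selected nodes carrying the same label), so repeats become less attractive.
    """
    selected = [root]
    seen = {label[root]: 1}
    frontier = list(children.get(root, []))

    def gain(node):
        repeats = seen.get(label[node], 0) if diverse else 0
        return importance[node] / (1 + repeats)

    while frontier and len(selected) < l:
        best = max(frontier, key=gain)
        frontier.remove(best)
        selected.append(best)
        seen[label[best]] = seen.get(label[best], 0) + 1
        # Children of a selected node become eligible, keeping the tree connected.
        frontier.extend(children.get(best, []))
    return selected


# Toy OS for one author: two papers, a frequent co-author "Smith", and "Lee".
children = {"author": ["p1", "p2"], "p1": ["s1"], "p2": ["s2", "lee"]}
importance = {"author": 1.0, "p1": 0.5, "p2": 0.5, "s1": 0.9, "s2": 0.9, "lee": 0.6}
label = {"author": "Author", "p1": "Paper 1", "p2": "Paper 2",
         "s1": "Smith", "s2": "Smith", "lee": "Lee"}

plain = greedy_snippet(children, importance, label, "author", 5, diverse=False)
varied = greedy_snippet(children, importance, label, "author", 5, diverse=True)
print(plain)   # ['author', 'p1', 's1', 'p2', 's2']
print(varied)  # ['author', 'p1', 's1', 'p2', 'lee']
```

Without the penalty the snippet lists "Smith" twice; with it, the second occurrence loses to "Lee", which is the kind of repetition the abstract describes as uninformative.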
A study of lawyers’ information behaviour leading to the development of two methods for evaluating electronic resources
In this thesis we examine the information behaviour displayed by a broad cross-section of
academic and practicing lawyers and feed our findings into the development of the
Information Behaviour (IB) methods - two novel methods for evaluating the functionality
and usability of electronic resources.
We captured lawyers’ information behaviour by conducting naturalistic observations, where we
asked participants to think aloud whilst using existing resources to ‘find information required for
their work.’ Lawyers’ information behaviours closely matched those observed in other disciplines
by Ellis and others, serving to validate Ellis’s existing model in the legal domain. Our findings also
extend Ellis’s model to include behaviours pertinent to legal information-seeking, broaden the
scope of the model to cover information use (in addition to information-seeking) behaviours and
enhance the potential analytical detail of the model through the identification of a range of
behavioural ‘subtypes’ and levels at which behaviours can operate.
The identified behaviours were used as the basis for developing two methods for evaluating
electronic resources – the IB functionality method (which mainly involves examining whether and
how information behaviours are currently, or might in future be, supported by an electronic
resource) and the IB usability method (which involves setting users behaviour-focused tasks, asking
them to think aloud whilst performing the tasks, and identifying usability issues from the think-
aloud data).
Finally, the IB methods were themselves evaluated by stakeholders working for LexisNexis
Butterworths – a large electronic legal resource development firm. Stakeholders were recorded
using the methods, and focus group and questionnaire data were collected, with the aim of
ascertaining how usable, useful and learnable they considered the methods to be and how likely
they would be to use them in future. Overall, findings were positive regarding both methods and
useful suggestions for improving the methods were made
Developing Data Stories as Enhanced Publications in Digital Humanities
This paper discusses the development of data-driven stories and the editorial processes underlying their production. Such ‘data stories’ have proliferated in journalism but are also increasingly developed within academia. Although ‘data stories’ lack a clear definition, there are similarities between the processes that underlie journalistic and academic data stories. However, there are also differences, specifically when it comes to epistemological claims. In this paper, data stories as a phenomenon, and their use in journalism and in the Humanities, form the context for the editorial protocol developed for CLARIAH Media Suite Data Stories
Developing Learning Analytics for Epistemic Commitments in a Collaborative Information Seeking Environment
Learning analytics sits at the confluence of learning, information, and computer sciences. Using a distinctive account of learning analytics as a form of assessment, I first argue for its potential in pedagogically motivated learning design, suggesting a particular construct – epistemic cognition in literacy contexts – to probe using learning analytics. I argue for a recasting of epistemic cognition as ‘epistemic commitments’ in collaborative information tasks drawing a novel alignment between information seeking and multiple document processing (MDP) models, with empirical and theoretical grounding given for a focus on collaboration and dialogue in such activities. Thus, epistemic commitments are seen in the ways students seek, select, and integrate claims from multiple sources, and the ways in which their collaborative dialogue is brought to bear in this activity. Accordingly, the empirical element of the thesis develops two pedagogically grounded literacy based tasks: a MDP task, in which pre-selected documents were provided to students; and a collaborative information seeking task (CIS), in which students could search the web. These tasks were deployed at scale (n > 500) and involved writing an evaluative review, followed by a pedagogically supported peer assessment task. Assessment outcomes were analysed in the context of a new epistemic commitments-oriented set of trace data, and psychometric data regarding the participants’ epistemic cognition. Demonstrating the value of the methodological and conceptual approach taken, qualitative analyses indicate clear epistemic activity, and stark differences in behaviour between groups, the complexity of which is challenging to model computationally. Despite this complexity, quantitative analyses indicate that up to 30% of variance in output scores can be modelled using behavioural indicators. 
The explanatory potential of behaviourally-oriented models of epistemic commitments grounded in tool-interaction and collaborative dialogue is demonstrated. The thesis provides an exemplification of theoretically positioned analytic development, drawing on interdisciplinary literatures in addressing complex learning contexts