1,504 research outputs found

    Information Discovery on Electronic Health Records Using Authority Flow Techniques

    Background: As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs.
    Methods: We built two ranking systems. The traditional BM25 system exploits the EHRs' content without regard to association among entities within. The Clinical ObjectRank (CO) system exploits the entities' associations in EHRs using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset of the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, sensitivity and specificity were measured by two physicians for a set of 11 queries related to congenital cardiac disease.
    Results: Our pilot evaluation showed that CO outperforms BM25 in terms of sensitivity (65% vs. 38%) by 71% on average, while maintaining the specificity (64% vs. 61%). The evaluation was done by two physicians.
    Conclusions: Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
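
The sensitivity and specificity figures reported above can be reproduced from relevance judgments with a few lines of arithmetic. The following is a minimal sketch, not the authors' evaluation code: the function names and the per-query counts are hypothetical, chosen only so that the outputs match the percentages quoted in the abstract. Note that the abstract's "71% on average" may be a per-query average; the ratio of the aggregate figures happens to give roughly the same number.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of the truly relevant results that the system returned."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of the truly irrelevant results that the system left out."""
    return tn / (tn + fp)


def relative_improvement(new: float, baseline: float) -> float:
    """Improvement of `new` over `baseline`, as a fraction of `baseline`."""
    return (new - baseline) / baseline


# Hypothetical counts for illustration only (not taken from the study).
print(sensitivity(tp=13, fn=7))   # 13 of 20 relevant results found
print(specificity(tn=16, fp=9))   # 16 of 25 irrelevant results excluded

# Relative gain implied by the aggregate sensitivities in the abstract.
print(round(relative_improvement(0.65, 0.38) * 100))
```

With the aggregate sensitivities from the abstract, the relative gain is (0.65 - 0.38) / 0.38, or about 71%.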

    Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering

    We introduce the concept of "thinking assistants", an approach that encourages users to engage in deep reflection and critical thinking through brainstorming and thought-provoking queries. We instantiate one such thinking assistant, Gradschool.chat, as a virtual assistant tailored to assist prospective graduate students. We posit that thinking assistants are particularly relevant to situations like applying to graduate school, a phase often characterized by the challenges of academic preparation and the development of a unique research identity. In such situations, students often lack direct mentorship from professors, or may feel hesitant to approach faculty with their queries, making thinking assistants particularly useful. Leveraging a Large Language Model (LLM), Gradschool.chat is a demonstration system built as a thinking assistant for working with specific professors in the field of human-computer interaction (HCI). It was designed through training on information specific to these professors and a validation process in collaboration with these academics. This technical report delineates the system's architecture and offers a preliminary analysis of our deployment study. Additionally, this report covers the spectrum of questions posed to our chatbots by users. The system recorded 223 conversations, with participants responding positively to approximately 65% of responses. Our findings indicate that users who discuss and brainstorm their research interests with Gradschool.chat engage more deeply, often interacting with the chatbot twice as long as those who only pose questions about professors.

    COMPUTING APPROXIMATE CUSTOMIZED RANKING

    As the amount of information grows and as users become more sophisticated, ranking techniques become important building blocks to meet user needs when answering queries. PageRank is one of the most successful link-based ranking methods, which iteratively computes the importance scores for web pages based on the importance scores of incoming pages. Due to its success, PageRank has been applied in a number of applications that require customization. We address the scalability challenges for two types of customized ranking. The first challenge is to compute the ranking of a subgraph. Various Web applications focus on identifying a subgraph, such as focused crawlers and localized search engines. The second challenge is to compute online personalized ranking. Personalized search improves the quality of search results for each user. The user needs are represented by a personalized set of pages or personalized link importance in an entity relationship graph. This requires an efficient online computation. To solve the subgraph ranking problem efficiently, we estimate the ranking scores for a subgraph. We propose a framework of an exact solution (IdealRank) and an approximate solution (ApproxRank) for computing ranking on a subgraph. Both IdealRank and ApproxRank represent the set of external pages with an external node $\Lambda$ and modify the PageRank-style transition matrix with respect to $\Lambda$. The IdealRank algorithm assumes that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge to the true PageRank scores. Since the PageRank-style scores of external pages may not typically be available, we propose the ApproxRank algorithm to estimate scores for the subgraph. We analyze the $L_1$ distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a constant factor of the $L_1$ distance of the external pages.
We demonstrate with real and synthetic data that ApproxRank provides a good approximation to PageRank for a variety of subgraphs. We consider online personalization using ObjectRank, an authority-flow-based ranking for entity relationship graphs. We formalize the concept of an aggregate surfer on a data graph; the surfer's behavior is controlled by multiple personalized rankings. We prove a linearity theorem over these rankings which can be used as a tool to scale this type of personalization. DataApprox uses a repository of precomputed rankings for a given set of link weight assignments. We define DataApprox as an optimization problem; it selects a subset of the precomputed rankings from the repository and produces a weighted combination of these rankings. We analyze the $L_1$ distance between the DataApprox scores and the real authority flow ranking scores and show that DataApprox has a theoretical bound. Our experiments on the DBLP data graph show that DataApprox performs well in practice and allows fast and accurate personalized authority flow ranking.
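
The iterative computation that the abstract attributes to PageRank can be illustrated with a short power-iteration sketch. This is plain textbook PageRank on a toy graph, not IdealRank, ApproxRank, or DataApprox; the function name, the damping factor of 0.85, the uniform handling of pages without out-links, and the toy graph are all assumptions made for illustration. The stopping test uses the L1 distance between successive score vectors, the same distance measure the abstract uses for its error bounds.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Score held by pages without out-links is spread uniformly.
        dangling = sum(score[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its score in equal shares to the pages it links to.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * score[p] / len(outs)
        # L1 distance between successive iterates.
        delta = sum(abs(new[p] - score[p]) for p in pages)
        score = new
        if delta < tol:
            break
    return score


# Hypothetical four-page graph, for illustration only.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph, page "c" receives links from every other page and ends up with the highest score, while "d" has no incoming links and keeps only the baseline share (1 - damping) / n.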

    Using On-line Academic Forums

    Includes bibliographical references

    Gauging the Path of Private Canadian Pensions: 2010 Update on the State of Defined Benefit and Defined Contribution Pension Plans

    The issue of under-funded defined benefit (DB) pension plans has become one of the most perplexing financial issues facing business executives, legislators and Canadian pensioners who are or will in the future be reliant on pension income as an important component of their overall retirement incomes. In 2004, CGA-Canada issued a comprehensive paper on defined benefit pension plans titled “Addressing the Pensions Dilemma in Canada”. The goal of that release was to advance understanding of DB pension plans and to impart a reasonable estimate of the standing of DB pension plans at December 31, 2003. In 2009, this analysis was further advanced by examining the funding status of private DB pension plans at December 31, 2008. The results of the analysis show that funding deficits have intensified with funding ratios eroding to unsustainable levels. The vast majority (92%) of private DB pension plans were in a deficit position as at December 31, 2008. The average funding ratio has decreased from 112% to 77% on a ‘without indexation’ basis and from 71% to 57% on a ‘with indexation’ basis. The aggregate funding shortfall is expected to exceed $350 billion. Keywords: defined benefit pension plan, defined contribution pension plan, funding position of pension plans, household savings, retirement savings, retirement income, household finance, pension accounting

    How Far Will You Go? Characterizing Online Search Stopping Behaviors Using Information Scent and Need for Cognition

    This research sought to explain online searchers' stopping behaviors when interacting with search engine result pages (SERPs) using the theories of Information Scent and Need for Cognition (NFC). Specifically, the problems addressed were how: (1) information scent level, operationalized as the number of relevant documents on the first SERP, (2) information scent pattern, operationalized as the distribution of relevant and non-relevant results on the first SERP, and (3) NFC, a person's tendency to engage in and enjoy effortful cognitive activities measured by the Need for Cognition scale, impacted a person's search stopping behaviors. The two search stopping behaviors that were examined were query stopping, or the point at which a person decides to issue a new query, and task stopping, or the point at which a person decides to end the search task. A laboratory experiment was conducted with 48 participants, who were asked to gather information for six open-ended search tasks. Participants were interviewed about their search stopping behaviors at the end of the study using recordings of their search processes to stimulate recall. The results showed significant effects of Information Scent and NFC on search stopping behaviors. When there were more relevant results on the first SERP, participants examined more documents and explored deeper in the search results list. Participants' behaviors were also affected by the distribution of relevant results on the first SERP: when relevant results were found at the top of the SERP, participants left the SERP after viewing only the first few results. When participants encountered relevant results dispersed across the first SERP at the start of a search task, participants issued more queries subsequently to solve the search task. Participants with lower NFC searched deeper but reformulated queries less frequently during a task. 
Moreover, the time participants with lower NFC spent evaluating search results was more variable depending on the number of relevant results displayed on the first SERP than the time spent by higher NFC participants. Finally, participants reported that they tended to examine results beyond the first SERP when they conducted people, product, image and literature searches in daily life. Degree: Doctor of Philosophy

    Diverse and proportional size-l object summaries for keyword search

    The abundance and ubiquity of graphs (e.g., Online Social Networks such as Google+ and Facebook; bibliographic graphs such as DBLP) necessitate effective and efficient search over them. Given a set of keywords that can identify a Data Subject (DS), a recently proposed relational keyword search paradigm produces, as a query result, a set of Object Summaries (OSs). An OS is a tree structure rooted at the DS node (i.e., a tuple containing the keywords) with surrounding nodes that summarize all data held on the graph about the DS. OS snippets, denoted as size-l OSs, have also been investigated. Size-l OSs are partial OSs containing l nodes such that the summation of their importance scores results in the maximum possible total score. However, the set of nodes that maximizes the total importance score may result in uninformative size-l OSs, as very important nodes may be repeated in them, dominating other representative information. In view of this limitation, in this paper we investigate the effective and efficient generation of two novel types of OS snippets, i.e. diverse and proportional size-l OSs, denoted as DSize-l and PSize-l OSs. Namely, apart from the importance of each node, we also consider its frequency in the OS and its repetitions in the snippets. We conduct an extensive evaluation on two real graphs (DBLP and Google+). We verify effectiveness by collecting user feedback, e.g. by asking DBLP authors (i.e. the DSs themselves) to evaluate our results. In addition, we verify the efficiency of our algorithms and evaluate the quality of the snippets that they produce. (Postprint)
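
The baseline size-l OS described above (l nodes with the maximum total importance) can be sketched as a small dynamic program over the OS tree. This is an illustrative sketch only, not the authors' algorithm and not the DSize-l or PSize-l variants: it assumes the snippet must be a connected subtree that contains the DS root, and the function name, tree, and scores are hypothetical.

```python
def best_size_l(children, score, root, l):
    """Best connected l-node subtree containing `root`.

    children maps a node to its child nodes; score maps a node to its
    importance. Returns (total score, node list), or None if the tree
    has fewer than l nodes.
    """
    def solve(node):
        # best[k] = (total, nodes) of the best k-node subtree rooted at node
        best = {1: (score[node], [node])}
        for child in children.get(node, []):
            child_best = solve(child)
            merged = dict(best)  # option: take nothing from this child
            for k, (s, nodes) in best.items():
                for j, (cs, cnodes) in child_best.items():
                    if k + j > l:
                        continue
                    cand = (s + cs, nodes + cnodes)
                    if k + j not in merged or cand[0] > merged[k + j][0]:
                        merged[k + j] = cand
            best = merged
        return best

    return solve(root).get(l)


# Hypothetical OS tree for an author, for illustration only.
children = {"author": ["paper1", "paper2"],
            "paper1": ["conf1"],
            "paper2": ["coauthor"]}
score = {"author": 1.0, "paper1": 0.6, "paper2": 0.5,
         "conf1": 0.9, "coauthor": 0.3}

total, nodes = best_size_l(children, score, "author", 3)
print(nodes, total)
```

Here the highest-scoring 3-node snippet follows a single branch (author, paper1, conf1) and leaves out the second paper entirely. That is the kind of outcome the abstract's diverse and proportional variants set out to balance by also accounting for node frequency and repetition.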

    A study of lawyers’ information behaviour leading to the development of two methods for evaluating electronic resources

    In this thesis we examine the information behaviour displayed by a broad cross-section of academic and practicing lawyers and feed our findings into the development of the Information Behaviour (IB) methods - two novel methods for evaluating the functionality and usability of electronic resources. We captured lawyers’ information behaviour by conducting naturalistic observations, where we asked participants to think aloud whilst using existing resources to ‘find information required for their work.’ Lawyers’ information behaviours closely matched those observed in other disciplines by Ellis and others, serving to validate Ellis’s existing model in the legal domain. Our findings also extend Ellis’s model to include behaviours pertinent to legal information-seeking, broaden the scope of the model to cover information use (in addition to information-seeking) behaviours and enhance the potential analytical detail of the model through the identification of a range of behavioural ‘subtypes’ and levels at which behaviours can operate. The identified behaviours were used as the basis for developing two methods for evaluating electronic resources – the IB functionality method (which mainly involves examining whether and how information behaviours are currently, or might in future be, supported by an electronic resource) and the IB usability method (which involves setting users behaviour-focused tasks, asking them to think aloud whilst performing the tasks, and identifying usability issues from the think-aloud data). Finally, the IB methods were themselves evaluated by stakeholders working for LexisNexis Butterworths – a large electronic legal resource development firm. Stakeholders were recorded using the methods, and focus group and questionnaire data were collected, with the aim of ascertaining how usable, useful and learnable they considered the methods to be and how likely they would be to use them in future.
Overall, findings were positive regarding both methods, and useful suggestions for improving the methods were made.

    Developing Data Stories as Enhanced Publications in Digital Humanities

    This paper discusses the development of data-driven stories and the editorial processes underlying their production. Such ‘data stories’ have proliferated in journalism but are also increasingly developed within academia. Although ‘data stories’ lack a clear definition, there are similarities between the processes that underlie journalistic and academic data stories. However, there are also differences, specifically when it comes to epistemological claims. In this paper, data stories as a phenomenon and their use in journalism and in the Humanities form the context for the editorial protocol developed for CLARIAH Media Suite Data Stories.