Balancing human and system visualization during document triage
People must frequently sort through and identify relevant materials from a large
set of documents. Document triage is a specific form of information collecting where
people quickly evaluate a large set of documents from the Internet by reading (or
skimming) documents and organizing them into a personal information collection.
During triage people can re-read documents, progressively refine their organization, and
share results with others. People usually perform triage using multiple applications in
concert: a search engine interface presents lists of potentially relevant documents; a
document reader displays their content; and a text editor or a more specialized
application records notes and assessments. However, people often become disoriented
while switching between these subtasks in document triage. This can hinder the
interaction between the subtasks and can distract people from focusing on documents of
interest. To support document triage, we have developed an environment that infers
users’ interests based on their interactions with multiple applications and on an analysis
of the characteristics and content of the documents they are interacting with. The
inferred user interest is used to relieve disorientation by generating visualizations in
multiple applications that help people find documents of interest as well as interesting
sections within documents.
Toward Theoretical Techniques for Measuring the Use of Human Effort in Visual Analytic Systems
Visual analytic systems have long relied on user studies and standard datasets to demonstrate advances to the state of the art, as well as to illustrate the efficiency of solutions to domain-specific challenges. This approach has enabled some important comparisons between systems, but unfortunately the narrow scope required to facilitate these comparisons has prevented many of these lessons from being generalized to new areas. At the same time, advanced visual analytic systems have made increasing use of human-machine collaboration to solve problems not tractable by machine computation alone. To continue to make progress in modeling user tasks in these hybrid visual analytic systems, we must strive to gain insight into what makes certain tasks more complex than others. This will require the development of mechanisms for describing the balance to be struck between machine and human strengths with respect to analytical tasks and workload. In this paper, we argue for the necessity of theoretical tools for reasoning about such balance in visual analytic systems and demonstrate the utility of the Human Oracle Model for this purpose in the context of sensemaking in visual analytics. Additionally, we make use of the Human Oracle Model to guide the development of a new system through a case study in the domain of cybersecurity.
Analysis and Detection of Information Types of Open Source Software Issue Discussions
Most modern Issue Tracking Systems (ITSs) for open source software (OSS)
projects allow users to add comments to issues. Over time, these comments
accumulate into discussion threads embedded with rich information about the
software project, which can potentially satisfy the diverse needs of OSS
stakeholders. However, discovering and retrieving relevant information from the
discussion threads is a challenging task, especially when the discussions are
lengthy and the number of issues in ITSs is vast. In this paper, we address
this challenge by identifying the information types presented in OSS issue
discussions. Through qualitative content analysis of 15 complex issue threads
across three projects hosted on GitHub, we uncovered 16 information types and
created a labeled corpus containing 4656 sentences. Our investigation of
supervised, automated classification techniques indicated that, when prior
knowledge about the issue is available, Random Forest can effectively detect
most sentence types using conversational features such as the sentence length
and its position. When classifying sentences from new issues, Logistic
Regression can yield satisfactory performance using textual features for
certain information types, while falling short on others. Our work represents a
nontrivial first step towards tools and techniques for identifying and
obtaining the rich information recorded in the ITSs to support various software
engineering activities and to satisfy the diverse needs of OSS stakeholders.
Comment: 41st ACM/IEEE International Conference on Software Engineering
(ICSE2019)
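The abstract above names sentence length and sentence position as conversational features. The following is a minimal sketch of how such features might be computed for a discussion thread. The function name, the regex-based sentence splitter, and the normalization of positions to the range 0 to 1 are all assumptions made for illustration; the paper's actual feature definitions are not given in the abstract.

```python
import re


def conversational_features(comments):
    """Compute simple per-sentence features for a thread.

    `comments` is a list of comment strings in thread order.
    Hypothetical helper: the feature definitions are illustrative only.
    """
    rows = []
    for c_idx, comment in enumerate(comments):
        # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", comment) if s.strip()]
        for s_idx, sentence in enumerate(sentences):
            rows.append({
                "sentence": sentence,
                # Length measured in whitespace-separated tokens.
                "length_tokens": len(sentence.split()),
                # Relative position of the sentence within its comment (0 = first, 1 = last).
                "position_in_comment": s_idx / max(len(sentences) - 1, 1),
                # Relative position of the comment within the thread (0 = first, 1 = last).
                "position_in_thread": c_idx / max(len(comments) - 1, 1),
            })
    return rows


# Made-up two-comment thread, used only to exercise the function.
thread = ["This crashes on start. Can you share logs?", "Sure. Here they are."]
rows = conversational_features(thread)
for row in rows:
    print(row)
```

Feature rows of this kind could then be passed to a classifier such as Random Forest, which is the model the abstract reports as effective when prior knowledge about the issue is available.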
An effective biomedical document classification scheme in support of biocuration: addressing class imbalance.
Published literature is an important source of knowledge supporting biomedical research. Given the large and increasing number of publications, automated document classification plays an important role in biomedical research. Effective biomedical document classifiers are especially needed for bio-databases, in which the information stems from many thousands of biomedical publications that curators must read in detail and annotate. In addition, biomedical document classification often amounts to identifying a small subset of relevant publications within a much larger collection of available documents. As such, addressing class imbalance is essential to a practical classifier. We present here an effective classification scheme for automatically identifying papers among a large pool of biomedical publications that contain information relevant to a specific topic, which the curators are interested in annotating. The proposed scheme is based on a meta-classification framework using cluster-based under-sampling combined with named-entity recognition and statistical feature selection strategies. We examined the performance of our method over a large imbalanced data set that was originally manually curated by the Jackson Laboratory's Gene Expression Database (GXD). The set consists of more than 90 000 PubMed abstracts, of which about 13 000 documents are labeled as relevant to GXD while the others are not relevant. Our results, 0.72 precision, 0.80 recall and 0.75 f-measure, demonstrate that our proposed classification scheme effectively categorizes such a large data set in the face of data imbalance.
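The figures reported above can be related to one another with the standard F-measure formula. The sketch below assumes that "f-measure" means the balanced F1 score (the abstract does not state a beta value) and uses the rounded counts "90 000" and "13 000" as approximations. Because the published precision and recall are themselves rounded, the recomputed value (about 0.758) need not match the reported 0.75 in the last digit.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Values taken from the abstract; beta = 1 is an assumption.
f1 = f_measure(0.72, 0.80)

# Approximate share of relevant documents, using the rounded counts in the abstract.
positive_share = 13_000 / 90_000

print(f"F1 recomputed from rounded precision and recall: {f1:.3f}")
print(f"Approximate share of relevant documents: {positive_share:.1%}")
```

A relevant-document share of roughly one in seven illustrates why the abstract treats class imbalance as central: a classifier that labels everything as not relevant would be right most of the time while having zero recall.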
Behind the screens: Clinical decision support methodologies - A review
Clinical decision support systems (CDSSs) are interactive software systems designed to assist clinicians with decision-making tasks, such as determining a diagnosis from patient data. CDSSs are a widely researched topic in the Computer Science community, but their workings are less well understood by clinicians. The purpose of this review is to introduce clinicians and policy makers, in a non-technical manner, to the computer-based methodologies most commonly employed to construct decision models that compute clinical decisions. We hope that a better understanding of CDSSs will open up discussion about the future of CDSSs as a part of healthcare delivery, as well as engage clinicians and policy makers in the development and deployment of CDSSs that can meaningfully help with decision-making tasks.