406 research outputs found

Search Still Matters: Information Retrieval in the Era of Generative AI

Author: Hersh William R.
Publication date: 17/12/2023

Objective: Information retrieval (IR, also known as search) systems are ubiquitous in modern times. How does the emergence of generative artificial intelligence (AI), based on large language models (LLMs), fit into the IR process? Process: This perspective explores the use of generative AI in the context of the motivations, considerations, and outcomes of the IR process with a focus on the academic use of such systems. Conclusions: There are many information needs, from simple to complex, that motivate use of IR. Users of such systems, particularly academics, have concerns for authoritativeness, timeliness, and contextualization of search. While LLMs may provide functionality that aids the IR process, the continued need for search systems, and research into their improvement, remains essential. Comment: 7 pages, no figure

arXiv.org e-Print Archive

The TREC 2004 genomics track categorization task: classifying full text biomedical documents

Author: Cohen Aaron M
Hersh William R
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The TREC 2004 Genomics Track focused on applying information retrieval and text mining techniques to improve the use of genomic information in biomedicine. The Genomics Track consisted of two main tasks, ad hoc retrieval and document categorization. In this paper, we describe the categorization task, which focused on the classification of full-text documents, simulating the task of curators of the Mouse Genome Informatics (MGI) system and consisting of three subtasks. One subtask of the categorization task required the triage of articles likely to have experimental evidence warranting the assignment of GO terms, while the other two subtasks were concerned with the assignment of the three top-level GO categories to each paper containing evidence for these categories. RESULTS: The track had 33 participating groups. The mean utility measure for the triage subtask was 0.3303, with a top score of 0.6512. No system was able to substantially improve results over simply using the MeSH term Mice. Significant feature overlap between the training and test sets was found to be less than expected. Sample coverage of GO terms assigned to papers in the collection was very sparse. Determining papers containing GO term evidence will likely need to be treated as separate tasks for each concept represented in GO, and therefore require much denser sampling than was available in the data sets. The annotation subtask had a mean F-measure of 0.3824, with a top score of 0.5611. The mean F-measure for the annotation plus evidence codes subtask was 0.3676, with a top score of 0.4224. Gene name recognition was found to be of benefit for this task. CONCLUSION: Automated classification of documents for GO annotation is a challenging task, as is the automated extraction of GO code hierarchies and evidence codes. However, automating these tasks would provide substantial benefit to biomedical curation, and therefore work in this area must continue. Additional experience will allow comparison and further analysis of which algorithmic features are most useful in biomedical document classification, and better understanding of the task characteristics that make automated classification feasible and useful for biomedical document curation. The TREC Genomics Track will be continuing in 2005, focusing on a wider range of triage tasks and improving results from 2004.

Springer - Publisher Connector

PubMed Central

Ethics and mono-disciplinarity: positivism, informed consent and informed participation

Author: Hersh Marion A.
Tucker William David
Publication venue: Elsevier BV
Publication date: 01/01/2005

There are a number of pressures on researchers in academia and industry to behave unethically or compromise their ethical standards, for instance in order to obtain funding or publish frequently. In this paper a case study of Deaf telephony is used to discuss the pressures to unethical behaviour in terms of withholding information or misleading participants that can result from mono-disciplinary orthodoxies. The Deaf telephony system attempts to automate multiple aspects of relayed communication between Deaf and hearing users. The study is analysed in terms of consequentialist and deontological ethics, as well as multi-loop action learning. Discussion of a number of examples of bad practice is used to indicate both the compatibility of ethical behaviour and good scientific method and that ethical behaviour is a pre-requisite for obtaining meaningful results. Telkom, Cisco, Siemens, THRI

Crossref

Enlighten

University of the Western Cape Research Repository

Synthesis of dinucleoside acylphosphonites by phosphonodiamidite chemistry and investigation of phosphorus epimerization

Author: Hersh William H.
Publication venue: CUNY Academic Works
Publication date: 01/01/2015

The reaction of the diamidite, (iPr2N)2PH, with acyl chlorides proceeds with the loss of HCl to give the corresponding acyl diamidites, RC(O)P(N(iPr)2)2 (R = Me (7), Ph (9)), without the intervention of sodium to give a phosphorus anion. The structure of 9 was confirmed by single-crystal X-ray diffraction. The coupling of the diamidites 7 and 9 with 5′-O-DMTr-thymidine was carried out with N-methylimidazolium triflate as the activator to give the monoamidites 3′-O-(P(N(iPr)2)C(O)R)-5′-O-DMTr-thymidine, and further coupling with 3′-O-(tert-butyldimethylsilyl)thymidine was carried out with activation by pyridinium trifluoroacetate/N-methylimidazole. The new dinucleoside acylphosphonites could be further oxidized, hydrolyzed to the H-phosphonates, and sulfurized to give the known mixture of diastereomeric phosphorothioates. The goal of this work was the measurement of the barrier to inversion of the acylphosphonites, which was expected to be low by analogy to the low barrier found in acylphosphines. However, the barrier was found to be high as no epimerization was detected up to 150 °C, and consistent with this, density functional theory calculations give an inversion barrier of over 40 kcal/mol.

City University of New York

Directory of Open Access Journals

PubMed Central

GRAPHENE: A Precise Biomedical Literature Retrieval Engine with Graph Augmented Deep Learning and External Knowledge Empowerment

Author: Devlin Jacob
Hersh William
Jaana Kalervo
Roberts Kirk
Robertson Stephen E
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/11/2019

Effective biomedical literature retrieval (BLR) plays a central role in precision medicine informatics. In this paper, we propose GRAPHENE, a deep learning based framework for precise BLR. GRAPHENE consists of three main modules: 1) graph-augmented document representation learning; 2) query expansion and representation learning; and 3) learning to rank biomedical articles. The graph-augmented document representation learning module constructs a document-concept graph containing biomedical concept nodes and document nodes so that global biomedical concepts from external knowledge sources can be captured; the graph is further connected to a BiLSTM so that both local and global topics can be explored. The query expansion and representation learning module expands the query with abbreviations and alternative names, and then builds a CNN-based model to convolve the expanded query and obtain a vector representation for each query. The learning-to-rank module minimizes a ranking loss between biomedical articles and the query to learn the retrieval function. Experimental results on applying our system to TREC Precision Medicine track data are provided to demonstrate its effectiveness. Comment: CIKM 201

arXiv.org e-Print Archive

Crossref

The MERG Suite: Tools for discovering competencies and associated learning resources

Author: C Fallon
DJ Brailer
Michael Fordis
P Batalden
Peter S Greene
Ravi Teja Bhupatiraju
RM Harden
Valerie Smothers
William R Hersh
WR Hersh
Publication venue: BioMed Central
Publication date: 01/01/2008

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licens

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

A stimulus to define informatics and health information technology

Author: A Butte
A Dohm
A Hasman
Anonymous
C Friedman
C Mulrow
C Safran
D Detmer
E Bernstam
E Zerhouni
F Mostashari
H Covvey
H Oh
J Moehr
M Bloomrosen
M Collen
M Dawes
P Embi
R Fletcher
R Greenes
S Bakken
W Hersh
William Hersh
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Despite the growing interest by leaders, policy makers, and others, the terminology of health information technology as well as biomedical and health informatics is poorly understood and not even agreed upon by academics and professionals in the field. Discussion: The paper, presented as a Debate to encourage further discussion and disagreement, provides definitions of the major terminology used in biomedical and health informatics and health information technology. For informatics, it focuses on the words that modify the term as well as individuals who practice the discipline. Other categories of related terms are covered as well, from the associated disciplines of computer science, information technology, and health information management to the major categories of applications used. The discussion closes with a classification of individuals who work in the largest segment of the field, namely clinical informatics. Summary: The goal of presenting in Debate format is to provide a starting point for discussion to reach a documented consensus on the definition and use of these terms.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection

Author: Gorman Paul N.
Hersh William R.
Jensen Jeffery R.
Müller Henning
Ruch Patrick
Yang Jianji
Publication date: 02/08/2017

Objective: Develop and analyze results from an image retrieval test collection. Methods: After participating research groups obtained and assessed results from their systems in the image retrieval task of Cross-Language Evaluation Forum, we assessed the results for common themes and trends. In addition to overall performance, results were analyzed on the basis of topic categories (those most amenable to visual, textual, or mixed approaches) and run categories (those employing queries entered by automated or manual means as well as those using visual, textual, or mixed indexing and retrieval methods). We also assessed results on the different topics and compared the impact of duplicate relevance judgments. Results: A total of 13 research groups participated. Analysis was limited to the best run submitted by each group in each run category. The best results were obtained by systems that combined visual and textual methods. There was substantial variation in performance across topics. Systems employing textual methods were more resilient to visually oriented topics than those using visual methods were to textually oriented topics. The primary performance measure of mean average precision (MAP) was not necessarily associated with other measures, including those possibly more pertinent to real users, such as precision at 10 or 30 images. Conclusions: We developed a test collection amenable to assessing visual and textual methods for image retrieval. Future work must focus on how varying topic and run types affect retrieval performance. Users' studies also are necessary to determine the best measures for evaluating the efficacy of image retrieval system
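The abstract above contrasts mean average precision (MAP) with precision at 10 or 30 images. As a minimal sketch of how these measures are commonly computed, assuming binary relevance judgments and a ranked result list per topic, the standard textbook definitions look like this in Python. The function names and the sample data are hypothetical illustrations and are not taken from the paper or from the evaluation tooling its authors used.

```python
from typing import Optional, Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Items missing below rank len(relevance) count as non-relevant.
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Average of precision values at each rank where a relevant item appears.

    total_relevant is the number of relevant items known for the topic;
    if omitted, it is assumed to equal the relevant items in the ranking.
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of average precision over all topics in a run."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical judgments for two topics (True = relevant image).
topic_a = [True, False, True, False]
topic_b = [False, True]
print(precision_at_k(topic_a, 2))
print(mean_average_precision([topic_a, topic_b]))
```

Because average precision weights every relevant item in the ranking while precision at k looks only at the first k results, two systems can order differently under the two measures, which is consistent with the abstract's observation that MAP was not necessarily associated with the other measures.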

RERO DOC Digital Library

A Standardised Format for Exchanging User Study Instruments

Author: Bierig Ralf
Bogers Toine
Elaine
Hersh William
Ingwersen Peter
Maria
Nordlie Ragnar
Toms Elaine G
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/03/2020

Increasing re-use in Interactive Information Retrieval (IIR) has been an ongoing aim for a significant amount of time; however, progress has been limited and patchy. While re-use of some study aspects can be difficult due to the varied nature of IIR studies, the use of pre- and post-task self-reported measures is widespread and relatively standardised. Nevertheless, re-use of elements in this area is also limited, in part because the systems used to implement them are not able to exchange questions, instruments, or complete study setups. To address this, this paper presents a standardised, but extendable, format for IIR survey instrument exchange.

Crossref

Open Research Online

VBN