406 research outputs found
Search Still Matters: Information Retrieval in the Era of Generative AI
Objective: Information retrieval (IR, also known as search) systems are
ubiquitous in modern times. How does the emergence of generative artificial
intelligence (AI), based on large language models (LLMs), fit into the IR
process? Process: This perspective explores the use of generative AI in the
context of the motivations, considerations, and outcomes of the IR process with
a focus on the academic use of such systems. Conclusions: There are many
information needs, from simple to complex, that motivate use of IR. Users of
such systems, particularly academics, have concerns for authoritativeness,
timeliness, and contextualization of search. While LLMs may provide
functionality that aids the IR process, the continued need for search systems,
and research into their improvement, remains essential. Comment: 7 pages, no figure
The TREC 2004 genomics track categorization task: classifying full text biomedical documents
BACKGROUND: The TREC 2004 Genomics Track focused on applying information retrieval and text mining techniques to improve the use of genomic information in biomedicine. The Genomics Track consisted of two main tasks, ad hoc retrieval and document categorization. In this paper, we describe the categorization task, which focused on the classification of full-text documents, simulating the task of curators of the Mouse Genome Informatics (MGI) system and consisting of three subtasks. One subtask of the categorization task required the triage of articles likely to have experimental evidence warranting the assignment of GO terms, while the other two subtasks were concerned with the assignment of the three top-level GO categories to each paper containing evidence for these categories. RESULTS: The track had 33 participating groups. The mean utility measure for the triage subtask was 0.3303, with a top score of 0.6512. No system was able to substantially improve results over simply using the MeSH term Mice. Significant feature overlap between the training and test sets was found to be less than expected. Sample coverage of GO terms assigned to papers in the collection was very sparse. Determining papers containing GO term evidence will likely need to be treated as separate tasks for each concept represented in GO, and therefore require much denser sampling than was available in the data sets. The annotation subtask had a mean F-measure of 0.3824, with a top score of 0.5611. The mean F-measure for the annotation plus evidence codes subtask was 0.3676, with a top score of 0.4224. Gene name recognition was found to be of benefit for this task. CONCLUSION: Automated classification of documents for GO annotation is a challenging task, as is the automated extraction of GO code hierarchies and evidence codes. However, automating these tasks would provide substantial benefit to biomedical curation, and therefore work in this area must continue.
Additional experience will allow comparison and further analysis about which algorithmic features are most useful in biomedical document classification, and better understanding of the task characteristics that make automated classification feasible and useful for biomedical document curation. The TREC Genomics Track will be continuing in 2005 focusing on a wider range of triage tasks and improving results from 2004
Ethics and mono-disciplinarity: positivism, informed consent and informed participation
There are a number of pressures on researchers in academia and industry to behave unethically or compromise their ethical standards, for instance in order to obtain funding or publish frequently. In this paper a case study of Deaf telephony is used to discuss the pressures towards unethical behaviour, in the form of withholding information from or misleading participants, that can result from mono-disciplinary orthodoxies. The Deaf telephony system attempts to automate multiple aspects of relayed communication between Deaf and hearing users. The study is analysed in terms of consequentialist and deontological ethics, as well as multi-loop action learning. Discussion of a number of examples of bad practice is used to indicate both the compatibility of ethical behaviour and good scientific method and that ethical behaviour is a pre-requisite for obtaining meaningful results. Telkom, Cisco, Siemens, THRI
Synthesis of dinucleoside acylphosphonites by phosphonodiamidite chemistry and investigation of phosphorus epimerization
The reaction of the diamidite, (iPr2N)2PH, with acyl chlorides proceeds with the loss of HCl to give the corresponding acyl diamidites, RC(O)P(N(iPr)2)2 (R = Me (7), Ph (9)), without the intervention of sodium to give a phosphorus anion. The structure of 9 was confirmed by single-crystal X-ray diffraction. The coupling of the diamidites 7 and 9 with 5′-O-DMTr-thymidine was carried out with N-methylimidazolium triflate as the activator to give the monoamidites 3′-O-(P(N(iPr)2)C(O)R)-5′-O-DMTr-thymidine, and further coupling with 3′-O-(tert-butyldimethylsilyl)thymidine was carried out with activation by pyridinium trifluoroacetate/N-methylimidazole. The new dinucleoside acylphosphonites could be further oxidized, hydrolyzed to the H-phosphonates, and sulfurized to give the known mixture of diastereomeric phosphorothioates. The goal of this work was the measurement of the barrier to inversion of the acylphosphonites, which was expected to be low by analogy to the low barrier found in acylphosphines. However, the barrier was found to be high as no epimerization was detected up to 150 °C, and consistent with this, density functional theory calculations give an inversion barrier of over 40 kcal/mol
GRAPHENE: A Precise Biomedical Literature Retrieval Engine with Graph Augmented Deep Learning and External Knowledge Empowerment
Effective biomedical literature retrieval (BLR) plays a central role in
precision medicine informatics. In this paper, we propose GRAPHENE, which is a
deep learning based framework for precise BLR. GRAPHENE consists of three main
modules: 1) graph-augmented document representation learning; 2) query
expansion and representation learning; and 3) learning to rank biomedical
articles. The graph-augmented document representation learning module
constructs a document-concept graph containing biomedical concept nodes and
document nodes so that global biomedical concepts from external
knowledge sources can be captured; the graph is further connected to a BiLSTM so
that both local and global topics can be explored. The query expansion and
representation learning module expands the query with abbreviations and
different names, and then builds a CNN-based model to convolve the expanded
query and obtain a vector representation for each query. The learning-to-rank
module minimizes a ranking loss between biomedical articles and the query to learn
the retrieval function. Experimental results on applying our system to TREC
Precision Medicine track data are provided to demonstrate its effectiveness. Comment: CIKM 201
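The abstract above says only that the learning-to-rank module "minimizes a ranking loss" and does not name the loss. As a rough illustration of what such a loss can look like, here is a minimal sketch of a pairwise hinge loss. The choice of hinge loss, the function name `pairwise_hinge_loss`, and the `margin` parameter are all assumptions made for this example, not details taken from the paper.

```python
def pairwise_hinge_loss(scores, labels, margin=1.0):
    """Average hinge loss over all (more relevant, less relevant) article pairs.

    scores: model scores for the articles retrieved for one query.
    labels: relevance labels for the same articles (higher = more relevant).
    margin: hypothetical margin by which a more relevant article should
            outscore a less relevant one.
    """
    total = 0.0
    pairs = 0
    for s_i, y_i in zip(scores, labels):
        for s_j, y_j in zip(scores, labels):
            if y_i > y_j:
                # Penalize the pair unless article i outscores article j by the margin.
                total += max(0.0, margin - (s_i - s_j))
                pairs += 1
    return total / pairs if pairs else 0.0


# A correctly ordered pair incurs no loss; a reversed pair is penalized.
well_ordered = pairwise_hinge_loss([2.0, 0.0], [1, 0])
reversed_order = pairwise_hinge_loss([0.0, 2.0], [1, 0])
```

In a neural ranker the scores would come from comparing the query vector with each document vector, and the loss would be minimized by gradient descent; this sketch shows only the loss computation.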
The MERG Suite: Tools for discovering competencies and associated learning resources
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
A stimulus to define informatics and health information technology
Background: Despite the growing interest by leaders, policy makers, and others, the terminology of health information technology as well as biomedical and health informatics is poorly understood and not even agreed upon by academics and professionals in the field. Discussion: The paper, presented as a Debate to encourage further discussion and disagreement, provides definitions of the major terminology used in biomedical and health informatics and health information technology. For informatics, it focuses on the words that modify the term as well as individuals who practice the discipline. Other categories of related terms are covered as well, from the associated disciplines of computer science, information technology and health information management to the major categories of applications used. The discussion closes with a classification of individuals who work in the largest segment of the field, namely clinical informatics. Summary: The goal of presenting in Debate format is to provide a starting point for discussion to reach a documented consensus on the definition and use of these terms.
Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection
Objective: Develop and analyze results from an image retrieval test collection. Methods: After participating research groups obtained and assessed results from their systems in the image retrieval task of the Cross-Language Evaluation Forum, we assessed the results for common themes and trends. In addition to overall performance, results were analyzed on the basis of topic categories (those most amenable to visual, textual, or mixed approaches) and run categories (those employing queries entered by automated or manual means as well as those using visual, textual, or mixed indexing and retrieval methods). We also assessed results on the different topics and compared the impact of duplicate relevance judgments. Results: A total of 13 research groups participated. Analysis was limited to the best run submitted by each group in each run category. The best results were obtained by systems that combined visual and textual methods. There was substantial variation in performance across topics. Systems employing textual methods were more resilient to visually oriented topics than those using visual methods were to textually oriented topics. The primary performance measure of mean average precision (MAP) was not necessarily associated with other measures, including those possibly more pertinent to real users, such as precision at 10 or 30 images. Conclusions: We developed a test collection amenable to assessing visual and textual methods for image retrieval. Future work must focus on how varying topic and run types affect retrieval performance. Users' studies also are necessary to determine the best measures for evaluating the efficacy of image retrieval systems.
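The observation that MAP does not necessarily track precision at 10 or 30 images can be made concrete with a small sketch. The code below uses the standard definitions of precision at k and average precision. The two ranked lists are invented for illustration and are not data from the study.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k results that are relevant (1 = relevant, 0 = not)."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance, total_relevant):
    """Mean of the precision values at each rank where a relevant item appears,
    divided by the total number of relevant items for the topic."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant if total_relevant else 0.0


# Hypothetical topic with 4 relevant images and two hypothetical systems.
system_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # finds 2 relevant images, both at the top
system_b = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # finds all 4, but only at ranks 7-10

ap_a = average_precision(system_a, total_relevant=4)
ap_b = average_precision(system_b, total_relevant=4)
p10_a = precision_at_k(system_a, 10)
p10_b = precision_at_k(system_b, 10)
```

In this made-up case system A has the higher average precision while system B has the higher precision at 10, so the two measures rank the systems in opposite order. MAP is simply the mean of `average_precision` over all topics.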
A Standardised Format for Exchanging User Study Instruments
Increasing re-use in Interactive Information Retrieval (IIR) has been an ongoing aim in IIR for a long time; however, progress has been limited and patchy. While re-use of some study aspects can be difficult due to the varied nature of IIR studies, the use of pre- and post-task self-reported measures is widespread and relatively standardised. Nevertheless, re-use of elements in this area is also limited, in part because systems used to implement them are not able to exchange questions, instruments, or complete study setups. To address this, this paper presents a standardised, but extendable, format for IIR survey instrument exchange