17 research outputs found
Biomedical knowledge graph-optimized prompt generation for large language models
Motivation: Large Language Models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead and require further domain expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework that leverages a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo, and GPT-4 to generate meaningful biomedical text rooted in established knowledge.
Results: Compared to the existing RAG technique for Knowledge Graphs, the proposed method utilizes a minimal graph schema for context extraction and uses embedding methods for context pruning. This optimization in context extraction yields a more than 50% reduction in token consumption without compromising accuracy, making for a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (where available) to substantiate the claims. Further benchmarking on human-curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of the proprietary GPT models GPT-3.5 and GPT-4. In summary, the proposed framework combines the explicit and implicit knowledge of the KG and the LLM in a token-optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions cost-effectively.
Availability and implementation: The SPOKE KG can be accessed at https://spoke.rbvi.ucsf.edu/neighborhood.html or via REST API (https://spoke.rbvi.ucsf.edu/swagger/). KG-RAG code is made available at https://github.com/BaranziniLab/KG_RAG. The biomedical benchmark datasets used in this study are made available to the research community in the same GitHub repository.
Supplementary information: Supplementary data are available at Bioinformatics online.
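The abstract above describes a two-step retrieval flow: extract biomedical entities from the prompt, pull their neighborhood from SPOKE, then prune the retrieved context by embedding similarity before building the final prompt. The Python sketch below illustrates that flow under loud assumptions: the toy bag-of-words embedding, the helper names, and the sample triples are all hypothetical stand-ins; the actual implementation lives in the BaranziniLab/KG_RAG repository and uses learned sentence embeddings over live SPOKE queries.

```python
# Sketch of a KG-RAG-style flow: prune KG triples by similarity to the
# question, then assemble the prompt. Helpers are illustrative stand-ins.
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words vector; the paper uses learned sentence embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prune_context(question, triples, keep_fraction=0.5):
    # Keep only the KG triples most similar to the question; this is the
    # embedding-based context pruning that cuts token consumption.
    q = embed(question)
    scored = sorted(triples, key=lambda t: cosine(q, embed(t)), reverse=True)
    return scored[:max(1, int(len(scored) * keep_fraction))]

def build_prompt(question, pruned):
    context = "\n".join(pruned)
    return f"Context from the knowledge graph:\n{context}\n\nQuestion: {question}"

# Usage with made-up triples standing in for a SPOKE neighborhood query.
triples = [
    "hydroxychloroquine TREATS rheumatoid arthritis (provenance: ChEMBL)",
    "rheumatoid arthritis ASSOCIATES gene HLA-DRB1",
    "hydroxychloroquine ISA antimalarial agent",
]
question = "What drugs treat rheumatoid arthritis?"
print(build_prompt(question, prune_context(question, triples)))
```

The keep_fraction knob is where the token savings would come from: the abstract attributes its more than 50% reduction in token consumption to exactly this pruning step.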
Biomedical knowledge graph-enhanced prompt generation for large language models
Large Language Models (LLMs) have been driving progress in AI at an unprecedented rate, yet still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead, and the latter requires domain expertise. External knowledge infusion is task-specific and requires model training. Here, we introduce a task-agnostic Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework that leverages the massive biomedical KG SPOKE with LLMs such as Llama-2-13b, GPT-3.5-Turbo, and GPT-4 to generate meaningful biomedical text rooted in established knowledge. KG-RAG consistently enhanced the performance of LLMs across various prompt types, including one-hop and two-hop prompts, drug repurposing queries, biomedical true/false questions, and multiple-choice questions (MCQ). Notably, KG-RAG provided a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5, which exhibited improvement over GPT-4 in context utilization on MCQ data. Our approach also addressed drug repurposing questions, returning meaningful repurposing suggestions. In summary, the proposed framework combines the explicit and implicit knowledge of the KG and the LLM, respectively, in an optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a unified framework.
Comment: 28 pages, 5 figures, 2 tables, 1 supplementary file
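The benchmarking described above (true/false and MCQ datasets, with and without KG grounding) boils down to an accuracy comparison loop. A minimal sketch follows, in which ask_llm, the question fields, and the sample item are hypothetical placeholders rather than the paper's actual harness:

```python
# Compare MCQ accuracy of an LLM with and without KG-derived context.
def ask_llm(prompt):
    # Placeholder: wire this to any chat model (e.g. Llama-2-13b, GPT-3.5).
    return "A"

def accuracy(questions, use_kg_context):
    correct = 0
    for q in questions:
        prompt = q["text"]
        if use_kg_context:
            # Prepend the pruned KG neighborhood as grounding context.
            prompt = q["kg_context"] + "\n\n" + prompt
        if ask_llm(prompt).strip().upper() == q["answer"]:
            correct += 1
    return correct / len(questions)

# A single illustrative item; the real datasets are in the KG_RAG repo.
questions = [
    {"text": "Which gene is associated with cystic fibrosis?\nA. CFTR\nB. BRCA1",
     "kg_context": "cystic fibrosis ASSOCIATES gene CFTR",
     "answer": "A"},
]
print("baseline:", accuracy(questions, use_kg_context=False))
print("KG-RAG:  ", accuracy(questions, use_kg_context=True))
```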
Milk and fat production of cows free of and subclinically infected with bovine leukemia virus
The objective of the present study was to prospectively compare the milk and fat production, standardized to 305-day mature equivalent, of cows naturally infected with bovine leukemia virus (BLV) (n = 89) and uninfected cows (n = 104), in order to evaluate the effect of subclinical infection. The animals were Holstein and Holstein x Chilean-Friesian crosses from a herd in the Región Metropolitana, Chile, under standard management for dairy cattle in the area and with a known epidemiological status regarding BLV infection. Antibodies against the BLV glycoprotein were detected by agar gel immunodiffusion in three successive semi-annual tests. BLV-seropositive cows produced 6,501 ± 1,505 kg of milk and 168 ± 53 kg of fat (305-day mature equivalent), compared with 6,677 ± 1,728 kg of milk and 166 ± 46 kg of fat for seronegative cows. Analysis of variance revealed no statistically significant differences in milk or fat production between the two groups (P ≥ 0.05). Nor were significant differences observed when milk and fat production were compared by lactation number. It is concluded that, under the stated management conditions and production levels, subclinical BLV infection would not affect milk production.
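Since a one-way ANOVA with two groups is equivalent to a pooled two-sample t-test, the reported comparison can be checked directly from the summary statistics above. A minimal SciPy sketch, assuming equal variances as the standard ANOVA does:

```python
# Two-sample t-test from the reported means, SDs, and group sizes.
from scipy.stats import ttest_ind_from_stats

# 305-day mature-equivalent milk yield (kg): seropositive vs. seronegative.
res_milk = ttest_ind_from_stats(6501, 1505, 89,    # BLV-seropositive
                                6677, 1728, 104)   # BLV-seronegative
# Fat yield (kg), same groups.
res_fat = ttest_ind_from_stats(168, 53, 89, 166, 46, 104)

print(f"milk: t={res_milk.statistic:.2f}, p={res_milk.pvalue:.2f}")
print(f"fat:  t={res_fat.statistic:.2f}, p={res_fat.pvalue:.2f}")
```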
A Tabu Search Algorithm with Direct Representation for Strip Packing
Conference date: 04/2009. This paper introduces a new tabu search algorithm for the two-dimensional Strip Packing Problem (2D-SPP). It integrates several key features: a direct representation of the problem, a satisfaction-based solving scheme, two complementary neighborhoods, a diversification mechanism, and a particular tabu structure. The representation allows inexpensive basic operations. The solving scheme treats the 2D-SPP as a succession of satisfaction problems. The combination of the two neighborhoods aims to reduce the height of the packing while avoiding solutions with tall, thin wasted spaces that are hard to fill. Diversification relies on a set of historically "interesting" packings. The tabu structure avoids revisiting similar packings. To assess the proposed approach, experimental results on a set of well-known benchmark instances are reported and compared with previously published tabu search algorithms as well as the best-performing algorithms.
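For orientation, the sketch below shows a generic tabu search skeleton for strip packing with the ingredients the abstract names: a direct representation (a packing order), a swap neighborhood, and a tabu structure with aspiration. The naive shelf decoder and all parameter values are simplifying assumptions; they stand in for, and do not reproduce, the paper's placement routine, its two complementary neighborhoods, or its satisfaction-based solving scheme.

```python
# Generic tabu search for 2D strip packing (illustrative skeleton only).
import random

def shelf_height(order, rects, strip_width):
    # Decode a packing order with a naive shelf heuristic and return
    # the resulting strip height (the objective to minimize).
    x = shelf_y = shelf_h = 0
    for i in order:
        w, h = rects[i]
        if x + w > strip_width:          # open a new shelf
            shelf_y += shelf_h
            x, shelf_h = 0, 0
        x += w
        shelf_h = max(shelf_h, h)
    return shelf_y + shelf_h

def tabu_search(rects, strip_width, iters=2000, tenure=15):
    order = sorted(range(len(rects)), key=lambda i: -rects[i][1])
    best, best_h = order[:], shelf_height(order, rects, strip_width)
    tabu = {}                            # move -> iteration until which it is tabu
    for it in range(iters):
        # Sample swap moves, skipping tabu ones unless they beat the
        # best height found so far (aspiration criterion).
        candidates = []
        for _ in range(30):
            i, j = random.sample(range(len(order)), 2)
            move = (min(order[i], order[j]), max(order[i], order[j]))
            order[i], order[j] = order[j], order[i]
            h = shelf_height(order, rects, strip_width)
            order[i], order[j] = order[j], order[i]   # undo trial swap
            if tabu.get(move, 0) <= it or h < best_h:
                candidates.append((h, i, j, move))
        if not candidates:
            continue
        h, i, j, move = min(candidates)
        order[i], order[j] = order[j], order[i]       # apply best move
        tabu[move] = it + tenure
        if h < best_h:
            best, best_h = order[:], h
    return best, best_h

random.seed(0)
rects = [(random.randint(1, 10), random.randint(1, 10)) for _ in range(30)]
print("strip height:", tabu_search(rects, strip_width=20)[1])
```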