79 research outputs found
Extracting detailed oncologic history and treatment plan from medical oncology notes with large language models
Both medical care and observational studies in oncology require a thorough
understanding of a patient's disease progression and treatment history, often
elaborately documented in clinical notes. Despite their vital role, no current
oncology information representation and annotation schema fully encapsulates
the diversity of information recorded within these notes. Although large
language models (LLMs) have recently exhibited impressive performance on
various medical natural language processing tasks, due to the current lack of
comprehensively annotated oncology datasets, an extensive evaluation of LLMs in
extracting and reasoning with the complex rhetoric in oncology notes remains
understudied. We developed a detailed schema for annotating textual oncology
information, encompassing patient characteristics, tumor characteristics,
tests, treatments, and temporality. Using a corpus of 10 de-identified breast
cancer progress notes at University of California, San Francisco, we applied
this schema to assess the abilities of three recently-released LLMs (GPT-4,
GPT-3.5-turbo, and FLAN-UL2) to perform zero-shot extraction of detailed
oncological history from two narrative sections of clinical progress notes. Our
team annotated 2750 entities, 2874 modifiers, and 1623 relationships. The GPT-4
model exhibited overall best performance, with an average BLEU score of 0.69,
an average ROUGE score of 0.72, and an average accuracy of 67% on complex tasks
(expert manual evaluation). Notably, it was proficient in tumor characteristic
and medication extraction, and demonstrated superior performance in inferring
symptoms due to cancer and considerations of future medications. The analysis
demonstrates that GPT-4 is potentially already usable to extract important
facts from cancer progress notes needed for clinical research, complex
population management, and documenting quality patient care. Comment: Source code available at:
https://github.com/MadhumitaSushil/OncLLMExtractio
On-Sky Operations with the ALES Integral Field Spectrograph
The integral field spectrograph configuration of the LMIRCam science camera
within the Large Binocular Telescope Interferometer (LBTI) facilitates 2 to 5
µm spectroscopy of directly imaged gas-giant exoplanets. The mode, dubbed
ALES, comprises magnification optics, a lenslet array, and direct-vision
prisms, all of which are included within filter wheels in LMIRCam. Our
observing approach includes manual adjustments to filter wheel positions to
optimize alignment, on/off nodding to track sky-background variations, and
wavelength calibration using narrow band filters in series with ALES optics.
For planets with separations outside our 1"x1" field of view, we use a
three-point nod pattern to visit the primary, secondary and sky. To minimize
overheads we select the longest exposure times and nod periods given observing
conditions, especially sky brightness and variability. Using this strategy we
collected several datasets of low-mass companions to nearby stars
Identification of a pan-cancer oncogenic microRNA superfamily anchored by a central core seed motif
MicroRNAs modulate tumorigenesis through suppression of specific genes. As many tumour types rely on overlapping oncogenic pathways, a core set of microRNAs may exist, which consistently drives or suppresses tumorigenesis in many cancer types. Here we integrate The Cancer Genome Atlas (TCGA) pan-cancer data set with a microRNA target atlas composed of publicly available Argonaute Crosslinking Immunoprecipitation (AGO-CLIP) data to identify pan-tumour microRNA drivers of cancer. Through this analysis, we show a pan-cancer, coregulated oncogenic microRNA "superfamily" consisting of the miR-17, miR-19, miR-130, miR-93, miR-18, miR-455 and miR-210 seed families, which cotargets critical tumour suppressors via a central GUGC core motif. We subsequently define mutations in microRNA target sites using the AGO-CLIP microRNA target atlas and TCGA exome-sequencing data. These combined analyses identify pan-cancer oncogenic cotargeting of the phosphoinositide 3-kinase, TGFβ and p53 pathways by the miR-17-19-130 superfamily members
Integrated Genomic Analysis of the 8q24 Amplification in Endometrial Cancers Identifies ATAD2 as Essential to MYC-Dependent Cancers
Chromosome 8q24 is the most commonly amplified region across multiple cancer types, and the typical length of the amplification suggests that it may target additional genes to MYC. To explore the roles of the genes most frequently included in 8q24 amplifications, we analyzed the relation between copy number alterations and gene expression in three sets of endometrial cancers (N = 252); and in glioblastoma, ovarian, and breast cancers profiled by TCGA. Among the genes neighbouring MYC, expression of the bromodomain-containing gene ATAD2 was the most associated with amplification. Bromodomain-containing genes have been implicated as mediators of MYC transcriptional function, and indeed ATAD2 expression was more closely associated with expression of genes known to be upregulated by MYC than was MYC itself. Amplifications of 8q24, expression of genes downstream from MYC, and overexpression of ATAD2 predicted poor outcome and increased from primary to metastatic lesions. Knockdown of ATAD2 and MYC in seven endometrial and 21 breast cancer cell lines demonstrated that cell lines that were dependent on MYC also depended upon ATAD2. These same cell lines were also the most sensitive to the histone deacetylase (HDAC) inhibitor Trichostatin-A, consistent with prior studies identifying bromodomain-containing proteins as targets of inhibition by HDAC inhibitors. Our data indicate high ATAD2 expression is a marker of aggressive endometrial cancers, and suggest specific inhibitors of ATAD2 may have therapeutic utility in these and other MYC-dependent cancers
Characterizing genomic alterations in cancer by complementary functional associations.
Systematic efforts to sequence the cancer genome have identified large numbers of mutations and copy number alterations in human cancers. However, elucidating the functional consequences of these variants, and their interactions to drive or maintain oncogenic states, remains a challenge in cancer research. We developed REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene dependency of oncogenic pathways or sensitivity to a drug treatment. We used REVEALER to uncover complementary genomic alterations associated with the transcriptional activation of β-catenin and NRF2, MEK-inhibitor sensitivity, and KRAS dependency. REVEALER successfully identified both known and new associations, demonstrating the power of combining functional profiles with extensive characterization of genomic alterations in cancer genomes
Digitization workflows for flat sheets and packets of plants, algae, and fungi
Peer Reviewed https://deepblue.lib.umich.edu/bitstream/2027.42/141708/1/aps31500065.pd
Absolute quantification of somatic DNA alterations in human cancer
We developed a computational method (ABSOLUTE) that infers tumor purity and malignant cell ploidy directly from analysis of somatic DNA alterations. ABSOLUTE can detect subclonal heterogeneity, somatic homozygosity, and calculate statistical sensitivity to detect specific aberrations. We used ABSOLUTE to analyze ovarian cancer data and identified pervasive subclonal somatic point mutations. In contrast, mutations occurring in key tumor suppressor genes, TP53 and NF1 were predominantly clonal and homozygous, as were mutations in a candidate tumor suppressor gene, CDK12. Analysis of absolute allelic copy-number profiles from 3,155 cancer specimens revealed that genome-doubling events are common in human cancer, and likely occur in already aneuploid cells. By correlating genome-doubling status with mutation data, we found that homozygous mutations in NF1 occurred predominantly in non-doubled samples. This finding suggests that genome doubling influences the pathways of tumor progression, with recessive inactivation being less common after genome doubling
Digitization Workflows for Flat Sheets and Packets of Plants, Algae, and Fungi
Effective workflows are essential components in the digitization of biodiversity specimen collections. To date, no comprehensive, community-vetted workflows have been published for digitizing flat sheets and packets of plants, algae, and fungi, even though latest estimates suggest that only 33% of herbarium specimens have been digitally transcribed, 54% of herbaria use a specimen database, and 24% are imaging specimens. In 2012, iDigBio, the U.S. National Science Foundation's (NSF) coordinating center and national resource for the digitization of public, nonfederal U.S. collections, launched several working groups to address this deficiency. Here, we report the development of 14 workflow modules with 7–36 tasks each. These workflows represent the combined work of approximately 35 curators, directors, and collections managers representing more than 30 herbaria, including 15 NSF-supported plant-related Thematic Collections Networks and collaboratives. The workflows are provided for download as Portable Document Format (PDF) and Microsoft Word files. Customization of these workflows for specific institutional implementation is encouraged
Comprehensive molecular characterization of gastric adenocarcinoma
Gastric cancer is a leading cause of cancer deaths, but analysis of its molecular and clinical characteristics has been complicated by histological and aetiological heterogeneity. Here we describe a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project. We propose a molecular classification dividing gastric cancer into four subtypes: tumours positive for Epstein–Barr virus, which display recurrent PIK3CA mutations, extreme DNA hypermethylation, and amplification of JAK2, CD274 (also known as PD-L1) and PDCD1LG2 (also known as PD-L2); microsatellite unstable tumours, which show elevated mutation rates, including mutations of genes encoding targetable oncogenic signalling proteins; genomically stable tumours, which are enriched for the diffuse histological variant and mutations of RHOA or fusions involving RHO-family GTPase-activating proteins; and tumours with chromosomal instability, which show marked aneuploidy and focal amplification of receptor tyrosine kinases. Identification of these subtypes provides a roadmap for patient stratification and trials of targeted therapies
Comprehensive Molecular Portraits of Invasive Lobular Breast Cancer
Invasive lobular carcinoma (ILC) is the second most prevalent histologic subtype of invasive breast cancer. Here, we comprehensively profiled 817 breast tumors, including 127 ILC, 490 ductal (IDC), and 88 mixed IDC/ILC. Besides E-cadherin loss, the best known ILC genetic hallmark, we identified mutations targeting PTEN, TBX3 and FOXA1 as ILC enriched features. PTEN loss associated with increased AKT phosphorylation, which was highest in ILC among all breast cancer subtypes. Spatially clustered FOXA1 mutations correlated with increased FOXA1 expression and activity. Conversely, GATA3 mutations and high expression characterized Luminal A IDC, suggesting differential modulation of ER activity in ILC and IDC. Proliferation and immune-related signatures determined three ILC transcriptional subtypes associated with survival differences. Mixed IDC/ILC cases were molecularly classified as ILC-like and IDC-like revealing no true hybrid features. This multidimensional molecular atlas sheds new light on the genetic bases of ILC and provides potential clinical options