84 research outputs found
EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records
We present a new text-to-SQL dataset for electronic health records (EHRs).
The utterances were collected from 222 hospital staff, including physicians,
nurses, insurance review and health records teams, and more. To construct the
QA dataset on structured EHR data, we conducted a poll at a university hospital
and templatized the responses to create seed questions. Then, we manually
linked them to two open-source EHR databases, MIMIC-III and eICU, and included
them with various time expressions and held-out unanswerable questions in the
dataset, which were all collected from the poll. Our dataset poses a unique set
of challenges: the model needs to 1) generate SQL queries that reflect a wide
range of needs in the hospital, including simple retrieval and complex
operations such as calculating survival rate, 2) understand various time
expressions to answer time-sensitive questions in healthcare, and 3)
distinguish whether a given question is answerable or unanswerable based on the
prediction confidence. We believe our dataset, EHRSQL, could serve as a
practical benchmark to develop and assess QA models on structured EHR data and
take one step further towards bridging the gap between text-to-SQL research and
its real-life deployment in healthcare. EHRSQL is available at
https://github.com/glee4810/EHRSQL. Comment: Published as a conference paper at
NeurIPS 2022 (Track on Datasets and Benchmarks)
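The third challenge, deciding from prediction confidence whether a question is answerable, can be sketched as a simple thresholding rule. This is a minimal illustration under stated assumptions, not the benchmark's own method: the function names, the use of a geometric-mean token probability as the confidence score, the threshold of 0.5, the example query string, and the log-probability values are all hypothetical.

```python
import math
from typing import Optional, Sequence

def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a generated SQL query (assumed score)."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def answer_or_abstain(sql: str, token_logprobs: Sequence[float],
                      threshold: float = 0.5) -> Optional[str]:
    """Return the SQL if the model looks confident, else None (treated as unanswerable)."""
    if sequence_confidence(token_logprobs) >= threshold:
        return sql
    return None

# Hypothetical model outputs: a placeholder query plus made-up per-token log-probabilities.
query = "SELECT ... FROM ..."
confident = answer_or_abstain(query, [-0.05, -0.10, -0.02])
uncertain = answer_or_abstain(query, [-1.2, -2.3, -0.9])

print(confident is not None)  # True: the query is kept
print(uncertain is None)      # True: the system abstains
```

In practice the threshold would be tuned on held-out data that contains both answerable and unanswerable questions.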
Alterations in Brain Morphometric Networks and Their Relationship with Memory Dysfunction in Patients with Type 2 Diabetes Mellitus
Cognitive dysfunction, a significant complication of type 2 diabetes mellitus (T2DM), can potentially manifest even from the early stages of the disease. Despite evidence of global brain atrophy and related cognitive dysfunction in early-stage T2DM patients, specific regions vulnerable to these changes have not yet been identified. The study enrolled patients with T2DM of less than five years’ duration and without chronic complications (T2DM group, n=100) and demographically similar healthy controls (control group, n=50). High-resolution T1-weighted magnetic resonance imaging data were subjected to independent component analysis to identify structurally significant components indicative of morphometric networks. Within these networks, the groups’ gray matter volumes were compared, and distinctions in memory performance were assessed. In the T2DM group, the relationship between changes in gray matter volume within these networks and declines in memory performance was examined. Among the identified morphometric networks, the T2DM group exhibited reduced gray matter volumes in both the precuneus (Bonferroni-corrected p=0.003) and insular-opercular (Bonferroni-corrected p=0.024) networks relative to the control group. Patients with T2DM demonstrated significantly lower memory performance than the control group (p=0.001). In the T2DM group, reductions in gray matter volume in both the precuneus (r=0.316, p=0.001) and insular-opercular (r=0.199, p=0.047) networks were correlated with diminished memory performance. Our findings indicate that structural alterations in the precuneus and insular-opercular networks, along with memory dysfunction, can manifest within the first 5 years following a diagnosis of T2DM
Glial cell proteome using targeted quantitative methods for potential multi-diagnostic biomarkers
Glioblastoma is one of the most malignant primary brain cancers. Despite surgical resection with modern technology followed by chemo-radiation therapy with temozolomide, treatment resistance and recurrence are common because of the tumor's aggressive, infiltrating nature and high proliferation index. The median survival time of patients with glioblastoma is less than 15 months. To date, no molecular target specific to glioblastoma has been reported. Early diagnosis and the development of glioblastoma-specific molecular targets are essential for longer patient survival, and glioblastoma-specific biomarkers are most important for early diagnosis, estimation of prognosis, and molecular targeted therapy. To that end, we conducted a comprehensive proteome study using primary cells and tissues from patients with glioblastoma. In the discovery stage, we identified 7429 glioblastoma-specific proteins, of which 476 were quantitated using the Tandem Mass Tag (TMT) method; 228 and 248 proteins were up- and down-regulated, respectively. In the validation stage (20 selected target proteins), we developed a quantitative targeted method (MRM: multiple reaction monitoring) using stable isotope standard (SIS) peptides. Five proteins (CCT3, PCMT1, TKT, TOMM34, UBA1) showed significantly different protein levels (t-test: p value ≤ 0.05, AUC ≥ 0.7) between control and cancer groups. In a multiplex assay using logistic regression, the 5-marker panel showed better sensitivity (0.80 and 0.90), specificity (0.92 and 1.00), error rate (10 and 2%), and AUC value (0.94 and 0.98) than the best single marker (TOMM34) in primary cells and tissues, respectively.
Although we acknowledge that the model requires further validation in a larger sample, the 5-protein marker panel can be used as baseline data for the discovery of novel glioblastoma biomarkers. This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. NRF-2017M3A9G4052982, NRF-2022M3A9G8082637). This research was partly supported by the Bio & Medical Technology Development Program of the National Research Foundation (Grant Nos. 2015M3C7A1028926 & 2020M3A9G8022029); the National Research Foundation of Korea Grant (Grant No. NRF2017M3C7A1047392) of the Ministry of Science and ICT, Republic of Korea; the Korea Research Institute of Bioscience and Biotechnology (KRIBB) Research Initiative Program (KGM456212109816); Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government (21YB1500); Soonchunhyang University Research Fund; the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2023R1A2C200769911). H.J. Oh was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1C1C1011255) and a Korea University Gran
A complete chloroplast genome sequence of Viola albida Palibin 1899 (Violaceae), a member of VIOLA ALBIDA complex
The VIOLA ALBIDA complex is a taxonomically difficult group with continuous leaf variation, composed of taxa related to the following names: Viola albida, V. albida var. takahashii, and V. chaerophylloides. As a first step toward understanding the genomic nature of this complex, this study determined the whole chloroplast genome sequence of V. albida. The genome is 157,692 bp in length (GC content 36.3%) and contains four subregions: a large single copy region of 86,220 bp, a small single copy region of 17,248 bp, and a pair of inverted regions of 27,112 bp each. Annotation of the genome identifies 111 unique genes, including 77 protein-coding genes, four rRNA genes, and 30 tRNA genes. Phylogenetic analysis of this genome with selected cp genomes from Viola identifies a close relationship between V. albida and V. ulleungdoensis. It is noteworthy that V. chaerophylloides, traditionally recognized as a member of the VIOLA ALBIDA complex, is genetically distant from V. albida and forms a sister group to all other members of the subsection Patellares. Our genome report is expected to serve as a basis for understanding the identity of the VIOLA ALBIDA complex.
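The reported subregion sizes can be checked against the total genome length, since the four subregions comprise one large single copy region, one small single copy region, and two copies of the inverted region. A minimal sketch using only the figures quoted in the abstract (the function name is illustrative, not from the paper):

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a plastome with one LSC, one SSC, and two inverted copies."""
    return lsc + ssc + 2 * ir

# Figures reported for V. albida in the abstract above.
total = quadripartite_length(lsc=86_220, ssc=17_248, ir=27_112)
unique_genes = 77 + 4 + 30  # protein-coding + rRNA + tRNA

print(total)         # 157692
print(unique_genes)  # 111
```

Both values agree with the abstract: 157,692 bp in total and 111 unique genes.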
A chloroplast genome sequence of Viola arcuata distributed in Korea
Recently, the chloroplast genome of Viola verecunda from a sample collected in Japan has been published. Although the name is often recognized as a taxonomic synonym of Viola arcuata, the genetic identity of the two species has never been compared intensively. We report the complete chloroplast genome sequence of V. arcuata from a sample collected in Seoul, Korea. The cp genome of V. arcuata (OM301625) is 157,870 bp in length and is composed of four regions: a large single-copy (LSC) region of 86,366 bp, a small single-copy (SSC) region of 17,298 bp, and a pair of inverted repeats (IRs) of 27,103 bp each. The complete genome contains 130 genes, including 84 protein-coding genes, eight rRNA genes, and 37 tRNA genes. When comparing the chloroplast genomes of V. verecunda and V. arcuata, 34 different loci were recognized: 12 SNPs and 22 indels. In the coding regions, there were two amino acid insertions (ndhI) caused by one base deletion, three synonymous substitutions (ndhF, ccsA, and ndhI), and six nonsynonymous substitutions (matK, rpoC2, ndhF, ycf1, and two rpl2s on each IR region). In non-coding regions, variants of 19 polyN sites, one microsatellite, two insertions, and two SNPs were recognized. Phylogenetic analysis confirms a sister or nearly identical relationship between the two genomes. This study will provide the genetic basis for solving a taxonomic problem between V. arcuata and V. verecunda.
Auraptene, a Major Compound of Supercritical Fluid Extract of Phalsak (Citrus Hassaku Hort ex Tanaka), Induces Apoptosis through the Suppression of mTOR Pathways in Human Gastric Cancer SNU-1 Cells
The supercritical extraction method is a widely used process to obtain volatile and nonvolatile compounds while avoiding thermal degradation and solvent residue in the extracts. In search of phytochemicals with potential therapeutic application in gastric cancer, the supercritical fluid extract (SFE) of phalsak (Citrus hassaku Hort ex Tanaka) fruits was analyzed by gas chromatography-mass spectrometry (GC-MS). Compositional analysis in comparison with the antiproliferative activities of peel and flesh suggested auraptene as the most prominent anticancer compound against gastric cancer cells. SNU-1 cells were the most susceptible to auraptene-induced toxicity among the tested gastric cancer cell lines. Auraptene induced the death of SNU-1 cells through apoptosis, as evidenced by the increased cell population in the sub-G1 phase, the appearance of fragmented nuclei, the proteolytic cleavage of caspase-3 and poly(ADP-ribose) polymerase (PARP) protein, and depolarization of the mitochondrial membrane. Interestingly, auraptene induced an increase in the phosphorylation of Akt, which is reminiscent of the effect of rapamycin, the mTOR inhibitor that triggers a negative feedback loop on the Akt/mTOR pathway. Taken together, these findings provide valuable insights into the anticancer effects of the SFE of the phalsak peel by revealing that auraptene, its major compound, induced apoptosis accompanied by the inhibition of mTOR in SNU-1 cells.
The Mechanical Aspects of Formation and Application of PDMS Bilayers Rolled into a Cylindrical Structure
A polydimethylsiloxane (PDMS) film with its surface
being oxidized by a plasma treatment or a UV-ozone
(UVO) treatment, that is, a bilayer made of PDMS and
its oxidized surface layer, is known to roll into a
cylindrical structure upon exposure to the chloroform
vapor due to the mismatch in the swelling ratio
between PDMS and the oxidized layer by the chloroform
vapor. Here we analyzed the formation of the rolled
bilayer from a mechanical standpoint: how the mismatch
in the swelling ratio of the bilayer induces rolling
of the bilayer, why any form of trigger that breaks
the symmetry in the in-plane stress level is needed to
roll the bilayer uniaxially, why the rolled bilayer
does not unroll in the dry state when there is no more
mismatch in the swelling ratio, and how the measured
curvature of rolled bilayer matches well with the
prediction by the theory. Moreover, for the use of the
rolled bilayer as the channel of the microfluidic
device, we examined whether the rolled bilayer
deforms or unrolls by the flow of the aqueous solution
that exerts the circumferential stress on the rolled
bilayer
Emissions of Volatile Organic Compounds (VOCs) from an Open-Circuit Dry Cleaning Machine Using a Petroleum-Based Organic Solvent: Implications for Impacts on Air Quality
Volatile organic compounds (VOCs) are known to play an important role in tropospheric chemistry, contributing to ozone and secondary organic aerosol (SOA) generation. Laundry facilities that use petroleum-based organic solvents are one source of VOC emissions. However, little is known about the significance of VOCs emitted from laundry facilities for ozone and SOA generation. In this study, we characterized VOC emissions from a dry-cleaning process using petroleum-based organic solvents. We also assessed the impact of the VOCs on air quality using the photochemical ozone creation potential and the secondary organic aerosol potential. Among 94 targeted compounds, including toxic organic air pollutants and ozone precursors, 36 compounds were identified in the exhaust gas from a drying machine. The mass emitted from one cycle of drying operation (40 min) was highest for decane (2.04 g/dry cleaning). Decane, nonane, and n-undecane were the three main contributors to ozone generation (more than 70% of the total generation). N-undecane, decane, and n-dodecane were the three main contributors to SOA generation (more than 80% of the total generation). These results help to clarify VOC emissions from laundry facilities and their impacts on air quality.
Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation
Graph pattern cardinality estimation is the problem of estimating the number of embeddings |M| of a query graph in a data graph. This fundamental problem arises, for example, during query planning in subgraph matching algorithms. There are two major approaches to solving the problem: sampling and synopsis. Synopsis (or summary)-based methods are fast and accurate if the synopses capture the information in the graphs well. However, these methods suffer from large errors due to loss of information during summarization and inherent assumptions. Sampling-based methods are unbiased but suffer from large estimation variance due to the large sample space. To address these limitations, we propose Alley, a hybrid method that combines both sampling and synopses. Alley employs 1) a novel sampling strategy, random walk with intersection, which effectively reduces the sample space, 2) branching to further reduce variance, and 3) a novel mining approach that extracts and indexes tangled patterns as synopses, which are inherently difficult to estimate by sampling. By using them in the online estimation phase, we can effectively reduce the sample space while still ensuring unbiasedness. We establish that Alley has worst-case optimal runtime and approximation quality guarantees for any given error bound and required confidence. In addition to the theoretical aspect of Alley, our extensive experiments show that Alley achieves up to orders of magnitude higher accuracy than the state-of-the-art methods with similar efficiency.
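To make the claim that sampling-based methods are unbiased concrete, the sketch below estimates the number of triangle embeddings in a small undirected graph with a plain random-walk sampler. It is not Alley: it uses no intersection, branching, or synopses, and the graph, function names, and sample count are assumptions chosen for illustration. Each sampled walk (v, u, w) is weighted by the inverse of its sampling probability when the closing edge exists, which is what makes the average an unbiased estimate.

```python
import random
from itertools import permutations

def build_adjacency(edges):
    """Undirected simple graph as {vertex: sorted neighbor list}; no isolated vertices."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}

def exact_triangle_embeddings(adj):
    """Brute-force count of ordered vertex triples (v, u, w) that form a triangle."""
    return sum(
        1
        for v, u, w in permutations(adj, 3)
        if u in adj[v] and w in adj[u] and v in adj[w]
    )

def estimate_triangle_embeddings(adj, num_samples, seed=0):
    """Average of inverse-probability weights over sampled walks v -> u -> w."""
    rng = random.Random(seed)
    vertices = list(adj)
    neighbor_sets = {v: set(nbrs) for v, nbrs in adj.items()}
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice(vertices)   # chosen with probability 1 / n
        u = rng.choice(adj[v])     # chosen with probability 1 / deg(v)
        w = rng.choice(adj[u])     # chosen with probability 1 / deg(u)
        if w in neighbor_sets[v]:  # closing edge exists, so (v, u, w) is an embedding
            total += len(vertices) * len(adj[v]) * len(adj[u])
    return total / num_samples

# Illustrative data graph: two triangles sharing vertex 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (2, 4)]
adj = build_adjacency(edges)

print(exact_triangle_embeddings(adj))            # 12 (two triangles, six orderings each)
print(estimate_triangle_embeddings(adj, 20000))  # close to 12; varies with the seed
```

Many sampled walks fail to close and contribute zero, which is the source of the estimation variance that the abstract attributes to the large sample space.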
- …