79 research outputs found

    EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records

    We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff, including physicians, nurses, insurance review and health records teams, and more. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and templatized the responses to create seed questions. Then, we manually linked them to two open-source EHR databases, MIMIC-III and eICU, and included them with various time expressions and held-out unanswerable questions in the dataset, which were all collected from the poll. Our dataset poses a unique set of challenges: the model needs to 1) generate SQL queries that reflect a wide range of needs in the hospital, including simple retrieval and complex operations such as calculating survival rate, 2) understand various time expressions to answer time-sensitive questions in healthcare, and 3) distinguish whether a given question is answerable or unanswerable based on the prediction confidence. We believe our dataset, EHRSQL, could serve as a practical benchmark to develop and assess QA models on structured EHR data and take one step further towards bridging the gap between text-to-SQL research and its real-life deployment in healthcare. EHRSQL is available at https://github.com/glee4810/EHRSQL. Comment: Published as a conference paper at NeurIPS 2022 (Track on Datasets and Benchmarks).

    A chloroplast genome sequence of Viola arcuata distributed in Korea

    Recently, the chloroplast genome of Viola verecunda from a sample collected in Japan has been published. Although the name is often recognized as a taxonomic synonym of Viola arcuata, the genetic identity of the two species has never been compared intensively. We report the complete chloroplast genome sequence of V. arcuata from a sample collected in Seoul, Korea. The cp genome of V. arcuata (OM301625) is 157,870 bp in length and is composed of four regions: a large single-copy (LSC) region of 86,366 bp, a small single-copy (SSC) region of 17,298 bp, and a pair of inverted repeats (IRs) of 27,103 bp each. The complete genome contains 130 genes, including 84 protein-coding genes, eight rRNA genes, and 37 tRNA genes. When comparing the chloroplast genomes of V. verecunda and V. arcuata, 34 different loci were recognized: 12 SNPs and 22 indels. In the coding regions, there were two amino acid insertions (ndhI) caused by one base deletion, three synonymous substitutions (ndhF, ccsA, and ndhI), and six nonsynonymous substitutions (matK, rpoC2, ndhF, ycf1, and two rpl2s on each IR region). In non-coding regions, variants of 19 polyN sites, one microsatellite, two insertions, and two SNPs were recognized. Phylogenetic analysis confirms a sister or nearly identical relationship between the two genomes. This study will provide the genetic basis for solving a taxonomic problem between V. arcuata and V. verecunda.
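As a quick consistency check on the region lengths reported above, the total can be recomputed from its parts. This is only an arithmetic sketch; it assumes the 27,103 bp figure is the length of each inverted repeat copy, and the variable names are illustrative.

```python
# Region lengths (bp) as reported in the abstract for V. arcuata (OM301625).
lsc = 86_366   # large single-copy region
ssc = 17_298   # small single-copy region
ir = 27_103    # assumed to be the length of ONE inverted repeat copy

# A quadripartite chloroplast genome is LSC + SSC + two IR copies.
total = lsc + ssc + 2 * ir
print(total)
```

Under that assumption the sum matches the reported genome length of 157,870 bp.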

    A complete chloroplast genome sequence of Viola albida Palibin 1899 (Violaceae), a member of VIOLA ALBIDA complex

    The VIOLA ALBIDA complex is a taxonomically problematic group with continuous leaf variation, composed of taxa related to the following names: Viola albida, V. albida var. takahashii, and V. chaerophylloides. As a first step to understanding the genomic nature of this complex, this study determined the whole chloroplast genome of V. albida. The genome is 157,692 bp in length (36.3% GC content) and contains four subregions: a large single-copy region of 86,220 bp, a small single-copy region of 17,248 bp, and a pair of inverted repeat regions of 27,112 bp each. Annotation of the genome identifies 111 unique genes, including 77 protein-coding genes, four rRNA genes, and 30 tRNA genes. The phylogenetic analysis of this genome with selected cp genomes from Viola identifies a close relationship between V. albida and V. ulleungdoensis. It is noteworthy that V. chaerophylloides, traditionally recognized as a member of the VIOLA ALBIDA complex, is genetically distant from V. albida and forms a sister group to all other members of the subsection Patellares. Our genome report is expected to serve as a basis for understanding the identity of the VIOLA ALBIDA complex.

    Auraptene, a Major Compound of Supercritical Fluid Extract of Phalsak (Citrus Hassaku Hort ex Tanaka), Induces Apoptosis through the Suppression of mTOR Pathways in Human Gastric Cancer SNU-1 Cells

    The supercritical extraction method is a widely used process to obtain volatile and nonvolatile compounds while avoiding thermal degradation and solvent residue in the extracts. In search of phytochemicals with potential therapeutic application in gastric cancer, the supercritical fluid extract (SFE) of phalsak (Citrus hassaku Hort ex Tanaka) fruits was analyzed by gas chromatography-mass spectrometry (GC-MS). Compositional analysis in comparison with the antiproliferative activities of peel and flesh suggested auraptene as the most prominent anticancer compound against gastric cancer cells. SNU-1 cells were the most susceptible to auraptene-induced toxicity among the tested gastric cancer cell lines. Auraptene induced the death of SNU-1 cells through apoptosis, as evidenced by the increased cell population in the sub-G1 phase, the appearance of fragmented nuclei, the proteolytic cleavage of caspase-3 and poly(ADP-ribose) polymerase (PARP) protein, and depolarization of the mitochondrial membrane. Interestingly, auraptene induced an increase in the phosphorylation of Akt, which is reminiscent of the effect of rapamycin, the mTOR inhibitor that triggers a negative feedback loop on the Akt/mTOR pathway. Taken together, these findings provide valuable insights into the anticancer effects of the SFE of the phalsak peel by revealing that auraptene, its major compound, induced apoptosis accompanied by the inhibition of mTOR in SNU-1 cells.

    The Mechanical Aspects of Formation and Application of PDMS Bilayers Rolled into a Cylindrical Structure

    A polydimethylsiloxane (PDMS) film whose surface has been oxidized by a plasma treatment or a UV-ozone (UVO) treatment, that is, a bilayer made of PDMS and its oxidized surface layer, is known to roll into a cylindrical structure upon exposure to chloroform vapor because PDMS and the oxidized layer swell to different degrees in the vapor. Here we analyzed the formation of the rolled bilayer from a mechanical standpoint: how the mismatch in the swelling ratio of the bilayer induces rolling, why some form of trigger that breaks the symmetry in the in-plane stress level is needed to roll the bilayer uniaxially, why the rolled bilayer does not unroll in the dry state when the mismatch in the swelling ratio is no longer present, and how the measured curvature of the rolled bilayer matches well with the theoretical prediction. Moreover, with a view to using the rolled bilayer as the channel of a microfluidic device, we examined whether the rolled bilayer deforms or unrolls under the flow of an aqueous solution, which exerts circumferential stress on the rolled bilayer.

    Emissions of Volatile Organic Compounds (VOCs) from an Open-Circuit Dry Cleaning Machine Using a Petroleum-Based Organic Solvent: Implications for Impacts on Air Quality

    Volatile organic compounds (VOCs) are known to play an important role in tropospheric chemistry, contributing to ozone and secondary organic aerosol (SOA) generation. Laundry facilities using petroleum-based organic solvents are one source of VOC emissions. However, little is known about the significance of VOCs emitted from laundry facilities in ozone and SOA generation. In this study, we characterized VOC emissions from a dry-cleaning process using petroleum-based organic solvents. We also assessed the impact of the VOCs on air quality by using photochemical ozone creation potential and secondary organic aerosol potential. Among 94 targeted compounds, including toxic organic air pollutants and ozone precursors, 36 compounds were identified in the exhaust gas from a drying machine. The mass emitted from one cycle of drying operation (40 min) was highest for decane (2.04 g/dry cleaning). Decane, nonane, and n-undecane were the three main contributors to ozone generation (more than 70% of the total generation). N-undecane, decane, and n-dodecane were the three main contributors to SOA generation (more than 80% of the total generation). These results help to understand VOC emissions from laundry facilities and their impacts on air quality.

    Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality Estimation

    Graph pattern cardinality estimation is the problem of estimating the number of embeddings |M| of a query graph in a data graph. This fundamental problem arises, for example, during query planning in subgraph matching algorithms. There are two major approaches to solving the problem: sampling and synopsis. Synopsis (or summary)-based methods are fast and accurate if synopses capture information of graphs well. However, these methods suffer from large errors due to loss of information during summarization and inherent assumptions. Sampling-based methods are unbiased but suffer from large estimation variance due to the large sample space. To address these limitations, we propose Alley, a hybrid method that combines both sampling and synopses. Alley employs 1) a novel sampling strategy, random walk with intersection, which effectively reduces the sample space, 2) branching to further reduce variance, and 3) a novel mining approach that extracts and indexes tangled patterns as synopses which are inherently difficult to estimate by sampling. By using them in the online estimation phase, we can effectively reduce the sample space while still ensuring unbiasedness. We establish that Alley has worst-case optimal runtime and approximation quality guarantees for any given error bound and required confidence. In addition to the theoretical aspect of Alley, our extensive experiments show that Alley outperforms the state-of-the-art methods by up to orders of magnitude in accuracy with similar efficiency.
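To make the "unbiased but high-variance" property of sampling-based estimation concrete, here is a minimal sketch of a generic random-walk estimator with inverse-probability weighting. It is not Alley and does not implement random walk with intersection, branching, or synopses. The query (a three-vertex path v0 - v1 - v2 with v0 != v2), the toy graph `GRAPH`, and all function names are assumptions made for illustration.

```python
import random

# Hypothetical toy data graph: undirected, simple, stored as adjacency lists.
GRAPH = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2],
}

def exact_path_count(adj):
    """Exact number of embeddings of the path v0 - v1 - v2 (v0 != v2).

    Each middle vertex v1 contributes deg(v1) * (deg(v1) - 1) ordered pairs.
    """
    return sum(len(nbrs) * (len(nbrs) - 1) for nbrs in adj.values())

def sample_once(adj, rng):
    """Draw one random walk and return its inverse sampling probability.

    A failed walk (no valid extension) contributes 0, which keeps the
    estimator unbiased.
    """
    vertices = list(adj)
    v0 = rng.choice(vertices)                 # probability 1 / n
    if not adj[v0]:
        return 0.0
    v1 = rng.choice(adj[v0])                  # probability 1 / deg(v0)
    candidates = [w for w in adj[v1] if w != v0]
    if not candidates:
        return 0.0
    rng.choice(candidates)                    # v2, probability 1 / len(candidates)
    return float(len(vertices) * len(adj[v0]) * len(candidates))

def estimate(adj, num_samples, seed=0):
    """Average the per-walk weights over num_samples independent walks."""
    rng = random.Random(seed)
    return sum(sample_once(adj, rng) for _ in range(num_samples)) / num_samples

if __name__ == "__main__":
    print("exact:", exact_path_count(GRAPH))
    print("estimate:", estimate(GRAPH, 20_000))
```

On this toy graph individual walks return weights of 0, 8, 12, or 16 while the true count is 10, so single samples are noisy and only the average converges. That spread is the variance problem the abstract attributes to sampling over a large sample space.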