17 research outputs found

    Topic Segmentation in the Wild: Towards Segmentation of Semi-structured & Unstructured Chats

    Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current work on topic segmentation often focuses on structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentation models on unstructured texts. We find that: (a) current strategies of pre-training on a large corpus of structured text such as Wiki-727K do not help transfer to unstructured texts; (b) training from scratch with only a relatively small dataset from the target unstructured domain improves the segmentation results by a significant margin.
    Comment: NeurIPS 2022: ENLS

    Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models

    Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current work on topic segmentation often focuses on structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentation models on unstructured texts. We find that: (a) current strategies of pre-training on a large corpus of structured text such as Wiki-727K do not help transfer to unstructured conversational data; (b) training from scratch with only a relatively small dataset from the target unstructured domain improves the segmentation results by a significant margin. We stress-test our proposed topic segmentation approach by experimenting with multiple loss functions, in order to mitigate the effects of class imbalance in unstructured conversational datasets. Our empirical evaluation indicates that the Focal Loss function is a robust alternative to the Cross-Entropy and re-weighted Cross-Entropy loss functions when segmenting unstructured and semi-structured chats.
    Comment: Accepted to IntelliSys 2023. arXiv admin note: substantial text overlap with arXiv:2211.1495
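
The three loss functions compared in the abstract above can be sketched for a single boundary/non-boundary prediction as follows. This is a minimal illustration using the standard focal loss formulation, not the authors' implementation: the abstract gives no hyperparameters, so `gamma`, `alpha`, and `pos_weight` are assumed values, and all function names are hypothetical.

```python
import math

EPS = 1e-12  # assumed clamp to avoid log(0)


def _prob_of_true_class(p, y):
    """Probability the model assigned to the correct label (y is 0 or 1)."""
    p_t = p if y == 1 else 1.0 - p
    return min(max(p_t, EPS), 1.0)


def binary_cross_entropy(p, y):
    """Plain cross-entropy for one prediction p = P(segment boundary)."""
    return -math.log(_prob_of_true_class(p, y))


def weighted_cross_entropy(p, y, pos_weight=1.0):
    """Re-weighted cross-entropy: the rare positive (boundary) class is
    scaled by pos_weight (an assumed, user-chosen constant)."""
    weight = pos_weight if y == 1 else 1.0
    return weight * binary_cross_entropy(p, y)


def focal_loss(p, y, gamma=2.0, alpha=None):
    """Focal loss: cross-entropy scaled by (1 - p_t) ** gamma, which
    down-weights examples the model already classifies confidently."""
    p_t = _prob_of_true_class(p, y)
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy example (p_t = 0.9) versus a hard one (p_t = 0.1), both boundaries.
    for p in (0.9, 0.1):
        print(p, binary_cross_entropy(p, 1), focal_loss(p, 1))
```

With `gamma=0` and no `alpha`, the focal loss reduces to plain cross-entropy. Larger `gamma` shifts the training signal toward hard examples, which is the property that makes it a candidate for imbalanced boundary labels.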

    Applicability and safety of dual-frequency ultrasonic treatment for the transdermal delivery of drugs

    Low-frequency ultrasound presents an attractive method for transdermal drug delivery. The controlled, yet non-specific nature of enhancement broadens the range of therapeutics that can be delivered, while minimizing necessary reformulation efforts for differing compounds. Long and inconsistent treatment times, however, have partially limited the attractiveness of this method. Building on recent advances made in this area, the simultaneous use of low- and high-frequency ultrasound is explored in a physiologically relevant experimental setup to enable the translation of this treatment to testing in vivo. Dual-frequency ultrasound, utilizing 20 kHz and 1 MHz frequencies simultaneously, was found to significantly enhance the size of localized transport regions (LTRs) in both in vitro and in vivo models while decreasing the necessary treatment time compared to 20 kHz alone. Additionally, LTRs generated by treatment with 20 kHz + 1 MHz were found to be more permeable than those generated with 20 kHz alone. This was further corroborated with pore-size estimates utilizing hindered-transport theory, in which the pores in skin treated with 20 kHz + 1 MHz were calculated to be significantly larger than the pores in skin treated with 20 kHz alone. This demonstrates for the first time that LTRs generated with 20 kHz + 1 MHz are also more permeable than those generated with 20 kHz alone, which could broaden the range of therapeutics and doses administered transdermally. With regard to safety, treatment with 20 kHz + 1 MHz both in vitro and in vivo appeared to result in no greater skin disruption than that observed in skin treated with 20 kHz alone, an FDA-approved modality. This study demonstrates that dual-frequency ultrasound is more efficient and effective than single-frequency ultrasound and is well-tolerated in vivo.
    National Institutes of Health (U.S.) (Grant EB-00351)
    National Institutes of Health (U.S.) (Grant CA014051

    High-throughput mapping of regulatory DNA

    Quantifying the effects of cis-regulatory DNA on gene expression is a major challenge. Here, we present the multiplexed editing regulatory assay (MERA), a high-throughput CRISPR-Cas9–based approach that analyzes the functional impact of the regulatory genome in its native context. MERA tiles thousands of mutations across ~40 kb of cis-regulatory genomic space and uses knock-in green fluorescent protein (GFP) reporters to read out gene activity. Using this approach, we obtain quantitative information on the contribution of cis-regulatory regions to gene expression. We identify proximal and distal regulatory elements necessary for expression of four embryonic stem cell–specific genes. We show a consistent contribution of neighboring gene promoters to gene expression and identify unmarked regulatory elements (UREs) that control gene expression but do not have typical enhancer epigenetic or chromatin features. We compare thousands of functional and nonfunctional genotypes at a genomic location and identify the base pair–resolution functional motifs of regulatory elements.
    National Institutes of Health (U.S.) (1U01HG007037

    Integration of energy storage systems in DC ships

    Electric propulsion in marine shipping is more than a century old. Technologies such as DC distribution and onboard energy storage are now widening its scope. DC ships ease the integration of variable-speed diesel generators, which helps reduce fuel consumption and control emissions. Tapping alternative forms of energy requires energy storage systems for optimal operation. Development of rechargeable batteries has been vigorous in order to meet the ever-increasing energy demand, especially to cover the intermittent power requirement of a DC ship in the event of a mechanical or electrical breakdown. A cell model for use in such DC ships must track the state of charge and voltage level of the battery in real time, taking the physical state of the battery as input. The working principles of different battery chemistries are reviewed in order to identify the one best suited to marine systems. Next, the various equivalent models, from the ideal model to the DP model, are summarized in order of complexity. Building on these, an equivalent model inspired by Shepherd's model is implemented based on a selection of points from the experimental curve, so that it can be incorporated in a marine system to carry out fault management in a marine network. The model is tested at three discharge rates to assess its robustness. The curve obtained from this battery model is then divided into two components to derive an electrophysical meaning, by demonstrating a strong correlation with the Single Particle Model under certain assumptions and with a generalized exponential function.
    Master of Science (Power Engineering
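
To make the modelling idea in the abstract above concrete, here is a rough sketch of one commonly used Shepherd-type discharge equation, with state of charge tracked by coulomb counting. It is not the thesis's model: the abstract does not give its equation or fitted parameters, so the functional form, every parameter value (`e0`, `k`, `q_ah`, `a`, `b`, `r`), the three discharge currents, and all function names are illustrative assumptions.

```python
import math


def shepherd_voltage(it_ah, current_a, e0=3.7, k=0.01, q_ah=2.0,
                     a=0.3, b=3.0, r=0.05):
    """Terminal voltage of a Shepherd-type cell model (assumed form):

        V = e0 - k * q / (q - it) + a * exp(-b * it) - r * i

    it_ah     : charge already drawn from the cell, in Ah
    current_a : discharge current, in A
    All default parameter values are placeholders, not fitted data.
    """
    if not 0.0 <= it_ah < q_ah:
        raise ValueError("discharged charge must lie in [0, capacity)")
    polarization = k * q_ah / (q_ah - it_ah)   # grows as the cell empties
    exponential = a * math.exp(-b * it_ah)     # initial exponential zone
    return e0 - polarization + exponential - r * current_a


def state_of_charge(it_ah, q_ah=2.0):
    """State of charge by coulomb counting: 1.0 is full, 0.0 is empty."""
    return 1.0 - it_ah / q_ah


def discharge_curve(current_a, q_ah=2.0, steps=10, depth=0.9):
    """Sample (state of charge, voltage) pairs down to the given depth of discharge."""
    points = []
    for n in range(steps + 1):
        it_ah = depth * q_ah * n / steps
        points.append((state_of_charge(it_ah, q_ah),
                       shepherd_voltage(it_ah, current_a, q_ah=q_ah)))
    return points


if __name__ == "__main__":
    # Three assumed discharge currents, echoing the three rates in the abstract.
    for current in (0.5, 1.0, 2.0):
        soc, volts = discharge_curve(current)[-1]
        print(f"I = {current} A: V = {volts:.3f} V at SOC = {soc:.2f}")
```

In practice the parameters would be fitted to selected points on a measured discharge curve, which is the step the abstract describes.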

    Cloning-free CRISPR

    Summary: We present self-cloning CRISPR/Cas9 (scCRISPR), a technology that allows for CRISPR/Cas9-mediated genomic mutation and site-specific knockin transgene creation within several hours by circumventing the need to clone a site-specific single-guide RNA (sgRNA) or knockin homology construct for each target locus. We introduce a self-cleaving palindromic sgRNA plasmid and a short double-stranded DNA sequence encoding the desired locus-specific sgRNA into target cells, allowing them to produce a locus-specific sgRNA plasmid through homologous recombination. scCRISPR enables efficient generation of gene knockouts (∼88% mutation rate) at approximately one-sixth the cost of plasmid-based sgRNA construction with only 2 hr of preparation for each targeted site. Additionally, we demonstrate efficient site-specific knockin of GFP transgenes without any plasmid cloning or genome-integrated selection cassette in mouse and human embryonic stem cells (2%–4% knockin rate) through PCR-based addition of short homology arms. scCRISPR substantially lowers the bar on mouse and human transgenesis.

    Cas9 Functionally Opens Chromatin

    Using a nuclease-dead Cas9 mutant, we show that Cas9 reproducibly induces chromatin accessibility at previously inaccessible genomic loci. Cas9 chromatin opening is sufficient to enable adjacent binding and transcriptional activation by the settler transcription factor retinoic acid receptor at previously unbound motifs. Thus, we demonstrate a new use for Cas9 in increasing surrounding chromatin accessibility to alter local transcription factor binding.

    dCas9 induces chromatin accessibility at previously inaccessible genomic loci.

    (a) mESCs were co-transfected with a Tol2 transposase (TPase), a Tol2 transposon-flanked dCas9 expression cassette, and a Tol2 transposon-flanked sgRNA cassette to yield stable expression of dCas9 and an sgRNA targeted to a region with inaccessible chromatin. (b) 16/16 loci in previously inaccessible chromatin had statistically significant increases in DNase hypersensitivity (y-axis) upon sgRNA targeting as measured by DNase-qPCR (gray dots). DNase hypersensitivity at each locus is normalized to its level in the absence of sgRNA (blue dot), and the average normalized DNase hypersensitivity in the presence of sgRNA for all loci is shown (red dot), which is statistically significantly increased over the no-sgRNA control. At least two replicates were performed for all conditions, and a two-tailed Student's t-test was used to calculate significance. (c) DNase-qPCR measurement of DNase hypersensitivity (y-axis) is shown ±150 bp from the sgRNA site (x-axis) at four targeted loci. DNase-qPCR values at each datapoint are normalized to hypersensitivity in the absence of sgRNA, and all loci are oriented such that the 20 bp sgRNA sequence is immediately to the left of 0 and the NGG PAM sequence is immediately to the right of 0. Three replicates were performed for all experiments.

    dCas9 chromatin opening enables adjacent RAR binding and RA-dependent gene activation.

    (a) Anti-retinoic acid receptor (RAR) ChIP followed by qPCR at three loci (RAR1-3, x-axis) in the presence of sgRNAs targeting each locus (blue, red, and green). ChIP-qPCR values are normalized to the average of two of the strongest genomic RAR binding sites. Three replicates were performed for all experiments, a two-tailed Student's t-test was used to calculate significance, and values with P<0.01 are denoted with a *. (b) The Tol2 transposon-based reporter system involves stable integration of a 2x RAR motif, a minimal promoter, and GFP. dCas9 was recruited through sgRNA sequences upstream (sgRAR Up, red) or downstream (sgRAR Down, blue) of the 2x RAR motif. (c) Average flow cytometric GFP induction by RA in the presence of control sgRNA (black), sgRAR Up (red), or sgRAR Down (blue). (d) dCas9 is able to bind to sgRNA sequences in inaccessible chromatin and induce accessibility, which directly enables the settler factor RAR to bind to previously obscured motifs.