DRAFT: Dense Retrieval Augmented Few-shot Topic classifier Framework
With the growing volume of diverse information, the demand for classifying
arbitrary topics has become increasingly critical. To address this challenge,
we introduce DRAFT, a simple framework designed to train a classifier for
few-shot topic classification. DRAFT uses a few examples of a specific topic as
queries to construct a Customized dataset with a dense retriever model.
A multi-query retrieval (MQR) algorithm, which effectively handles multiple
queries related to a specific topic, is applied to construct the Customized
dataset. Subsequently, we fine-tune a classifier using the Customized dataset
to identify the topic. To demonstrate the efficacy of our proposed approach, we
conduct evaluations on both widely used classification benchmark datasets and
manually constructed datasets with 291 diverse topics, which simulate diverse
contents encountered in real-world applications. DRAFT shows competitive or
superior performance compared to baselines that use in-context learning, such
as GPT-3 175B and InstructGPT 175B, on few-shot topic classification tasks
despite having 177 times fewer parameters, demonstrating its effectiveness.
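The retrieval step described above can be illustrated with a minimal sketch. The abstract does not give the details of MQR, so the code below assumes a simple stand-in: each few-shot example is embedded as a query vector, the top-k most cosine-similar corpus items are retrieved per query, and the union of those hits forms the positive part of the Customized dataset. The function names, the toy 2-D vectors, and the union rule are all hypothetical and may differ from the authors' algorithm.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def multi_query_retrieve(query_vecs, corpus_vecs, top_k=2):
    """Assumed stand-in for MQR: union of the per-query top-k corpus indices."""
    selected = set()
    for q in query_vecs:
        # Rank every corpus item by similarity to this query, best first.
        ranked = sorted(
            range(len(corpus_vecs)),
            key=lambda i: cosine(q, corpus_vecs[i]),
            reverse=True,
        )
        selected.update(ranked[:top_k])
    return sorted(selected)

# Toy embeddings; a real setup would obtain these from a dense retriever model.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
queries = [[1.0, 0.05], [0.8, 0.3]]  # embeddings of the few-shot topic examples
positives = multi_query_retrieve(queries, corpus, top_k=2)
print(positives)
```

In this toy case both queries point toward the first two corpus items, so those indices would be used as topic-positive training examples for the classifier.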
MEMTO: Memory-guided Transformer for Multivariate Time Series Anomaly Detection
Detecting anomalies in real-world multivariate time series data is
challenging due to complex temporal dependencies and inter-variable
correlations. Recently, reconstruction-based deep models have been widely used
to solve the problem. However, these methods still suffer from an
over-generalization issue and fail to deliver consistently high performance. To
address this issue, we propose MEMTO, a memory-guided Transformer using a
reconstruction-based approach. It is designed to incorporate a novel memory
module that can learn the degree to which each memory item should be updated in
response to the input data. To stabilize the training procedure, we use a
two-phase training paradigm which involves using K-means clustering for
initializing memory items. Additionally, we introduce a bi-dimensional
deviation-based detection criterion that calculates anomaly scores considering
both input space and latent space. We evaluate our proposed method on five
real-world datasets from diverse domains, and it achieves an average anomaly
detection F1-score of 95.74%, significantly outperforming the previous
state-of-the-art methods. We also conduct extensive experiments to empirically
validate the effectiveness of our proposed model's key components.
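A rough sketch of a score that considers both the input space and the latent space is given below. The abstract does not spell out how the two deviations are combined, so this sketch assumes the following: the input-space deviation is the squared reconstruction error per time step, the latent-space deviation is the squared distance from each latent vector to its nearest memory item, and the two are combined by weighting the input-space deviation with a softmax over the latent-space deviations. All function names and toy values are hypothetical.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def anomaly_scores(inputs, recons, latents, memory):
    """Per-time-step score from input-space and latent-space deviations."""
    # Input-space deviation: how badly each time step was reconstructed.
    isd = [sq_dist(x, r) for x, r in zip(inputs, recons)]
    # Latent-space deviation: distance to the nearest memory item.
    lsd = [min(sq_dist(z, m) for m in memory) for z in latents]
    # Assumed combination rule: softmax-normalized latent deviation as a weight.
    weights = softmax(lsd)
    return [w * d for w, d in zip(weights, isd)]

# Toy data: the third time step is poorly reconstructed and far from memory.
inputs = [[0.0], [0.1], [5.0]]
recons = [[0.0], [0.1], [1.0]]
latents = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
memory = [[0.0, 0.0], [1.0, 1.0]]
scores = anomaly_scores(inputs, recons, latents, memory)
print(scores)
```

A time step is then flagged as anomalous when its score exceeds a threshold chosen on validation data; the threshold itself is outside the scope of this sketch.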
Additional file 2 of Mut2Vec: distributed representation of cancerous mutations
It contains the most enriched clusters with IntOGen driver mutations obtained by six clustering methods (K-Means, Agglomerative hierarchical clustering, BIRCH, Spectral clustering, Affinity Propagation, and Gaussian Mixture) and five options for the number of clusters (50, 100, 200, 300, and 500), except for Affinity Propagation. (PDF 108 kb)
Tunable translation-level CRISPR interference by dCas13 and engineered gRNA in bacteria
Although CRISPR-dCas13, the RNA-guided RNA-binding protein, was recently exploited as a translation-level gene expression modulator, it has remained difficult to precisely control the expression level due to the lack of detailed characterization. Here, we develop a synthetic tunable translation-level CRISPR interference (Tl-CRISPRi) system based on engineered guide RNAs that enable precise and predictable down-regulation of mRNA translation. First, we optimize the Tl-CRISPRi system for specific and multiplexed repression of genes at the translation level. We also show that the Tl-CRISPRi system is more suitable for independently regulating each gene in a polycistronic operon than the transcription-level CRISPRi (Tx-CRISPRi) system. We further engineer the handle structure of the guide RNA for tunable and predictable repression of various genes in Escherichia coli and Vibrio natriegens. This tunable Tl-CRISPRi system is applied to increase the production of 3-hydroxypropionic acid (3-HP) by 14.2-fold via redirecting the metabolic flux, indicating the usefulness of this system for flux optimization in microbial cell factories based on the RNA-targeting machinery.
Additional file 1 of Mut2Vec: distributed representation of cancerous mutations
It contains the visualization results with mutation vectors trained with an autoencoder and a denoising autoencoder. (PDF 427 kb)
Added Value of Structured Reporting for US of the Pediatric Appendix: Additional CT Examinations and Negative Appendectomy
Purpose This study aimed to determine the incremental value of using a structured report (SR) for
US examinations of the pediatric appendix.
Materials and Methods Between January 2009 and June 2016, 1150 pediatric patients with suspected
appendicitis who underwent US examinations of the appendix were included retrospectively. In
November 2012, we developed a five-point scale SR for appendix US examinations. The patients
were divided into two groups according to the form of the US report: free-text or SR. The primary clinical
outcomes were compared between the two groups, including the rate of CT imaging following US
examinations, the negative appendectomy rate (NAR), and the appendiceal perforation rate (PR).
Results In total, 550 patients were included in the free-text group and 600 patients in the SR group.
The rate of additional CT examinations decreased by 5.3% in the SR group (8.2%, p = 0.003), and the
NAR decreased by 8.4% in the SR group (7.8%, p = 0.028). There was no statistical difference in the appendiceal
PR (37.6% vs. 48.0%, p = 0.078).
Conclusion The use of an SR to evaluate US examinations for suspected pediatric appendicitis results
in lower CT use and fewer negative appendectomies without an increase in appendiceal PR.
WALDIO: Eliminating the Filesystem Journaling in Resolving the Journaling of Journal Anomaly
This work is dedicated to resolving the Journaling of Journal Anomaly in the Android IO stack. We orchestrate SQLite and the EXT4 filesystem so that SQLite's file-backed journaling activity can dispense with the expensive filesystem intervention, the journaling, without compromising file integrity under unexpected filesystem failure. In storing the logs, we exploit direct IO to suppress the filesystem interference. This work consists of three key ingredients: (i) Preallocation with Explicit Journaling, (ii) Header Embedding, and (iii) Group Synchronization. Preallocation with Explicit Journaling eliminates the filesystem journaling while properly protecting the file metadata against an unexpected system crash. We redesign the SQLite B-tree structure with Header Embedding to make it direct IO compatible and block IO friendly. With Group Synchronization, we minimize the synchronization overhead of direct IO and make the SQLite operation NAND Flash friendly. Combining the three technical ingredients, we develop a new journal mode in SQLite, WALDIO. We implement it on a commercially available smartphone. WALDIO mode achieves 5.1x the performance (inserts/sec) of WAL mode, which is the fastest journaling mode in SQLite. It yields 2.7x the performance (inserts/sec) of LS-MVBT, the fastest SQLite journaling mode known to the public. WALDIO mode achieves 7.4x the performance (inserts/sec) of WAL mode when it is relieved from the overhead of explicitly synchronizing individual log-commit operations. WALDIO mode reduces the IO volume to 1/6 of that of WAL mode.
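For readers who want to see what the inserts/sec metric looks like in practice, the following sketch measures it for the WAL baseline in stock SQLite through Python's `sqlite3` module. WALDIO is not part of stock SQLite, so it cannot be selected here; only the baseline side of the comparison is shown. The helper name `measure_inserts_per_sec`, the table layout, and the row count are hypothetical, and the numbers obtained on a desktop filesystem are not comparable to the smartphone results above.

```python
import os
import sqlite3
import tempfile
import time

def measure_inserts_per_sec(journal_mode, n_rows=200):
    """Insert n_rows one commit at a time and report the observed rate."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        try:
            # The pragma returns the journal mode that is actually in effect.
            mode = conn.execute(
                f"PRAGMA journal_mode={journal_mode}"
            ).fetchone()[0]
            conn.execute(
                "CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)"
            )
            conn.commit()
            start = time.perf_counter()
            for i in range(n_rows):
                conn.execute(
                    "INSERT INTO t (payload) VALUES (?)", (f"row-{i}",)
                )
                conn.commit()  # one transaction per insert
            elapsed = max(time.perf_counter() - start, 1e-9)
            count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        finally:
            conn.close()
    return mode, count, n_rows / elapsed

mode, count, rate = measure_inserts_per_sec("WAL", n_rows=50)
print(mode, count, f"{rate:.0f} inserts/sec")
```

Committing after every insert is a deliberate choice: it exercises the per-transaction synchronization cost that journaling modes differ on, whereas batching many inserts into one transaction would largely hide it.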