Improving Small Language Models on PubMedQA via Generative Data Augmentation
Large Language Models (LLMs) have made remarkable advancements in the field
of natural language processing. However, their increasing size poses challenges
in terms of computational cost. On the other hand, Small Language Models (SLMs)
are known for their efficiency, but they often struggle with limited capacity
and training data, especially in specific domains. In this paper, we introduce
a novel method aimed at improving SLMs in the medical domain using LLM-based
generative data augmentation. The objective of our approach is to develop more
efficient and capable models that are specifically tailored for specialized
applications. Through experiments conducted on the PubMedQA dataset, we
demonstrate the effectiveness of LLMs in refining and diversifying existing
question-answer pairs. This refinement process leads to improved performance in
a significantly smaller model after fine-tuning. Notably, our best SLM, with
under 1.6 billion parameters, outperforms the few-shot GPT-4 on the PubMedQA
dataset. Our code and generated data are publicly available to facilitate
further exploration.
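The augmentation step described above, in which an LLM refines and diversifies existing question-answer pairs before a small model is fine-tuned, can be sketched roughly as follows. This is a minimal illustration only: the prompt wording, the `rewrite_qa` helper, and the use of the OpenAI chat API are assumptions, not the authors' released pipeline.

```python
# Minimal sketch of LLM-based generative data augmentation for QA pairs.
# The prompt, model choice, and helper names are illustrative assumptions,
# not the paper's released pipeline. Requires the OpenAI Python client
# (>= 1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Rewrite the following biomedical question-answer pair so that the "
    "meaning is preserved but the wording differs. Answer with two lines, "
    "'Q: ...' and 'A: ...'.\n\nQ: {q}\nA: {a}"
)

def rewrite_qa(question: str, answer: str, n_variants: int = 3):
    """Return up to n_variants paraphrased (question, answer) pairs."""
    variants = []
    for _ in range(n_variants):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": PROMPT.format(q=question, a=answer)}],
            temperature=0.9,  # high temperature encourages diverse rewrites
        )
        text = resp.choices[0].message.content
        if "A:" in text:  # keep only replies that follow the Q:/A: format
            q_part, a_part = text.split("A:", 1)
            variants.append((q_part.replace("Q:", "").strip(), a_part.strip()))
    return variants

# The resulting variants are mixed with the original pairs for fine-tuning.
```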
FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture
Neural Network (NN) accelerators with emerging ReRAM (resistive random access
memory) technologies have been investigated as one of the promising solutions
to address the "memory wall" challenge, due to the unique capability of
processing-in-memory within ReRAM-crossbar-based processing elements
(PEs). However, the high efficiency and high density advantages of ReRAM have
not been fully utilized due to the huge communication demands among PEs and the
overhead of peripheral circuits.
In this paper, we propose a full-system-stack solution composed of a
reconfigurable architecture design, the Field Programmable Synapse Array
(FPSA), and its software system, which includes a neural synthesizer, a
temporal-to-spatial mapper, and a placement & routing tool. We lean heavily on
the software system to keep the hardware design compact and efficient. To
satisfy the high communication demand, we optimize the interconnect with a
reconfigurable routing architecture and the placement & routing tool. To
improve computational density, we greatly simplify the PE circuit with a
spiking scheme and then use the neural synthesizer so that the dense
computation resources can support different kinds of NN operations. In
addition, we provide spiking memory blocks (SMBs) and configurable logic
blocks (CLBs) in hardware and use the temporal-to-spatial mapper to balance
the storage and computation requirements of NNs across them. Owing to the
end-to-end software system, existing deep neural networks can be deployed on
FPSA efficiently. Evaluations show that, compared to PRIME, a state-of-the-art
ReRAM-based NN accelerator, FPSA improves computational density by 31x; for
representative NNs, its inference performance achieves up to a 1000x speedup.
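The key primitive here, processing-in-memory in a ReRAM crossbar, amounts to computing a matrix-vector product inside the array that stores the weights, with large weight matrices tiled across crossbar-sized PEs. A minimal functional sketch follows; the 128x128 PE size and all names are illustrative assumptions, not FPSA's actual parameters.

```python
# Toy functional model of ReRAM-crossbar processing-in-memory: a large
# weight matrix is tiled onto fixed-size crossbar PEs, each of which
# computes a partial matrix-vector product where the weights are stored.
# The 128x128 PE size and all names are illustrative, not FPSA's design.
import numpy as np

PE_ROWS, PE_COLS = 128, 128  # assumed crossbar dimensions

def crossbar_mvm(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute weights @ x by summing partial products from crossbar tiles."""
    out_dim, in_dim = weights.shape
    y = np.zeros(out_dim)
    for r in range(0, out_dim, PE_ROWS):
        for c in range(0, in_dim, PE_COLS):
            tile = weights[r:r + PE_ROWS, c:c + PE_COLS]  # one PE's stored weights
            y[r:r + PE_ROWS] += tile @ x[c:c + PE_COLS]   # in-memory MVM, modeled digitally
    return y

W, x = np.random.randn(512, 1024), np.random.randn(1024)
assert np.allclose(crossbar_mvm(W, x), W @ x)
```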
Using Multiple Instance Learning to Build Multimodal Representations
Image-text multimodal representation learning aligns data across modalities
and enables important medical applications, e.g., image classification, visual
grounding, and cross-modal retrieval. In this work, we establish a connection
between multimodal representation learning and multiple instance learning.
Based on this connection, we propose a generic framework for constructing
permutation-invariant score functions with many existing multimodal
representation learning approaches as special cases. Furthermore, we use the
framework to derive a novel contrastive learning approach and demonstrate that
our method achieves state-of-the-art results on a number of downstream tasks.
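The abstract does not spell the score function out, but a standard permutation-invariant construction in this setting aggregates pairwise similarities between local instances (image regions and text tokens) with a symmetric pooling operator. The sketch below shows one such score under that assumption; it is a generic illustration, not the paper's exact formulation.

```python
# Generic permutation-invariant image-text score: pool pairwise cosine
# similarities between region and token embeddings. A sketch of the
# multiple-instance-learning view, not the paper's exact score function.
import torch
import torch.nn.functional as F

def mil_score(regions: torch.Tensor, tokens: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """regions: (R, d) image-region embeddings; tokens: (T, d) token embeddings."""
    regions = F.normalize(regions, dim=-1)
    tokens = F.normalize(tokens, dim=-1)
    sim = regions @ tokens.T  # (R, T) pairwise instance similarities
    # logsumexp pooling is smooth, invariant to permuting either instance
    # set, and approaches max pooling as tau -> 0
    return tau * torch.logsumexp(sim.flatten() / tau, dim=0)

# usage: score one image (36 regions) against one sentence (20 tokens)
score = mil_score(torch.randn(36, 256), torch.randn(20, 256))
```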
ColNeRF: Collaboration for Generalizable Sparse Input Neural Radiance Field
Neural Radiance Fields (NeRF) have demonstrated impressive potential in
synthesizing novel views from dense input; however, their effectiveness is
challenged when dealing with sparse input. Existing approaches that incorporate
additional depth or semantic supervision can alleviate this issue to an extent.
However, the process of supervision collection is not only costly but also
potentially inaccurate, leading to poor performance and generalization ability
in diverse scenarios. In our work, we introduce a novel model, Collaborative
Neural Radiance Fields (ColNeRF), designed to work with sparse input. The
collaboration in ColNeRF comprises both cooperation among the sparse input
images and cooperation among the outputs of the neural radiance field. On this
basis, we construct a novel collaborative module that aligns information from
various views and simultaneously imposes self-supervised constraints to ensure
multi-view consistency in both geometry and appearance. A
Collaborative Cross-View Volume Integration module (CCVI) is proposed to
capture complex occlusions and implicitly infer the spatial location of
objects. Moreover, we introduce self-supervision of target rays projected in
multiple directions to ensure geometric and color consistency in adjacent
regions. Benefiting from collaboration at both the input and output ends,
ColNeRF captures richer and more generalizable scene representations, thereby
facilitating higher-quality novel view synthesis. Extensive experiments
demonstrate that ColNeRF outperforms state-of-the-art sparse-input
generalizable NeRF methods. Furthermore, when fine-tuned to adapt to new
scenes, our approach achieves performance competitive with per-scene-optimized
NeRF-based methods while significantly reducing computational costs. Our code
is available
at: https://github.com/eezkni/ColNeRF
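As a rough illustration of the ray-level self-supervision described above, one can render the same target rays from several adjacent directions and penalize color disagreement. The sketch below is a toy stand-in under that reading, not ColNeRF's actual loss.

```python
# Toy stand-in for ray-level multi-view self-supervision: colors rendered
# for the same target rays from several adjacent directions should agree.
# This is an illustrative consistency penalty, not ColNeRF's actual loss.
import torch

def cross_view_color_consistency(rgb_views: torch.Tensor) -> torch.Tensor:
    """rgb_views: (V, N, 3) colors for the same N rays rendered from V views."""
    consensus = rgb_views.mean(dim=0, keepdim=True)  # per-ray mean color
    return ((rgb_views - consensus) ** 2).mean()     # penalize disagreement

# usage: three renderings of the same 1024 target rays
loss = cross_view_color_consistency(torch.rand(3, 1024, 3))
```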
DGI: Easy and Efficient Inference for GNNs
While many systems have been developed to train Graph Neural Networks (GNNs),
efficient model inference and evaluation remain to be addressed. For instance,
using the widely adopted node-wise approach, model evaluation can account for
up to 94% of the time in the end-to-end training process due to neighbor
explosion, in which a node must access all of its multi-hop neighbors. On the
other hand, layer-wise inference avoids the neighbor explosion problem by
conducting inference layer by layer such that the nodes only need their one-hop
neighbors in each layer. However, implementing layer-wise inference requires
substantial engineering effort because users need to manually decompose a GNN
model into layers for computation and split the workload into batches to fit into
device memory. In this paper, we develop Deep Graph Inference (DGI) -- a system
for easy and efficient GNN model inference, which automatically translates the
training code of a GNN model for layer-wise execution. DGI is general for
various GNN models and different kinds of inference requests, and supports
out-of-core execution on large graphs that cannot fit in CPU memory.
Experimental results show that DGI consistently outperforms node-wise
inference across different datasets and hardware settings, and the speedup can
be over 1,000x.
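The contrast the abstract draws, node-wise inference that re-expands multi-hop neighborhoods versus layer-wise inference that needs only 1-hop neighbors per layer, can be sketched as follows. The dense normalized adjacency, mean-style aggregation, and torch.nn.Linear layers are simplifying assumptions; DGI itself derives this execution automatically from a model's training code.

```python
# Sketch of layer-wise GNN inference: embeddings for ALL nodes are computed
# one layer at a time, in node batches, so each layer touches only 1-hop
# neighbors. Dense normalized adjacency and Linear layers are simplifying
# assumptions; DGI derives this execution automatically from training code.
import torch

def layerwise_inference(adj, feats, layers, batch_size=1024):
    """adj: (N, N) normalized adjacency; feats: (N, d_in) node features."""
    h = feats
    with torch.no_grad():
        for layer in layers:
            out = torch.empty(h.shape[0], layer.out_features)
            for start in range(0, h.shape[0], batch_size):
                end = min(start + batch_size, h.shape[0])
                # a batch aggregates only its 1-hop neighbors from h, so
                # multi-hop neighborhoods are never re-expanded
                out[start:end] = torch.relu(layer(adj[start:end] @ h))
            h = out  # full layer-l embeddings feed layer l+1
    return h

# toy usage: a 2-layer GNN over 1,000 nodes
n = 1000
adj = torch.eye(n)  # stand-in for a normalized graph adjacency matrix
layers = [torch.nn.Linear(64, 32), torch.nn.Linear(32, 16)]
emb = layerwise_inference(adj, torch.randn(n, 64), layers)
```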