376,251 research outputs found
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
Large-scale vision language (VL) models use Transformers to perform
cross-modal interactions between the input text and image. These cross-modal
interactions are computationally expensive and memory-intensive due to the
quadratic complexity of processing the input image and text. We present PuMer:
a token reduction framework that uses text-informed Pruning and modality-aware
Merging strategies to progressively reduce the tokens of input image and text,
improving model inference speed and reducing memory footprint. PuMer learns to
keep salient image tokens related to the input text and merges similar textual
and visual tokens by adding lightweight token reducer modules at several
cross-modal layers in the VL model. Training PuMer is mostly the same as
finetuning the original VL model but faster. Our evaluation of two vision
language models on four downstream VL tasks shows that PuMer increases inference
throughput by up to 2x and reduces memory footprint by over 50% while incurring
less than a 1% accuracy drop.
Comment: Accepted to ACL 2023 Main Conference
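The two reduction strategies can be sketched in a few lines of NumPy. This is a hypothetical simplification: in PuMer the reducers are lightweight learned modules inside the cross-modal Transformer layers, and the function names and ratios below are illustrative, not the paper's API.

```python
import numpy as np

def prune_image_tokens(image_tokens, text_tokens, keep_ratio=0.5):
    # Text-informed pruning: score each image token by its maximum
    # similarity to any text token, then keep only the top-scoring
    # (i.e. text-relevant) image tokens.
    sims = image_tokens @ text_tokens.T          # (n_img, n_txt)
    scores = sims.max(axis=1)
    k = max(1, int(len(image_tokens) * keep_ratio))
    keep = np.argsort(scores)[-k:]
    return image_tokens[keep]

def merge_similar_tokens(tokens, merge_ratio=0.25):
    # Token merging: repeatedly average the most similar token pair
    # (by cosine similarity), shrinking the sequence by one per merge.
    # Note: merged tokens are appended, so order is not preserved.
    n_merge = int(len(tokens) * merge_ratio)
    tokens = tokens.copy()
    for _ in range(n_merge):
        norm = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
        sim = norm @ norm.T
        np.fill_diagonal(sim, -np.inf)
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        merged = (tokens[i] + tokens[j]) / 2
        tokens = np.delete(tokens, [i, j], axis=0)
        tokens = np.vstack([tokens, merged])
    return tokens
```

Applied progressively at several layers, each stage shortens the sequence the remaining cross-modal layers must process, which is where the quadratic savings come from.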
Large Model Visualization : Techniques and Applications
The size of datasets in scientific computing is rapidly
increasing. This increase is driven by the growth of processing power
in recent years, which in turn has been invested in greater model
accuracy and size. A similar trend has enabled a significant
improvement of medical scanners; modern scanners routinely generate
more than 1000 slices at a resolution of 512x512 in daily
practice. Even in computer-aided engineering, typical models easily
contain several million polygons. Unfortunately, the data complexity
is growing faster than the rendering performance of modern computer
systems. This is due not only to the more slowly growing performance
of the graphics subsystems, but in particular to the significantly
more slowly growing memory bandwidth for transferring geometry and
image data from main memory to the graphics accelerator.
Large model visualization addresses this growing divide between data
complexity and rendering performance. Most methods focus on reducing
geometric or pixel complexity, which in turn also reduces the memory
bandwidth requirements.
In this dissertation, we discuss new approaches from three different
research areas. All approaches target the reduction of the
processing complexity to achieve an interactive visualization of large
datasets. In the second part, we introduce applications of the
presented approaches. Specifically, we introduce the new VIVENDI
system for interactive virtual endoscopy and other applications
from mechanical engineering, scientific computing, and architecture.
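As an illustration of reducing geometric complexity, here is a minimal vertex-clustering sketch, a classic decimation technique and not necessarily one of the approaches in the dissertation; the function name and cell_size parameter are illustrative.

```python
import numpy as np

def cluster_vertices(vertices, cell_size=1.0):
    # Vertex clustering: snap each vertex into a uniform grid cell and
    # replace all vertices in a cell by their centroid, reducing the
    # number of vertices (and hence memory bandwidth) at the cost of
    # geometric detail finer than cell_size.
    cells = np.floor(vertices / cell_size).astype(int)
    keys, inverse = np.unique(cells, axis=0, return_inverse=True)
    centroids = np.zeros((len(keys), vertices.shape[1]))
    counts = np.zeros(len(keys))
    np.add.at(centroids, inverse, vertices)   # sum vertices per cell
    np.add.at(counts, inverse, 1)             # count vertices per cell
    return centroids / counts[:, None], inverse
```

The returned inverse mapping tells, for each original vertex, which representative replaced it, so the mesh's triangles can be remapped (degenerate triangles whose corners collapse into one cell are then dropped).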
Memory vectors for similarity search in high-dimensional spaces
We study an indexing architecture to store and search in a database of
high-dimensional vectors from the perspective of statistical signal processing
and decision theory. This architecture is composed of several memory units,
each of which summarizes a fraction of the database by a single representative
vector. The potential similarity of the query to one of the vectors stored in
the memory unit is gauged by a simple correlation with the memory unit's
representative vector. This representative optimizes the test of the following
hypothesis: the query is independent of any vector in the memory unit vs. the
query is a simple perturbation of one of the stored vectors.
Compared to exhaustive search, our approach finds the most similar database
vectors significantly faster without a noticeable reduction in search quality.
Interestingly, the reduction of complexity is provably better in
high-dimensional spaces. We empirically demonstrate its practical interest in a
large-scale image search scenario with off-the-shelf state-of-the-art
descriptors.
Comment: Accepted to IEEE Transactions on Big Data
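The indexing scheme can be sketched as follows. This is a minimal illustration that uses the sum of a unit's vectors as the representative; the paper derives an optimized representative from the hypothesis test, and the names below are illustrative.

```python
import numpy as np

def build_memory_units(database, unit_size=64):
    # Partition the database into memory units; each unit is summarized
    # by a single representative vector (here, the sum of its members).
    units, reps = [], []
    for start in range(0, len(database), unit_size):
        unit = database[start:start + unit_size]
        units.append(unit)
        reps.append(unit.sum(axis=0))
    return units, np.stack(reps)

def search(query, units, reps, n_probe=2):
    # Correlate the query with every representative (cheap), then search
    # exhaustively only inside the few highest-scoring units.
    scores = reps @ query
    best_units = np.argsort(scores)[-n_probe:]
    best, best_sim = None, -np.inf
    for u in best_units:
        sims = units[u] @ query
        i = int(np.argmax(sims))
        if sims[i] > best_sim:
            best_sim, best = sims[i], units[u][i]
    return best, best_sim
```

Only the per-unit correlations touch the whole index, so the cost of the coarse pass is one dot product per unit rather than one per database vector; the high-dimensional regime is what makes a single correlation a reliable indicator that a unit contains a near neighbor.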
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning and have been utilized by other sciences such
as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other.
Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in
Astronomy", special issue "Robotic Astronomy".