Applications of Sparse Signal Recovery: 2D-Pattern Matching and Sparse Walsh-Hadamard Transform Computation
We study two problems related to sparse signal recovery.
The first problem is querying a sub-image of size square of M in a large image database of size square of N to determine all locations where the sub-image appears. We use sparse-graph-code-based Fourier transform computation to find the peaks in the 2-D correlation and thereby determine the matching positions in a computationally efficient manner.
We then design a 2-D pattern that facilitates vision-based positioning by enabling the use of our algorithm for fast pattern matching. The second problem is the computation of the sparse Walsh-Hadamard transform (WHT) for binary data. We consider signals that are sparse in the Walsh-Hadamard transform domain, where the non-zero coefficients are all ones. A possible application of this algorithm is learning an undirected, unweighted graph from a sub-sampled version of its evaluation. We design an adaptive algorithm for sparse WHT computation. Adaptivity makes it possible to recover more than one aliased non-zero coefficient in each iteration, so faster recovery can be expected for the same number of sub-samples. We show that, for the same number of samples, the probability of error of our proposed algorithm is lower than that of earlier work.
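The correlation-peak idea behind the first problem can be illustrated with an ordinary dense FFT. The sketch below is not the sparse-graph-code algorithm of the abstract; it assumes pixels valued +1 or -1 and circular (wrap-around) correlation, and the function name `find_matches` is hypothetical.

```python
import numpy as np

def find_matches(image, patch):
    """Return (row, col) positions where patch matches image exactly.

    Assumes both arrays hold only +1/-1 values, so a perfect match
    gives a correlation equal to patch.size and any mismatch lowers
    it by at least 2.
    """
    # Zero-pad the patch to the image size so both spectra align.
    padded = np.zeros(image.shape)
    padded[:patch.shape[0], :patch.shape[1]] = patch
    # Cross-correlation theorem: corr = IFFT(FFT(image) * conj(FFT(patch))).
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
    corr = np.fft.ifft2(spectrum).real
    # Keep only positions whose correlation reaches the perfect-match value.
    peaks = np.argwhere(corr > patch.size - 0.5)
    return [(int(r), int(c)) for r, c in peaks]
```

The dense transform touches every pixel of the image; the approach described above instead exploits the fact that only a few correlation peaks are non-negligible.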
Unsupervised Low Light Image Enhancement Using SNR-Aware Swin Transformer
Images captured under low-light conditions present unpleasing artifacts,
which degrade the performance of feature extraction for many upstream visual
tasks. Low-light image enhancement aims at improving brightness and contrast,
and further reducing noise that corrupts the visual quality. Recently, many
image restoration methods based on Swin Transformer have been proposed and
have achieved impressive performance. However, naively applying Swin
Transformer to low-light image enhancement exposes artifacts
including over-exposure, brightness imbalance, and noise corruption. In
addition, it is impractical to capture paired low-light images and
corresponding ground truth, i.e., well-exposed images of the same visual scene. In
this paper, we propose a dual-branch network based on Swin Transformer, guided
by a signal-to-noise ratio prior map that provides spatially varying
information for low-light image enhancement. Moreover, we leverage unsupervised
learning to construct an optimization objective based on the Retinex model to
guide the training of the proposed network. Experimental results demonstrate that
the proposed model is competitive with the baseline models.
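The abstract does not say how the signal-to-noise ratio prior map is computed. One plausible construction, shown here purely as an assumption, treats a locally averaged grayscale image as the signal and the absolute residual as the noise. The names `box_blur` and `snr_map`, the kernel size `k`, and the stabilizer `eps` are all hypothetical.

```python
import numpy as np

def box_blur(gray, k=5):
    """Mean filter with a k x k window (k odd) and edge replication."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=float)
    # Sum the k*k shifted copies of the image, then normalize.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def snr_map(gray, k=5, eps=1e-4):
    """Per-pixel SNR estimate: smoothed signal over absolute residual."""
    denoised = box_blur(gray, k)        # assumed "signal"
    noise = np.abs(gray - denoised)     # assumed "noise"
    return denoised / (noise + eps)     # eps avoids division by zero
```

Under this assumption, smooth regions receive high SNR values and noisy regions receive low ones, which is the kind of spatially varying guidance the abstract refers to.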
Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions
Clinical Named Entity Recognition (CNER) aims to identify and classify
clinical terms such as diseases, symptoms, treatments, exams, and body parts in
electronic health records, which is a fundamental and crucial task for clinical
and translation research. In recent years, deep learning methods have achieved
significant success in CNER tasks. However, these methods depend heavily on
Recurrent Neural Networks (RNNs), which maintain a vector of hidden activations
that is propagated through time, making the models slow to train.
In this paper, we propose a Residual Dilated Convolutional Neural Network with
Conditional Random Field (RD-CNN-CRF) to address this problem. Specifically, Chinese
characters and dictionary features are first projected into dense vector
representations, which are then fed into a residual dilated convolutional
neural network to capture contextual features. Finally, a conditional random
field is employed to capture dependencies between neighboring tags.
Computational results on the CCKS-2017 Task 2 benchmark dataset show that our
proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based
methods both in terms of computational performance and training time.
Comment: 8 pages, 3 figures. Accepted as regular paper by 2018 IEEE
International Conference on Bioinformatics and Biomedicine. arXiv admin note:
text overlap with arXiv:1804.0501
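The residual dilated convolution named in the abstract above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' architecture: an odd kernel size, zero padding that preserves sequence length, a ReLU nonlinearity, and equal input and output channel counts so the skip connection can be added directly. The function names are hypothetical.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Length-preserving 1-D dilated convolution.

    x: (T, C_in) sequence of feature vectors.
    w: (K, C_in, C_out) kernel with odd K.
    """
    t = x.shape[0]
    k_size = w.shape[0]
    pad = dilation * (k_size - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))   # zero-pad along time only
    out = np.zeros((t, w.shape[2]))
    # Each kernel tap reads the input at a stride of `dilation` positions.
    for k in range(k_size):
        out += xp[k * dilation:k * dilation + t] @ w[k]
    return out

def residual_dilated_block(x, w, dilation):
    """Skip connection around a ReLU-activated dilated convolution."""
    return x + np.maximum(dilated_conv1d(x, w, dilation), 0.0)
```

Unlike a recurrent layer, every time step here is computed independently of the others, so the whole sequence can be processed in parallel, while larger dilations widen the context each output position sees.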
PerceptionGPT: Effectively Fusing Visual Perception into LLM
The integration of visual inputs with large language models (LLMs) has led to
remarkable advancements in multi-modal capabilities, giving rise to visual
large language models (VLLMs). However, effectively harnessing VLLMs for
intricate visual perception tasks remains a challenge. In this paper, we
present a novel end-to-end framework named PerceptionGPT, which efficiently and
effectively equips the VLLMs with visual perception abilities by leveraging the
representation power of LLMs' token embedding. Our proposed method treats the
token embedding of the LLM as the carrier of spatial information, then leverages
lightweight visual task encoders and decoders to perform visual perception
tasks (e.g., detection, segmentation). Our approach significantly alleviates
the training difficulty suffered by previous approaches that formulate the
visual outputs as discrete tokens, and achieves superior performance
with fewer trainable parameters, less training data, and shorter training time.
Moreover, as only one token embedding is required to decode the visual outputs,
the resulting sequence length during inference is significantly reduced.
Consequently, our approach enables accurate and flexible representations,
seamless integration of visual perception tasks, and efficient handling of
multiple visual outputs. We validate the effectiveness and efficiency of our
approach through extensive experiments. The results demonstrate significant
improvements over previous methods with far fewer trainable parameters and GPU
hours, which facilitates future research on equipping LLMs with visual
perception abilities.
Coordinated optimal control of secondary cooling and final electromagnetic stirring for continuous casting billets
Secondary cooling and final electromagnetic stirring (F-EMS) are both key technologies for continuous casting. Their parameters are usually optimized and controlled separately, which causes internal quality fluctuations under unsteady conditions. In this paper, a coordinated optimal control strategy is proposed for the parameter optimization of secondary cooling and F-EMS, and it is solved with a multiobjective particle swarm optimization (MOPSO) algorithm. A solidification and heat transfer model is developed to compute billet temperature and solidification, and the adaptive grid method is used to improve the diversity and robustness of the optimal solutions. The secondary cooling water and the F-EMS stirring current are dynamically controlled based on the optimization results. Field trials showed that the maximum carbon segregation and other quality indexes of the billets can be improved significantly.
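Multiobjective particle swarm optimizers typically maintain an archive of non-dominated candidate solutions. The sketch below shows only that Pareto-dominance filter, assuming every objective is minimized; the velocity updates and the adaptive grid are omitted, and the function names are hypothetical rather than taken from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]
```

For instance, with placeholder objective pairs `[(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]`, the point `(3.0, 3.0)` is dropped because `(2.0, 2.0)` is better in both objectives; the remaining three represent different trade-offs.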