
    Applications of Sparse Signal Recovery: 2D-Pattern Matching and Sparse Walsh-Hadamard Transform Computation

    We study two problems related to sparse signal recovery. The first is querying a sub-image of size M×M in a large image database of size N×N to determine all the locations where the sub-image appears. We use a sparse-graph-code based Fourier transform computation to find the peaks of the 2-D correlation, and hence the matching positions, in a computationally efficient manner. We then design a 2-D pattern that facilitates vision-based positioning by enabling the use of our algorithm for fast pattern matching. The second problem is the computation of the sparse Walsh-Hadamard transform (WHT) of binary data. We consider signals that are sparse in the Walsh-Hadamard transform domain and whose non-zero coefficients are all ones. A possible application of this algorithm is learning an undirected, unweighted graph from a sub-sampled version of its evaluations. We design an adaptive algorithm for sparse WHT computation. Adaptivity makes it possible to recover more than one non-zero coefficient aliased together in each iteration, so that faster recovery can be expected from the same number of sub-samples. We show that, for the same number of samples, the probability of error of the proposed algorithm is lower than that of earlier work.
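
    As a point of reference for the first problem, the dense baseline that the sparse-graph-code approach accelerates is FFT-based 2-D cross-correlation followed by peak detection. The NumPy sketch below is illustrative only; the function name, array sizes, and implementation are assumptions, not the paper's algorithm.

```python
import numpy as np

def match_template_fft(image, template):
    """Locate a template in an image via FFT-based cross-correlation.

    This is the dense O(N^2 log N) baseline; the paper's contribution is a
    sparse-graph-code transform that finds the correlation peaks faster.
    """
    N = image.shape[0]
    # Zero-mean the template so flat image regions do not dominate the score.
    t = template - template.mean()
    # Circular cross-correlation = IFFT( FFT(image) * conj(FFT(padded template)) ).
    F_img = np.fft.fft2(image, s=(N, N))
    F_tpl = np.fft.fft2(t, s=(N, N))
    corr = np.real(np.fft.ifft2(F_img * np.conj(F_tpl)))
    # The strongest peak gives the top-left corner of the best match.
    row, col = np.unravel_index(np.argmax(corr), corr.shape)
    return row, col

# Usage: embed a 16x16 patch in a 256x256 image and recover its position.
rng = np.random.default_rng(0)
img = rng.random((256, 256))
patch = img[40:56, 70:86]
print(match_template_fft(img, patch))  # expected: (40, 70)
```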

    Unsupervised Low Light Image Enhancement Using SNR-Aware Swin Transformer

    Images captured under low-light conditions present unpleasant artifacts that degrade feature extraction for many downstream visual tasks. Low-light image enhancement aims at improving brightness and contrast while reducing the noise that corrupts visual quality. Recently, many image restoration methods based on the Swin Transformer have been proposed and achieve impressive performance. However, on the one hand, naively employing the Swin Transformer for low-light image enhancement introduces artifacts such as over-exposure, brightness imbalance, and noise corruption. On the other hand, it is impractical to capture pairs of low-light images and their corresponding ground truth, i.e., well-exposed images of the same scene. In this paper, we propose a dual-branch network based on the Swin Transformer, guided by a signal-to-noise-ratio prior map that provides spatially varying information for low-light image enhancement. Moreover, we leverage unsupervised learning to construct an optimization objective based on the Retinex model to guide the training of the proposed network. Experimental results demonstrate that the proposed model is competitive with the baseline models.
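
    One simple way to form such an SNR prior map, sketched below as an assumption rather than the paper's exact construction, is to take a locally averaged copy of the low-light image as the signal estimate and the residual as the noise estimate.

```python
import numpy as np

def snr_prior_map(img, ksize=7, eps=1e-6):
    """Estimate a per-pixel SNR map for a grayscale image in [0, 1].

    The locally averaged image serves as the (denoised) signal estimate and the
    absolute residual as the noise estimate; their ratio tells the enhancement
    network which regions are reliable. Illustrative approximation only.
    """
    pad = ksize // 2
    padded = np.pad(img, pad, mode="reflect")
    kernel = np.ones(ksize) / ksize
    # Separable box blur: filter rows, then columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, blurred)
    noise = np.abs(img - blurred)
    snr = blurred / (noise + eps)
    # Normalise to [0, 1] so the map can be fed to the network as an extra channel.
    return snr / snr.max()

low_light = np.random.rand(64, 64) * 0.2   # stand-in for a dark image
prior = snr_prior_map(low_light)
print(prior.shape, float(prior.min()), float(prior.max()))
```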

    Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions

    Clinical Named Entity Recognition (CNER) aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records, which is a fundamental and crucial task for clinical and translational research. In recent years, deep learning methods have achieved significant success in CNER tasks. However, these methods depend heavily on Recurrent Neural Networks (RNNs), which maintain a vector of hidden activations propagated through time and therefore take a long time to train. In this paper, we propose a Residual Dilated Convolutional Neural Network with a Conditional Random Field (RD-CNN-CRF) to address this problem. Specifically, Chinese characters and dictionary features are first projected into dense vector representations, which are then fed into the residual dilated convolutional neural network to capture contextual features. Finally, a conditional random field is employed to capture dependencies between neighboring tags. Computational results on the CCKS-2017 Task 2 benchmark dataset show that the proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based methods in terms of both computational performance and training time.
    Comment: 8 pages, 3 figures. Accepted as a regular paper by the 2018 IEEE International Conference on Bioinformatics and Biomedicine. arXiv admin note: text overlap with arXiv:1804.0501
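
    A minimal PyTorch sketch of one residual dilated convolution block is given below; the channel sizes and dilation rates are illustrative assumptions, not the paper's configuration, and a CRF layer would sit on top of a stack of such blocks to decode the tag sequence.

```python
import torch
import torch.nn as nn

class ResidualDilatedBlock(nn.Module):
    """One residual block of dilated 1-D convolutions over a character sequence.

    Dilated convolutions enlarge the receptive field without recurrence, which
    is what lets a CNN tagger train much faster than RNN-based ones.
    Channel sizes and dilation rates below are illustrative assumptions.
    """
    def __init__(self, channels=128, kernel_size=3, dilations=(1, 2, 4)):
        super().__init__()
        layers = []
        for d in dilations:
            layers += [
                nn.Conv1d(channels, channels, kernel_size,
                          padding=d * (kernel_size - 1) // 2, dilation=d),
                nn.ReLU(),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x):           # x: (batch, channels, seq_len)
        return x + self.body(x)     # residual connection keeps gradients stable

# Usage: a batch of 2 sentences, 50 characters each, 128-dim embeddings.
emb = torch.randn(2, 128, 50)
out = ResidualDilatedBlock()(emb)
print(out.shape)  # torch.Size([2, 128, 50])
```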

    PerceptionGPT: Effectively Fusing Visual Perception into LLM

    The integration of visual inputs with large language models (LLMs) has led to remarkable advancements in multi-modal capabilities, giving rise to visual large language models (VLLMs). However, effectively harnessing VLLMs for intricate visual perception tasks remains a challenge. In this paper, we present a novel end-to-end framework named PerceptionGPT, which efficiently and effectively equips VLLMs with visual perception abilities by leveraging the representation power of the LLM's token embeddings. Our proposed method treats a token embedding of the LLM as the carrier of spatial information and then leverages lightweight visual task encoders and decoders to perform visual perception tasks (e.g., detection, segmentation). Our approach significantly alleviates the training difficulty suffered by previous approaches that formulate the visual outputs as discrete tokens, and achieves superior performance with fewer trainable parameters, less training data, and shorter training time. Moreover, as only one token embedding is required to decode the visual outputs, the resulting sequence length during inference is significantly reduced. Consequently, our approach enables accurate and flexible representations, seamless integration of visual perception tasks, and efficient handling of multiple visual outputs. We validate the effectiveness and efficiency of our approach through extensive experiments. The results demonstrate significant improvements over previous methods with far fewer trainable parameters and GPU hours, which facilitates future research on equipping LLMs with visual perception abilities.
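
    The central mechanism, decoding a dense visual output from a single LLM token embedding through a lightweight task head, can be sketched as follows; the hidden size, head layout, and class name are assumptions for illustration, not the released PerceptionGPT architecture.

```python
import torch
import torch.nn as nn

class BoxDecoder(nn.Module):
    """Lightweight head mapping one LLM token embedding to a bounding box.

    The LLM emits a single special token whose hidden state carries the spatial
    information; this small MLP decodes it into (cx, cy, w, h) in [0, 1].
    Dimensions are illustrative assumptions.
    """
    def __init__(self, hidden_size=4096, mid=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, mid),
            nn.GELU(),
            nn.Linear(mid, 4),
        )

    def forward(self, token_embedding):                   # (batch, hidden_size)
        return torch.sigmoid(self.mlp(token_embedding))   # normalised box coords

# Usage: decode a box from one 4096-dim token embedding.
h = torch.randn(1, 4096)
print(BoxDecoder()(h))  # tensor of shape (1, 4) with values in [0, 1]
```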

    Coordinated optimal control of secondary cooling and final electromagnetic stirring for continuous casting billets

    Secondary cooling and final electromagnetic stirring (F-EMS) are both key technologies for continuous casting. Their parameters are usually optimized and controlled separately, which causes internal-quality fluctuations under unsteady casting conditions. In this paper, a coordinated optimal control strategy is proposed for the joint parameter optimization of secondary cooling and F-EMS, and the resulting problem is solved with a multiobjective particle swarm optimization (MOPSO) algorithm. A solidification and heat-transfer model is developed to compute the billet temperature and solidification profile, and an adaptive grid method is used to improve the diversity and robustness of the optimal solutions. The secondary cooling water and the F-EMS stirring current are then dynamically controlled according to the optimization results. Field trials show that the maximum carbon segregation and other billet quality indexes can be improved significantly.
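
    For readers unfamiliar with MOPSO, the sketch below shows a bare-bones multiobjective particle swarm with a Pareto archive. The objective function, coefficients, and archive size cap are illustrative assumptions; the paper's adaptive grid archive maintenance is replaced here by a crude random cap.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation)."""
    return np.all(a <= b) and np.any(a < b)

def mopso(objective, dim, bounds, n_particles=30, iters=50, seed=0):
    """Bare-bones multiobjective PSO returning a Pareto archive of (x, f(x))."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([objective(p) for p in x])
    archive = [(p.copy(), f.copy()) for p, f in zip(x, pbest_f)]

    for _ in range(iters):
        # Prune dominated points, then cap the archive size (stand-in for the adaptive grid).
        archive = [(p, f) for p, f in archive
                   if not any(dominates(g, f) for _, g in archive if g is not f)]
        if len(archive) > 100:
            keep = rng.choice(len(archive), 100, replace=False)
            archive = [archive[j] for j in keep]
        for i in range(n_particles):
            leader = archive[rng.integers(len(archive))][0]  # random global guide
            r1, r2 = rng.random(dim), rng.random(dim)
            v[i] = 0.5 * v[i] + 1.5 * r1 * (pbest[i] - x[i]) + 1.5 * r2 * (leader - x[i])
            x[i] = np.clip(x[i] + v[i], lo, hi)
            f = objective(x[i])
            if dominates(f, pbest_f[i]):                 # keep the better personal best
                pbest[i], pbest_f[i] = x[i].copy(), f
            archive.append((x[i].copy(), f.copy()))
    # Final prune so only nondominated candidates are returned.
    return [(p, f) for p, f in archive
            if not any(dominates(g, f) for _, g in archive if g is not f)]

# Toy stand-in objectives: trade off "cooling water use" against "centre segregation".
def toy_objective(params):
    water, current = params
    return np.array([water ** 2 + 0.1 * current,
                     (water - 1.0) ** 2 + (current - 1.0) ** 2])

front = mopso(toy_objective, dim=2, bounds=(0.0, 2.0))
print(len(front), "archived candidate operating points")
```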