685 research outputs found
GPU-Accelerated BWT Construction for Large Collection of Short Reads
Advances in DNA sequencing technology have stimulated the development of
algorithms and tools for processing very large collections of short strings
(reads). Short-read alignment and assembly are among the most well-studied
problems. Many state-of-the-art aligners use, at their core, the
Burrows-Wheeler transform (BWT) as a main-memory index of a reference genome
(a typical example being the NCBI human genome). Recently, the BWT has also found use in
string-graph assembly, for indexing the reads (i.e., raw data from DNA
sequencers). In a typical data set, the volume of reads is tens of times that of the
sequenced genome and can reach 100 Gigabases. Note that a reference genome
is relatively stable, so computing its index is an infrequent task; for reads,
the index has to be computed from scratch for each input. Efficient
BWT construction therefore becomes a much bigger concern than before. In this
paper, we present a practical method called CX1 for constructing the BWT of
very large string collections. CX1 is the first tool that can exploit
the parallelism of a graphics processing unit (GPU, a relatively cheap
device providing a thousand or more primitive cores) while simultaneously
using the parallelism of a multi-core CPU and, more interestingly, of a cluster of
GPU-enabled nodes. Using CX1, the BWT of a short-read collection of up to 100
Gigabases can be constructed in less than 2 hours using a machine equipped with
a quad-core CPU and a GPU, or in about 43 minutes using a cluster with 4 such
machines (the speedup is almost linear after excluding the first 16 minutes for
loading the reads from the hard disk). The previously fastest tool, BRC, was
measured to take 12 hours to process 100 Gigabases on one machine; it is
non-trivial to parallelize BRC to take advantage of a cluster of
machines, let alone GPUs.
Comment: 11 pages
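For intuition, here is a naive Python sketch of the Burrows-Wheeler transform of a single string (with an appended sentinel). It only shows what is being computed: CX1's contribution is performing the equivalent suffix sorting over ~100-Gigabase read collections in parallel on GPUs, multi-core CPUs, and clusters, which this toy rotation-sorting version does not attempt.

```python
# Naive BWT via sorted rotations -- O(n^2 log n); for intuition only.
def bwt(s: str, sentinel: str = "$") -> str:
    """Return the BWT of s: the last column of its sorted rotation matrix."""
    t = s + sentinel                                   # sentinel marks the end
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)

if __name__ == "__main__":
    print(bwt("ACGTACG"))  # -> GT$AACCG
```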
MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
MEGAHIT is an NGS de novo assembler for assembling large and complex
metagenomics data in a time- and cost-efficient manner. It finished assembling
a 252-Gbp soil metagenomics dataset in 44.1 hours and 99.6 hours on a
single computing node with and without a GPU, respectively. MEGAHIT assembles
the data as a whole, i.e., it avoids pre-processing such as partitioning and
normalization, which might compromise result integrity. MEGAHIT generates a 3
times larger assembly, with longer contig N50 and average contig length, than
the previous assembly; 55.8% of the reads were aligned to the assembly,
4 times higher than for the previous one. The source code of MEGAHIT is freely
available at https://github.com/voutcn/megahit under the GPLv3 license.
Comment: 2 pages, 2 tables, 1 figure, submitted to Oxford Bioinformatics as an Application Note
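For readers unfamiliar with the data structure named in the title, the sketch below builds a plain hash-based de Bruijn graph from reads. MEGAHIT's point is that its graph is succinct (BWT/FM-index based), which is what lets a 252-Gbp dataset fit on one node; the dict-of-sets representation here is purely illustrative.

```python
# Toy de Bruijn graph: (k-1)-mer nodes, edges from overlapping k-mers.
from collections import defaultdict

def build_debruijn(reads, k=4):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])    # edge: k-mer prefix -> suffix
    return graph

if __name__ == "__main__":
    g = build_debruijn(["ACGTACGT", "CGTACGTT"])
    for node, successors in sorted(g.items()):
        print(node, "->", sorted(successors))
```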
Hidden Trends in 90 Years of Harvard Business Review
In this paper, we present and discuss the results of mining the abstracts
of publications in Harvard Business Review between 1922 and 2012.
Techniques for computing n-grams, collocations, basic sentiment analysis, and
named-entity recognition were employed to uncover trends hidden in the
abstracts. We present findings about international relationships, sentiment in
HBR's abstracts, important international companies, influential technological
inventions, renowned researchers in management theories, and US presidents, via
chronological analyses.
Comment: 6 pages, 14 figures, Proceedings of the 2012 International Conference on Technologies and Applications of Artificial Intelligence
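Of the techniques listed, n-gram counting is the easiest to make concrete. A minimal sketch follows; the toy abstracts and the whitespace tokenization are stand-ins, not the paper's actual pipeline.

```python
# Count bigrams across a corpus of abstracts to surface recurring phrases.
from collections import Counter

def ngrams(tokens, n=2):
    """Yield n-gram tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

abstracts = [
    "management theory and practice",
    "the practice of management theory",
]
counts = Counter()
for text in abstracts:
    counts.update(ngrams(text.lower().split(), n=2))
print(counts.most_common(3))  # top bigrams, e.g. ('management', 'theory')
```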
Explicit Visual Prompting for Universal Foreground Segmentations
Foreground segmentation is a fundamental problem in computer vision, which
includes salient object detection, forgery detection, defocus blur detection,
shadow detection, and camouflaged object detection. Previous works have
typically relied on domain-specific solutions to address accuracy and
robustness issues in those applications. In this paper, we present a unified
framework for a number of foreground segmentation tasks without any
task-specific designs. We take inspiration from the widely-used pre-training
and then prompt tuning protocols in NLP and propose a new visual prompting
model, named Explicit Visual Prompting (EVP). Different from previous
visual prompting, which is typically a dataset-level implicit embedding, our key
insight is to make the tunable parameters focus on the explicit visual
content of each individual image, i.e., the features from frozen patch
embeddings and high-frequency components. Our method freezes a pre-trained
model and then learns task-specific knowledge using a few extra parameters.
Despite introducing only a small number of tunable parameters, EVP achieves
superior performance to full fine-tuning and other parameter-efficient
fine-tuning methods. Experiments on fourteen datasets across five tasks show
that the proposed method outperforms task-specific methods while being
considerably simpler. The proposed method also demonstrates scalability across
different architectures, pre-trained weights, and tasks. The code is available
at: https://github.com/NiFangBaAGe/Explicit-Visual-Prompt.
Comment: arXiv admin note: substantial text overlap with arXiv:2303.1088
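To make the tuning recipe concrete, here is a hedged PyTorch-style sketch of the core idea: a frozen backbone plus a few tunable parameters driven by per-image cues. The module names, shapes, and the simple adapter used as the prompting module are illustrative assumptions, not the paper's exact architecture; see the linked repository for the real implementation.

```python
import torch
import torch.nn as nn

class ExplicitPrompter(nn.Module):
    """Tiny tunable adapter driven by explicit per-image content."""
    def __init__(self, embed_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.down = nn.Linear(embed_dim, hidden_dim)   # the only trainable
        self.up = nn.Linear(hidden_dim, embed_dim)     # parameters

    def forward(self, patch_embed, high_freq):
        # Prompt computed from the image's own patch embeddings
        # plus its high-frequency components.
        return self.up(torch.relu(self.down(patch_embed + high_freq)))

backbone = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False                  # pre-trained weights stay frozen

prompter = ExplicitPrompter(768)             # few extra tunable parameters
patches = torch.randn(2, 196, 768)           # frozen patch embeddings (dummy)
high_freq = torch.randn(2, 196, 768)         # high-frequency components (dummy)
out = backbone(patches + prompter(patches, high_freq))
print(out.shape)                             # torch.Size([2, 196, 768])
```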
AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition
Raw videos have been shown to contain considerable feature redundancy: in
many cases, only a portion of the frames is already sufficient for
accurate recognition. In this paper, we are interested in whether such
redundancy can be effectively leveraged to facilitate efficient inference in
continuous sign language recognition (CSLR). We propose a novel adaptive model
(AdaBrowse) to dynamically select the most informative subsequence from input
video sequences, by modeling this problem as a sequential decision task.
Specifically, we first utilize a lightweight network to quickly scan input videos
to extract coarse features. Then these features are fed into a policy network
to intelligently select a subsequence to process. The corresponding subsequence
is finally processed by a normal CSLR model for sentence prediction. As only a
portion of the frames is processed in this procedure, considerable computation
can be saved. Besides temporal redundancy, we are also interested in
whether the inherent spatial redundancy can be exploited as well
to achieve further efficiency, i.e., by dynamically selecting the lowest feasible input
resolution for each sample; the resulting model is referred to as AdaBrowse+. Extensive
experimental results on four large-scale CSLR datasets, i.e., PHOENIX14,
PHOENIX14-T, CSL-Daily and CSL, demonstrate the effectiveness of AdaBrowse and
AdaBrowse+, which achieve accuracy comparable to state-of-the-art methods with
1.44× higher throughput and 2.12× fewer FLOPs. Comparisons with other
commonly used 2D CNNs and adaptive efficiency methods further verify the effectiveness
of AdaBrowse. Code is available at
https://github.com/hulianyuyy/AdaBrowse.
Comment: ACMMM2023
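A simplified sketch of the decision pipeline described above: a lightweight network scans the video cheaply, a policy head picks how much of it to keep, and only that subsequence would go through the heavy CSLR model. The tiny linear modules, the prefix-based selection, and the four candidate lengths are placeholder assumptions, not the paper's networks.

```python
import torch
import torch.nn as nn

class AdaBrowseSketch(nn.Module):
    def __init__(self, feat_dim=64, num_lengths=4):
        super().__init__()
        self.scanner = nn.Linear(3 * 16 * 16, feat_dim)  # cheap per-frame features
        self.policy = nn.Linear(feat_dim, num_lengths)   # pick a subsequence length

    def forward(self, frames):               # frames: (T, 3, 16, 16) low-res scan
        T = frames.shape[0]
        coarse = self.scanner(frames.flatten(1)).mean(0)  # pooled coarse feature
        # Argmax at inference; trained as a sequential decision task
        # (e.g., with a differentiable relaxation such as Gumbel-softmax).
        choice = self.policy(coarse).argmax().item()
        keep = max(1, (choice + 1) * T // 4)
        return frames[:keep]                 # only these frames reach the CSLR model

frames = torch.randn(32, 3, 16, 16)
selected = AdaBrowseSketch()(frames)
print(selected.shape[0], "of", frames.shape[0], "frames kept")
```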
COMMA: Co-Articulated Multi-Modal Learning
Pretrained large-scale vision-language models such as CLIP have demonstrated
excellent generalizability over a series of downstream tasks. However, they are
sensitive to the variation of input text prompts and need a selection of prompt
templates to achieve satisfactory performance. Recently, various methods have
been proposed to dynamically learn the prompts as textual inputs, avoiding
laborious hand-crafted prompt engineering in the fine-tuning
process. We notice that these methods are suboptimal in two respects. First, the
prompts of the vision and language branches in these methods are usually
separated or uni-directionally correlated. Thus, the prompts of both branches
are not fully correlated and may not provide enough guidance to align the
representations of both branches. Second, it is observed that most previous
methods usually achieve better performance on seen classes but suffer
performance degradation on unseen classes compared to CLIP. This is because
the essential generic knowledge learned in the pretraining stage is partly
forgotten during fine-tuning. In this paper, we propose Co-Articulated
Multi-Modal Learning (COMMA) to address the above limitations. In particular, our
method generates the prompts of each branch by considering the prompts of both
branches, so as to enhance the representation alignment between them. In addition,
to alleviate forgetting of the essential knowledge, we minimize the feature
discrepancy between the learned prompts and the embeddings of hand-crafted
prompts in the pre-trained CLIP in the late transformer layers. We evaluate our
method across three representative tasks: generalization to novel classes, to
new target datasets, and to unseen domain shifts. Experimental results
demonstrate the superiority of
our method, which exhibits a favorable performance boost on all tasks with high
efficiency.
Comment: Accepted to AAAI2024. Code is available at https://github.com/hulianyuyy/COMMA
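The anti-forgetting term is the easiest part to write down. Below is a hedged sketch: pull the learned prompt features toward the frozen CLIP embeddings of hand-crafted prompts in the late transformer layers. The L1 distance, tensor shapes, and layer count are assumptions for illustration; the paper's repository has the exact formulation.

```python
import torch
import torch.nn.functional as F

def discrepancy_loss(learned_feats, handcrafted_feats):
    """Feature discrepancy between learned prompts and frozen hand-crafted ones.

    Both tensors: (num_late_layers, dim), taken from the late transformer
    layers; the hand-crafted side comes from pre-trained CLIP and is frozen.
    """
    return F.l1_loss(learned_feats, handcrafted_feats.detach())

learned = torch.randn(3, 512, requires_grad=True)   # e.g., last 3 layers
frozen = torch.randn(3, 512)                        # from "a photo of a {class}"
loss = discrepancy_loss(learned, frozen)
loss.backward()                       # gradients reach only the learned prompts
```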
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
Zero-shot Video Object Segmentation (ZSVOS) aims at segmenting the primary
moving object without any human annotations. Mainstream solutions mainly focus
on learning a single model on large-scale video datasets, which struggles to
generalize to unseen videos. In this work, we introduce a test-time training
(TTT) strategy to address this problem. Our key insight is to require the model
to predict consistent depth during the TTT process. In detail, we first train a
single network to perform both segmentation and depth prediction tasks. This
can be effectively learned with our specifically designed depth modulation
layer. Then, for the TTT process, the model is updated by predicting consistent
depth maps for the same frame under different data augmentations. In addition,
we explore different TTT weight updating strategies. Our empirical results
suggest that the momentum-based weight initialization and looping-based
training scheme lead to more stable improvements. Experiments show that the
proposed method achieves clear improvements on ZSVOS. Our proposed video TTT
strategy provides significant superiority over state-of-the-art TTT methods.
Our code is available at: https://nifangbaage.github.io/DATTT.
Comment: Accepted by CVPR 2024
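A minimal, self-contained sketch of the adaptation loop: update the jointly trained network so its depth predictions agree across augmented views of the test frame, then segment. The stand-in network, additive-noise augmentation, and plain SGD loop are simplifying assumptions; the paper's momentum-based initialization and looping scheme are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegDepthNet(nn.Module):
    """Stand-in for the joint segmentation + depth network."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3, padding=1)
        self.mask_head = nn.Conv2d(8, 1, 1)
        self.depth_head = nn.Conv2d(8, 1, 1)

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        return {"mask": self.mask_head(h), "depth": self.depth_head(h)}

def test_time_train(model, frame, steps=3, lr=1e-4):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        # Two augmented views (noise as a placeholder augmentation).
        v1, v2 = (frame + 0.05 * torch.randn_like(frame) for _ in range(2))
        loss = F.l1_loss(model(v1)["depth"], model(v2)["depth"])  # depth consistency
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model(frame)["mask"]               # segment after adaptation

mask = test_time_train(SegDepthNet(), torch.randn(1, 3, 64, 64))
print(mask.shape)                             # torch.Size([1, 1, 64, 64])
```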
Boosting Few-Shot Semantic Segmentation Via Segment Anything Model
In semantic segmentation, accurate prediction masks are crucial for
downstream tasks such as medical image analysis and image editing. Due to the
lack of annotated data, few-shot semantic segmentation (FSS) performs poorly in
predicting masks with precise contours. Recently, we have noticed that the
large foundation Segment Anything Model (SAM) performs well at processing
detailed features. Inspired by SAM, we propose FSS-SAM to boost FSS methods by
addressing the issue of inaccurate contours. FSS-SAM is training-free: it
works as a post-processing tool for any FSS method and can improve the
accuracy of predicted masks. Specifically, we use predicted masks from FSS
methods to generate prompts and then use SAM to predict new masks. To avoid
predicting wrong masks with SAM, we propose a prediction result selection (PRS)
algorithm. The algorithm can markedly reduce wrong predictions. Experimental
results on public datasets show that our method is superior to base FSS methods
in both quantitative and qualitative terms.
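A training-free post-processing sketch in the spirit of the pipeline above: derive a box prompt from the FSS mask, ask SAM for a refined mask, and keep it only if it agrees with the original. The IoU threshold rule is a stand-in for the paper's PRS algorithm, and `sam_predict` is a hypothetical stub, not the real SAM API.

```python
import numpy as np

def mask_to_box(mask):
    """Tight bounding box (x0, y0, x1, y1) around a binary mask."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def iou(a, b):
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def refine_with_sam(fss_mask, image, sam_predict, iou_thresh=0.5):
    box = mask_to_box(fss_mask)          # prompt derived from the FSS prediction
    sam_mask = sam_predict(image, box)   # refined mask from SAM (stubbed here)
    # Selection rule (PRS stand-in): accept SAM's mask only if it is
    # consistent with the FSS mask; otherwise keep the original.
    return sam_mask if iou(fss_mask, sam_mask) >= iou_thresh else fss_mask

if __name__ == "__main__":
    image = np.zeros((8, 8, 3))
    fss = np.zeros((8, 8), dtype=bool)
    fss[2:6, 2:6] = True
    dummy_sam = lambda img, box: fss     # stub that just echoes the FSS mask
    print(refine_with_sam(fss, image, dummy_sam).sum())  # 16
```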