    GPU-Accelerated BWT Construction for Large Collection of Short Reads

    Advances in DNA sequencing technology have stimulated the development of algorithms and tools for processing very large collections of short strings (reads). Short-read alignment and assembly are among the most well-studied problems. Many state-of-the-art aligners use, at their core, the Burrows-Wheeler transform (BWT) as a main-memory index of a reference genome (a typical example is the NCBI human genome). Recently, the BWT has also found use in string-graph assembly, for indexing the reads (i.e., the raw data from DNA sequencers). In a typical data set, the volume of reads is tens of times the size of the sequenced genome and can reach 100 Gigabases. A reference genome is relatively stable, so computing its index is an infrequent task; for reads, however, the index has to be computed from scratch for each input, making efficient BWT construction a much bigger concern than before. In this paper, we present a practical method called CX1 for constructing the BWT of very large string collections. CX1 is the first tool that can exploit the parallelism of a graphics processing unit (GPU, a relatively cheap device providing a thousand or more primitive cores) simultaneously with the parallelism of a multi-core CPU and, more interestingly, of a cluster of GPU-enabled nodes. Using CX1, the BWT of a short-read collection of up to 100 Gigabases can be constructed in less than 2 hours on a machine equipped with a quad-core CPU and a GPU, or in about 43 minutes on a cluster of 4 such machines (the speedup is almost linear after excluding the first 16 minutes spent loading the reads from disk). The previously fastest tool, BRC, was measured to take 12 hours to process 100 Gigabases on one machine; it is non-trivial to parallelize BRC across a cluster of machines, let alone GPUs. Comment: 11 pages
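    For readers unfamiliar with the transform itself, the following is a minimal, naive Python sketch of what CX1 computes: the BWT of a read collection via plain suffix sorting. A single shared '$' sentinel is a simplification (multi-string BWTs typically use distinct per-read sentinels), and CX1's GPU-parallel construction is far more elaborate; this only illustrates the output.

```python
def bwt_of_collection(reads):
    """Naive BWT of a string collection: join the reads with a sentinel,
    sort all suffixes of the concatenation, and emit the character that
    precedes each suffix (wrapping at position 0)."""
    text = "$".join(reads) + "$"          # '$' sorts before A/C/G/T
    n = len(text)
    # O(n^2 log n) comparison sort -- fine for a demo, hopeless at 100 Gbases.
    suffixes = sorted(range(n), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in suffixes)  # text[-1] wraps to the end

if __name__ == "__main__":
    print(bwt_of_collection(["ACGT", "ACCA", "GTGT"]))
```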

    MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

    MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a 252-Gbp soil metagenomics dataset in 44.1 hours with a GPU and 99.6 hours without one, on a single computing node. MEGAHIT assembles the data as a whole, i.e., it avoids pre-processing such as partitioning and normalization, which might compromise result integrity. MEGAHIT generates an assembly 3 times larger, with longer contig N50 and average contig length, than the previous assembly; 55.8% of the reads aligned to the assembly, 4 times higher than before. The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under the GPLv3 license. Comment: 2 pages, 2 tables, 1 figure, submitted to Oxford Bioinformatics as an Application Note
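    For context, the hash-table de Bruijn graph sketched below is the naive baseline representation; MEGAHIT's contribution is replacing it with a succinct, BWT-based encoding compact enough to hold a large metagenome in memory. This function is purely illustrative and is not MEGAHIT's code.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Plain dict-based de Bruijn graph: nodes are (k-1)-mers and every
    k-mer occurring in a read contributes a directed edge."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])   # prefix node -> suffix node
    return graph

if __name__ == "__main__":
    print(dict(de_bruijn_graph(["ACGTAC", "CGTACG"], 4)))
```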

    Hidden Trends in 90 Years of Harvard Business Review

    In this paper, we demonstrate and discuss the results of mining the abstracts of publications in Harvard Business Review between 1922 and 2012. Techniques for computing n-grams, collocations, basic sentiment analysis, and named-entity recognition were employed to uncover trends hidden in the abstracts. Through chronological analyses, we present findings about international relationships, sentiment in HBR's abstracts, important international companies, influential technological inventions, renowned researchers in management theories, and US presidents. Comment: 6 pages, 14 figures, Proceedings of the 2012 International Conference on Technologies and Applications of Artificial Intelligence
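    A minimal sketch of the n-gram counting step the abstract mentions. The two-line corpus below is a toy placeholder, not HBR data, and real pipelines would add tokenization, stop-word handling, and collocation scoring.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy stand-in for the HBR abstracts.
abstracts = [
    "managers should focus on core business value",
    "core business value drives long term growth",
]

bigrams = Counter()
for text in abstracts:
    bigrams.update(ngrams(text.lower().split(), 2))

print(bigrams.most_common(3))   # e.g. ('core', 'business') appears twice
```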

    Explicit Visual Prompting for Universal Foreground Segmentations

    Foreground segmentation is a fundamental problem in computer vision that includes salient object detection, forgery detection, defocus blur detection, shadow detection, and camouflaged object detection. Previous works have typically relied on domain-specific solutions to address accuracy and robustness issues in those applications. In this paper, we present a unified framework for a number of foreground segmentation tasks without any task-specific designs. We take inspiration from the widely used pre-train-then-prompt-tune protocols in NLP and propose a new visual prompting model, named Explicit Visual Prompting (EVP). Unlike previous visual prompting, which is typically a dataset-level implicit embedding, our key insight is to have the tunable parameters focus on the explicit visual content of each individual image, i.e., the features from frozen patch embeddings and high-frequency components. Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters. Despite introducing only a small number of tunable parameters, EVP achieves superior performance to full fine-tuning and other parameter-efficient fine-tuning methods. Experiments on fourteen datasets across five tasks show that the proposed method outperforms task-specific methods while being considerably simpler. The proposed method also demonstrates scalability across different architectures, pre-trained weights, and tasks. The code is available at: https://github.com/NiFangBaAGe/Explicit-Visual-Prompt. Comment: arXiv admin note: substantial text overlap with arXiv:2303.1088
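    A rough sketch of the two ingredients the abstract names, i.e. extracting high-frequency image content and tuning a small adapter on frozen patch embeddings, assuming a ViT-style backbone. Module names and sizes here are hypothetical; EVP's actual architecture differs in detail.

```python
import torch
import torch.nn as nn

def high_frequency(x, ratio=0.25):
    """Zero out a central (low-frequency) square of the shifted 2-D
    Fourier spectrum, keeping only high-frequency content. x: (B, C, H, W)."""
    f = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    B, C, H, W = x.shape
    h, w = int(H * ratio) // 2, int(W * ratio) // 2
    f[..., H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w] = 0
    return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real

class Adapter(nn.Module):
    """Small tunable bottleneck added to frozen patch embeddings; only
    these few parameters are trained while the backbone stays frozen."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, tokens):                      # tokens: (B, N, dim)
        return tokens + self.up(torch.relu(self.down(tokens)))
```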

    AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition

    Raw videos have been shown to contain considerable feature redundancy: in many cases only a portion of the frames is needed for accurate recognition. In this paper, we investigate whether such redundancy can be effectively leveraged to facilitate efficient inference in continuous sign language recognition (CSLR). We propose a novel adaptive model (AdaBrowse) that dynamically selects the most informative subsequence of an input video by modelling the problem as a sequential decision task. Specifically, we first utilize a lightweight network to quickly scan the input video and extract coarse features. These features are fed into a policy network that selects a subsequence to process, and that subsequence is finally inferred by a normal CSLR model for sentence prediction. Since only a portion of the frames is processed in this procedure, total computation is considerably reduced. Beyond temporal redundancy, we also examine whether the inherent spatial redundancy can be seamlessly integrated for further efficiency by dynamically selecting the lowest input resolution for each sample; this model is referred to as AdaBrowse+. Extensive experimental results on four large-scale CSLR datasets, i.e., PHOENIX14, PHOENIX14-T, CSL-Daily and CSL, demonstrate the effectiveness of AdaBrowse and AdaBrowse+, which achieve accuracy comparable to state-of-the-art methods with 1.44× the throughput and 2.12× fewer FLOPs. Comparisons with other commonly used 2D CNNs and adaptive efficient methods further verify the effectiveness of AdaBrowse. Code is available at https://github.com/hulianyuyy/AdaBrowse. Comment: ACMMM202
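    A minimal sketch of the scan-then-select inference pattern the abstract describes. `light` and `heavy` are hypothetical stand-in modules, and AdaBrowse trains its selector as a sequential decision (policy) problem rather than the simple top-k shortcut used here.

```python
import torch
import torch.nn as nn

class AdaptiveBrowser(nn.Module):
    """Cheap scan -> frame selection -> expensive recognition."""
    def __init__(self, light, heavy, k=16):
        super().__init__()
        self.light, self.heavy, self.k = light, heavy, k

    def forward(self, video):                       # video: (T, C, H, W)
        scores = self.light(video).squeeze(-1)      # (T,) coarse per-frame scores
        k = min(self.k, video.shape[0])
        idx = scores.topk(k).indices.sort().values  # keep temporal order
        return self.heavy(video[idx])               # full cost on k frames only
```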

    COMMA: Co-Articulated Multi-Modal Learning

    Pretrained large-scale vision-language models such as CLIP have demonstrated excellent generalizability over a series of downstream tasks. However, they are sensitive to variation in the input text prompts and need careful selection of prompt templates to achieve satisfactory performance. Recently, various methods have been proposed to learn the prompts dynamically as textual inputs, avoiding laborious hand-crafted prompt engineering in the fine-tuning process. We observe that these methods are suboptimal in two respects. First, the prompts of the vision and language branches are usually separate or only uni-directionally correlated, so they may not be fully correlated and may not provide enough guidance to align the representations of the two branches. Second, most previous methods achieve better performance on seen classes but degrade on unseen classes compared to CLIP, because the essential generic knowledge learned in the pretraining stage is partly forgotten during fine-tuning. In this paper, we propose Co-Articulated Multi-Modal Learning (COMMA) to address these limitations. In particular, our method generates the prompts of each branch by considering the prompts of both branches, enhancing the alignment of the two branches' representations. In addition, to alleviate forgetting of essential knowledge, we minimize the feature discrepancy between the learned prompts and the embeddings of hand-crafted prompts from the pre-trained CLIP in the late transformer layers. We evaluate our method on three representative tasks: generalization to novel classes, new target datasets, and unseen domain shifts. Experimental results demonstrate the superiority of our method, which delivers a favorable performance boost on all tasks with high efficiency. Comment: Accepted to AAAI2024. Code is available at https://github.com/hulianyuyy/COMM
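    A speculative sketch of the two ideas in the abstract: coupling the prompts of both branches, and pulling learned prompts toward frozen hand-crafted CLIP embeddings to limit forgetting. All names, shapes, and the MSE choice are assumptions, not COMMA's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoArticulatedPrompts(nn.Module):
    """Vision prompts are generated from the prompts of both branches;
    a retention loss keeps learned text prompts near CLIP's hand-crafted
    prompt embeddings."""
    def __init__(self, dim, n_prompts):
        super().__init__()
        self.text_prompts = nn.Parameter(0.02 * torch.randn(n_prompts, dim))
        self.couple = nn.Linear(2 * dim, dim)       # mixes the two branches

    def vision_prompts(self, vision_ctx):           # vision_ctx: (n_prompts, dim)
        joint = torch.cat([self.text_prompts, vision_ctx], dim=-1)
        return self.couple(joint)

    def retention_loss(self, handcrafted):          # frozen CLIP embeddings
        return F.mse_loss(self.text_prompts, handcrafted)
```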

    Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

    Zero-shot Video Object Segmentation (ZSVOS) aims at segmenting the primary moving object without any human annotations. Mainstream solutions focus on learning a single model on large-scale video datasets, which struggles to generalize to unseen videos. In this work, we introduce a test-time training (TTT) strategy to address the problem. Our key insight is to require the model to predict consistent depth during the TTT process. In detail, we first train a single network to perform both segmentation and depth prediction, learned effectively through our specifically designed depth modulation layer. Then, during TTT, the model is updated by enforcing consistent depth maps for the same frame under different data augmentations. We also explore different TTT weight-updating strategies; our empirical results suggest that momentum-based weight initialization and a looping-based training scheme lead to more stable improvements. Experiments show that the proposed method achieves clear improvements on ZSVOS, and our video TTT strategy significantly outperforms state-of-the-art TTT methods. Our code is available at: https://nifangbaage.github.io/DATTT. Comment: Accepted by CVPR 202
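    A minimal per-frame TTT loop in the spirit of the abstract: update the model so its depth predictions agree across two augmented views, then segment. It assumes a model returning (mask, depth) and an L1 consistency loss; these, and all names, are illustrative rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def test_time_adapt(model, frame, augment, optimizer, steps=5):
    """Adapt on a single test frame via depth consistency, then predict."""
    model.train()
    for _ in range(steps):
        _, d1 = model(augment(frame))               # view 1
        _, d2 = model(augment(frame))               # view 2 (different augmentation)
        loss = F.l1_loss(d1, d2)                    # depth-consistency objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        mask, _ = model(frame)
    return mask
```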

    Boosting Few-Shot Semantic Segmentation Via Segment Anything Model

    In semantic segmentation, accurate prediction masks are crucial for downstream tasks such as medical image analysis and image editing. Due to the lack of annotated data, few-shot semantic segmentation (FSS) performs poorly at predicting masks with precise contours. Recently, the large foundation model Segment Anything Model (SAM) has been shown to handle detailed features well. Inspired by SAM, we propose FSS-SAM to boost FSS methods by addressing the issue of inaccurate contours. FSS-SAM is training-free: it works as a post-processing tool for any FSS method and can improve the accuracy of the predicted masks. Specifically, we use the masks predicted by an FSS method to generate prompts and then use SAM to predict new masks. To avoid SAM producing wrong masks, we propose a prediction result selection (PRS) algorithm, which remarkably decreases wrong predictions. Experimental results on public datasets show that our method is superior to base FSS methods in both quantitative and qualitative terms.
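    A sketch of this post-processing idea using Meta's public segment-anything package: prompt SAM with the FSS mask's bounding box, then keep SAM's output only if it sufficiently overlaps the original mask. The IoU gate is a crude stand-in for the paper's PRS algorithm, and the 0.5 threshold is an assumption.

```python
import numpy as np
from segment_anything import SamPredictor  # Meta's public SAM package

def refine_with_sam(image, fss_mask, predictor: SamPredictor, iou_keep=0.5):
    """Refine a boolean FSS mask with SAM; fall back to the original mask
    when SAM's prediction disagrees too much with it."""
    ys, xs = np.nonzero(fss_mask)
    if xs.size == 0:
        return fss_mask                              # nothing to refine
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
    predictor.set_image(image)                       # image: HxWx3 uint8 RGB
    masks, _, _ = predictor.predict(box=box, multimask_output=False)
    sam_mask = masks[0]
    iou = (np.logical_and(sam_mask, fss_mask).sum()
           / max(np.logical_or(sam_mask, fss_mask).sum(), 1))
    return sam_mask if iou >= iou_keep else fss_mask
```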