Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays
The performance of speaker verification degrades significantly in adverse
acoustic environments with strong reverberation and noise. To address this
issue, this paper proposes a spatial-temporal graph convolutional network (GCN)
method for multi-channel speaker verification with ad-hoc microphone
arrays. It includes a feature aggregation block and a channel selection block,
both of which are built on graphs. The feature aggregation block fuses speaker
features across time and channels with a spatial-temporal GCN. The
graph-based channel selection block discards the noisy channels that may
contribute negatively to the system. The proposed method is flexible in
incorporating various kinds of graphs and prior knowledge. We compared the
proposed method with six representative methods in both real-world and
simulated environments.
Experimental results show that the proposed method achieves a lower equal
error rate (EER) than the strongest reference method on both the simulated
and the real datasets. Moreover, its performance is robust across
different signal-to-noise ratios and reverberation time
FCG-ASpredictor: An Approach for the Prediction of Average Speed of Road Segments with Floating Car GPS Data
The average speed (AS) of a road segment is an important factor for predicting traffic congestion, because the accuracy of AS can directly affect the implementation of traffic management. According to the existing literature, the traffic environment, spatiotemporal information, and the dynamic interaction between these two factors affect the predictive accuracy of AS, and floating car data comprehensively reflect the operation of urban road vehicles. In this paper, we propose a novel road segment AS predictive model based on floating car data. First, the impact of historical AS, weather, and date attributes on AS prediction is analyzed. Then, using spatiotemporal correlations computed from Global Positioning System (GPS) data, the predictive method applies recursive least squares to fuse the historical AS with other factors (such as weather and date attributes) and adopts an extended Kalman filter to predict the AS of the target segment. Finally, we apply our approach to traffic congestion prediction on four road segments in Chengdu, China. The results show that the proposed predictive model is highly feasible and accurate.
Document type: Articl
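The filtering step described in the abstract above can be illustrated with a much simpler stand-in. The sketch below is a scalar, linear Kalman filter under a random-walk speed model, not the extended Kalman filter or the recursive least squares fusion the authors describe. The function names, the noise variances, and the sample speeds are all hypothetical values chosen for illustration.

```python
def kalman_step(est, var, measurement, process_var, measurement_var):
    """One predict/update cycle of a scalar Kalman filter (random-walk model)."""
    # Predict: the speed is assumed to persist, while uncertainty grows.
    pred_est = est
    pred_var = var + process_var
    # Update: blend prediction and measurement using the Kalman gain.
    gain = pred_var / (pred_var + measurement_var)
    new_est = pred_est + gain * (measurement - pred_est)
    new_var = (1.0 - gain) * pred_var
    return new_est, new_var


def filter_speeds(measurements, init_est, init_var=25.0,
                  process_var=1.0, measurement_var=9.0):
    """Run the filter over a sequence of noisy average-speed readings."""
    est, var = init_est, init_var
    history = []
    for z in measurements:
        est, var = kalman_step(est, var, z, process_var, measurement_var)
        history.append(est)
    return history


# Hypothetical average speeds (km/h) for one road segment.
speeds_kmh = [32.0, 35.0, 31.0, 28.0, 30.0]
smoothed = filter_speeds(speeds_kmh, init_est=30.0)
```

Each estimate is a weighted average of the previous estimate and the new reading, so the output stays within the range spanned by the inputs and the initial guess.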
Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning
Large Language Models (LLMs) have demonstrated significant potential in
performing multiple tasks in multimedia applications, ranging from content
generation and interactive entertainment to artistic creation. However, the
diversity of downstream tasks in multitask scenarios presents substantial
adaptation challenges for LLMs. While traditional methods often succumb to
knowledge confusion on their monolithic dense models, Mixture-of-Experts (MoE)
has emerged as a promising solution with its sparse architecture for
effective task decoupling. Inspired by the principles of human cognitive
neuroscience, we design a novel framework, Intuition-MoR1E, that
leverages the inherent semantic clustering of instances to mimic how the human
brain handles multiple tasks, offering implicit guidance to the router for
optimized feature allocation. Moreover, we introduce a Rank-1
Experts formulation designed to manage a spectrum of intuitions, demonstrating
enhanced parameter efficiency and effectiveness in multitask LLM finetuning.
Extensive experiments demonstrate that Intuition-MoR1E achieves superior
efficiency and a 2.15% overall accuracy improvement across 14 public datasets
against other state-of-the-art baselines.
Comment: 13 pages, 5 figure
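To make the idea of rank-1 experts concrete, here is a minimal sketch that assumes a LoRA-like layout: a frozen dense weight plus a gated sum of rank-1 updates, y = W x + sum_k g_k u_k (v_k . x). This layout, the function names, and the toy numbers are assumptions for illustration. The abstract does not specify the formulation, and the intuition-guided router is not modeled (the router logits are simply passed in).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank1_moe_forward(x, base_w, experts, router_logits):
    """y = W x + sum_k g_k * u_k * (v_k . x), with g = softmax(router_logits)."""
    gates = softmax(router_logits)
    y = [dot(row, x) for row in base_w]        # frozen dense path
    for g, (u, v) in zip(gates, experts):
        scale = g * dot(v, x)                  # scalar projection onto v_k
        y = [yi + scale * ui for yi, ui in zip(y, u)]
    return y


# Toy example: identity base weight and two rank-1 experts (u_k, v_k).
x = [1.0, 2.0]
base_w = [[1.0, 0.0], [0.0, 1.0]]
experts = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
y = rank1_moe_forward(x, base_w, experts, router_logits=[0.0, 0.0])
```

Each expert here stores only two vectors rather than a full matrix, which is where the parameter savings of a rank-1 design would come from.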
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
The complicated architecture and high training cost of vision transformers
urge the exploration of post-training quantization. However, the heavy-tailed
distribution of vision transformer activations hinders the effectiveness of
previous post-training quantization methods, even with advanced quantizer
designs. Instead of tuning the quantizer to better fit the complicated
activation distribution, this paper proposes NoisyQuant, a quantizer-agnostic
enhancement for the post-training activation quantization performance of vision
transformers. We make a surprising theoretical discovery that for a given
quantizer, adding a fixed Uniform noisy bias to the values being quantized can
significantly reduce the quantization error under provable conditions. Building
on this theoretical insight, NoisyQuant achieves the first success in actively
altering the heavy-tailed activation distribution with an additive noisy bias to
fit a given quantizer. Extensive experiments show NoisyQuant largely improves
the post-training quantization performance of vision transformers with minimal
computation overhead. For instance, on linear uniform 6-bit activation
quantization, NoisyQuant improves SOTA top-1 accuracy on ImageNet by up to
1.7%, 1.1% and 0.5% for ViT, DeiT, and Swin Transformer respectively, achieving
on-par or even higher performance than previous nonlinear, mixed-precision
quantization.
Comment: Accepted to CVPR202
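The effect of a fixed uniform noisy bias can be seen in a toy experiment. The sketch below is not the NoisyQuant implementation: it assumes a plain round-to-nearest uniform quantizer, samples the bias once from a uniform distribution over one quantization step, and subtracts it immediately after quantization (a simplification of however the method removes the bias in practice). All names and the test values are hypothetical.

```python
import math
import random


def quantize(x, step):
    """Uniform quantizer: snap x to the nearest multiple of `step`."""
    return step * math.floor(x / step + 0.5)


def mse(values, approx):
    return sum((a - b) ** 2 for a, b in zip(values, approx)) / len(values)


def compare(values, step, seed=0):
    """Return (plain MSE, noisy-bias MSE) for the given values."""
    rng = random.Random(seed)
    # Noisy bias: sampled once from a uniform distribution, then kept fixed.
    bias = [rng.uniform(-step / 2, step / 2) for _ in values]
    plain = [quantize(x, step) for x in values]
    # Simplifying assumption: the bias is subtracted right after quantization.
    noisy = [quantize(x + n, step) - n for x, n in zip(values, bias)]
    return mse(values, plain), mse(values, noisy)


# Values sitting far from the bin centers (0.4 of a step away).
off_center = [k + 0.4 for k in range(10000)]
plain_err, noisy_err = compare(off_center, step=1.0)
```

In this setup the noisy bias lowers the error for values that sit far from the bin centers, and raises it for values that already sit on the centers, which is one way to read the abstract's "under provable conditions".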
Genome architecture changes and major gene variations of Andrias davidianus ranavirus (ADRV)
Ranaviruses are emerging pathogens that have led to global impact and public concern. As a rare and endangered species and the largest amphibian in the world, the Chinese giant salamander, Andrias davidianus, has recently undergone outbreaks of epidemic diseases with high mortality. In this study, we isolated and identified a novel ranavirus from Chinese giant salamanders that exhibited systemic hemorrhage and swelling syndrome with a high death rate in China from May 2011 to August 2012. The isolate, designated Andrias davidianus ranavirus (ADRV), not only induced cytopathic effects in different fish cell lines and yielded high viral titers, but also caused severe hemorrhagic lesions and resulted in 100% mortality in experimental infections of salamanders. The complete genome of ADRV was sequenced and compared with other sequenced amphibian ranaviruses. Gene content and phylogenetic analyses revealed that ADRV belongs to an amphibian subgroup in the genus Ranavirus, and is more closely related to frog ranaviruses than to other salamander ranaviruses. Homologous gene comparisons show that ADRV contains 99%, 97%, 94%, 93% and 85% homologues in the RGV, FV3, CMTV, TFV and ATV genomes, respectively. In addition, several variable major genes, such as duplicate US22 family-like genes, the viral eukaryotic translation initiation factor 2 alpha gene, and a novel 75L gene with motifs of both nuclear localization signal (NLS) and nuclear export signal (NES), were predicted to contribute to pathogen virulence and host susceptibility. These findings confirm the etiologic role of ADRV in epidemic diseases of Chinese giant salamanders, and broaden our understanding of the evolutionary emergence of ranaviruses.
Diverse Cotraining Makes Strong Semi-Supervised Segmentor
Deep co-training has been introduced to semi-supervised segmentation and
achieves impressive results, yet few studies have explored the working
mechanism behind it. In this work, we revisit the core assumption that supports
co-training: multiple compatible and conditionally independent views. By
theoretically deriving the generalization upper bound, we prove the prediction
similarity between two models negatively impacts the model's generalization
ability. However, most current co-training models are tightly coupled together
and violate this assumption. Such coupling leads to the homogenization of
networks and confirmation bias which consequently limits the performance. To
this end, we explore different dimensions of co-training and systematically
increase the diversity from the aspects of input domains, different
augmentations and model architectures to counteract homogenization. Our Diverse
Co-training outperforms the state-of-the-art (SOTA) methods by a large margin
across different evaluation protocols on Pascal and Cityscapes. For
example, we achieve the best mIoU of 76.2%, 77.7% and 80.2% on Pascal with only
92, 183 and 366 labeled images, surpassing the previous best results by more
than 5%.
Comment: ICCV2023, Camera Ready Version, Code:
https://github.com/williamium3000/diverse-cotraining
Probabilistic Slope Stability Analysis for Embankment Dams
Slope instability is one of the most common forms of dam failure. The commonly used slope stability analysis methods ignore the uncertainty and randomness of dam materials, which may overestimate the stability of dams. In this chapter, a deterministic slope stability analysis based on strength reduction finite-element method is introduced first. After that, the slope is investigated using simple probabilistic concepts and classical slope stability techniques, and the shear strength is treated as a single random variable. Further, the random finite-element method (RFEM) is shown, in which spatial correlation and local averaging are illustrated in detail. Finally, the RFEM is applied to slope stability risk assessment, and the results can lead to higher probabilities of failure
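The single-random-variable idea mentioned above can be sketched in closed form. The example assumes the factor of safety (FS) is proportional to the shear strength and therefore inherits a lognormal distribution. That proportionality, the lognormal choice, and the mean and coefficient of variation used are illustrative assumptions, not values from the chapter.

```python
import math
from statistics import NormalDist


def failure_probability(mean_fs, cov):
    """P(FS < 1) for a lognormally distributed factor of safety.

    mean_fs: mean factor of safety; cov: coefficient of variation (std / mean).
    """
    sigma_ln = math.sqrt(math.log(1.0 + cov ** 2))
    mu_ln = math.log(mean_fs) - 0.5 * sigma_ln ** 2
    # ln(FS) is normal, and failure means ln(FS) < ln(1) = 0.
    return NormalDist(mu_ln, sigma_ln).cdf(0.0)


# Same mean factor of safety, two hypothetical levels of material variability.
p_low = failure_probability(mean_fs=1.5, cov=0.3)
p_high = failure_probability(mean_fs=1.5, cov=0.5)
```

A deterministic analysis would report the same factor of safety of 1.5 in both cases, while the probabilistic view shows the failure probability growing with the variability of the material.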
Consistent Targets Provide Better Supervision in Semi-supervised Object Detection
In this study, we dive deep into the inconsistency of pseudo targets in
semi-supervised object detection (SSOD). Our core observation is that the
oscillating pseudo targets undermine the training of an accurate
semi-supervised detector. They not only inject noise into student training but
also lead to severe overfitting on the classification task. Therefore, we
propose a systematic solution, termed Consistent-Teacher, to reduce the
inconsistency. First, adaptive anchor assignment (ASA) replaces the static
IoU-based strategy, which enables the student network to be resistant to noisy
pseudo bounding boxes. Then we calibrate the subtask predictions by designing a
3D feature alignment module (FAM-3D). It allows each classification feature to
adaptively query the optimal feature vector for the regression task at
arbitrary scales and locations. Lastly, a Gaussian Mixture Model (GMM)
dynamically revises the score threshold of the pseudo-bboxes, which stabilizes
the number of ground-truths at an early stage and remedies the unreliable
supervision signal during training. Consistent-Teacher provides strong results
on a large range of SSOD evaluations. It achieves 40.0 mAP with a ResNet-50
backbone given only 10% of annotated MS-COCO data, which surpasses previous
baselines using pseudo labels by around 3 mAP. When trained on fully annotated
MS-COCO with additional unlabeled data, the performance further increases to
47.2 mAP. Our code will be open-sourced soon
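The GMM-based thresholding step can be illustrated with a small, self-contained sketch. It fits a one-dimensional, two-component Gaussian mixture to pseudo-box confidence scores with EM, then keeps scores whose posterior for the high-score component is at least 0.5. The decision rule, the initialization, the variance floor, and the sample scores are all assumptions for illustration and are not taken from the Consistent-Teacher code.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_component_gmm(scores, iters=50, var_floor=1e-4):
    """EM for a 1-D, two-component GMM; component 1 models the high scores."""
    n = len(scores)
    mean_all = sum(scores) / n
    var_all = max(sum((s - mean_all) ** 2 for s in scores) / n, var_floor)
    means = [min(scores), max(scores)]
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of the high-score component for each score.
        resp = []
        for s in scores:
            p = [weights[k] * normal_pdf(s, means[k], variances[k]) for k in (0, 1)]
            resp.append(p[1] / (p[0] + p[1] + 1e-300))
        # M-step: re-estimate weights, means, and variances.
        for k in (0, 1):
            r = [rk if k == 1 else 1.0 - rk for rk in resp]
            total = sum(r) + 1e-12
            weights[k] = total / n
            means[k] = sum(ri * s for ri, s in zip(r, scores)) / total
            spread = sum(ri * (s - means[k]) ** 2 for ri, s in zip(r, scores)) / total
            variances[k] = max(spread, var_floor)
    return means, variances, weights


def score_threshold(scores):
    """Smallest score assigned to the high-score component (hypothetical rule)."""
    means, variances, weights = fit_two_component_gmm(scores)

    def posterior_high(s):
        p = [weights[k] * normal_pdf(s, means[k], variances[k]) for k in (0, 1)]
        return p[1] / (p[0] + p[1] + 1e-300)

    confident = [s for s in scores if posterior_high(s) >= 0.5]
    return min(confident) if confident else max(scores)


# Hypothetical confidence scores: a noisy low cluster and a reliable high one.
pseudo_scores = [0.1, 0.12, 0.15, 0.18, 0.2, 0.22, 0.8, 0.82, 0.85, 0.88, 0.9]
threshold = score_threshold(pseudo_scores)
```

Because the threshold is re-derived from the current score distribution, it can move as the teacher's predictions change during training, instead of being fixed in advance.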
FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis
Text-to-SQL, which provides a zero-code interface for operating relational
databases, has gained much attention in financial analysis because financial
professionals may not be well-skilled in SQL programming. However, until now,
there has been no practical Text-to-SQL benchmark dataset for financial analysis, and
existing Text-to-SQL methods have not considered the unique characteristics of
databases in financial applications, such as commonly existing wide tables. To
address these issues, we collect a practical Text-to-SQL benchmark dataset and
propose a model-agnostic Large Language Model (LLM)-based Text-to-SQL
framework for financial analysis. The benchmark dataset, BULL, is collected
from the practical financial analysis business of Hundsun Technologies Inc.,
including databases for fund, stock, and macro economy. In addition, the proposed
LLM-based Text-to-SQL framework, FinSQL, provides a systematic treatment for
financial Text-to-SQL from the perspectives of prompt construction,
parameter-efficient fine-tuning and output calibration. Extensive experimental
results on BULL demonstrate that FinSQL achieves the state-of-the-art
Text-to-SQL performance at a small cost; furthermore, FinSQL can bring up to
36.64% performance improvement in scenarios requiring few-shot cross-database
model transfer.
Comment: 13 pages, 13 figure