336 research outputs found
Detecting and quantifying natural selection at two linked loci from time series data of allele frequencies with forward-in-time simulations
Recent advances in DNA sequencing techniques have made it possible to monitor genomes in great detail over time. This improvement provides an opportunity for us to study natural selection based on time serial samples of genomes while accounting for genetic recombination effect and local linkage information. Such time series genomic data allow for more accurate estimation of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel Bayesian statistical framework for inferring natural selection at a pair of linked loci by capitalising on the temporal aspect of DNA data with the additional flexibility of modeling the sampled chromosomes that contain unknown alleles. Our approach is built on a hidden Markov model where the underlying process is a two-locus Wright-Fisher diffusion with selection, which enables us to explicitly model genetic recombination and local linkage. The posterior probability distribution for selection coefficients is computed by applying the particle marginal Metropolis-Hastings algorithm, which allows us to efficiently calculate the likelihood. We evaluate the performance of our Bayesian inference procedure through extensive simulations, showing that our approach can deliver accurate estimates of selection coefficients, and the addition of genetic recombination and local linkage brings about significant improvement in the inference of natural selection. We also illustrate the utility of our method on real data with an application to ancient DNA data associated with white spotting patterns in horses
Offline Pseudo Relevance Feedback for Efficient and Effective Single-pass Dense Retrieval
Dense retrieval has made significant advancements in information retrieval
(IR) by achieving high levels of effectiveness while maintaining online
efficiency during a single-pass retrieval process. However, the application of
pseudo relevance feedback (PRF) to further enhance retrieval effectiveness
results in a doubling of online latency. To address this challenge, this paper
presents a single-pass dense retrieval framework that shifts the PRF process
offline through the utilization of pre-generated pseudo-queries. As a result,
online retrieval is reduced to a single matching with the pseudo-queries, hence
providing faster online retrieval. The effectiveness of the proposed approach
is evaluated on the standard TREC DL and HARD datasets, and the results
demonstrate its promise. Our code is openly available at
https://github.com/Rosenberg37/OPRF.Comment: Accepted at SIGIR202
SAM3D: Segment Anything in 3D Scenes
In this work, we propose SAM3D, a novel framework that is able to predict
masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB
images without further training or finetuning. For a point cloud of a 3D scene
with posed RGB images, we first predict segmentation masks of RGB images with
SAM, and then project the 2D masks into the 3D points. Later, we merge the 3D
masks iteratively with a bottom-up merging approach. At each step, we merge the
point cloud masks of two adjacent frames with the bidirectional merging
approach. In this way, the 3D masks predicted from different frames are
gradually merged into the 3D masks of the whole 3D scene. Finally, we can
optionally ensemble the result from our SAM3D with the over-segmentation
results based on the geometric information of the 3D scenes. Our approach is
experimented with ScanNet dataset and qualitative results demonstrate that our
SAM3D achieves reasonable and fine-grained 3D segmentation results without any
training or finetuning of SAM.Comment: Technical Report. The code is released at
https://github.com/Pointcept/SegmentAnything3
Understanding Differential Search Index for Text Retrieval
The Differentiable Search Index (DSI) is a novel information retrieval (IR)
framework that utilizes a differentiable function to generate a sorted list of
document identifiers in response to a given query. However, due to the
black-box nature of the end-to-end neural architecture, it remains to be
understood to what extent DSI possesses the basic indexing and retrieval
abilities. To mitigate this gap, in this study, we define and examine three
important abilities that a functioning IR framework should possess, namely,
exclusivity, completeness, and relevance ordering. Our analytical
experimentation shows that while DSI demonstrates proficiency in memorizing the
unidirectional mapping from pseudo queries to document identifiers, it falls
short in distinguishing relevant documents from random ones, thereby negatively
impacting its retrieval effectiveness. To address this issue, we propose a
multi-task distillation approach to enhance the retrieval quality without
altering the structure of the model and successfully endow it with improved
indexing abilities. Through experiments conducted on various datasets, we
demonstrate that our proposed method outperforms previous DSI baselines.Comment: Accepted to Findings of ACL 202
Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification
Data-Free Knowledge Distillation (DFKD) has recently attracted growing
attention in the academic community, especially with major breakthroughs in
computer vision. Despite promising results, the technique has not been well
applied to audio and signal processing. Due to the variable duration of audio
signals, it has its own unique way of modeling. In this work, we propose
feature-rich audio model inversion (FRAMI), a data-free knowledge distillation
framework for general sound classification tasks. It first generates
high-quality and feature-rich Mel-spectrograms through a feature-invariant
contrastive loss. Then, the hidden states before and after the statistics
pooling layer are reused when knowledge distillation is performed on these
feature-rich samples. Experimental results on the Urbansound8k, ESC-50, and
audioMNIST datasets demonstrate that FRAMI can generate feature-rich samples.
Meanwhile, the accuracy of the student model is further improved by reusing the
hidden state and significantly outperforms the baseline method.Comment: Accepted by ICASSP 2023. International Conference on Acoustics,
Speech and Signal Processing (ICASSP 2023
- …