Top Rank Optimization in Linear Time
Bipartite ranking aims to learn a real-valued ranking function that orders positive instances before negative instances. Recent efforts in bipartite ranking have focused on optimizing ranking accuracy at the top of the ranked list. Most existing approaches either optimize task-specific metrics or extend the ranking loss to place more emphasis on the errors associated with the top-ranked instances, leading to a high computational cost that is super-linear in the number of training instances.
in the number of training instances. We propose a highly efficient approach,
titled TopPush, for optimizing accuracy at the top that has computational
complexity linear in the number of training instances. We present a novel
analysis that bounds the generalization error for the top ranked instances for
the proposed approach. Empirical study shows that the proposed approach is
highly competitive to the state-of-the-art approaches and is 10-100 times
faster
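A minimal sketch of the core idea, in NumPy, under stated assumptions: the squared-hinge surrogate and the function name are illustrative, and the paper itself optimizes a dual formulation rather than this direct loss. The point is that each positive instance is compared only against the single highest-scoring negative, so one pass over the data suffices.

```python
import numpy as np

def top_push_loss(w, X_pos, X_neg):
    """Top-push style loss: penalize each positive instance only against
    the highest-scoring negative. The max over negatives is computed once,
    so evaluating the loss is linear in the number of instances, unlike
    the quadratic cost of summing over all positive-negative pairs."""
    s_pos = X_pos @ w                  # scores of positive instances
    s_neg_max = np.max(X_neg @ w)      # one pass over the negatives
    # squared hinge on the margin between each positive and the top negative
    margins = np.maximum(0.0, 1.0 - (s_pos - s_neg_max))
    return np.mean(margins ** 2)
```

Because the maximum over negative scores is shared by all positives, one evaluation costs O(m + n) score computations for m positives and n negatives, instead of the O(mn) of pairwise ranking losses.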
CUR Algorithm for Partially Observed Matrices
CUR matrix decomposition computes the low rank approximation of a given
matrix by using the actual rows and columns of the matrix. It has been a very
useful tool for handling large matrices. One limitation of the existing algorithms for CUR matrix decomposition is that they need access to the full matrix, a requirement that can be difficult to fulfill in many real-world applications. In this work, we alleviate this limitation by developing a CUR
decomposition algorithm for partially observed matrices. In particular, the
proposed algorithm computes the low rank approximation of the target matrix
based on (i) the randomly sampled rows and columns, and (ii) a subset of
observed entries that are randomly sampled from the matrix. Our analysis shows
the relative error bound, measured by spectral norm, for the proposed algorithm
when the target matrix is of full rank. We also show that the number of observed entries the proposed algorithm needs to perfectly recover a low-rank matrix is smaller than the sample complexity of the existing algorithms for matrix completion. Empirical studies on both
synthetic and real-world datasets verify our theoretical claims and demonstrate
the effectiveness of the proposed algorithm.
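For orientation, here is a minimal sketch of a standard CUR decomposition with uniform sampling on a fully observed matrix. The function name and sampling scheme are assumptions; the paper's actual contribution, estimating the linking matrix from a random subset of observed entries rather than from the full intersection, is deliberately not implemented here.

```python
import numpy as np

def cur_decomposition(A, c, r, seed=None):
    """Approximate A by C @ U @ R, where C and R are actual columns and
    rows of A and U is the pseudo-inverse of their intersection W.
    Uniform sampling is used for simplicity."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    cols = rng.choice(n, size=c, replace=False)
    rows = rng.choice(m, size=r, replace=False)
    C = A[:, cols]                # sampled columns
    R = A[rows, :]                # sampled rows
    W = A[np.ix_(rows, cols)]     # intersection of sampled rows and columns
    U = np.linalg.pinv(W)         # linking matrix
    return C, U, R

# Usage: a rank-5 matrix is recovered almost exactly from 20 rows and columns.
A = np.random.randn(200, 5) @ np.random.randn(5, 150)
C, U, R = cur_decomposition(A, c=20, r=20, seed=0)
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))  # ~1e-14
```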
One Fits All: Power General Time Series Analysis by Pretrained LM
Although we have witnessed great success of pre-trained models in natural
language processing (NLP) and computer vision (CV), limited progress has been
made for general time series analysis. Unlike NLP and CV, where a unified model can be used to perform different tasks, specially designed approaches still dominate each time series analysis task, such as classification, anomaly detection, forecasting, and few-shot learning. The main challenge that blocks the development of pre-trained models for time series analysis is the lack of a large amount of training data. In this work, we address this challenge by
leveraging language or CV models, pre-trained from billions of tokens, for time
series analysis. Specifically, we refrain from altering the self-attention and
feedforward layers of the residual blocks in the pre-trained language or image
model. This model, known as the Frozen Pretrained Transformer (FPT), is
evaluated through fine-tuning on all major types of tasks involving time
series. Our results demonstrate that models pre-trained on natural language or images can achieve comparable or state-of-the-art performance on all main time series analysis tasks, as illustrated in Figure 1. We also find, both theoretically and empirically, that the self-attention module behaves similarly to principal component analysis (PCA), an observation that helps explain how the transformer bridges the domain gap and is a crucial step towards understanding the universality of a pre-trained transformer. The code is publicly available at https://github.com/DAMO-DI-ML/One_Fits_All.
Comment: NeurIPS 2023 Spotlight
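A minimal sketch of the frozen-backbone recipe, assuming the Hugging Face transformers GPT-2 implementation. The layer count, patch length, and the FPTForecaster wrapper below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

backbone = GPT2Model.from_pretrained("gpt2")
backbone.h = backbone.h[:6]  # keep only the first 6 transformer blocks

# Freeze the self-attention and feed-forward sublayers; leave the layer
# norms and positional embeddings trainable, in the spirit of FPT.
for name, param in backbone.named_parameters():
    param.requires_grad = ("ln" in name) or ("wpe" in name)

class FPTForecaster(nn.Module):
    """Wrap the mostly-frozen backbone with task-specific input/output
    layers (names and sizes here are hypothetical)."""
    def __init__(self, backbone, patch_len=16, horizon=96, d_model=768):
        super().__init__()
        self.backbone = backbone
        self.in_proj = nn.Linear(patch_len, d_model)  # embed each patch
        self.out_proj = nn.Linear(d_model, horizon)   # forecasting head

    def forward(self, patches):  # patches: (batch, n_patches, patch_len)
        h = self.backbone(inputs_embeds=self.in_proj(patches)).last_hidden_state
        return self.out_proj(h[:, -1])  # forecast from the last position

model = FPTForecaster(backbone)
out = model(torch.randn(4, 32, 16))  # -> shape (4, 96)
```

Only the layer norms, positional embeddings, and the newly added projections receive gradients; the self-attention and feedforward weights keep their language-pretrained values.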