Search CORE

239 research outputs found

Faster Algorithms for Structured Linear and Kernel Support Vector Machines

Author: Gu Yuzhou
Song Zhao
Zhang Lichen
Publication venue
Publication date: 13/11/2023
Field of study

Quadratic programming is a ubiquitous prototype in convex programming. Many combinatorial optimizations on graphs and machine learning problems can be formulated as quadratic programming; for example, Support Vector Machines (SVMs). Linear and kernel SVMs have been among the most popular models in machine learning over the past three decades, prior to the deep learning era. Generally, a quadratic program has an input size of

\Theta(n^2)

, where

n

is the number of variables. Assuming the Strong Exponential Time Hypothesis (

\textsf{SETH}

), it is known that no

O(n^{2-o(1)})

algorithm exists (Backurs, Indyk, and Schmidt, NIPS'17). However, problems such as SVMs usually feature much smaller input sizes: one is given

n

data points, each of dimension

d

, with

d \ll n

. Furthermore, SVMs are variants with only

O(1)

linear constraints. This suggests that faster algorithms are feasible, provided the program exhibits certain underlying structures. In this work, we design the first nearly-linear time algorithm for solving quadratic programs whenever the quadratic objective has small treewidth or admits a low-rank factorization, and the number of linear constraints is small. Consequently, we obtain a variety of results for SVMs: * For linear SVM, where the quadratic constraint matrix has treewidth

\tau

, we can solve the corresponding program in time

\widetilde O(n\tau^{(\omega+1)/2}\log(1/\epsilon))

; * For linear SVM, where the quadratic constraint matrix admits a low-rank factorization of rank-

k

, we can solve the corresponding program in time

\widetilde O(nk^{(\omega+1)/2}\log(1/\epsilon))

; * For Gaussian kernel SVM, where the data dimension

d = \Theta(\log n)

and the squared dataset radius is small, we can solve it in time

O(n^{1+o(1)}\log(1/\epsilon))

. We also prove that when the squared dataset radius is large, then

\Omega(n^{2-o(1)})

time is required.Comment: New results: almost-linear time algorithm for Gaussian kernel SVM and complementary lower bounds. Abstract shortened to meet arxiv requiremen

arXiv.org e-Print Archive

Efficient Algorithm for Solving Hyperbolic Programs

Author: Deng Yichuan
Song Zhao
Zhang Lichen
Zhang Ruizhe
Publication venue
Publication date: 13/06/2023
Field of study

Hyperbolic polynomials is a class of real-roots polynomials that has wide range of applications in theoretical computer science. Each hyperbolic polynomial also induces a hyperbolic cone that is of particular interest in optimization due to its generality, as by choosing the polynomial properly, one can easily recover the classic optimization problems such as linear programming and semidefinite programming. In this work, we develop efficient algorithms for hyperbolic programming, the problem in each one wants to minimize a linear objective, under a system of linear constraints and the solution must be in the hyperbolic cone induced by the hyperbolic polynomial. Our algorithm is an instance of interior point method (IPM) that, instead of following the central path, it follows the central Swath, which is a generalization of central path. To implement the IPM efficiently, we utilize a relaxation of the hyperbolic program to a quadratic program, coupled with the first four moments of the hyperbolic eigenvalues that are crucial to update the optimization direction. We further show that, given an evaluation oracle of the polynomial, our algorithm only requires

O(n^2d^{2.5})

oracle calls, where

n

is the number of variables and

d

is the degree of the polynomial, with extra

O((n+m)^3 d^{0.5})

arithmetic operations, where

m

is the number of constraints

arXiv.org e-Print Archive

Jerk as a Method of Identifying Physical Fatigue and Skill Level in Construction Work

Author: Zhang Lichen
Publication venue: 'University of Waterloo'
Publication date: 23/04/2019
Field of study

Researchers have shown that physically demanding work, characterized by forceful exertions, repetition, and prolonged duration can result in fatigue. Physical fatigue has been identified as a risk factor for both acute and cumulative injuries. Thus, monitoring worker fatigue levels is highly important in health and safety programs as it supports proactive measures to prevent or reduce instances of injury to workers. Recent advancements in sensing technologies, including inertial measurement units (IMUs), present an opportunity for the real-time assessment of individuals' physical exposures. These sensors also exceed the ability of mature motion capture technologies to accurately provide fundamental parameters such as acceleration and its derivative, jerk. Although jerk has been used for a variety of clinical application to assess motor control, it has seldom been studied for applications in physically-demanding occupations that are directly related to physical fatigue detection. This research uses IMU-based motion tracking suits to evaluate the use of jerk to detect changes in motor control. Since fatigue degrades motor control, and thus motion smoothness, it is expected that jerk values will increase with fatigue. Jerk can be felt as the change in force on the body leading to biomechanical injuries over time. Although it is known that fatigue contributes to a decline in motor control, there are no explicit studies that show the relationship between jerk and fatigue. In addition, jerk as it relates to skill level of highly repetitive and demanding work has also remained unexplored. To examine these relationships, our first study evaluates: 1) the use of jerk to detect changes in motor control arising from physical exertion and 2) differences in jerk values between motions performed by workers with varying skill levels. Additionally, we conducted a second study to assess the suitability of machine learning techniques for automated physical fatigue monitoring. Bricklaying experiments were conducted with participants recruited from the Ontario Brick and Stone Mason apprenticeship program. Participants were classified into four groups based on their level of masonry experience including novices, first-year apprentices, third-year apprentices, and journeymen who have greater than five years of experience. In our first study, jerk analysis was carried out on eleven body segments, namely the pelvis, and the dominant and non-dominant upper and lower limb segments. Our findings show that jerk values were consistently lowest for journeymen and highest for third-year apprentices across all eleven body segments. These findings suggest that the experience that journeymen gain over the course of their career improves their ability to perform repetitive heavy lifts with smoother motions and greater control. Third-year apprentices performed lifts with the greatest jerk values, indicating poor motor performance. Attributed to this finding was the pressure that third-year apprentices felt to match their production levels to that of journeymen’s, leading third-year apprentices to use jerkier, less controlled motions. Novices and first-year apprentices showed more caution towards risks of injury, moving with greater motor control, compared to the more experienced third-year apprentices. However, the production levels of novices and first-year apprentices falter far behind the production levels of other experience groups. Detectable increases between jerk values during the beginning (rested) and end (exerted) of the task were found only for the journeymen, which is attributed to their greater interpersonal similarities in learned technique and work pace. In our second study, we investigated the use of support-vector machines (SVM) to automate the monitoring of physical exertion levels using jerk. The jerk values of the pelvis, upper arms, and thighs were used to classify inter-and intra-subject rested and exerted states. As expected, classification results demonstrated a significantly higher intra-subject rested/exerted classification than the inter-subject classification. On average, intra-subject classification achieved an accuracy of 94% for the wall building experiment and 80% for the first-course-of-masonry-units experiment. The thesis findings lead us to conclude that: 1) jerk changes resulting from physical exertion and skill level can be assessed using IMUs, and 2) SVMs have the ability to automatically classify rested and exerted movements. The investigated jerk analysis holds promise for in-situ and real-time monitoring of physical exertion and fatigue which can help in reducing work-related injuries and illnesses

University of Waterloo's Institutional Repository

Streaming Semidefinite Programs: $O(\sqrt{n})$ Passes, Small Space and Fast Runtime

Author: Song Zhao
Ye Mingquan
Zhang Lichen
Publication venue
Publication date: 10/09/2023
Field of study

We study the problem of solving semidefinite programs (SDP) in the streaming model. Specifically,

m

constraint matrices and a target matrix

C

, all of size

n\times n

together with a vector

b\in \mathbb{R}^m

are streamed to us one-by-one. The goal is to find a matrix

X\in \mathbb{R}^{n\times n}

such that

\langle C, X\rangle

is maximized, subject to

\langle A_i, X\rangle=b_i

for all

i\in [m]

and

X\succeq 0

. Previous algorithmic studies of SDP primarily focus on \emph{time-efficiency}, and all of them require a prohibitively large

\Omega(mn^2)

space in order to store \emph{all the constraints}. Such space consumption is necessary for fast algorithms as it is the size of the input. In this work, we design an interior point method (IPM) that uses

\widetilde O(m^2+n^2)

space, which is strictly sublinear in the regime

n\gg m

. Our algorithm takes

O(\sqrt n\log(1/\epsilon))

passes, which is standard for IPM. Moreover, when

m

is much smaller than

n

, our algorithm also matches the time complexity of the state-of-the-art SDP solvers. To achieve such a sublinear space bound, we design a novel sketching method that enables one to compute a spectral approximation to the Hessian matrix in

O(m^2)

space. To the best of our knowledge, this is the first method that successfully applies sketching technique to improve SDP algorithm in terms of space (also time)

arXiv.org e-Print Archive

Dynamic Tensor Product Regression

Author: Reddy Aravind
Song Zhao
Zhang Lichen
Publication venue
Publication date: 08/10/2022
Field of study

In this work, we initiate the study of \emph{Dynamic Tensor Product Regression}. One has matrices

A_1\in \mathbb{R}^{n_1\times d_1},\ldots,A_q\in \mathbb{R}^{n_q\times d_q}

and a label vector

b\in \mathbb{R}^{n_1\ldots n_q}

, and the goal is to solve the regression problem with the design matrix

A

being the tensor product of the matrices

A_1, A_2, \dots, A_q

i.e.

\min_{x\in \mathbb{R}^{d_1\ldots d_q}}~\|(A_1\otimes \ldots\otimes A_q)x-b\|_2

. At each time step, one matrix

A_i

receives a sparse change, and the goal is to maintain a sketch of the tensor product

A_1\otimes\ldots \otimes A_q

so that the regression solution can be updated quickly. Recomputing the solution from scratch for each round is very slow and so it is important to develop algorithms which can quickly update the solution with the new design matrix. Our main result is a dynamic tree data structure where any update to a single matrix can be propagated quickly throughout the tree. We show that our data structure can be used to solve dynamic versions of not only Tensor Product Regression, but also Tensor Product Spline regression (which is a generalization of ridge regression) and for maintaining Low Rank Approximations for the tensor product.Comment: NeurIPS 202

arXiv.org e-Print Archive

Solving Attention Kernel Regression Problem via Pre-conditioner

Author: Song Zhao
Yin Junze
Zhang Lichen
Publication venue
Publication date: 01/04/2024
Field of study

The attention mechanism is the key to large language models, and the attention matrix serves as an algorithmic and computational bottleneck for such a scheme. In this paper, we define two problems, motivated by designing fast algorithms for proxy of attention matrix and solving regressions against them. Given an input matrix

A\in \mathbb{R}^{n\times d}

with

n\gg d

and a response vector

b

, we first consider the matrix exponential of the matrix

A^\top A

as a proxy, and we in turn design algorithms for two types of regression problems:

\min_{x\in \mathbb{R}^d}\|(A^\top A)^jx-b\|_2

and

\min_{x\in \mathbb{R}^d}\|A(A^\top A)^jx-b\|_2

for any positive integer

j

. Studying algorithms for these regressions is essential, as matrix exponential can be approximated term-by-term via these smaller problems. The second proxy is applying exponential entrywise to the Gram matrix, denoted by

\exp(AA^\top)

and solving the regression

\min_{x\in \mathbb{R}^n}\|\exp(AA^\top)x-b \|_2

. We call this problem the attention kernel regression problem, as the matrix

\exp(AA^\top)

could be viewed as a kernel function with respect to

A

. We design fast algorithms for these regression problems, based on sketching and preconditioning. We hope these efforts will provide an alternative perspective of studying efficient approximation of attention matrices.Comment: AISTATS 202

arXiv.org e-Print Archive

Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures

Author: Song Zhao
Xu Zhaozhuo
Yang Yuanyuan
Zhang Lichen
Publication venue
Publication date: 18/07/2022
Field of study

In this paper, we study the problem of speeding up a type of optimization algorithms called Frank-Wolfe, a conditional gradient method. We develop and employ two novel inner product search data structures, improving the prior fastest algorithm in [Shrivastava, Song and Xu, NeurIPS 2021]. * The first data structure uses low-dimensional random projection to reduce the problem to a lower dimension, then uses efficient inner product data structure. It has preprocessing time

\tilde O(nd^{\omega-1}+dn^{1+o(1)})

and per iteration cost

\tilde O(d+n^\rho)

for small constant

\rho

. * The second data structure leverages the recent development in adaptive inner product search data structure that can output estimations to all inner products. It has preprocessing time

\tilde O(nd)

and per iteration cost

\tilde O(d+n)

. The first algorithm improves the state-of-the-art (with preprocessing time

\tilde O(d^2n^{1+o(1)})

and per iteration cost

\tilde O(dn^\rho)

) in all cases, while the second one provides an even faster preprocessing time and is suitable when the number of iterations is small

arXiv.org e-Print Archive

Space-Efficient Interior Point Method, with Applications to Linear Programming and Maximum Weight Bipartite Matching

Author: Liu S. Cliff
Song Zhao
Zhang Hengjie
Zhang Lichen
Zhou Tianyi
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)
Publication date: 01/01/2023
Field of study

DROPS Dagstuhl Research Online Publication Server

Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time

Author: Gu Yuzhou
Song Zhao
Yin Junze
Zhang Lichen
Publication venue
Publication date: 20/08/2023
Field of study

Given a matrix

M\in \mathbb{R}^{m\times n}

, the low rank matrix completion problem asks us to find a rank-

k

approximation of

M

UV^\top

for

U\in \mathbb{R}^{m\times k}

and

V\in \mathbb{R}^{n\times k}

by only observing a few entries specified by a set of entries

\Omega\subseteq [m]\times [n]

. In particular, we examine an approach that is widely used in practice -- the alternating minimization framework. Jain, Netrapalli and Sanghavi~\cite{jns13} showed that if

M

has incoherent rows and columns, then alternating minimization provably recovers the matrix

M

by observing a nearly linear in

n

number of entries. While the sample complexity has been subsequently improved~\cite{glz17}, alternating minimization steps are required to be computed exactly. This hinders the development of more efficient algorithms and fails to depict the practical implementation of alternating minimization, where the updates are usually performed approximately in favor of efficiency. In this paper, we take a major step towards a more efficient and error-robust alternating minimization framework. To this end, we develop an analytical framework for alternating minimization that can tolerate moderate amount of errors caused by approximate updates. Moreover, our algorithm runs in time

\widetilde O(|\Omega| k)

, which is nearly linear in the time to verify the solution while preserving the sample complexity. This improves upon all prior known alternating minimization approaches which require

\widetilde O(|\Omega| k^2)

time.Comment: Improve the runtime from

O(mnk)

to $O|\Omega| k)

arXiv.org e-Print Archive