125 research outputs found

Faster Algorithms for Structured Linear and Kernel Support Vector Machines

Author: Gu Yuzhou
Song Zhao
Zhang Lichen
Publication date: 13/11/2023

Quadratic programming is a ubiquitous prototype in convex programming. Many combinatorial optimizations on graphs and machine learning problems can be formulated as quadratic programming; for example, Support Vector Machines (SVMs). Linear and kernel SVMs have been among the most popular models in machine learning over the past three decades, prior to the deep learning era. Generally, a quadratic program has an input size of $\Theta(n^2)$, where $n$ is the number of variables. Assuming the Strong Exponential Time Hypothesis ($\textsf{SETH}$), it is known that no $O(n^{2-o(1)})$ algorithm exists (Backurs, Indyk, and Schmidt, NIPS'17). However, problems such as SVMs usually feature much smaller input sizes: one is given $n$ data points, each of dimension $d$, with $d \ll n$. Furthermore, SVMs are variants with only $O(1)$ linear constraints. This suggests that faster algorithms are feasible, provided the program exhibits certain underlying structures. In this work, we design the first nearly-linear time algorithm for solving quadratic programs whenever the quadratic objective has small treewidth or admits a low-rank factorization, and the number of linear constraints is small. Consequently, we obtain a variety of results for SVMs:
* For linear SVM, where the quadratic constraint matrix has treewidth $\tau$, we can solve the corresponding program in time $\widetilde O(n\tau^{(\omega+1)/2}\log(1/\epsilon))$;
* For linear SVM, where the quadratic constraint matrix admits a low-rank factorization of rank-$k$, we can solve the corresponding program in time $\widetilde O(nk^{(\omega+1)/2}\log(1/\epsilon))$;
* For Gaussian kernel SVM, where the data dimension $d = \Theta(\log n)$ and the squared dataset radius is small, we can solve it in time $O(n^{1+o(1)}\log(1/\epsilon))$.
We also prove that when the squared dataset radius is large, then $\Omega(n^{2-o(1)})$ time is required.

Comment: New results: almost-linear time algorithm for Gaussian kernel SVM and complementary lower bounds. Abstract shortened to meet arxiv requirement
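
The role of the low-rank factorization mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard linear SVM dual objective, sum_i alpha_i - (1/2) sum_{i,j} alpha_i alpha_j y_i y_j <x_i, x_j>, whose quadratic matrix factors through the n-by-d data matrix. The function names are hypothetical, and this is only an illustration of the structure, not the algorithm from the paper.

```python
# Sketch: the quadratic term of the linear SVM dual can be evaluated
# without ever forming the n x n matrix Q[i][j] = y_i * y_j * <x_i, x_j>.
# Function names are illustrative assumptions, not taken from the paper.

def dual_objective_dense(alpha, X, y):
    """Evaluate the dual objective by touching all n^2 entries of Q."""
    n = len(X)
    quad = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(X[i], X[j]))
            quad += alpha[i] * alpha[j] * y[i] * y[j] * dot
    return sum(alpha) - 0.5 * quad


def dual_objective_factored(alpha, X, y):
    """Evaluate the same objective in O(n * d) time.

    Uses alpha^T Q alpha = || sum_i alpha_i * y_i * x_i ||^2.
    """
    d = len(X[0])
    w = [0.0] * d
    for a, label, x in zip(alpha, y, X):
        for t in range(d):
            w[t] += a * label * x[t]
    return sum(alpha) - 0.5 * sum(c * c for c in w)
```

Both functions return the same value; the factored form costs O(nd) operations instead of O(n^2 d), which is the kind of saving that structure in the quadratic objective makes possible when d is much smaller than n.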

arXiv.org e-Print Archive

Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time

Author: Gu Yuzhou
Song Zhao
Yin Junze
Zhang Lichen
Publication date: 20/08/2023

Given a matrix $M\in \mathbb{R}^{m\times n}$, the low rank matrix completion problem asks us to find a rank-$k$ approximation of $M$ as $UV^\top$ for $U\in \mathbb{R}^{m\times k}$ and $V\in \mathbb{R}^{n\times k}$ by only observing a few entries specified by a set of entries $\Omega\subseteq [m]\times [n]$. In particular, we examine an approach that is widely used in practice, the alternating minimization framework. Jain, Netrapalli and Sanghavi~\cite{jns13} showed that if $M$ has incoherent rows and columns, then alternating minimization provably recovers the matrix $M$ by observing a nearly linear in $n$ number of entries. While the sample complexity has been subsequently improved~\cite{glz17}, alternating minimization steps are required to be computed exactly. This hinders the development of more efficient algorithms and fails to depict the practical implementation of alternating minimization, where the updates are usually performed approximately in favor of efficiency. In this paper, we take a major step towards a more efficient and error-robust alternating minimization framework. To this end, we develop an analytical framework for alternating minimization that can tolerate a moderate amount of errors caused by approximate updates. Moreover, our algorithm runs in time $\widetilde O(|\Omega| k)$, which is nearly linear in the time to verify the solution while preserving the sample complexity. This improves upon all prior known alternating minimization approaches, which require $\widetilde O(|\Omega| k^2)$ time.

Comment: Improve the runtime from $O(mnk)$ to $O(|\Omega| k)$
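
The alternating minimization framework described in this abstract can be sketched for the simplest case. The example below assumes rank 1 and exact least-squares updates, so it shows only the basic alternation over the observed entries, not the robust, approximate-update algorithm the abstract announces. The function name `complete_rank1` and the toy data are hypothetical.

```python
# Sketch: rank-1 alternating minimization for matrix completion.
# Assumptions: rank k = 1, exact least-squares updates, noiseless entries.
# This is NOT the paper's robust algorithm, only the basic alternation.

def complete_rank1(observed, m, n, iters=200):
    """Fit M[i][j] ~ u[i] * v[j] using only the observed entries.

    observed maps (row, col) -> value for the index set Omega.
    """
    u = [0.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Fix v, solve a one-dimensional least-squares problem per row.
        for i in range(m):
            num = sum(val * v[j] for (r, j), val in observed.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in observed if r == i)
            u[i] = num / den if den else 0.0
        # Fix u, solve a one-dimensional least-squares problem per column.
        for j in range(n):
            num = sum(val * u[i] for (i, c), val in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j)
            v[j] = num / den if den else 0.0
    return u, v


# Toy usage: M = [1, 2, 3]^T [2, 1, 3] with three of nine entries hidden.
observed = {(0, 0): 2.0, (0, 1): 1.0, (1, 1): 2.0,
            (1, 2): 6.0, (2, 0): 6.0, (2, 2): 9.0}
u, v = complete_rank1(observed, 3, 3)
print(u[0] * v[2])  # estimate of the hidden entry M[0][2]
```

Each sweep in this naive version rescans the whole dictionary for every row and column, so it is slower than necessary; grouping the observed entries by row and by column would bring one sweep down to time proportional to the number of observed entries.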

arXiv.org e-Print Archive

Galactic Phylogenetics

Author: Guanghui Xiao (402183)
Hui Xiao (185054)
Jianing Yu (3520661)
Limin Wang (87511)
Peng He (24579)
Peng Zhao (128233)
Xiaosi Wang (4221901)
Yuzhou Zhang (261075)
Publication date: 27/09/2017

Phylogenetics is a widely used concept in evolutionary biology. It is the reconstruction of evolutionary history by building trees that represent branching patterns and sequences. These trees represent shared history, and it is our intention for this approach to be employed in the analysis of Galactic history. In Galactic archaeology, the shared environment is the interstellar medium in which stars form, and it provides the basis for tree-building as a methodological tool. Using elemental abundances of solar-type stars as a proxy for DNA, we built such an evolutionary tree in Jofre et al. (2017) to study the chemical evolution of the solar neighbourhood. In this proceeding we summarise these results and discuss future prospects.

Comment: Contribution to IAU Symposium No. 334: Rediscovering our Galaxy

arXiv.org e-Print Archive

FigShare

COVIDanno, COVID-19 annotation in human

Author: Mengyuan Yang
Pora Kim
Weiling Zhao
Xiaobo Zhou
Yuzhou Feng
Zhiwei Fan
Publication venue: Frontiers Media SA
Publication date: 01/07/2023

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiologic agent of coronavirus disease 2019 (COVID-19), has caused a global health crisis. Despite ongoing efforts to treat patients, there is no universal prevention or cure available. One feasible approach is to identify the key genes in SARS-CoV-2-infected cells. SARS-CoV-2-infected in vitro models allow easy control of the experimental conditions, reproducible results, and monitoring of infection progression. The accumulating RNA-seq data from SARS-CoV-2 in vitro models urgently need systematic translation and interpretation. To fill this gap, we built COVIDanno, COVID-19 annotation in humans, available at http://biomedbdc.wchscu.cn/COVIDanno/. The aim of this resource is to provide a reference of intensive functional annotations of differentially expressed genes (DEGs) among different time points of COVID-19 infection in human in vitro models. To do this, we performed differential expression analysis for 136 individual datasets across 13 tissue types. In total, we identified 4,935 DEGs. We performed multiple bioinformatics/computational biology studies for these DEGs. Furthermore, we developed a novel tool to help users predict the status of SARS-CoV-2 infection for a given sample. COVIDanno will be a valuable resource for identifying SARS-CoV-2-related genes and understanding their potential functional roles at different time points and in multiple tissue types.

Directory of Open Access Journals

Subsurface imaging of stacking faults and dislocations in WS2 CVD grown flakes via Ultrasonic and Heterodyne Force Microscopy

Author: Hamers Bob
Jin Song
Kolosov Oleg Victor
San Juan Mucientes Marta
Shearer Melinda
Zhao Yuzhou
Publication date: 11/10/2017

Two-dimensional (2D) materials have multiple applications, including optoelectronics [1] and the fabrication of micro- and nanoelectromechanical systems (MEMS and NEMS, respectively). In particular, the layered transition metal dichalcogenide tungsten disulphide (WS2) is already applied in the aerospace, automotive, and defence industries due to its high robustness. One of the synthetic methods for WS2 is Chemical Vapour Deposition (CVD) growth. With this method, the deposited material forms complex structures in which changes in the orientation of the individual layers produce screw dislocations [1]. Analysing the structure under the surface therefore makes it possible to understand how the spiral structures are stacked. We used SPM nanomechanical techniques combined with ultrasound, namely Ultrasonic Force Microscopy (UFM) and Heterodyne Force Microscopy (HFM), to identify the dislocations and faults between several stacked WS2 layers. The UFM images allowed us to identify areas of different stiffness that show no particular features in the topographic AFM images. The HFM images have better contrast when the difference frequency is equal to the contact resonance of the cantilever (54.4 kHz).

References
[1] M.J. Shearer, L. Samad, Y. Zhang, Y. Zhao, A. Puretzky, K.W. Eliceiri, J.C. Wright, R.J. Hamers, S. Jin, Journal of the American Chemical Society, 139 (2017) 3496-3504.
[2] F. Dinelli, M.R. Castell, D.A. Ritchie, N.J. Mason, G.A.D. Briggs, O.V. Kolosov, Philosophical Magazine A, Physics of Condensed Matter Structure Defects and Mechanical Properties, 80 (2000) 2299-2323.
[3] M.T. Cuberes, H.E. Assender, G.A.D. Briggs, O.V. Kolosov, Journal of Physics D-Applied Physics, 33 (2000) 2347-2355.

Lancaster E-Prints

Nanomechanical Visualisation of Subsurface Defects in WS2/WSe2 CVD Flakes via Ultrasonic Force Microscopies

Author: Hamers Bob
Jin Song
Kolosov Oleg Victor
San Juan Mucientes Marta
Shearer Melinda J.
Zhao Yuzhou
Publication date: 04/04/2018

Lancaster E-Prints