136 research outputs found
Faster Algorithms for Structured Linear and Kernel Support Vector Machines
Quadratic programming is a ubiquitous prototype in convex programming. Many
combinatorial optimizations on graphs and machine learning problems can be
formulated as quadratic programming; for example, Support Vector Machines
(SVMs). Linear and kernel SVMs have been among the most popular models in
machine learning over the past three decades, prior to the deep learning era.
Generally, a quadratic program has an input size of $\Theta(n^2)$, where $n$
is the number of variables. Assuming the Strong Exponential Time Hypothesis
(SETH), it is known that no $O(n^{2-o(1)})$-time algorithm exists
(Backurs, Indyk, and Schmidt, NIPS'17). However, problems such as SVMs usually
feature much smaller input sizes: one is given $n$ data points, each of
dimension $d$, with $d \ll n$. Furthermore, SVMs are variants with only $O(1)$
linear constraints. This suggests that faster algorithms are feasible, provided
the program exhibits certain underlying structures.
In this work, we design the first nearly-linear time algorithm for solving
quadratic programs whenever the quadratic objective has small treewidth or
admits a low-rank factorization, and the number of linear constraints is small.
Consequently, we obtain a variety of results for SVMs:
* For linear SVM, where the quadratic constraint matrix has treewidth $\tau$,
we can solve the corresponding program in time
$\widetilde{O}(n \tau^{(\omega+1)/2} \log(1/\epsilon))$;
* For linear SVM, where the quadratic constraint matrix admits a low-rank
factorization of rank $k$, we can solve the corresponding program in time
$\widetilde{O}(n k^{(\omega+1)/2} \log(1/\epsilon))$;
* For Gaussian kernel SVM, where the data dimension $d = \Theta(\log n)$ and
the squared dataset radius is small, we can solve it in time
$O(n^{1+o(1)} \log(1/\epsilon))$. We also prove that when the squared dataset
radius is large, then $\Omega(n^{2-o(1)})$ time is required.
Comment: New results: almost-linear time algorithm for Gaussian kernel SVM and
complementary lower bounds. Abstract shortened to meet arxiv requirement
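For orientation, the quadratic program behind an SVM can be written in its standard soft-margin dual form. This is the textbook formulation, not something quoted from the abstract; the symbols $\alpha$ (dual variables), $y_i$ (labels), $x_i$ (data points), $K$ (kernel), $C$ (regularization constant) and $X$ (the $n \times d$ data matrix) are assumptions introduced here for illustration.

```latex
\begin{aligned}
\max_{\alpha \in \mathbb{R}^n} \quad
  & \mathbf{1}^\top \alpha - \tfrac{1}{2}\, \alpha^\top Q\, \alpha,
  \qquad Q_{ij} = y_i\, y_j\, K(x_i, x_j), \\
\text{s.t.} \quad
  & y^\top \alpha = 0, \qquad 0 \le \alpha_i \le C \quad (i = 1, \dots, n).
\end{aligned}
```

In this form there is a single linear equality constraint besides the box constraints. For a linear SVM, $K(x_i, x_j) = x_i^\top x_j$, so $Q = (\operatorname{diag}(y) X)(\operatorname{diag}(y) X)^\top$ has rank at most $d$, which is one way a low-rank factorization of the quadratic matrix can arise.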
Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time
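Given a matrix $M \in \mathbb{R}^{m \times n}$, the low rank matrix completion
problem asks us to find a rank-$k$ approximation of $M$ as $UV^\top$ for $U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{n \times k}$ by only observing a
few entries specified by a set of entries $\Omega \subseteq [m] \times [n]$. In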
particular, we examine an approach that is widely used in practice -- the
alternating minimization framework. Jain, Netrapalli and Sanghavi~\cite{jns13}
showed that if $M$ has incoherent rows and columns, then alternating
minimization provably recovers the matrix $M$ by observing a number of entries
nearly linear in $n$. While the sample complexity has been subsequently
improved~\cite{glz17}, alternating minimization steps are required to be
computed exactly. This hinders the development of more efficient algorithms and
fails to depict the practical implementation of alternating minimization, where
the updates are usually performed approximately in favor of efficiency.
In this paper, we take a major step towards a more efficient and error-robust
alternating minimization framework. To this end, we develop an analytical
framework for alternating minimization that can tolerate a moderate amount of
error caused by approximate updates. Moreover, our algorithm runs in time
$\widetilde{O}(|\Omega| k)$, which is nearly linear in the time to verify the
solution while preserving the sample complexity. This improves upon all prior
known alternating minimization approaches, which require $\widetilde{O}(|\Omega| k^2)$ time.
Comment: Improve the runtime from $O(mnk)$ to $O(|\Omega| k)$
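To make the alternating minimization framework concrete, here is a minimal NumPy sketch of the plain exact-update variant. It is the textbook baseline, not the robust, approximate-update algorithm the abstract describes. The function name `alternating_minimization`, the boolean `mask` encoding of the observed set, and the small ridge term `reg` are assumptions made for this illustration.

```python
import numpy as np


def alternating_minimization(M_obs, mask, k, iters=50, reg=1e-8, seed=0):
    """Exact-update alternating least squares for matrix completion.

    M_obs : (m, n) array, only entries where mask is True are used.
    mask  : (m, n) boolean array marking the observed entries.
    k     : target rank of the factorization U @ V.T.
    reg   : tiny ridge term (an assumption here) to keep each solve well posed.
    """
    m, n = M_obs.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((n, k))
    ridge = reg * np.eye(k)

    for _ in range(iters):
        # Fix V and solve one k-dimensional least-squares problem per row of U.
        for i in range(m):
            cols = np.flatnonzero(mask[i])
            A = V[cols]
            U[i] = np.linalg.solve(A.T @ A + ridge, A.T @ M_obs[i, cols])
        # Fix U and solve one k-dimensional least-squares problem per row of V.
        for j in range(n):
            rows = np.flatnonzero(mask[:, j])
            A = U[rows]
            V[j] = np.linalg.solve(A.T @ A + ridge, A.T @ M_obs[rows, j])
    return U, V
```

Each sweep forms one $k \times k$ Gram matrix per row and per column, which costs on the order of $|\Omega| k^2 + (m + n) k^3$ arithmetic operations when the observed set has $|\Omega|$ entries. Solving these small systems exactly is the step that an approximate-update scheme would relax.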
COVIDanno, COVID-19 Annotation in Human
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiologic agent of coronavirus disease 2019 (COVID-19), has caused a global health crisis. Despite ongoing efforts to treat patients, there is no universal prevention or cure available. One feasible approach is to identify the key genes in SARS-CoV-2-infected cells. A SARS-CoV-2-infected in vitro model allows easy control of the experimental conditions, reproducible results, and monitoring of infection progression. Currently, the accumulating RNA-seq data from SARS-CoV-2 in vitro models urgently need systematic translation and interpretation. To fill this gap, we built COVIDanno, COVID-19 annotation in humans, available at http://biomedbdc.wchscu.cn/COVIDanno/. This resource aims to provide a reference of intensive functional annotations of differentially expressed genes (DEGs) among different time points of COVID-19 infection in human in vitro models. To do this, we performed differential expression analysis for 136 individual datasets across 13 tissue types. In total, we identified 4,935 DEGs. We performed multiple bioinformatics/computational biology studies for these DEGs. Furthermore, we developed a novel tool to help users predict the status of SARS-CoV-2 infection for a given sample. COVIDanno will be a valuable resource for identifying SARS-CoV-2-related genes and understanding their potential functional roles at different time points and in multiple tissue types.
Galactic Phylogenetics
Phylogenetics is a widely used concept in evolutionary biology. It is the
reconstruction of evolutionary history by building trees that represent
branching patterns and sequences. These trees represent shared history, and it
is our intention for this approach to be employed in the analysis of Galactic
history. In Galactic archaeology the shared environment is the interstellar
medium in which stars form, which provides the basis for tree-building as a
methodological tool.
Using elemental abundances of solar-type stars as a proxy for DNA, we built
such an evolutionary tree in Jofre et al. (2017) to study the chemical evolution
of the solar neighbourhood. In this contribution we summarise these results and
discuss future prospects.
Comment: Contribution to IAU Symposium No. 334: Rediscovering our Galaxy
COV2Var, a Function Annotation Database of SARS-CoV-2 Genetic Variation
The COVID-19 pandemic, caused by the coronavirus SARS-CoV-2, has resulted in the loss of millions of lives and severe global economic consequences. Every time SARS-CoV-2 replicates, the virus can acquire new mutations in its genome. Mutations in SARS-CoV-2 genomes have led to increased transmissibility, severe disease outcomes, evasion of the immune response, changes in clinical manifestations, and reduced efficacy of vaccines or treatments. To date, multiple resources provide lists of detected mutations without key functional annotations. There is a lack of research examining the relationship between mutations and various factors such as disease severity, pathogenicity, patient age, patient gender, cross-species transmission, viral immune escape, immune response level, viral transmission capability, viral evolution, host adaptability, viral protein structure, viral protein function, viral protein stability and concurrent mutations. A deep understanding of the relationship between mutation sites and these factors is crucial for advancing our knowledge of SARS-CoV-2 and for developing effective responses. To fill this gap, we built COV2Var, a function annotation database of SARS-CoV-2 genetic variation, available at http://biomedbdc.wchscu.cn/COV2Var/. COV2Var aims to identify common mutations in SARS-CoV-2 variants and assess their effects, providing a valuable resource for intensive functional annotations of common mutations among SARS-CoV-2 variants.