Nonnegative/binary matrix factorization with a D-Wave quantum annealer
D-Wave quantum annealers represent a novel computational architecture and
have attracted significant interest, but have been used for few real-world
computations. Machine learning has been identified as an area where quantum
annealing may be useful. Here, we show that the D-Wave 2X can be effectively
used as part of an unsupervised machine learning method. This method can be
used to analyze large datasets. The D-Wave limits only the number of features
that can be extracted from the dataset. We apply this method to learn the
features from a set of facial images.
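To make the factorization named in the title concrete, here is a minimal classical sketch of a nonnegative/binary factorization V ≈ WH, with W nonnegative and H binary. The abstract does not describe the algorithm, so everything here is an illustrative assumption rather than the paper's method: the alternating scheme, the exhaustive search that stands in for the annealer, the multiplicative update for W, and all function names. The exhaustive search visits 2**k binary columns, which hints at why the number of features k is the quantity a binary solver constrains.

```python
import itertools
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def squared_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def update_H(V, W, k):
    """For each column of V, pick the best-fitting binary column of H.

    Exhaustive search over all 2**k candidates is a classical stand-in
    (an assumption of this sketch) for the binary optimization step.
    """
    candidates = list(itertools.product((0, 1), repeat=k))
    columns = []
    for v in zip(*V):  # iterate over the columns of V
        def residual(h):
            return sum((vi - sum(w * hb for w, hb in zip(w_row, h))) ** 2
                       for vi, w_row in zip(v, W))
        columns.append(min(candidates, key=residual))
    return [list(row) for row in zip(*columns)]  # back to k x n layout


def update_W(V, W, H, eps=1e-12):
    """Multiplicative (Lee-Seung style) update; keeps W nonnegative."""
    Ht = [list(col) for col in zip(*H)]
    numer = matmul(V, Ht)
    denom = matmul(matmul(W, H), Ht)
    return [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
            for wr, nr, dr in zip(W, numer, denom)]


def nbmf(V, k, iterations=30, seed=0):
    """Alternate the two updates; V must be nonnegative."""
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(k)] for _ in V]
    errors = []
    for _ in range(iterations):
        H = update_H(V, W, k)
        W = update_W(V, W, H)
        errors.append(squared_error(V, W, H))
    return W, H, errors
```

Calling `nbmf(V, k=2)` on a small nonnegative matrix returns the two factors and the error after each sweep. In this sketch, neither update should increase the squared error, up to floating-point rounding.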
MalwareDNA: Simultaneous Classification of Malware, Malware Families, and Novel Malware
Malware is one of the most dangerous and costly cyber threats to national
security and a crucial factor in modern cyber-space. However, the adoption of
machine learning (ML) based solutions against malware threats has been
relatively slow. Shortcomings in the existing ML approaches are likely
contributing to this problem. The majority of current ML approaches ignore
real-world challenges such as the detection of novel malware. In addition,
proposed ML approaches are often designed either for malware/benign-ware
classification or malware family classification. Here we introduce and showcase
preliminary capabilities of a new method that can perform precise
identification of novel malware families, while also unifying the capability
for malware/benign-ware classification and malware family classification into a
single framework. Comment: Accepted at IEEE ISI 202
Tensor Network Space-Time Spectral Collocation Method for Time Dependent Convection-Diffusion-Reaction Equations
Emerging tensor network techniques for solutions of Partial Differential
Equations (PDEs), known for their ability to break the curse of dimensionality,
deliver new mathematical methods for ultrafast numerical solutions of
high-dimensional problems. Here, we introduce a Tensor Train (TT) Chebyshev
spectral collocation method, in both space and time, for solution of the time
dependent convection-diffusion-reaction (CDR) equation with inhomogeneous
boundary conditions, in Cartesian geometry. Previous methods for numerical
solution of time dependent PDEs often use finite difference for time, and a
spectral scheme for the spatial dimensions, which leads to slow linear
convergence. Spectral collocation space-time methods show exponential
convergence; however, for realistic problems they need to solve large
four-dimensional systems. We overcome this difficulty by using a TT approach as
its complexity only grows linearly with the number of dimensions. We show that
our TT space-time Chebyshev spectral collocation method converges
exponentially, when the solution of the CDR is smooth, and demonstrate that it
leads to very high compression of linear operators, from terabytes to kilobytes
in TT format, and a speedup of tens of thousands of times compared to a full-grid
space-time spectral method. These advantages allow us to obtain the solutions
at much higher resolutions.
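The claim that TT storage grows only linearly with the number of dimensions can be illustrated with a small sketch. It is not the paper's CDR solver: the test function sin(x_1 + ... + x_d), the grid, and the helper names are all assumptions chosen because this function has an exact TT representation with ranks equal to 2 (rotation matrices satisfy R(a)R(b) = R(a + b)). The sketch assumes d >= 2.

```python
import math


def tt_cores_for_sine(grids):
    """TT cores of f(x_1..x_d) = sin(x_1 + ... + x_d) on a tensor grid.

    Each core is a list, over grid points, of one small matrix.
    Requires at least two dimensions.
    """
    d = len(grids)
    cores = []
    for k, grid in enumerate(grids):
        core = []
        for x in grid:
            c, s = math.cos(x), math.sin(x)
            if k == 0:
                core.append([[c, s]])            # 1 x 2: first row of R(x)
            elif k == d - 1:
                core.append([[s], [c]])          # 2 x 1: second column of R(x)
            else:
                core.append([[c, s], [-s, c]])   # 2 x 2: full rotation R(x)
        cores.append(core)
    return cores


def tt_entry(cores, index):
    """Evaluate one tensor entry as a product of one small matrix per core."""
    acc = cores[0][index[0]]
    for core, i in zip(cores[1:], index[1:]):
        mat = core[i]
        acc = [[sum(a * b for a, b in zip(row, col)) for col in zip(*mat)]
               for row in acc]
    return acc[0][0]


def tt_storage(cores):
    """Count the numbers stored across all cores."""
    return sum(len(m) * len(m[0]) for core in cores for m in core)
```

With n grid points per dimension, this representation stores 4n(d - 1) numbers, while the full tensor has n**d entries. For n = 16 and d = 4 that is 192 numbers against 65536.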
Interactive Distillation of Large Single-Topic Corpora of Scientific Papers
Highly specific datasets of scientific literature are important for both
research and education. However, it is difficult to build such datasets at
scale. A common approach is to build these datasets reductively by applying
topic modeling on an established corpus and selecting specific topics. A more
robust but time-consuming approach is to build the dataset constructively, with
a subject matter expert (SME) handpicking documents. This method does not
scale and is prone to error as the dataset grows. Here we showcase a new tool,
based on machine learning, for constructively generating targeted datasets of
scientific literature. Given a small initial "core" corpus of papers, we build
a citation network of documents. At each step of the citation network, we
generate text embeddings and visualize the embeddings through dimensionality
reduction. Papers are kept in the dataset if they are "similar" to the core or
are otherwise pruned through human-in-the-loop selection. Additional insight
into the papers is gained through sub-topic modeling using SeNMFk. We
demonstrate our new tool for literature review by applying it to two different
fields in machine learning. Comment: Accepted at 2023 IEEE ICMLA conference
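The keep-or-prune step described in the abstract could look roughly like the sketch below. The abstract does not say how "similar" is measured, so the cosine similarity to the centroid of the core embeddings, the 0.8 threshold, and the function names are all hypothetical. Papers below the threshold are not deleted; they are handed to the human reviewer.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def split_candidates(core, candidates, threshold=0.8):
    """Split candidate paper ids into (kept, for_review).

    `core` and `candidates` map paper ids to embedding vectors.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    center = centroid(list(core.values()))
    kept, for_review = [], []
    for paper_id, embedding in candidates.items():
        if cosine(embedding, center) >= threshold:
            kept.append(paper_id)
        else:
            for_review.append(paper_id)
    return kept, for_review
```

In a citation-network setting, a function like this would run once per expansion step, with the kept papers joining the corpus before the next step.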