Everybody Compose: Deep Beats To Music
This project presents a deep learning approach to generate monophonic
melodies based on input beats, allowing even amateurs to create their own music
compositions. Three effective methods - LSTM with Full Attention, LSTM with
Local Attention, and Transformer with Relative Position Representation - are
proposed for this novel task, providing great variation, harmony, and structure
in the generated music. This project allows anyone to compose their own music
by tapping their keyboards or ``recoloring'' beat sequences from existing
works. Comment: Accepted MMSys '2
Convergence of flow-based generative models via proximal gradient descent in Wasserstein space
Flow-based generative models enjoy certain advantages in computing the data
generation and the likelihood, and have recently shown competitive empirical
performance. Compared to the accumulating theoretical studies on related
score-based diffusion models, analysis of flow-based models, which are
deterministic in both forward (data-to-noise) and reverse (noise-to-data)
directions, remains sparse. In this paper, we provide a theoretical guarantee of
generating data distribution by a progressive flow model, the so-called JKO
flow model, which implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a
normalizing flow network. Leveraging the exponential convergence of the
proximal gradient descent (GD) in Wasserstein space, we prove the
Kullback-Leibler (KL) guarantee of data generation by a JKO flow model to be
$O(\varepsilon^2)$ when using $N \lesssim \log(1/\varepsilon)$ many JKO steps
($N$ Residual Blocks in the flow), where $\varepsilon$ is the error in the
per-step first-order condition. The assumption on data density is merely a
finite second moment, and the theory extends to data distributions without
density and when there are inversion errors in the reverse process where we
obtain KL-$W_2$ mixed error guarantees. The non-asymptotic convergence rate of
the JKO-type $W_2$-proximal GD is proved for a general class of convex
objective functionals that includes the KL divergence as a special case, which
can be of independent interest.
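For reference, the JKO scheme named in this abstract is commonly written as a proximal step in Wasserstein-2 space. The notation below is assumed for illustration and is not taken from the abstract: $F$ is the objective functional, $\gamma > 0$ the step size, $\rho_k$ the $k$-th iterate, and $\pi$ the target distribution (in the data-to-noise direction this would presumably be the noise distribution).

```latex
\rho_{k+1} \;=\; \arg\min_{\rho \in \mathcal{P}_2(\mathbb{R}^d)}
  \; F(\rho) \;+\; \frac{1}{2\gamma}\, W_2^2(\rho, \rho_k),
\qquad
F(\rho) \;=\; \mathrm{KL}(\rho \,\|\, \pi)
```

Each step trades a decrease in $F$ against the squared Wasserstein-2 distance to the previous iterate, which is why the scheme can be read as proximal gradient descent on $F$.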
CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration
In the context of image-to-point cloud registration, acquiring point-to-pixel
correspondences presents a challenging task since the similarity between
individual points and pixels is ambiguous due to the visual differences in data
modalities. Nevertheless, the same object present in the two data formats can
be readily identified from the local perspective of point sets and pixel
patches. Motivated by this intuition, we propose a coarse-to-fine framework
that emphasizes the establishment of correspondences between local point sets
and pixel patches, followed by the refinement of results at both the point and
pixel levels. On a coarse scale, we mimic the classic Visual Transformer to
translate both image and point cloud into two sequences of local
representations, namely point and pixel proxies, and employ attention to
capture global and cross-modal contexts. To supervise the coarse matching, we
propose a novel projected point proportion loss, which guides the model to match
each point set with the pixel patches into which more of its points project. On a finer
scale, point-to-pixel correspondences are then refined from a smaller search
space (i.e., the coarsely matched sets and patches) via well-designed sampling,
attentional learning and fine matching, where sampling masks are embedded in
the last two steps to mitigate the negative effect of sampling. With the
high-quality correspondences, the registration problem is then solved by the EPnP
algorithm within RANSAC. Experimental results on large-scale outdoor benchmarks
show that our method outperforms existing methods.
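The quantity suggested by the name "projected point proportion" can be sketched as follows. This is a minimal illustration, not the paper's loss: it assumes a pinhole camera, points already expressed in the camera frame, and square pixel patches, and the function names and the intrinsics tuple `(fx, fy, cx, cy)` are hypothetical.

```python
from collections import Counter


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point; None if behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def projected_point_proportion(points, intrinsics, image_size, patch_size):
    """Fraction of a point set that projects into each pixel patch (row, col)."""
    fx, fy, cx, cy = intrinsics
    width, height = image_size
    counts = Counter()
    for p in points:
        uv = project(p, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = uv
        # Keep only projections that fall inside the image bounds.
        if 0 <= u < width and 0 <= v < height:
            counts[(int(v // patch_size), int(u // patch_size))] += 1
    total = len(points)
    return {patch: n / total for patch, n in counts.items()} if total else {}


# Toy point set (made-up coordinates): two points land in one patch,
# one in another, and one lies behind the camera.
points = [(0.0, 0.0, 1.0), (-0.1, -0.1, 1.0), (0.1, 0.1, 1.0), (0.0, 0.0, -1.0)]
proportions = projected_point_proportion(
    points, intrinsics=(100.0, 100.0, 32.0, 32.0), image_size=(64, 64), patch_size=32
)
```

A coarse matcher supervised this way would favor the patch holding the largest share of a point set's projections.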
Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section
Recent advances in large language models have led to renewed interest in
natural language processing in healthcare using the free text of clinical
notes. One distinguishing characteristic of clinical notes is their long time
span over multiple long documents. The unique structure of clinical notes
creates a new design choice: when the context length for a language model
predictor is limited, which part of clinical notes should we choose as the
input? Existing studies either choose the inputs with domain knowledge or
simply truncate them. We propose a framework to analyze the sections with high
predictive power. Using MIMIC-III, we show that: 1) predictive power
distribution is different between nursing notes and discharge notes and 2)
combining different types of notes could improve performance when the context
length is large. Our findings suggest that a carefully selected sampling
function could enable more efficient information extraction from clinical
notes. Comment: Our code is publicly available on GitHub
(https://github.com/nyuolab/EfficientTransformer
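The design choice described in this abstract, plain truncation versus a selection informed by predictive power, can be illustrated with a small sketch. It is not the authors' framework: the section names, the scores, and the greedy rule are all hypothetical, and a note is modeled simply as a list of `(section_name, tokens)` pairs.

```python
def truncate(sections, budget):
    """Baseline: concatenate sections in document order and keep the first `budget` tokens."""
    tokens = [tok for _, toks in sections for tok in toks]
    return tokens[:budget]


def select_by_score(sections, scores, budget):
    """Greedy alternative: fill the budget with the highest-scoring sections first."""
    chosen = []
    remaining = budget
    ranked = sorted(sections, key=lambda s: scores.get(s[0], 0.0), reverse=True)
    for name, toks in ranked:
        if remaining <= 0:
            break
        kept = toks[:remaining]  # the last section taken may be cut to fit
        chosen.append((name, kept))
        remaining -= len(kept)
    return chosen


# Hypothetical note and per-section predictive scores (illustrative values only).
note = [
    ("chief complaint", ["a", "b"]),
    ("history", ["c", "d", "e"]),
    ("discharge plan", ["f", "g"]),
]
scores = {"chief complaint": 0.2, "history": 0.5, "discharge plan": 0.9}

baseline = truncate(note, budget=4)
selected = select_by_score(note, scores, budget=4)
```

With a budget of four tokens, truncation keeps the start of the note, while the scored selection keeps the highest-scoring section in full and cuts the next one to fit.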
Radial Basis Function Neural Network with Particle Swarm Optimization Algorithms for Regional Logistics Demand Prediction
Regional logistics demand prediction is a key step in regional logistics planning and in rationalizing logistics resources. Because the regional economy is the inherent, determining factor of regional logistics demand, it is feasible to forecast that demand from economic indicators, and such forecasts can accelerate the harmonious development of the regional logistics industry and the regional economy. This paper studies the PSO-RBFNN model, a radial basis function neural network (RBFNN) combined with the particle swarm optimization (PSO) algorithm. The model is trained on indicator data from a region to predict that region's logistics demand. The results indicate the model's applicability and potential advantages.
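How a PSO-trained RBF network fits together can be sketched in a few lines. This is a minimal illustration under several assumptions, none of them from the abstract: a standard global-best PSO with fixed coefficients, Gaussian basis functions, a one-dimensional input in place of multiple economic indicators, and synthetic data rather than logistics figures. All names and parameter values are hypothetical.

```python
import math
import random


def rbf_predict(params, x, n_hidden):
    """Gaussian RBF network on a scalar input; params = centers + widths + weights + [bias]."""
    centers = params[:n_hidden]
    widths = params[n_hidden:2 * n_hidden]
    weights = params[2 * n_hidden:3 * n_hidden]
    out = params[3 * n_hidden]
    for c, s, w in zip(centers, widths, weights):
        out += w * math.exp(-((x - c) ** 2) / (2 * s * s + 1e-9))
    return out


def mse(params, data, n_hidden):
    """Mean squared error of the network over (x, y) pairs."""
    return sum((rbf_predict(params, x, n_hidden) - y) ** 2 for x, y in data) / len(data)


def pso_train(data, n_hidden=3, n_particles=20, n_iters=100, seed=0,
              inertia=0.7, c1=1.5, c2=1.5):
    """Global-best PSO over all network parameters; returns best params and loss history."""
    rng = random.Random(seed)
    dim = 3 * n_hidden + 1
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [mse(p, data, n_hidden) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest, gbest_val = best_pos[g][:], best_val[g]
    history = [gbest_val]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes inertia, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = mse(pos[i], data, n_hidden)
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest = val, pos[i][:]
        history.append(gbest_val)
    return gbest, history


# Synthetic stand-in data (a sine curve), not regional logistics figures.
data = [(x / 10, math.sin(x / 10)) for x in range(-10, 11)]
params, history = pso_train(data)
```

Because the global best is only replaced by a strictly better particle, `history` never increases, which makes it a convenient check that the swarm is running as intended.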