
    Everybody Compose: Deep Beats To Music

    This project presents a deep learning approach to generating monophonic melodies from input beats, allowing even amateurs to create their own music compositions. Three effective methods are proposed for this novel task - LSTM with Full Attention, LSTM with Local Attention, and Transformer with Relative Position Representation - providing great variation, harmony, and structure in the generated music. The project lets anyone compose their own music by tapping a keyboard or "recoloring" beat sequences from existing works.
    Comment: Accepted MMSys '2
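
    As a rough illustration of the task setup (not the authors' code), the sketch below wires a beat sequence into an LSTM decoder that emits one pitch token per beat. The attention variants are omitted, and the beat feature layout, vocabulary size, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): an LSTM decoder mapping an input
# beat sequence to a monophonic melody, one pitch token per beat.
import torch
import torch.nn as nn

class BeatsToMelodyLSTM(nn.Module):
    def __init__(self, n_pitches=128, beat_dim=2, hidden=256):
        super().__init__()
        # Assumed layout: each beat is a (duration, rest-duration) pair;
        # pitches are MIDI-like integer tokens.
        self.beat_proj = nn.Linear(beat_dim, hidden)
        self.pitch_emb = nn.Embedding(n_pitches, hidden)
        self.lstm = nn.LSTM(2 * hidden, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_pitches)

    def forward(self, beats, prev_pitches):
        # beats: (B, T, beat_dim); prev_pitches: (B, T) previous pitch tokens
        x = torch.cat([self.beat_proj(beats), self.pitch_emb(prev_pitches)], dim=-1)
        out, _ = self.lstm(x)
        return self.head(out)  # (B, T, n_pitches) logits, one pitch per beat

# Toy usage: 16 tapped beats -> 16 pitch logit vectors.
model = BeatsToMelodyLSTM()
beats = torch.rand(1, 16, 2)
prev = torch.zeros(1, 16, dtype=torch.long)
print(model(beats, prev).shape)  # torch.Size([1, 16, 128])
```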

    Convergence of flow-based generative models via proximal gradient descent in Wasserstein space

    Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both the forward (data-to-noise) and reverse (noise-to-data) directions, remains sparse. In this paper, we provide a theoretical guarantee of generating the data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of proximal gradient descent (GD) in Wasserstein space, we prove that the Kullback-Leibler (KL) guarantee of data generation by a JKO flow model is $O(\varepsilon^2)$ when using $N \lesssim \log(1/\varepsilon)$ JKO steps ($N$ residual blocks in the flow), where $\varepsilon$ is the error in the per-step first-order condition. The assumption on the data density is merely a finite second moment, and the theory extends to data distributions without density and to inversion errors in the reverse process, where we obtain mixed KL-$W_2$ error guarantees. The non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest.
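
    To make the scheme concrete, here is a schematic LaTeX rendering of one JKO step as a $W_2$-proximal update on a KL objective; the step size $h$ and the exact form of the functional are notational assumptions, not taken verbatim from the paper.

```latex
% One JKO step: a W_2-proximal (backward Euler) update on the objective
% F(\rho) = KL(\rho \| \pi), where \pi is the target distribution and
% h > 0 is a step size (both the functional and h are assumed notation).
\[
  \rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho \in \mathcal{P}_2(\mathbb{R}^d)}
  \; \mathrm{KL}(\rho \,\|\, \pi) \;+\; \frac{1}{2h}\, W_2^2(\rho, \rho_k).
\]
% The abstract's guarantee, restated: if each residual block solves its
% step's first-order condition up to error \varepsilon, then after
% N \lesssim \log(1/\varepsilon) steps the generated distribution \rho_N
% satisfies KL = O(\varepsilon^2).
```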

    CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration

    In image-to-point cloud registration, acquiring point-to-pixel correspondences is challenging because the similarity between individual points and pixels is ambiguous due to the visual differences between the data modalities. Nevertheless, the same object present in the two data formats can be readily identified from the local perspective of point sets and pixel patches. Motivated by this intuition, we propose a coarse-to-fine framework that first establishes correspondences between local point sets and pixel patches and then refines the results at both the point and pixel levels. On the coarse scale, we mimic the classic Vision Transformer to translate both the image and the point cloud into two sequences of local representations, namely point and pixel proxies, and employ attention to capture global and cross-modal context. To supervise the coarse matching, we propose a novel projected point proportion loss, which guides the network to match each point set with the pixel patch into which more of its points can be projected. On the finer scale, point-to-pixel correspondences are refined within a smaller search space (i.e., the coarsely matched sets and patches) via well-designed sampling, attentional learning, and fine matching, where sampling masks are embedded in the last two steps to mitigate the negative effect of sampling. With the resulting high-quality correspondences, the registration problem is solved by the EPnP algorithm within RANSAC. Experimental results on large-scale outdoor benchmarks demonstrate our superiority over existing methods.
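
    The final pose-recovery step named in the abstract (EPnP inside RANSAC) can be sketched with OpenCV's solvePnPRansac. This covers only the last stage, not the matching network; the correspondences and camera intrinsics below are random placeholders.

```python
# Sketch of the final registration step only: given point-to-pixel
# correspondences, recover the pose with EPnP inside RANSAC via OpenCV.
import cv2
import numpy as np

object_points = np.random.rand(100, 3).astype(np.float32)    # matched 3D points (placeholder)
image_points = np.random.rand(100, 2).astype(np.float32)     # matched pixels (placeholder)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed camera intrinsics

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, distCoeffs=None,
    flags=cv2.SOLVEPNP_EPNP,          # EPnP as the minimal solver
    reprojectionError=8.0, iterationsCount=1000)
R, _ = cv2.Rodrigues(rvec)            # rotation matrix of the recovered pose
print(ok, R.shape, tvec.ravel())
```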

    Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section

    Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes. One distinguishing characteristic of clinical notes is that they span long periods of time across multiple long documents. This unique structure creates a new design choice: when the context length of a language model predictor is limited, which part of the clinical notes should we choose as the input? Existing studies either choose the input with domain knowledge or simply truncate it. We propose a framework to identify the sections with high predictive power. Using MIMIC-III, we show that: 1) the distribution of predictive power differs between nursing notes and discharge notes, and 2) combining different types of notes can improve performance when the context length is large. Our findings suggest that a carefully selected sampling function could enable more efficient information extraction from clinical notes.
    Comment: Our code is publicly available on GitHub (https://github.com/nyuolab/EfficientTransformer).
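
    A hypothetical sketch of what such a sampling function could look like: rank note sections by an assumed predictive-power score and pack them greedily into a fixed token budget. The section names, scores, and whitespace token count are all invented for illustration and are not from the paper.

```python
# Hypothetical "sampling function" in the abstract's sense: pack the
# highest-scoring note sections into a fixed context-length budget.
def pack_sections(sections, max_tokens):
    """sections: list of (name, text, score); returns the concatenated input."""
    chosen, used = [], 0
    for name, text, score in sorted(sections, key=lambda s: s[2], reverse=True):
        n_tokens = len(text.split())          # crude whitespace token count
        if used + n_tokens <= max_tokens:
            chosen.append((name, text))
            used += n_tokens
    return "\n\n".join(f"[{name}]\n{text}" for name, text in chosen)

# Made-up sections with made-up predictive-power scores.
notes = [
    ("discharge:hospital_course", "patient admitted with ...", 0.92),
    ("nursing:assessment", "alert and oriented ...", 0.75),
    ("discharge:medications", "metoprolol 25mg ...", 0.40),
]
print(pack_sections(notes, max_tokens=50))
```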

    Radial Basis Function Neural Network with Particle Swarm Optimization Algorithms for Regional Logistics Demand Prediction

    Regional logistics prediction is a key step in regional logistics planning and in the rational allocation of logistics resources. Since the regional economy is the inherent, determinative factor behind regional logistics demand, it is feasible to forecast that demand by investigating economic indicators, which can in turn accelerate the harmonious development of the regional logistics industry and the regional economy. In this paper, the PSO-RBFNN model, a radial basis function neural network (RBFNN) combined with a particle swarm optimization (PSO) algorithm, is studied. The PSO-RBFNN model is trained on regional indicator data to predict regional logistics demand, and the corresponding results indicate the model's applicability and potential advantages.
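
    A minimal sketch of the PSO-RBFNN idea under stated assumptions: the RBF network's centers, widths, and linear weights are flattened into one particle vector and fit by a basic global-best PSO on toy one-dimensional data standing in for regional indicators. Network size and PSO coefficients are illustrative, not the paper's settings.

```python
# Minimal PSO-RBFNN sketch: PSO searches the full parameter vector of a
# Gaussian-RBF regression network by minimizing training MSE.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))              # stand-in "indicator" data
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # stand-in demand target

n_centers, dim = 8, X.shape[1]
n_params = n_centers * dim + n_centers + n_centers + 1  # centers, widths, weights, bias

def predict(theta, X):
    c = theta[:n_centers * dim].reshape(n_centers, dim)
    s = np.abs(theta[n_centers * dim:n_centers * dim + n_centers]) + 1e-3
    w, b = theta[-n_centers - 1:-1], theta[-1]
    d2 = ((X[:, None, :] - c[None]) ** 2).sum(-1)    # squared distances to centers
    return np.exp(-d2 / (2 * s**2)) @ w + b          # Gaussian RBF layer + linear output

def mse(theta):
    return np.mean((predict(theta, X) - y) ** 2)

# Global-best PSO with standard inertia/cognitive/social coefficients.
n_particles, iters = 30, 200
pos = rng.normal(size=(n_particles, n_params))
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([mse(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()
for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, n_params))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    f = np.array([mse(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()
print("final MSE:", mse(gbest))
```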