
    Community Detection in Hypergraphs, Spiked Tensor Models, and Sum-of-Squares

    We study the problem of community detection in hypergraphs under a stochastic block model. Just as the stochastic block model in graphs suggests studying spiked random matrices, our model motivates investigating the statistical and computational limits of exact recovery in a certain spiked tensor model. In contrast with the matrix case, the spiked model naturally arising from community detection in hypergraphs differs from the one arising in the so-called tensor Principal Component Analysis model. We investigate the effectiveness of algorithms in the Sum-of-Squares hierarchy on these models. Interestingly, our results suggest that these two apparently similar models exhibit significantly different computational-to-statistical gaps.
    Comment: In proceedings of the 2017 International Conference on Sampling Theory and Applications (SampTA).
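
    As a rough illustration of the kind of object involved (this is a generic rank-one spiked symmetric 3-tensor with Gaussian noise, not necessarily the paper's exact observation model; the ±1 label vector x and the signal strength lam are illustrative assumptions), a minimal sampler in Python:

    import numpy as np

    def sample_spiked_tensor(n, lam, seed=None):
        """Sample T = lam * x^(⊗3) + W for a random ±1 label vector x.

        W is a symmetrized Gaussian noise tensor; lam controls the
        signal-to-noise ratio. A generic spiked-tensor sketch only.
        """
        rng = np.random.default_rng(seed)
        x = rng.choice([-1.0, 1.0], size=n)            # hidden community labels
        signal = lam * np.einsum("i,j,k->ijk", x, x, x)
        g = rng.standard_normal((n, n, n))
        # Average over all six axis permutations to symmetrize the noise.
        noise = (g + g.transpose(0, 2, 1) + g.transpose(1, 0, 2)
                   + g.transpose(1, 2, 0) + g.transpose(2, 0, 1)
                   + g.transpose(2, 1, 0)) / 6.0
        return signal + noise, x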

    Statistical limits of graphical channel models and a semidefinite programming approach

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Mathematics, 2018. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 205-213).
    Community recovery is a major challenge in data science and computer science. The goal in community recovery is to find the hidden clusters in given relational data, which is often represented as a labeled hypergraph whose nodes correspond to items to be labeled and whose edges correspond to observed relations between the items. We investigate the problem of exact recovery in the class of statistical models that can be expressed in terms of graphical channels. In a graphical channel model, we observe noisy measurements of the relations among k nodes while the true labeling is unknown to us, and the goal is to recover the labels correctly. This generalizes both the stochastic block model and the spiked tensor model for principal component analysis, which have gained much interest over the last decade.
    We focus on two aspects of exact recovery: statistical limits, and efficient algorithms achieving those limits. For the statistical limits, we show that the achievability of exact recovery is essentially determined by whether we can recover the label of one node, given the labels of the other nodes, with fairly high probability. This phenomenon was observed by Abbe et al. for generic stochastic block models and called "local-to-global amplification". We confirm that local-to-global amplification indeed holds for generic graphical channel models under some regularity assumptions. As a corollary, the threshold for exact recovery is explicitly determined.
    On the algorithmic side, we consider two examples of graphical channel models: (i) the spiked tensor model with additive Gaussian noise, and (ii) the generalization of the stochastic block model to k-uniform hypergraphs. We propose a strategy we call "truncate-and-relax", based on a standard semidefinite relaxation technique. We show that in these two models, the algorithm based on this strategy achieves exact recovery up to a threshold that orderwise matches the statistical threshold. We complement this by showing the limitations of the algorithm.
    by Chiheon Kim. Ph.D.
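
    For concreteness, a toy sampler for the second example, a k-uniform hypergraph stochastic block model, might look as follows; the two balanced communities and the edge probabilities p_in/p_out are simplifying assumptions, not the thesis's general graphical channel formulation:

    import itertools
    import numpy as np

    def sample_hypergraph_sbm(n, k, p_in, p_out, seed=None):
        """Sample a k-uniform hypergraph SBM with two balanced communities
        (n assumed even). A hyperedge on k nodes appears with probability
        p_in if all k labels agree and p_out otherwise."""
        rng = np.random.default_rng(seed)
        labels = rng.permutation(np.array([0, 1]).repeat(n // 2))
        edges = []
        for e in itertools.combinations(range(n), k):
            p = p_in if len(set(labels[list(e)])) == 1 else p_out
            if rng.random() < p:
                edges.append(e)
        return edges, labels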

    Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers

    Autoregressive transformers have shown remarkable success in video generation. However, they are prevented from directly learning long-term dependencies in videos by the quadratic complexity of self-attention, and they inherently suffer from slow inference and error propagation due to the autoregressive process. In this paper, we propose the Memory-efficient Bidirectional Transformer (MeBT) for end-to-end learning of long-term dependencies in videos with fast inference. Building on recent advances in bidirectional transformers, our method learns to decode the entire spatio-temporal volume of a video in parallel from partially observed patches. The proposed transformer achieves linear time complexity in both encoding and decoding by projecting the observable context tokens into a fixed number of latent tokens and conditioning on them to decode the masked tokens through cross-attention. Empowered by linear complexity and bidirectional modeling, our method demonstrates significant improvements over autoregressive transformers in both the quality and the speed of generating moderately long videos. Videos and code are available at https://sites.google.com/view/mebt-cvpr2023
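
    A schematic of the latent-bottleneck idea described above (a sketch of the mechanism, not the released MeBT code; the sizes d, n_latent, and n_heads are assumptions) in PyTorch:

    import torch
    import torch.nn as nn

    class LatentBottleneckDecoder(nn.Module):
        """N context tokens are summarized into a fixed number of latent
        tokens, and masked-token queries are decoded by cross-attending
        to those latents. Both steps cost O(N * n_latent), i.e. linear
        in the sequence length N."""

        def __init__(self, d=256, n_latent=16, n_heads=8):
            super().__init__()
            self.latents = nn.Parameter(torch.randn(n_latent, d))
            self.encode = nn.MultiheadAttention(d, n_heads, batch_first=True)
            self.decode = nn.MultiheadAttention(d, n_heads, batch_first=True)

        def forward(self, context_tokens, masked_queries):
            b = context_tokens.size(0)
            lat = self.latents.unsqueeze(0).expand(b, -1, -1)
            # Encode: latents attend to the observed context tokens.
            lat, _ = self.encode(lat, context_tokens, context_tokens)
            # Decode: masked-token queries attend only to the latents.
            out, _ = self.decode(masked_queries, lat, lat)
            return out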

    NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image

    Transfer learning from large-scale Text-to-Image (T2I) models has recently shown impressive potential for Novel View Synthesis (NVS) of diverse objects from a single image. While previous methods typically train large models on multi-view datasets for NVS, fine-tuning all the parameters of a T2I model is not only costly but also reduces the model's capacity to generate diverse images in a new domain. In this study, we propose an effective method, dubbed NVS-Adapter, a plug-and-play module for a T2I model that synthesizes novel multi-views of visual objects while fully exploiting the generalization capacity of the T2I model. NVS-Adapter consists of two main components: view-consistency cross-attention, which learns visual correspondences to align the local details of the view features, and global semantic conditioning, which aligns the semantic structure of the generated views with that of the reference view. Experimental results demonstrate that NVS-Adapter effectively synthesizes geometrically consistent multi-views and achieves high performance on benchmarks without full fine-tuning of the T2I model. The code and data are publicly available at https://postech-cvlab.github.io/nvsadapter/
    Comment: Project page: https://postech-cvlab.github.io/nvsadapter
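
    A generic sketch of such a plug-and-play adapter (purely hypothetical module and argument names, illustrating the pattern of adding trainable attention next to a frozen backbone layer rather than the released NVS-Adapter code):

    import torch.nn as nn

    class AdapterBlock(nn.Module):
        """The T2I backbone layer stays frozen; two trainable attentions
        are added: one attending across target-view features (view
        consistency) and one attending to a global reference embedding
        (semantic conditioning)."""

        def __init__(self, backbone_layer, d=320, n_heads=8):
            super().__init__()
            self.backbone = backbone_layer
            for p in self.backbone.parameters():
                p.requires_grad_(False)       # keep T2I weights frozen
            self.view_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
            self.global_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)

        def forward(self, x, other_view_feats, ref_embedding):
            x = self.backbone(x)
            # Align local details with features from the other views.
            v, _ = self.view_attn(x, other_view_feats, other_view_feats)
            x = x + v
            # Condition on the reference view's global semantics.
            g, _ = self.global_attn(x, ref_embedding, ref_embedding)
            return x + g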

    An Energy-Efficient Algorithm for Classification of Fall Types Using a Wearable Sensor

    Objective: To mitigate damage from falls, it is essential to provide medical attention expeditiously. Many previous studies have focused on detecting falls and have shown that falls can be detected accurately, at least in a laboratory setting. However, very few studies have classified the different types of falls. To this end, in this paper, a novel energy-efficient algorithm that can discriminate among the five most common fall types was developed for wearable systems.
    Methods: A wearable system with an inertial measurement unit (IMU) sensor was first developed. Then our novel algorithm, temporal signal angle measurement (TSAM), was used to classify the different types of falls at various sampling frequencies, and the results were compared with those of three different machine learning algorithms.
    Results: The overall performance of TSAM and that of the machine learning algorithms were similar; however, TSAM outperformed the machine learning algorithms at frequencies in the range of 10-20 Hz. As the sampling frequency dropped from 200 Hz to 10 Hz, the accuracy of TSAM ranged from 93.3% to 91.8%, while the sensitivity and specificity ranged from 93.3% to 91.8% and from 98.3% to 97.9%, respectively.
    Conclusion: Our algorithm can be utilized with energy-efficient wearable devices at low sampling frequencies to classify different types of falls.
    Significance: Our system can expedite medical assistance in emergencies caused by falls by providing the necessary information to medical doctors or clinicians.
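
    The abstract does not spell out the algorithm, but its name suggests angle features computed over time from the IMU stream. A purely hypothetical sketch of such a feature at a reduced sampling frequency (the decimation scheme and the classification step that would follow are assumptions, not the published TSAM):

    import numpy as np

    def temporal_signal_angles(acc, fs, target_fs=20):
        """Angle profile from a 3-axis accelerometer stream `acc` of
        shape (T, 3) sampled at `fs` Hz: decimate to roughly target_fs,
        then measure the angle between consecutive acceleration vectors."""
        step = max(1, int(round(fs / target_fs)))
        a = acc[::step]                                    # simple decimation
        u = a / np.linalg.norm(a, axis=1, keepdims=True)   # unit vectors
        cos = np.clip(np.sum(u[:-1] * u[1:], axis=1), -1.0, 1.0)
        return np.degrees(np.arccos(cos))                  # degrees over time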

    Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer


    Fast AutoAugment

    Data augmentation is an essential technique for improving the generalization ability of deep learning models. Recently, AutoAugment (Cubuk et al., 2018) was proposed as an algorithm that automatically searches for augmentation policies from a dataset, and it has significantly enhanced performance on many image recognition tasks. However, its search method requires thousands of GPU hours even for a relatively small dataset. In this paper, we propose an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching. Compared to AutoAugment, the proposed algorithm speeds up the search by orders of magnitude while achieving comparable performance on image recognition tasks with various models and datasets, including CIFAR-10, CIFAR-100, SVHN, and ImageNet. Our code is publicly available at the official Kakao Brain GitHub: https://github.com/kakaobrain/fast-autoaugment
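
    A toy illustration of the density-matching signal (the actual implementation in the linked repository is considerably richer, searching policies with Bayesian optimization over data splits; candidate_policies, apply_policy, model, and loss_fn below are stand-ins):

    import numpy as np

    def density_matching_search(model, x_val, y_val, candidate_policies,
                                apply_policy, loss_fn):
        """Rank candidate augmentation policies by how well a model trained
        on clean data still fits the *augmented* held-out data: a low loss
        suggests the policy keeps augmented samples on the original data
        distribution, which is the core of density matching."""
        scores = []
        for policy in candidate_policies:
            x_aug = apply_policy(x_val, policy)    # augment held-out data
            scores.append(float(loss_fn(model(x_aug), y_val)))
        return candidate_policies[int(np.argmin(scores))]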