Community Detection in Hypergraphs, Spiked Tensor Models, and Sum-of-Squares
We study the problem of community detection in hypergraphs under a stochastic
block model. Similarly to how the stochastic block model in graphs suggests
studying spiked random matrices, our model motivates investigating statistical
and computational limits of exact recovery in a certain spiked tensor model. In
contrast with the matrix case, the spiked model naturally arising from
community detection in hypergraphs is different from the one arising in the
so-called tensor Principal Component Analysis model. We investigate the
effectiveness of algorithms in the Sum-of-Squares hierarchy on these models.
Interestingly, our results suggest that these two apparently similar models
exhibit significantly different computational-to-statistical gaps.
Comment: In proceedings of the 2017 International Conference on Sampling Theory and Applications (SampTA).
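To make the spiked tensor side of this comparison concrete, here is a small NumPy sketch of the tensor PCA model mentioned above — a rank-one spike plus i.i.d. Gaussian noise — together with a naive spectral estimate obtained by unfolding the tensor into a matrix. The dimension `n` and signal strength `lam` are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 20, 100.0

# Rank-one spike lam * v⊗v⊗v plus i.i.d. Gaussian noise.
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
T = lam * np.einsum("i,j,k->ijk", v, v, v) + rng.standard_normal((n, n, n))

# Naive spectral estimate: top left singular vector of the n x n^2 unfolding.
U, s, _ = np.linalg.svd(T.reshape(n, n * n), full_matrices=False)
v_hat = U[:, 0]
overlap = abs(v_hat @ v)  # close to 1 when the spike dominates the noise
```

With `lam` well above the top singular value of the noise unfolding (roughly `n + sqrt(n)` here), the estimate correlates strongly with the planted vector; the interesting regimes studied in the paper sit near, not far above, such thresholds.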
Statistical limits of graphical channel models and a semidefinite programming approach
Thesis: Ph.D., Massachusetts Institute of Technology, Department of Mathematics, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 205-213).

Community recovery is a major challenge in data science and computer science. The goal in community recovery is to find the hidden clusters in given relational data, which is often represented as a labeled hypergraph whose nodes correspond to items to be labeled and whose edges correspond to observed relations between the items. We investigate the problem of exact recovery in the class of statistical models that can be expressed in terms of graphical channels. In a graphical channel model, we observe noisy measurements of the relations between k nodes while the true labeling is unknown to us, and the goal is to recover the labels correctly. This generalizes both the stochastic block model and the spiked tensor model for principal component analysis, which have gained much interest over the last decade.

We focus on two aspects of exact recovery: statistical limits, and efficient algorithms achieving the statistical limit. For the statistical limits, we show that the achievability of exact recovery is essentially determined by whether we can recover the label of one node, given the labels of the other nodes, with fairly high probability. This phenomenon was observed by Abbe et al. for generic stochastic block models and called "local-to-global amplification". We confirm that local-to-global amplification indeed holds for generic graphical channel models, under some regularity assumptions. As a corollary, the threshold for exact recovery is explicitly determined. On the algorithmic side, we consider two examples of graphical channel models: (i) the spiked tensor model with additive Gaussian noise, and (ii) the generalization of the stochastic block model to k-uniform hypergraphs. We propose a strategy, which we call "truncate-and-relax", based on a standard semidefinite relaxation technique.
We show that in these two models, the algorithm based on this strategy achieves exact recovery up to a threshold which orderwise matches the statistical threshold. We complement this by showing the limitations of the algorithm.
by Chiheon Kim. Ph.D.
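The hypergraph model in example (ii) can be sketched generatively: each k-subset of nodes becomes a hyperedge with a probability that depends on whether its nodes share a community label. The two-community version below, with parameters `p` and `q`, is an illustrative toy, not the thesis's exact parameterization.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

def hypergraph_sbm(n, k, p, q, rng):
    """Sample a k-uniform hypergraph SBM with two communities:
    each k-subset is a hyperedge w.p. p if all its nodes share a
    label, and w.p. q otherwise."""
    labels = rng.integers(0, 2, size=n)
    edges = []
    for S in combinations(range(n), k):
        same = len({labels[i] for i in S}) == 1
        if rng.random() < (p if same else q):
            edges.append(S)
    return labels, edges

labels, edges = hypergraph_sbm(n=12, k=3, p=0.8, q=0.05, rng=rng)
```

Exact recovery asks for the labeling to be reconstructed perfectly (up to symmetry) from the observed edge set alone; the thesis characterizes when this is statistically possible and when the semidefinite relaxation achieves it.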
Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
Autoregressive transformers have shown remarkable success in video
generation. However, these transformers cannot directly learn long-term
dependencies in videos due to the quadratic complexity of self-attention, and
they inherently suffer from slow inference and error propagation due to the
autoregressive process. In this paper, we propose
Memory-efficient Bidirectional Transformer (MeBT) for end-to-end learning of
long-term dependency in videos and fast inference. Based on recent advances in
bidirectional transformers, our method learns to decode the entire
spatio-temporal volume of a video in parallel from partially observed patches.
The proposed transformer achieves a linear time complexity in both encoding and
decoding, by projecting observable context tokens into a fixed number of latent
tokens and conditioning them to decode the masked tokens through
cross-attention. Empowered by linear complexity and bidirectional modeling, our
method demonstrates significant improvement over autoregressive transformers
for generating moderately long videos, in both quality and speed.
Videos and code are available at https://sites.google.com/view/mebt-cvpr2023
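The linear-complexity encode/decode pattern described above can be sketched in plain NumPy: a fixed number of latent tokens attend to the observed context (linear in context length), and masked queries then attend only to those latents (linear in the number of masked tokens). The sizes and the single-head, weight-free attention below are illustrative assumptions, not MeBT's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n_ctx, n_masked = 16, 8, 1000, 500

def attend(q, k, v):
    """Scaled dot-product attention with a softmax over the keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

latents = rng.standard_normal((m, d))         # fixed number of latent tokens
context = rng.standard_normal((n_ctx, d))     # observed (unmasked) patches
queries = rng.standard_normal((n_masked, d))  # masked tokens to decode

# Encode: m x n_ctx attention — cost linear in the number of context tokens.
z = attend(latents, context, context)
# Decode: n_masked x m attention — cost linear in the number of masked tokens.
out = attend(queries, z, z)
```

Because `m` is a constant, neither step incurs the quadratic token-token interaction of full self-attention, which is what makes long spatio-temporal volumes tractable.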
NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
Transfer learning of large-scale Text-to-Image (T2I) models has recently
shown impressive potential for Novel View Synthesis (NVS) of diverse objects
from a single image. While previous methods typically train large models on
multi-view datasets for NVS, fine-tuning all the parameters of T2I models not
only demands a high cost but also reduces the generalization capacity of T2I
models in generating diverse images in a new domain. In this study, we propose
an effective method, dubbed NVS-Adapter, which is a plug-and-play module for a
T2I model, to synthesize novel multi-views of visual objects while fully
exploiting the generalization capacity of T2I models. NVS-Adapter consists of
two main components: view-consistency cross-attention learns the visual
correspondences to align the local details of view features, and global
semantic conditioning aligns the semantic structure of generated views with the
reference view. Experimental results demonstrate that the NVS-Adapter can
effectively synthesize geometrically consistent multi-views and also achieve
high performance on benchmarks without full fine-tuning of T2I models. The code
and data are publicly available at https://postech-cvlab.github.io/nvsadapter/.
Comment: Project Page: https://postech-cvlab.github.io/nvsadapter
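The plug-and-play idea — keep the pretrained T2I weights frozen and train only a small residual module — can be sketched as follows. This is a generic down/up-projection adapter for brevity; the actual NVS-Adapter's view-consistency cross-attention and global semantic conditioning are not reproduced here, and all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32

# Frozen base T2I block (its weights stay fixed during adapter training).
W_frozen = rng.standard_normal((d, d)) / np.sqrt(d)

# Small trainable adapter; the up-projection is zero-initialized so the
# pretrained behaviour is preserved exactly at the start of fine-tuning.
W_down = rng.standard_normal((d, 4)) * 0.01
W_up = np.zeros((4, d))

def block_with_adapter(x):
    h = x @ W_frozen                 # frozen pathway
    return h + (h @ W_down) @ W_up   # plug-and-play residual adapter

x = rng.standard_normal((10, d))
y = block_with_adapter(x)
# With W_up zero-initialized, the adapter is a no-op perturbation:
assert np.allclose(y, x @ W_frozen)
```

Training only the adapter parameters keeps the cost low and, as the abstract argues, preserves the base model's generalization capacity in new domains.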
An Energy-Efficient Algorithm for Classification of Fall Types Using a Wearable Sensor
Objective: To mitigate damage from falls, it is essential to provide medical attention expeditiously. Many previous studies have focused on detecting falls and have shown that falls can be accurately detected, at least in a laboratory setting. However, very few studies have classified the different types of falls. To this end, in this paper, a novel energy-efficient algorithm that can discriminate the five most common fall types was developed for wearable systems. Methods: A wearable system with an inertial measurement unit sensor was first developed. Then, our novel algorithm, temporal signal angle measurement (TSAM), was used to classify the different types of falls at various sampling frequencies, and the results were compared with those from three different machine learning algorithms. Results: The overall performance of the TSAM and that of the machine learning algorithms were similar. However, the TSAM outperformed the machine learning algorithms at frequencies in the range of 10-20 Hz. As the sampling frequency dropped from 200 to 10 Hz, the accuracy of the TSAM ranged from 93.3% to 91.8%. The sensitivity and specificity ranged from 93.3% to 91.8% and from 98.3% to 97.9%, respectively, over the same frequency range. Conclusion: Our algorithm can be utilized with energy-efficient wearable devices at low sampling frequencies to classify different types of falls. Significance: Our system can expedite medical assistance in emergencies caused by falls by providing the necessary information to medical doctors or clinicians.
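The abstract does not spell out TSAM's exact computation, but one natural reading of "temporal signal angle measurement" is the angle between consecutive accelerometer sample vectors over time. The sketch below is a hypothetical illustration of that reading, not the paper's algorithm.

```python
import numpy as np

def signal_angles(acc):
    """Angle (radians) between consecutive 3-axis accelerometer samples.

    acc: array of shape (T, 3), one row per sample.
    Returns an array of T-1 angles."""
    a, b = acc[:-1], acc[1:]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Toy trace: the gravity vector tilting in two 45-degree steps.
acc = np.array([[0.0, 0.0, 1.0],
                [0.0, 1.0, 1.0],
                [0.0, 1.0, 0.0]])
angles = signal_angles(acc)
```

A feature of this kind is cheap to compute per sample, which is consistent with the paper's emphasis on energy efficiency at 10-20 Hz sampling rates.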
Fast AutoAugment
Data augmentation is an essential technique for improving the generalization ability of deep learning models. Recently, AutoAugment (Cubuk et al., 2018) has been proposed as an algorithm to automatically search for augmentation policies from a dataset, and it has significantly enhanced performance on many image recognition tasks. However, its search method requires thousands of GPU hours even for a relatively small dataset. In this paper, we propose an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching. In comparison to AutoAugment, the proposed algorithm speeds up the search time by orders of magnitude while achieving comparable performance on image recognition tasks with various models and datasets, including CIFAR-10, CIFAR-100, SVHN, and ImageNet. Our code is publicly available on the official Kakao Brain GitHub (https://github.com/kakaobrain/fast-autoaugment).
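The object being searched for — an augmentation policy — is a list of sub-policies, each a short sequence of (operation, probability, magnitude) triples; at training time one sub-policy is sampled per image. The sketch below applies such a policy with a toy operation set; real policies use image ops like ShearX or Posterize, so the ops and values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def _cutout(img, size):
    out = img.copy()
    out[:size, :size] = 0.0  # zero out a square patch
    return out

# Toy operation set mapping name -> callable(img, magnitude).
OPS = {
    "flip_lr": lambda img, m: img[:, ::-1],
    "brightness": lambda img, m: np.clip(img + m, 0.0, 1.0),
    "cutout": lambda img, m: _cutout(img, int(m * img.shape[0])),
}

# A policy: list of sub-policies, each a list of (op, prob, magnitude).
policy = [
    [("flip_lr", 0.5, 0.0), ("brightness", 0.9, 0.2)],
    [("cutout", 0.8, 0.25)],
]

def apply_policy(img, policy, rng):
    sub = policy[rng.integers(len(policy))]  # one sub-policy per image
    for name, prob, mag in sub:
        if rng.random() < prob:
            img = OPS[name](img, mag)
    return img

img = rng.random((8, 8))
aug = apply_policy(img, policy, rng)
```

Fast AutoAugment's contribution is how the `policy` list is found: density matching scores candidate policies by how well augmented held-out data matches the training distribution, avoiding the repeated child-model training that makes AutoAugment's search so expensive.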