Self-Ordering Point Clouds
In this paper we address the task of finding representative subsets of points in a 3D point cloud by means of a point-wise ordering. Only a few works have tried to address this challenging vision problem, all with the help of hard-to-obtain point and cloud labels. Different from these works, we introduce the task of point-wise ordering in 3D point clouds through self-supervision, which we call self-ordering. We further contribute the first end-to-end trainable network that learns a point-wise ordering in a self-supervised fashion. It utilizes a novel differentiable point scoring-sorting strategy and constructs a hierarchical contrastive scheme to obtain self-supervision signals. We extensively ablate the method and show its scalability and superior performance, even compared to supervised ordering methods, on multiple datasets and tasks, including zero-shot ordering of point clouds from unseen categories.
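To make the idea of a point-wise ordering concrete, the following is a minimal sketch, not the paper's network: it assumes each point already has a scalar importance score (the `scores` argument is hypothetical) and uses a plain, non-differentiable sort, whereas the abstract describes a learned, differentiable scoring-sorting strategy. The names `order_points` and `top_k_subset` are illustrative only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def order_points(points: Sequence[Point], scores: Sequence[float]) -> List[int]:
    """Return point indices ordered from highest to lowest score."""
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")
    # Hard sort: fine for illustration, but not differentiable.
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)


def top_k_subset(points: Sequence[Point], scores: Sequence[float], k: int) -> List[Point]:
    """Pick the k highest-scoring points as the representative subset."""
    order = order_points(points, scores)
    return [points[i] for i in order[:k]]


# Hypothetical toy cloud and scores, for illustration only.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
toy_scores = [0.1, 0.9, 0.5]
subset = top_k_subset(cloud, toy_scores, 2)
```

Because a single ordering is computed once, every smaller subset is a prefix of every larger one, so one ordering serves any subset size.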
Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations
Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos, this information-rich source has been largely overlooked. Our paper aims to address this gap by proposing a novel approach that incorporates temporal consistency in dense self-supervised learning. While methods designed solely for images struggle to achieve even the same performance on videos, our method improves the representation quality not only for videos but also for images. Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos. This effectively facilitates the transfer of high-level information from videos to image representations. Time-tuning improves the state-of-the-art by 8-10% for unsupervised semantic segmentation on videos and matches it for images. We believe this method paves the way for further self-supervised scaling by leveraging the abundant availability of videos. The implementation can be found here: https://github.com/SMSD75/Timetunin
No time to waste: practical statistical contact tracing with few low-bit messages
Pandemics have a major impact on society and the economy. In the case of a new virus, such as COVID-19, high-grade tests and vaccines might be slow to develop and scarce in the crucial initial phase. With no time to waste and lockdowns being expensive, contact tracing is thus an essential tool for policymakers. In theory, statistical inference on a virus transmission model can provide an effective method for tracing infections. However, in practice, such algorithms need to run decentralized, rendering existing methods, which require hundreds or even thousands of daily messages per person, infeasible. In this paper, we develop an algorithm that (i) requires only a few (2-5) daily messages, (ii) works with extremely low bandwidths (3-5 bits) and (iii) enables quarantining and targeted testing that drastically reduces the peak and length of the pandemic. We compare the effectiveness of our algorithm using two agent-based simulators of realistic contact patterns and pandemic parameters and show that it performs well even with low bandwidth, imprecise tests, and incomplete population coverage.
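As a rough illustration of what a 3-5 bit message can carry, the sketch below uniformly quantizes a probability into a small integer code. This is an assumption for illustration only: the abstract does not say how the messages are encoded, and the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(p: float, bits: int) -> int:
    """Map a probability in [0, 1] to an integer code that fits in `bits` bits."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    levels = (1 << bits) - 1        # largest code, e.g. 7 for 3 bits
    return int(p * levels + 0.5)    # round to the nearest level


def dequantize(code: int, bits: int) -> float:
    """Recover the approximate probability from its integer code."""
    levels = (1 << bits) - 1
    return code / levels


# A 3-bit message distinguishes 8 levels of a score between 0 and 1.
code = quantize(0.5, 3)
approx = dequantize(code, 3)
```

With uniform levels, the reconstruction error is at most half a step, that is 1/14 for 3 bits and 1/62 for 5 bits.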
PASS: An ImageNet replacement for self-supervised pretraining without humans
Computer vision has long relied on ImageNet and other large datasets of images sampled from the Internet for pretraining models. However, these datasets have ethical and technical shortcomings, such as containing personal information taken without consent, unclear license usage, biases, and, in some cases, even problematic image content. On the other hand, state-of-the-art pretraining is nowadays obtained with unsupervised methods, meaning that labelled datasets such as ImageNet may not be necessary, or perhaps not even optimal, for model pretraining. We thus propose an unlabelled dataset PASS: Pictures without humAns for Self-Supervision. PASS only contains images with CC-BY license and complete attribution metadata, addressing the copyright issue. Most importantly, it contains no images of people at all, and also avoids other types of images that are problematic for data protection or ethics. We show that PASS can be used for pretraining with methods such as MoCo-v2, SwAV and DINO. In the transfer learning setting, it yields similar downstream performances to ImageNet pretraining even on tasks that involve humans, such as human pose estimation. PASS does not make existing datasets obsolete, as for instance it is insufficient for benchmarking. However, it shows that model pretraining is often possible while using safer data, and it also provides the basis for a more robust evaluation of pretraining methods.
Risk profiles and one-year outcomes of patients with newly diagnosed atrial fibrillation in India: Insights from the GARFIELD-AF Registry.
BACKGROUND: The Global Anticoagulant Registry in the FIELD-Atrial Fibrillation (GARFIELD-AF) is an ongoing prospective noninterventional registry, which is providing important information on the baseline characteristics, treatment patterns, and 1-year outcomes in patients with newly diagnosed non-valvular atrial fibrillation (NVAF). This report describes data from Indian patients recruited in this registry. METHODS AND RESULTS: A total of 52,014 patients with newly diagnosed AF were enrolled globally; of these, 1388 patients were recruited from 26 sites within India (2012-2016). In India, the mean age was 65.8 years at diagnosis of NVAF. Hypertension was the most prevalent risk factor for AF, present in 68.5% of patients from India and in 76.3% of patients globally (P < 0.001). Diabetes and coronary artery disease (CAD) were prevalent in 36.2% and 28.1% of patients as compared with global prevalence of 22.2% and 21.6%, respectively (P < 0.001 for both). Antiplatelet therapy was the most common antithrombotic treatment in India. With increasing stroke risk, however, patients were more likely to receive oral anticoagulant therapy [mainly vitamin K antagonist (VKA)], but average international normalized ratio (INR) was lower among Indian patients [median INR value 1.6 (interquartile range {IQR}: 1.3-2.3) versus 2.3 (IQR 1.8-2.8) (P < 0.001)]. Compared with other countries, patients from India had markedly higher rates of all-cause mortality [7.68 per 100 person-years (95% confidence interval 6.32-9.35) vs 4.34 (4.16-4.53), P < 0.0001], while rates of stroke/systemic embolism and major bleeding were lower after 1 year of follow-up. CONCLUSION: Compared to previously published registries from India, the GARFIELD-AF registry describes clinical profiles and outcomes in Indian patients with AF of a different etiology. The registry data show that compared to the rest of the world, Indian AF patients are younger in age and have more diabetes and CAD. Patients with a higher stroke risk are more likely to receive anticoagulation therapy with VKA but are underdosed compared with the global average in the GARFIELD-AF. CLINICAL TRIAL REGISTRATION-URL: http://www.clinicaltrials.gov. Unique identifier: NCT01090362
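For readers unfamiliar with the unit "per 100 person-years" used for the mortality rates above, here is a minimal sketch of the standard calculation. The event count and follow-up time are hypothetical toy values, not registry data, and the function name is illustrative.

```python
def rate_per_100_person_years(events: int, person_years: float) -> float:
    """Incidence rate: events divided by total follow-up time, scaled to 100 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 100.0 * events / person_years


# Hypothetical example: 10 deaths observed over 200 person-years of follow-up.
toy_rate = rate_per_100_person_years(10, 200.0)
```

The rate depends on accumulated follow-up time as well as on the number of patients, which is why it is reported per person-years and not as a simple percentage.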
Self-Supervised Learning of Object Parts for Semantic Segmentation
Progress in self-supervised learning has brought strong image representation learning methods. Yet so far, it has mostly focused on image-level learning. In turn, tasks such as unsupervised image segmentation have not benefited from this trend as they require spatially-diverse representations. However, learning dense representations is challenging, as in the unsupervised context it is not clear how to guide the model to learn representations that correspond to various potential object categories. In this paper, we argue that self-supervised learning of object parts is a solution to this issue. Object parts are generalizable: they are a priori independent of an object definition, but can be grouped to form objects a posteriori. To this end, we leverage the recently proposed Vision Transformer's capability of attending to objects and combine it with a spatially dense clustering task for fine-tuning the spatial tokens. Our method surpasses the state-of-the-art on three semantic segmentation benchmarks by 17%-3%, showing that our representations are versatile under various object definitions. Finally, we extend this to fully unsupervised segmentation (which refrains completely from using label information even at test-time) and demonstrate that a simple method for automatically merging discovered object parts based on community detection yields substantial gains.