42 research outputs found

    Large-scale geo-facial image analysis

    Get PDF
    While face analysis from images is a well-studied area, little work has explored the dependence of facial appearance on the geographic location from which the image was captured. To fill this gap, we constructed GeoFaces, a large dataset of geotagged face images, and used it to examine the geo-dependence of facial features and attributes, such as ethnicity, gender, or the presence of facial hair. Our analysis illuminates the relationship between raw facial appearance, facial attributes, and geographic location, both globally and in selected major urban areas. Some of our experiments, and the resulting visualizations, confirm prior expectations, such as the predominance of ethnically Asian faces in Asia, while others highlight novel information that can be obtained with this type of analysis, such as the major city with the highest percentage of people with a mustache

    How can photo sharing inspire sharing genomes?

    Get PDF
    People usually are aware of the privacy risks of publish-ing photos online, but these risks are less evident when sharing humangenomes. Modern photos and sequenced genomes are both digital rep-resentations of real lives. They contain private information that maycompromise people’s privacy, and still, their highest value is most oftimes achieved only when sharing them with others. In this work, wepresent an analogy between the privacy aspects of sharing photos andsharing genomes, which clarifies the privacy risks in the latter to thegeneral public. Additionally, we illustrate an alternative informed modelto share genomic data according to the privacy-sensitivity level of eachportion. This article is a call to arms for a collaborative work between ge-neticists and security experts to build more effective methods to system-atically protect privacy, whilst promoting the accessibility and sharingof genome

    Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function

    Get PDF
    This work was supported by Keygene N.V., a crop innovation company in the Netherlands and by the Spanish MINECO/FEDER Project TEC201680141-P with the associated FPI grant BES-2017-079792.The authors thank Dr. Elvin Isufi and Chirag Raman for their valuable comments and feedback.Motivation: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data which are not available for this task. However, a very large amount of protein sequences without functional labels is available. Results: We applied an existing deep sequence model that had been pretrained in an unsupervised setting on the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. Also, it partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that 3D structure is also potentially learned during the unsupervised pretraining.Keygene N.V., a crop innovation company in the NetherlandsSpanish MINECO/FEDER TEC201680141-PFPI grant BES-2017-07979

    Multi-task self-supervised visual learning

    No full text
    We investigate methods for combining multiple self-supervised tasks-i.e., supervised tasks where data can be collected without manual labeling-in order to train a single visual representation. First, we provide an apples-to-apples comparison of four different self-supervised tasks using the very deep ResNet-101 architecture. We then combine tasks to jointly train a network. We also explore lasso regularization to encourage the network to factorize the information in its representation, and methods for “harmonizing” network inputs in order to learn a more unified representation. We evaluate all methods on ImageNet classification, PASCAL VOC detection, and NYU depth prediction. Our results show that deeper networks work better, and that combining tasks-even via a näýve multihead architecture-always improves performance. Our best joint network nearly matches the PASCAL performance of a model pre-trained on ImageNet classification, and matches the ImageNet network on NYU depth prediction

    Sim2real transfer learning for 3D human pose estimation: motion to the rescue

    No full text
    Synthetic visual data can provide practicically infinite diversity and rich labels, while avoiding ethical issues with privacy and bias. However, for many tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D human pose estimation is a particularly interesting example of this sim2real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person’s motion, notably as optical flow and the motion of 2D keypoints. Therefore, our results suggest that motion can be a simple way to bridge a sim2real gap when video is available. We evaluate on the 3D Poses in the Wild dataset, the most challenging modern benchmark for 3D pose estimation, where we show full 3D mesh recovery that is on par with state-of-the-art methods trained on real 3D sequences, despite training only on synthetic humans from the SURREAL dataset

    Exploiting temporal context for 3D human pose estimation in the wild

    No full text
    We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos. Unlike previous algorithms which operate on single frames, we show that reconstructing a person over an entire sequence gives extra constraints that can resolve ambiguities. This is because videos often give multiple views of a person, yet the overall body shape does not change and 3D positions vary slowly. Our method improves not only on standard mocap-based datasets like Human 3.6M -- where we show quantitative improvements -- but also on challenging in-the-wild datasets such as Kinetics. Building upon our algorithm, we present a new dataset of more than 3 million frames of YouTube videos from Kinetics with automatically generated 3D poses and meshes. We show that retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data by evaluating on the 3DPW and HumanEVA datasets

    Video action transformer network

    No full text
    We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we are trying to classify. We show that by using high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up on semantic context from the actions of others. Additionally its attention mechanism learns to emphasize hands and faces, which are often crucial to discriminate an action - all without explicit supervision other than boxes and class labels. We train and test our Action Transformer network on the Atomic Visual Actions (AVA) dataset, outperforming the state-of-the-art by a significant margin using only raw RGB frames as input

    Leveraging unlabeled data

    No full text

    Convolutional Network Features for Scene Recognition

    No full text

    Architectural Style Classification Using Multinomial Latent Logistic Regression

    No full text
    corecore