
    Text-Only Image Captioning with Multi-Context Data Generation

    Text-only Image Captioning (TIC) aims to construct a model, trained solely on text, that can accurately describe images. Recently, diffusion models have demonstrated remarkable capabilities in generating high-quality images that are semantically coherent with given texts, which presents an opportunity to generate synthetic training images for TIC. However, we identify a key challenge: images generated from simple descriptions typically exhibit a single perspective with one or a few contexts, which does not match the complexity of real-world scenes. In this paper, we propose a novel framework that addresses this issue by introducing multi-context data generation. Starting with an initial text corpus, our framework employs a large language model to select multiple sentences that describe the same scene from various perspectives. These sentences are then summarized into a single sentence with multiple contexts. We generate simple images from the straightforward sentences and complex images from the summarized sentences using diffusion models. Finally, we train the model exclusively on the synthetic image-text pairs obtained from this process. Experimental results demonstrate that our proposed framework effectively tackles this central challenge, achieving state-of-the-art performance on popular datasets such as MSCOCO, Flickr30k, and SS1M.
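The data-generation pipeline this abstract describes can be sketched roughly as follows. The grouping and summarization functions are naive stand-ins for the paper's large-language-model steps, and `generate_image` stands in for a diffusion model; all names here are illustrative, not the authors' API.

```python
# Hedged sketch of multi-context data generation for text-only captioning.
# LLM selection/summarization and diffusion generation are stubbed out.

def group_same_scene(corpus, keyword):
    """Stand-in for LLM selection: pick sentences describing the same
    scene (here, naively, sentences sharing a keyword)."""
    return [s for s in corpus if keyword in s]

def summarize(sentences):
    """Stand-in for LLM summarization into one multi-context sentence."""
    return " ".join(sentences)

def build_training_pairs(corpus, keyword, generate_image):
    group = group_same_scene(corpus, keyword)
    multi_context = summarize(group)
    # Simple images from simple sentences, a complex image from the summary.
    pairs = [(generate_image(s), s) for s in group]
    pairs.append((generate_image(multi_context), multi_context))
    return pairs

corpus = [
    "a dog runs on the beach",
    "waves crash near a dog at sunset",
    "a cat sleeps indoors",
]
pairs = build_training_pairs(corpus, "dog",
                             generate_image=lambda t: f"<image of: {t}>")
```

The captioner is then trained only on `pairs`, never on real images.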

    MotionBERT: A Unified Perspective on Learning Human Motion Representations

    We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources. Specifically, we propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy, partial 2D observations. The motion representations acquired in this way incorporate geometric, kinematic, and physical knowledge about human motion, which can be easily transferred to multiple downstream tasks. We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network. It captures long-range spatio-temporal relationships among the skeletal joints comprehensively and adaptively, exemplified by the lowest 3D pose estimation error to date when trained from scratch. Furthermore, our proposed framework achieves state-of-the-art performance on all three downstream tasks by simply finetuning the pretrained motion encoder with a simple regression head (1-2 layers), which demonstrates the versatility of the learned motion representations. Code and models are available at https://motionbert.github.io/ (ICCV 2023 camera-ready).
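The "noisy, partial 2D observations" used in the pretraining stage can be sketched as a corruption step applied to clean 2D keypoints; the encoder is then trained to recover 3D motion from the corrupted input. The parameter names and values below are illustrative assumptions, not the paper's settings.

```python
import random

def corrupt_2d(keypoints_2d, mask_prob=0.15, noise_std=0.05, rng=None):
    """Sketch of the pretraining corruption described in the abstract:
    randomly drop some joints (occlusion) and jitter the rest with
    Gaussian noise, producing noisy partial 2D observations."""
    rng = rng or random.Random(0)
    corrupted, mask = [], []
    for (x, y) in keypoints_2d:
        if rng.random() < mask_prob:
            corrupted.append((0.0, 0.0))  # joint dropped
            mask.append(0)
        else:
            corrupted.append((x + rng.gauss(0, noise_std),
                              y + rng.gauss(0, noise_std)))
            mask.append(1)
    return corrupted, mask
```

A motion encoder pretrained on pairs of (corrupted 2D, ground-truth 3D) must learn geometric and kinematic structure to fill in the masked joints.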

    EGC: Image Generation and Classification via a Diffusion Energy-Based Model

    Learning image classification and image generation with the same set of network parameters is a challenging problem. Recent approaches that perform well in one task often exhibit poor performance in the other. This work introduces an energy-based classifier and generator, namely EGC, which achieves superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution p(y|x)), the forward pass in EGC is a classifier that outputs a joint distribution p(x, y), enabling an image generator in its backward pass by marginalizing out the label y. This is done by estimating the energy and classification probability given a noisy image in the forward pass, while denoising it using the score function estimated in the backward pass. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ, and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.
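The forward/backward duality in this abstract can be illustrated numerically on a toy scalar "image" with two labels. The quadratic energies below are placeholders for the paper's network, and finite differences stand in for autograd; this is a minimal sketch of the energy-based idea, not EGC itself.

```python
import math

def energy(x, y):
    # Toy joint energy E(x, y): each label has a preferred "image" value.
    centers = {0: -1.0, 1: 1.0}
    return 0.5 * (x - centers[y]) ** 2

def class_prob(x, y):
    # Forward pass: p(y|x) follows from the joint energies via a
    # softmax over labels.
    logits = {k: -energy(x, k) for k in (0, 1)}
    z = sum(math.exp(v) for v in logits.values())
    return math.exp(logits[y]) / z

def score(x, eps=1e-5):
    # Backward pass: the score d/dx log p(x), with p(x) obtained by
    # marginalizing out y (log-sum-exp over labels). A central finite
    # difference stands in for backpropagation.
    def log_px(x_):
        return math.log(sum(math.exp(-energy(x_, k)) for k in (0, 1)))
    return (log_px(x + eps) - log_px(x - eps)) / (2 * eps)
```

The same energy function thus yields classification probabilities in the forward direction and a denoising score in the backward direction, which is the single-network property the abstract highlights.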

    Identification of microRNA precursors based on random forest with network-level representation method of stem-loop structure

    Background: MicroRNAs (miRNAs) play a key role in regulating various biological processes, such as participating in the post-transcriptional pathway and affecting the stability and/or the translation of mRNA. Current methods have extracted feature information at different levels, among which the characteristic stem-loop structure makes the greatest contribution to the prediction of putative miRNA precursors (pre-miRNAs). We find that none of these features alone is capable of identifying new pre-miRNAs accurately.
    Results: In the present work, a pre-miRNA stem-loop secondary structure is translated into a network, which provides a novel perspective for its structural analysis. Network parameters are used to construct a prediction model, achieving an area under the receiver operating characteristic curve (AUC) of 0.956. Moreover, applying the same method to two independent datasets yields accuracies of 0.976 and 0.913, respectively.
    Conclusions: Network parameters effectively characterize pre-miRNA secondary structure, which improves our prediction model in both predictive ability and computational efficiency. Additionally, as a complement to feature extraction methods in previous studies, these multifaceted features reflect natural properties of miRNAs and can be used for comprehensive and systematic analysis of miRNAs.
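The translation of a stem-loop secondary structure into a network can be sketched with dot-bracket notation: backbone neighbors and base pairs become edges, and simple graph statistics become features. The parsing and the two features below are simplified stand-ins for the network parameters used in the paper.

```python
# Hedged sketch: stem-loop structure (dot-bracket) -> network -> features.

def structure_to_edges(dot_bracket):
    """Backbone edges connect consecutive nucleotides; matched
    parentheses add base-pair edges."""
    edges = [(i, i + 1) for i in range(len(dot_bracket) - 1)]
    stack = []
    for i, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(i)
        elif c == ")":
            edges.append((stack.pop(), i))
    return edges

def network_features(dot_bracket):
    """Toy network-level features (mean degree, edge count) that a
    classifier such as a random forest could consume."""
    edges = structure_to_edges(dot_bracket)
    n = len(dot_bracket)
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {"mean_degree": sum(degree) / n, "n_edges": len(edges)}
```

In the paper's setting, such per-structure feature vectors would be fed to a random forest trained on known pre-miRNAs versus pseudo hairpins.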

    Learning analytics for the global south

    Learning Analytics for the Global South is a compilation of papers commissioned for the Digital Learning for Development (DL4D) project. DL4D is part of the Information Networks in Asia and Sub-Saharan Africa (INASSA) program, funded jointly by the International Development Research Centre (IDRC) of Canada and the Department for International Development (DFID) of the United Kingdom, and administered by the Foundation for Information Technology Education and Development (FIT-ED) of the Philippines. DL4D aims to examine how digital learning could be used to address issues of equity, quality, and efficiency at all educational levels in developing countries. Over the past two years, DL4D has brought together leading international and regional scholars and practitioners to critically assess the potentials, prospects, challenges, and future directions for the Global South in key areas of interest around digital learning. It commissioned discussion papers for each of these areas from leading experts in the field: Diana Laurillard of the University College London Knowledge Lab, for learning at scale; Chris Dede of Harvard University, for digital game-based learning; Charalambos Vrasidas of the Centre for the Advancement of Research and Development in Educational Technology, for cost-effective digital learning innovations; and, for learning analytics, the subject of this compilation, Dragan Gašević of the University of Edinburgh Moray House School of Education and School of Informatics. Each discussion paper is complemented by responses from a developing-country perspective by regional experts in Asia, Latin America, Africa, and the Middle East. Learning Analytics for the Global South considers how the collection, analysis, and use of data about learners and their contexts have the potential to broaden access to quality education and improve the efficiency of educational processes and systems in developing countries around the world. In his discussion paper, Prof. Gašević articulates these potentials and suggests how learning analytics could support critical digital learning and education imperatives such as quality learning at scale and the acquisition of 21st-century skills. Experts from Africa (Paul Prinsloo of the University of South Africa), Mainland China (Bodong Chen of the University of Minnesota, USA, and Yizhou Fan of Peking University, People’s Republic of China), Southeast Asia (Ma. Mercedes T. Rodrigo of the Ateneo de Manila University, Philippines), and Latin America (Cristóbal Cobo and Cecilia Aguerrebere, both of the Ceibal Foundation, Uruguay) situate Prof. Gašević’s proposals in their respective regional contexts, framing their responses around six key questions:
    1. What are the main trends and challenges in education in your region?
    2. How can learning analytics address these challenges?
    3. What models of learning analytics adoption would be most effective in your region?
    4. What are the barriers to adoption of learning analytics in your region, and how could these be mitigated?
    5. How do you envision ethical use and privacy protection in connection with learning analytics being addressed in your region?
    6. How can the operationalization of learning analytics be future-proofed in your region?
    We hope that this compilation will serve as a springboard for deeper conversations about the adoption and sustained use of learning analytics in developing countries – its potential benefits and risks for learners, educators, and education systems, as well as the ways to move forward that are rigorous, context-appropriate, ethical, and accountable.
    This work was created with financial support from the UK Government’s Department for International Development and the International Development Research Centre, Canada. The views expressed in this work are those of the authors and do not necessarily represent those of the UK Government’s Department for International Development; the International Development Research Centre, Canada, or its Board of Governors; the Foundation for Information Technology Education and Development; or the editors.

    Comparing empirical kinship-derived heritability for imaging genetics traits in the UK Biobank and Human Connectome Project

    Imaging genetics analyses use neuroimaging traits as intermediate phenotypes to infer the degree of genetic contribution to brain structure and function in health and/or illness. Coefficients of relatedness (CR) summarize the degree of genetic similarity among subjects and are used to estimate heritability – the proportion of phenotypic variance explained by genetic factors. The CR can be inferred directly from genome-wide genotype data to explain the degree of shared variation in common genetic polymorphisms (SNP-heritability) among related or unrelated subjects. We developed a central processing unit and graphics processing unit (CPU- and GPU-) accelerated Fast and Powerful Heritability Inference (FPHI) approach that linearizes likelihood calculations to overcome the ~N^2-N^3 dependency of classical likelihood approaches' computational effort on sample size. We calculated heritability for 60 regional and 1.3 × 10^5 voxel-wise traits in N = 1,206 twin and sibling participants from the Human Connectome Project (HCP) (550 M/656 F, age = 28.8 ± 3.7 years) and N = 37,432 participants (17,531 M/19,901 F; age = 63.7 ± 7.5 years) from the UK Biobank (UKBB). The FPHI estimates were in excellent agreement with heritability values calculated using the Genome-wide Complex Trait Analysis software (r = 0.96 and 0.98 in the HCP and UKBB samples) while significantly reducing computational effort (by a factor of 10^2-10^4). The regional and voxel-wise trait heritability estimates for the HCP and UKBB were likewise in excellent agreement (r = 0.63-0.76, p < 10^-10). In summary, the hardware-accelerated FPHI made it practical to calculate heritability values for voxel-wise neuroimaging traits, even in very large samples such as the UKBB. The patterns of additive genetic variance in neuroimaging traits measured in a large sample of related and unrelated individuals showed excellent agreement regardless of the estimation method. The code and instructions to execute these analyses are available at www.solar-eclipse-genetics.org
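The core idea of linearizing kinship-based heritability estimation can be illustrated with a Haseman-Elston-style regression, a classic approach that regresses phenotypic cross-products on kinship coefficients instead of maximizing a likelihood. This is a minimal sketch of that general technique, not the authors' FPHI algorithm.

```python
# Hedged sketch: heritability via regression of phenotype cross-products
# on pairwise kinship coefficients (Haseman-Elston style).

def he_regression_h2(phenotypes, kinship):
    """phenotypes: list of trait values; kinship: n x n matrix of
    coefficients of relatedness. Returns an h2 estimate clamped to [0, 1]."""
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    var = sum((p - mean) ** 2 for p in phenotypes) / n
    z = [(p - mean) / var ** 0.5 for p in phenotypes]  # standardize
    xs, ys = [], []
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(kinship[i][j])       # genetic similarity
            ys.append(z[i] * z[j])         # phenotypic similarity
    # Least-squares slope through the origin estimates h2: pairs that are
    # more genetically similar should have more similar phenotypes.
    h2 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return max(0.0, min(1.0, h2))
```

Because this reduces to a single least-squares fit over pairs, it avoids the iterative likelihood evaluations whose cost grows steeply with sample size, which is the kind of speedup the abstract reports.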