    Scalable Privacy-Compliant Virality Prediction on Twitter

    The digital town hall of Twitter becomes a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question remains more relevant than ever: how to model the dynamics of attention in Twitter. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, high accuracy supervisory signal and multilanguage sentiment prediction while respecting every privacy request applicable. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including a tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages. Comment: AffCon@AAAI-19 Best Paper Award; presented at AAAI-19 W1: Affective Content Analysis

    Personalized Pancreatic Tumor Growth Prediction via Group Learning

    Tumor growth prediction, a highly challenging task, has long been viewed as a mathematical modeling problem, where the tumor growth pattern is personalized based on imaging and clinical data of a target patient. Though mathematical models yield promising results, their prediction accuracy may be limited by the absence of population trend data and personalized clinical characteristics. In this paper, we propose a statistical group learning approach to predict the tumor growth pattern that incorporates both the population trend and personalized data, in order to discover high-level features from multimodal imaging data. A deep convolutional neural network approach is developed to model the voxel-wise spatio-temporal tumor progression. The deep features are combined with the time intervals and the clinical factors to feed a process of feature selection. Our predictive model is pretrained on a group data set and personalized on the target patient data to estimate the future spatio-temporal progression of the patient's tumor. Multimodal imaging data at multiple time points are used in the learning, personalization and inference stages. Our method achieves a Dice coefficient of 86.8% ± 3.6% and RVD of 7.9% ± 5.4% on a pancreatic tumor data set, outperforming the DSC of 84.4% ± 4.0% and RVD 13.9% ± 9.8% obtained by a previous state-of-the-art model-based method.
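
    The two reported metrics can be illustrated with a minimal sketch. It assumes binary tumor masks flattened to sequences of 0/1 and uses the common definitions of the Dice similarity coefficient (DSC) and the relative volume difference (RVD); the function names and the toy masks are hypothetical and are not taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total


def relative_volume_difference(pred, truth):
    """Absolute difference in segmented volume, relative to the true volume."""
    v_pred = sum(1 for p in pred if p)
    v_true = sum(1 for t in truth if t)
    if v_true == 0:
        raise ValueError("ground-truth mask is empty")
    return abs(v_pred - v_true) / v_true


# Toy masks (hypothetical): the prediction is the true mask shifted by one voxel.
truth_mask = [0, 1, 1, 1, 1, 0, 0, 0]
pred_mask = [0, 0, 1, 1, 1, 1, 0, 0]

dsc = dice_coefficient(pred_mask, truth_mask)
rvd = relative_volume_difference(pred_mask, truth_mask)
```

    The toy case shows why the two numbers are reported together: a shifted prediction has the correct volume, so its RVD is zero, while the Dice coefficient still penalizes the misplaced voxels.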

    Evaluating the accuracy of diffusion MRI models in white matter

    Models of diffusion MRI within a voxel are useful for making inferences about the properties of the tissue and inferring fiber orientation distribution used by tractography algorithms. A useful model must fit the data accurately. However, evaluations of model-accuracy of some of the models that are commonly used in analyzing human white matter have not been published before. Here, we evaluate model-accuracy of the two main classes of diffusion MRI models. The diffusion tensor model (DTM) summarizes diffusion as a 3-dimensional Gaussian distribution. Sparse fascicle models (SFM) summarize the signal as a linear sum of signals originating from a collection of fascicles oriented in different directions. We use cross-validation to assess model-accuracy at different gradient amplitudes (b-values) throughout the white matter. Specifically, we fit each model to all the white matter voxels in one data set and then use the model to predict a second, independent data set. This is the first evaluation of model-accuracy of these models. In most of the white matter the DTM predicts the data more accurately than test-retest reliability; SFM model-accuracy is higher than test-retest reliability and also higher than the DTM, particularly (a) for measurements with a b-value above 1000 in locations containing fiber crossings, and (b) in the regions of the brain surrounding the optic radiations. The SFM also has better parameter-validity: it more accurately estimates the fiber orientation distribution function (fODF) in each voxel, which is useful for fiber tracking.
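
    The comparison between model prediction and test-retest reliability can be sketched for a single voxel as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses the standard single-tensor signal equation S = S0 * exp(-b * g^T D g), skips the fitting step by assuming a tensor has already been fit to the first data set, and expresses the comparison as a ratio of two root-mean-square errors. All names and numbers are hypothetical.

```python
import math


def dtm_signal(s0, b, g, D):
    """Diffusion tensor model: S = S0 * exp(-b * g^T D g) for a unit direction g."""
    adc = sum(g[i] * D[i][j] * g[j] for i in range(3) for j in range(3))
    return s0 * math.exp(-b * adc)


def rmse(xs, ys):
    """Root-mean-square error between two equally long signal vectors."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("signals must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def relative_rmse(predicted, data1, data2):
    """Model-vs-data2 error divided by the data1-vs-data2 (test-retest) error.

    A value below 1 means the model predicts the second data set better than
    the first measurement does. Undefined if data1 and data2 are identical.
    """
    return rmse(predicted, data2) / rmse(data1, data2)


# Hypothetical voxel: isotropic tensor (mm^2/s), b = 1000 s/mm^2, three directions.
D = [[0.001, 0.0, 0.0], [0.0, 0.001, 0.0], [0.0, 0.0, 0.001]]
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
predicted = [dtm_signal(100.0, 1000.0, g, D) for g in directions]

data1 = [38.0, 35.0, 37.0]  # first measurement (assumed to be the one used for fitting)
data2 = [36.0, 38.0, 36.5]  # independent repeat measurement
ratio = relative_rmse(predicted, data1, data2)
```

    In this toy voxel the ratio falls below 1 because the model averages out noise that is present in each individual measurement, which is the sense in which a model can predict the data "more accurately than test-retest reliability".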

    Ensemble tractography

    Fiber tractography uses diffusion MRI to estimate the trajectory and cortical projection zones of white matter fascicles in the living human brain. There are many different tractography algorithms and each requires the user to set several parameters, such as curvature threshold. Choosing a single algorithm with a specific parameter set poses two challenges. First, different algorithms and parameter values produce different results. Second, the optimal choice of algorithm and parameter value may differ between different white matter regions or different fascicles, subjects, and acquisition parameters. We propose using ensemble methods to reduce algorithm and parameter dependencies. To do so we separate the processes of fascicle generation and evaluation. Specifically, we analyze the value of creating optimized connectomes by systematically combining candidate fascicles from an ensemble of algorithms (deterministic and probabilistic) and sweeping through key parameters (curvature and stopping criterion). The ensemble approach leads to optimized connectomes that provide better cross-validated prediction error of the diffusion MRI data than optimized connectomes generated using a single algorithm or parameter set. Furthermore, the ensemble approach produces connectomes that contain both short- and long-range fascicles, whereas single-parameter connectomes are biased towards one or the other. In summary, a systematic ensemble tractography approach can produce connectomes that are superior to standard single parameter estimates both for predicting the diffusion measurements and estimating white matter fascicles. Affiliation: Takemura, Hiromasa. University of Stanford; United States. Osaka University; Japan. Affiliation: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina. Affiliation: Wandell, Brian A. University of Stanford; United States. Affiliation: Pestilli, Franco. Indiana University; United States.

    Scanner Invariant Representations for Diffusion MRI Harmonization

    Purpose: In the present work we describe the correction of diffusion-weighted MRI for site and scanner biases using a novel method based on invariant representation. Theory and Methods: Pooled imaging data from multiple sources are subject to variation between the sources. Correcting for these biases has become very important as imaging studies increase in size and multi-site cases become more common. We propose learning an intermediate representation invariant to site/protocol variables, a technique adapted from information theory-based algorithmic fairness; by leveraging the data processing inequality, such a representation can then be used to create an image reconstruction that is uninformative of its original source, yet still faithful to underlying structures. To implement this, we use a deep learning method based on variational auto-encoders (VAE) to construct scanner invariant encodings of the imaging data. Results: To evaluate our method, we use training data from the 2018 MICCAI Computational Diffusion MRI (CDMRI) Challenge Harmonization dataset. Our proposed method shows improvements on independent test data relative to a recently published baseline method on each subtask, mapping data from three different scanning contexts to and from one separate target scanning context. Conclusion: As imaging studies continue to grow, the use of pooled multi-site imaging will similarly increase. Invariant representation presents a strong candidate for the harmonization of these data