Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention on Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, high-accuracy supervisory signals and multilanguage
sentiment prediction while respecting every applicable privacy request. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysis
Personalized Pancreatic Tumor Growth Prediction via Group Learning
Tumor growth prediction, a highly challenging task, has long been viewed as a
mathematical modeling problem, where the tumor growth pattern is personalized
based on imaging and clinical data of a target patient. Though mathematical
models yield promising results, their prediction accuracy may be limited by the
absence of population trend data and personalized clinical characteristics. In
this paper, we propose a statistical group learning approach to predict the
tumor growth pattern that incorporates both the population trend and
personalized data, in order to discover high-level features from multimodal
imaging data. A deep convolutional neural network approach is developed to
model the voxel-wise spatio-temporal tumor progression. The deep features are
combined with the time intervals and the clinical factors to feed a process of
feature selection. Our predictive model is pretrained on a group data set and
personalized on the target patient data to estimate the future spatio-temporal
progression of the patient's tumor. Multimodal imaging data at multiple time
points are used in the learning, personalization and inference stages. Our
method achieves a Dice coefficient (DSC) of 86.8% ± 3.6% and RVD of 7.9% ± 5.4%
on a pancreatic tumor data set, outperforming the DSC of 84.4% ± 4.0% and RVD
of 13.9% ± 9.8% obtained by a previous state-of-the-art model-based method.
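
The two overlap metrics quoted above can be illustrated with a minimal sketch. It assumes binary masks flattened to sequences of 0/1 and takes RVD to be the absolute volume difference divided by the ground-truth volume. The abstract does not spell out its exact definition, so that formula and the function names are assumptions.

```python
def dice_coefficient(pred, truth):
    """Dice overlap of two binary masks given as flat sequences of 0/1."""
    pred_vol = sum(1 for p in pred if p)
    truth_vol = sum(1 for t in truth if t)
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    if pred_vol + truth_vol == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * overlap / (pred_vol + truth_vol)


def relative_volume_difference(pred, truth):
    """Assumed definition: |V_pred - V_truth| / V_truth."""
    pred_vol = sum(1 for p in pred if p)
    truth_vol = sum(1 for t in truth if t)
    if truth_vol == 0:
        raise ValueError("ground-truth mask is empty")
    return abs(pred_vol - truth_vol) / truth_vol


# Toy example: the shifted prediction shares 3 of 4 voxels with the truth.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
shifted = [0, 0, 1, 1, 1, 1, 0, 0]
oversized = [0, 1, 1, 1, 1, 1, 0, 0]
print(dice_coefficient(shifted, truth))              # 0.75
print(relative_volume_difference(oversized, truth))  # 0.25
```

The toy masks show why the two numbers are reported together: the shifted prediction has the correct volume (RVD of 0) but imperfect overlap, whereas the oversized one covers the whole tumor but overestimates its volume.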
Evaluating the accuracy of diffusion MRI models in white matter
Models of diffusion MRI within a voxel are useful for making inferences about
the properties of the tissue and inferring fiber orientation distribution used
by tractography algorithms. A useful model must fit the data accurately.
However, evaluations of model-accuracy of some of the models that are commonly
used in analyzing human white matter have not been published before. Here, we
evaluate model-accuracy of the two main classes of diffusion MRI models. The
diffusion tensor model (DTM) summarizes diffusion as a 3-dimensional Gaussian
distribution. Sparse fascicle models (SFM) summarize the signal as a linear sum
of signals originating from a collection of fascicles oriented in different
directions. We use cross-validation to assess model-accuracy at different
gradient amplitudes (b-values) throughout the white matter. Specifically, we
fit each model to all the white matter voxels in one data set and then use the
model to predict a second, independent data set. This is the first evaluation
of model-accuracy of these models. In most of the white matter the DTM predicts
the data more accurately than test-retest reliability; SFM model-accuracy is
higher than test-retest reliability and also higher than the DTM, particularly
(a) for measurements with a b-value above 1000 in locations containing fiber
crossings, and (b) in the regions of the brain surrounding the optic
radiations. The SFM also has better parameter-validity: it more accurately
estimates the fiber orientation distribution function (fODF) in each voxel,
which is useful for fiber tracking.
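
The comparison against test-retest reliability described above can be sketched as a ratio of two errors. The sketch assumes root-mean-square error as the error measure and toy signal values. The function names and the choice of RMSE are illustrative assumptions, not details confirmed by the abstract.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def relative_rmse(model_prediction, data1, data2):
    """Model-vs-data2 error divided by test-retest (data1-vs-data2) error.

    The model is assumed to have been fit to data1 only. A value below 1
    means the model predicts the second data set better than the first
    measurement itself does.
    """
    return rmse(model_prediction, data2) / rmse(data1, data2)


# Toy signals for one voxel across four diffusion directions (made-up values).
data1 = [1.0, 0.5, 0.8, 0.3]            # first measurement, used for fitting
data2 = [0.9, 0.6, 0.7, 0.4]            # independent repeat measurement
prediction = [0.95, 0.55, 0.75, 0.35]   # hypothetical model output

print(relative_rmse(prediction, data1, data2))  # about 0.5
```

A ratio below 1 is possible because a fitted model averages out part of the measurement noise that a single repeat scan still contains.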
Ensemble tractography
Fiber tractography uses diffusion MRI to estimate the trajectory and cortical projection zones of white matter fascicles in the living human brain. There are many different tractography algorithms and each requires the user to set several parameters, such as curvature threshold. Choosing a single algorithm with a specific parameter set poses two challenges. First, different algorithms and parameter values produce different results. Second, the optimal choice of algorithm and parameter value may differ between different white matter regions or different fascicles, subjects, and acquisition parameters. We propose using ensemble methods to reduce algorithm and parameter dependencies. To do so we separate the processes of fascicle generation and evaluation. Specifically, we analyze the value of creating optimized connectomes by systematically combining candidate fascicles from an ensemble of algorithms (deterministic and probabilistic) and sweeping through key parameters (curvature and stopping criterion). The ensemble approach leads to optimized connectomes that provide better cross-validated prediction error of the diffusion MRI data than optimized connectomes generated using a single algorithm or parameter set. Furthermore, the ensemble approach produces connectomes that contain both short- and long-range fascicles, whereas single-parameter connectomes are biased towards one or the other. In summary, a systematic ensemble tractography approach can produce connectomes that are superior to standard single-parameter estimates both for predicting the diffusion measurements and estimating white matter fascicles.
Affiliation: Takemura, Hiromasa. University of Stanford; United States. Osaka University; Japan
Affiliation: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina
Affiliation: Wandell, Brian A.. University of Stanford; United States
Affiliation: Pestilli, Franco. Indiana University; United States
Scanner Invariant Representations for Diffusion MRI Harmonization
Purpose: In the present work we describe the correction of diffusion-weighted
MRI for site and scanner biases using a novel method based on invariant
representation.
Theory and Methods: Pooled imaging data from multiple sources are subject to
variation between the sources. Correcting for these biases has become very
important as imaging studies increase in size and multi-site cases become more
common. We propose learning an intermediate representation invariant to
site/protocol variables, a technique adapted from information theory-based
algorithmic fairness; by leveraging the data processing inequality, such a
representation can then be used to create an image reconstruction that is
uninformative of its original source, yet still faithful to underlying
structures. To implement this, we use a deep learning method based on
variational auto-encoders (VAE) to construct scanner invariant encodings of the
imaging data.
Results: To evaluate our method, we use training data from the 2018 MICCAI
Computational Diffusion MRI (CDMRI) Challenge Harmonization dataset. Our
proposed method shows improvements on independent test data relative to a
recently published baseline method on each subtask, mapping data from three
different scanning contexts to and from one separate target scanning context.
Conclusion: As imaging studies continue to grow, the use of pooled multi-site
imaging will similarly increase. Invariant representation presents a strong
candidate for the harmonization of these data.