Personality Profiling: How informative are social media profiles in predicting personal information?
Personality profiling has been utilised by companies for targeted
advertising, political campaigns and vaccine campaigns. However, the accuracy
and versatility of such models remain relatively unknown. Consequently,
we aim to explore the extent to which people's online digital footprints can be
used to profile their Myers-Briggs personality type. We analyse and compare the
results of four models: logistic regression, naive Bayes, support vector
machines (SVMs) and random forests. We discover that an SVM model achieves the
best accuracy of 20.95% for predicting someone's complete personality type.
However, logistic regression models perform only marginally worse and are
significantly faster to train and to make predictions. We discover that many
labelled datasets present substantial class imbalances of personal
characteristics on social media, including our own. As a result, we highlight
the need for careful consideration when reporting model performance on these
datasets and compare a number of methods for addressing class imbalance.
Moreover, we develop a statistical framework for assessing the
importance of different sets of features in our models. We discover some
features to be more informative than others in the Intuitive/Sensory (p =
0.032) and Thinking/Feeling (p = 0.019) models. While we apply these methods to
Myers-Briggs personality profiling, they could be more generally used for any
labelling of individuals on social media.
Comment: 8 pages, 6 figures. Dataset available at
https://figshare.com/articles/dataset/Self-Reported_Myers-Briggs_Personality_Types_on_Twitter/2362055
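
The abstract's point about reporting performance on imbalanced labels can be illustrated with a minimal sketch. The labels below are made-up toy data, not drawn from the paper's dataset, and the helper names `majority_baseline_accuracy` and `balanced_accuracy` are hypothetical; the paper may well use different metrics. The sketch only shows why raw accuracy should be read against a majority-class baseline.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Toy, illustrative labels only (assumption: three of the sixteen types).
y_true = ["INFP"] * 6 + ["ESTJ"] * 2 + ["ENTP"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["INFP"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
baseline = majority_baseline_accuracy(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(accuracy, baseline, balanced)
```

Here the degenerate classifier matches the majority baseline on raw accuracy, while its balanced accuracy drops to one third, which is the kind of gap that imbalanced datasets can hide.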
An Induced Natural Selection Heuristic for Finding Optimal Bayesian Experimental Designs
Bayesian optimal experimental design has immense potential to inform the
collection of data so as to subsequently enhance our understanding of a variety
of processes. However, a major impediment is the difficulty in evaluating
optimal designs for problems with large, or high-dimensional, design spaces. We
propose an efficient search heuristic suitable for general optimisation
problems, with a particular focus on optimal Bayesian experimental design
problems. The heuristic evaluates the objective (utility) function at an
initial, randomly generated set of input values. At each generation of the
algorithm, input values are "accepted" if their corresponding objective
(utility) function values satisfy some acceptance criteria, and new inputs are
sampled about these accepted points. We demonstrate the new algorithm by
evaluating the optimal Bayesian experimental designs for the previously
considered death, pharmacokinetic and logistic regression models. Comparisons
to the current "gold-standard" method are given to demonstrate the proposed
algorithm as a computationally efficient alternative for moderately large
design problems (i.e., up to approximately 40 dimensions).
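
The generate, accept and resample loop described in this abstract can be sketched roughly as follows. This is not the authors' implementation: the acceptance rule (keep the top `n_keep` inputs), the Gaussian perturbation scaled by `sigma`, the function name `insh_maximise` and the toy quadratic utility are all assumptions made for illustration. In a real Bayesian design problem the utility would typically be an expensive, noisy estimate of expected utility.

```python
import random


def insh_maximise(utility, bounds, n_init=50, n_generations=30,
                  n_keep=5, n_children=10, sigma=0.1, seed=0):
    """Search for a maximiser of `utility` over a box given by `bounds`."""
    rng = random.Random(seed)

    def clip(x):
        # Keep every coordinate inside its (low, high) bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Initial, randomly generated set of input values.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(n_init)]
    best_x, best_u = None, float("-inf")

    for _ in range(n_generations):
        # Evaluate the utility at every current input, best first.
        scored = sorted(((utility(x), x) for x in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_u:
            best_u, best_x = scored[0]
        # Assumed acceptance criterion: keep the n_keep best inputs.
        accepted = [x for _, x in scored[:n_keep]]
        # Sample new inputs about each accepted point.
        population = [
            clip([v + rng.gauss(0.0, sigma * (hi - lo))
                  for v, (lo, hi) in zip(x, bounds)])
            for x in accepted
            for _ in range(n_children)
        ]
    return best_x, best_u


def toy_utility(x):
    # Stand-in objective with a known maximum of 0 at (0.3, 0.7).
    return -((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)


best_x, best_u = insh_maximise(toy_utility, [(0.0, 1.0), (0.0, 1.0)])
print(best_x, best_u)
```

Tracking the best input seen across all generations means the returned point never gets worse as the search proceeds, even though each new population is drawn only from perturbations of the accepted points.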
Revealing Patient-Reported Experiences in Healthcare from Social Media using the DAPMAV Framework
Understanding patient experience in healthcare is increasingly important and
desired by medical professionals in a patient-centred care approach. Healthcare
discourse on social media presents an opportunity to gain a unique perspective
on patient-reported experiences, complementing traditional survey data. These
social media reports often appear as first-hand accounts of patients' journeys
through the healthcare system, whose details extend beyond the confines of
structured surveys and at a far larger scale than focus groups. However, in
contrast with the vast presence of patient-experience data on social media and
the potential benefits the data offers, it attracts comparatively little
research attention due to the technical proficiency required for text analysis.
In this paper, we introduce the Design-Acquire-Process-Model-Analyse-Visualise
(DAPMAV) framework to equip non-technical domain experts with a structured
approach that will enable them to capture patient-reported experiences from
social media data. We apply this framework in a case study on prostate cancer
data from /r/ProstateCancer, demonstrate the framework's value in capturing
specific aspects of patient concern (such as sexual dysfunction), provide an
overview of the discourse, and show narrative and emotional progression through
these stories. We anticipate that this framework will apply to a wide variety of areas
in healthcare, including capturing and differentiating experiences across
minority groups, geographic boundaries, and types of illnesses.