

Personality Profiling: How informative are social media profiles in predicting personal information?

Authors: Mitchell, Lewis; Tuke, Jonathan; Watt, Joshua
Publication date: 14/09/2023

Personality profiling has been utilised by companies for targeted advertising, political campaigns and vaccine campaigns. However, the accuracy and versatility of such models remain relatively unknown. Consequently, we aim to explore the extent to which people's online digital footprints can be used to profile their Myers-Briggs personality type. We analyse and compare the results of four models: logistic regression, naive Bayes, support vector machines (SVMs) and random forests. We discover that an SVM model achieves the best accuracy of 20.95% for predicting someone's complete personality type. However, logistic regression models perform only marginally worse and are significantly faster to train and to make predictions. We discover that many labelled datasets present substantial class imbalances of personal characteristics on social media, including our own. As a result, we highlight the need for attentive consideration when reporting model performance on these datasets and compare a number of methods for fixing the class-imbalance problems. Moreover, we develop a statistical framework for assessing the importance of different sets of features in our models. We discover some features to be more informative than others in the Intuitive/Sensory (p = 0.032) and Thinking/Feeling (p = 0.019) models. While we apply these methods to Myers-Briggs personality profiling, they could be more generally used for any labelling of individuals on social media.

Comment: 8 pages, 6 figures. Dataset available at https://figshare.com/articles/dataset/Self-Reported_Myers-Briggs_Personality_Types_on_Twitter/2362055
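The abstract's caution about reporting performance on imbalanced labels can be illustrated with a small sketch. It is not the authors' evaluation code: the 90/10 split of Intuitive (N) and Sensory (S) labels is a hypothetical example, and `accuracy` and `balanced_accuracy` are illustrative helper names. It compares plain accuracy with the mean per-class recall for a predictor that always outputs the majority class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels (assumption, not data from the paper).
y_true = ["N"] * 90 + ["S"] * 10

# A trivial predictor that always outputs the most common class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Under these assumed labels the trivial predictor looks strong on plain accuracy but is no better than chance once each class is weighted equally, which is why a baseline and a class-aware metric are worth reporting alongside raw accuracy.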

arXiv.org e-Print Archive

An Induced Natural Selection Heuristic for Finding Optimal Bayesian Experimental Designs

Authors: Bean, Nigel G.; Price, David J.; Ross, Joshua V.; Tuke, Jonathan
Publication date: 13/03/2018

Bayesian optimal experimental design has immense potential to inform the collection of data so as to subsequently enhance our understanding of a variety of processes. However, a major impediment is the difficulty in evaluating optimal designs for problems with large, or high-dimensional, design spaces. We propose an efficient search heuristic suitable for general optimisation problems, with a particular focus on optimal Bayesian experimental design problems. The heuristic evaluates the objective (utility) function at an initial, randomly generated set of input values. At each generation of the algorithm, input values are "accepted" if their corresponding objective (utility) function satisfies some acceptance criteria, and new inputs are sampled about these accepted points. We demonstrate the new algorithm by evaluating the optimal Bayesian experimental designs for the previously considered death, pharmacokinetic and logistic regression models. Comparisons to the current "gold-standard" method are given to demonstrate the proposed algorithm as a computationally efficient alternative for moderately large design problems (i.e., up to approximately 40 dimensions).
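The accept-and-resample loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the acceptance criterion (keep the `n_keep` highest-utility designs), the Gaussian perturbation clipped to the bounds, the function name `insh_maximise` and all parameter defaults are hypothetical choices made for the example.

```python
import random


def insh_maximise(utility, bounds, n_init=50, n_keep=10, n_children=5,
                  step=0.1, generations=20, seed=0):
    """Sketch of an accept-and-resample search over a box-shaped design space."""
    rng = random.Random(seed)
    # Initial, randomly generated set of designs (uniform within the bounds).
    designs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)]
    best_design, best_value = None, float("-inf")
    for _ in range(generations):
        # Evaluate the utility at every current design, best first.
        scored = sorted(((utility(d), d) for d in designs),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_value:
            best_value, best_design = scored[0]
        # Assumed acceptance criterion: keep the n_keep best designs.
        accepted = [d for _, d in scored[:n_keep]]
        # Sample new designs about each accepted point (Gaussian noise,
        # scaled to the width of each dimension and clipped to the bounds).
        designs = [
            [min(hi, max(lo, x + rng.gauss(0.0, step * (hi - lo))))
             for x, (lo, hi) in zip(d, bounds)]
            for d in accepted
            for _ in range(n_children)
        ]
    return best_design, best_value


# Toy utility with a known maximum of 0 at (0.3, 0.3); purely illustrative.
def toy_utility(d):
    return -sum((x - 0.3) ** 2 for x in d)


best, value = insh_maximise(toy_utility, [(0.0, 1.0), (0.0, 1.0)])
print(best, value)
```

In a real design problem the utility would itself be an expensive Monte Carlo estimate, so the number of designs kept and resampled per generation trades off exploration against the total number of utility evaluations.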

arXiv.org e-Print Archive

University of Melbourne Institutional Repository

Revealing Patient-Reported Experiences in Healthcare from Social Media using the DAPMAV Framework

Authors: Mackay, Mark; Mitchell, Lewis; Murray, Curtis; Tuke, Jonathan
Publication date: 09/10/2022

Understanding patient experience in healthcare is increasingly important and desired by medical professionals in a patient-centred care approach. Healthcare discourse on social media presents an opportunity to gain a unique perspective on patient-reported experiences, complementing traditional survey data. These social media reports often appear as first-hand accounts of patients' journeys through the healthcare system, with details that extend beyond the confines of structured surveys and at a far larger scale than focus groups. However, despite the vast presence of patient-experience data on social media and the potential benefits the data offers, it attracts comparatively little research attention due to the technical proficiency required for text analysis. In this paper, we introduce the Design-Acquire-Process-Model-Analyse-Visualise (DAPMAV) framework to equip non-technical domain experts with a structured approach that will enable them to capture patient-reported experiences from social media data. We apply this framework in a case study on prostate cancer data from /r/ProstateCancer, demonstrate the framework's value in capturing specific aspects of patient concern (such as sexual dysfunction), provide an overview of the discourse, and show narrative and emotional progression through these stories. We anticipate that this framework will apply to a wide variety of areas in healthcare, including capturing and differentiating experiences across minority groups, geographic boundaries, and types of illnesses.

arXiv.org e-Print Archive