
Crowdbreaks: Tracking Health Trends using Public Social Media Data and Crowdsourcing

Author: Mueller Martin
Salathé Marcel
Publication date: 14/05/2018

In the past decade, tracking health trends using social media data has shown great promise. This is owing to the massive worldwide adoption of social media, combined with increasingly powerful hardware and software for working with these new big data streams. At the same time, several challenging problems have been identified. First, online data often change faster than algorithms are updated. Algorithms trained on past data therefore have limited reusability, because their performance decreases over time. Second, much of the work focuses on specific issues during a specific past period, even though public health institutions need flexible tools to assess multiple evolving situations in real time. Third, most tools providing such capabilities are proprietary systems with little algorithmic or data transparency, and thus little buy-in from the global public health and research community. Here, we introduce Crowdbreaks, an open platform that tracks health trends through continuous crowdsourced labelling of public social media content. The system automates the typical workflow of data collection, filtering, labelling, and training of machine learning classifiers, and can therefore greatly accelerate the research process in the public health domain. This work introduces the technical aspects of the platform and explores its future use cases.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne
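
The Crowdbreaks abstract describes a workflow of filtering, crowdsourced labelling and classifier training. The following is a minimal sketch of the first two steps, plus the hand-off of resolved labels to a training set. It assumes keyword filtering and majority-vote label aggregation. The keywords, label names and function names are hypothetical and are not taken from the Crowdbreaks codebase.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical topic keywords (an assumption, not the platform's configuration).
KEYWORDS = {"vaccine", "vaccination", "flu"}


def matches_keywords(text: str, keywords: set[str] = KEYWORDS) -> bool:
    """Filtering step: keep a post if any lowercased token is a tracked keyword."""
    tokens = {tok.strip(".,!?#@").lower() for tok in text.split()}
    return not tokens.isdisjoint(keywords)


def aggregate_labels(labels: list[str], min_votes: int = 3) -> str | None:
    """Labelling step (assumed majority vote) over crowd labels for one post.

    Returns None when there are too few votes or the top label is tied,
    so the post can be shown to more annotators later.
    """
    if not labels or len(labels) < min_votes:
        return None
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent labels
    return ranked[0][0]


def build_training_set(votes: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Pair each keyword-matching post with its resolved label for classifier training."""
    pairs = []
    for text, labels in votes.items():
        if not matches_keywords(text):
            continue
        label = aggregate_labels(labels)
        if label is not None:
            pairs.append((text, label))
    return pairs


if __name__ == "__main__":
    votes = {
        "Got my flu shot today!": ["positive", "positive", "neutral"],
        "Not sure about this vaccine": ["negative", "neutral"],  # too few votes yet
        "Nice weather in Lausanne": ["neutral", "neutral", "neutral"],  # filtered out
    }
    print(build_training_set(votes))
```

Leaving tied or under-voted posts unresolved, rather than forcing a label, keeps the training set conservative. A continuous crowdsourcing loop could simply requeue those posts.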

Crowdsourcing Dialect Characterization through Twitter

Author: Bruno Gonçalves
D Mocanu
David Sánchez
DT Pham
J Borge-Holthoefer
M Salathé
PJ Rousseeuw
Tobias Preis
Publication venue: Public Library of Science (PLoS)
Publication date: 26/07/2014

We perform a large-scale analysis of diatopic (geographical) language variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis demonstrates the existence of well-defined macroregions sharing common lexical properties. Remarkably, we find that the Spanish language is split into two superdialects. The first is an urban speech used across major American and Spanish cities; the second is a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character. Comment: 10 pages, 5 figures

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

HAL AMU

Directory of Open Access Journals

PubMed Central

Digital.CSIC
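
The dialect study above groups locations by their lexical profiles. The abstract only says "cluster analysis", so the sketch below uses plain k-means (Lloyd's algorithm) as an assumed stand-in. Each location is a vector of relative frequencies of competing word variants. The location names and frequencies are invented for illustration.

```python
from __future__ import annotations

import math


def kmeans(points: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's k-means, seeded deterministically with the first k points.

    Each point is one location's vector of lexical-variant frequencies;
    returns one cluster id per point.
    """
    centroids = [list(p) for p in points[:k]]
    labels: list[int] | None = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Invented data: for two hypothetical concepts, the share of posts at each
    # location that use one variant rather than its synonym.
    locations = {
        "loc_1": [0.90, 0.80],
        "loc_2": [0.10, 0.20],
        "loc_3": [0.15, 0.10],
        "loc_4": [0.85, 0.90],
    }
    clusters = kmeans(list(locations.values()), k=2)
    print(dict(zip(locations, clusters)))
```

Seeding with the first k points keeps the toy example reproducible. A real analysis would more likely use several random restarts and a criterion for choosing k.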

On the Ground Validation of Online Diagnosis with Twitter and Medical Records

Author: Bodnar Todd
Barclay Victoria C
Ram Nilam
Tucker Conrad S
Salathé Marcel
Publication date: 01/01/2014

Social media has been considered as a data source for tracking disease. However, most analyses are based on models that prioritize strong correlation with population-level disease rates over determining whether specific individual users are actually sick. Taking a different approach, we develop a novel system for social-media-based disease detection at the individual level, using a sample of professionally diagnosed individuals. Specifically, we develop a system for making an accurate influenza diagnosis based on an individual's publicly available Twitter data. We find that about half (17/35 = 48.57%) of the sick users in our sample explicitly discuss their disease on Twitter. By developing a meta classifier that combines text analysis, anomaly detection, and social network analysis, we are able to diagnose an individual with greater than 99% accuracy even if she does not discuss her health. Comment: Presented at WWW2014. WWW'14 Companion, April 7-11, 2014, Seoul, Korea

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Publikationsserver der RWTH Aachen University
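
The abstract above does not say how the meta classifier combines its three signals. The sketch below assumes a simple stacking scheme. Each component (text analysis, anomaly detection, social network analysis) outputs a per-user score in [0, 1]. A logistic-regression meta classifier, trained by batch gradient descent, learns how to weight those scores. All scores and names are invented for illustration.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w: list[float], x: list[float]) -> float:
    """Meta-classifier probability: bias w[0] plus weighted base-classifier scores."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_meta(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Fit logistic-regression weights on base-classifier outputs by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in zip(X, y):
            err = score(w, x) - target  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Invented per-user scores: [text, anomaly, network]; label 1 = diagnosed flu.
    X = [
        [0.9, 0.7, 0.6],
        [0.2, 0.8, 0.7],  # sick but never mentions it: low text score
        [0.8, 0.6, 0.8],
        [0.1, 0.2, 0.1],
        [0.3, 0.1, 0.2],
        [0.2, 0.3, 0.1],
    ]
    y = [1, 1, 1, 0, 0, 0]
    w = train_meta(X, y)
    print([int(score(w, x) >= 0.5) for x in X])
```

The second user shows why a combination helps. The text signal alone would miss this user, but the anomaly and network scores carry the decision.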

Fabrication of optical planar waveguides in $KY(WO_4)_2$ by He-ion implantation

Author: Borca C.N.
Moretti P.
Pollnau M.
Salathé R.P.
Schnider C.
Zäh F.
Publication venue: IEEE Operations Center
Publication date: 01/01/2005

In this paper, planar waveguides produced by He-ion implantation are demonstrated in undoped and Yb-doped $KY(WO_4)_2$ crystals. The effective refractive indices of guided modes in surface planar waveguides were measured by dark m-line spectroscopy, and the refractive index profiles were reconstructed by calculations based on the inverse WKB method. The end faces of the implanted crystals were polished, and the waveguiding properties of the resulting planar structures were investigated using a 980 nm laser diode and a CCD camera.

University of Twente Research Information

Interactive exploration of population scale pharmacoepidemiology datasets

Author: Abadi M.
Furu K.
Salathé M.
Ventola C. L.
Wishart D. S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/05/2020

Population-scale drug prescription data linked with adverse drug reaction (ADR) data supports the fitting of models large enough to detect drug use and ADR patterns that are not detectable using traditional methods on smaller datasets. However, detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization. To our knowledge, no existing pharmacoepidemiology tool supports all three requirements. We have therefore created a tool for interactive exploration of patterns in prescription datasets with millions of samples. We use Spark to preprocess the data for machine learning and for analyses using SQL queries. We have implemented models in Keras and scikit-learn. The model results are visualized and interpreted using live Python coding in Jupyter. We apply our tool to explore a dataset of 384 million prescriptions from the Norwegian Prescription Database, combined with 62 million prescriptions for elderly patients who were hospitalized. We preprocess the data in two minutes, train models in seconds, and plot the results in milliseconds. Our results show the benefit of combining computational power, short computation times, and ease of use for the analysis of population-scale pharmacoepidemiology datasets. The code is open source and available at: https://github.com/uit-hdl/norpd_prescription_analyse

arXiv.org e-Print Archive

Crossref
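
The pharmacoepidemiology abstract above mentions SQL analyses over prescription data linked with hospitalization records. The sketch below illustrates that kind of query at toy scale. It swaps Spark for Python's built-in sqlite3 module so it runs without a cluster. The table names (rx, hosp), the column names and the function drug_hospitalization_counts are hypothetical and are not taken from the project's repository.

```python
import sqlite3


def drug_hospitalization_counts(prescriptions, hospitalized):
    """Per drug: (distinct patients prescribed, distinct of those hospitalized).

    prescriptions: iterable of (patient_id, drug) pairs (hypothetical schema).
    hospitalized:  iterable of patient_ids with a hospital stay.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE rx (patient_id INTEGER, drug TEXT)")
        con.execute("CREATE TABLE hosp (patient_id INTEGER PRIMARY KEY)")
        con.executemany("INSERT INTO rx VALUES (?, ?)", prescriptions)
        con.executemany("INSERT INTO hosp VALUES (?)", [(p,) for p in hospitalized])
        # LEFT JOIN keeps non-hospitalized patients; COUNT(DISTINCT ...) skips NULLs.
        rows = con.execute(
            """
            SELECT rx.drug,
                   COUNT(DISTINCT rx.patient_id)   AS patients,
                   COUNT(DISTINCT hosp.patient_id) AS hospitalized
            FROM rx LEFT JOIN hosp ON rx.patient_id = hosp.patient_id
            GROUP BY rx.drug
            ORDER BY rx.drug
            """
        ).fetchall()
    finally:
        con.close()
    return {drug: (n, h) for drug, n, h in rows}


if __name__ == "__main__":
    rx = [(1, "A"), (1, "B"), (2, "A"), (3, "B"), (3, "B")]
    print(drug_hospitalization_counts(rx, hospitalized=[3]))
```

Counting distinct patients rather than prescription rows keeps repeat prescriptions from inflating the counts. A similar query could in principle be expressed in Spark SQL at population scale.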

On the Ground Validation of Online Diagnosis with Twitter and Medical Records

Author: Barclay Victoria C
Bodnar Todd
Ram Nilam
Salathé Marcel
Tucker Conrad S
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref