995 research outputs found
Crowdbreaks: Tracking Health Trends using Public Social Media Data and Crowdsourcing
In the past decade, tracking health trends using social media data has shown
great promise, due to a powerful combination of massive adoption of social
media around the world, and increasingly potent hardware and software that
enables us to work with these new big data streams. At the same time, many
challenging problems have been identified. First, there is often a mismatch
between how rapidly online data can change, and how rapidly algorithms are
updated, which means that there is limited reusability for algorithms trained
on past data as their performance decreases over time. Second, much of the work
is focusing on specific issues during a specific past period in time, even
though public health institutions would need flexible tools to assess multiple
evolving situations in real time. Third, most tools providing such capabilities
are proprietary systems with little algorithmic or data transparency, and thus
little buy-in from the global public health and research community. Here, we
introduce Crowdbreaks, an open platform which allows tracking of health trends
by making use of continuous crowdsourced labelling of public social media
content. The system is built in a way which automatizes the typical workflow
from data collection, filtering, labelling and training of machine learning
classifiers and therefore can greatly accelerate the research process in the
public health domain. This work introduces the technical aspects of the
platform and explores its future use cases
Crowdsourcing Dialect Characterization through Twitter
We perform a large-scale analysis of language diatopic variation using
geotagged microblogging datasets. By collecting all Twitter messages written in
Spanish over more than two years, we build a corpus from which a carefully
selected list of concepts allows us to characterize Spanish varieties on a
global scale. A cluster analysis proves the existence of well defined
macroregions sharing common lexical properties. Remarkably enough, we find that
Spanish language is split into two superdialects, namely, an urban speech used
across major American and Spanish citites and a diverse form that encompasses
rural areas and small towns. The latter can be further clustered into smaller
varieties with a stronger regional character.Comment: 10 pages, 5 figure
On the Ground Validation of Online Diagnosis with Twitter and Medical Records
Social media has been considered as a data source for tracking disease.
However, most analyses are based on models that prioritize strong correlation
with population-level disease rates over determining whether or not specific
individual users are actually sick. Taking a different approach, we develop a
novel system for social-media based disease detection at the individual level
using a sample of professionally diagnosed individuals. Specifically, we
develop a system for making an accurate influenza diagnosis based on an
individual's publicly available Twitter data. We find that about half (17/35 =
48.57%) of the users in our sample that were sick explicitly discuss their
disease on Twitter. By developing a meta classifier that combines text
analysis, anomaly detection, and social network analysis, we are able to
diagnose an individual with greater than 99% accuracy even if she does not
discuss her health.Comment: Presented at of WWW2014. WWW'14 Companion, April 7-11, 2014, Seoul,
Kore
Fabrication of optical planar waveguides in by He-ion implantation
In this paper, planar waveguides produced by He-ion implantation have been demonstrated in undoped and Yb-doped KY(WO/sub 4/)/sub 2/ crystals. The effective refractive indices of guided modes in surface planar waveguides were measured by dark m-line spectroscopy and the refractive index profiles were reconstructed by calculations based on the inverse WKB method. The end-faces of implanted crystals were polished and the waveguiding properties of the obtained planar structures were investigated using a laser diode at 980 nm and a CCD camera
Interactive exploration of population scale pharmacoepidemiology datasets
Population-scale drug prescription data linked with adverse drug reaction
(ADR) data supports the fitting of models large enough to detect drug use and
ADR patterns that are not detectable using traditional methods on smaller
datasets. However, detecting ADR patterns in large datasets requires tools for
scalable data processing, machine learning for data analysis, and interactive
visualization. To our knowledge no existing pharmacoepidemiology tool supports
all three requirements. We have therefore created a tool for interactive
exploration of patterns in prescription datasets with millions of samples. We
use Spark to preprocess the data for machine learning and for analyses using
SQL queries. We have implemented models in Keras and the scikit-learn
framework. The model results are visualized and interpreted using live Python
coding in Jupyter. We apply our tool to explore a 384 million prescription data
set from the Norwegian Prescription Database combined with a 62 million
prescriptions for elders that were hospitalized. We preprocess the data in two
minutes, train models in seconds, and plot the results in milliseconds. Our
results show the power of combining computational power, short computation
times, and ease of use for analysis of population scale pharmacoepidemiology
datasets. The code is open source and available at:
https://github.com/uit-hdl/norpd_prescription_analyse
On the Ground Validation of Online Diagnosis with Twitter and Medical Records
Social media has been considered as a data source for tracking disease.
However, most analyses are based on models that prioritize strong correlation
with population-level disease rates over determining whether or not specific
individual users are actually sick. Taking a different approach, we develop a
novel system for social-media based disease detection at the individual level
using a sample of professionally diagnosed individuals. Specifically, we
develop a system for making an accurate influenza diagnosis based on an
individual's publicly available Twitter data. We find that about half (17/35 =
48.57%) of the users in our sample that were sick explicitly discuss their
disease on Twitter. By developing a meta classifier that combines text
analysis, anomaly detection, and social network analysis, we are able to
diagnose an individual with greater than 99% accuracy even if she does not
discuss her health.Comment: Presented at of WWW2014. WWW'14 Companion, April 7-11, 2014, Seoul,
Kore
- …