Event detection in location-based social networks
With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring various real-world events in a crowd-sourced manner. Location-based social networks have proven to be faster than traditional media channels in reporting and geo-locating breaking news; e.g., Osama Bin Laden’s death was first confirmed on Twitter even before the announcement from the communication department at the White House. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events in a comprehensive manner. The data mining community coined the term event detection to refer to the task of uncovering emerging patterns in data streams. Nonetheless, most data mining techniques do not reproduce the underlying data generation process, which hampers their ability to self-adapt in fast-changing scenarios. Because of this, we propose a probabilistic machine learning approach to event detection which explicitly models the data generation process and enables reasoning about the discovered events. To set forth the differences between both approaches, we present two techniques for the problem of event detection in Twitter: a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques on a dataset of tweets geo-located in the city of Barcelona during its annual festivities.
Last but not least, we present the algorithmic changes and data processing frameworks to scale up the proposed techniques to big data workloads. This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract (TIN2015-65316), by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and BSC/UPC NVIDIA GPU Center of Excellence. We would also like to thank the reviewers for their constructive feedback. Peer Reviewed. Postprint (author's final draft)
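The abstract above describes event detection as uncovering emerging patterns in data streams. As a rough illustration of that idea only, here is a minimal sketch that flags bursts in a stream of per-interval message counts. It is not Tweet-SCAN or Warble, whose details are not given here; the function name `detect_bursts`, the trailing-window z-score rule, and the sample counts are all assumptions made for this example.

```python
from statistics import mean, stdev


def detect_bursts(counts, window=5, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by
    more than `threshold` sample standard deviations.

    Hypothetical helper for illustration; not taken from the paper.
    """
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Guard against a flat history, where the deviation is zero.
        if sigma == 0:
            sigma = 1.0
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


# Made-up hourly message counts with one obvious spike at index 6.
hourly_counts = [10, 12, 11, 9, 10, 11, 60, 12, 10]
print(detect_bursts(hourly_counts))
```

A rule like this has no model of how the data were generated, which is the kind of limitation the abstract attributes to most data mining techniques.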
Exploratory Analysis of Highly Heterogeneous Document Collections
We present an effective multifaceted system for exploratory analysis of
highly heterogeneous document collections. Our system is based on intelligently
tagging individual documents in a purely automated fashion and exploiting these
tags in a powerful faceted browsing framework. Tagging strategies employed
include both unsupervised and supervised approaches based on machine learning
and natural language processing. As one of our key tagging strategies, we
introduce the KERA algorithm (Keyword Extraction for Reports and Articles).
KERA extracts topic-representative terms from individual documents in a purely
unsupervised fashion and is revealed to be significantly more effective than
state-of-the-art methods. Finally, we evaluate our system in its ability to
help users locate documents pertaining to military critical technologies buried
deep in a large heterogeneous sea of information. Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining
Everyday the Same Picture: Popularity and Content Diversity
Facebook is flooded by diverse and heterogeneous content, from kittens up to
music and news, passing through satirical and funny stories. Each piece of that
corpus reflects the heterogeneity of the underlying social background. In the
Italian Facebook we have found an interesting case: a page having more than
followers that every day posts the same picture of a popular Italian
singer. In this work, we use such a page as a control to study and model the
relationship between content heterogeneity and popularity. In particular, we use
that page for a comparative analysis of information consumption patterns with
respect to pages posting science and conspiracy news. In total, we analyze
about likes and comments, made by approximately and
users, respectively. We conclude the paper by introducing a model mimicking
users' selection preferences accounting for the heterogeneity of contents.
Characterizing physiological and symptomatic variation in menstrual cycles using self-tracked mobile health data
The menstrual cycle is a key indicator of overall health for women of
reproductive age. Previously, menstruation was primarily studied through survey
results; however, as menstrual tracking mobile apps become more widely adopted,
they provide an increasingly large, content-rich source of menstrual health
experiences and behaviors over time. By exploring a database of user-tracked
observations from the Clue app by BioWink of over 378,000 users and 4.9 million
natural cycles, we show that self-reported menstrual tracker data can reveal
statistically significant relationships between per-person cycle length
variability and self-reported qualitative symptoms. A concern for self-tracked
data is that they reflect not only physiological behaviors, but also the
engagement dynamics of app users. To mitigate such potential artifacts, we
develop a procedure to exclude cycles lacking user engagement, thereby allowing
us to better distinguish true menstrual patterns from tracking anomalies. We
uncover that women located at different ends of the menstrual variability
spectrum, based on the consistency of their cycle length statistics, exhibit
statistically significant differences in their cycle characteristics and
symptom tracking patterns. We also find that cycle and period length statistics
are stationary over the app usage timeline across the variability spectrum. The
symptoms that we identify as showing statistically significant association with
timing data can be useful to clinicians and users for predicting cycle
variability from symptoms or as potential health indicators for conditions like
endometriosis. Our findings showcase the potential of longitudinal,
high-resolution self-tracked data to improve understanding of menstruation and
women's health as a whole. Comment: The Supplementary Information for this work, as well as the code
required for data pre-processing and producing results is available in
https://github.com/iurteaga/menstrual_cycle_analysi
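The abstract above mentions two steps that can be illustrated in a few lines: excluding cycles that lack user engagement, and summarizing per-person cycle length variability. The sketch below is only an assumption-laden illustration. The engagement rule (at least `min_events` tracked events in a cycle), the use of the population standard deviation as the variability statistic, and the function name `cycle_length_variability` are hypothetical choices, not the procedure used in the paper.

```python
from statistics import pstdev


def cycle_length_variability(cycles, min_events=1):
    """Summarize one user's cycle length variability.

    cycles: list of (length_in_days, n_tracked_events) tuples.
    Cycles with fewer than `min_events` tracked events are dropped
    (an assumed engagement rule, for illustration only).
    Returns None when fewer than two cycles remain.
    """
    engaged = [length for length, n_events in cycles if n_events >= min_events]
    if len(engaged) < 2:
        return None
    return pstdev(engaged)


# Made-up data: the 58-day "cycle" has no tracked events, which suggests
# a missed period entry rather than a genuinely long cycle.
user_cycles = [(28, 5), (29, 3), (58, 0), (27, 4)]
print(cycle_length_variability(user_cycles))                 # engagement filter on
print(cycle_length_variability(user_cycles, min_events=0))   # filter off
```

With the filter on, the unengaged 58-day cycle no longer inflates the variability estimate, which is the kind of tracking artifact the abstract aims to separate from true menstrual patterns.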
Visual Quality Enhancement in Optoacoustic Tomography using Active Contour Segmentation Priors
Segmentation of biomedical images is essential for studying and
characterizing anatomical structures and for detecting and evaluating pathological
tissues. Segmentation has been further shown to enhance the reconstruction
performance in many tomographic imaging modalities by accounting for
heterogeneities of the excitation field and tissue properties in the imaged
region. This is particularly relevant in optoacoustic tomography, where
discontinuities in the optical and acoustic tissue properties, if not properly
accounted for, may result in deterioration of the imaging performance.
Efficient segmentation of optoacoustic images is often hampered by the
relatively low intrinsic contrast of large anatomical structures, which is
further impaired by the limited angular coverage of some commonly employed
tomographic imaging configurations. Herein, we analyze the performance of
active contour models for boundary segmentation in cross-sectional optoacoustic
tomography. The segmented mask is employed to construct a two-compartment model
for the acoustic and optical parameters of the imaged tissues, which is
subsequently used to improve accuracy of the image reconstruction routines. The
performance of the suggested segmentation and modeling approach is showcased
in tissue-mimicking phantoms and small animal imaging experiments. Comment: Accepted for publication in IEEE Transactions on Medical Imaging
Automated construction and analysis of political networks via open government and media sources
We present a tool to generate real-world political networks from user-provided lists of politicians and news sites. Additional output includes visualizations, interactive tools and maps that allow a user to better understand the politicians and their surrounding environments as portrayed by the media. As a case study, we construct a comprehensive list of current Texas politicians, select news sites that convey a spectrum of political viewpoints covering Texas politics, and examine the results. We propose a “Combined” co-occurrence distance metric to better reflect the relationship between two entities. A topic modeling technique is also proposed as a novel, automated way of labeling communities that exist within a politician’s “extended” network. Peer Reviewed. Postprint (author's final draft)
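The abstract above does not define its “Combined” metric, so the following is only a generic sketch of how a co-occurrence distance between two entities can be computed from news articles. It uses the Jaccard distance over article mentions, which is a stand-in chosen for this example and not the authors' metric; the function name `cooccurrence_distances` and the sample data are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_distances(articles):
    """Compute a Jaccard distance for every pair of names that co-occur.

    articles: iterable of sets, each holding the names mentioned in one
    article. Returns {(a, b): distance} with a < b alphabetically.
    Illustrative stand-in only; not the "Combined" metric.
    """
    mentions = Counter()   # number of articles mentioning each name
    together = Counter()   # number of articles mentioning both names
    for names in articles:
        mentions.update(names)
        for a, b in combinations(sorted(names), 2):
            together[(a, b)] += 1

    distances = {}
    for (a, b), n_ab in together.items():
        union = mentions[a] + mentions[b] - n_ab
        distances[(a, b)] = 1.0 - n_ab / union
    return distances


# Made-up articles mentioning placeholder politicians A, B and C.
sample_articles = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"C"}]
for pair, dist in sorted(cooccurrence_distances(sample_articles).items()):
    print(pair, round(dist, 3))
```

Pairs with a small distance appear together in most of the articles that mention either of them, so a thresholded version of this table could serve as the edge list of a network.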