Mining Biclusters of Similar Values with Triadic Concept Analysis
Biclustering numerical data became a popular data-mining task at the
beginning of the 2000s, especially for analysing gene expression data. A bicluster
reflects a strong association between a subset of objects and a subset of
attributes in a numerical object/attribute data table. So-called biclusters of
similar values can be thought of as maximal sub-tables with close values. Only a few
methods address a complete, correct and non-redundant enumeration of such
patterns, which is a well-known intractable problem, and no formal framework
for it exists. In this paper, we introduce important links between biclustering and
formal concept analysis. More specifically, we show for the first time that Triadic
Concept Analysis (TCA) provides a nice mathematical framework for
biclustering. Interestingly, existing TCA algorithms, which usually apply to
binary data, can be used (directly or with slight modifications) after a
preprocessing step for extracting maximal biclusters of similar values.
Comment: Concept Lattices and their Applications (CLA) (2011)
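To make the notion of a maximal bicluster of similar values concrete, here is a minimal brute-force sketch in Python. It assumes that two values are similar when they differ by at most a user-chosen threshold theta, so a sub-table qualifies when its maximum minus its minimum is at most theta. The function names and the theta parameter are illustrative assumptions, not taken from the paper, which instead extracts these patterns with TCA algorithms after a preprocessing step.

```python
from itertools import combinations


def is_similar(table, rows, cols, theta):
    """True if all cells of the sub-table rows x cols lie within theta of each other.

    Assumption: similarity means max - min <= theta over the sub-table.
    """
    values = [table[r][c] for r in rows for c in cols]
    return max(values) - min(values) <= theta


def is_maximal(table, rows, cols, theta):
    """True if no single extra row or column keeps the sub-table similar."""
    n_rows, n_cols = len(table), len(table[0])
    for r in range(n_rows):
        if r not in rows and is_similar(table, rows | {r}, cols, theta):
            return False
    for c in range(n_cols):
        if c not in cols and is_similar(table, rows, cols | {c}, theta):
            return False
    return True


def maximal_biclusters(table, theta):
    """Enumerate every maximal bicluster of similar values by brute force.

    Exponential in the table size: only meant to illustrate the definition
    on tiny tables, not as an efficient miner.
    """
    n_rows, n_cols = len(table), len(table[0])
    result = []
    for i in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), i):
            for j in range(1, n_cols + 1):
                for cols in combinations(range(n_cols), j):
                    rs, cs = set(rows), set(cols)
                    if is_similar(table, rs, cs, theta) and is_maximal(table, rs, cs, theta):
                        result.append((frozenset(rs), frozenset(cs)))
    return result


if __name__ == "__main__":
    data = [[1, 2, 9],
            [1, 3, 9],
            [8, 2, 1]]
    for rows, cols in maximal_biclusters(data, theta=1):
        print(sorted(rows), sorted(cols))
```

Because any sub-table of a similar sub-table is itself similar, testing single-row and single-column extensions is enough to decide maximality: if a larger similar bicluster existed, adding any one of its extra rows or columns would also stay similar.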
Enhancement of Epidemiological Models for Dengue Fever Based on Twitter Data
Epidemiological early warning systems for dengue fever rely on up-to-date
epidemiological data to forecast future incidence. However, epidemiological
data typically take time to become available, due to the application of
time-consuming laboratory tests. This implies that epidemiological models
must issue predictions further in advance, making their task even more
difficult. On the other hand, online platforms, such as Twitter or Google,
allow us to obtain samples of users' interactions in near real time and can be
used as sensors to monitor current incidence. In this work, we propose a
framework that exploits online data sources to mitigate the lack of up-to-date
epidemiological data by obtaining estimates of current incidence, which are
then used by traditional epidemiological models. We show that the proposed
framework obtains more accurate predictions than alternative approaches, with
statistically better results for delays greater than or equal to 4 weeks.
Comment: ACM Digital Health 201
Complexity-Aware Assignment of Latent Values in Discriminative Models for Accurate Gesture Recognition
Many of the state-of-the-art algorithms for gesture recognition are based on
Conditional Random Fields (CRFs). Successful approaches, such as the
Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose
values are mapped to the values of the labels. In this paper, we propose a novel
methodology to set the latent values according to the gesture complexity. We
use a heuristic that iterates through the samples associated with each label
value, estimating their complexity. We then use these estimates to assign the latent values
to the label values. We evaluate our method on the task of recognizing human
gestures from video streams. The experiments were performed on binary datasets,
generated by grouping different labels. Our results demonstrate that our
approach outperforms arbitrary assignment in many cases, increasing the accuracy
by up to 10%.
Comment: Conference paper published at the 2016 29th SIBGRAPI Conference on
Graphics, Patterns and Images (SIBGRAPI). 8 pages, 7 figures
Characterizing videos, audience and advertising in Youtube channels for kids
Online video services, messaging systems, games and social media services are
tremendously popular among young people and children in many countries. Most of
the digital services offered on the internet are advertising funded, which
makes advertising ubiquitous in children's everyday life. To understand the
impact of advertising-based digital services on children, we study the
collective behavior of users of YouTube channels for kids and present the
demographics of a large number of users. We collected data from 12,848 videos
from 17 channels in the US and UK and 24 channels in Brazil. The channels in
English have been viewed more than 37 billion times. We also collected more
than 14 million comments made by users. Based on a combination of text-analysis
and face-recognition tools, we show the presence of racial and gender biases in
our large sample of users. We also identify children actively using YouTube,
although the minimum age for using the service is 13 years in most countries.
We provide comparisons of user behavior among the three countries, which
represent large user populations in the global North and the global South.
Portinari: A Data Exploration Tool to Personalize Cervical Cancer Screening
Socio-technical systems play an important role in public health screening
programs to prevent cancer. Cervical cancer incidence has significantly
decreased in countries that developed systems for organized screening engaging
medical practitioners, laboratories and patients. Such a system automatically
identifies individuals at risk of developing the disease and invites them for a
screening exam or a follow-up exam conducted by medical professionals. A triage
algorithm in the system aims to reduce unnecessary screening exams for
individuals at low risk while detecting and treating individuals at high risk.
Despite the general success of screening, the triage algorithm is a
one-size-fits-all approach that is not personalized to a patient. This can
easily be observed in historical data from screening exams. Often patients rely
on personal factors to determine that they are either at high risk or not at
risk at all and take action at their own discretion. Can exploring patient
trajectories help hypothesize personal factors leading to their decisions? We
present Portinari, a data exploration tool to query and visualize future
trajectories of patients who have undergone a specific sequence of screening
exams. The web-based tool contains (a) a visual query interface, (b) a back-end
graph database of events in patients' lives, and (c) trajectory visualization using
Sankey diagrams. We use Portinari to explore diverse trajectories of patients
following the Norwegian triage algorithm. The trajectories demonstrated
variable degrees of adherence to the triage algorithm and allowed
epidemiologists to hypothesize about the possible causes.
Comment: Conference paper published at ICSE 2017, Buenos Aires, in the Software
Engineering in Society track. 10 pages, 5 figures