293 research outputs found

    Mining Biclusters of Similar Values with Triadic Concept Analysis

    Get PDF
    Biclustering numerical data became a popular data-mining task in the beginning of 2000's, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So called biclusters of similar values can be thought as maximal sub-tables with close values. Only few methods address a complete, correct and non redundant enumeration of such patterns, which is a well-known intractable problem, while no formal framework exists. In this paper, we introduce important links between biclustering and formal concept analysis. More specifically, we originally show that Triadic Concept Analysis (TCA), provides a nice mathematical framework for biclustering. Interestingly, existing algorithms of TCA, that usually apply on binary data, can be used (directly or with slight modifications) after a preprocessing step for extracting maximal biclusters of similar values.Comment: Concept Lattices and their Applications (CLA) (2011

    Enhancement of Epidemiological Models for Dengue Fever Based on Twitter Data

    Full text link
    Epidemiological early warning systems for dengue fever rely on up-to-date epidemiological data to forecast future incidence. However, epidemiological data typically requires time to be available, due to the application of time-consuming laboratorial tests. This implies that epidemiological models need to issue predictions with larger antecedence, making their task even more difficult. On the other hand, online platforms, such as Twitter or Google, allow us to obtain samples of users' interaction in near real-time and can be used as sensors to monitor current incidence. In this work, we propose a framework to exploit online data sources to mitigate the lack of up-to-date epidemiological data by obtaining estimates of current incidence, which are then explored by traditional epidemiological models. We show that the proposed framework obtains more accurate predictions than alternative approaches, with statistically better results for delays greater or equal to 4 weeks.Comment: ACM Digital Health 201

    Complexity-Aware Assignment of Latent Values in Discriminative Models for Accurate Gesture Recognition

    Full text link
    Many of the state-of-the-art algorithms for gesture recognition are based on Conditional Random Fields (CRFs). Successful approaches, such as the Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose values are mapped to the values of the labels. In this paper we propose a novel methodology to set the latent values according to the gesture complexity. We use an heuristic that iterates through the samples associated with each label value, stimating their complexity. We then use it to assign the latent values to the label values. We evaluate our method on the task of recognizing human gestures from video streams. The experiments were performed in binary datasets, generated by grouping different labels. Our results demonstrate that our approach outperforms the arbitrary one in many cases, increasing the accuracy by up to 10%.Comment: Conference paper published at 2016 29th SIBGRAPI, Conference on Graphics, Patterns and Images (SIBGRAPI). 8 pages, 7 figure

    Characterizing videos, audience and advertising in Youtube channels for kids

    Full text link
    Online video services, messaging systems, games and social media services are tremendously popular among young people and children in many countries. Most of the digital services offered on the internet are advertising funded, which makes advertising ubiquitous in children's everyday life. To understand the impact of advertising-based digital services on children, we study the collective behavior of users of YouTube for kids channels and present the demographics of a large number of users. We collected data from 12,848 videos from 17 channels in US and UK and 24 channels in Brazil. The channels in English have been viewed more than 37 billion times. We also collected more than 14 million comments made by users. Based on a combination of text-analysis and face recognition tools, we show the presence of racial and gender biases in our large sample of users. We also identify children actively using YouTube, although the minimum age for using the service is 13 years in most countries. We provide comparisons of user behavior among the three countries, which represent large user populations in the global North and the global South

    Portinari: A Data Exploration Tool to Personalize Cervical Cancer Screening

    Full text link
    Socio-technical systems play an important role in public health screening programs to prevent cancer. Cervical cancer incidence has significantly decreased in countries that developed systems for organized screening engaging medical practitioners, laboratories and patients. The system automatically identifies individuals at risk of developing the disease and invites them for a screening exam or a follow-up exam conducted by medical professionals. A triage algorithm in the system aims to reduce unnecessary screening exams for individuals at low-risk while detecting and treating individuals at high-risk. Despite the general success of screening, the triage algorithm is a one-size-fits all approach that is not personalized to a patient. This can easily be observed in historical data from screening exams. Often patients rely on personal factors to determine that they are either at high risk or not at risk at all and take action at their own discretion. Can exploring patient trajectories help hypothesize personal factors leading to their decisions? We present Portinari, a data exploration tool to query and visualize future trajectories of patients who have undergone a specific sequence of screening exams. The web-based tool contains (a) a visual query interface (b) a backend graph database of events in patients' lives (c) trajectory visualization using sankey diagrams. We use Portinari to explore diverse trajectories of patients following the Norwegian triage algorithm. The trajectories demonstrated variable degrees of adherence to the triage algorithm and allowed epidemiologists to hypothesize about the possible causes.Comment: Conference paper published at ICSE 2017 Buenos Aires, at the Software Engineering in Society Track. 10 pages, 5 figure
    • …
    corecore