116 research outputs found
Large scale homophily analysis in twitter using a twixonomy
In this paper we perform a large-scale homophily analysis on Twitter using a hierarchical representation of users' interests which we call a Twixonomy. In order to build a population, community, or single-user Twixonomy we first associate "topical" friends in users' friendship lists (i.e. friends representing an interest rather than a social relation between peers) with Wikipedia categories. A wordsense disambiguation algorithm is used to select the appropriate wikipage for each topical friend. Starting from the set of wikipages representing "primitive" interests, we extract all paths connecting these pages with topmost Wikipedia category nodes, and we then prune the resulting graph G efficiently so as to induce a direct acyclic graph. This graph is the Twixonomy. Then, to analyze homophily, we compare different methods to detect communities in a peer friends Twitter network, and then for each community we compute the degree of homophily on the basis of a measure of pairwise semantic similarity. We show that the Twixonomy provides a means for describing users' interests in a compact and readable way and allows for a fine-grained homophily analysis. Furthermore, we show that midlow level categories in the Twixonomy represent the best balance between informativeness and compactness of the representation
A Topic Recommender for Journalists
The way in which people acquire information on events and form their own
opinion on them has changed dramatically with the advent of social media. For many
readers, the news gathered from online sources become an opportunity to share points
of view and information within micro-blogging platforms such as Twitter, mainly
aimed at satisfying their communication needs. Furthermore, the need to deepen the
aspects related to news stimulates a demand for additional information which is often
met through online encyclopedias, such as Wikipedia. This behaviour has also
influenced the way in which journalists write their articles, requiring a careful assessment
of what actually interests the readers. The goal of this paper is to present
a recommender system, What to Write and Why, capable of suggesting to a journalist,
for a given event, the aspects still uncovered in news articles on which the
readers focus their interest. The basic idea is to characterize an event according to
the echo it receives in online news sources and associate it with the corresponding
readers’ communicative and informative patterns, detected through the analysis of
Twitter and Wikipedia, respectively. Our methodology temporally aligns the results
of this analysis and recommends the concepts that emerge as topics of interest from
Twitter and Wikipedia, either not covered or poorly covered in the published news
articles
Modeling Quality and Machine Learning Pipelines through Extended Feature Models
The recently increased complexity of Machine Learning (ML) methods, led to
the necessity to lighten both the research and industry development processes.
ML pipelines have become an essential tool for experts of many domains, data
scientists and researchers, allowing them to easily put together several ML
models to cover the full analytic process starting from raw datasets. Over the
years, several solutions have been proposed to automate the building of ML
pipelines, most of them focused on semantic aspects and characteristics of the
input dataset. However, an approach taking into account the new quality
concerns needed by ML systems (like fairness, interpretability, privacy, etc.)
is still missing. In this paper, we first identify, from the literature, key
quality attributes of ML systems. Further, we propose a new engineering
approach for quality ML pipeline by properly extending the Feature Models
meta-model. The presented approach allows to model ML pipelines, their quality
requirements (on the whole pipeline and on single phases), and quality
characteristics of algorithms used to implement each pipeline phase. Finally,
we demonstrate the expressiveness of our model considering the classification
problem
Robust Stochastic Graph Generator for Counterfactual Explanations
Counterfactual Explanation (CE) techniques have garnered attention as a means
to provide insights to the users engaging with AI systems. While extensively
researched in domains such as medical imaging and autonomous vehicles, Graph
Counterfactual Explanation (GCE) methods have been comparatively
under-explored. GCEs generate a new graph similar to the original one, with a
different outcome grounded on the underlying predictive model. Among these GCE
techniques, those rooted in generative mechanisms have received relatively
limited investigation despite demonstrating impressive accomplishments in other
domains, such as artistic styles and natural language modelling. The preference
for generative explainers stems from their capacity to generate counterfactual
instances during inference, leveraging autonomously acquired perturbations of
the input graph. Motivated by the rationales above, our study introduces
RSGG-CE, a novel Robust Stochastic Graph Generator for Counterfactual
Explanations able to produce counterfactual examples from the learned latent
space considering a partially ordered generation sequence. Furthermore, we
undertake quantitative and qualitative analyses to compare RSGG-CE's
performance against SoA generative explainers, highlighting its increased
ability to engendering plausible counterfactual candidates.Comment: Accepted in AAAI'2
Can Twitter be a source of information on allergy? Correlation of pollen counts with tweets reporting symptoms of allergic rhinoconjunctivitis and names of antihistamine drugs
Pollen forecasts are in use everywhere to inform therapeutic decisions for patients with allergic rhinoconjunctivitis (ARC). We exploited data derived from Twitter in order to identify tweets reporting a combination of symptoms consistent with a case definition of ARC and those reporting the name of an antihistamine drug. In order to increase the sensitivity of the system, we applied an algorithm aimed at automatically identifying jargon expressions related to medical terms. We compared weekly Twitter trends with National Allergy Bureau weekly pollen counts derived from US stations, and found a high correlation of the sum of the total pollen counts from each stations with tweets reporting ARC symptoms (Pearson's correlation coefficient: 0.95) and with tweets reporting antihistamine drug names (Pearson's correlation coefficient: 0.93). Longitude and latitude of the pollen stations affected the strength of the correlation. Twitter and other social networks may play a role in allergic disease surveillance and in signaling drug consumptions trends
- …