Co-occurrence patterns in diagnostic data
We demonstrate how graph decomposition techniques can be employed for the visualization of hierarchical co-occurrence patterns between medical data items. Our research is based on Gaifman graphs (a mathematical concept introduced in logic), on specific variants of this concept, and on existing graph decomposition notions, specifically graph modules and the clan decomposition of so-called 2-structures. The construction of the Gaifman graph from a dataset is based on co-occurrence, or lack of it, of items in the dataset. We may select a discretization of the edge labels to aim at one of several Gaifman graph variants. The decomposition of the graph may then provide visual information about the data co-occurrences, after which one can proceed to more traditional statistical analysis. Partially supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme, grant agreement ERC-2014-CoG 648276 (AUTAR); by grant TIN2017-89244-R from the Ministerio de Economía, Industria y Competitividad; and by Conacyt (México). We acknowledge unfunded recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya).
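The co-occurrence construction described above can be sketched in a few lines: vertices are data items, edge labels count joint occurrences, and a discretization of the labels selects a Gaifman graph variant. The function names and the single-threshold discretization below are illustrative assumptions, not the paper's actual method:

```python
from itertools import combinations
from collections import Counter

def gaifman_graph(transactions):
    """Build a co-occurrence (Gaifman-style) graph: vertices are items,
    edge labels count how many records contain both endpoints."""
    edges = Counter()
    vertices = set()
    for record in transactions:
        items = sorted(set(record))
        vertices.update(items)
        for a, b in combinations(items, 2):
            edges[(a, b)] += 1
    return vertices, dict(edges)

def discretize(edges, threshold=1):
    """One possible discretization of edge labels: keep only edges whose
    co-occurrence count reaches the threshold (a plain-graph variant)."""
    return {e: c for e, c in edges.items() if c >= threshold}

# Toy diagnostic records (illustrative data, not from the paper).
data = [["fever", "cough"], ["fever", "cough", "fatigue"], ["fatigue"]]
V, E = gaifman_graph(data)
strong = discretize(E, threshold=2)
```

The discretized graph would then be handed to a module/clan decomposition routine; that step is outside this sketch.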
Non-empirical problems in fair machine learning
The problem of fair machine learning has drawn much attention over the last few years, and the bulk of offered solutions are, in principle, empirical. However, algorithmic fairness also raises important conceptual issues that would fail to be addressed if one relies entirely on empirical considerations. Herein, I will argue that the current debate has developed an empirical framework that has brought important contributions to the development of algorithmic decision-making, such as new techniques to discover and prevent discrimination, additional assessment criteria, and analyses of the interaction between fairness and predictive accuracy. However, the same framework has also suggested higher-order issues regarding the translation of fairness into metrics and quantifiable trade-offs. Although the (empirical) tools which have been developed so far are essential to address discrimination encoded in data and algorithms, their integration into society elicits key (conceptual) questions such as: What kind of assumptions and decisions underlie the empirical framework? How do the results of the empirical approach penetrate public debate? What kind of reflection and deliberation should stakeholders have over available fairness metrics? I will outline the empirical approach to fair machine learning, i.e. how the problem is framed and addressed, and suggest that there are important non-empirical issues that should be tackled. While this work focuses on the problem of algorithmic fairness, the lesson can extend to other conceptual problems in the analysis of algorithmic decision-making, such as privacy and explainability.
Advances in Intelligent Data Analysis XVII: 17th International Symposium, IDA 2018, ’s-Hertogenbosch, The Netherlands, October 24–26, 2018, Proceedings
Longitudinal data is ubiquitous in research and is often complemented by broad collections of static background information. There is, however, a shortage of general-purpose statistical tools for studying the temporal dynamics of complex and stochastic dynamical systems, especially when data is scarce and the underlying mechanisms that generate the observations are poorly understood. Contemporary microbiome research provides a topical example, where vast cross-sectional and longitudinal collections of taxonomic profiling data from the human body and other environments are now being collected in research laboratories worldwide. Many classical algorithms rely on long and densely sampled time series, whereas human microbiome studies typically have more limited sample sizes, short time spans, sparse sampling intervals, lack of replicates, and high levels of unaccounted technical and biological variation. We demonstrate how non-parametric models can help to quantify key properties of a dynamical system when the actual data-generating mechanisms are largely unknown. Such properties include the locations of stable states, the resilience of the system, and the levels of stochastic fluctuations. Moreover, we show how limited data availability can be compensated by pooling statistical evidence across multiple individuals or studies, and by incorporating prior information into the models. In particular, we derive and implement a hierarchical Bayesian variant of Ornstein-Uhlenbeck driven t-processes, which can be used to characterize universal dynamics in univariate, unimodal, mean-reverting systems based on multiple short time series. We validate the model with simulated data and investigate its applicability in characterizing the temporal dynamics of the human gut microbiome.
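As an illustration of the quantities such a model targets, here is a minimal (non-hierarchical, Gaussian-noise) Ornstein-Uhlenbeck simulation, where the stable state location, resilience, and fluctuation level appear directly as parameters. This is a sketch under simplified assumptions, not the paper's hierarchical Bayesian t-process; the function and parameter names are hypothetical:

```python
import math
import random

def simulate_ou(mu, lam, sigma, x0, dt, n, seed=0):
    """Euler-Maruyama simulation of dX = lam*(mu - X) dt + sigma dW.
    mu:    location of the stable state (long-run mean)
    lam:   resilience, i.e. the mean-reversion rate
    sigma: level of stochastic fluctuations"""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n):
        x += lam * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
        path.append(x)
    return path

# Start far from the stable state and watch the process revert to mu.
path = simulate_ou(mu=0.0, lam=2.0, sigma=0.3, x0=3.0, dt=0.01, n=2000)
```

In the inference setting the direction is reversed: given several short observed series, one estimates mu, lam, and sigma, pooling evidence across individuals via a hierarchical prior.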
Advances in Intelligent Data Analysis XVII: 17th International Symposium, IDA 2018, ’s-Hertogenbosch, The Netherlands, October 24–26, 2018, Proceedings
The increasing openness of data, methods, and collaboration networks has created new opportunities for research, citizen science, and industry. Whereas openly licensed scientific, governmental, and institutional data sets can now be accessed through programmatic interfaces, compressed archives, and downloadable spreadsheets, realizing the full potential of open data streams depends critically on the availability of targeted data analytical methods, and on user communities that can derive value from these digital resources. Interoperable software libraries have become a central element in modern statistical data analysis, bridging the gap between theory and practice, while open developer communities have emerged as a powerful driver of research software development. Drawing insights from a decade of community engagement, I propose the concept of open data science, which refers to the new forms of research enabled by open data, open methods, and open collaboration.
Survey on Sociodemographic Bias in Natural Language Processing
Deep neural networks often learn unintended biases during training, which might have harmful effects when deployed in real-world settings. This paper surveys 209 papers on bias in NLP models, most of which address sociodemographic bias. To better understand the distinction between bias and real-world harm, we turn to ideas from psychology and behavioral economics to propose a definition for sociodemographic bias. We identify three main categories of NLP bias research: types of bias, quantifying bias, and debiasing. We conclude that current approaches to quantifying bias face reliability issues, that many of the bias metrics do not relate to real-world biases, and that current debiasing techniques are superficial and hide bias rather than removing it. Finally, we provide recommendations for future work.
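To make "quantifying bias" concrete, here is a minimal sketch of one widely used fairness metric, the demographic parity gap (the difference in positive-prediction rates across groups). The helper name and the toy data are illustrative, not taken from the survey:

```python
def demographic_parity_gap(predictions, groups):
    """Difference between the highest and lowest positive-prediction
    rates across groups; 0 means equal rates (demographic parity)."""
    rates = {}
    for g in set(groups):
        preds_g = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(preds_g) / len(preds_g)
    return max(rates.values()) - min(rates.values())

# Toy binary classifier outputs for two demographic groups.
preds  = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = demographic_parity_gap(preds, groups)  # 3/4 - 1/4 = 0.5
```

Metrics of this kind are exactly what the survey flags as contested: a small gap does not by itself establish the absence of real-world harm.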