Recalibrating machine learning for social biases: demonstrating a new methodology through a case study classifying gender biases in archival documentation
This thesis proposes a recalibration of Machine Learning for social biases to minimize harms from existing approaches and practices in the field. Prioritizing quality over quantity, accuracy over efficiency, representativeness over convenience, and situated thinking over universal thinking, the thesis demonstrates an alternative approach to creating Machine Learning models. Drawing on GLAM, the Humanities, the Social Sciences, and Design, the thesis focuses on understanding and communicating biases in a specific use case. 11,888 metadata descriptions from the University of Edinburgh Heritage Collections' Archives catalog were manually annotated for gender biases, and text classification models were then trained on the resulting dataset of 55,260 annotations. Evaluations of the models' performance demonstrate that annotating gender biases can be automated; however, the subjectivity of bias as a concept complicates the generalizability of any one approach.
The contributions are: (1) an interdisciplinary and participatory Bias-Aware Methodology, (2) a Taxonomy of Gendered and Gender Biased Language, (3) data annotated for gender biased language, (4) gender biased text classification models, and (5) a human-centered approach to model evaluation. The contributions have implications for Machine Learning, demonstrating how bias is inherent to all data and models; more specifically for Natural Language Processing, providing an annotation taxonomy, annotated datasets and classification models for analyzing gender biased language at scale; for the Gallery, Library, Archives, and Museum sector, offering guidance to institutions seeking to reconcile with histories of marginalizing communities through their documentation practices; and for historians, who utilize cultural heritage documentation to study and interpret the past. Through a real-world application of the Bias-Aware Methodology in a case study, the thesis illustrates the need to shift away from removing social biases and towards acknowledging them, creating data and models that surface the uncertainty and multiplicity characteristic of human societies.
The legibility of the imaged human brain
Our knowledge of the organisation of the human brain at the population-level
is yet to translate into power to predict functional differences at the
individual-level, limiting clinical applications, and casting doubt on the
generalisability of inferred mechanisms. It remains unknown whether the
difficulty arises from the absence of individuating biological patterns within
the brain, or from limited power to access them with the models and compute at
our disposal. Here we comprehensively investigate the resolvability of such
patterns with data and compute at unprecedented scale. Across 23810 unique
participants from UK Biobank, we systematically evaluate the predictability of
25 individual biological characteristics, from all available combinations of
structural and functional neuroimaging data. Over 4526 GPU*hours of
computation, we train, optimize, and evaluate out-of-sample 700 individual
predictive models, including multilayer perceptrons of demographic,
psychological, serological, chronic morbidity, and functional connectivity
characteristics, and both uni- and multi-modal 3D convolutional neural network
models of macro- and micro-structural brain imaging. We find a marked
discrepancy between the high predictability of sex (balanced accuracy 99.7%),
age (mean absolute error 2.048 years, R² 0.859), and weight (mean absolute
error 2.609 kg, R² 0.625), for which we set new state-of-the-art performance,
and the surprisingly low predictability of other characteristics. Neither
structural nor functional imaging predicted individual psychology better than
the coincidence of common chronic morbidity (p<0.05). Serology predicted common
morbidity (p<0.05) and was best predicted by it (p<0.001), followed by
structural neuroimaging (p<0.05). Our findings suggest either more informative
imaging or more powerful models will be needed to decipher individual level
characteristics from the brain.
Comment: 36 pages, 6 figures, 1 table, 2 supplementary figure
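The three metrics quoted in this abstract (balanced accuracy, mean absolute error, and R²) can be illustrated with a minimal sketch. It uses the standard textbook definitions, which are assumed here because the abstract does not say how the metrics were computed, and the toy labels and ages are hypothetical values chosen only to exercise the functions; they are not data from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy data (NOT from the study), for illustration only.
sex_true = ["F", "F", "F", "M"]
sex_pred = ["F", "F", "M", "M"]
age_true = [50, 60, 70, 80]
age_pred = [52, 59, 73, 78]

bal_acc = balanced_accuracy(sex_true, sex_pred)
mae = mean_absolute_error(age_true, age_pred)
r2 = r_squared(age_true, age_pred)
print(f"balanced accuracy={bal_acc:.3f}, MAE={mae:.3f}, R2={r2:.3f}")
```

Balanced accuracy averages recall over classes, so it is not inflated by class imbalance in the way plain accuracy can be.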
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a
long period of stagnation in drug approvals. Due to the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform LigNFam enables users to interactively explore
similarity relationships between ‘drug like’ molecules and the proteins they bind. Secondly,
two deep-learning-based binding site comparison tools were developed, competing with
the state-of-the-art over benchmark datasets. The models have the ability to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships and has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together, and with existing tools, the contributions made will aid in the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster.
Role of Formal and Informal Institutions in Advancing Sustainable Environmental Practices in SMEs of Pakistan's Textile Sector
Economies around the globe have established formal institutions to protect their natural environments (Klewitz et al., 2012, Wahga et al., 2018b), but parallel to them are 'proto-institutions' that also make an important contribution towards sustainable development. A proto-institution, an institution in the making, comprises rules, practices, and technologies that are partially diffused and weakly entrenched but poised to become widely institutionalised (Lawrence et al., 2002, p. 283). This qualitative study examines how proto-institutions in Pakistan's textile sector emerged and played a role in promoting sustainable environmental practices. Stakeholder Theory and Institutional Theory were combined to guide data collection and analysis. Primary data were collected through in-depth interviews, field observations and a field journal, whereas secondary data came from archival records and industry-specific publications. NVIVO 12 was used to sort and prepare data for analysis. Grounded analysis (Gioia et al., 2013, Easterby-Smith et al., 2015) revealed that institutional voids (Mair and Marti, 2009) and institutional gaps (Kolk, 2014) impeded the ability of formal institutions to assist the textile sector and ensure compliance with the established Punjab Environmental Quality Standards (PEQS). Due to these voids and gaps, textile manufacturers and stakeholders collaborated in various ways, resulting in the emergence of proto-institutions. These proto-institutions address the 'knowledge gap' by conducting informative seminars and capacity-building workshops and by producing best practice manuals. They bridge the 'cleaner production gap' by devolving internationally tested cleaner production solutions and assisting with their implementation. In addition, they take steps to close the 'compliance gap' by building the capacity of firms and public institutions. They fill the 'R&D gap' through commercial research into inputs, processes, and product development.
They also provide firms with financial assistance through matching grants that help firms overcome their 'financial assistance gap', acquire international certifications for entry into global markets, and undertake business development services. In doing so, these proto-institutions imposed normative and mimetic pressure on firms to adopt green practices while coexisting with formal institutions as compensatory institutions to create environmentally compliant isomorphs (firms). These findings add to the insights about institutional work processes and roles of proto-institutions by presenting evidence from a previously under-researched context: promoting sustainability in an SME-dominated manufacturing sector of a developing country. In terms of practice, these findings are helpful for textile manufacturers who are not yet aware of the benefits they could reap by adopting sustainable practices and processes in their manufacturing concerns. The information about collaboration is helpful for stakeholders looking to form new partnerships for responsible production. This study also suggests that policymakers both encourage and collaborate with proto-institutions to accomplish national and international commitments such as SDG 12 - Sustainable Consumption and Production, and the race to net zero in textiles. Furthermore, the context-specific factors that affect the emergence and development of proto-institutions in Pakistan’s textile sector could also help policymakers in Pakistan and similar developing countries to overcome institutional gaps and voids in their formal institutional arrangements and better promote sustainable production in their key manufacturing sectors.
Applying machine learning: a multi-role perspective
Machine (and deep) learning technologies are increasingly present in several fields. It is undeniable that many aspects of our society are empowered by such technologies: web searches, content filtering on social networks, recommendations on e-commerce websites, mobile applications, etc., in addition to academic research. Moreover, mobile devices and internet sites, e.g., social networks, support the collection and sharing of information in real time. The pervasive deployment of the aforementioned technological instruments, both hardware and software, has led to the production of huge amounts of data. Such data has become more and more unmanageable, posing challenges to conventional computing platforms and paving the way for the development and widespread use of machine and deep learning. Nevertheless, machine learning is not only a technology. Given a task, machine learning is a way of proceeding (a way of thinking), and as such can be approached from different perspectives (points of view). This, in particular, will be the focus of this research. The entire work concentrates on machine learning, starting from different sources of data, e.g., signals and images, applied to different domains, e.g., Sport Science and Social History, and analyzed from different perspectives: from a non-data scientist point of view through tools and platforms; setting a problem stage from scratch; implementing an effective application for classification tasks; improving user interface experience through Data Visualization and eXtended Reality. In essence, machine (and deep) learning can make a difference not only in quantitative tasks, not only in scientific environments, and not only from a data scientist's perspective.
Development of an R package to learn supervised classification techniques
This TFG aims to develop a custom R package for teaching supervised classification algorithms, starting
with the identification of requirements, including algorithms, data structures, and libraries. A strong
theoretical foundation is essential for effective package design. Documentation will explain each function’s
purpose, accompanied by necessary paperwork.
The package will include R scripts and data files in organized directories, complemented by a user
manual for easy installation and usage, even for beginners. Built entirely from scratch without external
dependencies, it’s optimized for accuracy and performance.
In conclusion, this TFG provides a roadmap for creating an R package to teach supervised classification
algorithms, benefiting researchers and practitioners dealing with real-world challenges.
Grado en Ingeniería Informátic
Electron Thermal Runaway in Atmospheric Electrified Gases: a microscopic approach
Thesis elaborated from 2018 to 2023 at the Instituto de Astrofísica de Andalucía under the supervision of Alejandro Luque (Granada, Spain) and Nikolai Lehtinen (Bergen, Norway). This thesis presents a new database of atmospheric electron-molecule collision cross sections which was published separately under the DOI :
With this new database and a new super-electron management algorithm which significantly enhances high-energy electron statistics at previously unresolved ratios, the thesis explores general facets of the electron thermal runaway process relevant to atmospheric discharges under various conditions of temperature and gas composition, such as those encountered in the wake and formation of discharge channels.
Testing novel facial recognition technology to identify dogs during vaccination campaigns
A lack of methods to identify individual animals can be a barrier to zoonoses control. We developed and field-tested facial recognition technology for a mobile phone application to identify dogs, which we used to assess vaccination coverage against rabies in rural Tanzania. Dogs were vaccinated, registered using the application, and microchipped. During subsequent household visits to validate vaccination, dogs were registered using the application and their vaccination status determined by operators using the application to classify dogs as vaccinated (matched) or unvaccinated (unmatched), with microchips validating classifications. From 534 classified dogs (251 vaccinated, 283 unvaccinated), the application specificity was 98.9% and sensitivity 76.2%, with positive and negative predictive values of 98.4% and 82.8% respectively. The facial recognition algorithm correctly matched 249 (99.2%) vaccinated and microchipped dogs (true positives) and failed to match two (0.8%) vaccinated dogs (false negatives). Operators correctly identified 186 (74.1%) vaccinated dogs (true positives), and 280 (98.9%) unvaccinated dogs (true negatives), but incorrectly classified 58 (23.1%) vaccinated dogs as unmatched (false negatives). Reduced application sensitivity resulted from poor quality photos and light-associated color distortion. With development and operator training, this technology has potential to be a useful tool to identify dogs and support research and intervention programs.
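The reported application-level percentages can be checked from the operator counts with a short sketch. The true positive, false negative, and true negative counts are taken from the abstract; the false positive count of 3 is an inference (283 unvaccinated dogs minus 280 true negatives), not a figure quoted in the text, and the function name `diagnostic_metrics` is purely illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix ratios for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),  # share of vaccinated dogs matched
        "specificity": tn / (tn + fp),  # share of unvaccinated dogs unmatched
        "ppv": tp / (tp + fp),          # share of matches that are correct
        "npv": tn / (tn + fn),          # share of non-matches that are correct
    }


# tp, fn, tn are operator-level counts from the abstract.
# fp=3 is an ASSUMPTION inferred as 283 - 280; it is not stated directly.
metrics = diagnostic_metrics(tp=186, fn=58, tn=280, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under that assumption the four ratios round to the 76.2%, 98.9%, 98.4%, and 82.8% given in the abstract. Note that the sensitivity denominator here is 186 + 58 = 244 rather than all 251 vaccinated dogs, which is the denominator behind the separately quoted 74.1%.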
Behavior quantification as the missing link between fields: Tools for digital psychiatry and their role in the future of neurobiology
The great behavioral heterogeneity observed between individuals with the same
psychiatric disorder and even within one individual over time complicates both
clinical practice and biomedical research. However, modern technologies are an
exciting opportunity to improve behavioral characterization. Existing
psychiatry methods that are qualitative or unscalable, such as patient surveys
or clinical interviews, can now be collected at a greater capacity and analyzed
to produce new quantitative measures. Furthermore, recent capabilities for
continuous collection of passive sensor streams, such as phone GPS or
smartwatch accelerometer, open avenues of novel questioning that were
previously entirely unrealistic. Their temporally dense nature enables a
cohesive study of real-time neural and behavioral signals.
To develop comprehensive neurobiological models of psychiatric disease, it
will be critical to first develop strong methods for behavioral quantification.
There is huge potential in what can theoretically be captured by current
technologies, but this in itself presents a large computational challenge --
one that will necessitate new data processing tools, new machine learning
techniques, and ultimately a shift in how interdisciplinary work is conducted.
In my thesis, I detail research projects that take different perspectives on
digital psychiatry, subsequently tying ideas together with a concluding
discussion on the future of the field. I also provide software infrastructure
where relevant, with extensive documentation.
Major contributions include scientific arguments and proof of concept results
for daily free-form audio journals as an underappreciated psychiatry research
datatype, as well as novel stability theorems and pilot empirical success for a
proposed multi-area recurrent neural network architecture.
Comment: PhD thesis cop