6,329 research outputs found
Information actors beyond modernity and coloniality in times of climate change:A comparative design ethnography on the making of monitors for sustainable futures in Curaçao and Amsterdam, between 2019-2022
In his dissertation, Mr. Goilo developed a cutting-edge theoretical framework for an Anthropology of Information. This study compares information in the context of modernity in Amsterdam and coloniality in Curaçao through the making process of monitors and develops five ways to understand how information can act towards sustainable futures. The research also discusses how the two contexts, that is modernity and coloniality, have been in informational symbiosis for centuries which is producing negative informational side effects within the age of the Anthropocene. By exploring the modernity-coloniality symbiosis of information, the author explains how scholars, policymakers, and data-analysts can act through historical and structural roots of contemporary global inequities related to the production and distribution of information. Ultimately, the five theses propose conditions towards the collective production of knowledge towards a more sustainable planet
The development of bioinformatics workflows to explore single-cell multi-omics data from T and B lymphocytes
The adaptive immune response is responsible for recognising, containing and eliminating viral infection, and protecting from further reinfection. This antigen-specific response is driven by T and B cells, which recognise antigenic epitopes via highly specific heterodimeric surface receptors, termed T-cell receptors (TCRs) and B cell receptors (BCRs). The theoretical diversity of the receptor repertoire that can be generated via homologous recombination of V, D and J genes is large enough (>1015 unique sequences) that virtually any antigen can be recognised. However, only a subset of these are generated within the human body, and how they succeed in specifically recognising any pathogen(s) and distinguishing these from self-proteins remains largely unresolved.
The recent advances in applying single-cell genomics technologies to simultaneously measure the clonality, surface phenotype and transcriptomic signature of pathogen- specific immune cells have significantly improved understanding of these questions. Single-cell multi-omics permits the accurate identification of clonally expanded populations, their differentiation trajectories, the level of immune receptor repertoire diversity involved in the response and the phenotypic and molecular heterogeneity.
This thesis aims to develop a bioinformatic workflow utilising single-cell multi-omics data to explore, quantify and predict the clonal and transcriptomic signatures of the human T-cell response during and following viral infection. In the first aim, a web application, VDJView, was developed to facilitate the simultaneous analysis and visualisation of clonal, transcriptomic and clinical metadata of T and B cell multi-omics data. The application permits non-bioinformaticians to perform quality control and common analyses of single-cell genomics data integrated with other metadata, thus permitting the identification of biologically and clinically relevant parameters. The second aim pertains to analysing the functional, molecular and immune receptor profiles of CD8+ T cells in the acute phase of primary hepatitis C virus (HCV) infection. This analysis identified a novel population of progenitors of exhausted T cells, and lineage tracing revealed distinct trajectories with multiple fates and evolutionary plasticity. Furthermore, it was observed that high-magnitude IFN-Îł CD8+ T-cell response is associated with the increased probability of viral escape and chronic infection. Finally, in the third aim, a novel analysis is presented based on the topological characteristics of a network generated on pathogen-specific, paired-chain, CD8+ TCRs. This analysis revealed how some cross-reactivity between TCRs can be explained via the sequence similarity between TCRs and that this property is not uniformly distributed across all pathogen-specific TCR repertoires. Strong correlations between the topological properties of the network and the biological properties of the TCR sequences were identified and highlighted.
The suite of workflows and methods presented in this thesis are designed to be adaptable to various T and B cell multi-omic datasets. The associated analyses contribute to understanding the role of T and B cells in the adaptive immune response to viral-infection and cancer
Machine learning applications in search algorithms for gravitational waves from compact binary mergers
Gravitational waves from compact binary mergers are now routinely observed by Earth-bound detectors. These observations enable exciting new science, as they have opened a new window to the Universe.
However, extracting gravitational-wave signals from the noisy detector data is a challenging problem. The most sensitive search algorithms for compact binary mergers use matched filtering, an algorithm that compares the data with a set of expected template signals. As detectors are upgraded and more sophisticated signal models become available, the number of required templates will increase, which can make some sources computationally prohibitive to search for. The computational cost is of particular concern when low-latency alerts should be issued to maximize the time for electromagnetic follow-up observations. One potential solution to reduce computational requirements that has started to be explored in the last decade is machine learning. However, different proposed deep learning searches target varying parameter spaces and use metrics that are not always comparable to existing literature. Consequently, a clear picture of the capabilities of machine learning searches has been sorely missing.
In this thesis, we closely examine the sensitivity of various deep learning gravitational-wave search algorithms and introduce new methods to detect signals from binary black hole and binary neutron star mergers at previously untested statistical confidence levels. By using the sensitive distance as our core metric, we allow for a direct comparison of our algorithms to state-of-the-art search pipelines. As part of this thesis, we organized a global mock data challenge to create a benchmark for machine learning search algorithms targeting compact binaries. This way, the tools developed in this thesis are made available to the greater community by publishing them as open source software.
Our studies show that, depending on the parameter space, deep learning gravitational-wave search algorithms are already competitive with current production search pipelines. We also find that strategies developed for traditional searches can be effectively adapted to their machine learning counterparts. In regions where matched filtering becomes computationally expensive, available deep learning algorithms are also limited in their capability. We find reduced sensitivity to long duration signals compared to the excellent results for short-duration binary black hole signals
Low- and high-resource opinion summarization
Customer reviews play a vital role in the online purchasing decisions we make. The reviews
express user opinions that are useful for setting realistic expectations and uncovering important
details about products. However, some products receive hundreds or even thousands of
reviews, making them time-consuming to read. Moreover, many reviews contain uninformative
content, such as irrelevant personal experiences. Automatic summarization offers an
alternative â short text summaries capturing the essential information expressed in reviews.
Automatically produced summaries can reflect overall or particular opinions and be tailored to
user preferences. Besides being presented on major e-commerce platforms, home assistants
can also vocalize them. This approach can improve user satisfaction by assisting in making
faster and better decisions.
Modern summarization approaches are based on neural networks, often requiring thousands of
annotated samples for training. However, human-written summaries for products are expensive
to produce because annotators need to read many reviews. This has led to annotated data
scarcity where only a few datasets are available. Data scarcity is the central theme of our
works, and we propose a number of approaches to alleviate the problem. The thesis consists
of two parts where we discuss low- and high-resource data settings.
In the first part, we propose self-supervised learning methods applied to customer reviews
and few-shot methods for learning from small annotated datasets. Customer reviews without
summaries are available in large quantities, contain a breadth of in-domain specifics, and
provide a powerful training signal. We show that reviews can be used for learning summarizers
via a self-supervised objective. Further, we address two main challenges associated with
learning from small annotated datasets. First, large models rapidly overfit on small datasets
leading to poor generalization. Second, it is not possible to learn a wide range of in-domain
specifics (e.g., product aspects and usage) from a handful of gold samples. This leads to
subtle semantic mistakes in generated summaries, such as âgreat dead on arrival battery.â We
address the first challenge by explicitly modeling summary properties (e.g., content coverage
and sentiment alignment). Furthermore, we leverage small modules â adapters â that are
more robust to overfitting. As we show, despite their size, these modules can be used to
store in-domain knowledge to reduce semantic mistakes. Lastly, we propose a simple method
for learning personalized summarizers based on aspects, such as âprice,â âbattery life,â and
âresolution.â This task is harder to learn, and we present a few-shot method for training a
query-based summarizer on small annotated datasets.
In the second part, we focus on the high-resource setting and present a large dataset with
summaries collected from various online resources. The dataset has more than 33,000 humanwritten
summaries, where each is linked up to thousands of reviews. This, however, makes it
challenging to apply an âexpensiveâ deep encoder due to memory and computational costs. To
address this problem, we propose selecting small subsets of informative reviews. Only these
subsets are encoded by the deep encoder and subsequently summarized. We show that the
selector and summarizer can be trained end-to-end via amortized inference and policy gradient
methods
Software Design Change Artifacts Generation through Software Architectural Change Detection and Categorisation
Software is solely designed, implemented, tested, and inspected by expert people, unlike other engineering projects where they are mostly implemented by workers (non-experts) after designing by engineers. Researchers and practitioners have linked software bugs, security holes, problematic integration of changes, complex-to-understand codebase, unwarranted mental pressure, and so on in software development and maintenance to inconsistent and complex design and a lack of ways to easily understand what is going on and what to plan in a software system. The unavailability of proper information and insights needed by the development teams to make good decisions makes these challenges worse. Therefore, software design documents and other insightful information extraction are essential to reduce the above mentioned anomalies. Moreover, architectural design artifacts extraction is required to create the developerâs profile to be available to the market for many crucial scenarios. To that end, architectural change detection, categorization, and change description generation are crucial because they are the primary artifacts to trace other software artifacts.
However, it is not feasible for humans to analyze all the changes for a single release for detecting change and impact because it is time-consuming, laborious, costly, and inconsistent. In this thesis, we conduct six studies considering the mentioned challenges to automate the architectural change information extraction and document generation that could potentially assist the development and maintenance teams. In particular, (1) we detect architectural changes using lightweight techniques leveraging textual and codebase properties, (2) categorize them considering intelligent perspectives, and (3) generate design change documents by exploiting precise contexts of componentsâ relations and change purposes which were previously unexplored. Our experiment using 4000+ architectural change samples and 200+ design change documents suggests that our proposed approaches are promising in accuracy and scalability to deploy frequently. Our proposed change detection approach can detect up to 100% of the architectural change instances (and is very scalable). On the other hand, our proposed change classifierâs F1 score is 70%, which is promising given the challenges. Finally, our proposed system can produce descriptive design change artifacts with 75% significance. Since most of our studies are foundational, our approaches and prepared datasets can be used as baselines for advancing research in design change information extraction and documentation
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem towards ensuring real world deployment of RL systems. However, several challenges limit the applicability of RL to large scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization and lack of guarantees about important properties like performance, generalization and robustness in potentially unseen scenarios.
This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges RL. The proposed methods cover a wide range of RL settings (single and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems: e.g. tensorization of the Bellman equation which allows exponential sample efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS(Chapter 3), combinatorial generalization results in cooperative MAS(Chapter 5), generalization results on observation shifts(Chapter 7), learning deterministic policies in a probabilistic RL framework(Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we also shed light on generalization aspects of the agents under different frameworks. These properties have been been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large scale, real world applications
Using machine learning to predict pathogenicity of genomic variants throughout the human genome
GeschĂ€tzt mehr als 6.000 Erkrankungen werden durch VerĂ€nderungen im Genom verursacht. Ursachen gibt es viele: Eine genomische Variante kann die Translation eines Proteins stoppen, die Genregulation stören oder das SpleiĂen der mRNA in eine andere Isoform begĂŒnstigen. All diese Prozesse mĂŒssen ĂŒberprĂŒft werden, um die zum beschriebenen PhĂ€notyp passende Variante zu ermitteln. Eine Automatisierung dieses Prozesses sind Varianteneffektmodelle. Mittels maschinellem Lernen und Annotationen aus verschiedenen Quellen bewerten diese Modelle genomische Varianten hinsichtlich ihrer PathogenitĂ€t.
Die Entwicklung eines Varianteneffektmodells erfordert eine Reihe von Schritten: Annotation der Trainingsdaten, Auswahl von Features, Training verschiedener Modelle und Selektion eines Modells. Hier prĂ€sentiere ich ein allgemeines Workflow dieses Prozesses. Dieses ermöglicht es den Prozess zu konfigurieren, Modellmerkmale zu bearbeiten, und verschiedene Annotationen zu testen. Der Workflow umfasst auĂerdem die Optimierung von Hyperparametern, Validierung und letztlich die Anwendung des Modells durch genomweites Berechnen von Varianten-Scores.
Der Workflow wird in der Entwicklung von Combined Annotation Dependent Depletion (CADD), einem Varianteneffektmodell zur genomweiten Bewertung von SNVs und InDels, verwendet. Durch Etablierung des ersten Varianteneffektmodells fĂŒr das humane Referenzgenome GRCh38 demonstriere ich die gewonnenen Möglichkeiten Annotationen aufzugreifen und neue Modelle zu trainieren. AuĂerdem zeige ich, wie Deep-Learning-Scores als Feature in einem CADD-Modell die Vorhersage von RNA-SpleiĂing verbessern. AuĂerdem werden Varianteneffektmodelle aufgrund eines neuen, auf AllelhĂ€ufigkeit basierten, Trainingsdatensatz entwickelt.
Diese Ergebnisse zeigen, dass der entwickelte Workflow eine skalierbare und flexible Möglichkeit ist, um Varianteneffektmodelle zu entwickeln. Alle entstandenen Scores sind unter cadd.gs.washington.edu und cadd.bihealth.org frei verfĂŒgbar.More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many possible ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. It is necessary to investigate all of these processes in order to evaluate which variant may be causal for the deleterious phenotype. A great help in this regard are variant effect scores. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity.
Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation and ultimately deployment of a selected model via genome-wide scoring of genomic variants.
The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that is scoring SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep-neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data that is based on variants selected by allele frequency.
In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org
Fuzzy Natural Logic in IFSA-EUSFLAT 2021
The present book contains five papers accepted and published in the Special Issue, âFuzzy Natural Logic in IFSA-EUSFLAT 2021â, of the journal Mathematics (MDPI). These papers are extended versions of the contributions presented in the conference âThe 19th World Congress of the International Fuzzy Systems Association and the 12th Conference of the European Society for Fuzzy Logic and Technology jointly with the AGOP, IJCRS, and FQAS conferencesâ, which took place in Bratislava (Slovakia) from September 19 to September 24, 2021. Fuzzy Natural Logic (FNL) is a system of mathematical fuzzy logic theories that enables us to model natural language terms and rules while accounting for their inherent vagueness and allows us to reason and argue using the tools developed in them. FNL includes, among others, the theory of evaluative linguistic expressions (e.g., small, very large, etc.), the theory of fuzzy and intermediate quantifiers (e.g., most, few, many, etc.), and the theory of fuzzy/linguistic IFâTHEN rules and logical inference. The papers in this Special Issue use the various aspects and concepts of FNL mentioned above and apply them to a wide range of problems both theoretically and practically oriented. This book will be of interest for researchers working in the areas of fuzzy logic, applied linguistics, generalized quantifiers, and their applications
Seamless Multimodal Biometrics for Continuous Personalised Wellbeing Monitoring
Artificially intelligent perception is increasingly present in the lives of
every one of us. Vehicles are no exception, (...) In the near future, pattern
recognition will have an even stronger role in vehicles, as self-driving cars
will require automated ways to understand what is happening around (and within)
them and act accordingly. (...) This doctoral work focused on advancing
in-vehicle sensing through the research of novel computer vision and pattern
recognition methodologies for both biometrics and wellbeing monitoring. The
main focus has been on electrocardiogram (ECG) biometrics, a trait well-known
for its potential for seamless driver monitoring. Major efforts were devoted to
achieving improved performance in identification and identity verification in
off-the-person scenarios, well-known for increased noise and variability. Here,
end-to-end deep learning ECG biometric solutions were proposed and important
topics were addressed such as cross-database and long-term performance,
waveform relevance through explainability, and interlead conversion. Face
biometrics, a natural complement to the ECG in seamless unconstrained
scenarios, was also studied in this work. The open challenges of masked face
recognition and interpretability in biometrics were tackled in an effort to
evolve towards algorithms that are more transparent, trustworthy, and robust to
significant occlusions. Within the topic of wellbeing monitoring, improved
solutions to multimodal emotion recognition in groups of people and
activity/violence recognition in in-vehicle scenarios were proposed. At last,
we also proposed a novel way to learn template security within end-to-end
models, dismissing additional separate encryption processes, and a
self-supervised learning approach tailored to sequential data, in order to
ensure data security and optimal performance. (...)Comment: Doctoral thesis presented and approved on the 21st of December 2022
to the University of Port
Systemic Circular Economy Solutions for Fiber Reinforced Composites
This open access book provides an overview of the work undertaken within the FiberEUse project, which developed solutions enhancing the profitability of composite recycling and reuse in value-added products, with a cross-sectorial approach. Glass and carbon fiber reinforced polymers, or composites, are increasingly used as structural materials in many manufacturing sectors like transport, constructions and energy due to their better lightweight and corrosion resistance compared to metals. However, composite recycling is still a challenge since no significant added value in the recycling and reprocessing of composites is demonstrated. FiberEUse developed innovative solutions and business models towards sustainable Circular Economy solutions for post-use composite-made products. Three strategies are presented, namely mechanical recycling of short fibers, thermal recycling of long fibers and modular car parts design for sustainable disassembly and remanufacturing. The validation of the FiberEUse approach within eight industrial demonstrators shows the potentials towards new Circular Economy value-chains for composite materials
- âŠ