Computationally Linking Chemical Exposure to Molecular Effects with Complex Data: Comparing Methods to Disentangle Chemical Drivers in Environmental Mixtures and Knowledge-based Deep Learning for Predictions in Environmental Toxicology
Chemical exposures affect the environment and may lead to adverse outcomes in its organisms. Omics-based approaches, like standardised microarray experiments, have expanded the toolbox to monitor the distribution of chemicals and assess the risk to organisms in the environment. The resulting complex data have extended the scope of toxicological knowledge bases and published literature. A plethora of computational approaches have been applied in environmental toxicology considering systems biology and data integration. Still, the complexity of environmental and biological systems reflected in these data makes it challenging to investigate exposure-related effects. This thesis aimed at computationally linking chemical exposure to biological effects on the molecular level, considering sources of complex environmental data.
The first study employed data from an omics-based exposure study considering mixture effects in a freshwater environment. We compared three data-driven analyses with respect to their suitability for disentangling the effects of chemical mixtures on biological systems and their reliability in attributing potentially adverse outcomes to chemical drivers, using toxicological databases on the gene and pathway levels. Differential gene expression analysis and a network inference approach resulted in toxicologically meaningful outcomes and uncovered individual chemical effects, both stand-alone and in combination. We developed an integrative computational strategy to harvest exposure-related gene associations from environmental samples considering mixtures of compounds at low concentrations. The applied approaches allowed assessing the hazard of chemicals more systematically with correlation-based compound groups.
This dissertation presents another achievement toward data-driven hypothesis generation for molecular exposure effects. The approach combined text mining and deep learning. The study was entirely data-driven and involved state-of-the-art computational methods of artificial intelligence. We employed literature-based relational data and curated toxicological knowledge to predict chemical-biomolecule interactions. A word embedding neural network with a subsequent feed-forward network was implemented. Data augmentation and recurrent neural networks were beneficial for training with curated toxicological knowledge. The trained models reached accuracies of up to 94% on unseen test data of the employed knowledge base.
However, we could not reliably confirm known chemical-gene interactions across selected data sources. Still, the predictive models might derive unknown information from toxicological knowledge sources, like literature, databases or omics-based exposure studies. Thus, the deep learning models might allow predicting hypotheses of exposure-related molecular effects.
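The combination of a word embedding layer with a subsequent feed-forward network, as described above, can be illustrated with a minimal sketch. This is not the thesis's implementation: the token names, the dimensions, the mean pooling of token vectors, and the random untrained weights are all assumptions chosen only to show how data flows from tokens to an interaction score.

```python
import math
import random

def make_embeddings(vocab, dim, seed=0):
    """Assign each token a random vector (a stand-in for trained embeddings)."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for tok in vocab}

def mean_pool(vectors):
    """Average token vectors into one fixed-length input vector (assumed pooling)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def dense(x, weights, bias):
    """Affine layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def predict(tokens, emb, w1, b1, w2, b2):
    """Embed tokens, pool them, apply one ReLU hidden layer and a sigmoid output."""
    x = mean_pool([emb[t] for t in tokens])
    hidden = [max(0.0, z) for z in dense(x, w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical toy vocabulary and untrained weights, for illustration only.
vocab = ["chemical_a", "gene_x", "increases", "expression"]
dim, hidden_units = 4, 3
emb = make_embeddings(vocab, dim)
rng = random.Random(1)
w1 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden_units)]
b1 = [0.0] * hidden_units
w2 = [rng.uniform(-1, 1) for _ in range(hidden_units)]
b2 = 0.0

score = predict(["chemical_a", "increases", "gene_x"], emb, w1, b1, w2, b2)
print(f"untrained interaction score: {score:.3f}")
```

Because the weights are random, the printed score carries no meaning; in a trained model the embeddings and layer weights would be fitted to labelled chemical-biomolecule relations.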
Both achievements of this dissertation might support the prioritisation of chemicals for testing and an intelligent selection of chemicals for monitoring in future exposure studies.
Table of Contents
Abstract
Acknowledgements
Prelude
1 Introduction
1.1 An overview of environmental toxicology
1.1.1 Environmental toxicology
1.1.2 Chemicals in the environment
1.1.3 Systems biological perspectives in environmental toxicology
1.2 Computational toxicology
1.2.1 Omics-based approaches
1.2.2 Linking chemical exposure to transcriptional effects
1.2.3 Up-scaling from the gene level to higher biological organisation levels
1.2.4 Biomedical literature-based discovery
1.2.5 Deep learning with knowledge representation
1.3 Research question and approaches
2 Methods and Data
2.1 Linking environmental relevant mixture exposures to transcriptional effects
2.1.1 Exposure and microarray data
2.1.2 Preprocessing
2.1.3 Differential gene expression
2.1.4 Association rule mining
2.1.5 Weighted gene correlation network analysis
2.1.6 Method comparison
2.2 Predicting exposure-related effects on a molecular level
2.2.1 Input
2.2.2 Input preparation
2.2.3 Deep learning models
2.2.4 Toxicogenomic application
3 Method comparison to link complex stream water exposures to effects on the transcriptional level
3.1 Background and motivation
3.1.1 Workflow
3.2 Results
3.2.1 Data preprocessing
3.2.2 Differential gene expression analysis
3.2.3 Association rule mining
3.2.4 Network inference
3.2.5 Method comparison
3.2.6 Application case of method integration
3.3 Discussion
3.4 Conclusion
4 Deep learning prediction of chemical-biomolecule interactions
4.1 Motivation
4.1.1 Workflow
4.2 Results
4.2.1 Input preparation
4.2.2 Model selection
4.2.3 Model comparison
4.2.4 Toxicogenomic application
4.2.5 Horizontal augmentation without tail-padding
4.2.6 Four-class problem formulation
4.2.7 Training with CTD data
4.3 Discussion
4.3.1 Transferring biomedical knowledge towards toxicology
4.3.2 Deep learning with biomedical knowledge representation
4.3.3 Data integration
4.4 Conclusion
5 Conclusion and Future perspectives
5.1 Conclusion
5.1.1 Investigating complex mixtures in the environment
5.1.2 Complex knowledge from literature and curated databases predict chemical-biomolecule interactions
5.1.3 Linking chemical exposure to biological effects by integrating CTD
5.2 Future perspectives
S1 Supplement Chapter 1
S1.1 Example of an estrogen bioassay
S1.2 Types of mode of action
S1.3 The dogma of molecular biology
S1.4 Transcriptomics
S2 Supplement Chapter 3
S3 Supplement Chapter 4
S3.1 Hyperparameter tuning results
S3.2 Functional enrichment with predicted chemical-gene interactions and CTD reference pathway genesets
S3.3 Reduction of learning rate in a model with large word embedding vectors
S3.4 Horizontal augmentation without tail-padding
S3.5 Four-relationship classification
S3.6 Interpreting loss observations for SemMedDB trained models
List of Abbreviations
List of Figures
List of Tables
Bibliography
Curriculum scientiae
Selbständigkeitserklärung
WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM
Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. Also, the experiments show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25%, and 95.09% across those same environments.
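The attention step in an attention-based BiLSTM can be sketched as a weighted pooling over per-time-step hidden states. The abstract does not specify the attention formulation, so the dot-product scoring against a single query vector used here is an assumption, and the hidden states are hand-written placeholders standing in for BiLSTM outputs over a CSI time series.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Score each time step against the query, then return the weighted sum.

    hidden_states: list of equal-length vectors, one per time step.
    query: vector of the same length (learned in a real model; fixed here).
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)]
    return context, weights

# Placeholder hidden states for three time steps (hypothetical values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]

context, weights = attention_pool(hidden_states, query)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector:", [round(c, 3) for c in context])
```

The third time step aligns best with the query and therefore receives the largest weight; the resulting context vector is what a classifier layer would consume in place of the final hidden state alone.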
Artificial Intelligence in Oncology Drug Discovery and Development
There exists a profound conflict at the heart of oncology drug development. The efficiency of the drug development process is falling, leading to higher costs per approved drug, while at the same time personalised medicine is limiting the target market of each new medicine. Even as the global economic burden of cancer increases, the current paradigm in drug development is unsustainable. In this book, we discuss the development of techniques in machine learning for improving the efficiency of oncology drug development and delivering cost-effective precision treatment. We consider how to structure data for drug repurposing and target identification, how to improve clinical trials and how patients may view artificial intelligence.
Antennas and Electromagnetics Research via Natural Language Processing.
Advanced techniques for performing natural language processing (NLP) are being utilised to devise a pioneering methodology for collecting and analysing data derived from scientific literature. Despite significant advancements in automated database generation and analysis within the domains of material chemistry and physics, the implementation of NLP techniques in the realms of metamaterial discovery, antenna design, and wireless communications remains at an early stage. This thesis proposes several novel approaches to advance research in material science. Firstly, an NLP method has been developed to automatically extract keywords from large-scale unstructured texts in the area of metamaterial research. This enables the uncovering of trends and relationships between keywords, facilitating the establishment of future research directions. Additionally, a trained neural network model based on the encoder-decoder Long Short-Term Memory (LSTM) architecture has been developed to predict future research directions and provide insights into the influence of metamaterials research. This model lays the groundwork for developing a research roadmap of metamaterials. Furthermore, a novel weighting system has been designed to evaluate article attributes in antenna and propagation research, enabling more accurate assessments of the impact of each scientific publication. This approach goes beyond conventional numeric metrics to produce more meaningful predictions. Secondly, a framework has been proposed to leverage text summarisation, one of the primary NLP tasks, to enhance the quality of scientific reviews. It has been applied to review recent developments in antennas and propagation for body-centric wireless communications, and the validation has been made available for comparison with well-referenced datasets for text summarisation. Lastly, the effectiveness of automated database building in the domain of tunable materials and their properties has been presented.
The collected database will be used as an input for training a surrogate machine learning model in an iterative active learning cycle. This model will be utilised to facilitate high-throughput material processing, with the ultimate goal of discovering novel materials exhibiting high tunability. The approaches proposed in this thesis will help to accelerate the discovery of new materials and enhance their applications in antennas, which has the potential to transform electromagnetic material research.
The Palgrave Handbook of Digital Russia Studies
This open access handbook presents a multidisciplinary and multifaceted perspective on how the ‘digital’ is simultaneously changing Russia and the research methods scholars use to study Russia. It provides a critical update on how Russian society, politics, economy, and culture are reconfigured in the context of ubiquitous connectivity and accounts for the political and societal responses to digitalization. In addition, it answers practical and methodological questions in handling Russian data and a wide array of digital methods. The volume makes a timely intervention in our understanding of the changing field of Russian Studies and is an essential guide for scholars, advanced undergraduate and graduate students studying Russia today.
Digital phenotyping through multimodal, unobtrusive sensing
The growing adoption of multimodal wearable and mobile devices, such as smartphones and wrist-worn watches, has generated an increase in the collection of physiological and behavioural data at scale. For the first time, this digital phenotyping data enables researchers to make inferences regarding users’ physical and mental health across large populations. However, translating this data into actionable insights requires computational approaches that turn unlabelled, multimodal time-series sensor data into validated measures that can be interpreted at scale.
This thesis describes the derivation of novel computational methods that leverage digital phenotyping data from wearable devices in large-scale populations to infer physical behaviours. These methods combine insights from signal processing, data mining and machine learning alongside domain knowledge in physical activity and sleep epidemiology. First, the inference of sleeping windows in free-living conditions through a heart rate sensing approach is explored. This algorithm is particularly valuable in the absence of ground truth or sleep diaries given its simplicity, adaptability and capacity for personalization. I then explore multistage sleep classification through combined movement and cardiac wearable sensing and machine learning. Further, I demonstrate that postural changes detected through wrist accelerometers can inform habitual behaviours and are valuable complements to traditional, intensity-based physical activity metrics. I then leverage the concomitant responses of heart rate to physical activity, as captured by multimodal wearable sensors, through a self-supervised training task. The resulting embeddings from this task are shown to be useful for the downstream classification of demographic factors, BMI, energy expenditure and cardiorespiratory fitness. Finally, I describe a deep learning model for the adaptive inference of cardiorespiratory fitness (VO2max) using wearable data in free-living conditions. I demonstrate the robustness of the model in a large UK population and show the model’s adaptability by evaluating its performance in a subset of the population with repeated measures ~6 years after the original recordings.
Together, this work increases the potential of multimodal wearable and mobile sensors for physical activity and behavioural inferences in population studies. In particular, this thesis showcases the potential of using wearable devices to make valuable physical activity, sleep and fitness inferences in large cohort studies. Given the nature of the data collected and the fact that most of this data is currently generated by commercial providers and not research institutes, laying the foundations for responsible data governance and ethical use of these technologies will be critical to building trust and enabling the development of the field of digital phenotyping.
I was funded by GlaxoSmithKline and the Engineering and Physical Sciences Research Council. I was also supported by the Alan Turing Institute through their Enrichment Scheme.