8,164 research outputs found
Using machine learning to predict pathogenicity of genomic variants throughout the human genome
More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many possible ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. It is necessary to investigate all of these processes in order to evaluate which variant may be causal for the deleterious phenotype. A great help in this regard are variant effect scores. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity.
Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation and ultimately deployment of a selected model via genome-wide scoring of genomic variants.
The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency.
In conclusion, the developed workflow provides a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
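As a rough illustration of the workflow stages described above (annotation, training, hyperparameter optimization, validation), the sketch below trains and validates a simple variant classifier. File names, column names, and the choice of classifier are assumptions for illustration; this is not the actual CADD pipeline.

```python
# Minimal sketch of a variant effect model training loop, loosely following
# the workflow stages above. All file and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# 1. Load annotated training variants (proxy-pathogenic vs. proxy-benign).
variants = pd.read_csv("annotated_variants.tsv", sep="\t")
X = variants.drop(columns=["label"])
y = variants["label"]

# 2. Hold out a validation set for benchmarking the selected model.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# 3. Hyperparameter optimization over a simple linear classifier.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 1.0, 10.0]},
    n_iter=4, scoring="roc_auc", cv=5,
)
search.fit(X_train, y_train)

# 4. Validate the selected model before genome-wide scoring.
print("validation AUC:", roc_auc_score(y_val, search.predict_proba(X_val)[:, 1]))
```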
Online Self-Supervised Thermal Water Segmentation for Aerial Vehicles
We present a new method to adapt an RGB-trained water segmentation network to
target-domain aerial thermal imagery using online self-supervision by
leveraging texture and motion cues as supervisory signals. This new thermal
capability enables current autonomous aerial robots operating in near-shore
environments to perform tasks such as visual navigation, bathymetry, and flow
tracking at night. Our method overcomes the problem of scarce and
difficult-to-obtain near-shore thermal data that prevents the application of
conventional supervised and unsupervised methods. In this work, we curate the
first aerial thermal near-shore dataset, show that our approach outperforms
fully-supervised segmentation models trained on limited target-domain thermal
data, and demonstrate real-time capabilities onboard an Nvidia Jetson embedded
computing platform. Code and datasets used in this work will be available at:
https://github.com/connorlee77/uav-thermal-water-segmentation
Comment: 8 pages, 4 figures, 3 tables
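To make the idea of online self-supervision concrete, the sketch below derives pseudo-labels from crude texture and motion heuristics and uses them to fine-tune a pretrained segmentation network on incoming thermal frames. The specific cues, thresholds, and loss are assumptions for illustration; the authors' actual supervisory signals may differ.

```python
# Hedged sketch of online self-supervised adaptation: pseudo-labels come from
# simple texture/motion heuristics, not from ground-truth annotations.
import torch
import torch.nn.functional as F

def texture_motion_pseudolabels(frame, prev_frame, k=5):
    """Label low-texture, high-motion regions as water (a crude heuristic).

    frame, prev_frame: thermal images of shape (N, 1, H, W).
    """
    # Local variance as a texture measure (water tends to be smooth in thermal).
    mean = F.avg_pool2d(frame, k, stride=1, padding=k // 2)
    texture = F.avg_pool2d(frame ** 2, k, stride=1, padding=k // 2) - mean ** 2
    motion = (frame - prev_frame).abs()  # frame differencing as a motion proxy
    water = (texture < texture.median()) & (motion > motion.median())
    return water.float()

def online_adaptation_step(model, optimizer, frame, prev_frame):
    """One self-supervised update on the current thermal frame."""
    target = texture_motion_pseudolabels(frame, prev_frame)
    logits = model(frame)  # assumed to output a 1-channel water logit map
    loss = F.binary_cross_entropy_with_logits(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```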
Sourcing high tissue quality brains from deceased wild primates with known socio-ecology
The selection pressures that drove dramatic encephalisation processes through the mammal lineage remain elusive, as does knowledge of brain structure reorganisation through this process. In particular, considerable structural brain changes are present across the primate lineage, culminating in the complex human brain that allows for unique behaviours such as language and sophisticated tool use. To understand this evolution, a diverse sample set of humans' closest relatives with varying socio-ecologies is needed. However, current brain banks predominantly curate brains from primates that died in zoological gardens. We address this gap by establishing a field pipeline that mitigates the challenges associated with brain extractions of wild primates in their natural habitat. The success of our approach is demonstrated by our ability to acquire a novel brain sample of deceased primates with highly variable socio-ecological exposure and a particular focus on wild chimpanzees. Methods for acquiring brain tissue in wild settings are explained comprehensively, highlighting the feasibility of conducting brain extraction procedures under strict biosafety measures by trained veterinarians at field sites. Brains are assessed at a fine-structural level via high-resolution MRI and state-of-the-art histology. Analyses confirm that primate brains sourced in the field can achieve excellent tissue quality, comparable to that of brains acquired from zoo-living primates. Our field methods are noninvasive, here defined as not harming living animals, and may be applied to mammal systems other than primates. In sum, the field protocol and methodological pipeline validated here represent a major advance for assessing the influence of socio-ecology on medium to large mammal brains, at both macro- and microstructural levels, as well as aiding the functional annotation of brain regions and neuronal pathways via specific behaviour assessments.
Colour technologies for content production and distribution of broadcast content
Faithful colour reproduction has long been a priority driving the development of new colour imaging systems that maximise human perceptual plausibility. This thesis explores machine learning algorithms for colour processing to assist both content production and distribution. First, this research studies colourisation technologies with practical use cases in the restoration and processing of archived content. The research targets practical, deployable solutions, developing a cost-effective pipeline that integrates the activity of the producer into the processing workflow. In particular, a fully automatic image colourisation paradigm using Conditional GANs is proposed to improve content generalisation and colourfulness over existing baselines. Moreover, a more conservative solution is considered by providing references to guide the system towards more accurate colour predictions. A fast end-to-end architecture is proposed to improve existing exemplar-based image colourisation methods while decreasing complexity and runtime. Finally, the proposed image-based methods are integrated into a video colourisation pipeline. A general framework is proposed to reduce temporal flickering and the propagation of errors when such methods are applied frame-to-frame. The proposed model is jointly trained to stabilise the input video and to cluster its frames with the aim of learning scene-specific modes. Second, this research explores colour processing technologies for content distribution, with the aim of effectively delivering the processed content to a broad audience. In particular, video compression is tackled by introducing a novel methodology for chroma intra prediction based on attention models. Although the proposed architecture helped to gain control over the reference samples and better understand the prediction process, the complexity of the underlying neural network significantly increased the encoding and decoding time. Therefore, aiming at efficient deployment within the latest video coding standards, this work also focused on simplifying the proposed architecture to obtain a more compact and explainable model.
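As a concrete reference point for the conditional-GAN colourisation paradigm mentioned above, the following is a generic pix2pix-style training step in which a generator maps a grayscale input to colour channels and a discriminator judges (input, colour) pairs. The architectures and loss weighting are placeholders, not the models proposed in the thesis.

```python
# Generic conditional-GAN colourisation training step (pix2pix-style sketch).
# G and D are placeholder networks; gray is the luminance input, color the
# ground-truth colour channels.
import torch
import torch.nn as nn

def train_step(G, D, opt_G, opt_D, gray, color, l1_weight=100.0):
    bce = nn.BCEWithLogitsLoss()
    l1 = nn.L1Loss()

    # Discriminator update: real (input, colour) pairs vs. generated pairs.
    fake = G(gray)
    d_real = D(torch.cat([gray, color], dim=1))
    d_fake = D(torch.cat([gray, fake.detach()], dim=1))
    loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator update: fool the discriminator while staying close to the
    # ground-truth colours via an L1 penalty.
    d_fake = D(torch.cat([gray, fake], dim=1))
    loss_G = bce(d_fake, torch.ones_like(d_fake)) + l1_weight * l1(fake, color)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```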
Instance-based Learning with Prototype Reduction for Real-Time Proportional Myocontrol: A Randomized User Study Demonstrating Accuracy-preserving Data Reduction for Prosthetic Embedded Systems
This work presents the design, implementation and validation of learning
techniques based on the kNN scheme for gesture detection in prosthetic control.
To cope with high computational demands in instance-based prediction, methods
of dataset reduction are evaluated considering real-time determinism to allow
for the reliable integration into battery-powered portable devices. The
influence of parameterization and varying proportionality schemes is analyzed,
utilizing an eight-channel-sEMG armband. Besides offline cross-validation
accuracy, success rates in real-time pilot experiments (online target
achievement tests) are determined. Based on the assessment of specific dataset
reduction techniques' adequacy for embedded control applications regarding
accuracy and timing behaviour, Decision Surface Mapping (DSM) proves promising
when applying kNN to the reduced set. A randomized, double-blind user
study was conducted to evaluate the respective methods (kNN and kNN with
DSM-reduction) against Ridge Regression (RR) and RR with Random Fourier
Features (RR-RFF). The kNN-based methods performed significantly better
(p<0.0005) than the regression techniques. Between DSM-kNN and kNN, there was
no statistically significant difference (significance level 0.05). This is
remarkable in consideration of only one sample per class in the reduced set,
thus yielding a reduction rate of over 99% while preserving success rate. The
same behaviour could be confirmed in an extended user study. With k=1, which
turned out to be an excellent choice, the runtime complexity of both kNN (in
every prediction step) and DSM-kNN (in the training phase) becomes linear in
the number of original samples, favouring dependable wearable prosthesis
applications.
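To illustrate the runtime benefit of reducing the training set to one prototype per class, consider the sketch below. For simplicity it uses class centroids as prototypes; Decision Surface Mapping selects and adjusts prototypes differently, so this shows only the complexity argument, not DSM itself.

```python
# Sketch of 1-NN gesture prediction over a reduced prototype set. Class
# centroids stand in for DSM prototypes purely for illustration.
import numpy as np

def reduce_to_prototypes(X, y):
    """One prototype per class (here: the class mean of sEMG feature vectors)."""
    classes = np.unique(y)
    prototypes = np.array([X[y == c].mean(axis=0) for c in classes])
    return prototypes, classes

def predict_1nn(prototypes, classes, x):
    """O(#classes) per prediction instead of O(#training samples)."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    return classes[np.argmin(dists)]
```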
An Efficient Big Data Visualization Deep Learning Architecture Model for Path Selection of College Students through Moral Education
Visualization technology can present analysis results in a more intuitive and easy-to-understand way, helping educators better understand the moral education needs of college students and adjust their teaching strategies accordingly. The combination of big data analysis and visualization technology can also help to improve the efficiency and effectiveness of moral education in colleges and universities. Research on moral education path selection for college students based on big data visualization is therefore significant for promoting the development of moral education in colleges and universities and for cultivating high-quality talent with good moral character. This paper proposes an optimization model for big data analytics in moral education. Data associated with moral education are stored in the cloud as big data. Visualization of the stored big data is performed with the optimization model for feature extraction. The optimization integrates the Flamingo and weighted Black Widow optimization algorithms, and the proposed model is termed the Integrated Flamingo Black Widow (IFBW) model. The IFBW model is implemented with a deep learning Restricted Boltzmann Machine (RBM) architecture. Simulation analysis shows that the IFBW model achieves a classification accuracy of 99% with a minimal error rate.
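For readers unfamiliar with the RBM stage, the sketch below shows a generic RBM-based feature extraction pipeline feeding a classifier. It deliberately omits the paper's IFBW metaheuristic, and all hyperparameters are illustrative assumptions.

```python
# Generic sketch of RBM feature extraction followed by classification; this
# illustrates the Restricted Boltzmann Machine stage only, not the IFBW
# optimization described in the paper.
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

model = Pipeline([
    ("scale", MinMaxScaler()),  # BernoulliRBM expects inputs in [0, 1]
    ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Usage: model.fit(X_train, y_train); model.score(X_test, y_test)
```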
Balancing between the Local and Global Structures (LGS) in Graph Embedding
We present a method for balancing between the Local and Global Structures
(LGS) in graph embedding, via a tunable parameter. Some embedding methods aim
to capture global structures, while others attempt to preserve local
neighborhoods. Few methods attempt to do both, and it is not always possible
to capture both local and global information well in two dimensions, which is
where most graph drawings live. The choice of using a local or a global
embedding for visualization depends not only on the task but also on the
structure of the underlying data, which may not be known in advance. For a
given graph, LGS aims to find a good balance between the local and global
structure to preserve. We evaluate the performance of LGS with synthetic and
real-world datasets and our results indicate that it is competitive with the
state-of-the-art methods, using established quality metrics such as stress and
neighborhood preservation. We introduce a novel quality metric, cluster
distance preservation, to assess intermediate structure capture. All
source-code, datasets, experiments and analysis are available online.
Comment: Appears in the Proceedings of the 31st International Symposium on
Graph Drawing and Network Visualization (GD 2023)
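One generic way to realise such a tunable local/global balance in a stress-based embedding is to weight each node pair by a power of its graph-theoretic distance, as sketched below; a large alpha emphasises short (local) distances, while alpha = 0 weighs all pairs equally (global). This is an illustrative formulation, not necessarily the one used by LGS.

```python
# Sketch of a locality-tunable stress measure for a 2D graph layout:
# weight each pair (i, j) by D[i, j] ** (-alpha). This only illustrates the
# idea of a tunable local/global balance.
import numpy as np

def weighted_stress(positions, D, alpha):
    """Stress of a 2D layout; D holds positive graph-theoretic distances."""
    n = len(D)
    stress = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_hat = np.linalg.norm(positions[i] - positions[j])
            stress += D[i, j] ** (-alpha) * (d_hat - D[i, j]) ** 2
    return stress
```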
- …