Meta-learning algorithms and applications
Meta-learning, in the broad sense, concerns how an agent learns about its own learning, allowing it to improve its learning process. Learning how to learn is not only beneficial for humans; it has also shown substantial benefits for how machines learn. In the context of machine learning, meta-learning enables models to improve their learning process by selecting suitable meta-parameters that influence learning. For deep learning specifically, the meta-parameters typically describe details of the model's training, but they can also include a description of the model itself: its architecture. Meta-learning is usually done with specific goals in mind, for example improving the ability to generalize or to learn new concepts from only a few examples.
Meta-learning can be powerful, but it comes with a key downside: it is often computationally costly. If these costs were alleviated, meta-learning would be more accessible to developers of new artificial intelligence models, allowing them to achieve greater goals or save resources. As a result, one key focus of our research is on significantly improving the efficiency of meta-learning. We develop two approaches, EvoGrad and PASHA, each of which substantially improves meta-learning efficiency in a common scenario. EvoGrad allows us to efficiently optimize the values of a large number of differentiable meta-parameters, while PASHA enables us to efficiently optimize meta-parameters of any type, provided there are fewer of them.
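To make the efficiency claim concrete, below is a minimal sketch of the evolutionary hypergradient idea behind EvoGrad as we understand it: instead of backpropagating through an inner-loop update (which needs expensive second-order derivatives), a few randomly perturbed copies of the model weights are scored on the training loss, which depends differentiably on the meta-parameters, and their softmax-weighted combination is evaluated on validation data. All names are our own and the flat-parameter view is a simplification, not the authors' implementation:

```python
import torch

def evograd_step(model_params, meta_params, train_loss_fn, val_loss_fn,
                 k=2, sigma=1e-3, meta_lr=1e-2):
    """One sketched EvoGrad-style hypergradient step; meta_params must
    be a tensor with requires_grad=True."""
    # Sample k perturbed copies of the model weights (no inner-loop SGD).
    perturbed = [model_params + sigma * torch.randn_like(model_params)
                 for _ in range(k)]
    # Training losses of the perturbed copies; these depend differentiably
    # on meta_params (e.g. a learned regularizer), so gradients can flow.
    losses = torch.stack([train_loss_fn(w, meta_params) for w in perturbed])
    weights = torch.softmax(-losses, dim=0)
    # The softmax-weighted combination acts as a cheap one-step "update".
    updated = sum(w_i * w for w_i, w in zip(weights, perturbed))
    # Hypergradient: validation loss of the combined weights w.r.t. meta_params.
    meta_grad, = torch.autograd.grad(val_loss_fn(updated), meta_params)
    return (meta_params - meta_lr * meta_grad).detach().requires_grad_(True)
```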
Meta-learning is a tool that can be applied to solve various problems. Most commonly it is applied to learning new concepts from only a small number of examples (few-shot learning), but other applications exist too. To showcase the practical impact that meta-learning can make in the context of neural networks, we use meta-learning as a novel solution to two selected problems: more accurate uncertainty quantification (calibration) and general-purpose few-shot learning. Both are practically important problems, and meta-learning lets us obtain better solutions than existing approaches. Calibration is important for safety-critical applications of neural networks, while general-purpose few-shot learning tests a model's ability to generalize few-shot learning across diverse tasks such as recognition, segmentation, and keypoint estimation.
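For context, calibration is commonly quantified with the expected calibration error (ECE), which bins predictions by confidence and averages the gap between confidence and accuracy. A minimal NumPy sketch of this standard metric (the metric only, not the thesis's meta-learning solution):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: confidence-weighted average gap between predicted
    confidence and empirical accuracy within each confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the bin's share of samples
    return ece
```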
More efficient algorithms as well as novel applications enable the field of meta-learning to make a more significant impact on the broader area of deep learning, and potentially to solve problems that were too challenging before. Ultimately, both allow us to better utilize the opportunities that artificial intelligence presents.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Understanding and Adapting Tree Ensembles: A Training Data Perspective
Despite the impressive success of deep-learning models on unstructured data (e.g., images, audio, text), tree-based ensembles such as random forests and gradient-boosted trees remain hugely popular and the preferred choice for tabular or structured data, and they are regularly used to win challenges on data-competition websites such as Kaggle and DrivenData. Despite their strong predictive performance, however, tree-based ensembles lack certain characteristics that may limit their further adoption, especially in safety-critical or privacy-sensitive domains such as weather forecasting or predictive medical modeling.
This dissertation investigates the shortcomings currently facing tree-based ensembles (a lack of explainable predictions, limited uncertainty estimation, and inefficient adaptability to changes in the training data) and posits that numerous improvements to tree-based ensembles can be made by analyzing the relationships between the training data and the resulting learned model. By studying the effects of one or many training examples on tree-based ensembles, we develop solutions for these models that (1) increase their predictive explainability, (2) provide accurate uncertainty estimates for individual predictions, and (3) efficiently adapt learned models to accurately reflect updated training data.
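To make the training-data perspective concrete, the kind of question studied can be illustrated with a naive brute-force sketch: how much does removing a single training example change a random forest's prediction on a test point? The dissertation develops far more efficient machinery; this leave-one-out loop, with names of our own choosing, only illustrates the quantity of interest:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def loo_influence(X, y, x_test, n_check=20, seed=0):
    """Brute-force leave-one-out influence of the first n_check training
    examples on the predicted positive-class probability for x_test.
    X, y are NumPy arrays for a binary classification task."""
    base = RandomForestClassifier(random_state=seed).fit(X, y)
    p_base = base.predict_proba([x_test])[0, 1]
    influences = []
    for i in range(min(n_check, len(X))):
        keep = np.arange(len(X)) != i  # drop example i and retrain
        model = RandomForestClassifier(random_state=seed).fit(X[keep], y[keep])
        influences.append(p_base - model.predict_proba([x_test])[0, 1])
    return np.array(influences)  # positive: example i raised p(x_test)
```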
This dissertation includes previously published coauthored material.
Anomaly Detection in Spatio-Temporal Datasets
Supporting humans in surveillance tasks is of crucial importance given the overwhelming volume of sensor data. This work focuses on the development of data fusion methods, using the maritime domain as an example. Various anomalies are investigated, evaluated on real ship traffic data, and tested with experts. To this end, situations of interest and anomalies are modeled and evaluated using a range of machine learning methods.
Evaluation of machine learning methods for automatic tumor segmentation
The definition of target volumes and organs at risk (OARs) is a critical part of radiotherapy planning. In routine practice, this is typically done manually by clinical experts who contour the structures in medical images prior to dosimetric planning. This is a time-consuming and labor-intensive task. Moreover, manual contouring is inherently subjective, and substantial contour variability can occur, potentially impacting radiotherapy treatment and image-derived biomarkers. Automatic segmentation (auto-segmentation) of target volumes and OARs has the potential to save time and resources while reducing contouring variability. Recently, auto-segmentation of OARs using machine learning methods has been integrated into the clinical workflow by several institutions, and such tools have been made commercially available by major vendors. The use of machine learning methods for auto-segmentation of target volumes, including the gross tumor volume (GTV), is less mature at present but is the focus of extensive ongoing research.
The primary aim of this thesis was to investigate the use of machine learning methods for auto-segmentation of the GTV in medical images. Manual GTV contours constituted the ground truth in the analyses. Volumetric overlap and distance-based metrics were used to quantify auto-segmentation performance. Four different image datasets were evaluated. The first dataset, analyzed in papers I–II, consisted of positron emission tomography (PET) and contrast-enhanced computed tomography (ceCT) images of 197 patients with head and neck cancer (HNC). The ceCT images of this dataset were also included in paper IV. Two datasets were analyzed separately in paper III, namely (i) PET, ceCT, and low-dose CT (ldCT) images of 86 patients with anal cancer (AC), and (ii) PET, ceCT, ldCT, and T2- and diffusion-weighted (T2W and DW, respectively) MR images of a subset (n = 36) of the aforementioned AC patients. The last dataset consisted of ceCT images of 36 canine patients with HNC and was analyzed in paper IV.
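For reference, the most widely used volumetric overlap metric in this setting is the Sørensen–Dice coefficient; a minimal sketch for binary masks follows (the thesis's exact metric suite is defined in the individual papers):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Sorensen-Dice overlap between two binary masks; 1.0 = perfect."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0
```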
In paper I, three approaches to auto-segmentation of the GTV in patients with HNC were evaluated and compared: conventional PET thresholding, classical machine learning algorithms, and deep learning using a 2-dimensional (2D) U-Net convolutional neural network (CNN). For the latter two approaches, the effect of imaging modality on auto-segmentation performance was also assessed. Deep learning based on multimodality PET/ceCT image input resulted in superior agreement with the manual ground truth contours, as quantified by geometric overlap and distance-based performance metrics calculated on a per-patient basis. Moreover, only deep learning provided adequate performance for segmentation based solely on ceCT images. For segmentation based on PET only, all three approaches provided adequate performance, though deep learning ranked first, followed by classical machine learning and PET thresholding.
In paper II, deep learning-based auto-segmentation of the GTV in patients with HNC using a 2D U-Net architecture was evaluated more thoroughly, by introducing new structure-based performance evaluation metrics and by including qualitative expert evaluation of the resulting auto-segmentation quality. As in paper I, multimodality PET/ceCT image input provided superior segmentation performance compared to the single-modality CNN models. The structure-based metrics showed quantitatively that the PET signal was vital for the sensitivity of the CNN models: the superior PET/ceCT-based model identified 86 % of all malignant GTV structures, whereas the ceCT-based model identified only 53 %. Furthermore, the majority (~90 %) of the qualitatively evaluated auto-segmentations generated by the best PET/ceCT-based CNN were given a quality score corresponding to substantial clinical value. Based on papers I and II, deep learning with multimodality PET/ceCT image input would be the recommended approach for auto-segmentation of the GTV in human patients with HNC.
In paper III, deep learning-based auto-segmentation of the GTV in patients with AC was evaluated for the first time, using a 2D U-Net architecture. Furthermore, an extensive comparison was conducted of the impact of different single-modality and multimodality combinations of PET, ceCT, ldCT, T2W, and/or DW image input on quantitative auto-segmentation performance. For both the 86-patient and 36-patient datasets, the models based on PET/ceCT provided the highest mean overlap with the manual ground truth contours. For this task, however, comparable auto-segmentation quality was obtained for CNN models based solely on ceCT. The CNN model based solely on T2W images also obtained acceptable auto-segmentation performance and was ranked as the second-best single-modality model for the 36-patient dataset. These results indicate that deep learning could prove a versatile future tool for auto-segmentation of the GTV in patients with AC.
Paper IV investigated for the first time the applicability of deep learning-based auto-segmentation of the GTV in canine patients with HNC, using a 3-dimensional (3D) U-Net architecture and ceCT image input. A transfer learning approach, in which CNN models were pre-trained on the human HNC data and subsequently fine-tuned on canine data, was compared to training models from scratch on canine data. The two approaches resulted in similar auto-segmentation performance, on average comparable to the overlap metrics obtained for ceCT-based auto-segmentation in human HNC patients. Auto-segmentation in canine HNC patients appeared particularly promising for nasal cavity tumors, as the average overlap with manual contours was 25 % higher for this subgroup than the average across all included tumor sites.
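The transfer learning recipe compared in paper IV can be sketched generically: load weights pre-trained on the human HNC data, then fine-tune on the canine scans. The tiny network, attribute names, and checkpoint file below are stand-ins of our own, not the thesis's actual 3D U-Net:

```python
import torch
from torch import nn

class TinySegNet(nn.Module):
    """Stand-in 3D segmentation net; only the recipe is illustrated."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv3d(8, 2, 1)  # two classes: background / GTV

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySegNet()
# Start from weights pre-trained on human data (hypothetical file name),
# as opposed to training from scratch on the 36 canine scans.
model.load_state_dict(torch.load("human_hnc_pretrained.pt"))
for p in model.encoder.parameters():  # optionally freeze the encoder first
    p.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```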
In conclusion, deep learning with CNNs provided high-quality GTV auto-segmentations for all datasets included in this thesis. In all cases, the best-performing deep learning models resulted in an average overlap with manual contours comparable to the reported interobserver agreement between human experts performing manual GTV contouring for the given cancer type and imaging modality. Based on these findings, further investigation of deep learning-based auto-segmentation of the GTV in the given diagnoses is highly warranted.
Exploration and adaptation of large language models for specialized domains
Large language models have transformed the field of natural language processing (NLP). Their improved performance on various NLP benchmarks makes them a promising tool, including for application in specialized domains. Such domains are characterized by highly trained professionals with particular domain expertise. Since these experts are rare, improving the efficiency of their work with automated systems is especially desirable. However, domain-specific text resources hold various challenges for NLP systems. These challenges include distinct language, noisy and scarce data, and a high level of variation. Further, specialized domains present an increased need for transparent systems, since they are often applied in high-stakes settings. In this dissertation, we examine whether large language models (LLMs) can overcome some of these challenges and propose methods to effectively adapt them to domain-specific requirements.
We first investigate the inner workings and abilities of LLMs and show how they can fill the gaps present in previous NLP algorithms for specialized domains. To this end, we explore the sources of errors produced by earlier systems to identify which of them can be addressed by using LLMs. Following this, we take a closer look at how information is processed within Transformer-based LLMs to better understand their capabilities. We find that their layers encode different dimensions of the input text. Here, the contextual vector representations and the general language knowledge learned during pre-training are especially beneficial for solving the complex, multi-step tasks common in specialized domains.
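Layer-wise analyses of this kind typically inspect the hidden states a Transformer exposes at every layer and train probing classifiers on them. A minimal sketch using the Hugging Face transformers API (model and sentence are illustrative, not necessarily those used in the dissertation):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased",
                                  output_hidden_states=True)

enc = tok("The patient was discharged after treatment.", return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# out.hidden_states holds the embedding layer plus one tensor per layer;
# probes trained on each layer's vectors reveal what that layer encodes.
for i, h in enumerate(out.hidden_states):
    print(f"layer {i}: shape {tuple(h.shape)}")
```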
Following this exploration, we propose solutions for further adapting LLMs to the requirements of domain-specific tasks. We focus on the clinical domain, which incorporates many typical challenges found in specialized domains. We show how to improve generalization by integrating different domain-specific resources into our models. We further analyze the behavior of the produced models and propose a behavioral testing framework that can serve as a tool for communication with domain experts. Finally, we present an approach for incorporating the benefits of LLMs while fulfilling requirements such as interpretability and modularity. The presented solutions show improvements in performance on benchmark datasets and in manually conducted analyses with medical professionals.
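A behavioral test in this spirit encodes an expectation that a domain expert can read directly, independent of the model's internals. A minimal negation-sensitivity example (predict_fn is an assumed wrapper that maps a sentence to a label):

```python
def test_negation_sensitivity(predict_fn):
    """The model's label should change when a finding is negated."""
    affirmed = predict_fn("The patient shows signs of pneumonia.")
    negated = predict_fn("The patient shows no signs of pneumonia.")
    assert affirmed != negated, "model appears to ignore clinical negation"
```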
Our work provides both new insights into the inner workings of pre-trained language models and multiple adaptation methods, showing that LLMs can be an effective tool for NLP in specialized domains.
Joint learning from multiple information sources for biological problems
Thanks to technological advancements, more and more biological data have been generated in recent years. This data availability offers unprecedented opportunities to look at the same problem from multiple angles. It also unveils a more global view of the problem, one that takes into account the intricate interplay between the involved molecules and entities. Nevertheless, biological datasets are biased, limited in quantity, and contain many false-positive samples. Such challenges often drastically degrade the performance of a predictive model on unseen data and thus limit its applicability in real biological studies.
Human learning is a multi-stage process in which we usually start with simple things. Through knowledge accumulated over time, our cognitive ability extends to more complex concepts. Children learn to speak simple words before being able to formulate sentences. Similarly, being able to speak correct sentences supports learning to speak correct and meaningful paragraphs, and so on. Generally, knowledge acquired from related learning tasks helps boost our learning capability in the current task. Motivated by this phenomenon, in this thesis we study supervised machine learning models for bioinformatics problems that can improve their performance by exploiting multiple related knowledge sources. More specifically, we are concerned with ways to enrich the supervised models' knowledge base with publicly available related data to enhance the computational models' prediction performance.
Our work shares commonality with existing work in multimodal learning, multi-task learning, and transfer learning, though there are certain differences in some cases. Besides the proposed architectures, we present large-scale experimental setups with consensus evaluation metrics, along with the creation and release of large datasets, to showcase our approaches' superiority. Moreover, we add case studies with detailed analyses in which we make no simplifying assumptions, to demonstrate the systems' utility in realistic application scenarios. Finally, we develop and make available an easy-to-use website that lets non-expert users query the model's predictions, facilitating field experts' assessments and adoption. We believe that our work serves as one of the first steps in bridging the gap between computer science and biology, opening a new era of fruitful collaboration between computer scientists and biological field experts.
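As a generic sketch of the shared-representation idea (not the thesis's actual architectures), a shared encoder with separate task heads lets auxiliary, publicly available labels enrich the main prediction task:

```python
import torch
from torch import nn

class MultiTaskNet(nn.Module):
    """Shared encoder; one head per knowledge source / task."""
    def __init__(self, in_dim=128, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.main_head = nn.Linear(hidden, 1)  # e.g. the target interaction
        self.aux_head = nn.Linear(hidden, 1)   # e.g. a related public label

    def forward(self, x):
        z = self.shared(x)
        return self.main_head(z), self.aux_head(z)

model = MultiTaskNet()
main_out, aux_out = model(torch.randn(4, 128))
# Training would minimize main_loss + lambda * aux_loss over pooled data,
# so supervision from the related source shapes the shared representation.
```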
Anomaly Detection in Spatio-Temporal Datasets
Supporting humans in surveillance tasks plays an increasingly important role, as the sheer volume of data produced by heterogeneous sensors overwhelms human operators. For critical decisions, the most important information must be presented to the operator transparently in order to strengthen situational awareness. In this work, the maritime domain serves as the example setting for developing data fusion methods for exactly this purpose.
As an application scenario, the maritime domain offers an environment well suited to testing these methods, owing to its enormous economic importance for world trade, the occurrence of a wide variety of anomalies and criminal activities such as piracy and illegal fishing, and the availability of data sources.
The developed and investigated methods cover the full spectrum from simple positional and kinematic anomalies, through contextual anomalies, to complex anomalies. Several datasets containing real ship traffic information are used for the evaluation. In addition, some of the methods are tested in live trials with coast guards.
As a foundation for developing the methods, the object-oriented world model is first extended with behavior models, and the EUCISE data model is identified as the basis for modeling the available background knowledge. The first methods investigated detect anomalies in position and kinematics based on individual data points or entire trajectories. Here it was found that, while anomalies are detected, the rate of false alarms is far too high for operational use, and certain anomalies cannot be determined without context.
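As a generic illustration of point-based kinematic anomaly detection (not the specific detectors evaluated in this work), an off-the-shelf outlier model can be fitted to per-report AIS-style features so that implausible reports are flagged; all numbers below are invented:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic feature matrix: one row per track point with
# [latitude, longitude, speed over ground, course over ground].
rng = np.random.default_rng(0)
normal = rng.normal([54.0, 7.5, 12.0, 90.0], [0.1, 0.1, 1.5, 10.0], (500, 4))
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

report = np.array([[54.0, 7.5, 35.0, 90.0]])  # implausibly high speed
print(detector.predict(report))  # -1 flags the point as anomalous
```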
Next, a multi-agent system is set up that simulates the behavior of the observed objects using game-theoretic models. The required utility functions are derived both from expert knowledge and from data. With the integrated context information, true anomalies can be separated from normal behavior considerably better.
Furthermore, it is shown how context information can be incorporated into neural networks for the classification of ship types, using features derived from georeferenced information.
In the final step, complex anomalies in the form of specific situations are modeled with dynamic Bayesian networks and tested in live trials. Context information such as the weather, together with data sources of differing reliability, is integrated in order to recognize situations in various application scenarios co-designed with end users.
Overall, it is shown that anomalies of various kinds can be detected with automatic methods. Each method is evaluated on real data to demonstrate its potential for actual deployment as decision support for humans in real application scenarios.
Applied Mathematics to Mechanisms and Machines
This book brings together all 16 articles published in the Special Issue "Applied Mathematics to Mechanisms and Machines" of the MDPI journal Mathematics, in the section "Engineering Mathematics". The subject matter covered by these works is varied, but they all have mechanisms as the object of study and mathematics as the basis of the methodology used. The synthesis, design, and optimization of mechanisms, robotics, automotive engineering, maintenance 4.0, machine vibrations, control, biomechanics, and medical devices are among the topics covered. This volume should be of interest to all who work in the field of mechanism and machine science, and we hope that it will contribute to the development of both mechanical engineering and applied mathematics.
Intelligent road lane mark extraction using a Mobile Mapping System
During recent years, road landmark inventory has attracted increasing interest in different areas: the maintenance of transport infrastructures, 3D road modelling, GIS applications, etc. Lane mark detection is posed as a two-class classification problem over a highly class-imbalanced dataset. To cope with this imbalance we have applied active learning approaches. This thesis is divided into two main computational parts. In the first part, we evaluated different machine learning approaches on panoramic images obtained from an image sensor, such as Random Forest (RF) and ensembles of Extreme Learning Machines (V-ELM), obtaining satisfactory results in the detection of continuous road lane marks. In the second part of the thesis, we applied a Random Forest algorithm to a LiDAR point cloud, obtaining a georeferenced classification of horizontal road signs. We not only identified continuous lines but were also able to identify every horizontal lane mark detected by the LiDAR sensor.
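A hedged sketch of how active learning with a random forest can address this imbalance: train on the current labeled set, then query the pool points the model is least certain about for annotation. Names and parameters are ours, not the thesis's exact setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def uncertainty_sampling_round(X_labeled, y_labeled, X_pool, batch=10):
    """One active-learning round: fit on labeled data, return indices of
    the pool samples with the smallest class-probability margin."""
    clf = RandomForestClassifier(class_weight="balanced", random_state=0)
    clf.fit(X_labeled, y_labeled)  # 'balanced' counters class imbalance
    proba = clf.predict_proba(X_pool)
    margin = np.abs(proba[:, 1] - proba[:, 0])  # small margin = uncertain
    return np.argsort(margin)[:batch]  # send these for manual labeling
```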