135 research outputs found

    Less is More: Restricted Representations for Better Interpretability and Generalizability

    Get PDF
    Deep neural networks are prevalent in supervised learning for large amounts of tasks such as image classification, machine translation and even scientific discovery. Their success is often at the sacrifice of interpretability and generalizability. The increasing complexity of models and involvement of the pre-training process make the inexplicability more imminent. The outstanding performance when labeled data are abundant while prone to overfit when labeled data are limited demonstrates the difficulty of deep neural networks' generalizability to different datasets. This thesis aims to improve interpretability and generalizability by restricting representations. We choose to approach interpretability by focusing on attribution analysis to understand which features contribute to prediction on BERT, and to approach generalizability by focusing on effective methods in a low-data regime. We consider two strategies of restricting representations: (1) adding bottleneck, and (2) introducing compression. Given input x, suppose we want to learn y with the latent representation z (i.e. x→z→y), adding bottleneck means adding function R such that L(R(z)) < L(z) and introducing compression means adding function R so that L(R(y)) < L(y) where L refers to the number of bits. In other words, the restriction is added either in the middle of the pipeline or at the end of it. We first introduce how adding information bottleneck can help attribution analysis and apply it to investigate BERT's behavior on text classification in Chapter 3. We then extend this attribution method to analyze passage reranking in Chapter 4, where we conduct a detailed analysis to understand cross-layer and cross-passage behavior. Adding bottleneck can not only provide insight to understand deep neural networks but can also be used to increase generalizability. In Chapter 5, we demonstrate the equivalence between adding bottleneck and doing neural compression. We then leverage this finding with a framework called Non-Parametric learning by Compression with Latent Variables (NPC-LV), and show how optimizing neural compressors can be used in the non-parametric image classification with few labeled data. To further investigate how compression alone helps non-parametric learning without latent variables (NPC), we carry out experiments with a universal compressor gzip on text classification in Chapter 6. In Chapter 7, we elucidate methods of adopting the perspective of doing compression but without the actual process of compression using T5. Using experimental results in passage reranking, we show that our method is highly effective in a low-data regime when only one thousand query-passage pairs are available. In addition to the weakly supervised scenario, we also extend our method to large language models like GPT under almost no supervision --- in one-shot and zero-shot settings. The experiments show that without extra parameters or in-context learning, GPT can be used for semantic similarity, text classification, and text ranking and outperform strong baselines, which is presented in Chapter 8. The thesis proposes to tackle two big challenges in machine learning --- "interpretability" and "generalizability" through restricting representation. We provide both theoretical derivation and empirical results to show the effectiveness of using information-theoretic approaches. We not only design new algorithms but also provide numerous insights on why and how "compression" is so important in understanding deep neural networks and improving generalizability

    Applying machine learning: a multi-role perspective

    Get PDF
    Machine (and deep) learning technologies are more and more present in several fields. It is undeniable that many aspects of our society are empowered by such technologies: web searches, content filtering on social networks, recommendations on e-commerce websites, mobile applications, etc., in addition to academic research. Moreover, mobile devices and internet sites, e.g., social networks, support the collection and sharing of information in real time. The pervasive deployment of the aforementioned technological instruments, both hardware and software, has led to the production of huge amounts of data. Such data has become more and more unmanageable, posing challenges to conventional computing platforms, and paving the way to the development and widespread use of the machine and deep learning. Nevertheless, machine learning is not only a technology. Given a task, machine learning is a way of proceeding (a way of thinking), and as such can be approached from different perspectives (points of view). This, in particular, will be the focus of this research. The entire work concentrates on machine learning, starting from different sources of data, e.g., signals and images, applied to different domains, e.g., Sport Science and Social History, and analyzed from different perspectives: from a non-data scientist point of view through tools and platforms; setting a problem stage from scratch; implementing an effective application for classification tasks; improving user interface experience through Data Visualization and eXtended Reality. In essence, not only in a quantitative task, not only in a scientific environment, and not only from a data-scientist perspective, machine (and deep) learning can do the difference

    Multiparametric Magnetic Resonance Imaging Artificial Intelligence Pipeline for Oropharyngeal Cancer Radiotherapy Treatment Guidance

    Get PDF
    Oropharyngeal cancer (OPC) is a widespread disease and one of the few domestic cancers that is rising in incidence. Radiographic images are crucial for assessment of OPC and aid in radiotherapy (RT) treatment. However, RT planning with conventional imaging approaches requires operator-dependent tumor segmentation, which is the primary source of treatment error. Further, OPC expresses differential tumor/node mid-RT response (rapid response) rates, resulting in significant differences between planned and delivered RT dose. Finally, clinical outcomes for OPC patients can also be variable, which warrants the investigation of prognostic models. Multiparametric MRI (mpMRI) techniques that incorporate simultaneous anatomical and functional information coupled to artificial intelligence (AI) approaches could improve clinical decision support for OPC by providing immediately actionable clinical rationale for adaptive RT planning. If tumors could be reproducibly segmented, rapid response could be classified, and prognosis could be reliably determined, overall patient outcomes would be optimized to improve the therapeutic index as a function of more risk-adapted RT volumes. Consequently, there is an unmet need for automated and reproducible imaging which can simultaneously segment tumors and provide predictive value for actionable RT adaptation. This dissertation primarily seeks to explore and optimize image processing, tumor segmentation, and patient outcomes in OPC through a combination of advanced imaging techniques and AI algorithms. In the first specific aim of this dissertation, we develop and evaluate mpMRI pre-processing techniques for use in downstream segmentation, response prediction, and outcome prediction pipelines. Various MRI intensity standardization and registration approaches were systematically compared and benchmarked. Moreover, synthetic image algorithms were developed to decrease MRI scan time in an effort to optimize our AI pipelines. We demonstrated that proper intensity standardization and image registration can improve mpMRI quality for use in AI algorithms, and developed a novel method to decrease mpMRI acquisition time. Subsequently, in the second specific aim of this dissertation, we investigated underlying questions regarding the implementation of RT-related auto-segmentation. Firstly, we quantified interobserver variability for an unprecedented large number of observers for various radiotherapy structures in several disease sites (with a particular emphasis on OPC) using a novel crowdsourcing platform. We then trained an AI algorithm on a series of extant matched mpMRI datasets to segment OPC primary tumors. Moreover, we validated and compared our best model\u27s performance to clinical expert observers. We demonstrated that AI-based mpMRI OPC tumor auto-segmentation offers decreased variability and comparable accuracy to clinical experts, and certain mpMRI input channel combinations could further improve performance. Finally, in the third specific aim of this dissertation, we predicted OPC primary tumor mid-therapy (rapid) treatment response and prognostic outcomes. Using co-registered pre-therapy and mid-therapy primary tumor manual segmentations of OPC patients, we generated and characterized treatment sensitive and treatment resistant pre-RT sub-volumes. These sub-volumes were used to train an AI algorithm to predict individual voxel-wise treatment resistance. Additionally, we developed an AI algorithm to predict OPC patient progression free survival using pre-therapy imaging from an international data science competition (ranking 1st place), and then translated these approaches to mpMRI data. We demonstrated AI models could be used to predict rapid response and prognostic outcomes using pre-therapy imaging, which could help guide treatment adaptation, though further work is needed. In summary, the completion of these aims facilitates the development of an image-guided fully automated OPC clinical decision support tool. The resultant deliverables from this project will positively impact patients by enabling optimized therapeutic interventions in OPC. Future work should consider investigating additional imaging timepoints, imaging modalities, uncertainty quantification, perceptual and ethical considerations, and prospective studies for eventual clinical implementation. A dynamic version of this dissertation is publicly available and assigned a digital object identifier through Figshare (doi: 10.6084/m9.figshare.22141871)

    Microservices-Based Autonomous Anomaly Detection for Mobile Network Observability

    Get PDF
    In modern telecommunication networks, network observability entails the use of diverse data sources to understand the state and behavior of the network, and its ability to provide the required service and user experience. Because of the vast amounts of data collection and transmission involved in this process, the network's performance is negatively impacted, and it can become difficult for network operators to identify the occurrence of problematic behavior before it is too late. To enable a more efficient form of data collection and aid in diagnostic operations, this thesis aims to develop an autonomous anomaly detection system for time series data. The system is to be developed as a microservices-based solution, to be integrated with a software-defined networking controller platform developed at \textit{Ericsson}. This thesis describes the extensive experimentation process conducted during the development of this system, including various methods of data processing, time series clustering, and anomaly detection. The resulting system is a highly customizable and scalable product, supported by modern and reliable anomaly detection models. The system is capable of detecting several different kinds of anomalies in an arbitrary number of mobile network monitoring metrics and can be easily configured to fit the specific needs of each customer

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)

    Get PDF

    Proc. 33. Workshop Computational Intelligence, Berlin, 23.-24.11.2023

    Get PDF
    Dieser Tagungsband enthält die Beiträge des 33. Workshops „Computational Intelligence“ der vom 23.11. – 24.11.2023 in Berlin stattfindet. Die Schwerpunkte sind Methoden, Anwendungen und Tools für ° Fuzzy-Systeme, ° Künstliche Neuronale Netze, ° Evolutionäre Algorithmen und ° Data-Mining-Verfahren sowie der Methodenvergleich anhand von industriellen und Benchmark-Problemen.The workshop proceedings contain the contributions of the 33rd workshop "Computational Intelligence" which will take place from 23.11. - 24.11.2023 in Berlin. The focus is on methods, applications and tools for ° Fuzzy systems, ° Artificial Neural Networks, ° Evolutionary algorithms and ° Data mining methods as well as the comparison of methods on the basis of industrial and benchmark problems

    Geographic information extraction from texts

    Get PDF
    A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although large progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data, to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss the recent advances, new ideas, and concepts but also identify research gaps in geographic information extraction

    Tumor heterogeneity in glioblastoma:a real-life brain teaser

    Get PDF

    Structural Cheminformatics for Kinase-Centric Drug Design

    Get PDF
    Drug development is a long, expensive, and iterative process with a high failure rate, while patients wait impatiently for treatment. Kinases are one of the main drug targets studied for the last decades to combat cancer, the second leading cause of death worldwide. These efforts resulted in a plethora of structural, chemical, and pharmacological kinase data, which are collected in the KLIFS database. In this thesis, we apply ideas from structural cheminformatics to the rich KLIFS dataset, aiming to provide computational tools that speed up the complex drug discovery process. We focus on methods for target prediction and fragment-based drug design that study characteristics of kinase binding sites (also called pockets). First, we introduce the concept of computational target prediction, which is vital in the early stages of drug discovery. This approach identifies biological entities such as proteins that may (i) modulate a disease of interest (targets or on-targets) or (ii) cause unwanted side effects due to their similarity to on-targets (off-targets). We focus on the research field of binding site comparison, which lacked a freely available and efficient tool to determine similarities between the highly conserved kinase pockets. We fill this gap with the novel method KiSSim, which encodes and compares spatial and physicochemical pocket properties for all kinases (kinome) that are structurally resolved. We study kinase similarities in the form of kinome-wide phylogenetic trees and detect expected and unexpected off-targets. To allow multiple perspectives on kinase similarity, we propose an automated and production-ready pipeline; user-defined kinases can be inspected complementarily based on their pocket sequence and structure (KiSSim), pocket-ligand interactions, and ligand profiles. Second, we introduce the concept of fragment-based drug design, which is useful to identify and optimize active and promising molecules (hits and leads). This approach identifies low-molecular-weight molecules (fragments) that bind weakly to a target and are then grown into larger high-affinity drug-like molecules. With the novel method KinFragLib, we provide a fragment dataset for kinases (fragment library) by viewing kinase inhibitors as combinations of fragments. Kinases have a highly conserved pocket with well-defined regions (subpockets); based on the subpockets that they occupy, we fragment kinase inhibitors in experimentally resolved protein-ligand complexes. The resulting dataset is used to generate novel kinase-focused molecules that are recombinations of the previously fragmented kinase inhibitors while considering their subpockets. The KinFragLib and KiSSim methods are published as freely available Python tools. Third, we advocate for open and reproducible research that applies FAIR principles ---data and software shall be findable, accessible, interoperable, and reusable--- and software best practices. In this context, we present the TeachOpenCADD platform that contains pipelines for computer-aided drug design. We use open source software and data to demonstrate ligand-based applications from cheminformatics and structure-based applications from structural bioinformatics. To emphasize the importance of FAIR data, we dedicate several topics to accessing life science databases such as ChEMBL, PubChem, PDB, and KLIFS. These pipelines are not only useful to novices in the field to gain domain-specific skills but can also serve as a starting point to study research questions. Furthermore, we show an example of how to build a stand-alone tool that formalizes reoccurring project-overarching tasks: OpenCADD-KLIFS offers a clean and user-friendly Python API to interact with the KLIFS database and fetch different kinase data types. This tool has been used in this thesis and beyond to support kinase-focused projects. We believe that the FAIR-based methods, tools, and pipelines presented in this thesis (i) are valuable additions to the toolbox for kinase research, (ii) provide relevant material for scientists who seek to learn, teach, or answer questions in the realm of computer-aided drug design, and (iii) contribute to making drug discovery more efficient, reproducible, and reusable
    corecore