LIPIcs, Volume 251, ITCS 2023, Complete Volume
Less is More: Restricted Representations for Better Interpretability and Generalizability
Deep neural networks are prevalent in supervised learning across a wide range of tasks, such as image classification, machine translation, and even scientific discovery.
Their success often comes at the expense of interpretability and generalizability. The increasing complexity of models and the involvement of pre-training make the opacity problem ever more pressing. Outstanding performance when labeled data are abundant, alongside a tendency to overfit when labeled data are limited, demonstrates how difficult it is for deep neural networks to generalize across datasets.
This thesis aims to improve interpretability and generalizability by restricting representations. We approach interpretability through attribution analysis, investigating which input features contribute to BERT's predictions, and generalizability through methods that remain effective in a low-data regime.
We consider two strategies for restricting representations: (1) adding a bottleneck, and (2) introducing compression. Given an input x, suppose we want to learn y through a latent representation z (i.e., x→z→y). Adding a bottleneck means adding a function R such that L(R(z)) < L(z), and introducing compression means adding a function R such that L(R(y)) < L(y), where L denotes length in bits. In other words, the restriction is placed either in the middle of the pipeline or at its end.
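For concreteness (an illustrative instance, not an example from the thesis): if z is a 768-dimensional float32 vector, a projection R onto 64 dimensions gives L(R(z)) = 64 × 32 < 768 × 32 = L(z) bits, a bottleneck in the middle of the pipeline. Compactly:

\[
x \to z \to y, \qquad \text{(1) bottleneck: } L(R(z)) < L(z), \qquad \text{(2) compression: } L(R(y)) < L(y).
\]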
We first introduce how adding an information bottleneck can help attribution analysis, and we apply it to investigate BERT's behavior on text classification in Chapter 3.
We then extend this attribution method to analyze passage reranking in Chapter 4, where we conduct a detailed analysis of cross-layer and cross-passage behavior.
Adding a bottleneck can not only provide insight into deep neural networks but can also be used to increase generalizability.
In Chapter 5, we demonstrate the equivalence between adding a bottleneck and performing neural compression. We leverage this finding in a framework called Non-Parametric learning by Compression with Latent Variables (NPC-LV), and we show how optimized neural compressors can be used for non-parametric image classification with few labeled data.
To further investigate how compression alone helps non-parametric learning without latent variables (NPC), we carry out experiments with the universal compressor gzip on text classification in Chapter 6.
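To make the compressor-based classification concrete, here is a minimal sketch of the NPC idea (my illustration under standard assumptions, using the normalized compression distance and a nearest-neighbor rule; it is not the thesis's code):

```python
import gzip

def ncd(x: str, y: str) -> float:
    """Normalized compression distance, with gzip approximating Kolmogorov complexity."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(test_text: str, labeled: list[tuple[str, str]]) -> str:
    """1-nearest-neighbor under NCD: non-parametric, no training, no parameters."""
    return min(labeled, key=lambda pair: ncd(test_text, pair[0]))[1]

train = [("the match ended in a goalless draw", "sports"),
         ("the central bank raised interest rates", "finance")]
print(classify("rates were hiked by the bank again", train))  # nearest-neighbor label
```

In practice one would use k > 1 neighbors and a real corpus; the point is that the "model" is just a compressor plus a distance.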
In Chapter 7, we present methods that adopt the compression perspective without performing actual compression, using T5.
With experimental results on passage reranking, we show that our method is highly effective in a low-data regime where only one thousand query-passage pairs are available.
Beyond the weakly supervised scenario, we also extend our method to large language models like GPT under almost no supervision, in one-shot and zero-shot settings. The experiments show that, without extra parameters or in-context learning, GPT can be used for semantic similarity, text classification, and text ranking, outperforming strong baselines; this work is presented in Chapter 8.
This thesis thus tackles two major challenges in machine learning, "interpretability" and "generalizability", by restricting representations. We provide both theoretical derivations and empirical results to show the effectiveness of information-theoretic approaches. We not only design new algorithms but also offer numerous insights into why and how "compression" matters for understanding deep neural networks and improving generalizability.
Structured Semidefinite Programming for Recovering Structured Preconditioners
We develop a general framework for finding approximately-optimal
preconditioners for solving linear systems. Leveraging this framework we obtain
improved runtimes for fundamental preconditioning and linear system solving
problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star, \epsilon^{-1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix. We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $\Omega(d^{\omega})$, where $\omega > 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.
Comment: Merge of arXiv:1812.06295 and arXiv:2008.0172
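As background for what a diagonal preconditioner does (an illustration of the object being optimized, not of the paper's algorithm), rescaling a positive definite K to D^{-1/2} K D^{-1/2} can shrink its condition number dramatically; a minimal numpy check with the classical Jacobi choice D = diag(K):

```python
import numpy as np

rng = np.random.default_rng(0)
# A positive definite matrix with badly scaled rows/columns.
A = rng.standard_normal((50, 50))
S = np.diag(10.0 ** rng.uniform(-3, 3, 50))   # wildly varying scales
K = S @ (A @ A.T + 50 * np.eye(50)) @ S       # still positive definite

# Jacobi (diagonal) preconditioning: P = D^{-1/2} K D^{-1/2} with D = diag(K).
d = np.diag(K)
P = K / np.sqrt(np.outer(d, d))

print(f"cond(K) = {np.linalg.cond(K):.3e}")
print(f"cond(P) = {np.linalg.cond(P):.3e}")   # far smaller after rescaling
```

The paper's κ⋆ is the best condition number achievable over all diagonal rescalings; Jacobi is merely one easy-to-compute candidate.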
Named Entity Resolution in Personal Knowledge Graphs
Entity Resolution (ER) is the problem of determining when two entities refer
to the same underlying entity. The problem has been studied for over 50 years,
and most recently, has taken on new importance in an era of large,
heterogeneous 'knowledge graphs' published on the Web and used widely in
domains as wide-ranging as social media, e-commerce, and search. This chapter
will discuss the specific problem of named ER in the context of personal
knowledge graphs (PKGs). We begin with a formal definition of the problem, and
the components necessary for doing high-quality and efficient ER. We also
discuss some challenges that are expected to arise for Web-scale data. Next, we
provide a brief literature review, with a special focus on how existing
techniques can potentially apply to PKGs. We conclude the chapter by covering
some applications, as well as promising directions for future research.
Comment: To appear as a book chapter by the same name in an upcoming (Oct. 2023) book 'Personal Knowledge Graphs (PKGs): Methodology, tools and applications', edited by Tiwari et al.
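As a toy illustration of the two standard ER components mentioned above (blocking to limit candidate pairs, then pairwise matching; the records, keys, and threshold are hypothetical):

```python
from itertools import combinations

records = [
    {"id": 1, "name": "J. Smith",   "email": "js@example.com"},
    {"id": 2, "name": "John Smith", "email": "js@example.com"},
    {"id": 3, "name": "Jane Doe",   "email": "jd@example.com"},
]

def block_key(rec):
    # Blocking: only compare records that share a cheap key
    # (here: email domain plus surname initial).
    return (rec["email"].split("@")[1], rec["name"].split()[-1][0].lower())

def jaccard(a: str, b: str) -> float:
    # Matching: token-set Jaccard similarity on names.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

blocks = {}
for rec in records:
    blocks.setdefault(block_key(rec), []).append(rec)

for group in blocks.values():
    for r1, r2 in combinations(group, 2):
        if jaccard(r1["name"], r2["name"]) >= 0.3:   # hypothetical threshold
            print(f"candidate match: {r1['id']} <-> {r2['id']}")
```

Real systems replace both pieces with learned blocking and matching models, but the efficiency argument (avoid the quadratic all-pairs comparison) is the same.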
Application of machine learning to quantify forest cover loss in the Congo Basin and its implications for large mammal habitat suitability
Machine learning (ML) models are a powerful tool for land use and land cover (LULC) mapping. In the African tropics, and particularly in the Congo Basin, there is a need to better assess the performance and reliability of ML-based LULC classification using coarse-resolution satellite images. In the context of ongoing climate change and socioeconomically driven forest disturbances, it is important to understand and quantify the extent of forest cover loss in the Congo Basin, as well as the impact of this loss on suitable habitat for key wildlife species. In this dissertation, I address these key issues in three manuscript-based chapters. In Chapter 2, I compared the classification performance of four ML algorithms (k-nearest neighbor (kNN), support vector machines (SVM), artificial neural networks (ANN), and random forests (RF)) for LULC mapping within a tropical region in Central Africa (the Mayo Rey department of northern Cameroon). All four classification algorithms produced high accuracy (overall classification accuracy > 80%), with the RF model (> 90% classification accuracy) outperforming the other algorithms. In Chapter 3, I used the RF model, together with the IDRISI TerrSet Land Change Modeler, to map and project LULC change for the Congo Basin under historical and future scenarios of socioeconomic impacts and climate change. I found that over 352,642 km² of dense forest was lost in this region between 1990 and 2020, with a projected continued loss of about 174,860 to 204,161 km² by the year 2050. In Chapter 4, I produced spatially explicit species distribution models to map habitat suitability for great apes (chimpanzees and gorillas) and elephants within the Dzanga-Sangha Protected Areas (DSPA) of the Congo Basin. I found that priority habitat areas for the three mammal species mostly occurred, and overlapped spatially, within the DSPA national parks. However, priority habitat areas for the three species declined by 4, 4.5, and 9.8 percentage points, respectively, between 2015 and 2020, mostly due to increased human pressure. This research provides a new understanding of the extent and implications of forest cover loss in the Congo Basin, highlighting the critical conservation challenges that remain in this region.
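A minimal sketch of the Chapter 2 setup as described (random-forest classification of pixels from spectral bands; the data below are synthetic stand-ins, not the dissertation's imagery, and the class names are only indicative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in: 5,000 labeled pixels x 7 spectral bands, 4 LULC classes
# (e.g., dense forest, savanna, cropland, built-up).
y = rng.integers(0, 4, 5000)
X = rng.normal(loc=y[:, None] * 0.8, scale=1.0, size=(5000, 7))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print(f"overall accuracy: {accuracy_score(y_te, rf.predict(X_te)):.2%}")
```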
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Machine Learning Algorithm for the Scansion of Old Saxon Poetry
Several scholars have designed tools to perform automatic scansion of poetry in many languages, but none of these tools deals with Old Saxon or Old English. This project is a first attempt to create a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as a labeled dataset to train the model. The algorithm reached 97% accuracy and a 99% weighted average for precision, recall, and F1 score. In addition, we tested the model on some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.
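A minimal sketch of the model class described (a BiLSTM tagger mapping syllable tokens to per-syllable metrical labels; the vocabulary, label set, and sizes below are placeholders, not the project's actual configuration):

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size: int, n_labels: int, emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)   # 2x: forward + backward states

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        h, _ = self.lstm(self.emb(tokens))           # (batch, seq_len, 2*hidden)
        return self.out(h)                           # per-token label logits

# Placeholder sizes: a syllable vocabulary of 2,000 and 4 metrical labels
# (e.g., lift / dip / secondary lift / anacrusis), trained on annotated verses.
model = BiLSTMTagger(vocab_size=2000, n_labels=4)
tokens = torch.randint(1, 2000, (8, 20))             # batch of 8 verses, 20 syllables
logits = model(tokens)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 4), torch.randint(0, 4, (8 * 20,)))
loss.backward()                                      # one supervised training step
```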
Adaptive Automated Machine Learning
The ever-growing demand for machine learning has led to the development of automated machine learning (AutoML) systems that can be used off the shelf by non-experts. Moreover, the demand for ML applications with high predictive performance exceeds the number of available machine learning experts, making the development of AutoML systems necessary. Automated machine learning tackles the problem of finding machine learning models with high predictive performance. Existing approaches incorporating deep learning techniques assume that all data are available at the beginning of the training process (offline learning). They configure and optimise a pipeline of preprocessing, feature engineering, and model selection by choosing suitable hyperparameters in each pipeline step. Furthermore, they assume that the user is fully aware of the choice, and thus the consequences, of the underlying metric (such as precision, recall, or the F1-measure). By varying this metric, the search for suitable configurations, and thus the adaptation of algorithms, can be tailored to the user's needs. With the creation of vast amounts of data from all kinds of sources every day, processing and understanding these data sets in a single batch is no longer viable. By training machine learning models incrementally (i.e., online learning), the flood of data can be processed sequentially as data streams. However, in an online learning scenario, where an AutoML instance executes on evolving data streams, the question of the best model and its configuration remains open.
In this work, we address the adaptation of AutoML in an offline learning scenario toward a certain utility an end-user might pursue as well as the adaptation of AutoML towards evolving data streams in an online learning scenario with three main contributions:
1. We propose a system that allows the adaptation of AutoML and the search for neural architectures towards a particular utility an end-user might pursue.
2. We introduce an online deep learning framework that fosters the research of deep learning models under the online learning assumption and enables the automated search for neural architectures.
3. We introduce an online AutoML framework that allows the incremental adaptation of ML models.
We evaluate the contributions individually, in accordance with predefined requirements and against state-of-the-art evaluation setups. The outcomes lead us to conclude that (i) AutoML, as well as systems for neural architecture search, can be steered towards individual utilities by learning a designated ranking model from pairwise preferences and using the latter as the target function in the offline learning scenario; (ii) architecturally small neural networks are in general suitable in an online learning scenario; and (iii) the configuration of machine learning pipelines can automatically be adapted to ever-evolving data streams, leading to better performance.
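A minimal sketch of the online (test-then-train, i.e., prequential) protocol that the data-stream setting implies, with scikit-learn's partial_fit standing in for an incrementally adapted AutoML pipeline (the stream and drift are simulated):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()                        # supports incremental partial_fit
classes = np.array([0, 1])
correct = total = 0

for t in range(10_000):                        # simulated evolving data stream
    drift = 0.0 if t < 5_000 else 2.0          # abrupt concept drift halfway
    x = rng.normal(size=(1, 5)) + drift
    y = np.array([int(x.sum() > drift * 5)])
    if t > 0:                                  # prequential: predict first ...
        correct += int(model.predict(x)[0] == y[0])
        total += 1
    model.partial_fit(x, y, classes=classes)   # ... then train on the same item
print(f"prequential accuracy: {correct / total:.2%}")
```

An online AutoML system would additionally monitor this prequential performance and swap or reconfigure the pipeline when drift degrades it.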
Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence
The increasing capacities of large language models (LLMs) present an
unprecedented opportunity to scale up data analytics in the humanities and
social sciences, augmenting and automating qualitative analytic tasks
previously typically allocated to human labor. This contribution proposes a
systematic mixed methods framework to harness qualitative analytic expertise,
machine scalability, and rigorous quantification, with attention to
transparency and replicability. Sixteen machine-assisted case studies are showcased
as proof of concept. Tasks include linguistic and discourse analysis, lexical
semantic change detection, interview analysis, historical event cause inference
and text mining, detection of political stance, text and idea reuse, genre
composition in literature and film, social network inference, automated
lexicography, missing metadata augmentation, and multimodal visual cultural
analytics. In contrast to the focus on English in the emerging LLM
applicability literature, many examples here deal with scenarios involving
smaller languages and historical texts prone to digitization distortions. In
all but the most difficult tasks requiring expert knowledge, generative LLMs
can demonstrably serve as viable research instruments. LLM (and human)
annotations may contain errors and variation, but the agreement rate can and
should be accounted for in subsequent statistical modeling; a bootstrapping
approach is discussed. The replications among the case studies illustrate how
tasks previously requiring potentially months of team effort and complex
computational pipelines can now be accomplished by an LLM-assisted scholar in
a fraction of the time. Importantly, this approach is not intended to replace,
but to augment researcher knowledge and skills. With these opportunities in
sight, qualitative expertise and the ability to pose insightful questions have
arguably never been more critical
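A minimal sketch of the bootstrapping idea mentioned above (propagating an observed human-LLM agreement rate into an interval estimate; the annotations and the 88% agreement figure are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Suppose an LLM labeled 500 texts for a binary stance, and a validation
# subsample showed it agreeing with expert coders 88% of the time.
labels = rng.integers(0, 2, 500)          # hypothetical LLM annotations
agreement = 0.88

boot_means = []
for _ in range(5_000):
    resampled = rng.choice(labels, size=labels.size, replace=True)
    # Flip labels at the observed disagreement rate to reflect annotation
    # error, then recompute the quantity of interest.
    flips = rng.random(resampled.size) > agreement
    boot_means.append(np.mean(np.where(flips, 1 - resampled, resampled)))

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"share labeled 1: {labels.mean():.2f}  (95% bootstrap CI: {lo:.2f} to {hi:.2f})")
```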