7,625 research outputs found

    On information captured by neural networks: connections with memorization and generalization

    Full text link
    Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study information captured by neural networks during training. Specifically, we start with viewing learning in presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits label noise information in weights. We then define a notion of unique information that an individual sample provides to the training of a deep network, shedding some light on the behavior of neural networks on examples that are atypical, ambiguous, or belong to underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous generalization gap bounds. Finally, by studying knowledge distillation, we highlight the important role of data and label complexity in generalization. Overall, our findings contribute to a deeper understanding of the mechanisms underlying neural network generalization.Comment: PhD thesi

    Modular lifelong machine learning

    Get PDF
    Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with less resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge. Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to only reuse the subset of modules which are useful for the task at hand. This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems. First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures. Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allows us to efficiently identify promising module combinations. Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods. Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer

    CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments

    Full text link
    Robots operating in real-world environments must reason about possible outcomes of stochastic actions and make decisions based on partial observations of the true world state. A major challenge for making accurate and robust action predictions is the problem of confounding, which if left untreated can lead to prediction errors. The partially observable Markov decision process (POMDP) is a widely-used framework to model these stochastic and partially-observable decision-making problems. However, due to a lack of explicit causal semantics, POMDP planning methods are prone to confounding bias and thus in the presence of unobserved confounders may produce underperforming policies. This paper presents a novel causally-informed extension of "anytime regularized determinized sparse partially observable tree" (AR-DESPOT), a modern anytime online POMDP planner, using causal modelling and inference to eliminate errors caused by unmeasured confounder variables. We further propose a method to learn offline the partial parameterisation of the causal model for planning, from ground truth model data. We evaluate our methods on a toy problem with an unobserved confounder and show that the learned causal model is highly accurate, while our planning method is more robust to confounding and produces overall higher performing policies than AR-DESPOT.Comment: 8 pages, 3 figures, submitted to 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS

    Reinforcement learning in large state action spaces

    Get PDF
    Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem towards ensuring real world deployment of RL systems. However, several challenges limit the applicability of RL to large scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization and lack of guarantees about important properties like performance, generalization and robustness in potentially unseen scenarios. This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges RL. The proposed methods cover a wide range of RL settings (single and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems: e.g. tensorization of the Bellman equation which allows exponential sample efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS(Chapter 3), combinatorial generalization results in cooperative MAS(Chapter 5), generalization results on observation shifts(Chapter 7), learning deterministic policies in a probabilistic RL framework(Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we also shed light on generalization aspects of the agents under different frameworks. These properties have been been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory). In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large scale, real world applications

    Comparative Multiple Case Study into the Teaching of Problem-Solving Competence in Lebanese Middle Schools

    Get PDF
    This multiple case study investigates how problem-solving competence is integrated into teaching practices in private schools in Lebanon. Its purpose is to compare instructional approaches to problem-solving across three different programs: the American (Common Core State Standards and New Generation Science Standards), French (Socle Commun de Connaissances, de Compétences et de Culture), and Lebanese with a focus on middle school (grades 7, 8, and 9). The project was conducted in nine schools equally distributed among three categories based on the programs they offered: category 1 schools offered the Lebanese program, category 2 the French and Lebanese programs, and category 3 the American and Lebanese programs. Each school was treated as a separate case. Structured observation data were collected using observation logs that focused on lesson objectives and specific cognitive problem-solving processes. The two logs were created based on a document review of the requirements for the three programs. Structured observations were followed by semi-structured interviews that were conducted to explore teachers' beliefs and understandings of problem-solving competence. The comparative analysis of within-category structured observations revealed an instruction ranging from teacher-led practices, particularly in category 1 schools, to more student-centered approaches in categories 2 and 3. The cross-category analysis showed a reliance on cognitive processes primarily promoting exploration, understanding, and demonstrating understanding, with less emphasis on planning and executing, monitoring and reflecting, thus uncovering a weakness in addressing these processes. The findings of the post-observation semi-structured interviews disclosed a range of definitions of problem-solving competence prevalent amongst teachers with clear divergences across the three school categories. This research is unique in that it compares problem-solving teaching approaches across three different programs and explores underlying teachers' beliefs and understandings of problem-solving competence in the Lebanese context. It is hoped that this project will inform curriculum developers about future directions and much-anticipated reforms of the Lebanese program and practitioners about areas that need to be addressed to further improve the teaching of problem-solving competence

    Mitigating Voter Attribute Bias for Fair Opinion Aggregation

    Full text link
    The aggregation of multiple opinions plays a crucial role in decision-making, such as in hiring and loan review, and in labeling data for supervised learning. Although majority voting and existing opinion aggregation models are effective for simple tasks, they are inappropriate for tasks without objectively true labels in which disagreements may occur. In particular, when voter attributes such as gender or race introduce bias into opinions, the aggregation results may vary depending on the composition of voter attributes. A balanced group of voters is desirable for fair aggregation results but may be difficult to prepare. In this study, we consider methods to achieve fair opinion aggregation based on voter attributes and evaluate the fairness of the aggregated results. To this end, we consider an approach that combines opinion aggregation models such as majority voting and the Dawid and Skene model (D&S model) with fairness options such as sample weighting. To evaluate the fairness of opinion aggregation, probabilistic soft labels are preferred over discrete class labels. First, we address the problem of soft label estimation without considering voter attributes and identify some issues with the D&S model. To address these limitations, we propose a new Soft D&S model with improved accuracy in estimating soft labels. Moreover, we evaluated the fairness of an opinion aggregation model, including Soft D&S, in combination with different fairness options using synthetic and semi-synthetic data. The experimental results suggest that the combination of Soft D&S and data splitting as a fairness option is effective for dense data, whereas weighted majority voting is effective for sparse data. These findings should prove particularly valuable in supporting decision-making by human and machine-learning models with balanced opinion aggregation

    Positive Difference Distribution for Image Outlier Detection using Normalizing Flows and Contrastive Data

    Full text link
    Detecting test data deviating from training data is a central problem for safe and robust machine learning. Likelihoods learned by a generative model, e.g., a normalizing flow via standard log-likelihood training, perform poorly as an outlier score. We propose to use an unlabelled auxiliary dataset and a probabilistic outlier score for outlier detection. We use a self-supervised feature extractor trained on the auxiliary dataset and train a normalizing flow on the extracted features by maximizing the likelihood on in-distribution data and minimizing the likelihood on the contrastive dataset. We show that this is equivalent to learning the normalized positive difference between the in-distribution and the contrastive feature density. We conduct experiments on benchmark datasets and compare to the likelihood, the likelihood ratio and state-of-the-art anomaly detection methods

    Model Diagnostics meets Forecast Evaluation: Goodness-of-Fit, Calibration, and Related Topics

    Get PDF
    Principled forecast evaluation and model diagnostics are vital in fitting probabilistic models and forecasting outcomes of interest. A common principle is that fitted or predicted distributions ought to be calibrated, ideally in the sense that the outcome is indistinguishable from a random draw from the posited distribution. Much of this thesis is centered on calibration properties of various types of forecasts. In the first part of the thesis, a simple algorithm for exact multinomial goodness-of-fit tests is proposed. The algorithm computes exact pp-values based on various test statistics, such as the log-likelihood ratio and Pearson\u27s chi-square. A thorough analysis shows improvement on extant methods. However, the runtime of the algorithm grows exponentially in the number of categories and hence its use is limited. In the second part, a framework rooted in probability theory is developed, which gives rise to hierarchies of calibration, and applies to both predictive distributions and stand-alone point forecasts. Based on a general notion of conditional T-calibration, the thesis introduces population versions of T-reliability diagrams and revisits a score decomposition into measures of miscalibration, discrimination, and uncertainty. Stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, a universal coefficient of determination is introduced that nests and reinterprets the classical R2R^2 in least squares regression. In the third part, probabilistic top lists are proposed as a novel type of prediction in classification, which bridges the gap between single-class predictions and predictive distributions. The probabilistic top list functional is elicited by strictly consistent evaluation metrics, based on symmetric proper scoring rules, which admit comparison of various types of predictions

    Estimating Directional Changes Trend Reversal in Forex Using Machine Learning

    Get PDF
    Most forecasting algorithms use a physical time scale data to study price movement in financial markets by taking snapshots in fixed schedule, making the flow of time discontinuous. The use of a physical time scale can make traders oblivious to significant activities in the market, which poses risks. For example, currency risk, the risk that exchange rate will change. Directional changes is a different and newer approach of taking snapshot of the market, which uses an event-based time scale. This approach summarises data into alternating trends called upward directional change and downward directional change according to a change in price a trader considers to be significant, which is expressed as a threshold. The trends in the summary are split into directional change (DC) and overshoot (OS) events. In this work, we propose a novel DC-based framework, which uses machine learning algorithms to forecast when the next, alternate trend is expected to begin. First, we present a genetic programming (GP) algorithm that evolves equations that express linear and non-linear relationships between the length of DC and OS events in a given dataset. Awareness of DC event and OS event lengths provide traders with an idea of when DC trends are expected to reverse and thus take appropriate action to increase profit or mitigate risk. Second, DC trends can be categorised into two distinct types: (1) trends with OS events; and (2) trends without OS events(i.e. OS event length is 0). Trends with OS events are those that continue beyond a period when they were first observed and trends without OS event are others that ends as soon as they were observed. To further improve trend reversal estimation accuracy, we identified these two categorises using classification techniques and estimated OS event length for trends that belong in the first category. We appraised whether this new knowledge could lead to an even greater excess return. Third, our novel trend reversal estimation approach was then used as part of a novel genetic algorithm (GA) based trading strategy. The strategy embedded an optimised trend reversal forecasting algorithm that was based on trend reversal point forecasted by multiple thresholds. We assessed the efficiency of our framework (i.e., a novel trend reversal approach and an optimised trading strategy) by performing an in-depth investigation. To assess our approach and evaluate the extent to which it could be generalised in Forex markets, we used five tailored thresholds to create 1000 DC datasets from 10, monthly 10- minute physical time data of 20 major Forex markets (i.e 5 thresholds * 10 months * 20 currency pairs). We compared our results to six benchmarks techniques, both DC and non-DC based, such as technical analysis and buy-and-hold. Our findings showed that our proposed approach can return a significantly higher profit at reduced risk, and statistically outperformed the other trading strategies compareds in a number of different performance metrics
    corecore