381 research outputs found

    Bayesian Data Sketching for Varying Coefficient Regression Models

    Varying coefficient models are popular tools for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received limited attention in large data applications, primarily due to the prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We introduce Bayesian data sketching for varying coefficient models to obviate the computational challenges presented by large sample sizes. To address the challenges of analyzing large data, we compress the functional response vector and the predictor matrix by a random linear transformation to achieve dimension reduction, and we conduct inference on the compressed data. Our approach distinguishes itself from several existing methods for analyzing large functional data in that it requires neither the development of new models or algorithms nor any specialized computational hardware, while delivering fully model-based Bayesian inference. Well-established methods and algorithms for varying coefficient regression models can be applied to the compressed data. We establish posterior contraction rates for estimating the varying coefficients and predicting the outcome at new locations under the randomly compressed data model. We use simulation experiments and conduct a spatially varying coefficient analysis of remotely sensed vegetation data to empirically illustrate the inferential and computational efficiency of our approach
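The compression step described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `sketch`, the Gaussian entries with variance 1/n, and the use of plain least squares on a fixed-coefficient linear model (instead of a Bayesian varying coefficient model) are all assumptions made here for brevity.

```python
import numpy as np

def sketch(y, X, m, seed=0):
    """Compress n observations to m rows with a random linear map (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: i.i.d. Gaussian entries with variance 1/n.
    Phi = rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))
    # The same transformation is applied to the response and the predictors.
    return Phi @ y, Phi @ X

# Simulated data: n observations, p predictors, compressed to m << n rows.
rng = np.random.default_rng(1)
n, p, m = 2000, 5, 200
X = rng.normal(size=(n, p))
beta = np.arange(1.0, p + 1.0)
y = X @ beta + 0.1 * rng.normal(size=n)

y_s, X_s = sketch(y, X, m)

# Any standard fitting routine can now run on the compressed data;
# ordinary least squares stands in for the posterior computation here.
beta_hat, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
```

The point of the sketch is that the fitting step only ever sees the m-row arrays `y_s` and `X_s`, so its cost no longer scales with the original sample size n.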

    On the Construction of Wavelets and Multiwavelets for General Dilation Matrices

    This thesis is concerned with the construction of (pre-)wavelets and (pre-)multiwavelets. In particular, we identify minimal requirements such that a construction is still possible. To this end, we weaken the assumptions made in the definition of the multiresolution analysis. Based on this generalized multiresolution analysis, we develop construction procedures for compactly supported (pre-)wavelets and for compactly supported (pre-)multiwavelets. These construction procedures involve general dilation matrices which allow us to reduce the number of mother wavelets to a minimum. To illustrate the theory developed in this work, we choose exponential box splines as generators for the generalized multiresolution analysis and construct compactly supported (pre-)wavelets and (pre-)multiwavelets

    Toward Building an Intelligent and Secure Network: An Internet Traffic Forecasting Perspective

    Internet traffic forecasting is a crucial component of the proactive management of self-organizing networks (SON) to ensure better Quality of Service (QoS) and Quality of Experience (QoE). Given the volatile and random nature of traffic data, this forecasting influences strategic development and investment decisions in the Internet Service Provider (ISP) industry. Modern machine learning algorithms have shown potential in dealing with complex Internet traffic prediction tasks, yet challenges persist. This thesis systematically explores these issues over five empirical studies conducted in the past three years, focusing on four key research questions: How do outlier data samples impact prediction accuracy for both short-term and long-term forecasting? How can a denoising mechanism enhance prediction accuracy? How can robust machine learning models be built with limited data? How can out-of-distribution traffic data be used to improve the generalizability of prediction models? Based on extensive experiments, we propose a novel traffic forecasting framework and associated models that integrate outlier management and noise reduction strategies, outperforming traditional machine learning models. Additionally, we suggest a transfer learning-based framework combined with a data augmentation technique to provide robust solutions with smaller datasets. Lastly, we propose a hybrid model with signal decomposition techniques to enhance model generalization for out-of-distribution data samples. We also address cyber threats as part of our forecasting research, acknowledging their substantial influence on traffic unpredictability and forecasting challenges. The thesis presents a detailed exploration of cyber-attack detection, employing methods that have been validated on multiple benchmark datasets. First, we combine ensemble feature selection with ensemble classification to improve DDoS (Distributed Denial-of-Service) attack detection accuracy with minimal false alarms. We then introduce a stacking ensemble framework for classifying diverse forms of cyber-attacks. Next, we propose a weighted voting mechanism for Android malware detection to secure Mobile Cyber-Physical Systems, which integrate the mobility of various smart devices to exchange information between physical and cyber systems. Lastly, we employ Generative Adversarial Networks to generate flow-based DDoS attacks in Internet of Things environments. By considering the impact of cyber-attacks on traffic volume and the challenges they pose to traffic prediction, our research attempts to bridge the gap between traffic forecasting and cyber security, enhancing proactive management of networks and contributing to resilient and secure internet infrastructure

    Machine Learning Methods with Noisy, Incomplete or Small Datasets

    In many machine learning applications, available datasets are sometimes incomplete, noisy or affected by artifacts. In supervised scenarios, it could happen that label information has low quality, which might include unbalanced training sets, noisy labels and other problems. Moreover, in practice, it is very common that available data samples are not enough to derive useful supervised or unsupervised classifiers. All these issues are commonly referred to as the low-quality data problem. This book collects novel contributions on machine learning methods for low-quality datasets, to contribute to the dissemination of new ideas to solve this challenging problem, and to provide clear examples of application in real scenarios

    Glitch Estimation and Removal Using Adaptive Spline Fitting and Wavelet Shrinkage on the Gravitational Wave Data

    Transient signals of terrestrial origin, or glitches, in gravitational wave strain data from ground-based detectors increase the false alarm rate and reduce the sensitivity of searches for astrophysical signals. For future detectors with higher sensitivities, the greater number of observable astrophysical signals will increase the likelihood of glitch overlaps and exacerbate their negative impact. The wide morphological diversity and unpredictable waveforms of glitches, together with the lack of supplemental data in the vast majority of cases, present the main obstacles to their mitigation. Thus, nonparametric glitch mitigation techniques are required, which should operate for a wide range of glitches and, in the case of overlaps, have little impact on astrophysical signals. Our method for glitch estimation and removal uses adaptive spline curve fitting, in which the arrangement of free knots is improved to estimate both smooth and non-smooth curves, and adds wavelet-based shrinkage for specific types of glitches. The effectiveness of the technique is evaluated for seven types of LIGO detector glitches. In the specific instance of a loud glitch in data from LIGO Livingston that coincides with the event GW170817, the glitch is estimated and removed without adversely altering the gravitational wave signal. For injected signals overlapped with other kinds of glitches, similar results are observed
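To make the term "wavelet-based shrinkage" concrete, the following is a generic sketch of soft thresholding applied to one-level Haar detail coefficients. It is not the method of this paper: the Haar wavelet, the single decomposition level, the threshold value `t`, and all function names are assumptions chosen only to keep the illustration short.

```python
import numpy as np

def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length signal."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed samples."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t; magnitudes below t become zero."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def shrink(x, t):
    """Threshold only the detail coefficients, then reconstruct the signal."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, t))
```

With `t = 0` the signal is reconstructed unchanged, while a very large `t` removes all detail and replaces each pair of samples by its mean.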

    Construction of interpolating and orthonormal multigenerators and multiwavelets on the interval

    In recent years, wavelets have become very powerful tools in applied mathematics. In general, a wavelet basis is a system of functions generated by scaling, translating and dilating a finite set of functions, the so-called mother wavelets. Wavelets have been applied very successfully in image and signal analysis, e.g., for denoising and compression. Another important field of application is the analysis and numerical treatment of operator equations. In particular, it has been possible to design adaptive numerical algorithms based on wavelets for a huge class of operator equations, including operators of negative order. The success of wavelet algorithms is ultimately a consequence of the following facts: - Weighted sequence norms of wavelet expansion coefficients are equivalent, in a certain range (depending on the regularity of the wavelets), to smoothness norms such as Besov or Sobolev norms. - For a wide class of operators, the representation in wavelet coordinates is nearly diagonal. - The vanishing moments of wavelets remove the smooth part of a function and lead to very efficient compression strategies. These facts can, e.g., be used to construct adaptive numerical strategies that are guaranteed to converge with optimal order, in the sense that these algorithms realize the convergence order of best N-term approximation schemes. The most far-reaching results have been obtained for linear, symmetric, elliptic operator equations. Generalizations to nonlinear elliptic equations also exist. There, however, one faces a serious bottleneck: every numerical algorithm for these equations requires the evaluation of a nonlinear functional applied to a wavelet series. Although some very sophisticated algorithms exist, they turn out to perform quite slowly in practice. Recent studies have shown that this problem can be ameliorated by means of so-called interpolants. However, most of the known bases of interpolants do not form stable bases in L2[a,b]. In this PhD project, we intend to provide a significant contribution to this problem. We want to construct new families of interpolants on bounded domains that are not only interpolating, but also stable in L2[a,b] or even orthogonal. Since this is hard (or maybe even impossible) to achieve with just one generator, we work with multigenerators and multiwavelets

    Improving the Efficiency of Variationally Enhanced Sampling with Wavelet-Based Bias Potentials


    xxAI - Beyond Explainable AI

    This is an open access book. Statistical machine learning (ML) has triggered a renaissance of artificial intelligence (AI). While the most successful ML models, including Deep Neural Networks (DNN), have developed better predictivity, they have become increasingly complex, at the expense of human interpretability (correlation vs. causality). The field of explainable AI (xAI) has emerged with the goal of creating tools and models that are both predictive and interpretable and understandable for humans. Explainable AI is receiving huge interest in the machine learning and AI research communities, across academia, industry, and government, and there is now an excellent opportunity to push towards successful explainable AI applications. This volume will help the research community to accelerate this process, to promote a more systematic use of explainable AI to improve models in diverse applications, and ultimately to better understand how current explainable AI methods need to be improved and what kind of theory of explainable AI is needed. After overviews of current methods and challenges, the editors include chapters that describe new developments in explainable AI. The contributions are from leading researchers in the field, drawn from both academia and industry, and many of the chapters take a clear interdisciplinary approach to problem-solving. The concepts discussed include explainability, causability, and AI interfaces with humans, and the applications include image processing, natural language, law, fairness, and climate science