
    Unsupervised Machine Learning for Networking: Techniques, Applications and Research Challenges

    While machine learning and artificial intelligence have long been applied in networking research, the bulk of such work has focused on supervised learning. Recently, there has been a rising trend of employing unsupervised machine learning on unstructured raw network data to improve network performance and provide services such as traffic engineering, anomaly detection, Internet traffic classification, and quality of service optimization. The interest in applying unsupervised learning techniques in networking stems from their great success in other fields such as computer vision, natural language processing, speech recognition, and optimal control (e.g., for developing autonomous self-driving cars). Unsupervised learning is attractive because it frees us from the need for labeled data and manual handcrafted feature engineering, thereby facilitating flexible, general, and automated methods of machine learning. The focus of this survey paper is to provide an overview of the applications of unsupervised learning in the domain of networking. We provide a comprehensive survey highlighting recent advancements in unsupervised learning techniques and describe their applications in various learning tasks in the context of networking. We also discuss future directions and open research issues, while identifying potential pitfalls. While a few survey papers focusing on the applications of machine learning in networking have previously been published, a survey of similar scope and breadth is missing in the literature. Through this paper, we advance the state of knowledge by carefully synthesizing the insights from these survey papers while also providing contemporary coverage of recent advances.

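    To make the flavor of such unsupervised pipelines concrete, the following is a minimal sketch (illustrative only, not taken from the survey) that clusters hypothetical network-flow features with k-means and flags flows lying far from their cluster centroid as anomaly candidates; the feature set, cluster count, and threshold are all assumptions.

        # Minimal sketch: unsupervised anomaly detection on synthetic flow features.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        flows = rng.lognormal(mean=2.0, sigma=1.0, size=(500, 3))  # hypothetical [duration, bytes, packets]

        X = StandardScaler().fit_transform(flows)
        km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

        # Distance of each flow to its assigned centroid; unusually distant
        # flows are candidate anomalies.
        dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
        print("anomaly candidates:", np.where(dist > np.percentile(dist, 99))[0])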

    A methodology for the characterization of business-to-consumer E-commerce.

    This thesis concerns the field of business-to-consumer electronic commerce. Research on Internet consumer behaviour is still in its infancy, and a quantitative framework to characterize user profiles for e-commerce is not yet established. This study proposes a quantitative framework that uses latent variable analysis to identify the underlying traits of Internet users' opinions. Predictive models are then built to select the factors that are most predictive of the propensity to buy online and to classify Internet users according to that propensity. This is followed by a segmentation of the online market based on that selection of factors and the deployment of segment-specific graphical models to map the interactions between factors, and between these and the propensity to buy online. The novel aspects of this work can be summarised as follows: the definition of a fully quantitative methodology for the segmentation and analysis of large data sets; the description of the latent dimensions underlying consumers' opinions using quantitative methods; the definition of a principled method of marginalisation to the empirical prior, for Bayesian neural networks, to deal with the use of class-unbalanced data sets; and a study of the Generative Topographic Mapping (GTM) as a principled method for market segmentation, including some developments of the model, namely: a) an entropy-based measure to compare the class-discriminatory capabilities of maps of equal dimensions; b) a Cumulative Responsibility measure to provide information on the mapping distortion and define data clusters; and c) Selective Smoothing as an extended model for the regularization of the GTM training.
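    As a rough illustration of this kind of latent-variable pipeline (a sketch under assumed data, not the thesis's own Bayesian neural network or GTM machinery), one can extract latent opinion factors with factor analysis and then predict a propensity-to-buy label from the factor scores:

        # Sketch: latent traits from opinion items, then propensity prediction.
        import numpy as np
        from sklearn.decomposition import FactorAnalysis
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(1)
        opinions = rng.normal(size=(300, 12))              # 12 hypothetical opinion items
        buys = (opinions[:, 0] + rng.normal(size=300) > 0).astype(int)

        # Compress the opinion items into 3 latent traits, then classify.
        scores = FactorAnalysis(n_components=3, random_state=1).fit_transform(opinions)
        clf = LogisticRegression().fit(scores, buys)
        print("training accuracy:", clf.score(scores, buys))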

    Machine assisted quantitative seismic interpretation

    During the past decades, the size of 3D seismic data volumes and the number of seismic attributes have increased to the extent that it is difficult, if not impossible, for interpreters to examine every seismic line and time slice. Reducing the labor associated with seismic interpretation while increasing the reliability of the interpreted result has been an ongoing challenge that becomes increasingly more difficult with the amount of data available to interpreters. To address this issue, geoscientists often adopt concepts and algorithms from fields such as image processing, signal processing, and statistics, with much of the focus on auto-picking and automatic seismic facies analysis. I focus my research on adapting and improving machine learning and pattern recognition methods for automatic seismic facies analysis. Being an emerging and rapidly developing topic, there is an endless list of machine learning and pattern recognition techniques available to scientific researchers. More often than not, the obstacle that prevents geoscientists from using such techniques is their "black box" nature. Interpreters may not know the assumptions and limitations of a given technique, resulting in subsequent choices that may be suboptimal. In this dissertation, I provide a review of the more commonly used seismic facies analysis algorithms. My goal is to assist seismic interpreters in choosing the best method for a specific problem. Moreover, because all these methods are just generic mathematical tools that solve highly abstract, analytical problems, we have to tailor them to fit seismic interpretation problems. The self-organizing map (SOM) is a popular unsupervised learning technique that interpreters use to explore seismic facies using multiple seismic attributes as input. It projects the high-dimensional seismic attribute data onto a lower-dimensional (usually 2D) space in which interpreters are able to identify clusters of seismic facies. In this dissertation, using the SOM as an example, I provide three improvements on the traditional algorithm, in order to present the information residing in the seismic attributes more adequately, thereby reducing the uncertainty in the generated seismic facies map.
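    For readers unfamiliar with the mechanics, the following is a compact NumPy sketch of the basic SOM update rule described above, run on assumed attribute data; production work would use a dedicated implementation, and all sizes and schedules here are illustrative.

        # Sketch of SOM training: each sample pulls its best-matching prototype
        # and that prototype's grid neighbors toward itself.
        import numpy as np

        rng = np.random.default_rng(0)
        data = rng.normal(size=(1000, 5))          # 1000 samples of 5 attributes
        rows, cols = 10, 10
        weights = rng.normal(size=(rows, cols, 5)) # one prototype vector per node
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

        n_steps, step = 10 * len(data), 0
        for _ in range(10):                        # 10 epochs
            for x in data:
                frac = step / n_steps              # decay learning rate and radius
                lr, sigma = 0.5 * (1 - frac), 3.0 * (1 - frac) + 0.5
                d = np.linalg.norm(weights - x, axis=-1)
                bmu = np.unravel_index(np.argmin(d), d.shape)  # best-matching unit
                # Gaussian neighborhood: nodes near the BMU on the grid move most.
                g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=-1) / (2 * sigma**2))
                weights += lr * g[..., None] * (x - weights)
                step += 1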

    Advanced and novel modeling techniques for simulation, optimization and monitoring chemical engineering tasks with refinery and petrochemical unit applications

    Engineers predict, optimize, and monitor processes to improve safety and profitability. Models automate these tasks and determine precise solutions. This research studies and applies advanced and novel modeling techniques to automate and aid engineering decision-making. Advancements in computational ability have improved modeling software's ability to mimic industrial problems. Simulations are increasingly used to explore new operating regimes and design new processes. In this work, we present a methodology for creating structured mathematical models, useful tips to simplify models, and a novel repair method that improves convergence by populating quality initial conditions for the simulation's solver. A crude oil refinery application is presented, including simulation, simplification tips, and the implementation of the repair strategy. A crude oil scheduling problem is also presented, which can be integrated with production unit models. Recently, stochastic global optimization (SGO) has shown success in finding global optima for complex nonlinear processes. When performing SGO on simulations, model convergence can become an issue. The computational load can be decreased by 1) simplifying the model and 2) finding a synergy between the model solver repair strategy and the optimization routine, using the formulated initial conditions as points to perturb the neighborhood being searched. Here, a simplifying technique for merging the crude oil scheduling problem and the vertically integrated online refinery production optimization is demonstrated. To optimize refinery production, a stochastic global optimization technique is employed. Process monitoring has been vastly enhanced through the data-driven modeling technique Principal Component Analysis (PCA). As opposed to first-principles models, which make assumptions about the structure of the model describing the process, data-driven techniques make no assumptions about the underlying relationships. Data-driven techniques search for a projection that maps the data into a space that is easier to analyze. Feature extraction techniques, commonly dimensionality reduction techniques, have been explored intensively to better capture nonlinear relationships. These techniques can extend data-driven process monitoring to nonlinear processes. Here, we employ a novel nonlinear process-monitoring scheme that utilizes self-organizing maps. The novel techniques and implementation methodology are applied to the publicly studied Tennessee Eastman Process and an industrial polymerization unit.
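    As a hedged sketch of the PCA-based monitoring idea (illustrative only, not the dissertation's implementation), one can fit PCA on normal operating data and then flag new samples whose Hotelling T^2 or residual (SPE/Q) statistics exceed empirical limits:

        # Sketch: PCA process monitoring with T^2 and SPE statistics.
        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        normal_ops = rng.normal(size=(500, 8))     # 8 hypothetical process variables
        pca = PCA(n_components=3).fit(normal_ops)
        ref_var = pca.transform(normal_ops).var(axis=0)

        def monitor(x):
            t = pca.transform(x)
            t2 = np.sum(t**2 / ref_var, axis=1)                        # Hotelling T^2
            spe = np.sum((x - pca.inverse_transform(t)) ** 2, axis=1)  # residual (SPE/Q)
            return t2, spe

        # Empirical 99th-percentile control limits from normal operation.
        t2_ref, spe_ref = monitor(normal_ops)
        t2_lim, spe_lim = np.percentile(t2_ref, 99), np.percentile(spe_ref, 99)

        t2, spe = monitor(normal_ops[:5] + 4.0)    # simulated sensor drift
        print("alarm:", (t2 > t2_lim) | (spe > spe_lim))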

    Data exploration process based on the self-organizing map

    With the advances in computer technology, the amount of data that is obtained from various sources and stored in electronic media is growing at exponential rates. Data mining is a research area which answers the challenge of analysing this data in order to find the useful information contained therein. The Self-Organizing Map (SOM) is one of the methods used in data mining. It quantizes the training data into a representative set of prototype vectors and maps them onto a low-dimensional grid. The SOM is a prominent tool in the initial exploratory phase of data mining. The thesis consists of an introduction and ten publications. In the publications, the validity of SOM-based data exploration methods has been investigated and various enhancements to them have been proposed. In the introduction, these methods are presented as parts of the data mining process, and they are compared with other data exploration methods with similar aims. The work makes two primary contributions. Firstly, it has been shown that the SOM provides a versatile platform on top of which various data exploration methods can be efficiently constructed. New methods and measures for visualization of data, clustering, cluster characterization, and quantization have been proposed. The SOM algorithm and the proposed methods and measures have been implemented as a set of Matlab routines in the SOM Toolbox software library. Secondly, a framework for SOM-based data exploration of table-format data - both single tables and hierarchically organized tables - has been constructed. The framework divides exploratory data analysis into several sub-tasks, most notably the analysis of samples and the analysis of variables. The analysis methods are applied autonomously and their results are provided in a report describing the most important properties of the data manifold. In such a framework, the attention of the data miner can be directed more towards the actual data exploration task, rather than towards the application of the analysis methods. Because of the highly iterative nature of data exploration, the automation of routine analysis tasks can reduce the time needed by the data exploration process considerably.
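    The two-level idea behind this framework (quantize the data with a SOM, then cluster the prototypes) can be sketched as follows; this uses the third-party MiniSom package purely as a stand-in for the Matlab SOM Toolbox, with all sizes assumed.

        # Sketch: SOM quantization followed by clustering of the prototype vectors.
        import numpy as np
        from minisom import MiniSom
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        data = rng.normal(size=(800, 6))

        som = MiniSom(12, 12, 6, sigma=2.0, learning_rate=0.5, random_seed=0)
        som.train(data, 5000)

        # Cluster the 12x12 prototype vectors instead of the raw samples.
        prototypes = som.get_weights().reshape(-1, 6)
        labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(prototypes)

        # Each sample inherits the cluster of its best-matching prototype.
        sample_labels = [labels[i * 12 + j] for i, j in (som.winner(x) for x in data)]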

    Generative adversarial networks review in earthquake-related engineering fields

    Within seismology, geology, and civil and structural engineering, deep learning (DL), especially via generative adversarial networks (GANs), represents an innovative, engaging, and advantageous way to generate reliable synthetic data that reproduce the characteristics of actual samples, providing a handy data augmentation tool. Indeed, in many practical applications, obtaining a significant amount of high-quality data is demanding. Data augmentation is generally based on artificial intelligence (AI) and machine learning data-driven models. The DL GAN-based data augmentation approach for generating synthetic seismic signals revolutionized the current data augmentation paradigm. This study delivers a critical state-of-the-art review, explaining recent research into AI-based GAN synthetic generation of ground motion signals or seismic events, along with comprehensive insight into related seismic geophysical studies. This study may be relevant especially for earth and planetary science, geology and seismology, and oil and gas exploration, and, on the other hand, for assessing the seismic response of buildings and infrastructure, seismic detection tasks, and general structural and civil engineering applications. Furthermore, highlighting the strengths and limitations of current studies on adversarial learning applied to seismology may help to guide research efforts in the near future toward the most promising directions.
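    To ground the idea, here is a minimal GAN sketch (illustrative, not any specific model from the review) for 1-D signal augmentation: an MLP generator maps noise to short synthetic "waveforms" while a discriminator learns to separate them from real samples, with damped sinusoids standing in for recorded signals.

        # Sketch: adversarial training of a generator for short 1-D signals.
        import torch
        import torch.nn as nn

        sig_len, z_dim = 64, 16
        G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, sig_len), nn.Tanh())
        D = nn.Sequential(nn.Linear(sig_len, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
        opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
        opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
        bce = nn.BCEWithLogitsLoss()

        t = torch.linspace(0, 1, sig_len)
        def real_batch(n=64):                      # damped sinusoids as stand-in data
            f = torch.rand(n, 1) * 8 + 2
            return torch.sin(2 * torch.pi * f * t) * torch.exp(-3 * t)

        for step in range(2000):
            fake = G(torch.randn(64, z_dim))
            # Discriminator: push real toward 1, fake toward 0.
            loss_d = bce(D(real_batch()), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
            # Generator: fool the discriminator.
            loss_g = bce(D(fake), torch.ones(64, 1))
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()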

    Optimization of deepwater channel seismic reservoir characterization using seismic attributes and machine learning

    Accurate subsurface reservoir mapping is essential for resource exploration. In uncalibrated basins, seismic data, though often limited in resolution, frequency, and quality, become the primary information source due to the unavailability of well logs and core data. Seismic attributes, while integral for understanding subsurface structures, visually limit interpreters to working with only three of them at once. Conversely, machine learning models, though capable of handling numerous attributes, are often seen as inscrutable "black boxes," complicating the interpretation of their predictions and uncertainties. To address these challenges, a comprehensive approach was undertaken, involving a detailed 3D model from Chilean Patagonia's Tres Pasos Formation with synthetic seismic data. The synthetic data served as a benchmark for conducting sensitivity analysis on seismic attributes, offering insights for parameter and workflow optimization. The study also evaluated the uncertainty in unsupervised and supervised machine learning for deepwater facies prediction through qualitative and quantitative assessments. Key findings of the study include: 1) High-frequency data and smaller analysis windows provide clearer channel images, while low-frequency data and larger windows create composite appearances, particularly in small stratigraphic features. 2) GTM and SOM exhibited similar performance, with error rates around 2% for the predominant facies but significantly higher for individual channel-related facies. This suggests that unbalanced data result in higher errors for minor facies and that a reduction in clusters, or a simplified model, may better represent reservoir versus non-reservoir facies. 3) Resolution and data distribution significantly impact predictability, leading to non-uniqueness in cluster generation, which applies to supervised models as well, strengthening the argument that understanding the limitations of seismic data is crucial. 4) Uncertainty in seismic facies prediction is influenced by factors such as training attribute selection, original facies proportions (e.g., imbalanced data), variable errors, and data quality. While optimized random forests achieved an 80% accuracy rate, validation accuracy was lower, emphasizing the need to address uncertainties and their role in interpretation. Overall, the utilization of ground-truth seismic data derived from outcrops offers valuable insights into the strengths and challenges of machine learning in subsurface applications, where accurate predictions are critical for decision-making and safety in the energy sector.
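    The supervised side of such a workflow can be sketched as follows (synthetic placeholder data, not the Tres Pasos model): a random forest predicts facies labels from seismic attributes, with held-out validation exposing the train/validation gap the study emphasizes; class weighting is one common response to the imbalance issue noted above.

        # Sketch: random-forest facies prediction with a validation split.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        attrs = rng.normal(size=(2000, 6))         # 6 hypothetical seismic attributes
        facies = (attrs[:, 0] + 0.5 * attrs[:, 1] + rng.normal(0, 0.8, 2000) > 0.8).astype(int)

        X_tr, X_va, y_tr, y_va = train_test_split(attrs, facies, test_size=0.3,
                                                  random_state=0, stratify=facies)
        rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                    random_state=0).fit(X_tr, y_tr)
        print("train acc:", rf.score(X_tr, y_tr), "validation acc:", rf.score(X_va, y_va))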

    Recent Advances in Image Restoration with Applications to Real World Problems

    In the past few decades, imaging hardware has improved tremendously in terms of resolution, enabling the widespread use of images in many diverse applications on Earth and in planetary missions. However, practical issues associated with image acquisition still affect image quality. Some of these issues, such as blurring, measurement noise, mosaicing artifacts, and low spatial or spectral resolution, can seriously affect the accuracy of the aforementioned applications. This book intends to provide the reader with a glimpse of the latest developments and recent advances in image restoration, which include image super-resolution, image fusion to enhance spatial, spectral, and temporal resolution, and the generation of synthetic images using deep learning techniques. Some practical applications are also included.
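    One of the restoration tasks touched on above, deblurring, can be sketched in a few lines with Richardson-Lucy deconvolution from scikit-image; the blur kernel is assumed known here, which real problems rarely allow.

        # Sketch: deblur an image whose point-spread function (PSF) is known.
        import numpy as np
        from scipy.signal import convolve2d
        from skimage.restoration import richardson_lucy

        rng = np.random.default_rng(0)
        image = rng.random((64, 64))
        psf = np.ones((5, 5)) / 25                 # simple box blur as the PSF

        blurred = convolve2d(image, psf, mode="same", boundary="symm")
        restored = richardson_lucy(blurred, psf, 30)   # 30 deconvolution iterations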