
    Data-driven design of intelligent wireless networks: an overview and tutorial

    Data science or "data-driven research" is a research approach that uses real-life data to gain insight into the behavior of systems. It enables the analysis of small and simple as well as large and more complex systems, in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas, such as large-scale social networks and advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware-specific influences. These interactions can lead to a difference between real-world functioning and design-time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; and (v) provides the reader with the necessary datasets and scripts to go through the tutorial steps themselves.
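    As a rough illustration of the knowledge-discovery step in (iv), the sketch below classifies device types from per-device traffic features with a random forest. The feature names, data, and choice of classifier are illustrative assumptions, not the pipeline used in the paper.

```python
# Minimal sketch: classify device types from aggregated traffic features.
# Feature names, data, and classifier are illustrative assumptions only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical per-device features extracted from raw traffic traces.
traces = pd.DataFrame({
    "mean_pkt_size":      [62, 60, 66, 1400, 1380, 1420, 510, 498, 505],
    "mean_inter_arrival": [0.90, 1.10, 0.95, 0.01, 0.02, 0.01, 0.20, 0.25, 0.22],
    "pkts_per_minute":    [4, 3, 5, 5200, 4900, 5100, 310, 290, 305],
    "device_type":        ["sensor"] * 3 + ["camera"] * 3 + ["phone"] * 3,
})

X = traces.drop(columns="device_type")
y = traces["device_type"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```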

    On Distribution Asset Management: Development of Replacement Strategies

    Presented at IEEE PES PowerAfrica 2007 Conference and Exposition. ©2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes, or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Digital Object Identifier: 10.1109/PESAFR.2007.4498062. The components of electricity networks are ageing. It is expected that within a horizon of 15 years their performance will deteriorate significantly, while the costs of operating the networks will increase enormously. The main problem is that a significant part of the asset population was installed in the same period, resulting in a highly concentrated number of failures within a short time. The currently applied replacement strategy has to be revisited in order to accommodate the effects of ageing assets: higher maintenance costs, high failure rates, and a steep increase in capital expenditure (CAPEX). The work reported here was supported by a large number of utilities in North America and by the US Department of Energy under award number DE-FC02-04CH11237.

    Learning workload behaviour models from monitored time-series for resource estimation towards data center optimization

    In recent years there has been an extraordinary growth in the demand for Cloud Computing resources executed in Data Centers. Modern Data Centers are complex systems that need management. As distributed computing systems grow, and workloads benefit from such computing environments, the management of such systems increases in complexity. The complexity of resource usage and power consumption of cloud-based applications makes it difficult to understand application behavior through expert examination. The difficulty increases when applications are seen as "black boxes", where only external monitoring can be retrieved. Furthermore, given the wide variety of scenarios and applications, automation is required. To deal with such complexity, Machine Learning methods become crucial to facilitate tasks that can be automatically learned from data. Firstly, this thesis proposes an unsupervised learning technique to learn high-level representations from workload traces. This technique provides a fast methodology to characterize workloads as sequences of abstract phases. The learned phase representation is validated on a variety of datasets and used in an auto-scaling task, where we show that it can be applied in a production environment, achieving better performance than other state-of-the-art techniques. Secondly, this thesis proposes a neural architecture, based on Sequence-to-Sequence models, that provides the expected resource usage of applications sharing hardware resources. The proposed technique gives resource managers the ability to predict resource usage over time as well as the completion time of the running applications. The technique yields lower error when predicting usage than other popular Machine Learning methods. Thirdly, this thesis proposes a technique for auto-tuning Big Data workloads over their available tunable parameters. The proposed technique gathers information from the logs of an application, generating a feature descriptor that captures relevant information about the application to be tuned. Using this information, we demonstrate that performance models can generalize up to 34% better than other state-of-the-art solutions. Moreover, the search time to find a suitable solution can be drastically reduced, with up to a 12x speedup and results of almost equal quality to modern solutions. These results show that modern learning algorithms, given the right feature information, provide powerful techniques to manage resource allocation for applications running in cloud environments. This thesis demonstrates that learning algorithms enable relevant optimizations in Data Center environments, where applications are externally monitored and careful resource management is paramount to using computing resources efficiently. We demonstrate this thesis in three areas that orbit around resource management in server environments.
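    As a loose illustration of characterising a monitored workload as a sequence of abstract phases, the sketch below windows a synthetic CPU/memory trace and clusters the windows with k-means, using each cluster id as a phase label. The windowing and the use of k-means are assumptions for illustration; the thesis's actual representation-learning method may differ.

```python
# Minimal sketch: describe a monitored trace as a sequence of abstract phases.
# The synthetic trace, windowing scheme, and k-means clustering are
# illustrative assumptions only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical monitored trace: CPU and memory utilisation sampled per minute.
cpu = np.concatenate([rng.normal(0.2, 0.05, 60),   # idle-like phase
                      rng.normal(0.8, 0.05, 60),   # compute-heavy phase
                      rng.normal(0.4, 0.05, 60)])  # mixed phase
mem = np.concatenate([rng.normal(0.3, 0.05, 60),
                      rng.normal(0.5, 0.05, 60),
                      rng.normal(0.9, 0.05, 60)])
trace = np.stack([cpu, mem], axis=1)

# Summarise the trace in fixed-length windows, then cluster the windows:
# each cluster id plays the role of an abstract phase label.
window = 10
n_windows = len(trace) // window
features = trace[: n_windows * window].reshape(n_windows, -1)

phases = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print("Phase sequence:", phases)
```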

    Decision Support Based on Bio-PEPA Modeling and Decision Tree Induction: A New Approach, Applied to a Tuberculosis Case Study

    The problem of selecting determinant features and generating an appropriate model structure is a challenge in epidemiological modelling. Disease spread is highly complex, and experts develop their understanding of its dynamics over years. There is an increasing variety and volume of epidemiological data, which adds to the potential confusion. We propose here to make use of that data to better understand disease systems. Decision tree techniques have been extensively used to extract pertinent information and improve decision making. In this paper, we propose an innovative structured approach combining decision tree induction with Bio-PEPA computational modelling, and illustrate the approach through application to tuberculosis. By using decision tree induction, the enhanced Bio-PEPA model shows considerable improvement over the initial model with regard to how well the simulated results match observed data. The key finding is that the developer can express a realistic predictive model using relevant features; used as decision support, this approach empowers the epidemiologist in policy decision making.
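    As a small illustration of the decision-tree-induction half of the approach, the sketch below induces a tree over candidate epidemiological features and inspects which features it relies on, the kind of signal that could guide model structure. The feature names and data are hypothetical, and the Bio-PEPA modelling step is not shown.

```python
# Minimal sketch: decision tree induction over candidate features, then
# feature importances as a hint for model structure. Features and data
# are hypothetical; this is not the paper's actual case-study data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 500
feature_names = ["hiv_positive", "age", "urban_residence", "prior_treatment"]
X = np.column_stack([
    rng.integers(0, 2, n),        # hiv_positive
    rng.integers(15, 80, n),      # age
    rng.integers(0, 2, n),        # urban_residence
    rng.integers(0, 2, n),        # prior_treatment
])
# Synthetic outcome: infection driven mainly by two of the features.
risk = 0.15 + 0.35 * X[:, 0] + 0.25 * X[:, 3]
y = rng.random(n) < risk

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
for name, importance in zip(feature_names, tree.feature_importances_):
    print(f"{name:16s} importance = {importance:.2f}")
```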

    Software Performance Engineering using Virtual Time Program Execution

    In this thesis we introduce a novel approach to software performance engineering that is based on the execution of code in virtual time. Virtual time execution models the timing behaviour of unmodified applications by scaling observed method times or replacing them with results acquired from performance model simulation. This facilitates the investigation of "what-if" performance predictions for applications comprising an arbitrary combination of real code and performance models. The ability to analyse code and models in a single framework enables performance testing throughout the software lifecycle, without the need to extract performance models from code. This is accomplished by forcing thread scheduling decisions to take into account the hypothetical time-scaling or model-based performance specifications of each method. The virtual time execution of I/O operations and multicore targets is also investigated. We explore these ideas using a Virtual EXecution (VEX) framework, which provides performance predictions for multi-threaded applications. The language-independent VEX core is driven by an instrumentation layer that notifies it of thread state changes and method profiling events; it is then up to VEX to control the progress of application threads in virtual time on top of the operating system scheduler. We also describe a Java Instrumentation Environment (JINE), demonstrating the challenges involved in virtual time execution at the JVM level. We evaluate the VEX/JINE tools by executing client-side Java benchmarks in virtual time and identifying the causes of deviations from observed real times. Our results show that VEX and JINE transparently provide predictions for the response time of unmodified applications with typically good accuracy (within 5-10%) and low simulation overheads (25-50% additional time). We conclude this thesis with a case study that shows how models and code can be integrated, illustrating our vision of how virtual time execution can support performance testing throughout the software lifecycle.

    Arrival Metering Precision Study

    This paper describes the background, method and results of the Arrival Metering Precision Study (AMPS) conducted in the Airspace Operations Laboratory at NASA Ames Research Center in May 2014. The simulation study measured delivery accuracy, flight efficiency, controller workload, and acceptability of time-based metering operations to a meter fix at the terminal area boundary for different resolution levels of metering delay times displayed to the air traffic controllers and different levels of airspeed information made available to the Time-Based Flow Management (TBFM) system computing the delay. The results show that the resolution of the delay countdown timer (DCT) on the controllers' display has a significant impact on delivery accuracy at the meter fix. Using the 10-second rounded and 1-minute rounded DCT resolutions resulted in more accurate delivery than the 1-minute truncated resolution, and these resolutions were preferred by the controllers. Using the speeds the controllers entered into the fourth line of the data tag to update the delay computation in TBFM in high- and low-altitude sectors increased air traffic control efficiency and reduced fuel burn for arriving aircraft during time-based metering.

    Analysis and Approximation of Optimal Co-Scheduling on CMP

    In recent years, increasing design complexity and the problems of power and heat dissipation have caused a shift in processor technology to favor Chip Multiprocessors. In Chip Multiprocessor (CMP) architectures, it is common for multiple cores to share some on-chip cache. The sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling is an approach to tackling the problem by assigning jobs to cores appropriately so that the contention and consequent performance degradations are minimized. This dissertation aims to tackle two of the most prominent challenges in job co-scheduling. The first challenge is the computational complexity of determining optimal job co-schedules. This dissertation presents one of the first systematic analyses of the complexity of job co-scheduling. Besides proving the NP-completeness of job co-scheduling, it introduces a set of algorithms, based on graph theory and Integer/Linear Programming, for computing optimal co-schedules or their lower bounds in scenarios with or without job migrations. For complex cases, it empirically demonstrates the feasibility of approximating the optimal schedules effectively by proposing several heuristics-based algorithms. These discoveries facilitate the assessment of job co-schedulers by providing necessary baselines, and shed insight on the development of practical co-scheduling systems. The second challenge resides in predicting the performance of processes co-running on a shared cache. This dissertation explores the influence of co-runners, program inputs, and cache configurations on co-run performance prediction. Through a sequence of formal analyses, we derive an analytical co-run locality model, uncovering the inherent statistical connections between the data references of programs' single runs and their co-run locality. The model offers theoretical insights into co-run locality analysis and leads to a lightweight approach for fast prediction of shared cache performance. We demonstrate the effectiveness of the model in enabling proactive job co-scheduling. Together, the findings along these two dimensions open up many new opportunities for cache management on modern CMPs by laying the foundation for job co-scheduling and significantly enhancing the understanding of data locality and cache sharing.
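    One way to see the graph-theoretic formulation mentioned above: for dual-core chips, pairing jobs to minimise total co-run degradation is a minimum-weight perfect matching problem. The sketch below solves a toy instance with networkx; the degradation numbers are made up for illustration, and this is not the dissertation's actual algorithm.

```python
# Minimal sketch: co-scheduling on dual-core chips as minimum-weight
# perfect matching. Degradation values are illustrative assumptions.
import networkx as nx

# degradation[(a, b)]: combined slowdown when jobs a and b share a cache.
degradation = {
    ("A", "B"): 0.35, ("A", "C"): 0.10, ("A", "D"): 0.25,
    ("B", "C"): 0.30, ("B", "D"): 0.15, ("C", "D"): 0.40,
}

G = nx.Graph()
for (a, b), d in degradation.items():
    G.add_edge(a, b, weight=d)

# Each matched pair of jobs is assigned to one dual-core chip.
pairs = nx.min_weight_matching(G)
total = sum(G[a][b]["weight"] for a, b in pairs)
print("Co-schedule:", sorted(tuple(sorted(p)) for p in pairs))
print("Total degradation:", round(total, 2))
```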

    Topics in high dimensional energy forecasting

    The forecasting of future energy consumption and generation is now an essential part of power system operation. In networks with high renewable power penetration, forecasts are used to help maintain security of supply and to operate the system efficiently. Historically, uncertainties have always been present on the demand side of the network; with the growth of weather-dependent renewables, they are now also present on the generation side. Here, we focus on forecasting for wind energy applications at the day(s)-ahead scale. Most of the work developed is for power forecasting, although we also identify an emerging opportunity in access forecasting for offshore operations. Power forecasts are used by traders, power system operators, and asset owners to optimise decision making based on future generation. Several novel methodologies are presented based on post-processing Numerical Weather Predictions (NWP) with measured data, using modern statistical learning techniques; they are linked by the increasingly relevant challenge of dealing with high-dimensional data. The term ‘high-dimensional’ means different things to different people, depending on their background. To statisticians, high dimensionality occurs when the dimensions of the problem are greater than the number of observations, i.e. the classic p >> n problem, an example of which can be found in Chapter 7. In this work we take the more general view that a high-dimensional dataset is one with a high number of attributes or features. In wind energy forecasting applications, this can occur in the input and/or output variable space. For example, multivariate forecasting of spatially distributed wind farms can be a potentially very high dimensional problem, but so is feature engineering using ultra-high-resolution NWP in this framework. Most of the work in this thesis is based on various forms of probabilistic forecasting. Probabilistic forecasts are essential for risk management, but also for risk-neutral participants in asymmetrically penalised electricity markets. Uncertainty is always present; it is merely hidden in deterministic, i.e. point, forecasts. This aspect of forecasting has been the subject of a concerted research effort over the last few years in the energy forecasting literature. However, we identify and address gaps in the literature related to dealing with high-dimensional data on both the input and output side of the modelling chain. It is not necessarily the case that increasing the resolution of the weather forecast increases the skill, and therefore reduces the errors, of the resulting power forecast. In fact, when judged by typical average scoring rules, high-resolution forecasts often perform worse than smoother forecasts from lower-resolution models due to spatial and/or temporal displacement errors. Here, we evaluate the potential of using ultra-high-resolution weather models for offshore power forecasting, using feature engineering and modern statistical learning techniques. Two methods for creating improved probabilistic wind power forecasts through the use of turbine-level data are proposed. Although standard-resolution NWP data is used, high dimensionality is now present in the output variable space; the two methods scale with the number of turbines present in the wind farm, although to different extents. A methodology for regime-switching multivariate wind power forecasting is also elaborated, with a case study demonstrated on 92 wind balancing mechanism units connected to the GB network. Finally, we look at an emerging topic in energy forecasting: offshore access forecasting. Improving access is a priority in the offshore wind sector, driven by the opportunity to increase revenues, reduce costs, and improve safety at operational wind farms. We describe a novel methodology for producing probabilistic forecasts of access conditions during crew transfers.
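    As a minimal sketch of probabilistic power forecasting by post-processing NWP features, the example below fits one quantile-regression gradient boosting model per quantile on synthetic data. The features, power curve, and model choice are illustrative assumptions rather than the methods developed in the thesis.

```python
# Minimal sketch: probabilistic wind power forecast via quantile regression
# on NWP-style features, one gradient boosting model per quantile.
# Data, features, and power curve are synthetic assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 2000
# Hypothetical NWP features at the wind farm: wind speed (m/s) and direction (deg).
wind_speed = rng.uniform(0, 25, n)
wind_dir = rng.uniform(0, 360, n)
X = np.column_stack([wind_speed, wind_dir])
# Synthetic normalised power: a noisy logistic power curve.
power = 1 / (1 + np.exp(-(wind_speed - 10))) + rng.normal(0, 0.08, n)
y = np.clip(power, 0, 1)

quantiles = [0.1, 0.5, 0.9]
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q,
                                 n_estimators=200, max_depth=3).fit(X, y)
    for q in quantiles
}

x_new = np.array([[12.0, 270.0]])  # day-ahead NWP wind speed and direction
for q in quantiles:
    print(f"q{int(q * 100):02d} power forecast: {models[q].predict(x_new)[0]:.2f}")
```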