94 research outputs found

    A survey on the development status and application prospects of knowledge graph in smart grids

    With the advent of the electric power big data era, semantic interoperability and interconnection of power data have received extensive attention. Knowledge graph technology is a new method for describing the complex relationships between concepts and entities in the objective world, and it has attracted wide attention because of its robust knowledge inference ability. In particular, with the proliferation of measurement devices and the exponential growth of electric power data, the electric power knowledge graph provides new opportunities to resolve the contradiction between massive power resources and the continuously increasing demand for intelligent applications. In an attempt to fulfil the potential of knowledge graphs and deal with the various challenges faced, as well as to obtain insights for achieving business applications of smart grids, this work first presents a holistic study of knowledge-driven intelligent application integration. Specifically, a detailed overview of electric power knowledge mining is provided. Then, an overview of the knowledge graph in smart grids is introduced. Moreover, the architecture of the big knowledge graph platform for smart grids and its critical technologies are described. Furthermore, this paper comprehensively elaborates on the application prospects of knowledge graphs oriented to smart grids, power consumer service, decision-making in dispatching, and operation and maintenance of power equipment. Finally, issues and challenges are summarised. Comment: IET Generation, Transmission & Distribution

    Scheduling in MapReduce Clusters

    MapReduce is a framework proposed by Google for processing huge amounts of data in a distributed environment. The simplicity of the programming model and the fault-tolerance feature of the framework make it very popular in Big Data processing. As MapReduce clusters become popular, their scheduling becomes increasingly important. On one hand, many MapReduce applications have high performance requirements, for example, on response time and/or throughput. On the other hand, with the increasing size of MapReduce clusters, energy-efficient scheduling of MapReduce clusters becomes inevitable. These scheduling challenges, however, have not been systematically studied. The objective of this dissertation is to provide MapReduce applications with low cost and energy consumption through the development of scheduling theory and algorithms, energy models, and energy-aware resource management. In particular, we will investigate energy-efficient scheduling in hybrid CPU-GPU MapReduce clusters. This research work is expected to bring a breakthrough in Big Data processing, particularly in providing green computing to Big Data applications such as social network analysis, medical care data mining, and financial fraud detection. The tools we propose to develop are expected to increase utilization and reduce energy consumption for MapReduce clusters. In this PhD dissertation, we propose to address the aforementioned challenges by investigating and developing 1) a match-making scheduling algorithm for improving the data locality of MapReduce applications, 2) a real-time scheduling algorithm for heterogeneous MapReduce clusters, and 3) an energy-efficient scheduler for hybrid CPU-GPU MapReduce clusters. Advisers: Ying Lu and David Swanson

    Stateful data-parallel processing

    Democratisation of data means that more people than ever are involved in the data analysis process. This is beneficial, as it brings domain-specific knowledge from broad fields, but data scientists do not have adequate tools to write algorithms and execute them at scale. Processing models of current data-parallel processing systems, designed for scalability and fault tolerance, are stateless. Stateless processing facilitates capturing parallelisation opportunities and hides fault tolerance. However, data scientists want to write stateful programs (with explicit state that they can update, such as matrices in machine learning algorithms) and are used to imperative-style languages. These programs struggle to execute with high performance in stateless data-parallel systems. Representing state explicitly makes data-parallel processing at scale challenging. To achieve scalability, state must be distributed and coordinated across machines. In the event of failures, state must be recovered to provide correct results. We introduce stateful data-parallel processing that addresses the previous challenges by: (i) representing state as a first-class citizen so that a system can manipulate it; (ii) introducing two distributed mutable state abstractions for scalability; and (iii) providing an integrated approach to scale out and fault tolerance that recovers large state spanning the memory of multiple machines. To support imperative-style programs, a static analysis tool analyses Java programs that manipulate state and translates them to a representation that can execute on SEEP, an implementation of a stateful data-parallel processing model. SEEP is evaluated with stateful Big Data applications and shows comparable or better performance than state-of-the-art stateless systems. Open Access

    Traffic and task allocation in networks and the cloud

    Communication services such as telephony, broadband and TV are increasingly migrating into Internet Protocol (IP) based networks because of the consolidation of telephone and data networks. Meanwhile, the increasingly wide application of Cloud Computing enables the accommodation of tens of thousands of applications from the general public or enterprise users which make use of Cloud services on-demand through IP networks such as the Internet. Real-Time services over IP (RTIP) have also become increasingly significant due to the convergence of network services, and the real-time needs of the Internet of Things (IoT) will strengthen this trend. Such Real-Time applications have strict Quality of Service (QoS) constraints, posing a major challenge for IP networks. The Cognitive Packet Network (CPN) has been designed as a QoS-driven protocol that addresses user-oriented QoS demands by adaptively routing packets based on online sensing and measurement. Thus in this thesis we first describe our design for a novel "Real-Time (RT) traffic over CPN" protocol which uses QoS goals that match the needs of voice packet delivery in the presence of other background traffic under varied traffic conditions; we present its experimental evaluation via measurements of key QoS metrics such as packet delay, delay variation (jitter) and packet loss ratio. Pursuing our investigation of packet routing in the Internet, we then propose a novel Big Data and Machine Learning approach for real-time Internet scale Route Optimisation based on Quality-of-Service using an overlay network, and evaluate its performance. Based on the collection of data sampled every 22 minutes over a large number of source-destination pairs, we observe that intercontinental Internet Protocol (IP) paths are far from optimal with respect to metrics such as end-to-end round-trip delay. On the other hand, our machine learning based overlay network routing scheme exploits large scale data collected from communicating node pairs to select overlay paths, while it uses IP between neighbouring overlay nodes. We report measurements over a week-long experiment with several million data points, which show substantially better end-to-end QoS than is observed with pure IP routing. Pursuing the machine learning approach, we then address the challenging problem of dispatching incoming tasks to servers in Cloud systems so as to offer the best QoS and reliable job execution; an experimental system (the Task Allocation Platform) that we have developed is presented and used to compare several task allocation schemes, including a model driven algorithm, a reinforcement learning based scheme, and a "sensible" allocation algorithm that assigns tasks to sub-systems that are observed to provide lower response time. These schemes are compared via measurements both among themselves and against a standard round-robin scheduler, with two architectures (with homogeneous and heterogeneous hosts having different processing capacities), and the conditions under which the different schemes offer better QoS are discussed. Since Cloud systems include both locally based servers at user premises and remote servers and multiple Clouds that can be reached over the Internet, we also describe a smart distributed system that combines local and remote Cloud facilities, allocating tasks dynamically to the service that offers the best overall QoS, and that includes a routing overlay which minimizes network delay for data transfer between Clouds. Internet-scale experiments that we report exhibit the effectiveness of our approach in adaptively distributing workload across multiple Clouds. Open Access

    Quality of Service Aware Data Stream Processing for Highly Dynamic and Scalable Applications

    Huge amounts of georeferenced data streams arrive daily at data stream management systems that are deployed to serve highly scalable and dynamic applications. There are innumerable ways in which those loads can be exploited to gain deep insights in various domains. Decision makers require an interactive visualization of such data in the form of maps and dashboards for decision making and strategic planning. Data streams normally exhibit fluctuation and oscillation in arrival rates and skewness. Those are the two predominant factors that greatly impact the overall quality of service. This requires data stream management systems to be attuned to those factors, in addition to the spatial shape of the data, which may exacerbate the negative impact of those factors. Current systems do not natively support services with quality guarantees for dynamic scenarios, leaving the handling of those logistics to the user, which is challenging and cumbersome. Three workloads are predominant for any data stream: batch processing, scalable storage and stream processing. In this thesis, we have designed a quality of service aware system, SpatialDSMS, that comprises several subsystems covering those loads and any mixed load that results from intermixing them. Most importantly, we have natively incorporated quality of service optimizations for processing avalanches of georeferenced data streams in highly dynamic application scenarios. This has been achieved transparently on top of the codebases of emerging de facto standard best-in-class representatives, thus relieving users in the presentation layer of having to reason about those services. Instead, users express their queries with quality goals, and our system optimizers compile those down into query plans with an embedded quality guarantee and leave logistic handling to the underlying layers. We have developed standard-compliant prototypes for all the subsystems that constitute SpatialDSMS.

    QoE on media delivery in 5G environments

    231 p. 5G will expand mobile networks with greater bandwidth, lower latency and the ability to provide massive, failure-free connectivity. Users of multimedia services expect a smooth playback experience that adapts dynamically to their interests and their mobility context. However, the network, by adopting a neutral position, does not help to strengthen the parameters that affect quality of experience. Consequently, solutions designed to deliver multimedia traffic dynamically and efficiently are of particular interest. To improve the quality of experience of multimedia services in 5G environments, the research carried out in this thesis has designed a multi-part system based on four contributions. The first mechanism, SaW, creates an elastic farm of computing resources that execute multimedia analysis tasks. The results confirm the competitiveness of this approach compared with server farms. The second mechanism, LAMB-DASH, selects the quality in the media player with a design that requires low processing complexity. The tests demonstrate its ability to improve the stability, consistency and uniformity of the quality of experience among the clients sharing a network cell. The third mechanism, MEC4FAIR, exploits 5G capabilities for analysing delivery metrics of the different streams. The results show how it enables the service to coordinate the different clients in the cell to improve quality of service. The fourth mechanism, CogNet, serves to provision network resources and configure a topology capable of switching an estimated demand while guaranteeing quality-of-service bounds. In this case, the results show greater accuracy when the demand for a service is higher

    Big Data Security (Volume 3)

    After a short description of the key concepts of big data, the book explores the secrecy and security threats posed especially by cloud-based data storage. It delivers conceptual frameworks and models along with case studies of recent technology

    Secure data sharing in cloud computing: a comprehensive review

    Cloud computing is an emerging technology that relies on sharing computing resources. Sharing data within a group is not secure because the cloud provider cannot be trusted. The fundamental difficulties that cloud providers face in distributed computing are data security, sharing, resource scheduling and energy consumption. A key-aggregate cryptosystem is used to secure private/public data in the cloud. The aggregate key has a constant size and allows flexible choices of ciphertext in cloud storage. Virtual machine (VM) provisioning enables cloud providers to use their available resources effectively and obtain higher profits. An effective method for sharing data resources among the members of a group in cloud storage must be secure, flexible and efficient. If data stored in different cloud data centers is corrupted, it is recovered using regenerative coding. Security is provided by many techniques, such as forward security, backward security, the key-aggregate cryptosystem, encryption and re-encryption. Energy consumption is reduced using Energy-Efficient Virtual Machines Scheduling in Multi-Tenant Data Centers.
