
    Large Scale Data Streaming

    Over the last few years, applications that require real-time processing of huge amounts of data have been pushing the limits of traditional data processing infrastructure. Many applications in domains such as telecommunications, large-scale sensor networks, finance, online services, computer network management and security require real-time processing of continuous data flows; systems that perform this kind of computation are usually called Data Stream Management Systems (DSMSs) or Stream Processing Engines (SPEs). Traditional Database Management Systems (DBMSs) implement the store-then-process paradigm: data must be stored (persistently) and indexed before they can be processed, and processing is asynchronous with respect to data arrival. In DSMSs, data streams are not stored but are processed on the fly using continuous queries: the query stands permanently over the streaming data and results are output continuously. One of the best-known and most widely used DSMSs is Storm. Storm is a powerful tool with a simple programming model, but it does not provide built-in implementations of stream-oriented operators. This is a strong limitation, because the user is forced to write a case-specific implementation every time. The goal of the work described in this thesis is to build a distributed real-time computation system on top of Storm, called Enhanced Storm, that provides the user with built-in relational algebra and database-specific operators for streaming computation. Enhanced Storm retains Storm's fault tolerance and scalability, so the user gets a generic, high-performing and easy-to-use system. Enhanced Storm was developed at the Distributed System Laboratory (LSD) of the Universidad Politecnica de Madrid (UPM) [UPM].
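    To make the contrast between store-then-process and continuous queries concrete, the following is a minimal sketch of two stream operators (a relational selection and a time-based sliding-window sum) written as Python generators. It is not Storm's API and not Enhanced Storm's implementation; the names select and sliding_sum, the (timestamp, key, value) tuple layout, and the assumption that timestamps arrive in order are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, Tuple

# Hypothetical tuple layout: (timestamp in seconds, key, value).
Event = Tuple[float, str, float]


def select(stream: Iterable[Event], pred: Callable[[Event], bool]) -> Iterator[Event]:
    """Relational selection applied on the fly: one pass, nothing is stored."""
    for ev in stream:
        if pred(ev):
            yield ev


def sliding_sum(stream: Iterable[Event], window: float) -> Iterator[Tuple[float, float]]:
    """Continuous query: after each arrival at time t, emit the sum of the
    values whose timestamps fall in (t - window, t].

    Assumes tuples arrive in timestamp order; only the tuples still inside
    the window are buffered, never the whole stream."""
    buf: Deque[Event] = deque()
    total = 0.0
    for ev in stream:
        ts, _, value = ev
        buf.append(ev)
        total += value
        # Evict tuples that have fallen out of the window.
        while buf and buf[0][0] <= ts - window:
            total -= buf.popleft()[2]
        yield ts, total
```

    Usage: with events = [(0.0, "a", 1.0), (1.0, "b", 2.0), (2.0, "a", 3.0), (3.5, "a", 4.0)], the pipeline list(sliding_sum(select(events, lambda e: e[1] == "a"), window=2.0)) yields [(0.0, 1.0), (2.0, 3.0), (3.5, 7.0)]. Because generators compose, operators can be chained the way a query plan chains relational operators, while results are still produced as each tuple arrives.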

    Performance assessment of real-time data management on wireless sensor networks

    Technological advances in recent years have allowed Wireless Sensor Networks (WSNs), which aim at environmental monitoring and data collection, to mature. Such a network is composed of hundreds, thousands or possibly even millions of tiny smart computers known as wireless sensor nodes, which may be battery powered and are equipped with sensors, a radio transceiver, a Central Processing Unit (CPU) and some memory. However, because nodes must be small and low-cost, their resources, such as processing power, storage and especially energy, are very limited. Once the sensors take their measurements of the environment, the problem of storing and querying the data arises: the sensors have restricted storage capacity, and the ongoing interaction between sensors and environment produces huge amounts of data. Techniques for data storage and querying in WSNs can be based on either external storage or local storage. External storage, called the warehousing approach, is a centralized system in which the data gathered by the sensors are periodically sent to a central database server where user queries are processed. Local storage, on the other hand, called the distributed approach, exploits the computing capabilities of the sensors, which act as local databases. The data is stored both in a central database server and in the devices themselves, so that both can be queried. WSNs are used in a wide variety of applications, which may perform certain operations on the collected sensor data. For certain applications, such as real-time applications, the sensor data must closely reflect the current state of the targeted environment. However, the environment changes constantly, and the data is collected at discrete moments in time. As such, the collected data has a temporal validity: as time advances, it becomes less accurate, until it no longer reflects the state of the environment.
Thus, such applications, in areas like industrial automation, aviation and sensor networks, must query and analyze the data within a bounded time in order to make decisions and react efficiently. In this context, efficient real-time data management solutions are needed to deal with both time constraints and energy consumption. This thesis studies real-time data management techniques for WSNs. In particular, it focuses on the challenges of handling real-time data storage and querying in WSNs and on efficient real-time data management solutions for them. First, the main specifications of real-time data management are identified, and the real-time data management solutions for WSNs available in the literature are presented. Second, in order to provide an energy-efficient real-time data management solution, the techniques used to manage data and queries in WSNs based on the distributed paradigm are studied in depth. Indeed, many research works argue that the distributed approach, rather than warehousing, is the most energy-efficient way of managing data and queries in WSNs. In addition, this approach can provide quasi-real-time query processing because the most current data is retrieved from the network. Third, based on these two studies and considering the complexity of developing, testing and debugging this kind of system, a model for a simulation framework of real-time database management on WSNs using a distributed approach is proposed, together with its implementation. This framework helps explore various real-time database techniques on WSNs before deployment, saving money and time. The proposed model may be further improved by adding the simulation of protocols or by integrating part of this simulator into another available simulator. To validate the model, a case study considering both real-time and energy constraints is discussed.
Fourth, a new architecture that combines statistical modeling techniques with the distributed approach is proposed, together with a query processing algorithm that optimizes real-time user query processing. This combination enables a query processing algorithm based on admission control that uses the error tolerance and the probabilistic confidence interval as admission parameters. Experiments on real-world as well as synthetic data sets show that the proposed solution optimizes real-time query processing, saving more energy while keeping latency low. Fundação para a Ciência e Tecnologi
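    The abstract does not give the admission rule itself. As a hedged illustration only, the sketch below shows one plausible form of admission control driven by an error tolerance and a confidence level: a query is answered from the statistical model when the model's two-sided confidence interval is narrow enough, and is otherwise sent to the sensor network. The Gaussian predictive model, the Prediction type, and the functions admit_from_model, answer_query and read_sensor are all assumptions, not the thesis's algorithm.

```python
from dataclasses import dataclass
from statistics import NormalDist
from typing import Callable, Tuple


@dataclass(frozen=True)
class Prediction:
    # Assumed Gaussian predictive model of one sensor reading.
    mean: float
    stddev: float


def admit_from_model(pred: Prediction, error_tolerance: float, confidence: float) -> bool:
    """Admit the model answer if the half-width of its two-sided confidence
    interval at the requested confidence is within the error tolerance."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # e.g. about 1.96 for 0.95
    return z * pred.stddev <= error_tolerance


def answer_query(pred: Prediction, error_tolerance: float, confidence: float,
                 read_sensor: Callable[[], float]) -> Tuple[float, str]:
    """Hypothetical query path: answer from the model (no radio traffic) when
    admitted, otherwise fall back to acquiring a fresh value from the network."""
    if admit_from_model(pred, error_tolerance, confidence):
        return pred.mean, "model"
    return read_sensor(), "network"
```

    With an error tolerance of 0.5 and 95% confidence, a prediction with standard deviation 0.2 has a half-width of about 0.39 and is answered from the model, while one with standard deviation 0.5 (half-width about 0.98) triggers a network read. The energy saving in such a scheme comes from every query the model can answer without radio communication.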

    Distributed dispatchers for partially clairvoyant schedulers

    This work focuses on the empirical evaluation of distributed dispatching strategies on shared-memory and distributed-memory architectures for hard real-time systems. The dispatching model accommodates variability in process parameters and analyzes the effect of variable execution times. Hard real-time systems are modeled in the E-T-C scheduling framework and are dispatched if a valid schedule exists. We examine the dispatchability of Partially Clairvoyant schedules of different sizes and with varying deadlines under reasonable assumptions. The effect of scaling up the number of processors used by the dispatcher is also studied. The results confirm the superiority of the distributed strategies over sequential dispatching, as well as their scalability. Certain system limitations that led to loss of dispatchability in the experiments are also pointed out. The model finds applications in diverse areas such as safety-critical systems, robotics and machine control, and real-time data management, and this approach is targeted at powering up the controllers.

    Cross-Fire: a grid platform to integrate geo-referenced web services for real-time risk management

    Fire propagation simulation tools are useful at different levels of forest fire management, from prescribed fire planning and fuel hazard assessment to the development of fire suppression strategies for wildfires, and even training activities. Nevertheless, real-time use of such tools by operational authorities is still very limited, for several reasons, among them a lack of good real-time data, a lack of training, and a lack of confidence in the capabilities of current systems. Wildfire management is a relevant Civil Protection (CP) activity that involves many different and autonomous actors, from public bodies to research centres, and should somehow reach the general public as an information and alert system. It requires a fast and reliable risk management support system, with real-time or near-real-time availability of critical geo-referenced data and settings-based forecasts of fire spread. CP applications require tight integration of human and physical resources, which must be shared in a coordinated and effective way and be available throughout the whole emergency procedure. The GRID and Virtual Organizations (VOs) enable such integration by providing coordination and sharing of the available interconnected resources (computing, storage, communication, sensors and actuators), geographically scattered across national borders. On the other hand, geo-web services based on OGC (Open Geospatial Consortium) standards are being adopted worldwide as the technology to support the development of complex distributed applications over grid platforms that deal with data from many different sources, including meteorological stations and satellites. Recent work has clearly shown the advantages of the OGC open-standard proposals for geospatial interchange formats over past legacy formats and applications. Fundação para a Ciência e a Tecnologia (FCT

    Analysis of Call Data Record (CDR) using Hadoop Cluster

    The management of big data is the most important issue of this decade, since real-world applications are generating data at the petabyte and zettabyte scale. The most popular solution for big data management is a system based on the Hadoop Distributed File System (HDFS). However, implementing an enterprise-level solution is a challenge because of the sheer volume of data produced. In this project, we apply a Hadoop cluster to telecommunication data, since the telecom domain produces huge amounts of log data about customer calls as well as network equipment. To make the solution more realistic, we focus on call detail data for our big data application. We acquired real-time Call Data Record (CDR) data for our implementation from the telecom operator Banglalink, which serves 30 million users in Bangladesh. To narrow the scope, CDR analytics on the Hadoop cluster is used to identify the top callers in order to improve customer experience. This implementation can also help Banglalink build a similar application for a backup data warehouse using a Hadoop cluster for CDR analytics.
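    A top-callers report is a classic MapReduce aggregation. The sketch below mimics the map, shuffle and reduce phases in plain Python to show the data flow; it does not use Hadoop's API, and the comma-separated record layout (caller, callee, duration in seconds) and the function names map_cdr, shuffle, reduce_count and top_callers are hypothetical, since the actual Banglalink CDR schema is not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_cdr(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (caller, 1) for each well-formed record.
    Hypothetical record layout: caller,callee,duration_seconds"""
    fields = line.strip().split(",")
    if len(fields) >= 3 and fields[0]:
        yield fields[0], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group intermediate values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: total number of calls placed by one caller."""
    return key, sum(values)


def top_callers(lines: Iterable[str], n: int) -> List[Tuple[str, int]]:
    pairs = (kv for line in lines for kv in map_cdr(line))
    counts = [reduce_count(k, vs) for k, vs in shuffle(pairs).items()]
    # Highest call count first; ties broken by caller ID for a stable result.
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))[:n]
```

    On a real cluster, the map and reduce functions would run in parallel on HDFS blocks, and the framework would perform the shuffle. Only the final top-n selection would need a second, small pass over the reduced counts.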

    Different aspects of workflow scheduling in large-scale distributed systems

    As large-scale distributed systems gain momentum, the scheduling of workflow applications with multiple requirements on such computing platforms has become a crucial area of research. In this paper, we investigate the workflow scheduling problem in large-scale distributed systems from the Quality of Service (QoS) and data locality perspectives. We present a scheduling approach that considers two models of synchronization for the tasks in a workflow application: (a) communication through the network and (b) communication through temporary files. Specifically, we investigate via simulation the performance of a heterogeneous distributed system where multiple soft real-time workflow applications arrive dynamically. The applications are scheduled under various tardiness bounds, taking into account the communication cost in the first case study and the I/O cost and data locality in the second. The work presented in this paper has been partially supported by EU, under the COST program Action IC1305, “Network for Sustainable Ultrascale Computing (NESUS)”, and by the Ministerio de Economía y Competitividad, Spain, under the project TIN2013-41350-P, “Scalable Data Management Techniques for High-End Computing Systems”

    Review of intelligent sprinkler irrigation technologies for remote autonomous system

    Changing environmental conditions and water shortages demand systems that can manage irrigation efficiently. Autonomous irrigation systems are developed to optimize water use for agricultural crops. In dry areas, or in case of inadequate rainfall, irrigation becomes difficult, so it needs to be automated for proper yield and handled remotely for farmer safety. The aim of this study is to review the need for soil moisture sensors in irrigation, sensor technology and its applications in irrigation scheduling, and to discuss prospects. The review further discusses the literature on sensors communicating remotely with self-propelled sprinkler irrigation systems, distributed wireless sensor networks, sensors and integrated data management schemes, and autonomous sprinkler control options. On-board and field-distributed sensors can collect the data necessary for real-time irrigation management decisions and transmit the information directly, or through wireless networks, to the main control panel or base computer. Communication systems such as cell phones, satellite radios and internet-based systems are also available, allowing the operator to query the main control panel or base computer from any location at any time. The selection of the communication system for remote access depends on local and regional topography and cost. Traditional irrigation systems may apply unnecessary irrigation to one part of a field while leaving other parts under-irrigated. New sensors or remote sensing capabilities are required to collect real-time data on crop growth status and other weather, crop and soil parameters, in order to support intelligent and efficient irrigation management systems for agricultural processes. Further development of wireless sensor applications in agriculture is also necessary to increase the efficiency, productivity and profitability of farming operations.
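    The point about traditional systems over-irrigating some parts of a field while under-irrigating others is what per-zone, sensor-driven scheduling addresses. The sketch below shows one common form of such a rule, under stated assumptions: each zone reports volumetric soil moisture, a zone is irrigated only when it falls below a refill threshold, and the applied depth replaces the deficit to field capacity over the root zone. The threshold values, the ZoneReading type and the irrigation_plan function are hypothetical and are not taken from the reviewed systems.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ZoneReading:
    zone: str
    soil_moisture: float  # volumetric water content, as a fraction (0..1)


def irrigation_plan(readings: Iterable[ZoneReading], refill_point: float,
                    field_capacity: float, root_zone_depth_mm: float) -> Dict[str, float]:
    """Return an irrigation depth (mm) per zone.

    A zone is irrigated only if its soil moisture has dropped below the
    (hypothetical) refill point; the depth applied replaces the deficit up to
    field capacity over the root zone (depth = deficit fraction x root-zone
    depth). Zones at or above the refill point get 0 mm, so wetter parts of
    the field are not over-irrigated."""
    if not 0.0 < refill_point < field_capacity <= 1.0:
        raise ValueError("expected 0 < refill_point < field_capacity <= 1")
    plan: Dict[str, float] = {}
    for r in readings:
        if r.soil_moisture < refill_point:
            deficit = field_capacity - r.soil_moisture
            plan[r.zone] = round(deficit * root_zone_depth_mm, 1)
        else:
            plan[r.zone] = 0.0
    return plan
```

    For example, with a refill point of 0.22, a field capacity of 0.32 and a 300 mm root zone, a zone reading 0.18 would receive 42 mm while a zone reading 0.30 would receive none. In a deployed system, the plan would be recomputed whenever new readings reach the base computer over the wireless network.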