
    Performance assessment of real-time data management on wireless sensor networks

    Technological advances in recent years have allowed the maturity of Wireless Sensor Networks (WSNs), which aim at performing environmental monitoring and data collection. This sort of network is composed of hundreds, thousands, or even millions of tiny smart computers known as wireless sensor nodes, which may be battery powered and equipped with sensors, a radio transceiver, a Central Processing Unit (CPU), and some memory. However, due to their small size and low-cost requirements, sensor node resources such as processing power, storage, and especially energy are very limited. Once the sensors perform their measurements of the environment, the problem of storing and querying the data arises. In fact, the sensors have restricted storage capacity, and the ongoing interaction between sensors and environment results in huge amounts of data. Techniques for data storage and querying in WSNs can be based on either external storage or local storage. External storage, called the warehousing approach, is a centralized system in which the data gathered by the sensors are periodically sent to a central database server where user queries are processed. Local storage, on the other hand, called the distributed approach, exploits the computational capabilities of the sensors, which act as local databases. Data may thus be stored in a central database server, in the devices themselves, or in both, and queried accordingly. WSNs are used in a wide variety of applications, which may perform certain operations on the collected sensor data. For certain applications, such as real-time applications, the sensor data must closely reflect the current state of the targeted environment. However, the environment changes constantly and the data is collected at discrete moments in time. As such, the collected data has a temporal validity and, as time advances, it becomes less accurate, until it no longer reflects the state of the environment. Thus, applications such as industrial automation, aviation, and sensor networks must query and analyze the data within a bounded time in order to make decisions and react efficiently. In this context, the design of efficient real-time data management solutions is necessary to deal with both time constraints and energy consumption. This thesis studies real-time data management techniques for WSNs. In particular, it focuses on the challenges of handling real-time data storage and querying in WSNs and on efficient real-time data management solutions for WSNs. First, the main specifications of real-time data management are identified and the real-time data management solutions for WSNs available in the literature are presented. Secondly, in order to provide an energy-efficient real-time data management solution, the techniques used to manage data and queries in WSNs based on the distributed paradigm are studied in depth. In fact, many research works argue that the distributed approach is the most energy-efficient way of managing data and queries in WSNs, rather than the warehousing approach. In addition, this approach can provide quasi real-time query processing because the most current data is retrieved from the network. Thirdly, based on these two studies and considering the complexity of developing, testing, and debugging this kind of complex system, a model for a simulation framework for real-time database management on WSNs that uses the distributed approach, together with its implementation, is proposed.
This will help explore various real-time database techniques for WSNs before deployment, saving money and time. Moreover, one may improve the proposed model by adding the simulation of protocols or by porting part of this simulator to another available simulator. To validate the model, a case study considering real-time constraints as well as energy constraints is discussed. Fourth, a new architecture that combines statistical modeling techniques with the distributed approach, together with a query processing algorithm to optimize real-time user query processing, is proposed. This combination allows performing query processing based on admission control, using the error tolerance and the probabilistic confidence interval as admission parameters. Experiments based on real-world data sets as well as synthetic data sets demonstrate that the proposed solution optimizes real-time query processing, saving more energy while maintaining low latency. Fundação para a Ciência e a Tecnologia
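
    The admission-control idea can be illustrated with a small sketch. The following Python illustration is hypothetical and is not the algorithm proposed in the thesis: it assumes a per-sensor Gaussian model maintained at the base station, and it admits a query (answering it from the model instead of the network) only when the model's confidence interval, at the confidence level requested by the query, fits within the query's error tolerance.

```python
# Illustrative sketch (not the thesis's actual algorithm): admission control for a
# user query against a per-sensor statistical model kept at the base station.
# Assumption: recent readings are summarized by a Gaussian model; a query is
# admitted (answered from the model, saving radio traffic) only if the model's
# confidence interval at the requested confidence level fits within the query's
# error tolerance; otherwise it is forwarded into the sensor network.
from statistics import NormalDist, mean, stdev

class SensorModel:
    def __init__(self, readings):
        self.mu = mean(readings)       # model mean
        self.sigma = stdev(readings)   # model standard deviation

    def half_width(self, confidence):
        """Half-width of the two-sided confidence interval around the mean."""
        z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
        return z * self.sigma

def process_query(model, error_tolerance, confidence):
    """Admission control: answer from the model if it is precise enough,
    otherwise fall back to querying the sensor network (expensive)."""
    if model.half_width(confidence) <= error_tolerance:
        return ("model", model.mu)     # admitted: no network access needed
    return ("network", None)           # rejected: dispatch the query to the WSN

# Example: temperature readings collected over the last window.
model = SensorModel([21.2, 21.4, 21.3, 21.5, 21.1, 21.3])
print(process_query(model, error_tolerance=0.5, confidence=0.95))   # likely answered from model
print(process_query(model, error_tolerance=0.05, confidence=0.99))  # likely forwarded to network
```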

    Dagstuhl News January - December 1999

    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic

    Code Generation and Global Optimization Techniques for a Reconfigurable PRAM-NUMA Multicore Architecture


    Scalable Integration View Computation and Maintenance with Parallel, Adaptive and Grouping Techniques

    Materialized integration views constructed by integrating data from multiple distributed data sources help to achieve better access, reliable performance, and high availability for a wide range of applications. In this dissertation, we propose parallel, adaptive, and grouping techniques to address scalability challenges in high-performance integration view computation and maintenance caused by increasingly large data sources and high rates of source updates. State-of-the-art parallel integration view computation makes the common assumption that maximal pipelined parallelism leads to superior performance. We instead propose segmented bushy parallel processing, which combines pipelined parallelism with alternate forms of parallelism to achieve an overall more effective strategy. Experimental studies conducted over a cluster of high-performance PCs confirm that the proposed strategy yields on average a 50% improvement in total processing time compared to existing solutions. Run-time adaptation becomes critical for parallel integration view computation due to its long-running and memory-intensive nature. We investigate two types of state-level adaptations, namely state spill and state relocation, to address run-time memory shortage. We propose lazy-disk and active-disk approaches that integrate both adaptations to maximize run-time query throughput in a memory-constrained environment. We also propose global throughput-oriented state adaptation strategies for computation plans with multiple state-intensive operators. Extensive experiments confirm the effectiveness of our proposed adaptation solutions. Once results have been computed and materialized, it is typically more efficient to maintain them incrementally instead of recomputing them from scratch. However, state-of-the-art incremental view maintenance requires O(n^2) maintenance queries, with n being the number of data sources that the view is defined upon. Moreover, it does not exploit view definitions and data source processing capabilities to further improve view maintenance performance. We propose novel grouping maintenance algorithms that dramatically reduce the number of maintenance queries to O(n). A cost-based view maintenance framework is proposed to generate optimized maintenance plans tuned to particular environmental settings. Extensive experimental studies verify the effectiveness of our maintenance algorithms as well as the maintenance framework.
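
    The O(n^2) versus O(n) claim can be made concrete with a toy count of maintenance queries. The sketch below is purely illustrative and does not reproduce the dissertation's grouping algorithms; the grouped figure simply assumes that batching deltas lets each source be contacted a constant number of times.

```python
# Back-of-the-envelope illustration of the maintenance-query counts discussed above
# (not the dissertation's algorithms). For a view joining n sources, classic
# incremental maintenance joins each source's delta against the other n - 1
# sources, while a (hypothetical) grouping strategy batches deltas so that each
# source is contacted roughly once.
def naive_maintenance_queries(n: int) -> int:
    return n * (n - 1)          # one delta per source, each joined against every other source

def grouped_maintenance_queries(n: int) -> int:
    return n                    # assumed grouping: one pass over each source for the whole batch

for n in (2, 4, 8, 16):
    print(f"n={n:2d}  naive={naive_maintenance_queries(n):3d}  grouped={grouped_maintenance_queries(n):2d}")
```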

    An Integrated Engineering-Computation Framework for Collaborative Engineering: An Application in Project Management

    Today's engineering applications suffer from a severe integration problem. Engineering, as an entire process, consists of a myriad of individual, often complex, tasks. Most computer tools support particular tasks in engineering, but the output of one tool differs from that of the others, so users must re-enter the relevant information in the format required by another tool. Moreover, the development process of a new product or process usually involves several teams of engineers with different backgrounds and responsibilities, for example mechanical engineers, cost estimators, manufacturing engineers, quality engineers, and project managers. Engineers need tools to share technical and managerial information and to instantly access the latest changes made by any team member, so as to determine right away the impact of these changes in all disciplines (cost, time, resources, etc.). In other words, engineers need to participate in a truly collaborative environment for the achievement of a common objective: the completion of the product/process design project in a timely, cost-effective, and optimal manner. In this thesis, a new framework is presented that integrates the capabilities of four commercial software packages, Microsoft Excel™ (spreadsheet), Microsoft Project™ (project management), What's Best! (an optimization add-in), and Visual Basic™ (programming language), with a state-of-the-art object-oriented database (knowledge medium), InnerCircle2000™, and it is applied to handle the Cost-Time Trade-Off problem in project networks. The result is a vastly superior solution compared to the conventional one from the viewpoint of data handling, completeness of the solution space, and the context of a collaborative engineering-computation environment.
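
    As an illustration of the underlying optimization problem, the following sketch formulates the classic cost-time trade-off ("crashing") linear program for a hypothetical four-activity network and solves it with SciPy. The thesis itself solves the problem with the What's Best! add-in inside the integrated Excel/Project environment, so the data and solver here are assumptions.

```python
# Minimal sketch of the cost-time trade-off ("crashing") LP on a hypothetical
# four-activity network. Variables: per-activity crash amounts x_a and event
# times t_j; minimize total crash cost subject to precedence constraints and a
# project deadline.
from scipy.optimize import linprog

# activity: (start node, end node, normal duration, max crash, crash cost per unit)
activities = {
    "A": (0, 1, 4, 2, 100.0),
    "B": (0, 2, 6, 2, 80.0),
    "C": (1, 3, 5, 1, 120.0),
    "D": (2, 3, 3, 1, 60.0),
}
deadline = 8.0
names = list(activities)
n_x, n_nodes = len(names), 4                 # crash variables first, then node event times

c = [activities[a][4] for a in names] + [0.0] * n_nodes   # only crashing costs money

A_ub, b_ub = [], []
for i, a in enumerate(names):
    u, v, dur, _, _ = activities[a]
    row = [0.0] * (n_x + n_nodes)
    row[i] = -1.0                            # -x_a
    row[n_x + u] = 1.0                       # +t_u
    row[n_x + v] = -1.0                      # -t_v
    A_ub.append(row)                         # t_u + dur - x_a <= t_v
    b_ub.append(-dur)
deadline_row = [0.0] * (n_x + n_nodes)
deadline_row[n_x + n_nodes - 1] = 1.0        # finish-node time <= deadline
A_ub.append(deadline_row)
b_ub.append(deadline)

bounds = [(0.0, activities[a][3]) for a in names]          # 0 <= x_a <= max crash
bounds += [(0.0, 0.0)] + [(0.0, None)] * (n_nodes - 1)     # start node pinned at time 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("total crash cost:", res.fun)
print("crash amounts:", dict(zip(names, res.x[:n_x].round(2))))
```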

    Cost-Based Optimization of Integration Flows

    Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many different application areas, such as real-time ETL and data synchronization between operational systems. Due to the increasing amount of data, highly distributed IT infrastructures, and high requirements for data consistency and up-to-dateness of query results, many instances of integration flows are executed over time. Because of this high load and blocking synchronous source systems, the performance of the central integration platform is crucial for an IT infrastructure. To tackle these high performance requirements, we introduce the concept of cost-based optimization of imperative integration flows, which relies on incremental statistics maintenance and inter-instance plan re-optimization. As a foundation, we introduce the concept of periodical re-optimization, including novel cost-based optimization techniques that are tailor-made for integration flows. Furthermore, we refine periodical re-optimization into on-demand re-optimization in order to overcome the problems of many unnecessary re-optimization steps and of adaptation delays, during which optimization opportunities are missed. This approach ensures low optimization overhead and fast workload adaptation.
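
    A minimal sketch of the contrast between periodical and on-demand re-optimization follows, with hypothetical names and a single maintained statistic (an operator selectivity); the actual cost model and optimizer of the thesis are not reproduced here.

```python
# Hypothetical sketch of periodical vs. on-demand plan re-optimization for an
# integration flow. Execution statistics are maintained incrementally across flow
# instances; the on-demand variant re-optimizes only when the observed statistic
# drifts far enough from the value the current plan was optimized for.
class FlowOptimizer:
    def __init__(self, drift_threshold=0.2, period=100):
        self.drift_threshold = drift_threshold   # relative drift that triggers re-optimization
        self.period = period                     # instances between periodical re-optimizations
        self.plan_selectivity = None             # selectivity assumed by the current plan
        self.count, self.running_sum = 0, 0.0

    def observe(self, selectivity):
        """Incremental statistics maintenance after one flow instance."""
        self.count += 1
        self.running_sum += selectivity
        if self.plan_selectivity is None:
            self.plan_selectivity = selectivity

    def periodical_due(self):
        return self.count % self.period == 0

    def on_demand_due(self):
        observed = self.running_sum / self.count
        drift = abs(observed - self.plan_selectivity) / max(self.plan_selectivity, 1e-9)
        return drift > self.drift_threshold

    def reoptimize(self):
        self.plan_selectivity = self.running_sum / self.count   # stand-in for a real optimizer call

opt = FlowOptimizer()
for sel in [0.10, 0.11, 0.10, 0.25, 0.30, 0.32]:   # simulated per-instance selectivities
    opt.observe(sel)
    if opt.on_demand_due():                        # re-optimize only when the workload shifts
        opt.reoptimize()
```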

    A multiscale methodology for the preliminary screening of alternative process designs from a sustainability viewpoint adopting molecular and process simulation along with data envelopment analysis

    Research activity in chemical engineering focuses both on the refinement of currently used theories and techniques and on the development of new tools aimed at solving issues related to the production of goods and services by the chemical, biochemical, and pharmaceutical industries. In this context, multiscale approaches have proven very useful, since they embrace theories from quantum mechanics at the nanoscale to classical mechanics at the macroscale, contemplating wide perspectives and enabling the adaptation of each theory to an abundance of disparate applications. Furthermore, the acknowledgment of sustainability as a cornerstone of future development has led to a copious diffusion of sustainability evaluation methodologies aiming to account for economic, social, and environmental concerns in chemical process assessments. Therefore, this thesis deals with the development of a multiscale framework for the preliminary screening of chemical process designs, promoting the adoption of various computational tools along with sustainability considerations. The purpose of this methodology resides in the fulfillment of an emblematic need for any production site, i.e., evaluating a production process and its possible modifications from different perspectives in order to identify as quickly as possible the most efficient design, including economic, social, and environmental concerns. The reader will be guided through this topic following the chapters of this dissertation. In Chapter I, the concepts of sustainability and sustainable development are presented, followed by some applications, starting from the wider panorama of institutions, moving to the industry perspective, and concluding with some relevant examples from chemical process engineering. Chapter II describes each step to be performed in order to obtain the sustainability evaluation of the process alternatives: from retrieving the promising process designs, to implementing each flowsheet in a process simulator, to calculating several indicators based on the sustainability pillars, followed by employing a mathematical tool (DEA) to select the most efficient designs, and finally investigating how to enhance the sub-optimal alternatives through a retrofit analysis. Chapter III deals with the application of different molecular simulation techniques to estimate the octanol-water partition coefficient (Kow), which is an essential parameter for the calculation of several sustainability indicators. The reader will then encounter the three case studies presented in detail in Chapter IV. The first belongs to the pharmaceutical field and deals with the production of pioglitazone hydrochloride, considering different synthesis routes from various patents. The second application regards the biochemical industry, optimizing the operating conditions of a reactor employed for the production of biodiesel from vegetable oil. The last one explores the synthesis of nanomaterials, evaluating several reaction parameters involved in the laboratory production of CdSe quantum dots from a sustainability viewpoint. Some concluding remarks and future perspectives are included in the final Chapter V.
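
    To make the DEA step concrete, the following sketch computes input-oriented CCR efficiency scores for a few hypothetical process designs using SciPy's linear programming routine. The indicator data are invented and the model is the textbook CCR formulation, not necessarily the exact DEA variant used in the thesis.

```python
# Minimal input-oriented CCR DEA sketch (hypothetical data, not the thesis's case
# studies): each process design (DMU) consumes "inputs" (e.g. cost, environmental
# burden) and produces "outputs" (e.g. product yield); a design's efficiency is the
# optimal theta of a small linear program, with theta = 1 meaning DEA-efficient.
import numpy as np
from scipy.optimize import linprog

inputs  = np.array([[4.0, 6.0, 5.0, 9.0],       # input 1 per design (e.g. normalized cost)
                    [3.0, 2.0, 4.0, 5.0]])      # input 2 per design (e.g. CO2 indicator)
outputs = np.array([[10.0, 12.0, 11.0, 13.0]])  # output 1 per design (e.g. yield)
n = inputs.shape[1]

def ccr_efficiency(o: int) -> float:
    """Efficiency of design o: min theta s.t. a convex combination of all designs
    uses at most theta times design o's inputs and at least its outputs."""
    c = np.r_[1.0, np.zeros(n)]                                 # minimize theta
    A_in  = np.c_[-inputs[:, [o]], inputs]                      # sum_j lam_j x_ij <= theta x_io
    A_out = np.c_[np.zeros((outputs.shape[0], 1)), -outputs]    # sum_j lam_j y_rj >= y_ro
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(inputs.shape[0]), -outputs[:, o]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun

for o in range(n):
    print(f"design {o}: efficiency = {ccr_efficiency(o):.3f}")
```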

    Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake

    Data provides the basis for emerging scientific and interdisciplinary data-centric applications with the potential of improving the quality of life for citizens. Big Data plays an important role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Open data initiatives have encouraged the publication of Big Data by exploiting the decentralized nature of the Web, allowing for the availability of heterogeneous data generated and maintained by autonomous data providers. Consequently, the growing volume of data consumed by different applications raises the need for effective data integration approaches able to process large volumes of data represented in different formats, schemas, and models, which may also include sensitive data, e.g., financial transactions, medical procedures, or personal data. Data Lakes are composed of heterogeneous data sources kept in their original format, which reduces the overhead of materialized data integration. Query processing over Data Lakes requires a semantic description of the data collected from the heterogeneous data sources. A Data Lake with such semantic annotations is referred to as a Semantic Data Lake. Transforming Big Data into actionable knowledge demands novel and scalable techniques for enabling not only Big Data ingestion and curation into the Semantic Data Lake, but also efficient large-scale semantic data integration, exploration, and discovery. Federated query processing techniques utilize source descriptions to find relevant data sources and to find efficient execution plans that minimize the total execution time and maximize the completeness of answers. Existing federated query processing engines employ a coarse-grained description model in which the semantics encoded in the data sources are ignored. Such descriptions may lead to the erroneous selection of data sources for a query and to unnecessary retrieval of data, thus affecting the performance of the query processing engine. In this thesis, we address the problem of federated query processing against heterogeneous data sources in a Semantic Data Lake. First, we tackle the challenge of knowledge representation and propose a novel source description model, RDF Molecule Templates, that describes the knowledge available in a Semantic Data Lake. RDF Molecule Templates (RDF-MTs) describe data sources in terms of an abstract description of entities belonging to the same semantic concept. Then, we propose a technique for data source selection and query decomposition, the MULDER approach, and query planning and optimization techniques, Ontario, that exploit the characteristics of heterogeneous data sources described using RDF-MTs and provide uniform access to heterogeneous data sources. We then address the challenge of enforcing privacy and access control requirements imposed by data providers. We introduce a privacy-aware federated query technique, BOUNCER, able to enforce privacy and access control regulations during query processing over data sources in a Semantic Data Lake. In particular, BOUNCER exploits RDF-MT-based source descriptions in order to express privacy and access control policies as well as to enforce them automatically during source selection, query decomposition, and planning. Furthermore, BOUNCER implements query decomposition and optimization techniques able to identify query plans over data sources that not only contain the relevant entities to answer a query, but are also regulated by policies that allow access to these relevant entities. Finally, we tackle the problem of interest-based update propagation and co-evolution of data sources. We present a novel approach for interest-based RDF update propagation that consistently maintains a full or partial replication of large datasets and deals with co-evolution.
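
    As an illustration of how RDF-MT-style descriptions can drive source selection, the sketch below uses simplified, hypothetical structures (a template records a class, its predicates, and the sources exposing them) and routes each query pattern only to sources whose templates cover it; it is not the actual MULDER, Ontario, or BOUNCER implementation.

```python
# Illustrative sketch (not the MULDER/Ontario/BOUNCER code) of RDF-MT-style source
# selection: each template describes one semantic concept (an RDF class), the
# predicates available for it, and the data sources exposing them; a query pattern
# is routed only to sources whose templates cover its predicate.
from dataclasses import dataclass, field

@dataclass
class RDFMoleculeTemplate:
    rdf_class: str
    predicates: set = field(default_factory=set)
    sources: set = field(default_factory=set)

templates = [
    RDFMoleculeTemplate("dbo:Drug",    {"rdfs:label", "dbo:interactsWith"}, {"drugbank", "dbpedia"}),
    RDFMoleculeTemplate("dbo:Disease", {"rdfs:label", "dbo:treatedBy"},     {"dbpedia"}),
]

def select_sources(triple_patterns):
    """Map each (subject class, predicate) pattern to the sources that can answer it."""
    plan = {}
    for subj_class, predicate in triple_patterns:
        plan[(subj_class, predicate)] = {
            src
            for mt in templates
            if mt.rdf_class == subj_class and predicate in mt.predicates
            for src in mt.sources
        }
    return plan

query = [("dbo:Drug", "dbo:interactsWith"), ("dbo:Disease", "dbo:treatedBy")]
print(select_sources(query))
```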

    Metadata-driven data integration

    Cotutelle: Universitat Politècnica de Catalunya and Université Libre de Bruxelles, IT4BI-DC programme for the joint Ph.D. degree in computer science. Data has an undeniable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we have recently witnessed a change represented by huge and heterogeneous amounts of data. Indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry out these data exploitation tasks, organizations must first perform data integration, combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings. This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From a stewardship perspective, these activities are the deployment of a data integration architecture, followed by the population of the shared metadata. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities. We begin by proposing a software reference architecture for semantic-aware data-intensive systems. This architecture serves as a blueprint to deploy a stack of systems, its core being the metadata repository. Next, we propose a graph-based metadata model as a formalism for metadata management. We focus on supporting schema and data source evolution, a predominant factor in the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the previously proposed metadata model. We additionally consider semantic heterogeneities in the data sources, which the proposed algorithms are capable of automatically resolving. Finally, the thesis focuses on the materialized integration activity and, to this end, proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis serve as a contribution to the field of data integration in contemporary data-intensive ecosystems.
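
    A minimal sketch, under assumed names and structures, of how a shared metadata repository can govern virtual integration: global features are mapped to per-source attributes, and a query over global features is rewritten into per-source queries. The thesis's graph-based model and rewriting algorithms are considerably richer (they also handle schema evolution and semantic heterogeneities).

```python
# Minimal sketch (assumed names, not the thesis's model) of a shared metadata
# repository driving virtual integration: global features map to the attributes of
# the wrapped sources, and a query phrased over global features is rewritten into
# per-source attribute requests.
metadata_graph = {
    # global feature -> {source: source attribute}
    "customer_id":  {"crm_db": "cust_id", "web_logs": "user"},
    "country":      {"crm_db": "country_code"},
    "page_visited": {"web_logs": "url"},
}

def rewrite(global_features):
    """Rewrite a query over global features into per-source attribute lists."""
    per_source = {}
    for feature in global_features:
        mappings = metadata_graph.get(feature, {})
        if not mappings:
            raise ValueError(f"feature {feature!r} is not described in the metadata graph")
        for source, attribute in mappings.items():
            per_source.setdefault(source, []).append(attribute)
    return per_source

# Query asking for customer ids together with visited pages:
print(rewrite(["customer_id", "page_visited"]))
# -> {'crm_db': ['cust_id'], 'web_logs': ['user', 'url']}
```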