Search CORE

9 research outputs found

Orchestration of distributed ingestion and processing of IoT data for fog platforms

Author: Pérez Rico Juan Luis
Publication venue: Universitat Politècnica de Catalunya
Publication date: 26/11/2018
Field of study

In recent years there has been an extraordinary growth of the Internet of Things (IoT) and its protocols. The increasing diffusion of electronic devices with identification, computing and communication capabilities is laying ground for the emergence of a highly distributed service and networking environment. The above mentioned situation implies that there is an increasing demand for advanced IoT data management and processing platforms. Such platforms require support for multiple protocols at the edge for extended connectivity with the objects, but also need to exhibit uniform internal data organization and advanced data processing capabilities to fulfill the demands of the application and services that consume IoT data. One of the initial approaches to address this demand is the integration between IoT and the Cloud computing paradigm. There are many benefits of integrating IoT with Cloud computing. The IoT generates massive amounts of data, and Cloud computing provides a pathway for that data to travel to its destination. But today’s Cloud computing models do not quite fit for the volume, variety, and velocity of data that the IoT generates. Among the new technologies emerging around the Internet of Things to provide a new whole scenario, the Fog Computing paradigm has become the most relevant. Fog computing was introduced a few years ago in response to challenges posed by many IoT applications, including requirements such as very low latency, real-time operation, large geo-distribution, and mobility. Also this low latency, geo-distributed and mobility environments are covered by the network architecture MEC (Mobile Edge Computing) that provides an IT service environment and Cloud-computing capabilities at the edge of the mobile network, within the Radio Access Network (RAN) and in close proximity to mobile subscribers. Fog computing addresses use cases with requirements far beyond Cloud-only solution capabilities. The interplay between Cloud and Fog computing is crucial for the evolution of the so-called IoT, but the reach and specification of such interplay is an open problem. This thesis aims to find the right techniques and design decisions to build a scalable distributed system for the IoT under the Fog Computing paradigm to ingest and process data. The final goal is to explore the trade-offs and challenges in the design of a solution from Edge to Cloud to address opportunities that current and future technologies will bring in an integrated way. This thesis describes an architectural approach that addresses some of the technical challenges behind the convergence between IoT, Cloud and Fog with special focus on bridging the gap between Cloud and Fog. To that end, new models and techniques are introduced in order to explore solutions for IoT environments. This thesis contributes to the architectural proposals for IoT ingestion and data processing by 1) proposing the characterization of a platform for hosting IoT workloads in the Cloud providing multi-tenant data stream processing capabilities, the interfaces over an advanced data-centric technology, including the building of a state-of-the-art infrastructure to evaluate the performance and to validate the proposed solution. 2) studying an architectural approach following the Fog paradigm that addresses some of the technical challenges found in the first contribution. The idea is to study an extension of the model that addresses some of the central challenges behind the converge of Fog and IoT. 3) Design a distributed and scalable platform to perform IoT operations in a moving data environment. The idea after study data processing in Cloud, and after study the convenience of the Fog paradigm to solve the IoT close to the Edge challenges, is to define the protocols, the interfaces and the data management to solve the ingestion and processing of data in a distributed and orchestrated manner for the Fog Computing paradigm for IoT in a moving data environment.En els últims anys hi ha hagut un gran creixement del Internet of Things (IoT) i els seus protocols. La creixent difusió de dispositius electrònics amb capacitats d'identificació, computació i comunicació esta establint les bases de l’aparició de serveis altament distribuïts i del seu entorn de xarxa. L’esmentada situació implica que hi ha una creixent demanda de plataformes de processament i gestió avançada de dades per IoT. Aquestes plataformes requereixen suport per a múltiples protocols al Edge per connectivitat amb el objectes, però també necessiten d’una organització de dades interna i capacitats avançades de processament de dades per satisfer les demandes de les aplicacions i els serveis que consumeixen dades IoT. Una de les aproximacions inicials per abordar aquesta demanda és la integració entre IoT i el paradigma del Cloud computing. Hi ha molts avantatges d'integrar IoT amb el Cloud. IoT genera quantitats massives de dades i el Cloud proporciona una via perquè aquestes dades viatgin a la seva destinació. Però els models actuals del Cloud no s'ajusten del tot al volum, varietat i velocitat de les dades que genera l'IoT. Entre les noves tecnologies que sorgeixen al voltant del IoT per proporcionar un escenari nou, el paradigma del Fog Computing s'ha convertit en la més rellevant. Fog Computing es va introduir fa uns anys com a resposta als desafiaments que plantegen moltes aplicacions IoT, incloent requisits com baixa latència, operacions en temps real, distribució geogràfica extensa i mobilitat. També aquest entorn està cobert per l'arquitectura de xarxa MEC (Mobile Edge Computing) que proporciona serveis de TI i capacitats Cloud al edge per la xarxa mòbil dins la Radio Access Network (RAN) i a prop dels subscriptors mòbils. El Fog aborda casos d?us amb requisits que van més enllà de les capacitats de solucions només Cloud. La interacció entre Cloud i Fog és crucial per a l'evolució de l'anomenat IoT, però l'abast i especificació d'aquesta interacció és un problema obert. Aquesta tesi té com objectiu trobar les decisions de disseny i les tècniques adequades per construir un sistema distribuït escalable per IoT sota el paradigma del Fog Computing per a ingerir i processar dades. L'objectiu final és explorar els avantatges/desavantatges i els desafiaments en el disseny d'una solució des del Edge al Cloud per abordar les oportunitats que les tecnologies actuals i futures portaran d'una manera integrada. Aquesta tesi descriu un enfocament arquitectònic que aborda alguns dels reptes tècnics que hi ha darrere de la convergència entre IoT, Cloud i Fog amb especial atenció a reduir la bretxa entre el Cloud i el Fog. Amb aquesta finalitat, s'introdueixen nous models i tècniques per explorar solucions per entorns IoT. Aquesta tesi contribueix a les propostes arquitectòniques per a la ingesta i el processament de dades IoT mitjançant 1) proposant la caracterització d'una plataforma per a l'allotjament de workloads IoT en el Cloud que proporcioni capacitats de processament de flux de dades multi-tenant, les interfícies a través d'una tecnologia centrada en dades incloent la construcció d'una infraestructura avançada per avaluar el rendiment i validar la solució proposada. 2) estudiar un enfocament arquitectònic seguint el paradigma Fog que aborda alguns dels reptes tècnics que es troben en la primera contribució. La idea és estudiar una extensió del model que abordi alguns dels reptes centrals que hi ha darrere de la convergència de Fog i IoT. 3) Dissenyar una plataforma distribuïda i escalable per a realitzar operacions IoT en un entorn de dades en moviment. La idea després d'estudiar el processament de dades a Cloud, i després d'estudiar la conveniència del paradigma Fog per resoldre el IoT prop dels desafiaments Edge, és definir els protocols, les interfícies i la gestió de dades per resoldre la ingestió i processament de dades en un distribuït i orquestrat per al paradigma Fog Computing per a l'IoT en un entorn de dades en moviment

Tesis Doctorals en Xarxa

Orchestration of distributed ingestion and processing of IoT data for fog platforms

Author: Pérez Rico Juan Luis
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2018
Field of study

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Analysis of Educational Data in the Current State of University Learning for the Transition to a Hybrid Education Model

Author: Luján-Mora Sergio
Palacios-Pacheco Xavier
Román-Cañizares Milton
Villegas-CH William
Publication venue: 'MDPI AG'
Publication date: 01/02/2021
Field of study

Currently, the 2019 Coronavirus Disease pandemic has caused serious damage to health throughout the world. Its contagious capacity has forced the governments of the world to decree isolation and quarantine to try to control the pandemic. The consequences that it leaves in all sectors of society have been disastrous. However, technological advances have allowed people to continue their different activities to some extent while maintaining isolation. Universities have great penetration in the use of technology, but they have also been severely affected. To give continuity to education, universities have been forced to move to an educational model based on synchronous encounters, but they have maintained the methodology of a face-to-face educational model, what has caused several problems in the learning of students. This work proposes the transition to a hybrid educational model, provided that this transition is supported by data analysis to identify the new needs of students. The knowledge obtained is contrasted with the performance presented by the students in the face-to-face modality and the necessary parameters for the transition to this modality are clearly established. In addition, the guidelines and methodology of online education are considered in order to take advantage of the best of both modalities and guarantee learning

Repositorio Institucional de la Universidad de Alicante

Directory of Open Access Journals

FIN-DM: finantsteenuste andmekaeve protsessi mudel

Author: Plotnikova Veronika
Publication venue
Publication date: 23/11/2021
Field of study

Andmekaeve hõlmab reeglite kogumit, protsesse ja algoritme, mis võimaldavad ettevõtetel iga päev kogutud andmetest rakendatavaid teadmisi ammutades suurendada tulusid, vähendada kulusid, optimeerida tooteid ja kliendisuhteid ning saavutada teisi eesmärke. Andmekaeves ja -analüütikas on vaja hästi määratletud metoodikat ja protsesse. Saadaval on mitu andmekaeve ja -analüütika standardset protsessimudelit. Kõige märkimisväärsem ja laialdaselt kasutusele võetud standardmudel on CRISP-DM. Tegu on tegevusalast sõltumatu protsessimudeliga, mida kohandatakse sageli sektorite erinõuetega. CRISP-DMi tegevusalast lähtuvaid kohandusi on pakutud mitmes valdkonnas, kaasa arvatud meditsiini-, haridus-, tööstus-, tarkvaraarendus- ja logistikavaldkonnas. Seni pole aga mudelit kohandatud finantsteenuste sektoris, millel on omad valdkonnapõhised erinõuded. Doktoritöös käsitletakse seda lünka finantsteenuste sektoripõhise andmekaeveprotsessi (FIN-DM) kavandamise, arendamise ja hindamise kaudu. Samuti uuritakse, kuidas kasutatakse andmekaeve standardprotsesse eri tegevussektorites ja finantsteenustes. Uurimise käigus tuvastati mitu tavapärase raamistiku kohandamise stsenaariumit. Lisaks ilmnes, et need meetodid ei keskendu piisavalt sellele, kuidas muuta andmekaevemudelid tarkvaratoodeteks, mida saab integreerida organisatsioonide IT-arhitektuuri ja äriprotsessi. Peamised finantsteenuste valdkonnas tuvastatud kohandamisstsenaariumid olid seotud andmekaeve tehnoloogiakesksete (skaleeritavus), ärikesksete (tegutsemisvõime) ja inimkesksete (diskrimineeriva mõju leevendus) aspektidega. Seejärel korraldati tegelikus finantsteenuste organisatsioonis juhtumiuuring, mis paljastas 18 tajutavat puudujääki CRISP- DMi protsessis. Uuringu andmete ja tulemuste abil esitatakse doktoritöös finantsvaldkonnale kohandatud CRISP-DM nimega FIN-DM ehk finantssektori andmekaeve protsess (Financial Industry Process for Data Mining). FIN-DM laiendab CRISP-DMi nii, et see toetab privaatsust säilitavat andmekaevet, ohjab tehisintellekti eetilisi ohte, täidab riskijuhtimisnõudeid ja hõlmab kvaliteedi tagamist kui osa andmekaeve elutsüklisData mining is a set of rules, processes, and algorithms that allow companies to increase revenues, reduce costs, optimize products and customer relationships, and achieve other business goals, by extracting actionable insights from the data they collect on a day-to-day basis. Data mining and analytics projects require well-defined methodology and processes. Several standard process models for conducting data mining and analytics projects are available. Among them, the most notable and widely adopted standard model is CRISP-DM. It is industry-agnostic and often is adapted to meet sector-specific requirements. Industry- specific adaptations of CRISP-DM have been proposed across several domains, including healthcare, education, industrial and software engineering, logistics, etc. However, until now, there is no existing adaptation of CRISP-DM for the financial services industry, which has its own set of domain-specific requirements. This PhD Thesis addresses this gap by designing, developing, and evaluating a sector-specific data mining process for financial services (FIN-DM). The PhD thesis investigates how standard data mining processes are used across various industry sectors and in financial services. The examination identified number of adaptations scenarios of traditional frameworks. It also suggested that these approaches do not pay sufficient attention to turning data mining models into software products integrated into the organizations' IT architectures and business processes. In the financial services domain, the main discovered adaptation scenarios concerned technology-centric aspects (scalability), business-centric aspects (actionability), and human-centric aspects (mitigating discriminatory effects) of data mining. Next, an examination by means of a case study in the actual financial services organization revealed 18 perceived gaps in the CRISP-DM process. Using the data and results from these studies, the PhD thesis outlines an adaptation of CRISP-DM for the financial sector, named the Financial Industry Process for Data Mining (FIN-DM). FIN-DM extends CRISP-DM to support privacy-compliant data mining, to tackle AI ethics risks, to fulfill risk management requirements, and to embed quality assurance as part of the data mining life-cyclehttps://www.ester.ee/record=b547227

DSpace at Tartu University Library

Clustering Approaches for Multi-source Entity Resolution

Author: Saeedi Alieh
Publication venue
Publication date: 10/12/2021
Field of study

Entity Resolution (ER) or deduplication aims at identifying entities, such as specific customer or product descriptions, in one or several data sources that refer to the same real-world entity. ER is of key importance for improving data quality and has a crucial role in data integration and querying. The previous generation of ER approaches focus on integrating records from two relational databases or performing deduplication within a single database. Nevertheless, in the era of Big Data the number of available data sources is increasing rapidly. Therefore, large-scale data mining or querying systems need to integrate data obtained from numerous sources. For example, in online digital libraries or E-Shops, publications or products are incorporated from a large number of archives or suppliers across the world or within a specified region or country to provide a unified view for the user. This process requires data consolidation from numerous heterogeneous data sources, which are mostly evolving. By raising the number of sources, data heterogeneity and velocity as well as the variance in data quality is increased. Therefore, multi-source ER, i.e. finding matching entities in an arbitrary number of sources, is a challenging task. Previous efforts for matching and clustering entities between multiple sources (> 2) mostly treated all sources as a single source. This approach excludes utilizing metadata or provenance information for enhancing the integration quality and leads up to poor results due to ignorance of the discrepancy between quality of sources. The conventional ER pipeline consists of blocking, pair-wise matching of entities, and classification. In order to meet the new needs and requirements, holistic clustering approaches that are capable of scaling to many data sources are needed. The holistic clustering-based ER should further overcome the restriction of pairwise linking of entities by making the process capable of grouping entities from multiple sources into clusters. The clustering step aims at removing false links while adding missing true links across sources. Additionally, incremental clustering and repairing approaches need to be developed to cope with the ever-increasing number of sources and new incoming entities. To this end, we developed novel clustering and repairing schemes for multi-source entity resolution. The approaches are capable of grouping entities from multiple clean (duplicate-free) sources, as well as handling data from an arbitrary combination of clean and dirty sources. The multi-source clustering schemes exclusively developed for multi-source ER can obtain superior results compared to general purpose clustering algorithms. Additionally, we developed incremental clustering and repairing methods in order to handle the evolving sources. The proposed incremental approaches are capable of incorporating new sources as well as new entities from existing sources. The more sophisticated approach is able to repair previously determined clusters, and consequently yields improved quality and a reduced dependency on the insert order of the new entities. To ensure scalability, the parallel variation of all approaches are implemented on top of the Apache Flink framework which is a distributed processing engine. The proposed methods have been integrated in a new end-to-end ER tool named FAMER (FAst Multi-source Entity Resolution system). The FAMER framework is comprised of Linking and Clustering components encompassing both batch and incremental ER functionalities. The output of Linking part is recorded as a similarity graph where each vertex represents an entity and each edge maintains the similarity relationship between two entities. Such a similarity graph is the input of the Clustering component. The comprehensive comparative evaluations overall show that the proposed clustering and repairing approaches for both batch and incremental ER achieve high quality while maintaining the scalability

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Qucosa - Publikationsserver der Universität Leipzig

Query Processing on Attributed Graphs

Author: Yung Ka Wai
Publication venue
Publication date: 31/01/2018
Field of study

An attributed graph is a powerful tool for modeling a variety of information networks. It is not only able to represent relationships between objects easily, but it also allows every vertex and edge to have its attributes. Hence, a lot of data, such as the web, sensor networks, biological networks, economic graphs, and social networks, are modeled as attributed graphs. Due to the popularity of attributed graphs, the study of attributed graphs has caught attentions of researchers. For example, there are studies of attributed graph OLAP, query engine, clustering, summary, constrained pattern matching query, and graph visualization, etc. However, to the best of our knowledge, the studies of topological and attribute relationships between vertices on attributed graphs have not drawn much attentions of researchers. Given the high expressive power and popularity of attributed graph, in this thesis, we define and study the processing of three new attributed graph queries, which would help users to understand the topological and attribute relationships between entities in attributed graphs. For example, a reachability query on a social network can tell whether two persons can be connected given certain attribute constraints; a reachability query on a biological network can tell whether a compound can be transformed to another compound under given chemical reaction conditions; a How-to-Reach query can tell why the answers of the above two reachability query are negative; a visualizable path summary query can offer an overall picture of topological and attribute relationship between any two vertices in attributed graphs. Except for the proposed query types in this thesis, we believe that there is still penalty of meaningful attributed graph query types that have not been proposed and studied by the database and data mining community since an attributed graph is a very rich source of information. Through this thesis, we hope to draw people's attentions on attributed graph query processing so that more hidden information contained in attributed graphs can be queried and discovered

D-Scholarship@Pitt

On Pattern Mining in Graph Data to Support Decision-Making

Author: Petermann André
Publication venue
Publication date: 14/02/2019
Field of study

In recent years graph data models became increasingly important in both research and industry. Their core is a generic data structure of things (vertices) and connections among those things (edges). Rich graph models such as the property graph model promise an extraordinary analytical power because relationships can be evaluated without knowledge about a domain-specific database schema. This dissertation studies the usage of graph models for data integration and data mining of business data. Although a typical company's business data implicitly describes a graph it is usually stored in multiple relational databases. Therefore, we propose the first semi-automated approach to transform data from multiple relational databases into a single graph whose vertices represent domain objects and whose edges represent their mutual relationships. This transformation is the base of our conceptual framework BIIIG (Business Intelligence with Integrated Instance Graphs). We further proposed a graph-based approach to data integration. The process is executed after the transformation. In established data mining approaches interrelated input data is mostly represented by tuples of measure values and dimension values. In the context of graphs these values must be attached to the graph structure and aggregated measure values are graph attributes. Since the latter was not supported by any existing model, we proposed the use of collections of property graphs. They act as data structure of the novel Extended Property Graph Model (EPGM). The model supports vertices and edges that may appear in different graphs as well as graph properties. Further on, we proposed some operators that benefit from this data structure, for example, graph-based aggregation of measure values. A primitive operation of graph pattern mining is frequent subgraph mining (FSM). However, existing algorithms provided no support for directed multigraphs. We extended the popular gSpan algorithm to overcome this limitation. Some patterns might not be frequent while their generalizations are. Generalized graph patterns can be mined by attaching vertices to taxonomies. We proposed a novel approach to Generalized Multidimensional Frequent Subgraph Mining (GM-FSM), in particular the first solution to generalized FSM that supports not only directed multigraphs but also multiple dimensional taxonomies. In scenarios that compare patterns of different categories, e.g., fraud or not, FSM is not sufficient since pattern frequencies may differ by category. Further on, determining all pattern frequencies without frequency pruning is not an option due to the computational complexity of FSM. Thus, we developed an FSM extension to extract patterns that are characteristic for a specific category according to a user-defined interestingness function called Characteristic Subgraph Mining (CSM). Parts of this work were done in the context of GRADOOP, a framework for distributed graph analytics. To make the primitive operation of frequent subgraph mining available to this framework, we developed Distributed In-Memory gSpan (DIMSpan), a frequent subgraph miner that is tailored to the characteristics of shared-nothing clusters and distributed dataflow systems. Finally, the results of use case evaluations in cooperation with a large scale enterprise will be presented. This includes a report of practical experiences gained in implementation and application of the proposed algorithms

Qucosa - Publikationsserver der Universität Leipzig

Planning and Scheduling Optimization

Author
Publication venue: 'MDPI AG'
Publication date: 11/01/2022
Field of study

Although planning and scheduling optimization have been explored in the literature for many years now, it still remains a hot topic in the current scientific research. The changing market trends, globalization, technical and technological progress, and sustainability considerations make it necessary to deal with new optimization challenges in modern manufacturing, engineering, and healthcare systems. This book provides an overview of the recent advances in different areas connected with operations research models and other applications of intelligent computing techniques used for planning and scheduling optimization. The wide range of theoretical and practical research findings reported in this book confirms that the planning and scheduling problem is a complex issue that is present in different industrial sectors and organizations and opens promising and dynamic perspectives of research and development

Directory of Open Access Books (DOAB)