13 research outputs found

    Orchestration of distributed ingestion and processing of IoT data for fog platforms

    In recent years there has been extraordinary growth in the Internet of Things (IoT) and its protocols. The increasing diffusion of electronic devices with identification, computing and communication capabilities is laying the ground for a highly distributed service and networking environment. This implies an increasing demand for advanced IoT data management and processing platforms. Such platforms require support for multiple protocols at the edge for extended connectivity with the objects, but they also need uniform internal data organization and advanced data processing capabilities to meet the demands of the applications and services that consume IoT data. One of the initial approaches to addressing this demand is the integration of IoT with the Cloud computing paradigm. There are many benefits to integrating IoT with Cloud computing. The IoT generates massive amounts of data, and Cloud computing provides a pathway for that data to travel to its destination. But today's Cloud computing models do not quite fit the volume, variety, and velocity of data that the IoT generates. Among the new technologies emerging around the IoT to provide a whole new scenario, the Fog Computing paradigm has become the most relevant. Fog computing was introduced a few years ago in response to challenges posed by many IoT applications, including requirements such as very low latency, real-time operation, large geo-distribution, and mobility. These low-latency, geo-distributed and mobile environments are also covered by the Mobile Edge Computing (MEC) network architecture, which provides an IT service environment and Cloud-computing capabilities at the edge of the mobile network, within the Radio Access Network (RAN) and in close proximity to mobile subscribers. Fog computing addresses use cases with requirements far beyond the capabilities of Cloud-only solutions.
The interplay between Cloud and Fog computing is crucial for the evolution of the IoT, but the reach and specification of that interplay is an open problem. This thesis aims to find the right techniques and design decisions to build a scalable distributed system for the IoT under the Fog Computing paradigm to ingest and process data. The final goal is to explore the trade-offs and challenges in the design of a solution from Edge to Cloud that addresses, in an integrated way, the opportunities that current and future technologies will bring. The thesis describes an architectural approach that addresses some of the technical challenges behind the convergence of IoT, Cloud and Fog, with special focus on bridging the gap between Cloud and Fog. To that end, new models and techniques are introduced to explore solutions for IoT environments. The thesis contributes to the architectural proposals for IoT ingestion and data processing by 1) proposing the characterization of a platform for hosting IoT workloads in the Cloud that provides multi-tenant data stream processing capabilities and interfaces over an advanced data-centric technology, including the construction of a state-of-the-art infrastructure to evaluate performance and validate the proposed solution; 2) studying an architectural approach following the Fog paradigm that addresses some of the technical challenges found in the first contribution, the idea being to study an extension of the model that addresses some of the central challenges behind the convergence of Fog and IoT; and 3) designing a distributed and scalable platform to perform IoT operations in a moving data environment.
Having studied data processing in the Cloud and the suitability of the Fog paradigm for solving IoT challenges close to the Edge, the idea is to define the protocols, the interfaces and the data management needed to ingest and process data in a distributed and orchestrated manner under the Fog Computing paradigm for IoT in a moving data environment.
Postprint (published version)

    The Magnitude Of Big Data 5VS In Business Macroclimate

    This paper discusses the age of Big Data in the business macroclimate. It presents a systematic review of how Big Data influences today's rapidly changing business world. Existing literature on Big Data and business is used to further understand the concepts of Big Data and how Big Data processes, models and the Internet-based use of data support breakthrough innovation in adapting to the age of technology. The paper also argues that an in-depth understanding of Big Data, and of what it will bring to society and businesses, is essential for managers and top management to fully utilize Big Data for competitive advantage. The five Vs (5V) of Big Data are discussed in the context of the business macroclimate, together with the influence of Big Data on business processes. The growing interest of researchers in Big Data in business over the years is evaluated to establish the need to understand the 5V, and a conceptual framework of Big Data analytics in business decision making is formed. In conclusion, the Big Data phenomenon is illustrated in terms of business concepts and intelligence, and of utilizing Big Data for competitive advantage in the competitive business world of the advanced information age.

    Analysis of Educational Data in the Current State of University Learning for the Transition to a Hybrid Education Model

    The 2019 Coronavirus Disease pandemic has caused serious damage to health throughout the world. Its contagiousness has forced governments around the world to decree isolation and quarantine to try to control the pandemic. The consequences for all sectors of society have been disastrous. However, technological advances have allowed people to continue their activities to some extent while maintaining isolation. Universities make extensive use of technology, but they have also been severely affected. To give continuity to education, universities have been forced to move to an educational model based on synchronous encounters while keeping the methodology of a face-to-face educational model, which has caused several problems in student learning. This work proposes the transition to a hybrid educational model, provided that this transition is supported by data analysis to identify the new needs of students. The knowledge obtained is contrasted with the performance of students in the face-to-face modality, and the necessary parameters for the transition to the hybrid modality are clearly established. In addition, the guidelines and methodology of online education are considered in order to take advantage of the best of both modalities and guarantee learning.

    Review of Big Data integration in construction industry digitalization

    The 2030 Agenda for Sustainable Development has embraced the importance of sustainable practices in the construction industry. In parallel with Industry 4.0, the construction industry needs to keep pace with technological advances in data management by developing the ability to process and extract value from data. This creates a need for Big Data (BD). The construction industry deals with large volumes of heterogeneous data, which are expected to increase exponentially with the intense use of modern technologies. This research presents a comprehensive study of the literature, investigating the potential application of BD integration in the construction industry. The adoption of such technologies in this industry remains at a nascent stage and lags behind their broad uptake in other fields. The construction industry is striving to boost its productivity through the implementation of data technologies; hence, significant research is needed in this area. Currently, there is a lack of deep, comprehensive research on BD integration applications that provides insight for the construction industry. This research addresses that gap and gives an overview of the literature. The discussion presents the current utilization, the issues, and directions for potential future work, along with the challenges that accompany implementation.

    FIN-DM: a data mining process model for financial services

    Data mining is a set of rules, processes, and algorithms that allow companies to increase revenues, reduce costs, optimize products and customer relationships, and achieve other business goals by extracting actionable insights from the data they collect on a day-to-day basis. Data mining and analytics projects require well-defined methodology and processes. Several standard process models for conducting data mining and analytics projects are available. Among them, the most notable and widely adopted standard model is CRISP-DM. It is industry-agnostic and is often adapted to meet sector-specific requirements. Industry-specific adaptations of CRISP-DM have been proposed across several domains, including healthcare, education, industrial and software engineering, and logistics. However, until now there has been no adaptation of CRISP-DM for the financial services industry, which has its own set of domain-specific requirements. This PhD thesis addresses this gap by designing, developing, and evaluating a sector-specific data mining process for financial services (FIN-DM). The thesis investigates how standard data mining processes are used across various industry sectors and in financial services. The examination identified a number of adaptation scenarios of traditional frameworks. It also suggested that these approaches do not pay sufficient attention to turning data mining models into software products integrated into organizations' IT architectures and business processes. In the financial services domain, the main adaptation scenarios discovered concerned technology-centric aspects (scalability), business-centric aspects (actionability), and human-centric aspects (mitigating discriminatory effects) of data mining.
Next, a case study in an actual financial services organization revealed 18 perceived gaps in the CRISP-DM process. Using the data and results from these studies, the thesis outlines an adaptation of CRISP-DM for the financial sector, named the Financial Industry Process for Data Mining (FIN-DM). FIN-DM extends CRISP-DM to support privacy-compliant data mining, to tackle AI ethics risks, to fulfill risk management requirements, and to embed quality assurance as part of the data mining life-cycle.
https://www.ester.ee/record=b547227

    Clustering Approaches for Multi-source Entity Resolution

    Entity Resolution (ER), or deduplication, aims at identifying entities, such as specific customer or product descriptions, in one or several data sources that refer to the same real-world entity. ER is of key importance for improving data quality and has a crucial role in data integration and querying. The previous generation of ER approaches focuses on integrating records from two relational databases or performing deduplication within a single database. In the era of Big Data, however, the number of available data sources is increasing rapidly. Large-scale data mining or querying systems therefore need to integrate data obtained from numerous sources. For example, in online digital libraries or e-shops, publications or products are incorporated from a large number of archives or suppliers across the world, or within a specified region or country, to provide a unified view for the user. This process requires consolidating data from numerous heterogeneous data sources, most of which are evolving. As the number of sources grows, data heterogeneity and velocity increase, as does the variance in data quality. Multi-source ER, i.e. finding matching entities in an arbitrary number of sources, is therefore a challenging task. Previous efforts at matching and clustering entities across multiple sources (> 2) mostly treated all sources as a single source. This approach rules out using metadata or provenance information to enhance integration quality, and it leads to poor results because differences in quality between sources are ignored. The conventional ER pipeline consists of blocking, pair-wise matching of entities, and classification. To meet the new needs and requirements, holistic clustering approaches that are capable of scaling to many data sources are needed.
Holistic clustering-based ER should further overcome the restriction of pairwise linking of entities by grouping entities from multiple sources into clusters. The clustering step aims at removing false links while adding missing true links across sources. Additionally, incremental clustering and repairing approaches need to be developed to cope with the ever-increasing number of sources and new incoming entities. To this end, we developed novel clustering and repairing schemes for multi-source entity resolution. The approaches are capable of grouping entities from multiple clean (duplicate-free) sources, as well as handling data from an arbitrary combination of clean and dirty sources. The clustering schemes developed exclusively for multi-source ER can obtain superior results compared to general-purpose clustering algorithms. Additionally, we developed incremental clustering and repairing methods to handle evolving sources. The proposed incremental approaches are capable of incorporating new sources as well as new entities from existing sources. The more sophisticated approach is able to repair previously determined clusters and consequently yields improved quality and a reduced dependency on the insert order of the new entities. To ensure scalability, parallel variants of all approaches are implemented on top of Apache Flink, a distributed processing engine. The proposed methods have been integrated in a new end-to-end ER tool named FAMER (FAst Multi-source Entity Resolution system). The FAMER framework comprises Linking and Clustering components encompassing both batch and incremental ER functionalities. The output of the Linking part is recorded as a similarity graph in which each vertex represents an entity and each edge maintains the similarity relationship between two entities. This similarity graph is the input of the Clustering component.
Overall, the comprehensive comparative evaluations show that the proposed clustering and repairing approaches for both batch and incremental ER achieve high quality while maintaining scalability.
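
As a rough illustration of the similarity-graph idea in the abstract above, the Python sketch below groups entities into clusters by taking connected components over edges whose similarity reaches a threshold. This is a generic baseline, not FAMER's clustering scheme (the abstract describes purpose-built schemes running on Apache Flink). The function name `cluster_entities`, the threshold value and the entity identifiers are assumptions made purely for illustration.

```python
from collections import defaultdict


def cluster_entities(edges, threshold=0.8):
    """Group entity ids into clusters via connected components.

    edges: iterable of (entity_a, entity_b, similarity) tuples
           (a hypothetical encoding of a similarity graph).
    Only edges with similarity >= threshold count as links.
    """
    parent = {}

    def find(x):
        # Register unseen vertices, then walk up to the root,
        # halving the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b, similarity in edges:
        # Every vertex becomes at least a singleton cluster.
        find(a)
        find(b)
        if similarity >= threshold:
            union(a, b)

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    # Sorted output keeps the result deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical records from three sources, prefixed with a source id.
example_edges = [
    ("s1:a", "s2:a", 0.95),
    ("s2:a", "s3:a", 0.90),
    ("s1:b", "s3:a", 0.30),  # weak link, ignored at threshold 0.8
]
example_clusters = cluster_entities(example_edges)
```

Connected components will merge everything joined by a chain of links, so a single false link can fuse two clusters; that weakness is one reason the abstract calls for clustering steps that remove false links.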

    Query Processing on Attributed Graphs

    An attributed graph is a powerful tool for modeling a variety of information networks. It not only represents relationships between objects easily, but also allows every vertex and edge to have its own attributes. Hence, a lot of data, such as the web, sensor networks, biological networks, economic graphs, and social networks, are modeled as attributed graphs. Due to their popularity, attributed graphs have attracted the attention of researchers. For example, there are studies of attributed graph OLAP, query engines, clustering, summarization, constrained pattern matching queries, and graph visualization. However, to the best of our knowledge, the topological and attribute relationships between vertices in attributed graphs have not drawn much attention from researchers. Given the high expressive power and popularity of attributed graphs, in this thesis we define and study the processing of three new attributed graph queries, which help users understand the topological and attribute relationships between entities in attributed graphs. For example, a reachability query on a social network can tell whether two persons can be connected given certain attribute constraints; a reachability query on a biological network can tell whether a compound can be transformed into another compound under given chemical reaction conditions; a How-to-Reach query can tell why the answers to the above two reachability queries are negative; and a visualizable path summary query can offer an overall picture of the topological and attribute relationships between any two vertices in an attributed graph. Beyond the query types proposed in this thesis, we believe that there are still plenty of meaningful attributed graph query types that have not been proposed and studied by the database and data mining community, since an attributed graph is a very rich source of information.
Through this thesis, we hope to draw attention to attributed graph query processing so that more of the hidden information contained in attributed graphs can be queried and discovered.
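
To make the notion of a reachability query under attribute constraints more concrete, here is a minimal Python sketch using breadth-first search. It is not the query processing method of the thesis, which the abstract does not detail. The function name `constrained_reachable`, the adjacency-list encoding, the sample graph, and the choice to apply the constraint to vertex attributes only (endpoints included) are all assumptions made for illustration.

```python
from collections import deque


def constrained_reachable(graph, attrs, source, target, allowed):
    """Return True if target is reachable from source along a path
    on which every vertex (endpoints included) satisfies `allowed`.

    graph:   dict mapping a vertex to an iterable of its successors.
    attrs:   dict mapping a vertex to its attribute dictionary.
    allowed: predicate over an attribute dictionary.
    """
    if not (allowed(attrs[source]) and allowed(attrs[target])):
        return False
    seen = {source}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            return True
        for neighbour in graph.get(vertex, ()):
            # Skip vertices already visited or violating the constraint.
            if neighbour not in seen and allowed(attrs[neighbour]):
                seen.add(neighbour)
                queue.append(neighbour)
    return False


# Hypothetical social network: can ann reach dan through people in Oslo?
social_graph = {"ann": ["bob", "cat"], "bob": ["dan"], "cat": ["dan"], "dan": []}
social_attrs = {
    "ann": {"city": "Oslo"},
    "bob": {"city": "Rome"},
    "cat": {"city": "Oslo"},
    "dan": {"city": "Oslo"},
}


def lives_in_oslo(attributes):
    return attributes["city"] == "Oslo"


via_oslo = constrained_reachable(social_graph, social_attrs, "ann", "dan", lives_in_oslo)
```

A plain traversal like this only answers yes or no; explaining a negative answer, as the How-to-Reach query in the abstract does, would need additional bookkeeping about which constraints blocked each path.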