
    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    Human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven Mobility- and Traffic-related solutions. Such solutions will help decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of Mobility- and Traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on the topics of Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework to manage the lifecycle of mobility-related spatiotemporal data, mainly focused on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extract value from it and support the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events. Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, throughout the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with the inherent characteristics and concerns brought about by Big Spatiotemporal Data, such as in the case of GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and Visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.
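The abstract centers on two spatiotemporal data types, geo-referenced time series (GRTS) and spatiotemporal events (ST Events). As a minimal sketch of what records of each type might look like, the field names below are illustrative assumptions, not the thesis's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class GRTSPoint:
    """One sample of a geo-referenced time series (GRTS):
    a value observed at a geographic location over time."""
    series_id: str      # e.g. an induction-loop or GPS-probe identifier (assumed)
    timestamp: datetime
    lat: float
    lon: float
    value: float        # e.g. speed, flow, occupancy (assumed)

@dataclass
class STEvent:
    """A spatiotemporal event (ST Event): something that happens
    at a place during a time interval, e.g. an accident or road closure."""
    event_id: str
    event_type: str     # e.g. "accident", "congestion" (assumed)
    start: datetime
    end: datetime
    lat: float
    lon: float

# Usage: a traffic sensor reading and an incident near the same location
reading = GRTSPoint("loop-042", datetime(2020, 5, 1, 8, 15), 38.7369, -9.1427, 87.5)
incident = STEvent("evt-7", "accident", datetime(2020, 5, 1, 8, 10),
                   datetime(2020, 5, 1, 9, 0), 38.7371, -9.1420)
```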

    Growth of relational model: Interdependence and complementary to big data

    A database management system is a long-established application of computer science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements, from the conventional structured database to the recent buzzword, big data. This paper aims to provide a complete model of the relational database, which is still widely used because of its well-known ACID properties, namely atomicity, consistency, isolation and durability. Specifically, the objective of this paper is to highlight the adoption of relational model approaches by big data techniques. To address the reasons for this incorporation, this paper qualitatively studies the advancements made to the relational data model over time. First, the variations in the data storage layout are illustrated based on the needs of the application. Second, quick data retrieval techniques such as indexing, query processing and concurrency control methods are reviewed. The paper provides vital insights for appraising the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of a hybrid transactional and analytical database management system.
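Since the abstract leans on the ACID guarantees as the relational model's main draw, here is a minimal sketch of those guarantees in action, using Python's built-in sqlite3 module; the account schema is an invented example, not from the paper:

```python
import sqlite3

# Illustrative schema: any relational DBMS would do in place of SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("CREATE INDEX idx_balance ON accounts(balance)")  # indexing for quick retrieval
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

# Atomicity: both updates succeed or neither does.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # rollback already happened, so consistency is preserved

print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 70.0), (2, 80.0)]
```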

    Graph Processing in Main-Memory Column Stores

    More and more, both novel and traditional business applications leverage the advantages of a graph data model, such as the schema flexibility it offers and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access. Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. To make matters worse, graph algorithms expose a tremendous variety in structure and functionality, caused by their often domain-specific implementations, and therefore can hardly be integrated into a database management system other than through custom coding. Since the majority of these enterprise-critical applications run exclusively on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead of keeping the systems in sync, combining graph and relational operations is hard to realize, as it requires data transfer across system boundaries. Traversal operations are a basic ingredient of graph queries and algorithms, and a fundamental component of any database management system that aims to store, manipulate, and query graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures. Integrating graph traversals as an operator into a database management system requires a tight integration into the existing database environment and the development of new components, such as a graph topology-aware optimizer with accompanying graph statistics, graph-specific secondary index structures to speed up traversals, and an accompanying graph query language. In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system built into an RDBMS, allowing graph data to be processed seamlessly alongside relational data in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine based solely on set operations and graph traversals. Our design is driven by the observation that different graph topologies impose different algorithmic requirements on the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime.
We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators. Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, and security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes.
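To make the columnar idea concrete: storing the edge list as two plain columns turns each traversal step into a set operation over those columns. The following is a minimal sketch of that execution model on invented data, not GRAPHITE's actual operator API:

```python
# Edges kept as two parallel columns (source, target), as a columnar
# storage layer would hold them; no pointer-based adjacency structures.
src = [0, 0, 1, 2, 2, 3]   # edge source column
dst = [1, 2, 3, 3, 4, 4]   # edge target column

def expand(frontier: set[int]) -> set[int]:
    """One traversal step: all targets of edges whose source is in `frontier`
    (a semi-join of the frontier set against the edge columns)."""
    return {t for s, t in zip(src, dst) if s in frontier}

def bfs(start: int) -> set[int]:
    """Breadth-first reachability expressed purely as set operations."""
    visited, frontier = {start}, {start}
    while frontier:
        frontier = expand(frontier) - visited  # set difference prunes revisits
        visited |= frontier
    return visited

print(bfs(0))  # {0, 1, 2, 3, 4}
```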

    Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System

    Linear algebra operations appear in nearly every application in advanced analytics, machine learning, and various science domains. To this day, many data analysts and scientists tend to use statistics software packages or hand-crafted solutions for their analyses. In the era of the data deluge, however, external statistics packages and custom analysis programs that often run on single workstations are incapable of keeping up with the vast increase in data volume and size. In particular, there is an increasing demand from scientists for large-scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main-memory database systems, it has now become feasible to also consider applications that build on linear algebra. This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without the need to transfer data and without being restricted by hard disk latencies. From the various application examples cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Besides the deep integration of matrices and numerical algorithms, these include the optimization of expressions, transparent matrix handling, scalability and data-parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine. In particular, the core contributions of this thesis are: firstly, we show that the columnar storage layer of an in-memory DBMS yields an easy adoption of efficient sparse matrix data types and algorithms. Furthermore, we show that the execution of linear algebra expressions significantly benefits from different techniques inspired by database technology. In a novel way, we implemented several of these optimization strategies in LAPEG's optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type, AT Matrix, to spare scientists the need to select appropriate matrix representations. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, reaching a speed-up of up to 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to the topic of data manipulation, where we propose a matrix manipulation API and present different mutable matrix types to enable fast insertions and deletions. We conclude that our linear algebra engine is well suited to processing dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG fills the linear algebra gap and makes the columnar in-memory DBMS attractive as an efficient, scalable ad-hoc analysis platform for scientists.
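The claim that a columnar storage layer yields an easy adoption of sparse matrix types can be illustrated directly: a sparse matrix in coordinate (COO) form is just a three-column table. The sketch below shows that layout and a toy sparse product whose output density is the quantity an estimator like SpProdest would predict up front; all names and data are illustrative, not LAPEG internals:

```python
from collections import defaultdict

# Each sparse matrix as three columns (row, col, val), i.e. one COO "table".
# A is 2x3 and B is 3x2 here.
A = {"row": [0, 0, 1], "col": [0, 2, 1], "val": [1.0, 2.0, 3.0]}
B = {"row": [0, 1, 2], "col": [1, 0, 1], "val": [4.0, 5.0, 6.0]}

def spgemm(A, B):
    """Toy sparse matrix product C = A * B over columnar COO tables."""
    b_by_row = defaultdict(list)
    for r, c, v in zip(B["row"], B["col"], B["val"]):
        b_by_row[r].append((c, v))
    acc = defaultdict(float)
    for r, k, va in zip(A["row"], A["col"], A["val"]):
        for c, vb in b_by_row[k]:   # join on A's column id == B's row id
            acc[(r, c)] += va * vb
    rows, cols, vals = zip(*[(r, c, v) for (r, c), v in sorted(acc.items())])
    return {"row": list(rows), "col": list(cols), "val": list(vals)}

C = spgemm(A, B)                    # nonzeros at (0,1)=16.0 and (1,0)=15.0
density = len(C["val"]) / (2 * 2)   # 0.5: the result density an optimizer
                                    # would want to estimate before executing
```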

    Content And Multimedia Database Management Systems

    A database management system is a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. The main characteristic of the ‘database approach’ is that it increases the value of data through its emphasis on data independence. DBMSs, and in particular those based on the relational data model, have been very successful at managing administrative data in the business domain. This thesis has investigated data management in multimedia digital libraries and its implications for the design of database management systems. The main problem of multimedia data management is providing access to the stored objects. The content structure of administrative data is easily represented in alphanumeric values, so database technology has primarily focused on handling the objects’ logical structure. In the case of multimedia data, however, representation of content is far from trivial and is not supported by current database management systems.

    Data Mining in Promoting Flight Safety

    The incredibly rapid growth of air travel to huge volumes, driven mainly by the jet airliners that appeared in the skies in the 1950s, created the need for systematic aviation safety research and for collecting data about air traffic. Structured data can be analysed easily by querying databases and running the results through graphic tools. However, analysing the narratives, which often give more accurate information about a case, requires mining tools. The analysis of textual data with computers was not possible until data mining tools were developed, and their use, at least in aviation, is still at a moderate level. The research aims at discovering lethal trends in flight safety reports. The narratives of 1,200 flight safety reports from the years 1994–1996, written in Finnish, were processed with three text mining tools. One of them was totally language independent, another had a specific configuration for Finnish, and the third was originally created for English, but encouraging results had been achieved with Spanish, which is why a Finnish test was undertaken too. The global rate of accidents is stabilising and the situation can now be regarded as satisfactory, but because of the growth in air traffic, the absolute number of fatal accidents per year might increase if flight safety is not improved. Data collection and reporting systems have already reached their peak; the focal point in increasing flight safety is analysis. Air traffic has generally been forecast to grow 5–6 per cent annually over the next two decades. During this period, global air travel will probably double, even under relatively conservative expectations of economic growth. This development confronts airline management with growing pressure due to increasing competition, a significant rise in fuel prices, and the need to reduce the incident rate in the face of expected growth in air traffic volumes. All this emphasises the urgent need for new tools and methods. All the systems provided encouraging results, while also revealing challenges still to be overcome. Flight safety can be improved through the development and utilisation of sophisticated analysis tools and methods, like data mining, and by using their results to support the decision processes of executives.
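As a minimal, tool-agnostic sketch of the kind of narrative mining described above: surface the terms that recur across incident reports as candidate trends. The sample narratives are invented, and real text mining tools add stemming, stop-word handling and per-language morphology on top of this:

```python
from collections import Counter
import re

# Invented incident narratives standing in for the report free-text fields.
narratives = [
    "runway incursion during taxi in low visibility",
    "bird strike on approach, go-around executed",
    "low visibility on final approach, missed approach flown",
]

def tokens(text: str) -> list[str]:
    """Crude tokenizer: lowercase alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

# Document frequency: in how many reports does each term appear at all?
df = Counter(t for doc in narratives for t in set(tokens(doc)))

# Terms shared by more than one report are candidate recurring hazards.
recurring = [(term, n) for term, n in df.most_common() if n > 1]
print(recurring)  # e.g. [('low', 2), ('visibility', 2), ('approach', 2), ...]
```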

    Efficient Partitioning and Allocation of Data for Workflow Compositions

    Our aim is to provide efficient partitioning and allocation of data for web service compositions. Web service compositions are represented as partial-order database transactions. We accommodate a variety of transaction types, such as read-only and write-oriented transactions, to support workloads in cloud environments. We introduce an approach that partitions and allocates small units of data, called micropartitions, to multiple database nodes. Each database node stores only the data needed to support a specific workload, and transactions are routed directly to the appropriate data nodes. Our approach guarantees serializability and efficient execution. In Phase 1, we cluster transactions based on their data requirements and associate each cluster with an abstract query definition. An abstract query represents the minimal data requirement that satisfies all the queries belonging to a given cluster. A micropartition is generated by executing the abstract query on the original database. We show that our abstract query definition is complete and minimal. Intuitively, completeness means that all queries of the corresponding cluster can be correctly answered using the micropartition generated from the abstract query; minimality means that no smaller partition of the data can satisfy all of the queries in the cluster. We also aim to support efficient web service execution: our approach reduces the number of accesses to distributed data and limits the number of replica updates. Our empirical results show that the partitioning approach improves data access efficiency over standard data partitioning. In Phase 2, we investigate the performance improvement achievable via parallel execution. Based on the data allocation achieved in Phase 1, we develop a scheduling approach that guarantees serializability while efficiently exploiting the parallel execution of web services. We achieve conflict serializability by scheduling conflicting operations in a predefined order, based on the calculation of a minimal delay requirement; we use this delay to schedule services so as to preserve serializability without traditional locking mechanisms.
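The Phase 1 notions of completeness and minimality can be made concrete with a toy example: the abstract query for a cluster is the disjunction of its members' predicates, so the micropartition it produces contains all and only the rows that some member query needs. Everything below (the table, the predicates) is an invented illustration, not the thesis's actual formalism:

```python
# A tiny "original database" table as a list of row dicts.
rows = [
    {"id": 1, "region": "EU", "status": "open"},
    {"id": 2, "region": "US", "status": "open"},
    {"id": 3, "region": "EU", "status": "closed"},
]

# A cluster of transactions whose queries are simple predicates on the table.
cluster = [
    lambda r: r["region"] == "EU" and r["status"] == "open",
    lambda r: r["region"] == "EU" and r["status"] == "closed",
]

# Abstract query: the disjunction of the cluster's predicates. Complete
# (it answers every member query) and minimal (it keeps no row that no
# member query needs).
def abstract_query(r):
    return any(q(r) for q in cluster)

# Executing the abstract query on the original data yields the micropartition
# to allocate to the database node serving this cluster's workload.
micropartition = [r for r in rows if abstract_query(r)]
print(micropartition)  # rows 1 and 3: all and only the EU rows
```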