11 research outputs found

    Efficient analysis of large-scale social networks using big-data platforms

    Ankara : The Computer Engineering and The Graduate School of Engineering and Science of Bilkent University, 2014. Thesis (Ph.D.) -- Bilkent University, 2014. Includes bibliographical references (leaves 133-145).

    In recent years, the rise of very large, rich-content networks has re-ignited interest in complex/social network analysis at the big-data scale. This scale makes it possible to understand social interactions broadly, but it poses computational challenges for earlier approaches whose algorithmic complexity exceeds O(n). This thesis analyzes social networks at very large scales, using big-data platforms to derive important parameters and characteristics efficiently and effectively. With the popularization of mobile phones, telecommunication networks have become a socially binding medium and enable researchers to analyze social interactions at very large scales. Degree distribution is one of the most important characteristics of social networks, so to study degree characteristics and structural properties at large scale we first gathered a tera-scale dataset of telecommunication call detail records. Using this data, we empirically evaluate several statistical models against the degree distribution of the country's call graph and determine that a Pareto log-normal distribution provides the best fit, despite claims in the literature that a power-law distribution is the best model. We also examine how network operator, size, density, and location affect the degree distribution, to understand the parameters governing it in social networks.

    Beyond structural-property analysis, community identification is of great practical interest for finding highly cohesive subnetworks on different subjects within a social network. In graph theory, the k-core is a key metric used to identify subgraphs of high cohesion, also known as the 'dense' regions of a graph. As real-world graphs such as social networks grow in size, their content becomes richer and their topologies change dynamically, so we are challenged not only to materialize k-core subgraphs once but also to maintain them under continuous updates. These challenges inspired us to propose a new set of distributed algorithms for k-core view construction and maintenance on a horizontally scaling storage and computing platform. Experimental evaluation demonstrates orders-of-magnitude speedups and the advantages of maintaining k-cores incrementally and in batch windows over complete-reconstruction approaches. Moreover, the intensity of community engagement can be distinguished at multiple levels, resulting in a multiresolution community representation that has to be maintained over time; we therefore also propose distributed algorithms to construct and maintain multi-k-core graphs, implemented on the scalable big-data platform Apache HBase, and our experiments again show orders-of-magnitude speedups for incremental maintenance over complete reconstruction. Finally, we propose a graph-aware cache system designed for distributed graph processing; experimental results demonstrate up to 15x speedup compared to traditional LRU-based cache systems.
    Aksu, Hidayet. Ph.D.
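
    The k-core subgraphs central to this thesis can be computed with a simple "peeling" procedure: repeatedly delete vertices whose degree is below k until none remain; whatever survives is the k-core. The thesis builds distributed, incremental versions of this on HBase, which are not reproduced here; the following is only a minimal single-machine sketch in Python, with an assumed adjacency-dict representation.

```python
from collections import deque

def k_core(adj, k):
    """Return the k-core of an undirected graph given as
    {vertex: set(neighbours)}, by iteratively peeling vertices
    whose degree falls below k."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    queue = deque(v for v, ns in adj.items() if len(ns) < k)
    while queue:
        v = queue.popleft()
        if v not in adj:                 # already peeled
            continue
        for u in adj.pop(v):             # remove v and its incident edges
            nbrs = adj.get(u)
            if nbrs is not None:
                nbrs.discard(v)
                if len(nbrs) < k:        # u may now drop out of the core
                    queue.append(u)
    return adj

# A triangle with a pendant vertex: its 2-core is just the triangle.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(k_core(g, 2)))  # [1, 2, 3]
```

    Incremental maintenance, as studied in the thesis, avoids re-running this peeling from scratch after each edge insertion or deletion by updating only the affected neighbourhood.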

    Large Scale Data Management for Enterprise Workloads


    Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data

    Thesis (Ph.D.) -- Indiana University, Computer Sciences, 2015.

    As Big Data processing problems evolve, many modern applications exhibit special characteristics. Data exists both as large historical datasets and as high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets selected by certain criteria, so integrated support for efficient queries and post-query analysis is required. To address the system-level requirements arising from these characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain, social media data analysis, and tackle the research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems. In the storage layer, we show that existing text-indexing techniques do not work well for the unique queries over social data, which constrain both textual content and social context. To address this, we propose a flexible indexing framework over NoSQL databases that supports fully customizable index structures, which can embed the social-context information needed for efficient queries. The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, suited to different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN and make novel use of customized indices in developing sophisticated analysis algorithms. In the streaming analysis module, the high-dimensional representation of social media streams poses special challenges for parallel stream clustering: because the data is sparse and high-dimensional, traditional synchronization methods become expensive and severely limit the algorithm's scalability. We therefore design a novel strategy that broadcasts the incremental changes to the clusters, rather than their whole centroids, to obtain scalable parallel stream-clustering algorithms. Performance tests on real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations based on current state-of-the-art technologies.
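
    The delta-broadcast idea in the streaming module can be pictured as follows: since each sparse, high-dimensional point touches only a few dimensions, a worker can ship small per-cluster sparse deltas instead of full dense centroids. Below is a minimal sketch of that idea in Python; the function names and the dict-based sparse-vector representation are illustrative assumptions, not the dissertation's actual implementation.

```python
def nearest(point, centroids):
    """Nearest centroid id by Euclidean distance over sparse dicts."""
    def dist2(p, c):
        return sum((p.get(d, 0.0) - c.get(d, 0.0)) ** 2 for d in set(p) | set(c))
    return min(centroids, key=lambda cid: dist2(point, centroids[cid]))

def local_deltas(points, centroids):
    """One worker's pass over a mini-batch: accumulate sparse per-cluster
    deltas and counts; broadcasting these is far cheaper than broadcasting
    whole high-dimensional centroids."""
    deltas, counts = {}, {}
    for p in points:
        c = nearest(p, centroids)
        d = deltas.setdefault(c, {})
        for dim, val in p.items():            # only the non-zero dimensions
            d[dim] = d.get(dim, 0.0) + val
        counts[c] = counts.get(c, 0) + 1
    return deltas, counts

def apply_deltas(centroids, sizes, deltas, counts):
    """Merge broadcast deltas into the shared centroids as running means."""
    for c, d in deltas.items():
        n_old, n_new = sizes[c], sizes[c] + counts[c]
        for dim in set(centroids[c]) | set(d):
            total = centroids[c].get(dim, 0.0) * n_old + d.get(dim, 0.0)
            centroids[c][dim] = total / n_new
        sizes[c] = n_new

centroids = {0: {"x": 1.0}, 1: {"y": 1.0}}
sizes = {0: 1, 1: 1}
d, c = local_deltas([{"x": 3.0}, {"y": 2.0, "z": 1.0}], centroids)
apply_deltas(centroids, sizes, d, c)
print(centroids)  # {0: {'x': 2.0}, 1: {'y': 1.5, 'z': 0.5}}
```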

    Design of a reference architecture for an IoT sensor network


    Extending SQL properties to NoSQL DBMS through Call Level Interfaces

    Master's in Computer and Telematics Engineering (Mestrado em Engenharia de Computadores e Telemática).

    To store, update and retrieve data from database management systems (DBMS), software architects use tools such as call level interfaces (CLI), which provide standard functionality for interacting with a DBMS. These tools are designed to bridge the relational database and object-oriented programming paradigms, but the emergence of the NoSQL paradigm, and particularly of new NoSQL DBMS providers, leads to situations where some of the standard functionality provided by CLI is not supported, very often because of the distance from the relational model or because of design constraints. As such, when a system architect needs to evolve, namely from a relational DBMS to a NoSQL DBMS, he must overcome the difficulties posed by the features the NoSQL DBMS does not provide. Not only that, but CLI usually forsake established access control policies, so application developers must master those policies in order to develop software that conforms to them. Choosing the wrong NoSQL DBMS risks major issues with applications requesting unsupported features or making unauthorized accesses. This thesis focuses on deploying features that are not commonly supported by NoSQL DBMS, such as stored procedures, transactions, save points, and interactions with local memory structures, through a framework based on a standard CLI. The feature implementation model is defined by modules of our framework and allows distributed, fault-tolerant systems to be deployed that simulate the previously mentioned features and abstract the underlying database's capabilities from clients. It is also our goal to integrate our framework with previous work, S-DRACA, a dynamic secure access control architecture for relational applications in which permissions are defined as sequences of create, read, update and delete expressions. With this integration, we can provide dynamic Role-Based Access Control and other security features to any kind of DBMS. We developed several ways of using each component (locally or distributed), and the framework is built in a modular fashion, which allows the components to be used individually or together and lets system administrators add extra features or DBMS to adapt the framework to their particular needs.
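
    One way to see how a CLI-level framework can simulate transactions and save points over a NoSQL store that lacks them is to buffer writes locally and replay or discard them on commit, rollback, or rollback-to-savepoint. The sketch below illustrates only that buffering idea; the class and method names are illustrative assumptions, not the thesis's actual framework API.

```python
class TransactionShim:
    """Buffers writes against a key-value NoSQL client and flushes them
    only on commit, emulating transactions and save points."""

    def __init__(self, store):
        self.store = store       # any object exposing put(key, value)
        self.log = []            # ordered buffered writes
        self.savepoints = {}     # savepoint name -> position in the log

    def put(self, key, value):
        self.log.append((key, value))

    def savepoint(self, name):
        self.savepoints[name] = len(self.log)

    def rollback_to(self, name):
        del self.log[self.savepoints[name]:]   # drop writes made after it

    def rollback(self):
        self.log.clear()
        self.savepoints.clear()

    def commit(self):
        for key, value in self.log:            # replay buffered writes
            self.store.put(key, value)
        self.rollback()                        # reset local state

# Usage against a trivial in-memory stand-in for a NoSQL client:
class DictStore(dict):
    def put(self, key, value):
        self[key] = value

tx = TransactionShim(DictStore())
tx.put("a", 1)
tx.savepoint("sp")
tx.put("b", 2)
tx.rollback_to("sp")     # the write of "b" is discarded
tx.commit()
print(tx.store)          # {'a': 1}
```

    A production version would also need to make the replay on commit atomic and fault-tolerant; the thesis addresses this with distributed, fault-tolerant framework modules.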

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as the final publication of the COST Action IC1406 "High-Performance Modelling and Simulation for Big Data Applications" (cHiPSet). Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex, data-intensive, continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predicting and analysing natural and complex systems in science and engineering. As their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest to these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Deep Learning in Mobile and Wireless Networking: A Survey

    The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, agile management of network resources to maximize user experience, and extraction of fine-grained, real-time analytics. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space. In this paper we bridge the gap between deep learning and mobile and wireless networking research by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and the state of the art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by domain. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open directions for future research.

    WICC 2017 : XIX Workshop de Investigadores en Ciencias de la Computación

    Proceedings of the XIX Workshop de Investigadores en Ciencias de la Computación (WICC 2017), held at the Instituto Tecnológico de Buenos Aires (ITBA) on 27-28 April 2017. Red de Universidades con Carreras en Informática (RedUNCI).