Efficient analysis of large-scale social networks using big-data platforms
Ankara : The Computer Engineering and The Graduate School of Engineering and Science of Bilkent University, 2014. Thesis (Ph.D.) -- Bilkent University, 2014. Includes bibliographical references (leaves 133-145).
In recent years, the rise of very large, rich-content networks has reignited interest in complex/social network analysis at the big-data scale. Such analysis makes it possible to understand social interactions at large scale, but it poses computational challenges for earlier approaches whose algorithmic complexity exceeds O(n). This thesis
analyzes social networks at very large scales to derive important parameters and
characteristics in an efficient and effective way using big-data platforms. With the
popularization of mobile phone usage, telecommunication networks have turned
into a socially binding medium and enable researchers to analyze social interactions
at very large scales. Degree distribution is one of the most important
characteristics of social networks. To study degree characteristics and structural
properties in large-scale social networks, in this thesis we first gathered
a tera-scale dataset of telecommunication call detail records. Using this data,
we empirically evaluate several statistical models against the degree distribution
of the country's call graph and determine that a Pareto log-normal distribution
provides the best fit, despite claims in the literature that a power-law distribution
is the best model. We also investigate how network operator, size, density, and
location affect the degree distribution, to understand the parameters governing
it in social networks.
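The model comparison described above can be sketched as a maximum-likelihood contest between candidate distributions. The snippet below is a toy illustration on synthetic data (the function names, parameters, and sample are my own, not the thesis's tera-scale pipeline): it fits a log-normal and a Pareto to a degree sample via closed-form MLE and compares their log-likelihoods.

```python
import math
import random

def lognormal_loglik(xs):
    """MLE fit of a log-normal, then its log-likelihood on xs."""
    logs = [math.log(x) for x in xs]
    n = len(xs)
    mu = sum(logs) / n
    var = sum((l - mu) ** 2 for l in logs) / n
    sigma = math.sqrt(var)
    return sum(-l - math.log(sigma * math.sqrt(2 * math.pi))
               - (l - mu) ** 2 / (2 * var) for l in logs)

def pareto_loglik(xs):
    """MLE fit of a Pareto (x_min taken as the sample minimum),
    then its log-likelihood on xs."""
    n = len(xs)
    xmin = min(xs)
    alpha = n / sum(math.log(x / xmin) for x in xs)
    return (n * math.log(alpha) + n * alpha * math.log(xmin)
            - (alpha + 1) * sum(math.log(x) for x in xs))

random.seed(42)
degrees = [random.lognormvariate(1.5, 1.0) for _ in range(5000)]
# On log-normal-like degree data the log-normal fit wins the comparison.
print(lognormal_loglik(degrees) > pareto_loglik(degrees))
```

In practice one would also compare against the Pareto log-normal itself and use goodness-of-fit tests, but the likelihood comparison conveys the core of the method.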
Besides structural property analysis, community identification is of great practical
interest for finding highly cohesive subnetworks on different subjects in a
social network. In graph theory, the k-core is a key metric used to identify subgraphs
of high cohesion, also known as the ‘dense’ regions of a graph. As real-world
graphs such as social network graphs grow in size, their contents get richer and their
topologies change dynamically, so we are challenged not only to materialize k-core
subgraphs once but also to maintain them in order to keep up with continuous
updates. These challenges inspired us to propose a new set of distributed algorithms for k-core view construction and maintenance on a horizontally scaling
storage and computing platform. Experimental evaluation demonstrates
orders-of-magnitude speedups and the advantages of maintaining k-cores incrementally
and in batch windows over complete reconstruction approaches.
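For reference, the k-core decomposition that these distributed algorithms construct and maintain can be computed on a single machine by the classic peeling procedure: repeatedly remove a minimum-degree vertex, and the running maximum of removal-time degrees gives each vertex its core number. A minimal single-machine sketch (illustrative only; the thesis's contribution is the distributed, incrementally maintained version):

```python
def core_numbers(adj):
    """Classic k-core peeling: repeatedly remove a minimum-degree
    vertex; the running maximum of removal degrees is each vertex's
    core number."""
    deg = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=lambda u: deg[u])
        k = max(k, deg[v])  # core numbers never decrease in peel order
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core

# A triangle a-b-c with a pendant vertex d: the triangle is the 2-core.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(core_numbers(graph))
```

A naive rerun of this peeling after every edge update is exactly the "complete reconstruction" baseline that the incremental maintenance algorithms outperform.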
Moreover, the intensity of community engagement can be distinguished at
multiple levels, resulting in a multiresolution community representation that has
to be maintained over time. We also propose distributed algorithms to construct
and maintain multi-k-core graphs, implemented on the scalable big-data platform
Apache HBase. Our experimental evaluation demonstrates orders-of-magnitude
speedups from maintaining multi-k-cores incrementally over complete
reconstruction. Furthermore, we propose a graph-aware cache system designed
for distributed graph processing. Experimental results demonstrate up to 15x
speedup compared to traditional LRU-based cache systems.
Aksu, Hidayet. Ph.D.
Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data
Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015.
As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required.
To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems.
In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases to support fully customizable index structures, which can embed necessary social context information for efficient queries.
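The idea of embedding social context in the index can be illustrated with a toy inverted index (hypothetical names and structure, not the dissertation's actual framework): each posting stores context, here just the author, next to the document id, so a query can constrain both text and context in a single lookup, without a join.

```python
from collections import defaultdict

# Context-embedding inverted index: term -> list of (post_id, author).
index = defaultdict(list)

def add_post(post_id, author, text):
    """Index every distinct term of a post together with its author."""
    for term in set(text.lower().split()):
        index[term].append((post_id, author))

def query(term, author=None):
    """Match on textual content, optionally constrained by social context."""
    return sorted(pid for pid, a in index[term]
                  if author is None or a == author)
```

A standard text index would need a separate lookup (or join) against user data to answer the author-constrained query; embedding the context makes it a single index scan.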
The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms.
In the streaming analysis module, the high-dimensional data representation of social media streams poses special challenges for parallel stream clustering. Due to the sparsity of the high-dimensional data, traditional synchronization methods become expensive and severely impact the scalability of the algorithm. Therefore, we design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters, yielding scalable parallel stream clustering algorithms.
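The broadcast-the-delta idea can be sketched with a toy model (the names are illustrative; the dissertation's parallel implementation is far more involved): each cluster replica keeps a point count and sparse per-dimension sums, and a worker that absorbs a point broadcasts only the touched dimensions instead of the full high-dimensional centroid.

```python
def absorb(point, cluster):
    """Update the local cluster state; return the sparse delta to broadcast."""
    cluster["count"] += 1
    for dim, val in point.items():
        cluster["sums"][dim] = cluster["sums"].get(dim, 0.0) + val
    return {"count": 1, "sums": dict(point)}

def merge(cluster, delta):
    """Apply a broadcast delta on a remote replica of the cluster state."""
    cluster["count"] += delta["count"]
    for dim, val in delta["sums"].items():
        cluster["sums"][dim] = cluster["sums"].get(dim, 0.0) + val

def centroid(cluster):
    """Sparse centroid = per-dimension sum divided by point count."""
    return {d: s / cluster["count"] for d, s in cluster["sums"].items()}
```

Because social media points are sparse, a delta touches only a handful of dimensions, while a full centroid may span the whole vocabulary; that difference in message size is the source of the scalability gain.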
Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies.
Extensão de propriedades SQL a SGBD NoSQL através de call level interfaces
Master's in Computer and Telematics Engineering.
To store, update and retrieve data from database management systems
(DBMS), software architects use tools, such as call level interfaces (CLI), which
provide standard functionality to interact with DBMS. These tools are designed
to bring together the relational database and object-oriented programming
paradigms, but the emergence of the NoSQL paradigm, and particularly of
new NoSQL DBMS providers, leads to situations where some of the standard
functionality provided by CLI is not supported, very often due to the distance
between the relational and NoSQL models or due to design constraints. As such,
when a system architect needs to evolve, for instance from a relational DBMS to
a NoSQL DBMS, he must overcome the difficulties arising from the features
not provided by the NoSQL DBMS. Moreover, CLI usually forsake
established access control policies, so application developers must master
those policies in order to develop software that conforms to them.
Choosing the wrong NoSQL DBMS risks major issues with applications
requesting unsupported features and with unauthorized accesses.
This thesis focuses on deploying features that are not commonly supported
by NoSQL DBMS, such as Stored Procedures, Transactions, Save
Points and interactions with local memory structures, through a framework
based on a standard CLI. The feature implementation model is defined by
modules of our framework and allows distributed, fault-tolerant systems
to be deployed, which simulate the previously mentioned features and
abstract the underlying database features from clients. It is also our goal to
integrate our framework with previous work, S-DRACA, a dynamic secure
access control architecture for relational applications in which permissions are
defined as sequences of create, read, update and delete expressions. With
this integration, we can provide dynamic Role-Based Access Control and
other security features to any kind of DBMS. We developed several ways
of using each component (locally or distributed), and the framework is built
in a modular fashion, which allows components to be used individually
or together, and extra features or DBMS to be added by system
administrators who wish to adapt the framework to their particular needs.
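As an illustration of how such features can be simulated client-side, here is a minimal Python sketch of buffered transactions with save points over a plain key-value store (the class and method names are hypothetical, not the framework's actual API): writes are buffered locally and flushed to the store only on commit, and a save point snapshots the buffer.

```python
class TxnWrapper:
    """Client-side transaction simulation over a key-value store that
    lacks native transactions. Writes are buffered and flushed on
    commit; save points snapshot the buffer for partial rollback."""

    def __init__(self, store):
        self.store = store          # any dict-like key-value store
        self.buffer = {}            # uncommitted writes
        self.savepoints = []        # stack of buffer snapshots

    def put(self, key, value):
        self.buffer[key] = value

    def get(self, key):
        # Read-your-own-writes: check the buffer before the store.
        return self.buffer.get(key, self.store.get(key))

    def savepoint(self):
        self.savepoints.append(dict(self.buffer))

    def rollback_to_savepoint(self):
        self.buffer = self.savepoints.pop()

    def commit(self):
        self.store.update(self.buffer)
        self.buffer.clear()
        self.savepoints.clear()
```

Note this sketch offers no isolation between concurrent clients; providing that in a distributed, fault-tolerant way is precisely where the framework's modules come in.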
High-Performance Modelling and Simulation for Big Data Applications
This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. It is then arguably required to have a seamless interaction of High Performance Computing with Modelling and Simulation in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.
Deep Learning in Mobile and Wireless Networking: A Survey
The rapid uptake of mobile devices and the rising popularity of mobile
applications and services pose unprecedented demands on mobile and wireless
networking infrastructure. Upcoming 5G systems are evolving to support
exploding mobile traffic volumes, agile management of network resources to
maximize user experience, and extraction of fine-grained real-time analytics.
Fulfilling these tasks is challenging, as mobile environments are increasingly
complex, heterogeneous, and evolving. One potential solution is to resort to
advanced machine learning techniques to help manage the rise in data volumes
and algorithm-driven applications. The recent success of deep learning
underpins new and powerful tools that tackle problems in this space.
In this paper we bridge the gap between deep learning and mobile and wireless
networking research, by presenting a comprehensive survey of the crossovers
between the two areas. We first briefly introduce essential background and
state-of-the-art in deep learning techniques with potential applications to
networking. We then discuss several techniques and platforms that facilitate
the efficient deployment of deep learning onto mobile systems. Subsequently, we
provide an encyclopedic review of mobile and wireless networking research based
on deep learning, which we categorize by different domains. Drawing from our
experience, we discuss how to tailor deep learning to mobile environments. We
complete this survey by pinpointing current challenges and open future
directions for research.
WICC 2017 : XIX Workshop de Investigadores en Ciencias de la Computación
Proceedings of the XIX Workshop de Investigadores en Ciencias de la Computación (WICC 2017), held at the Instituto Tecnológico de Buenos Aires (ITBA) on 27 and 28 April 2017. Red de Universidades con Carreras en Informática (RedUNCI).