
    Data modeling with NoSQL : how, when and why

    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 201

    Enabling Social Applications via Decentralized Social Data Management

    An unprecedented wealth of information produced by online social networks, further augmented by location/collocation data, is currently fragmented across different proprietary services. Combined, it can accurately represent the social world and enable novel socially-aware applications. We present Prometheus, a socially-aware peer-to-peer service that collects social information from multiple sources into a multigraph managed in a decentralized fashion on user-contributed nodes, and exposes it through an interface implementing non-trivial social inferences while complying with user-defined access policies. Simulations and experiments on PlanetLab with emulated application workloads show that the system exhibits good end-to-end response time, low communication overhead and resilience to malicious attacks. Comment: 27 pages, single ACM column, 9 figures, accepted in Special Issue of Foundations of Social Computing, ACM Transactions on Internet Technolog

    Building fast and consistent (geo-)replicated systems : from principles to practice

    Distributing data across replicas within a data center or across multiple data centers plays an important role in building Internet-scale services that provide a good user experience, namely low-latency access and high throughput. This approach often compromises on strong consistency semantics, which help maintain application-specific desired properties, namely state convergence and invariant preservation. To relieve this inherent tension, many proposals in the past few years have allowed programmers to selectively weaken the consistency levels of certain operations to avoid costly immediate coordination for concurrent user requests. However, these proposals fail to provide principles that guide programmers in assigning consistency levels to operations so that good performance is extracted while the system behavior still complies with its specification. The primary goal of this thesis is to provide programmers with principles and tools for building fast and consistent (geo-)replicated systems by allowing them to reason about various consistency levels within the same framework. First, we propose RedBlue consistency, which presents sufficient conditions that allow programmers to safely separate weakly consistent operations from strongly consistent ones in a coarse-grained manner. Second, to improve the practicality of RedBlue consistency, we built SIEVE - a tool that uses both Commutative Replicated Data Types and program analysis techniques to assign proper consistency levels to different operations and to maximize the weakly consistent operation space. Finally, we generalize the tradeoff between consistency and performance and propose Partial Order-Restrictions consistency (PoR consistency for short) - a generic consistency definition that captures various consistency levels in terms of visibility restrictions among pairs of operations and allows programmers to tune the restrictions to obtain fine-grained control of their targeted consistency semantics.
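
To make the idea of a Commutative Replicated Data Type concrete, the following is a minimal sketch of a grow-only counter. It is a generic textbook construction, not code from SIEVE or the RedBlue work; the class name `GCounter` and its methods are assumptions chosen for illustration. Because merging takes a per-replica maximum, replicas converge to the same state regardless of the order in which they exchange updates, which is the property that lets such operations run without immediate coordination.

```python
class GCounter:
    """Grow-only counter: a simple commutative replicated data type (illustrative)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        # Maps replica id -> number of increments observed from that replica.
        self.counts = {}

    def increment(self, n=1):
        # A replica only ever advances its own entry.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Element-wise maximum is commutative, associative and idempotent,
        # so the merge order does not affect the final state.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self):
        return sum(self.counts.values())


# Two replicas accept updates concurrently, then exchange state in either order.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

After the exchange both replicas report the same total, and merging the same state a second time changes nothing.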

    Multicloud Resource Allocation: Cooperation, Optimization and Sharing

    Nowadays our daily life is powered not only by water, electricity, gas and telephony but by the "cloud" as well. Big cloud vendors such as Amazon, Microsoft and Google have built large-scale centralized data centers to achieve economies of scale, on-demand resource provisioning, high resource availability and elasticity. However, those massive data centers also bring about many other problems, e.g., bandwidth bottlenecks, privacy and security risks, huge energy consumption, and legal and physical vulnerabilities. One possible solution to those problems is to employ multicloud architectures. This thesis contributes to multicloud resource allocation from three perspectives: cooperation, optimization and data sharing. We address the following problems: how resource providers cooperate in a multicloud, how to reduce information leakage in a multicloud storage system, and how to share big data in a cost-effective way. More specifically, we make the following contributions. Cooperation in the decentralized cloud. We propose a decentralized cloud model in which a group of small data centers (SDCs) can cooperate with each other to improve performance. Moreover, we design a general strategy function for SDCs to evaluate the performance of cooperation based on different dimensions of resource sharing. Through extensive simulations using a realistic data center model, we show that strategies based on reciprocity are more effective than other strategies, e.g., those using prediction based on historical data. Our results show that the reciprocity-based strategy can thrive in a heterogeneous environment with competing strategies. Multicloud optimization on information leakage. We first study an important information leakage problem caused by unplanned data distribution in multicloud storage services. Then, we present StoreSim, an information-leakage-aware storage system for the multicloud. StoreSim aims to store syntactically similar data on the same cloud, thereby minimizing the user's information leakage across multiple clouds. We design an approximate algorithm to efficiently generate similarity-preserving signatures for data chunks based on MinHash and Bloom filters, and also design a function to compute the information leakage based on these signatures. Next, we present an effective storage plan generation algorithm based on clustering for distributing data chunks with minimal information leakage across multiple clouds. Finally, we evaluate our scheme using two real datasets from Wikipedia and GitHub. We show that our scheme can reduce information leakage by up to 60% compared to unplanned placement. Furthermore, our analysis in terms of system attackability demonstrates that our scheme makes attacks on information much more complex. Smart data sharing. Moving large amounts of distributed data into the cloud or from one cloud to another can incur high costs in both time and bandwidth. Data sharing in the multicloud can be optimized from two different angles: inter-cloud scheduling and intra-cloud optimization. We first present CoShare, a P2P-inspired, decentralized, cost-effective sharing system for data replication that optimizes network transfer among small data centers. Then we propose a data summarization method to reduce the total size of the dataset, thereby reducing network transfer.
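
The similarity-preserving signatures mentioned in the abstract above can be illustrated with a plain MinHash sketch. This is a generic version of the technique under stated assumptions, not StoreSim's algorithm: the function names, the 4-character shingling, the 64-hash signature length and the use of seeded SHA-1 as the hash family are all choices made for this example, and the Bloom filter stage is omitted.

```python
import hashlib


def shingles(text, k=4):
    """Return the set of overlapping k-character substrings of text."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in items))
    return signature


def estimate_similarity(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


chunk_a = "the quick brown fox jumps over the lazy dog"
chunk_b = "the quick brown fox jumps over the lazy cat"
chunk_c = "0123456789 9876543210 5555 7777 1111 3333"

sig_a = minhash_signature(shingles(chunk_a))
sig_b = minhash_signature(shingles(chunk_b))
sig_c = minhash_signature(shingles(chunk_c))

near = estimate_similarity(sig_a, sig_b)  # chunks sharing most shingles
far = estimate_similarity(sig_a, sig_c)   # chunks sharing no shingles
```

A placement policy in this spirit would compare signatures rather than raw chunks and group chunks with a high estimated similarity on the same cloud.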

    Raspberry Pi Technology


    Deployment and operational aspects of rural broadband wireless access networks

    Broadband speeds, Internet literacy and digital technologies have been steadily evolving over the last decade. Broadband infrastructure has become a key asset in today's society, enabling innovation, driving economic efficiency and stimulating cultural inclusion. However, populations living in remote and rural communities are unable to take advantage of these trends. Globally, a significant part of the world population is still deprived of basic access to the Internet. Broadband Wireless Access (BWA) networks are regarded as a viable solution for providing Internet access to populations living in rural regions. In recent years, Wireless Internet Service Providers (WISPs) and community organizations around the world have shown that rural BWA networks can be an effective strategy and a profitable business. This research began by deploying a BWA network testbed, which also provides Internet access to several remote communities in the harsh environment of the Scottish Highlands and Islands. The experience of deploying and operating this network pointed out three unresolved research challenges that need to be addressed to ease the path towards widespread deployment of rural BWA networks, thereby bridging the rural-urban broadband divide. Below, our research contributions are outlined with respect to these challenges. First, an effective planning paradigm for deploying BWA networks is proposed: incremental planning. Incremental planning makes it possible to anticipate return on investment and to overcome the limited network infrastructure (e.g., backhaul fibre links) in rural areas. I have developed a software tool called IncrEase, together with underlying network planning algorithms, that considers a varied set of operational metrics to guide the operator in identifying the regions that would benefit the most from a network upgrade, automatically suggesting the best long-term strategy to the network administrator. Second, we recognize that rural and community networks present additional issues for network management. As the Internet uplink is often the most expensive part of the operational expenses for such deployments, it is desirable to minimize the overhead of network management. Also, unreliable connectivity between the network operation centre and the network being managed can render traditional centralized management approaches ineffective. Finally, the number of skilled personnel available to maintain such networks is limited. I have developed a distributed network management platform for BWA networks called Stix, which makes such networks easy to manage for rural/community deployments and WISPs alike while keeping the network management infrastructure scalable and flexible. Our approach is based on the notions of goal-oriented and in-network management: administrators graphically specify network management activities as workflows, which are run in the network on a distributed set of agents that cooperate in executing those workflows and storing management information. The Stix system was implemented on low-cost, small form-factor embedded boards and shown to have a low memory footprint. Third, the research focus moves to the problem of assessing broadband coverage and quality in a given geographic region. The outcome is BSense, a flexible framework that combines data provided by ISPs with measurements gathered by distributed software agents. The result is a census (presented as maps and tables) of the coverage and quality of broadband connections available in the region of interest. Such information can be exploited by ISPs to drive their growth, and by regulators and policy makers to get a true picture of broadband availability in the region and make informed decisions. In exchange for installing the multi-platform measurement software (which runs in the background) on their computers, users can get statistics about their Internet connection and those in their neighbourhood. Finally, the lessons learned through this research are summarised. The outcome is a set of suggestions about how the deployment and operation of rural BWA networks, including our own testbed, can be made more efficient by using the proper tools. The software systems presented in this thesis have been evaluated in lab settings and in real networks, and are available as open-source software.