
    Big Data: Metadatos y su uso para la vigilancia global (Big Data: Metadata and Its Use for Global Surveillance)

    [IT] My thesis first describes the world of Big Data: it defines the term, gives a brief summary of its history, and analyzes some of the problems that have arisen in the field. I then focus on metadata, defining them and showing the importance they have acquired today, and develop some of their applications. The central theme of my work is the relationship between Big Data and metadata, and the use that governments, especially the United States government, make of them by collecting data en masse to build a global surveillance network over the population. In Chapter 5 I present numerous revelations about these surveillance networks that were published through various media.
    [ES] Within the world of Big Data, this work examines in depth what metadata are, how they are collected, and how governments use them for global surveillance. To do so, I describe them through the Snowden case: Snowden, a former employee of the NSA (U.S. National Security Agency), revealed the methods the United States uses to surveil its citizens. My work is structured as follows:
    1. Introduction
       1.1 Definition of Big Data
       1.2 Evolution of Big Data
       1.3 Definition and importance of metadata
       1.4 Metadata collection and Big Data storage
       1.5 Use of metadata: espionage
    2. Objectives
       2.1 The Snowden case
       2.2 Show how metadata are collected and how security agencies study them, and their possible use for worldwide surveillance.
    3. Current situation
       3.1 Sources of data and metadata collection
       3.2 International laws on data protection and personal privacy
       3.3 The debate between national security and privacy: which should take priority?
    4. Conclusions
    Arguisuelas León, JA. (2017). Big Data: Metadati e il suo uso per la sorveglianza. http://hdl.handle.net/10251/89496

    Cost-Aware Resource Management for Decentralized Internet Services

    Decentralized network services, such as naming systems, content distribution networks, and publish-subscribe systems, play an increasingly critical role. They are required to provide high-performance, low-latency service, achieve high availability in the presence of network and node failures, and handle a large volume of users. Judicious use of expensive system resources, such as memory space, network bandwidth, and number of machines, is fundamental to achieving these properties. Yet current network services typically rely on less-informed, heuristic-based techniques to manage scarce resources, and they often fall short of expectations. This thesis presents a principled approach for building high-performance, robust, and scalable network services. Its key contribution is to show that resolving the fundamental cost-benefit tradeoff between resource consumption and performance through mathematical optimization is practical in large-scale distributed systems. It also shows that this approach enables decentralized network services to efficiently meet system-wide performance goals. The approach to resource management has three stages: analytically model the cost-benefit tradeoff as a constrained optimization problem, determine a near-optimal resource allocation strategy on the fly, and enforce the derived strategy through lightweight, decentralized mechanisms. It builds on self-organizing structured overlays, which provide failure resilience and scalability, and complements them with stronger performance guarantees and robustness under sudden changes in workload. This work enables applications to meet system-wide performance targets with low resource consumption, such as low average response times, high cache hit rates, and small update dissemination times. Alternatively, applications can make maximum use of available resources, such as storage and bandwidth, and derive large gains in performance.
I have implemented an extensible framework called Honeycomb that performs cost-aware resource management on structured overlays using the above approach. I used it to build three critical network services. CoDoNS is a new name system for the Internet that distributes data associated with domain names. CobWeb is an open-access content distribution network that caches web content for faster access by users. Corona is an online information monitoring system that notifies users about changes to web pages. Simulations and performance measurements from a planetary-scale deployment show that these services provide unprecedented performance improvement over the current state of the art.
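The first stage of the approach is to model the cost-benefit tradeoff as a constrained optimization problem. The toy instance below makes that concrete and solves it with a simple greedy heuristic. It is a hypothetical sketch, not Honeycomb's model or solver. It assumes a prefix-routing overlay of `base ** digits` nodes. Replicating an object at level i places copies on `base ** (digits - i)` nodes and makes a lookup for it cost i hops. The function names and the popularity figures are invented for illustration.

```python
import heapq


def assign_levels(popularity, base, digits, budget):
    """Greedy replication-level assignment for a toy prefix-routing overlay.

    Hypothetical model (not Honeycomb's): the overlay has base ** digits nodes.
    An object at level i is stored on base ** (digits - i) nodes, and a lookup
    for it costs i hops. Every object starts at level `digits` (home node only).
    `budget` is the total number of extra replicas that may be created.
    """
    levels = [digits] * len(popularity)

    def step_cost(level):
        # Extra replicas needed to move an object from `level` to `level - 1`.
        return base ** (digits - level + 1) - base ** (digits - level)

    # Max-heap (via negated keys) of hops saved per extra replica.
    heap = [(-p / step_cost(digits), j) for j, p in enumerate(popularity) if p > 0]
    heapq.heapify(heap)
    remaining = budget
    while heap:
        _, j = heapq.heappop(heap)
        cost = step_cost(levels[j])
        if cost > remaining:
            # This object's later steps cost even more, so drop it for good;
            # cheaper steps for other objects may still fit in the budget.
            continue
        remaining -= cost
        levels[j] -= 1
        if levels[j] > 0:
            heapq.heappush(heap, (-popularity[j] / step_cost(levels[j]), j))
    return levels


def mean_hops(popularity, levels):
    """Popularity-weighted average lookup cost in hops."""
    total = sum(popularity)
    return sum(p * lvl for p, lvl in zip(popularity, levels)) / total


pop = [0.5, 0.3, 0.2]  # hypothetical query shares for three objects
levels = assign_levels(pop, base=2, digits=3, budget=5)
print(levels, mean_hops(pop, levels))
```

Popular objects are moved to lower levels (more replicas, fewer hops) first, so the budget goes where it cuts average lookup cost the most. A real system would also have to estimate popularity online and enforce the chosen levels through decentralized mechanisms. This sketch does not attempt either.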

    Replication of non-deterministic objects

    This thesis discusses replication of non-deterministic objects in distributed systems to achieve fault tolerance against crash failures. The objects replicated are the virtual nodes of a distributed application. Replication is viewed as an issue to be dealt with only when configuring a distributed application; it should not affect the application's development. Hence, replication of virtual nodes should be transparent to the application. Like all measures to achieve fault tolerance, replication introduces redundancy into the system. Not surprisingly, the main difficulty is guaranteeing the consistency of all replicas so that they behave as if the object were not replicated (replication transparency). This is further complicated when active objects (like virtual nodes) are replicated and these objects can themselves be clients of other objects in the distributed application. The problems of replicating active non-deterministic objects are analyzed in the context of distributed Ada 95 applications. The ISO standard for Ada 95 defines a model for distributed execution based on remote procedure calls (RPC). Virtual nodes in Ada 95 use this as their sole communication paradigm. However, they may contain tasks that execute activities concurrently, which makes execution potentially non-deterministic due to implicit timing dependencies. Such non-determinism cannot be avoided by choosing deterministic tasking policies. I present two different approaches to maintaining replica consistency despite this non-determinism. In the first approach, I treat the run-time support of Ada 95 as a black box (except for the part handling remote communications). This corresponds to a non-deterministic computation model. I show that replication of non-deterministic virtual nodes requires that remote procedure calls be implemented as nested transactions.
Unfortunately, the effects of failures are not local to the replicas of a virtual node: when a failure occurs, nested remote calls made to other virtual nodes must be undone. Also, using transactional semantics for RPCs requires a compromise on transparency. The application must identify global state, because that state cannot be determined reliably in an automatic way. Further study reveals that this approach cannot be implemented transparently at all. The reason is that the consistency criterion of Ada 95 (linearizability) is much weaker than that of transactions (serializability). Executing remote procedure calls as transactions may thus lead to incompatibilities with the semantics of the programming language. Remotely called subprograms on a replicated virtual node may perform partial operations, i.e., entry calls on global protected objects. In certain such cases, deadlocks that cannot be broken can occur. Such deadlocks do not occur when the virtual node is not replicated. The transactional semantics of RPCs must therefore be exposed to the application. The second approach is based on a piecewise deterministic computation model, i.e., the execution of a virtual node is seen as a sequence of deterministic state intervals. Whenever a non-deterministic event occurs, a new state interval begins. I study replica organization under this computation model (semi-active replication). In this model, all non-deterministic decisions are made on one distinguished replica (the leader). All other replicas (the followers) are forced to follow the same sequence of non-deterministic events. I show that it suffices to synchronize the followers with the leader upon each observable event, i.e., when the leader sends a message to some other virtual node. It is not necessary to synchronize upon each and every non-deterministic event, which would incur prohibitively high overhead.
Non-deterministic events occurring on the leader between observable events are logged and sent to the followers just before the leader executes an observable event. This guarantees that the followers reach the same state as the leader, so the effects of failures remain mostly local to the replicas. A prototype implementation called RAPIDS (Replicated Ada Partitions In Distributed Systems) serves as a proof of concept for this second approach and demonstrates its feasibility. RAPIDS is an Ada 95 implementation of a replication manager for semi-active replication in the GNAT development system for Ada 95. It is entirely contained within the run-time support and hence largely transparent to the application.
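The semi-active scheme described above logs non-deterministic events on the leader and ships the log to the followers just before each observable event. It can be illustrated in a single process. This is a hypothetical Python sketch, not RAPIDS or Ada 95 run-time code. A seeded `random.Random` stands in for the timing-dependent task scheduling that makes a virtual node non-deterministic. An observable event is modeled as a call to `observable()`. All class and method names are invented for illustration. For simplicity, the follower runs after the leader has finished rather than concurrently with it.

```python
import random
from collections import deque


class Replica:
    """Shared deterministic logic; subclasses decide where choices come from."""

    def __init__(self):
        self.state = 0

    def step(self, options):
        # Each non-deterministic choice starts a new deterministic state interval.
        choice = self.choose(options)
        self.state = self.state * 31 + choice
        return choice


class Leader(Replica):
    def __init__(self, followers, seed=None):
        super().__init__()
        self.rng = random.Random(seed)  # stands in for scheduling non-determinism
        self.followers = followers
        self.log = []  # choices made since the last observable event

    def choose(self, options):
        choice = self.rng.choice(options)
        self.log.append(choice)
        return choice

    def observable(self, message):
        # Ship the log to every follower *before* the message becomes visible.
        for f in self.followers:
            f.receive_log(self.log)
        self.log = []
        return message, self.state


class Follower(Replica):
    def __init__(self):
        super().__init__()
        self.replay = deque()

    def receive_log(self, events):
        self.replay.extend(events)

    def choose(self, options):
        choice = self.replay.popleft()  # forced to repeat the leader's decision
        if choice not in options:
            raise RuntimeError("follower diverged from leader")
        return choice

    def observable(self, message):
        return message, self.state  # output suppressed; only the leader sends


def program(replica):
    # Hypothetical virtual-node body: two rounds of choices, each ending in an RPC.
    outputs = []
    for rnd in range(2):
        for _ in range(3):
            replica.step([1, 2, 3])
        outputs.append(replica.observable(f"reply-{rnd}"))
    return outputs


follower = Follower()
leader = Leader([follower], seed=7)
print(program(leader) == program(follower))
```

The follower takes every choice from the leader's log instead of deciding for itself. As a result, its state matches the leader's at each observable event, even though no synchronization happens at the individual choices.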

    Protocolos de pertenencia a grupos para entornos dinámicos (Group Membership Protocols for Dynamic Environments)

    Distributed systems are of fundamental importance among information systems today because of their potential for fault tolerance and scalability. These properties make them suitable for today's increasingly demanding applications. On the other hand, developing distributed applications also presents specific difficulties, precisely in delivering the scalability, fault tolerance, and high availability that are their advantages. It is therefore very useful to have distributed components specifically designed to provide a set of well-defined services at a lower level. Higher-level applications can then build their own semantics on top of these services more easily. Group-oriented services are one such case. Distributed applications use them widely, because they let applications abstract away the details of communication. Such services provide basic primitives for communication between two group members or, above all, for transmitting messages to the whole group with specific guarantees. Group membership services, the focus of this thesis, are a particular kind of group-oriented service. They give their users a view of the set of processes or machines in the system that are simultaneously connected and correct. Moreover, the various participants receive this information with specific consistency guarantees. Membership services are thus a fundamental component for developing group communication systems and other distributed applications. The group membership problem has been widely studied in the literature from both theoretical and practical points of view, and many usable implementations of membership services exist. Even so, the definition of the problem is not unique.
On the contrary, depending…
    Bañuls Polo, MDC. (2006). Protocolos de pertenencia a grupos para entornos dinámicos [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1886
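A membership service gives every participant the same sequence of views, and that idea can be sketched in a few lines. This is a deliberately centralized, hypothetical Python illustration, not the protocols studied in the thesis; the names `View`, `Member`, and `Coordinator` are invented. It assumes a single coordinator that never fails and that orders every join and failure. Under that assumption, all members that stay in the group install identical views in the same order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    view_id: int
    members: frozenset


class Member:
    def __init__(self, name):
        self.name = name
        self.history = []  # views installed by this member, in order

    def install(self, view):
        # Views must be installed in strictly increasing id order.
        if self.history and view.view_id <= self.history[-1].view_id:
            raise ValueError("out-of-order view")
        self.history.append(view)


class Coordinator:
    """Hypothetical single sequencer that orders every membership change.

    A real membership protocol must agree on views despite failures
    (including the coordinator's); this sketch assumes it never fails.
    """

    def __init__(self, names):
        self.members = {n: Member(n) for n in names}
        self.view = View(0, frozenset(names))
        self._broadcast()

    def _broadcast(self):
        # Every member of the current view installs the same view object.
        for name in sorted(self.view.members):
            self.members[name].install(self.view)

    def change(self, joins=(), failures=()):
        for n in joins:
            self.members.setdefault(n, Member(n))
        members = (self.view.members | set(joins)) - set(failures)
        self.view = View(self.view.view_id + 1, frozenset(members))
        self._broadcast()
        return self.view


group = Coordinator(["a", "b", "c"])
group.change(failures=["b"])  # b crashes or disconnects
print(group.change(joins=["d"]))  # d joins
```

Real membership protocols must drop the assumption of a reliable coordinator. Agreeing on views under failures and partitions is precisely where they differ.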

    Open Multithreaded Transactions: A Transaction Model for Concurrent Object-Oriented Programming

    To read the abstract, please go to my PhD home page