37 research outputs found

    Inhabiting the machine

    Like many cities across India, Chennai (capital of Tamil Nadu) has two tiers of slums: those with official government recognition and those without. Slums with official government recognition are further categorised as either objectionable or unobjectionable. Recognised slums receive government funding to provide new tenements and basic services on site. However, recent studies have shown that 4.8 sq km of the Chennai metropolitan area consists of either unrecognised or objectionable slums. The current government strategy is to forcibly relocate families from unrecognised or objectionable slums to large-scale, high-rise settlement colonies on the distant outskirts of Chennai. Numerous civil society organisations, however, have documented that eviction and relocation result in extreme trauma for these families. The Transparent Chennai Project at the Institute for Financial Management and Research in Chennai argues that: “A far more reasonable strategy would be to once again implement the Tamil Nadu Slum Clearance Act in the spirit that it was written, and start to recognise slums and improve them in situ” (Raman and Narayan). This thesis proposes that architectural design can improve conditions for Chennai’s urban poor without resorting to forced relocation. It argues that a new framework for slum housing can be designed that is capable of: protecting slum dwellers from environmental disasters such as rising sea levels, storm surge, and tsunamis; mitigating environmental pollution to improve hygiene; and providing economic sources of fresh water and energy through sustainable means. It further argues that this framework can be achieved in a culturally sensitive manner by acknowledging traditional and historically significant regional architectural typologies.

    Artificial intelligence driven anomaly detection for big data systems

    The main goal of this thesis is to contribute to research on automated performance anomaly detection and interference prediction by implementing Artificial Intelligence (AI) solutions for complex distributed systems, especially Big Data platforms in cloud computing environments. Late detection and manual resolution of performance anomalies and system interference in Big Data systems may lead to performance violations and financial penalties. Motivated by this issue, we propose AI-based methodologies for anomaly detection and interference prediction tailored to Big Data and containerized batch platforms, in order to better analyze system performance and make effective use of computing resources in cloud environments. New, precise, and efficient performance management methods are therefore key to handling performance anomalies and interference and to improving the efficiency of data center resources. The first part of this thesis contributes to performance anomaly detection for in-memory Big Data platforms. We examine the performance of Big Data platforms and justify our choice of the in-memory Apache Spark platform. An artificial neural network-driven methodology is proposed to detect and classify performance anomalies for batch workloads based on RDD characteristics and operating system monitoring metrics. Our method is evaluated against other popular machine learning (ML) algorithms, as well as on four different monitoring datasets. The results show that our proposed method outperforms the other ML methods, typically achieving F-scores of 98–99%. Moreover, we show that a random start instant, a random duration, and overlapping anomalies do not significantly affect the performance of our methodology. The second contribution addresses the challenge of anomaly identification in an in-memory streaming Big Data platform by investigating agile hybrid learning techniques.
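    The abstract reports accuracy as F-scores. As a reminder of what that number measures, the sketch below computes the standard F-beta score (F1 when beta = 1) from true-positive, false-positive, and false-negative counts for a single anomaly class. The abstract does not say which F variant or averaging scheme the thesis uses, so this assumes plain per-class F1; the function name and the counts in the test are hypothetical, not data from the thesis.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score for one class; beta=1.0 gives the usual F1 score.

    Hypothetical helper: assumes standard per-class precision/recall,
    since the abstract does not specify the averaging scheme.
    """
    # Precision: fraction of flagged items that really were anomalies.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of real anomalies that were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Made-up counts: 98 anomalies caught, 2 false alarms, 2 missed.
    print(f"F1 = {f_score(98, 2, 2):.3f}")
```

    Because F1 is a harmonic mean, a detector cannot score in the high 90s by trading many false alarms for recall (or the reverse); both precision and recall must be high.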
    We develop TRACK (neural neTwoRk Anomaly deteCtion in sparK) and TRACK-Plus, two methods to efficiently train a class of machine learning models for performance anomaly detection using a fixed number of experiments. Our model uses artificial neural networks with Bayesian Optimization (BO) to find the optimal training dataset size and configuration parameters, so that the anomaly detection model can be trained efficiently to high accuracy. The objective is to accelerate the search for the training dataset size, optimize the neural network configuration, and improve anomaly classification performance. A validation on several datasets from a real Apache Spark Streaming system demonstrates that the proposed methodology can efficiently identify performance anomalies, near-optimal configuration parameters, and a near-optimal training dataset size while reducing the number of experiments by up to 75% compared with naïve anomaly detection training. The last contribution addresses the challenges of predicting the completion time of containerized batch jobs and proactively avoiding performance interference, by introducing an automated prediction solution that estimates interference among colocated batch jobs in the same computing environment. An AI-driven model is implemented to predict interference among batch jobs before it occurs in the system. Our interference detection model can estimate, and help alleviate, the task slowdown caused by interference. This model helps system operators make accurate job placement decisions. Our model is agnostic to the business logic internal to each job; instead, it is learned from system performance data by applying artificial neural networks to predict the completion time of batch jobs in cloud environments.
    We compare our model with three baseline models (a queueing-theoretic model, operational analysis, and an empirical method) on historical measurements of job completion time and CPU run-queue size (i.e., the number of active threads in the system). The proposed model captures multithreading, operating system scheduling, sleeping time, and job priorities. A validation using 4,500 experiments based on the DaCapo benchmarking suite confirms the predictive efficiency and capabilities of the proposed model, which achieves up to 10% MAPE compared with the other models.
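    The completion-time predictions above are scored with MAPE (mean absolute percentage error). A minimal sketch of the metric follows, assuming the usual definition: the mean of |actual - predicted| / |actual| over all jobs, expressed in percent. The function name and the sample job durations are hypothetical illustrations, not the thesis's evaluation code or data.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # The relative error is undefined for a zero actual value.
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Made-up completion times in seconds for two batch jobs.
    measured = [100.0, 200.0]
    forecast = [110.0, 180.0]
    print(f"MAPE = {mape(measured, forecast):.1f}%")
```

    In this example both predictions are off by 10% of the true duration (one too high, one too low), so the MAPE is 10%. Because errors are relative, a 10-second miss on a short job weighs more than the same miss on a long job.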

    Cybersecurity of Digital Service Chains

    This open access book presents the main scientific results from the H2020 GUARD project. The GUARD project aims to fill the current technological gap between software management paradigms and cybersecurity models, the latter still lacking the orchestration and agility needed to effectively address the dynamicity of the former. This book provides a comprehensive review of the main concepts, architectures, algorithms, and non-technical aspects developed during three years of investigation; the description of the Smart Mobility use case developed at the end of the project gives a practical example of how the GUARD platform and related technologies can be deployed in practical scenarios. We expect the book to interest the broad group of researchers, engineers, and professionals who experience daily the inadequacy of outdated cybersecurity models for modern computing environments and cyber-physical systems.

    Rack-Scale Memory Pooling for Datacenters

    The rise of web-scale services has led to staggering growth in user data on the Internet. To transform such vast amounts of raw data into valuable information for users and to provide quality assurances, it is important to minimize access latency and enable in-memory processing. For more than a decade, the only practical way to accommodate ever-growing data in memory has been to scale out server resources, which has led to the emergence of large-scale datacenters and distributed non-relational (NoSQL) databases. Such horizontal scaling translates into an increasing number of servers participating in the processing of individual user requests. Typically, each user request results in hundreds of independent queries targeting different NoSQL nodes (servers), and the more servers involved, the higher the fan-out. To complete a single user request, all of its associated queries must complete first, so the slowest query determines the completion time. Because of skewed popularity distributions and resource contention, the more servers there are, the harder it is to achieve high throughput and high server utilization without violating service level objectives. This thesis proposes rack-scale memory pooling (RSMP), a new scaling technique for future datacenters that reduces networking overheads and improves the performance of core datacenter software. RSMP builds larger, rack-scale capacity units for datacenters through specialized fabric interconnects with support for one-sided operations, and uses them, in lieu of conventional servers (e.g., 1U), to scale out. We define an RSMP unit as a server rack connecting tens to hundreds of servers to a secondary network that enables direct, low-latency access to the global memory of the rack.
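    The argument that the slowest query sets the request's completion time can be made concrete with a short calculation. Assuming each of n queries independently lands in its own slow tail (for example, beyond its 99th-percentile latency) with probability p, a request avoids the tail only if all n queries do, so P(request is slow) = 1 - (1 - p)^n. The independence assumption and the function name are illustrative simplifications, not a model taken from the thesis.

```python
def prob_request_slow(p_query_slow: float, fanout: int) -> float:
    """Probability that at least one of `fanout` independent queries is slow.

    Hypothetical helper assuming independent, identically distributed
    query latencies; real queries contend for shared resources.
    """
    if not 0.0 <= p_query_slow <= 1.0:
        raise ValueError("p_query_slow must be a probability")
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    # P(all queries fast) = (1 - p)^n; the request is slow otherwise.
    return 1.0 - (1.0 - p_query_slow) ** fanout


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"fan-out {n:>4}: P(request is slow) = {prob_request_slow(0.01, n):.3f}")
```

    With p = 0.01 and a fan-out of 100, roughly 63% of requests (1 - 0.99^100 ≈ 0.634) wait on at least one tail-latency query, which is why reducing the number of servers a request touches matters.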
We then propose a new RSMP design, Scale-Out NUMA, which leverages integration and a NUMA fabric to bridge the gap between local and remote memory, reducing it to only a 5× difference in access latency. Finally, we show how RSMP affects NoSQL data serving, a key datacenter service used by most web-scale applications today. We show that using fewer, larger data shards leads to less load imbalance and higher effective throughput, without violating applications' service level objectives. For example, by using Scale-Out NUMA, RSMP improves the throughput of a key-value store by up to 8.2× over a traditional scale-out deployment.
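    Why fewer, larger shards reduce load imbalance under skewed popularity can be illustrated with a toy model. The sketch below assumes Zipf-like popularity (the key with rank k gets weight 1/(k+1)^skew) and a hypothetical deterministic placement of key k on shard k % num_shards, then reports the max-to-mean shard load ratio. It ignores replication, caching of hot keys, and Scale-Out NUMA itself; the parameters (10,000 keys, skew 1.0) and the function name are assumptions for illustration, not the thesis's methodology.

```python
def shard_imbalance(num_keys: int, num_shards: int, skew: float = 1.0) -> float:
    """Max-to-mean load ratio across shards for Zipf-like key popularity.

    Hypothetical toy model: key `rank` (0 = most popular) has weight
    1 / (rank + 1) ** skew and is placed on shard `rank % num_shards`.
    """
    if num_keys < 1 or num_shards < 1:
        raise ValueError("num_keys and num_shards must be positive")
    loads = [0.0] * num_shards
    for rank in range(num_keys):
        loads[rank % num_shards] += 1.0 / (rank + 1) ** skew
    mean_load = sum(loads) / num_shards
    # 1.0 means perfectly balanced; larger values mean a hotter hottest shard.
    return max(loads) / mean_load


if __name__ == "__main__":
    for shards in (4, 16, 64, 256):
        print(f"{shards:>4} shards: max/mean load = {shard_imbalance(10_000, shards):.2f}")
```

    In this model the shard holding the most popular key always carries that key's full load. With a few large shards, that load is a small addition to an already large share; with many small shards, it dwarfs the per-shard average, so the ratio climbs as shards multiply.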

    Building interactive distributed processing applications at a global scale

    With users' continuous engagement with technology, many latency-sensitive interactive applications have emerged, e.g., global content sharing in social networks, adaptive lights/temperatures in smart buildings, and online multi-user games. These applications typically process massive amounts of data at a global scale. In such cases, distributing storage and processing is key to handling the scale. Distribution requires handling two main aspects: a) the placement of data/processing and b) data motion across the distributed locations. However, handling distribution while meeting latency guarantees at large scale raises many challenges: hiding the heterogeneity and diversity of devices and workloads, handling dynamism in the environment, providing continuous availability despite failures, and supporting large persistent state. In this thesis, we show how latency-driven designs for placement and data motion can be used to build production infrastructure for interactive applications at a global scale, while also addressing the myriad challenges of heterogeneity, dynamism, state, and availability. We demonstrate that a latency-driven approach is general and applicable at all layers of the stack: from storage, to processing, down to networking. We designed and built four distinct systems across this spectrum. First, we developed Ambry (in collaboration with LinkedIn), a geo-distributed storage system for interactive data sharing across the globe. Ambry is LinkedIn's mainstream production system for all its media content, running across 4 datacenters and serving over 500 million users. Ambry minimizes user-perceived latency via smart data placement and propagation. Second, we built two processing systems: a traditional model, Samza, and an avant-garde model, Steel.
Samza (in collaboration with LinkedIn) is a production stream processing framework used at 15 companies (including LinkedIn, Uber, Netflix, and TripAdvisor), powering more than 200 pipelines at LinkedIn alone. Samza minimizes the impact of data motion on end-to-end latency, thereby enabling large persistent state (hundreds of TB) alongside processing. Steel (in collaboration with Microsoft) extends processing to the emerging edge. Integrated with Azure, Steel dynamically optimizes placement and data motion across the entire edge-cloud environment. Finally, we designed FreeFlow, a high-performance networking mechanism for containers. Exploiting container placement, FreeFlow opportunistically bypasses networking layers, minimizing data motion and reducing latency by up to three orders of magnitude.

    Toxic Timescapes: Examining Toxicity across Time and Space

    An interdisciplinary environmental humanities volume that explores human-environment relationships on our permanently polluted planet. While toxicity and pollution are ever present in modern daily life, politicians, juridical systems, media outlets, scholars, and the public alike show great difficulty in detecting, defining, monitoring, or generally coming to terms with them. This volume’s contributors argue that the source of this difficulty lies in the struggle to make sense of the intersecting temporal and spatial scales working on the human and more-than-human body, while continuing to acknowledge race, class, and gender in terms of global environmental justice and social inequality. The term toxic timescapes refers to this intricate intersectionality of time, space, and bodies in relation to toxic exposure. As a tool of analysis, it unpacks linear understandings of time and explores how harmful substances permeate temporal and physical space as both event and process. It equips scholars with new ways of creating data and conceptualizing the past, present, and future presence and possible effects of harmful substances and provides a theoretical framework for new environmental narratives. To think in terms of toxic timescapes is to radically shift our understanding of toxicants in the complex web of life. Toxicity, pollution, and modes of exposure are never static; therefore, dose, timing, velocity, mixture, frequency, and chronology matter as much as the geographic location and societal position of those exposed. Together, these factors create a specific toxic timescape that lies at the heart of each contributor’s narrative. Contributors from the disciplines of history, human geography, science and technology studies, philosophy, and political ecology come together to demonstrate the complex reality of a toxic existence. 
Their case studies span the globe as they observe the intersection of multiple times and spaces at such diverse locations as former battlefields in Vietnam, aging nuclear-weapon storage facilities in Greenland, waste deposits in southern Italy, chemical facilities along the Gulf of Mexico, and coral-breeding laboratories across the world.