
    Combatting Advanced Persistent Threat via Causality Inference and Program Analysis

    Cyber attackers are becoming increasingly sophisticated. In particular, Advanced Persistent Threat (APT) is a class of attack that targets a specific organization and compromises its systems over a long period without being detected. Over the years, we have seen notorious examples of APTs, including Stuxnet, which disrupted Iranian nuclear centrifuges, and data breaches affecting millions of users. Investigating an APT is challenging because it unfolds over an extended period and the attack process is highly sophisticated and stealthy. Preventing APTs is also difficult due to ever-expanding attack vectors. In this dissertation, we present proposals for dealing with challenges in attack investigation and prevention. First, we present LDX, which conducts precise counterfactual causality inference to determine dependencies between system calls (e.g., between input and output system calls), allowing investigators to determine the origin of an attack (e.g., receipt of a spam email), trace its propagation path, and assess its consequences. LDX is four times more accurate and two orders of magnitude faster than state-of-the-art taint analysis techniques. We then present MCI, a practical model-based causality inference system that achieves precise and accurate causality inference without requiring any modification or instrumentation of end-user systems. Second, we present a general protection system against a wide spectrum of attack vectors and methods. Specifically, we present A2C, which prevents a wide range of attacks by randomizing inputs so that any malicious payloads contained in them are corrupted. The protection provided by A2C is both general (i.e., effective against various attack vectors) and practical (7% runtime overhead).
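    To make the counterfactual idea concrete, the following is a minimal, purely illustrative sketch of inferring a dependency by re-executing a toy program with a mutated input and comparing outputs. The toy program and function names are hypothetical and are not LDX's actual interface, which operates on system calls of re-executed program runs.

```python
# Illustrative sketch only: counterfactual causality inference in miniature.
# If perturbing a candidate input event changes the observed output event,
# we infer a causal dependency between the two.

def program(input_bytes: bytes) -> bytes:
    """Toy stand-in for a (re-)execution of the monitored program."""
    # Pretend the program copies part of its input to its output.
    return input_bytes[:8]

def mutate(value: bytes) -> bytes:
    """Perturb the candidate input event (here, flip every byte)."""
    return bytes(b ^ 0xFF for b in value)

def depends_on(input_event: bytes) -> bool:
    """Counterfactual test: does changing the input change the output?"""
    baseline = program(input_event)
    counterfactual = program(mutate(input_event))
    return baseline != counterfactual

if __name__ == "__main__":
    spam_email = b"click this link ..."
    # A differing output under the mutated input implies a dependency
    # from the input event (e.g., receiving the email) to the output event.
    print("output depends on input:", depends_on(spam_email))
```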

    Mapreduce and Heterogeneity: Power-Aware Bag-of-Tasks, Framework Parameter Sensitivity, and Dynamic Cluster Aware Framework Configuration

    This dissertation presents techniques for adapting MapReduce frameworks to incorporate heterogeneity-aware scheduling algorithms, an inspection of cluster configurations and how they impact these scheduling algorithms, an analysis of how cluster configuration and heterogeneity-aware scheduling can work together to minimize the turnaround time and/or power consumption of a cluster executing MapReduce applications, and a discussion of how these lessons apply more broadly to Big Data infrastructure beyond MapReduce that supports multiple Big Data frameworks simultaneously. Heterogeneity exists in various forms in any given cluster, from static (Physical and Platform) heterogeneity to dynamic heterogeneity (Transient Data, Transient Applications, and Irregular Hardware Behavior). Historically, several types of mitigation strategies exist for each of these forms of heterogeneity, each with its pros and cons. We discuss these mitigation strategies, the types of heterogeneity each strategy is able to address, and the history of related work in the field. We then consider taking host-level metrics and using them to schedule tasks in real time, with the goal of reducing cluster-wide energy usage. To do this, we consider on-chip estimators of power consumption, namely temperature. We establish a correlation between CPU temperature and power consumption, then derive a scheduling algorithm that removes nodes consuming too much power from the pool of schedulable resources. This relies on the ability of MapReduce frameworks, as constructed in this thesis, to delay the binding of tasks to specific workers. We analyze the impact this has on the turnaround time of a MapReduce application, including how to set the power threshold so as to limit the impact on turnaround time while shifting power consumption within the cluster away from over-consuming nodes. We also address concerns about upgrading a cluster in stages, which introduces more Physical Heterogeneity at various levels, and the adjustments that must be made to MapReduce configurations to combat the increased heterogeneity. In particular, we examine MapReduce platform mis-configuration and its impact on turnaround time, and analyze how these errors can be mitigated between incremental platform upgrades. To address this, we introduce a Dynamic Heterogeneity Awareness (DHA) module into our MapReduce framework to handle these upgrades and allow the framework to spread tasks more effectively, further improving turnaround time and resource utilization. Finally, we consider the implications of framework and application co-tenancy and describe the state of the art in this area. We describe what co-tenancy is, why it is important, and how the state of the art can be extended to leverage the findings of this thesis so that co-tenant clusters improve application and framework performance as well as energy efficiency.
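    As a rough illustration of the temperature-based scheduling idea described above, the sketch below removes nodes above an assumed temperature threshold from the schedulable pool before late-binding tasks. The node names, threshold value, and helper functions are hypothetical and are not the dissertation's implementation.

```python
# Minimal sketch, under assumed values: CPU temperature serves as an on-chip
# proxy for power draw, and nodes above a threshold are excluded from the
# schedulable pool before delayed task binding.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_temp_c: float  # latest host-level temperature reading

TEMP_THRESHOLD_C = 75.0  # assumed cutoff correlated with excessive power draw

def schedulable_nodes(nodes: list[Node]) -> list[Node]:
    """Drop nodes whose temperature suggests they are consuming too much power."""
    return [n for n in nodes if n.cpu_temp_c <= TEMP_THRESHOLD_C]

def assign_tasks(tasks: list[str], nodes: list[Node]) -> dict[str, str]:
    """Late-bind each task, cycling over the coolest eligible nodes."""
    pool = sorted(schedulable_nodes(nodes), key=lambda n: n.cpu_temp_c)
    if not pool:
        raise RuntimeError("no node below the power/temperature threshold")
    return {task: pool[i % len(pool)].name for i, task in enumerate(tasks)}

if __name__ == "__main__":
    cluster = [Node("worker-1", 68.0), Node("worker-2", 81.5), Node("worker-3", 72.3)]
    # worker-2 is excluded; map tasks are spread over the remaining nodes.
    print(assign_tasks(["map-0", "map-1", "map-2"], cluster))
```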

    FabTouch: A Tool to Enable Communication and Design of Tactile and Affective Fabric Experiences

    The tactile experience of fabric is not only a sensory experience but also an affective one. Our choice of fabric products, such as clothing, is often based on how they feel. Effectively communicating such experiences is crucial for designing tactile fabric experiences. However, there remains a lack of comprehensive understanding of fabric tactile and affective experiences, which has prevented the development of tools to facilitate the communication of these experiences. In this paper, we examine the experiences of 27 participants with nine cotton fabric samples. We combine qualitative and quantitative methods to create FabTouch, a novel tool that facilitates dialogue in the design of fabric experiences. We identified six phases of fabric touch experience, including fabric touch responses, sensory associations, and emotional responses. Initial feedback from designers suggested that FabTouch could enrich design processes in both practice and education and could inspire physical and digital design explorations.

    RF Sensing Based Breathing Patterns Detection Leveraging USRP Devices

    Non-contact detection of breathing patterns in a remote and unobtrusive manner has significant value for healthcare applications and disease diagnosis, such as COVID-19 infection prediction. During the COVID-19 epidemic prevention and control period, non-contact approaches are of great significance because they minimize the physical burden on the patient and require minimal active cooperation from the infected individual. During the pandemic, these non-contact approaches also reduce environmental constraints and remove the need for extra preparation. According to the latest medical research, the breathing pattern of a person infected with COVID-19 differs from the breathing associated with flu and the common cold. One noteworthy symptom of COVID-19 is an abnormal breathing rate: individuals infected with COVID-19 breathe more rapidly. This motivates continuous, real-time detection of breathing patterns, which can help in the prediction, diagnosis, and screening of people infected with COVID-19. In this research work, software-defined radio (SDR)-based radio frequency (RF) sensing techniques and machine learning (ML) algorithms are exploited to develop a platform for the detection and classification of different abnormal breathing patterns. ML algorithms are used for classification, and their performance is evaluated in terms of accuracy, prediction speed, and training time. The results show that the platform can detect and classify breathing patterns with a maximum accuracy of 99.4% using a complex tree algorithm. This research has significant clinical impact because the platform can also be deployed for practical use in pandemic and non-pandemic situations.
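    The following is a minimal sketch of the classification stage only, assuming breathing features (e.g., breathing rate and amplitude) have already been extracted from the RF signal. The synthetic data and scikit-learn's DecisionTreeClassifier are stand-ins for the paper's signal pipeline and complex tree model, chosen only to show how accuracy, prediction speed, and training time can be measured.

```python
# Sketch under assumptions: synthetic breathing features and a generic
# decision tree, evaluated on accuracy, training time, and prediction speed.

import time
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Hypothetical features: [breaths per minute, amplitude]; labels: 0=normal, 1=rapid.
normal = np.column_stack([rng.normal(16, 2, 500), rng.normal(1.0, 0.2, 500)])
rapid = np.column_stack([rng.normal(28, 3, 500), rng.normal(0.7, 0.2, 500)])
X = np.vstack([normal, rapid])
y = np.array([0] * 500 + [1] * 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0)

t0 = time.perf_counter()
clf.fit(X_train, y_train)          # training time
train_s = time.perf_counter() - t0

t0 = time.perf_counter()
pred = clf.predict(X_test)         # prediction speed
predict_s = time.perf_counter() - t0

print(f"accuracy={accuracy_score(y_test, pred):.3f}, "
      f"train={train_s * 1e3:.1f} ms, predict={predict_s * 1e3:.1f} ms")
```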

    Memory Subsystems for Security, Consistency, and Scalability

    In response to the continuous demand for processing ever larger datasets, as well as discoveries in next-generation memory technologies, researchers have been vigorously studying memory-driven computing architectures that would allow data-intensive applications to access enormous amounts of pooled non-volatile memory. As applications interact with increasing numbers of components and datasets, existing systems struggle to efficiently enforce the principle of least privilege for security. While non-volatile memory can retain data even after a power loss and allows for large main memory capacity, programmers must bear the burden of maintaining the consistency of program memory for fault tolerance, as well as handling huge datasets with traditional yet expensive memory management interfaces for scalability. Today's computer systems have become too sophisticated for existing memory subsystems to handle many of these design requirements. In this dissertation, we introduce three memory subsystems that address challenges in security, consistency, and scalability. Specifically, we propose SMVs, which give threads fine-grained control over access privileges for a partially shared address space (security); NVthreads, which allows programmers to easily leverage non-volatile memory with automatic persistence (consistency); and PetaMem, which enables memory-centric applications to freely access memory beyond the traditional process boundary with support for memory isolation and crash recovery (security, consistency, and scalability).
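    As a purely conceptual illustration of the SMV idea of per-thread, fine-grained privileges over a partially shared address space, the sketch below models memory regions and per-thread access rights at user level. The real systems enforce such policies in the kernel and runtime; every class and name here is hypothetical.

```python
# Conceptual sketch only: least-privilege checks on a partially shared
# address space, with each thread granted its own rights per region.

from enum import Flag, auto

class Priv(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()

class AddressSpace:
    def __init__(self):
        self._acl: dict[str, dict[int, Priv]] = {}  # region -> {thread id -> rights}
        self._mem: dict[str, bytes] = {}

    def grant(self, region: str, tid: int, priv: Priv) -> None:
        self._acl.setdefault(region, {})[tid] = priv

    def read(self, region: str, tid: int) -> bytes:
        if Priv.READ not in self._acl.get(region, {}).get(tid, Priv.NONE):
            raise PermissionError(f"thread {tid} may not read {region}")
        return self._mem.get(region, b"")

    def write(self, region: str, tid: int, data: bytes) -> None:
        if Priv.WRITE not in self._acl.get(region, {}).get(tid, Priv.NONE):
            raise PermissionError(f"thread {tid} may not write {region}")
        self._mem[region] = data

if __name__ == "__main__":
    space = AddressSpace()
    space.grant("session_keys", tid=1, priv=Priv.READ | Priv.WRITE)
    space.grant("session_keys", tid=2, priv=Priv.READ)  # least privilege
    space.write("session_keys", tid=1, data=b"secret")
    print(space.read("session_keys", tid=2))
    try:
        space.write("session_keys", tid=2, data=b"oops")  # denied
    except PermissionError as err:
        print(err)
```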

    Online Data Cleaning

    Data-centric applications have never been more ubiquitous in our lives, e.g., search engines, route navigation, and social media. This has brought along a new age where digital data is at the core of many decisions we make as individuals, e.g., looking for the most scenic route to plan a road trip, or as professionals, e.g., analysing customers' transactions to predict the best time to restock different products. However, the surge in data generation has also led to massive amounts of dirty data, i.e., inaccurate or redundant data. Using dirty data to inform business decisions comes with dire consequences; for instance, an IBM report estimates that dirty data costs the U.S. $3.1 trillion a year. Dirty data is the product of many factors, including data entry errors and the integration of several data sources. Data integration of multiple sources is especially prone to producing dirty data: while individual sources may not contain redundant data, they often carry redundant data across each other. Furthermore, different data sources may obey different business rules (sometimes not even known), which makes it challenging to reconcile the integrated data. Even if the data is clean at the time of integration, data updates will compromise its quality over time. There is a wide spectrum of errors that can be found in the data, e.g., duplicate records, missing values, and obsolete data. To address these problems, several data cleaning efforts have been proposed, e.g., record linkage to identify duplicate records, data fusion to fuse duplicate data items into a single representation, and enforcing integrity constraints on the data. However, most existing efforts make two key assumptions: (1) data cleaning is done in one shot; and (2) the data is available in its entirety. These two assumptions do not hold in our age, where data is highly volatile and integrated from several sources. This calls for a paradigm shift in approaching data cleaning: it has to be made iterative, where data comes in chunks and not all at once. Consequently, cleaning the data should not be repeated from scratch whenever the data changes; instead, it should be done only for the data items affected by the updates. Moreover, the repair should be computed efficiently to support applications where cleaning is performed online (e.g., query-time data cleaning). In this dissertation, we present several proposals to realize this paradigm for two major types of data errors: duplicates and integrity constraint violations. We first present a framework that supports online record linkage and fusion over Web databases. Our system processes queries posted to Web databases; query results are deduplicated, fused, and then stored in a cache for future reference, and the cache is updated iteratively with new query results. This makes it possible to perform record linkage and fusion not only efficiently but also effectively, i.e., the cache contains data items seen in previous queries which are jointly cleaned with incoming query results. To address integrity constraint violations, we propose a novel way to approach Functional Dependency repairs, develop a new class of repairs, and demonstrate that it is superior to existing efforts in both runtime and accuracy. We then show how our framework can be easily tuned to work iteratively to support online applications, and we implement a proof-of-concept query answering system to demonstrate the iterative capability of our system.
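    The sketch below illustrates the iterative clean-as-you-query idea in miniature: each chunk of query results is matched against a cache of previously cleaned records, duplicates are fused, and the cache is updated. The matching key and fusion rule are deliberately simplistic, hypothetical stand-ins for the record linkage and fusion techniques in the dissertation.

```python
# Minimal sketch, assuming exact-key matching and a "prefer non-empty value"
# fusion rule; real record linkage would use similarity-based matching.

def merge(existing: dict, incoming: dict) -> dict:
    """Toy fusion rule: keep existing values, fill gaps from the new record."""
    fused = dict(existing)
    for field, value in incoming.items():
        if value and not fused.get(field):
            fused[field] = value
    return fused

class CleaningCache:
    def __init__(self):
        self._records: dict[str, dict] = {}  # matching key -> fused record

    def key(self, record: dict) -> str:
        # Hypothetical matching key over a single attribute.
        return record.get("title", "").strip().lower()

    def ingest(self, query_results: list[dict]) -> None:
        """Clean only the incoming chunk, jointly with what is already cached."""
        for record in query_results:
            k = self.key(record)
            if k in self._records:
                self._records[k] = merge(self._records[k], record)  # fuse duplicate
            else:
                self._records[k] = record                            # new entity

    def records(self) -> list[dict]:
        return list(self._records.values())

if __name__ == "__main__":
    cache = CleaningCache()
    cache.ingest([{"title": "Dune", "year": "1965", "publisher": ""}])
    cache.ingest([{"title": "dune ", "year": "", "publisher": "Chilton"}])
    print(cache.records())  # one fused record instead of two duplicates
```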

    The Second Conference on Lunar Bases and Space Activities of the 21st Century, volume 2

    These 92 papers comprise a peer-reviewed selection of presentations by authors from NASA, the Lunar and Planetary Institute (LPI), industry, and academia at the Second Conference on Lunar Bases and Space Activities of the 21st Century. These papers go into more technical depth than did those published from the first NASA-sponsored symposium on the topic, held in 1984. Session topics included the following: (1) design and operation of transportation systems to, in orbit around, and on the Moon; (2) lunar base site selection; (3) design, architecture, construction, and operation of lunar bases and human habitats; (4) lunar-based scientific research and experimentation in astronomy, exobiology, and lunar geology; (5) recovery and use of lunar resources; (6) environmental and human factors of, and life support technology for, human presence on the Moon; and (7) program management of human exploration of the Moon and space.