88 research outputs found

    Relaxing coherence for modern learning applications

    Get PDF
    The main objective of this research is to efficiently execute learning (model training) of modern machine learning (ML) applications. The recent explosion in data has led to the emergence of data-intensive ML applications whose key phase is learning that requires significant amounts of computation. A unique characteristic of learning is that it is iterative- convergent, where a consistent view of memory does not always need to be guaranteed such that parallel workers are allowed to compute using stale values in intermediate computations to relax certain read-after-write data dependencies. While multiple workers read-and- modify shared model parameters multiple times during learning, incurring multiple data communication between workers, most of the data communication is redundant, due to the stale value tolerant characteristic. Relaxing coherence for these learning applications has the potential to provide extraordinary performance and energy benefits but requires innovations across the system stack from hardware and software. While considerable effort has utilized the stale value tolerance on distributed learning, still inefficient utilization of the full performance potential of this characteristic has caused modern ML applications to have low execution efficiency on the state-of-the-art systems. The inefficiency mainly comes from the lack of architectural considerations and detailed understanding of the different stale value tolerance of different ML applications. Today’s architecture, designed to cater to the needs of more traditional workloads, incurs high and often unnecessary overhead. The lack of detailed understanding has led to ambiguity for the stale value tolerance thus failing to take the full performance potential of this characteristic. This dissertation presents several innovations regarding this challenge. First, this dissertation proposes Bounded Staled Sync (BSSync), hardware support for the bounded staleness consistency model, which accompanies simple logic layers in the memory hierarchy, for reducing atomic operation overhead on data synchronization intensive workloads. The long latency and serialization caused by atomic operations have a significant impact on performance. The proposed technique overlaps the long latency atomic operation with the main computation. Compared to previous work that allows stale values for read operations, BSSync utilizes staleness for write operations, allowing stale- writes. It reduces the inefficiency coming from the data movement between where they are stored and where they are processed. Second, this dissertation presents StaleLearn, a learning acceleration mechanism to reduce the memory divergence overhead of GPU learning with sparse data. Sparse data induces divergent memory accesses with low locality, thereby consuming a large fraction of total execution time on transferring data across the memory hierarchy. StaleLearn trans- forms the problem of divergent memory accesses into the synchronization problem by replicating the model, and reduces the synchronization overhead by asynchronous synchronization on Processor-in-Memory (PIM). The stale value tolerance makes possible to clearly decompose tasks between the GPU and PIM, which can effectively exploit parallelism be- tween PIM and GPU cores by overlapping PIM operations with the main computation on GPU cores. Finally, this dissertation provides a detailed understanding of the different stale value tolerance of different ML applications. While relaxing coherence can reduce the data communication overhead, its complicated impact on the progress of learning has not been well studied thus leading to ambiguity for domain experts and modern systems. We define the stale value tolerance of ML training with the effective learning rate. The effective learning rate can be defined by the implicit momentum hyperparameter, the update density, the activation function selection, RNN cell types, and learning rate adaptation. Findings of this work will open further exploration of asynchronous learning including improving the findings laid out in this dissertation.Ph.D

    A Machine Learning Enhanced Scheme for Intelligent Network Management

    Get PDF
    The versatile networking services bring about huge influence on daily living styles while the amount and diversity of services cause high complexity of network systems. The network scale and complexity grow with the increasing infrastructure apparatuses, networking function, networking slices, and underlying architecture evolution. The conventional way is manual administration to maintain the large and complex platform, which makes effective and insightful management troublesome. A feasible and promising scheme is to extract insightful information from largely produced network data. The goal of this thesis is to use learning-based algorithms inspired by machine learning communities to discover valuable knowledge from substantial network data, which directly promotes intelligent management and maintenance. In the thesis, the management and maintenance focus on two schemes: network anomalies detection and root causes localization; critical traffic resource control and optimization. Firstly, the abundant network data wrap up informative messages but its heterogeneity and perplexity make diagnosis challenging. For unstructured logs, abstract and formatted log templates are extracted to regulate log records. An in-depth analysis framework based on heterogeneous data is proposed in order to detect the occurrence of faults and anomalies. It employs representation learning methods to map unstructured data into numerical features, and fuses the extracted feature for network anomaly and fault detection. The representation learning makes use of word2vec-based embedding technologies for semantic expression. Next, the fault and anomaly detection solely unveils the occurrence of events while failing to figure out the root causes for useful administration so that the fault localization opens a gate to narrow down the source of systematic anomalies. The extracted features are formed as the anomaly degree coupled with an importance ranking method to highlight the locations of anomalies in network systems. Two types of ranking modes are instantiated by PageRank and operation errors for jointly highlighting latent issue of locations. Besides the fault and anomaly detection, network traffic engineering deals with network communication and computation resource to optimize data traffic transferring efficiency. Especially when network traffic are constrained with communication conditions, a pro-active path planning scheme is helpful for efficient traffic controlling actions. Then a learning-based traffic planning algorithm is proposed based on sequence-to-sequence model to discover hidden reasonable paths from abundant traffic history data over the Software Defined Network architecture. Finally, traffic engineering merely based on empirical data is likely to result in stale and sub-optimal solutions, even ending up with worse situations. A resilient mechanism is required to adapt network flows based on context into a dynamic environment. Thus, a reinforcement learning-based scheme is put forward for dynamic data forwarding considering network resource status, which explicitly presents a promising performance improvement. In the end, the proposed anomaly processing framework strengthens the analysis and diagnosis for network system administrators through synthesized fault detection and root cause localization. The learning-based traffic engineering stimulates networking flow management via experienced data and further shows a promising direction of flexible traffic adjustment for ever-changing environments

    Performance and Reliability Evaluation of Apache Kafka Messaging System

    Get PDF
    Streaming data is now flowing across various devices and applications around us. This type of data means any unbounded, ever growing, infinite data set which is continuously generated by all kinds of sources. Examples include sensor data transmitted among different Internet of Things (IoT) devices, user activity records collected on websites and payment requests sent from mobile devices. In many application scenarios, streaming data needs to be processed in real-time because its value can be futile over time. A variety of stream processing systems have been developed in the last decade and are evolving to address rising challenges. A typical stream processing system consists of multiple processing nodes in the topology of a DAG (directed acyclic graph). To build real-time streaming data pipelines across those nodes, message middleware technology is widely applied. As a distributed messaging system with high durability and scalability, Apache Kafka has become very popular among modern companies. It ingests streaming data from upstream applications and store the data in its distributed cluster, which provides a fault-tolerant data source for stream processors. Therefore, Kafka plays a critical role to ensure the completeness, correctness and timeliness of streaming data delivery. However, it is impossible to meet all the user requirements in real-time cases with a simple and fixed data delivery strategy. In this thesis, we address the challenge of choosing a proper configuration to guarantee both performance and reliability of Kafka for complex streaming application scenarios. We investigate the features that have an impact on the performance and reliability metrics. We propose a queueing based prediction model to predict the performance metrics, including producer throughput and packet latency of Kafka. We define two reliability metrics, the probability of message loss and the probability of message duplication. We create an ANN model to predict these metrics given unstable network metrics like network delay and packet loss rate. To collect sufficient training data we build a Docker-based Kafka testbed with a fault injection module. We use a new quality-of-service metric, timely throughput to help us choosing proper batch size in Kafka. Based on this metric, we propose a dynamic configuration method, which reactively guarantees both performance and reliability of Kafka under complex operation conditions

    Increasing Code Completion Accuracy in Pythia Models for Non-Standard Python Libraries

    Get PDF
    Contemporary software development with modern programming languages leverages Integrated Development Environments, smart text editors, and similar tooling with code completion capabilities to increase the efficiency of software developers. Recent code completion research has shown that the combination of natural language processing with recurrent neural networks configured with long short-term memory can improve the accuracy of code completion predictions over prior models. It is well known that the accuracy of predictive systems based on training data is correlated to the quality and the quantity of the training data. This dissertation demonstrates that by expanding the training data set to include more references to specific Python third-party modules, the quality of the predictions increase for those specific Python third-party modules without degrading the quality of predictions of the originally represented modules

    Sample efficiency, transfer learning and interpretability for deep reinforcement learning

    Get PDF
    Deep learning has revolutionised artificial intelligence, where the application of increased compute to train neural networks on large datasets has resulted in improvements in real-world applications such as object detection, text-to-speech synthesis and machine translation. Deep reinforcement learning (DRL) has similarly shown impressive results in board and video games, but less so in real-world applications such as robotic control. To address this, I have investigated three factors prohibiting further deployment of DRL: sample efficiency, transfer learning, and interpretability. To decrease the amount of data needed to train DRL systems, I have explored various storage strategies and exploration policies for episodic control (EC) algorithms, resulting in the application of online clustering to improve the memory efficiency of EC algorithms, and the maximum entropy mellowmax policy for improving the sample efficiency and final performance of the same EC algorithms. To improve performance during transfer learning, I have shown that a multi-headed neural network architecture trained using hierarchical reinforcement learning can retain the benefits of positive transfer between tasks while mitigating the interference effects of negative transfer. I additionally investigated the use of multi-headed architectures to reduce catastrophic forgetting under the continual learning setting. While the use of multiple heads worked well within a simple environment, it was of limited use within a more complex domain, indicating that this strategy does not scale well. Finally, I applied a wide range of quantitative and qualitative techniques to better interpret trained DRL agents. In particular, I compared the effects of training DRL agents both with and without visual domain randomisation (DR), a popular technique to achieve simulation-to-real transfer, providing a series of tests that can be applied before real-world deployment. One of the major findings is that DR produces more entangled representations within trained DRL agents, indicating quantitatively that they are invariant to nuisance factors associated with the DR process. Additionally, while my environment allowed agents trained without DR to succeed without requiring complex recurrent processing, all agents trained with DR appear to integrate information over time, as evidenced through ablations on the recurrent state.Open Acces

    Analyzing Granger causality in climate data with time series classification methods

    Get PDF
    Attribution studies in climate science aim for scientifically ascertaining the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested

    Improving Software Project Health Using Machine Learning

    Get PDF
    In recent years, systems that would previously live on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull-requests, issue trackingand version history, and its integration with other solutions such as Gerrit, or Travis, as well as theresponse from competitors, created development environments that favour agile methodologiesby increasingly automating non-coding tasks: automated build systems, automated issue triagingetc. In essence, source-code hosting platforms shifted to continuous integration/continuousdelivery (CI/CD) as a service. This facilitated a shift in development paradigms, adherents ofagile methodology can now adopt a CI/CD infrastructure more easily. This has also created large,publicly accessible sources of source-code together with related project artefacts: GHTorrent andsimilar datasets now offer programmatic access to the whole of GitHub. Project health encompasses traceability, documentation, adherence to coding conventions,tasks that reduce maintenance costs and increase accountability, but may not directly impactfeatures. Overfocus on health can slow velocity (new feature delivery) so the Agile Manifestosuggests developers should travel light — forgo tasks focused on a project health in favourof higher feature velocity. Obviously, injudiciously following this suggestion can undermine aproject’s chances for success. Simultaneously, this shift to CI/CD has allowed the proliferation of Natural Language orNatural Language and Formal Language textual artefacts that are programmatically accessible:GitHub and their competitors allow API access to their infrastructure to enable the creation ofCI/CD bots. This suggests that approaches from Natural Language Processing and MachineLearning are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks forthis new paradigm and its attendant infrastructure by bringing to the foreground the relevant NLPand ML techniques. Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving theissue-pull-request traceability, which can aid existing systems to automatically curate the issuebacklog as pull-requests are merged; (2) untangling commits in a version history, which canaid the beforementioned traceability task as well as improve the usability of determining a faultintroducing commit, or cherry-picking via tools such as git bisect; (3) mixed-text parsing, to allowbetter API mining and open new avenues for project-specific code-recommendation tools
    • …
    corecore