810 research outputs found

    On the non-efficient PAC learnability of conjunctive queries

    This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of acyclicity; (iii) we show that CQs (and UCQs) are efficiently PAC learnable with membership queries.
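    For readers unfamiliar with the concept class, a conjunctive query is an existentially quantified conjunction of relational atoms; the following illustrative example is ours, not taken from the note:

```latex
% A unary conjunctive query: "return the x that manage someone assigned to
% some project". Relation names are made up; y and z are existentially
% quantified and projected away, so only x is returned.
\[
  q(x) \;\leftarrow\; \exists y\, \exists z\;
    \mathit{Manages}(x, y) \wedge \mathit{AssignedTo}(y, z)
\]
```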

    Causality Diagrams using Hybrid Vector Clocks

    Causality in distributed systems has long been explored, and numerous approaches use causality to trace distributed system execution. Traditional approaches relied on system profiling, while newer approaches profile the clocks of systems to detect failures and construct timelines of the events that led to those failures. Since the advent of logical clocks, these profiles have become increasingly accurate, with ways to characterize concurrency and distribution and accurate diagrams for message passing. Vector clocks addressed the shortcomings of traditional logical clocks by also storing information about the other processes in the system. Hybrid vector clocks are a novel approach to this concept in which a clock need not store information about all processes; rather, we store information only about processes within an acceptable skew of the focused process. This gives us an efficient way of profiling with substantially reduced costs to the system. Building on this idea, we propose constructing causal traces from the information generated by the hybrid vector clock. The hybrid vector clock provides a strong sense of concurrency and distribution, and we theorize that the information generated by the clock is sufficient to develop a causal trace for debugging. We post-process and parse the clocks generated from an execution trace to produce a swimlane on a web interface that traces the points of failure of a distributed system. We also provide an API to reuse this concept for any generic distributed system framework
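    As a rough illustration of the skew-bounded clock described above, here is a minimal Go sketch. It is our own simplification: the names (HVC, Tick, Merge) and the epsilon value are hypothetical, and the paper's actual clock and trace tooling may differ considerably.

```go
package main

import "fmt"

// epsilon is the acceptable clock skew: entries for processes whose last
// known time is more than epsilon behind our own are dropped, so the clock
// stays smaller than a full vector clock. (Illustrative value only.)
const epsilon = 100

// HVC is a simplified hybrid vector clock: a physical timestamp for the
// owning process plus a bounded map of recently-heard-from processes.
type HVC struct {
	self    string
	entries map[string]int64 // process id -> latest known timestamp
}

func NewHVC(self string, now int64) *HVC {
	return &HVC{self: self, entries: map[string]int64{self: now}}
}

// Tick advances the local component to the current physical time and prunes
// entries that have fallen outside the acceptable skew.
func (c *HVC) Tick(now int64) {
	if now > c.entries[c.self] {
		c.entries[c.self] = now
	} else {
		c.entries[c.self]++ // logical fallback if physical time stalls
	}
	c.prune()
}

// Merge folds a received clock into the local one (element-wise maxima),
// then ticks past it so the receive event is causally later.
func (c *HVC) Merge(other map[string]int64, now int64) {
	for p, t := range other {
		if t > c.entries[p] {
			c.entries[p] = t
		}
	}
	c.Tick(now)
}

func (c *HVC) prune() {
	self := c.entries[c.self]
	for p, t := range c.entries {
		if p != c.self && self-t > epsilon {
			delete(c.entries, p)
		}
	}
}

func main() {
	a := NewHVC("A", 10)
	b := NewHVC("B", 12)
	a.Tick(15)             // local event on A
	b.Merge(a.entries, 16) // A sends to B; B merges A's clock
	fmt.Println(a.entries, b.entries)
}
```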

    Automated tailoring of system software stacks

    In many industrial sectors, device manufacturers are moving away from expensive special-purpose hardware units and consolidating their systems on commodity hardware. As part of this change, developers can run their applications on general-purpose operating systems like Linux, which already supports thousands of different devices out of the box and can be used in a wide range of target scenarios. Furthermore, the Linux ecosystem allows them to integrate existing implementations of standard functionality in the form of shared libraries. However, as the libraries and the Linux kernel are designed as generic building blocks in order to support as many applications as possible, they cannot make assumptions about specific use cases for a single-purpose device. This generality leads to unnecessary overheads in narrowly defined target scenarios, as unneeded components not only take up space on the target system but also have to be maintained over the lifetime of the device. While the Linux kernel provides a configuration system to disable unneeded functionality like device drivers, determining the required features from over 16000 options is an infeasible task. Even worse, most shared libraries cannot be customized even though only around 10 percent of their functions are ever used by applications. In this thesis, I present my approaches for the automated identification and removal of unnecessary components in all layers of the software stack. As the configuration system is an integral part of the Linux kernel, we embrace its presence and automatically generate custom-fitted configurations for observed target scenarios with the help of an extracted variability model. For the much more diverse realm of shared libraries, with different programming languages, build systems, and a lack of configurability, I demonstrate a different approach: by identifying individual functions as logically distinct units, we construct a symbol-level dependency graph across the applications and all their required libraries. We then remove unneeded code at the binary level and rearrange the remaining parts to take up minimal space in the binary file by formulating their placement as an optimization problem. To lower the number of unnecessary updates to unused components in a deployed system, I lastly present an automated method to determine the impact of software changes on a target scenario and provide guidance for developers on whether they need to update their systems. Applying these techniques to different target systems, I demonstrate that we can disable up to 87 percent of configuration options in a Debian Linux kernel, shrink the size of an embedded OpenWrt kernel by 59 percent, and speed up the boot process of the embedded system by 21 percent. As part of the shared library tailoring process, we can remove 13060 functions from all libraries in OpenWrt and reduce their total size by 31 percent. In the memcached Docker container, we identify 381 entirely unneeded shared libraries and shrink the container image size by 82 percent. An analysis of the development history of two large library projects over the course of more than two years further shows that between 68 and 82 percent of all changes are not required for an OpenWrt appliance, reducing the number of patch days by up to 69 percent. These results demonstrate the broad applicability of our automated methods for both the Linux kernel and shared libraries to a wide range of scenarios. From embedded systems to server applications, custom-tailored system software stacks contribute to the reduction of overheads in space and time
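    The symbol-level dependency graph idea can be illustrated with a small reachability sketch in Go. This is entirely our own simplification: the symbol names and the hard-coded call map are hypothetical, and the thesis's actual tooling works on binaries and relocation information rather than an in-memory map.

```go
package main

import (
	"fmt"
	"sort"
)

// calls maps each symbol to the symbols it references, across application
// and library boundaries. Here it is hard-coded for illustration.
var calls = map[string][]string{
	"app.main":      {"libfoo.parse", "libfoo.render"},
	"libfoo.parse":  {"libc.malloc"},
	"libfoo.render": {"libc.printf"},
	"libfoo.legacy": {"libc.printf"}, // never reached from the app
	"libc.malloc":   {},
	"libc.printf":   {},
	"libc.qsort":    {}, // never reached from the app
}

// reachable performs a depth-first search from the given root symbols and
// returns the set of symbols that must be kept in the tailored binaries.
func reachable(roots []string) map[string]bool {
	keep := map[string]bool{}
	var visit func(s string)
	visit = func(s string) {
		if keep[s] {
			return
		}
		keep[s] = true
		for _, callee := range calls[s] {
			visit(callee)
		}
	}
	for _, r := range roots {
		visit(r)
	}
	return keep
}

func main() {
	keep := reachable([]string{"app.main"})
	var removable []string
	for sym := range calls {
		if !keep[sym] {
			removable = append(removable, sym)
		}
	}
	sort.Strings(removable)
	fmt.Println("unneeded symbols:", removable) // libc.qsort, libfoo.legacy
}
```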

    A Discourse-Analytic Approach to the Study of Information Disorders: How Online Communities Legitimate Social Bonds When Communing Around Misinformation and Disinformation

    Information disorders have become prevalent concerns in current social media research. This thesis is focused on the interpersonal dimension of information disorders, in other words, how we can trace, through linguistic and multimodal analysis, the social bonding that occurs when online communities commune around misinformation and disinformation, and how these social bonds are legitimated to enhance perceived credibility. Social bonding in this thesis refers to a social semiotic perspective on the shared values that communities use to construe alignment with others. False information can spread when groups have a shared vested interest, and so information disorders need to be elucidated through an investigation of sociality and bonding, rather than via logical points alone. The term ‘information disorder’ encompasses the spectrum of false information ranging from misinformation (misleading content) to disinformation (deliberately false content), and it is within this landscape of information disorders that this thesis emerges. Two key forms of social semiotic discourse analysis were applied to a dataset of YouTube videos (n=30) and comments (n=1500): affiliation (analysis of social bonding) and legitimation (analysis of resources used to construct legitimacy). The dataset constituted two contrasting case studies. The first was non-politically motivated misinformation in the form of an internet hoax leveraging moral panic about children using technologies. The second was politically motivated conspiracy theories relating to the Notre Dame Cathedral fire. The key findings of this thesis include the multimodal congruence of affiliation and legitimation across YouTube videos, the emergence of technological authority as a key legitimation strategy in online discourse, and the notion of textual personae investigating the complex array of identities that engage with information disorders in comment threads. Additionally, six macro-categories were identified regarding communicative strategies derived from comment threads: scepticism, criticism, education and expertise, nationalism, hate speech, and storytelling and conspiracy. This shows not only how information disorders are spread, but also how they can be countered. The method outlined in this thesis can be applied to future interdisciplinary analyses of political propaganda and current global concerns to develop linguistic and multimodal profiles of various communities engaging with information disorders

    Balancing Static Islands in Dynamically Scheduled Circuits using Continuous Petri Nets

    High-level synthesis (HLS) tools automatically transform a high-level program, for example in C/C++, into a low-level hardware description. A key challenge in HLS is scheduling, i.e. determining the start time of all the operations in the untimed program. A major shortcoming of existing approaches to scheduling – whether they are static (start times determined at compile-time), dynamic (start times determined at run-time), or a hybrid of both – is that the static analysis cannot efficiently explore the run-time hardware behaviours. Existing approaches either assume the timing behaviour in extreme cases, which can cause sub-optimal performance or larger area, or use simulation-based approaches, which take a long time to explore enough program traces. In this article, we propose an efficient approach using probabilistic analysis for HLS tools to efficiently explore the timing behaviour of scheduled hardware. We capture the performance of the hardware using Timed Continuous Petri nets with immediate transitions, allowing us to leverage efficient Petri net analysis tools for making HLS decisions. We demonstrate the utility of our approach by using it to automatically estimate the hardware throughput for balancing the throughput of statically scheduled components (also known as static islands) computing in a dynamically scheduled circuit. Over a set of benchmarks, we show that our approach on average incurs a 2% overhead in area-delay product compared to optimal designs found by exhaustive search
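    To give a feel for the continuous Petri net model, here is a toy Go simulation of fluid markings under infinite-server semantics. It is our own sketch with made-up places, transitions, and rates; the article's timed continuous Petri nets with immediate transitions, and the analysis tools it leverages, are considerably richer than this Euler-integration loop.

```go
package main

import "fmt"

// A toy continuous Petri net: transitions move "fluid" marking between
// places at a rate proportional to their least-marked input place
// (infinite-server semantics). All arc weights are 1 for simplicity.
type transition struct {
	name    string
	rate    float64
	in, out []int // indices into the marking vector
}

func main() {
	// Two-stage pipeline modelled as a cycle: a unit of marking circulates
	// between place 0 (stage A ready) and place 1 (stage B ready).
	marking := []float64{1.0, 0.0}
	net := []transition{
		{name: "stageA", rate: 1.0, in: []int{0}, out: []int{1}},
		{name: "stageB", rate: 0.5, in: []int{1}, out: []int{0}},
	}

	// Euler integration of the fluid dynamics until the flows settle.
	const dt = 0.001
	flows := make([]float64, len(net))
	for step := 0; step < 200000; step++ {
		for i, t := range net {
			// Enabling degree: the smallest input marking limits the flow.
			enab := marking[t.in[0]]
			for _, p := range t.in[1:] {
				if marking[p] < enab {
					enab = marking[p]
				}
			}
			flows[i] = t.rate * enab
		}
		for i, t := range net {
			for _, p := range t.in {
				marking[p] -= dt * flows[i]
			}
			for _, p := range t.out {
				marking[p] += dt * flows[i]
			}
		}
	}
	// At steady state both flows coincide; this value is the estimated
	// throughput of the pipeline (1/3 for the rates 1.0 and 0.5 used here).
	fmt.Printf("steady-state throughput = %.3f\n", flows[0])
}
```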

    UTP, Circus, and Isabelle

    We dedicate this paper with great respect and friendship to He Jifeng on the occasion of his 80th birthday. Our research group owes much to him. The authors have over 150 publications on unifying theories of programming (UTP), a research topic Jifeng created with Tony Hoare. Our objective is to recount the history of Circus (a combination of Z, CSP, Dijkstra’s guarded command language, and Morgan’s refinement calculus) and the development of Isabelle/UTP. Our paper is in two parts. (1) We first discuss the activities needed to model systems: we need to formalise data models and their behaviours. We survey our work on these two aspects in the context of Circus. (2) Secondly, we describe our practical implementation of UTP in Isabelle/HOL. Mechanising UTP theories is the basis of novel verification tools. We also discuss ongoing and future work related to (1) and (2). Many colleagues have contributed to these works, and we acknowledge their support

    Blockchain and distributed ledger technologies for supply chain traceability: industry considerations and consumer preferences

    Several businesses and academic circles were quick to proclaim blockchain, the distributed ledger technology behind digital currencies, as the solution to a plethora of industry challenges. That was especially true for supply chain management and traceability applications for coffee products, where the technology's features were viewed as a potential solution to longstanding issues of communication inefficiencies, production monitoring, and communicating provenance information to the end consumer. However, despite the excessive amount of investment, research, and experimentation, blockchain growth and adoption have stagnated. This thesis suggests that a plausible reason for the current gridlock the technology finds itself in lies in the absence of primary research that goes beyond its technical implementations and provides clear insights on both how industry professionals understand blockchain and structure their decision-making process to adopt it, as well as on how consumers perceive coffee products that utilise the technology for traceability and provenance purposes. In attempting to fill that knowledge gap, add to the overall understanding of consumer perception of provenance and traceability information and, ultimately, provide companies and organisations with actionable suggestions and insights, this PhD answers two critical questions. One addresses how industry decision-makers perceive fundamental characteristics of blockchain and identify the determining factors for deciding whether they need to adopt and implement the technology in their supply chains. The second examines the use of blockchain as a traceability certification solution in the coffee industry, how consumers perceive products that utilise it, and how it compares with existing traceability certifications in the market. The online survey used to explore the views of industry professionals revealed that despite the overall positive attitudes around blockchain and the importance the technology plays in their future business plans, issues around regulatory compliance, operational frameworks and concerns around the role and nature of system participation are hindering broader adoption and implementation. Inevitably, the proposed decision-making flowchart revealed that blockchain was a suitable business solution for less than half of them. At the same time, a questionnaire based on an extended version of the Theory of Planned Behaviour combined with an online experimental study on multiple coffee certifications revealed that consumers positively value the features offered by a blockchain traceability system and found it easy to comprehend the proposed phone app format of presenting provenance information. However, a possible equation effect emerged when blockchain was compared with multiple traceability certifications in a market-like environment, highlighting the importance of consumer awareness around provenance information and the importance of product differentiation. The multifaceted insights provided in this thesis can significantly contribute to helping businesses and organisations formulate their strategies for implementing blockchain in their supply chains while also adopting a user-centred approach of considering consumer preferences and attitudes around the technology

    An Empirical Study of Concurrent Feature Usage in Go

    The Go language includes support for running functions or methods concurrently as goroutines, which are lightweight threads managed directly by the Go language runtime. Go is probably best known for the use of a channel-based, message-passing concurrency mechanism, based on Hoare's Communicating Sequential Processes (CSP), for inter-thread communication. However, Go also includes support for traditional concurrency features, such as mutexes and condition variables, that are commonly used in other languages. In this paper, we analyze the use of these traditional concurrency features, using a corpus of Go programs used in earlier work to study the use of message-passing concurrency features in Go. The goal of this work is to better support developers in using traditional concurrency features, or a combination of traditional and message-passing features, in Go
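    As a point of reference for the two feature families the study contrasts, the following Go sketch (ours, not taken from the paper's corpus) implements the same concurrent counter twice: once with a sync.Mutex in the traditional shared-memory style, and once with a goroutine that owns the counter and receives increments over a channel.

```go
package main

import (
	"fmt"
	"sync"
)

// Traditional shared-memory style: a counter protected by a sync.Mutex.
type lockedCounter struct {
	mu sync.Mutex
	n  int
}

func (c *lockedCounter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Message-passing style: a goroutine owns the counter and receives
// increments over a channel, in the CSP spirit the paper describes.
func channelCounter(incs <-chan struct{}, result chan<- int) {
	n := 0
	for range incs {
		n++
	}
	result <- n
}

func main() {
	const workers, perWorker = 4, 1000

	// Mutex-based version.
	var lc lockedCounter
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				lc.inc()
			}
		}()
	}
	wg.Wait()

	// Channel-based version.
	incs := make(chan struct{})
	result := make(chan int)
	go channelCounter(incs, result)
	var wg2 sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg2.Add(1)
		go func() {
			defer wg2.Done()
			for i := 0; i < perWorker; i++ {
				incs <- struct{}{}
			}
		}()
	}
	wg2.Wait()
	close(incs)

	fmt.Println(lc.n, <-result) // both counters report 4000
}
```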