
    The impact of technology on data collection: Case studies in privacy and economics

    Technological advancement can often act as a catalyst for scientific paradigm shifts. Today, the ability to collect and process large amounts of data about individuals is arguably a paradigm-shift-enabling technology in action. One manifestation of this technology within the sciences is the ability to study historically qualitative fields with a more granular quantitative lens than ever before. Despite this potential, wide adoption is accompanied by risks. In this thesis, I present two case studies. The first focuses on the impact of machine learning in a cheapest-wins motor insurance market, studied by designing a competition-based data collection mechanism. Pricing models in the insurance industry are changing from statistical methods to machine learning. In this game, close to 2000 participants, acting as insurance companies, trained and submitted pricing models to compete for profit using real motor insurance policies, with a roughly equal split between legacy and advanced models. With this trend towards machine learning in motion, preliminary analysis of the results suggests that future markets might realise cheaper prices for consumers. Additionally, legacy models competing against modern algorithms may experience a reduction in earning stability, accelerating machine learning adoption. Overall, the results of this field experiment demonstrate the potential for digital competition-based studies of markets in the future. The second case study examines the privacy risks of data collection technologies. Despite a large body of research on re-identification of anonymous data, the question remains: if a dataset were big enough, would records become anonymous by being "lost in the crowd"? Using 3 months of location data, we show that the risk of re-identification decreases slowly with dataset size. This risk is modelled and extrapolated to larger populations, with 93% of people being uniquely identifiable using 4 points of auxiliary information in a population of 60 million. These results show that the privacy of individuals is very unlikely to be preserved even in country-scale location datasets and that alternative paradigms of data sharing are still required.
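    The re-identification analysis above can be illustrated with a minimal sketch, assuming a simplified model in which each person's trace is a set of (place, hour) points and an attacker knows a few points drawn at random from the target's trace; the synthetic traces and the unicity routine below are illustrative assumptions, not the thesis's actual data or pipeline.

        import random

        # Hedged sketch: estimate the fraction of users uniquely identified by
        # k known points of their trace, on purely synthetic (place, hour) data.
        def unicity(traces, k, trials=200, seed=0):
            """Fraction of sampled targets whose k known points match no other trace."""
            rng = random.Random(seed)
            users = list(traces)
            unique = 0
            for _ in range(trials):
                target = rng.choice(users)
                known = rng.sample(sorted(traces[target]), k)
                matches = [u for u in users if all(p in traces[u] for p in known)]
                unique += (matches == [target])
            return unique / trials

        if __name__ == "__main__":
            # Toy population: 1000 users with ~50 random (place, hour) points each.
            rng = random.Random(1)
            traces = {u: {(rng.randrange(500), rng.randrange(24)) for _ in range(50)}
                      for u in range(1000)}
            for k in (1, 2, 4):
                print(f"k={k}: unicity={unicity(traces, k):.2f}")

    The estimate rises quickly with k on this toy data, mirroring the abstract's point that a handful of auxiliary points is often enough to single someone out.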

    Crashworthy Code

    Code crashes. Yet for decades, software failures have escaped scrutiny for tort liability. Those halcyon days are numbered: self-driving cars, delivery drones, networked medical devices, and other cyber-physical systems have rekindled interest in understanding how tort law will apply when software errors lead to loss of life or limb. Even after all this time, however, no consensus has emerged. Many feel strongly that victims should not bear financial responsibility for decisions that are entirely automated, while others fear that cyber-physical manufacturers must be shielded from crushing legal costs if we want such companies to exist at all. Some insist the existing liability regime needs no modernist cure, and that the answer for all new technologies is patience. This Article observes that no consensus is imminent as long as liability is pegged to a standard of “crashproof” code. The added prospect of cyber-physical injury has not changed the underlying complexities of software development. Imposing damages based on failure to prevent code crashes will not improve software quality, but will impede the rollout of cyber-physical systems. This Article offers two lessons from the “crashworthy” doctrine, a novel tort theory pioneered in the late 1960s in response to a rising epidemic of automobile accidents, which held automakers accountable for unsafe designs that injured occupants during car crashes. The first is that tort liability can be metered on the basis of mitigation, not just prevention. When code crashes are statistically inevitable, cyber-physical manufacturers may be held to have a duty to provide for safer code crashes, rather than no code crashes at all. Second, the crashworthy framework teaches courts to segment their evaluation of code, and make narrower findings of liability based solely on whether cyber-physical manufacturers have incorporated adequate software fault tolerance into their designs. Requiring all code to be perfect is impossible, but expecting code to be crashworthy is reasonable.

    Toward Understanding Visual Perception in Machines with Human Psychophysics

    Over the last several years, Deep Learning algorithms have become more and more powerful. As such, they are being deployed in increasingly many areas, including ones that can directly affect human lives. At the same time, regulations like the GDPR or the AI Act are putting the need to better understand these artificial algorithms on legal grounds. How do these algorithms come to their decisions? What limits do they have? And what assumptions do they make? This thesis presents three publications that deepen our understanding of deep convolutional neural networks (DNNs) for visual perception of static images. While all of them leverage human psychophysics, they do so in two different ways: either via direct comparison between human and DNN behavioral data or via an evaluation of the helpfulness of an explainability method. Besides insights on DNNs, these works emphasize good practices: for comparison studies, we propose a checklist on how to design, conduct and interpret experiments between different systems. And for explainability methods, our evaluations exemplify that quantitatively testing widespread intuitions can help put their benefits in a realistic perspective. In the first publication, we test how similar DNNs are to the human visual system, and more specifically its capabilities and information processing. Our experiments reveal that DNNs (1) can detect closed contours, (2) perform well on an abstract visual reasoning task and (3) correctly classify small image crops. On a methodological level, these experiments illustrate that (1) human bias can influence our interpretation of findings, (2) distinguishing necessary and sufficient mechanisms can be challenging, and (3) the degree to which experimental conditions are aligned between systems can alter the outcome. In the second and third publications, we evaluate how helpful humans find the explainability method feature visualization. The purpose of this tool is to grant insights into the features of a DNN. To measure the general informativeness and causal understanding supported via feature visualizations, we test participants on two different psychophysical tasks. Our data reveal that humans can indeed understand the inner DNN semantics based on this explainability tool. However, other visualizations such as natural data set samples also provide useful, and sometimes even more useful, information. On a methodological level, our work illustrates that human evaluations can adjust our expectations toward explainability methods and that different claims have to be matched by appropriate experiments.

    Programming Language Evolution and Source Code Rejuvenation

    Programmers rely on programming idioms, design patterns, and workaround techniques to express fundamental design not directly supported by the language. Evolving languages often address frequently encountered problems by adding language and library support to subsequent releases. By using new features, programmers can express their intent more directly. As new concerns, such as parallelism or security, arise, early idioms and language facilities can become serious liabilities. Modern code sometimes benefits from optimization techniques not feasible for code that uses less expressive constructs. Manual source code migration is expensive, time-consuming, and prone to errors. This dissertation discusses the introduction of new language features and libraries, exemplified by open-methods and a non-blocking growable array library. We describe the relationship of open-methods to various alternative implementation techniques. The benefits of open-methods materialize in simpler code, better performance, and a similar memory footprint when compared to alternative implementation techniques. Based on these findings, we develop the notion of source code rejuvenation, the automated migration of legacy code. Source code rejuvenation leverages enhanced programming language and library facilities by finding and replacing coding patterns that can be expressed through higher-level software abstractions. Raising the level of abstraction improves code quality by lowering software entropy. In conjunction with extensions to programming languages, source code rejuvenation offers an evolutionary trajectory towards more reliable, more secure, and better performing code. We describe the tools that allow us to implement code rejuvenations efficiently. The Pivot source-to-source translation infrastructure and its traversal mechanism form the core of our machinery. In order to free programmers from representation details, we use a lightweight pattern matching generator that turns a C++-like input language into pattern matching code. The generated code integrates seamlessly with the rest of the analysis framework. We utilize the framework to build analysis systems that find common workaround techniques for designated language extensions of C++0x (e.g., initializer lists). Moreover, we describe a novel system, TACE (template analysis and concept extraction), for the analysis of uninstantiated template code. Our tool automatically extracts requirements from the body of template functions. TACE helps programmers understand the requirements that their code de facto imposes on arguments and compare those de facto requirements to formal and informal specifications.
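    The dissertation's rejuvenation tooling targets C++ via the Pivot infrastructure; as a language-agnostic, hedged illustration of the same idea of finding a workaround idiom and replacing it with a higher-level construct, the sketch below rewrites a legacy Python idiom into its modern equivalent using the standard ast module. The chosen idiom and the transformer are illustrative assumptions, not part of the dissertation's machinery.

        # Hedged sketch of "source code rejuvenation" in miniature: find the legacy
        # idiom dict([(k, v), ...]) and replace it with a dict display {k: v, ...}.
        import ast

        class DictCallRejuvenator(ast.NodeTransformer):
            def visit_Call(self, node):
                self.generic_visit(node)
                if (isinstance(node.func, ast.Name) and node.func.id == "dict"
                        and len(node.args) == 1 and not node.keywords
                        and isinstance(node.args[0], ast.List)
                        and all(isinstance(e, ast.Tuple) and len(e.elts) == 2
                                for e in node.args[0].elts)):
                    pairs = node.args[0].elts
                    return ast.Dict(keys=[p.elts[0] for p in pairs],
                                    values=[p.elts[1] for p in pairs])
                return node

        legacy = "config = dict([('host', 'localhost'), ('port', 8080)])"
        tree = ast.fix_missing_locations(DictCallRejuvenator().visit(ast.parse(legacy)))
        print(ast.unparse(tree))   # config = {'host': 'localhost', 'port': 8080}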

    Explainable temporal data mining techniques to support the prediction task in Medicine

    In the last decades, the increasing amount of data available in all fields raises the necessity to discover new knowledge and explain the hidden information found. On one hand, the rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, results to users. In the biomedical informatics and computer science communities, there is considerable discussion about the "un-explainable" nature of artificial intelligence, where algorithms and systems often leave users, and even developers, in the dark with respect to how results were obtained. Especially in the biomedical context, the necessity to explain an artificial intelligence system's result is legitimate because of the importance of patient safety. On the other hand, current database systems enable us to store huge quantities of data. Their analysis through data mining techniques provides the possibility to extract relevant knowledge and useful hidden information. Relationships and patterns within these data could provide new medical knowledge. The analysis of such healthcare/medical data collections could greatly help to observe the health conditions of the population and extract useful information that can be exploited in the assessment of healthcare/medical processes. In particular, the prediction of medical events is essential for preventing disease, understanding disease mechanisms, and increasing patient quality of care. In this context, an important aspect is to verify whether the database content supports the capability of predicting future events. In this thesis, we start by addressing the problem of explainability, discussing some of the most significant challenges that need to be addressed with scientific and engineering rigor in a variety of biomedical domains. We analyze the "temporal component" of explainability, focusing on different perspectives such as: the use of temporal data, the temporal task, the temporal reasoning, and the dynamics of explainability with respect to the user perspective and to knowledge. Starting from this panorama, we focus our attention on two different temporal data mining techniques. For the first, based on trend abstractions, we start from the concept of Trend-Event Pattern and, moving through the concept of prediction, propose a new kind of predictive temporal pattern, namely Predictive Trend-Event Patterns (PTE-Ps). The framework aims to combine complex temporal features to extract a compact and non-redundant predictive set of patterns composed of such temporal features. For the second, based on functional dependencies, we propose a methodology for deriving a new kind of approximate temporal functional dependency, called Approximate Predictive Functional Dependencies (APFDs), based on a three-window framework. We then discuss the concept of approximation, the data complexity of deriving an APFD, two new error measures, and finally the quality of APFDs in terms of coverage and reliability. Exploiting these methodologies, we analyze intensive care unit data from the MIMIC dataset.
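    For orientation, approximate functional dependencies are commonly quantified by the fraction of tuples that must be removed for the dependency to hold exactly (the classical g3-style error); the APFD error measures introduced in the thesis are defined over its three-window temporal framework and may differ, so the following is only a hedged sketch of the general notion.

        % Hedged sketch: a g3-style error for an approximate dependency X -> Y on an
        % instance r; the thesis's APFD-specific error measures may be defined differently.
        \[
          e_3(X \rightarrow Y, r) \;=\; 1 - \frac{\max\{\,|s| : s \subseteq r,\ s \models X \rightarrow Y\,\}}{|r|},
          \qquad
          r \text{ approximately satisfies } X \rightarrow Y \text{ at threshold } \varepsilon
          \iff e_3(X \rightarrow Y, r) \le \varepsilon .
        \]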

    Online learning on the programmable dataplane

    This thesis makes the case for managing computer networks with data-driven methods, that is, automated statistical inference and control based on measurement data and runtime observations, and argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, which are currently dominated by the use of hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, while their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or -workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the network: runtime reprogrammability, precise traffic measurement, and low-latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network. To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switch-scale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device-local state. I show that data-driven solutions still require great care to design correctly, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation to histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volumes. Moving reinforcement learning to the dataplane is shown to offer substantial benefits to state-action latency and online learning throughput versus host machines, allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasible; to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as of individual algorithms.
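    The abstract's point about fixed-point arithmetic can be made concrete with a small sketch: the snippet below performs a Q-learning-style update entirely in scaled integers, as one might on dataplane hardware without floating-point support. The Q16.16 scale, toy table sizes, and transition values are illustrative assumptions rather than the algorithms developed in the thesis.

        # Hedged sketch: tabular Q-learning update in fixed-point (scaled integer)
        # arithmetic, mimicking hardware without floating point. Values are toys.
        SCALE = 1 << 16                      # Q16.16 fixed-point scale
        ALPHA = int(0.1 * SCALE)             # learning rate
        GAMMA = int(0.9 * SCALE)             # discount factor

        def fmul(a, b):
            """Multiply two Q16.16 fixed-point numbers, keeping the scale."""
            return (a * b) >> 16

        def q_update(q, state, action, reward_fp, next_state):
            """In-place fixed-point Q-learning update for one observed transition."""
            best_next = max(q[next_state])                  # max_a' Q(s', a')
            target = reward_fp + fmul(GAMMA, best_next)     # r + gamma * max_a' Q(s', a')
            td_error = target - q[state][action]
            q[state][action] += fmul(ALPHA, td_error)

        n_states, n_actions = 4, 2
        q = [[0] * n_actions for _ in range(n_states)]
        q_update(q, 0, 1, 1 * SCALE, 2)      # state 0, action 1, reward 1.0, next state 2
        print(q[0][1] / SCALE)               # ~0.1 after one update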

    DEMOCRATIZING BIOETHICS. ONLINE PARTICIPATION AND THE LIFE SCIENCES

    Bioethics has historically taken up the challenge of creating an arena for the adjudication of permissibility claims for practices in the broad field of the Life Sciences. Long-standing academic arguments have thus managed to percolate into proper political debates and actual policy-making. With the pressing urge to democratize politics overall, the ways in which bioethical issues have been and still are officially discussed have been thoroughly contested. A number of solutions to the alleged lack of transparency, inclusiveness and accountability in bioethical decision-making have been suggested. Some of these solutions resorted to ICTs for their implementation. However, it is so far unclear to what extent these initiatives have been able to recruit proper participation, foster reasoned deliberation, and, most importantly, reach politically legitimate decisions. Democratizing Bioethics tackles the unresolved issues of political legitimacy that underlie the current approach to deliberative public engagement initiatives for science policy-making. In doing so, it provides a political framework in which to test political theories supposed to apply to the political management of moral disagreement. Furthermore, it articulates and defends an actual political theory, moderate epistocracy for online deliberation, as a proper political means to deal with essentially moral disagreement arising from scientific and technical progress. Finally, the theory is preliminarily tested empirically via a tool for online direct competent participation.

    Towards BIM/GIS interoperability: A theoretical framework and practical generation of spaces to support infrastructure Asset Management

    The past ten years have seen the widespread adoption of Building Information Modelling (BIM) among both the Architectural, Engineering and Construction (AEC) and the Asset Management/Facilities Management (AM/FM) communities. This has been driven by the use of digital information to support collaborative working and a vision for more efficient reuse of data. Within this context, spatial information is held either in a Geographic Information System (GIS) or as Computer-Aided Design (CAD) models in a Common Data Environment (CDE). However, because these are heterogeneous systems, there are inevitable interoperability issues that result in poor integration. For this thesis, the interoperability challenges were investigated within a case study to ask: Can a better understanding of the conceptual and technical challenges to the integration of BIM and GIS provide improved support for the management of asset information in the context of a major infrastructure project? Within their respective fields, the terms BIM and GIS have acquired a range of accepted meanings that do not align well with each other. A seven-level socio-technical framework is developed to harmonise concepts in spatial information systems. This framework is used to explore the interoperability gaps that must be resolved to enable design and construction information to be joined up with operational asset information. The Crossrail GIS and BIM systems were used to investigate some of the interoperability challenges that arise during the design, construction and operation of an infrastructure asset. One particular challenge concerns a missing link between AM-based information and CAD-based geometry, which prevents engineering assets from being located within the geometric model and hinders geospatial analysis. A process is developed to link these CAD-based elements with AM-based assets using defined 3D spaces to locate assets. However, other interoperability challenges must first be overcome: firstly, the extraction, transformation and loading of geometry from CAD to GIS; secondly, the creation of an explicit representation of each 3D space from the implicit enclosing geometry. This thesis develops an implementation of the watershed transform algorithm that uses real-world Crossrail geometry to generate voxelated interior spaces which can then be converted into a B-Rep mesh for use in 3D GIS. The issues faced at the technical level in this case study provide insight into the differences that must also be addressed at the conceptual level. With this in mind, this thesis develops a Spatial Information System Framework to classify the nature of the differences between BIM, GIS and other spatial information systems.
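    The abstract only names the watershed step; as a hedged sketch of the general technique, the snippet below partitions the free space of a toy 3D occupancy grid into enclosed spaces using a distance transform and a marker-based watershed. The scipy/scikit-image calls, the seeding threshold, and the two-room grid are illustrative assumptions, not the Crossrail pipeline itself.

        # Hedged sketch: label enclosed interior spaces in a voxel model with a
        # marker-based watershed over a distance transform. Toy two-room grid only.
        import numpy as np
        from scipy import ndimage as ndi
        from skimage.segmentation import watershed

        solid = np.ones((20, 10, 10), dtype=bool)      # True = wall/slab voxel
        solid[1:9, 1:9, 1:9] = False                   # room A (free space)
        solid[11:19, 1:9, 1:9] = False                 # room B, behind a dividing wall

        free = ~solid
        dist = ndi.distance_transform_edt(free)        # distance of free voxels to solid

        # Seed one marker per "deep interior" blob, then grow markers over free space.
        markers, _ = ndi.label(dist > 2)               # threshold 2 is an arbitrary choice
        spaces = watershed(-dist, markers, mask=free)  # one integer label per enclosed space

        print("enclosed spaces found:", spaces.max())  # 2 for this toy model
        # Each labelled voxel blob could then be meshed into a B-Rep solid for 3D GIS.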

    Visual Analytics for the Exploratory Analysis and Labeling of Cultural Data

    Cultural data can come in various forms and modalities, such as text traditions, artworks, music, crafted objects, or even intangible heritage such as biographies of people, performing arts, cultural customs and rites. The assignment of metadata to such cultural heritage objects is an important task that people working in galleries, libraries, archives, and museums (GLAM) do on a daily basis. These rich metadata collections are used to categorize, structure, and study collections, but can also be used to apply computational methods. Such computational methods are the focus of Computational and Digital Humanities projects and research. For the longest time, the digital humanities community has focused on textual corpora, including text mining and other natural language processing techniques, although some disciplines of the humanities, such as art history and archaeology, have a long history of using visualizations. In recent years, the digital humanities community has started to shift its focus to include other modalities, such as audio-visual data. In turn, methods in machine learning and computer vision have been proposed for the specificities of such corpora. Over the last decade, the visualization community has engaged in several collaborations with the digital humanities, often with a focus on exploratory or comparative analysis of the data at hand. This includes both methods and systems that support classical Close Reading of the material and Distant Reading methods that give an overview of larger collections, as well as methods in between, such as Meso Reading. Furthermore, a wider application of machine learning methods can be observed on cultural heritage collections, but they are rarely applied together with visualizations to allow for further perspectives on the collections in a visual analytics or human-in-the-loop setting. Visual analytics can help in the decision-making process by guiding domain experts through the collection of interest. However, state-of-the-art supervised machine learning methods are often not applicable to the collection of interest due to missing ground truth. One form of ground truth is class labels, e.g., of entities depicted in an image collection, assigned to the individual images. Labeling all objects in a collection is an arduous task when performed manually, because cultural heritage collections contain a wide variety of different objects with plenty of details. A problem that arises with collections curated in different institutions is that a specific standard is not always followed, so the vocabularies used can drift apart from one another, making it difficult to combine the data from these institutions for large-scale analysis. This thesis presents a series of projects that combine machine learning methods with interactive visualizations for the exploratory analysis and labeling of cultural data. First, we define cultural data with regard to heritage and contemporary data, then we look at the state of the art of existing visualization, computer vision, and visual analytics methods and projects focusing on cultural data collections. After this, we present the problems addressed in this thesis and their solutions, starting with a series of visualizations to explore different facets of rap lyrics and rap artists with a focus on text reuse. Next, we engage with a more complex case of text reuse, the collation of medieval vernacular text editions. For this, a human-in-the-loop process is presented that applies word embeddings and interactive visualizations to perform textual alignments on under-resourced languages, supported by labeling of the relations between lines and between words. We then switch the focus from textual data to another modality of cultural data by presenting a Virtual Museum that combines interactive visualizations and computer vision in order to explore a collection of artworks. With the lessons learned from the previous projects, we engage in the labeling and analysis of medieval illuminated manuscripts and so combine some of the machine learning methods and visualizations that were used for textual data with computer vision methods. Finally, we give reflections on the interdisciplinary projects and the lessons learned, before we discuss existing challenges when working with cultural heritage data from the computer science perspective, to outline potential research directions for machine learning and visual analytics of cultural heritage data.
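    The word-embedding alignment mentioned above is only summarised here; as a hedged sketch of the underlying idea, the snippet below aligns the words of two parallel lines by cosine similarity of their vectors, with tiny made-up embeddings and invented word forms standing in for the embeddings and editions actually used in the project.

        # Hedged sketch: greedy word alignment between two parallel lines by cosine
        # similarity of word embeddings. Vectors and word forms are made up.
        import numpy as np

        emb = {
            "kuning": np.array([0.9, 0.1, 0.0]),   # toy 3-d vectors, not real embeddings
            "king":   np.array([0.8, 0.2, 0.1]),
            "rihhi":  np.array([0.1, 0.9, 0.2]),
            "realm":  np.array([0.2, 0.8, 0.1]),
            "guot":   np.array([0.0, 0.2, 0.9]),
            "good":   np.array([0.1, 0.1, 0.8]),
        }

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        def align(line_a, line_b):
            """Pair each word of line_a with its most similar word in line_b."""
            return [(w, max(line_b, key=lambda v: cosine(emb[w], emb[v]))) for w in line_a]

        print(align(["kuning", "rihhi", "guot"], ["good", "realm", "king"]))
        # -> [('kuning', 'king'), ('rihhi', 'realm'), ('guot', 'good')]

    A real pipeline would use a proper sequence alignment rather than this greedy pairing, with the human-in-the-loop step correcting and labeling the proposed relations.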