632 research outputs found

    Multidisciplinary perspectives on Artificial Intelligence and the law

    Get PDF
    This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.info:eu-repo/semantics/publishedVersio

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volum

    Dataset Condensation with Distribution Matching

    Get PDF

    Mining Butterflies in Streaming Graphs

    Get PDF
    This thesis introduces two main-memory systems sGrapp and sGradd for performing the fundamental analytic tasks of biclique counting and concept drift detection over a streaming graph. A data-driven heuristic is used to architect the systems. To this end, initially, the growth patterns of bipartite streaming graphs are mined and the emergence principles of streaming motifs are discovered. Next, the discovered principles are (a) explained by a graph generator called sGrow; and (b) utilized to establish the requirements for efficient, effective, explainable, and interpretable management and processing of streams. sGrow is used to benchmark stream analytics, particularly in the case of concept drift detection. sGrow displays robust realization of streaming growth patterns independent of initial conditions, scale and temporal characteristics, and model configurations. Extensive evaluations confirm the simultaneous effectiveness and efficiency of sGrapp and sGradd. sGrapp achieves mean absolute percentage error up to 0.05/0.14 for the cumulative butterfly count in streaming graphs with uniform/non-uniform temporal distribution and a processing throughput of 1.5 million data records per second. The throughput and estimation error of sGrapp are 160x higher and 0.02x lower than baselines. sGradd demonstrates an improving performance over time, achieves zero false detection rates when there is not any drift and when drift is already detected, and detects sequential drifts in zero to a few seconds after their occurrence regardless of drift intervals

    Multimodal Dataset Distillation for Image-Text Retrieval

    Full text link
    Dataset distillation methods offer the promise of reducing a large-scale dataset down to a significantly smaller set of (potentially synthetic) training examples, which preserve sufficient information for training a new model from scratch. So far dataset distillation methods have been developed for image classification. However, with the rise in capabilities of vision-language models, and especially given the scale of datasets necessary to train these models, the time is ripe to expand dataset distillation methods beyond image classification. In this work, we take the first steps towards this goal by expanding on the idea of trajectory matching to create a distillation method for vision-language datasets. The key challenge is that vision-language datasets do not have a set of discrete classes. To overcome this, our proposed multimodal dataset distillation method jointly distill the images and their corresponding language descriptions in a contrastive formulation. Since there are no existing baselines, we compare our approach to three coreset selection methods (strategic subsampling of the training dataset), which we adapt to the vision-language setting. We demonstrate significant improvements on the challenging Flickr30K and COCO retrieval benchmark: the best coreset selection method which selects 1000 image-text pairs for training is able to achieve only 5.6% image-to-text retrieval accuracy (recall@1); in contrast, our dataset distillation approach almost doubles that with just 100 (an order of magnitude fewer) training pairs.Comment: 28 pages, 11 figure

    Does Graph Distillation See Like Vision Dataset Counterpart?

    Full text link
    Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have attracted increasing concerns. Existing graph condensation methods primarily focus on optimizing the feature matrices of condensed graphs while overlooking the impact of the structure information from the original graphs. To investigate the impact of the structure information, we conduct analysis from the spectral domain and empirically identify substantial Laplacian Energy Distribution (LED) shifts in previous works. Such shifts lead to poor performance in cross-architecture generalization and specific tasks, including anomaly detection and link prediction. In this paper, we propose a novel Structure-broadcasting Graph Dataset Distillation (SGDD) scheme for broadcasting the original structure information to the generation of the synthetic one, which explicitly prevents overlooking the original structure information. Theoretically, the synthetic graphs by SGDD are expected to have smaller LED shifts than previous works, leading to superior performance in both cross-architecture settings and specific tasks. We validate the proposed SGDD across 9 datasets and achieve state-of-the-art results on all of them: for example, on the YelpChi dataset, our approach maintains 98.6% test accuracy of training on the original graph dataset with 1,000 times saving on the scale of the graph. Moreover, we empirically evaluate there exist 17.6% ~ 31.4% reductions in LED shift crossing 9 datasets. Extensive experiments and analysis verify the effectiveness and necessity of the proposed designs. The code is available in the GitHub repository: https://github.com/RingBDStack/SGDD.Comment: Accepted by NeurIPS 202

    Optimizing Flow Routing Using Network Performance Analysis

    Get PDF
    Relevant conferences were attended at which work was often presented and several papers were published in the course of this project. • Muna Al-Saadi, Bogdan V Ghita, Stavros Shiaeles, Panagiotis Sarigiannidis. A novel approach for performance-based clustering and management of network traffic flows, IWCMC, ©2019 IEEE. • M. Al-Saadi, A. Khan, V. Kelefouras, D. J. Walker, and B. Al-Saadi: Unsupervised Machine Learning-Based Elephant and Mice Flow Identification, Computing Conference 2021. • M. Al-Saadi, A. Khan, V. Kelefouras, D. J. Walker, and B. Al-Saadi: SDN-Based Routing Framework for Elephant and Mice Flows Using Unsupervised Machine Learning, Network, 3(1), pp.218-238, 2023.The main task of a network is to hold and transfer data between its nodes. To achieve this task, the network needs to find the optimal route for data to travel by employing a particular routing system. This system has a specific job that examines each possible path for data and chooses the suitable one and transmit the data packets where it needs to go as fast as possible. In addition, it contributes to enhance the performance of network as optimal routing algorithm helps to run network efficiently. The clear performance advantage that provides by routing procedures is the faster data access. For example, the routing algorithm take a decision that determine the best route based on the location where the data is stored and the destination device that is asking for it. On the other hand, a network can handle many types of traffic simultaneously, but it cannot exceed the bandwidth allowed as the maximum data rate that the network can transmit. However, the overloading problem are real and still exist. To avoid this problem, the network chooses the route based on the available bandwidth space. One serious problem in the network is network link congestion and disparate load caused by elephant flows. Through forwarding elephant flows, network links will be congested with data packets causing transmission collision, congestion network, and delay in transmission. Consequently, there is not enough bandwidth for mice flows, which causes the problem of transmission delay. Traffic engineering (TE) is a network application that concerns with measuring and managing network traffic and designing feasible routing mechanisms to guide the traffic of the network for improving the utilization of network resources. The main function of traffic engineering is finding an obvious route to achieve the bandwidth requirements of the network consequently optimizing the network performance [1]. Routing optimization has a key role in traffic engineering by finding efficient routes to achieve the desired performance of the network [2]. Furthermore, routing optimization can be considered as one of the primary goals in the field of networks. In particular, this goal is directly related to traffic engineering, as it is based on one particular idea: to achieve that traffic is routed according to accurate traffic requirements [3]. Therefore, we can say that traffic engineering is one of the applications of multiple improvements to routing; routing can also be optimized based on other factors (not just on traffic requirements). In addition, these traffic requirements are variable depending on analyzed dataset that considered if it is data or traffic control. In this regard, the logical central view of the Software Defined Network (SDN) controller facilitates many aspects compared to traditional routing. The main challenge in all network types is performance optimization, but the situation is different in SDN because the technique is changed from distributed approach to a centralized one. The characteristics of SDN such as centralized control and programmability make the possibility of performing not only routing in traditional distributed manner but also routing in centralized manner. The first advantage of centralized routing using SDN is the existence of a path to exchange information between the controller and infrastructure devices. Consequently, the controller has the information for the entire network, flexible routing can be achieved. The second advantage is related to dynamical control of routing due to the capability of each device to change its configuration based on the controller commands [4]. This thesis begins with a wide review of the importance of network performance analysis and its role for understanding network behavior, and how it contributes to improve the performance of the network. Furthermore, it clarifies the existing solutions of network performance optimization using machine learning (ML) techniques in traditional networks and SDN environment. In addition, it highlights recent and ongoing studies of the problem of unfair use of network resources by a particular flow (elephant flow) and the possible solutions to solve this problem. Existing solutions are predominantly, flow routing-based and do not consider the relationship between network performance analysis and flow characterization and how to take advantage of it to optimize flow routing by finding the convenient path for each type of flow. Therefore, attention is given to find a method that may describe the flow based on network performance analysis and how to utilize this method for managing network performance efficiently and find the possible integration for the traffic controlling in SDN. To this purpose, characteristics of network flows is identified as a mechanism which may give insight into the diversity in flow features based on performance metrics and provide the possibility of traffic engineering enhancement using SDN environment. Two different feature sets with respect to network performance metrics are employed to characterize network traffic. Applying unsupervised machine learning techniques including Principal Component Analysis (PCA) and k-means cluster analysis to derive a traffic performance-based clustering model. Afterward, thresholding-based flow identification paradigm has been built using pre-defined parameters and thresholds. Finally, the resulting data clusters are integrated within a unified SDN architectural solution, which improves network management by finding the best flow routing based on the type of flow, to be evaluated against a number of traffic data sources and different performance experiments. The validation process of the novel framework performance has been done by making a performance comparison between SDN-Ryu controller and the proposed SDN-external application based on three factors: throughput, bandwidth,and data transfer rate by conducting two experiments. Furthermore, the proposed method has been validated by using different Data Centre Network (DCN) topologies to demonstrate the effectiveness of the network traffic management solution. The overall validation metrics shows real gains, the results show that 70% of the time, it has high performance with different flows. The proposed routing SDN traffic-engineering paradigm for a particular flow therefore, dynamically provisions network resources among different flow types

    Graph Condensation via Eigenbasis Matching

    Full text link
    The increasing amount of graph data places requirements on the efficiency and scalability of graph neural networks (GNNs), despite their effectiveness in various graph-related applications. Recently, the emerging graph condensation (GC) sheds light on reducing the computational cost of GNNs from a data perspective. It aims to replace the real large graph with a significantly smaller synthetic graph so that GNNs trained on both graphs exhibit comparable performance. However, our empirical investigation reveals that existing GC methods suffer from poor generalization, i.e., different GNNs trained on the same synthetic graph have obvious performance gaps. What factors hinder the generalization of GC and how can we mitigate it? To answer this question, we commence with a detailed analysis and observe that GNNs will inject spectrum bias into the synthetic graph, resulting in a distribution shift. To tackle this issue, we propose eigenbasis matching for spectrum-free graph condensation, named GCEM, which has two key steps: First, GCEM matches the eigenbasis of the real and synthetic graphs, rather than the graph structure, which eliminates the spectrum bias of GNNs. Subsequently, GCEM leverages the spectrum of the real graph and the synthetic eigenbasis to construct the synthetic graph, thereby preserving the essential structural information. We theoretically demonstrate that the synthetic graph generated by GCEM maintains the spectral similarity, i.e., total variation, of the real graph. Extensive experiments conducted on five graph datasets verify that GCEM not only achieves state-of-the-art performance over baselines but also significantly narrows the performance gaps between different GNNs.Comment: Under Revie

    You Only Condense Once: Two Rules for Pruning Condensed Datasets

    Full text link
    Dataset condensation is a crucial tool for enhancing training efficiency by reducing the size of the training dataset, particularly in on-device scenarios. However, these scenarios have two significant challenges: 1) the varying computational resources available on the devices require a dataset size different from the pre-defined condensed dataset, and 2) the limited computational resources often preclude the possibility of conducting additional condensation processes. We introduce You Only Condense Once (YOCO) to overcome these limitations. On top of one condensed dataset, YOCO produces smaller condensed datasets with two embarrassingly simple dataset pruning rules: Low LBPE Score and Balanced Construction. YOCO offers two key advantages: 1) it can flexibly resize the dataset to fit varying computational constraints, and 2) it eliminates the need for extra condensation processes, which can be computationally prohibitive. Experiments validate our findings on networks including ConvNet, ResNet and DenseNet, and datasets including CIFAR-10, CIFAR-100 and ImageNet. For example, our YOCO surpassed various dataset condensation and dataset pruning methods on CIFAR-10 with ten Images Per Class (IPC), achieving 6.98-8.89% and 6.31-23.92% accuracy gains, respectively. The code is available at: https://github.com/he-y/you-only-condense-once.Comment: Accepted by NeurIPS 202
    • …
    corecore