    Understanding and Mitigating Multi-sided Exposure Bias in Recommender Systems

    Fairness is a critical system-level objective in recommender systems and has been the subject of extensive recent research. It is especially important in multi-sided recommendation platforms, where it may be crucial to optimize utilities not just for the end user but also for other actors, such as item sellers or producers, who desire a fair representation of their items. Existing solutions do not properly address the various aspects of multi-sided fairness in recommendations: they either take a solely one-sided view (i.e., improving fairness for only one side) or do not appropriately measure fairness for each actor involved in the system. In this thesis, I first investigate the impact of unfair recommendations on the system and how they can negatively affect its major actors. I then propose solutions to tackle this unfairness. The first is a rating transformation technique that works as a pre-processing step before building the recommendation model, alleviating the inherent popularity bias in the input data and consequently mitigating exposure unfairness for items and suppliers in the recommendation lists. The second is a general graph-based solution that works as a post-processing approach after recommendation generation, mitigating multi-sided exposure bias in the recommendation results. For evaluation, I introduce several metrics for measuring exposure fairness for items and suppliers and show that these metrics better capture the fairness properties of the recommendation results. Extensive experiments on different publicly available datasets, together with comparisons against various baselines, confirm the superiority of the proposed solutions in improving exposure fairness for items and suppliers. Comment: Doctoral thesis.
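    As a hedged illustration of the kind of exposure-fairness measurement the abstract refers to (the thesis defines its own metrics), the sketch below computes rank-discounted exposure per item from top-k recommendation lists, aggregates it per supplier, and summarizes inequality with a Gini coefficient; the discounting scheme, the Gini summary, and the toy data are assumptions, not the thesis's exact formulation.

```python
# Illustrative sketch only: rank-discounted exposure plus a Gini-style inequality summary.
import math
from collections import defaultdict

def exposure_per_item(rec_lists):
    """Sum rank-discounted exposure (1 / log2(rank + 1)) for each item
    over all users' ranked recommendation lists."""
    exposure = defaultdict(float)
    for items in rec_lists:                      # one ranked list per user
        for rank, item in enumerate(items, start=1):
            exposure[item] += 1.0 / math.log2(rank + 1)
    return exposure

def exposure_per_supplier(item_exposure, item_to_supplier):
    """Aggregate item-level exposure to the supplier level."""
    supplier_exposure = defaultdict(float)
    for item, exp in item_exposure.items():
        supplier_exposure[item_to_supplier[item]] += exp
    return supplier_exposure

def gini(values):
    """Gini coefficient: 0 = perfectly equal exposure, 1 = maximally concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * cum) / (n * total) - (n + 1.0) / n

# Hypothetical data: two users, three items from two suppliers.
recs = [["i1", "i2", "i3"], ["i1", "i3", "i2"]]
item_exp = exposure_per_item(recs)
supplier_exp = exposure_per_supplier(item_exp, {"i1": "s1", "i2": "s1", "i3": "s2"})
print(gini(item_exp.values()), gini(supplier_exp.values()))
```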

    A transfer learning-aided decision support system for multi-cloud brokers

    Decision-making in a cloud environment is a formidable task due to the proliferation of service offerings, pricing models, and technology standards. A customer entering the diverse cloud market is likely to be overwhelmed with a host of difficult choices in terms of service selection. This applies to all levels of service, but the Infrastructure as a Service (IaaS) level is particularly important for the end user, given that IaaS provides more choices and control for application developers. In the IaaS domain, however, there is no straightforward method to compare virtual machine performance and, more generally, cost/performance trade-offs within or across cloud providers. A wrong decision can result in financial loss as well as reduced application performance. A cloud broker can help resolve such issues by acting as an intermediary between the cloud provider and the cloud consumer, serving as a decision support system that assists the customer in the decision process. In this thesis, we exploit machine learning to build an intelligent decision support system that assists customers in making application-driven decisions in a multi-cloud environment. The thesis examines a representative set of inference- and prediction-based learning techniques that are essential for capturing application behaviour on different deployment setups, such as Polynomial Regression and Support Vector Regression (SVR). In addition, the thesis examines the efficiency of these learning techniques, recognising that machine learning can impose significant training overhead, and introduces a novel transfer-learning-aided technique that leads to a substantial reduction in this overhead. By definition, transfer learning aims to solve a new problem faster, or with a better solution, by reusing previously learned knowledge. Quantitatively, we observed a reduction of approximately 60% in learning time and cost by transferring existing knowledge about an application and cloud platform in order to learn a new prediction model for another application or cloud provider. Intensive experimentation has been performed in this study for learning and evaluating the proposed decision support system. Specifically, we used three representative applications over two cloud providers, namely Amazon and Google. Our proposed decision support system, enriched with transfer learning methods, is capable of generating decisions that are viable across different applications in a multi-cloud environment. Finally, we also discuss lessons learned in terms of architectural principles and techniques for intelligent multi-cloud brokerage.
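    The abstract mentions Support Vector Regression and a transfer-learning-aided technique without giving details. The sketch below shows one plausible (assumed) way to reuse an SVR model trained on plentiful benchmarks from a source application/provider when only a few measurements exist for a target application/provider, by feeding the source model's prediction in as an extra feature; the feature names, synthetic data, and this particular transfer scheme are illustrative assumptions, not the thesis's method.

```python
# Hedged sketch: transfer from a source performance model to a data-scarce target model.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Assumed features: [vCPUs, memory_GB, input_size]; target: runtime (synthetic data).
X_source = rng.uniform([1, 2, 10], [16, 64, 1000], size=(200, 3))
y_source = 50 / X_source[:, 0] + 0.05 * X_source[:, 2] + rng.normal(0, 1, 200)

X_target = rng.uniform([1, 2, 10], [16, 64, 1000], size=(20, 3))   # scarce target data
y_target = 65 / X_target[:, 0] + 0.04 * X_target[:, 2] + rng.normal(0, 1, 20)

# 1. Learn the source model on plentiful source benchmarks.
source_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
source_model.fit(X_source, y_source)

# 2. Transfer: augment target features with the source model's prediction, so the
#    target SVR mostly has to learn the (smaller) source-to-target shift.
def augment(X):
    return np.column_stack([X, source_model.predict(X)])

target_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
target_model.fit(augment(X_target), y_target)

X_new = np.array([[8, 32, 500]])
print("predicted runtime:", target_model.predict(augment(X_new)))
```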

    OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System

    Automated machine learning (AutoML) seeks to build ML models with minimal human effort. While considerable research has been conducted in the area of AutoML in general, aiming to take humans out of the loop when building artificial intelligence (AI) applications, scant literature has focused on making AutoML work well in open-environment scenarios such as training and updating large models, industrial supply chains, or the industrial metaverse, where people often face open-loop problems during the search process: they must continuously collect and update data and models, satisfy the requirements of the development and deployment environment, support massive numbers of devices, modify evaluation metrics, and so on. Addressing the open-environment issue with purely data-driven approaches requires considerable data, computing resources, and effort from dedicated data engineers, making current AutoML systems and platforms inefficient and computationally intractable. Human-computer interaction is a practical and feasible way to tackle the problem of open-environment AI. In this paper, we introduce OmniForce, a human-centered AutoML (HAML) system that provides both human-assisted ML and ML-assisted human techniques, to put an AutoML system into practice and build adaptive AI in open-environment scenarios. Specifically, we present OmniForce in terms of ML version management; pipeline-driven development and deployment collaborations; a flexible search strategy framework; and widely provisioned and crowdsourced application algorithms, including large models. Furthermore, the (large) models constructed by OmniForce can be automatically turned into remote services in a few minutes; this process is dubbed model as a service (MaaS). Experimental results obtained in multiple search spaces and real-world use cases demonstrate the efficacy and efficiency of OmniForce.
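    Purely as a generic illustration of the open-environment, human-in-the-loop search loop described above (not the OmniForce API, whose interfaces are not given here), the sketch below shows a search procedure in which a person may swap in updated data or a modified evaluation metric between rounds while the searcher keeps its trial history.

```python
# Generic illustration only: an "open-loop" search where data and metrics may change
# between rounds; get_data, get_metric, sample_config, and train are caller-supplied.
def human_in_the_loop_search(get_data, get_metric, sample_config, train,
                             rounds=5, trials=8):
    """Return the best (config, score) pair found across all rounds."""
    history = []                          # (config, score) pairs accumulated over rounds
    for r in range(rounds):
        X, y = get_data(r)                # data may have been updated by the user
        metric = get_metric(r)            # evaluation metric may have been changed
        for _ in range(trials):
            cfg = sample_config(history)  # search strategy can exploit past trials
            model = train(cfg, X, y)
            history.append((cfg, metric(model, X, y)))
    return max(history, key=lambda t: t[1])
```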

    Optimization of privacy-utility trade-offs under informational self-determination

    The pervasiveness of the Internet of Things results in vast volumes of personal data generated by users' smart devices (data producers), such as smartphones, wearables, and other embedded sensors. It is a common requirement, especially for Big Data analytics systems, to transfer these large-scale, distributed data to centralized computational systems for analysis. Nevertheless, the third parties that run and manage these systems (data consumers) do not always guarantee users' privacy; their primary interest is to improve utility, usually a metric related to performance, cost, and quality of service. There are several techniques that mask user-generated data to ensure privacy, e.g. differential privacy. Setting up a process for masking data, referred to in this paper as a 'privacy setting', decreases the utility of data analytics on the one hand, while increasing privacy on the other. This paper studies parameterizations of privacy settings that regulate the trade-off between maximum utility with minimum privacy and minimum utility with maximum privacy, where utility refers to the accuracy of the estimates of aggregation functions. Privacy settings can be universally applied as system-wide parameterizations and policies (homogeneous data sharing). Nonetheless, they can also be applied autonomously by each user or decided under the influence of (monetary) incentives (heterogeneous data sharing). This latter diversity in data sharing by informational self-determination plays a key role in the privacy-utility trajectories, as shown in this paper both theoretically and empirically. A generic and novel computational framework is introduced for measuring privacy-utility trade-offs and their Pareto optimization. The framework computes a broad spectrum of such trade-offs, which form privacy-utility trajectories under homogeneous and heterogeneous data sharing. The practical use of the framework is experimentally evaluated using real-world data from a Smart Grid pilot project, in which energy consumers protect their privacy by regulating the quality of the shared power demand data, while utility companies make accurate estimates of the aggregate load in the network to manage the power grid. Over 20,000 differential privacy settings are applied to shape the computational trajectories, which in turn provide a vast potential for data consumers and producers to participate in viable participatory data-sharing systems.
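    A minimal sketch of the kind of privacy-utility trade-off measured in the paper, assuming the Laplace mechanism of differential privacy applied to per-consumer power-demand readings; the sensitivity value, the epsilon ranges, and the relative-error utility measure are illustrative assumptions rather than the paper's exact framework.

```python
# Laplace-mechanism sketch: each consumer perturbs their own reading; the aggregator
# sums the perturbed readings, and utility is measured as relative aggregation error.
import numpy as np

rng = np.random.default_rng(42)
demand = rng.uniform(0.2, 3.0, size=1000)       # kW readings for 1000 consumers (synthetic)
sensitivity = 3.0                               # assumed maximum single reading

def noisy_aggregate(readings, epsilons):
    """Per-user Laplace noise scaled to each user's own epsilon (heterogeneous sharing),
    summed by the aggregator."""
    noise = rng.laplace(0.0, sensitivity / epsilons)
    return np.sum(readings + noise)

true_total = demand.sum()

# Homogeneous sharing: one system-wide privacy setting.
for eps in [0.1, 0.5, 1.0, 5.0]:
    est = noisy_aggregate(demand, np.full_like(demand, eps))
    print(f"eps={eps:>4}: relative aggregation error = {abs(est - true_total) / true_total:.3%}")

# Heterogeneous sharing: each consumer picks their own privacy level.
eps_mixed = rng.uniform(0.1, 5.0, size=demand.size)
est = noisy_aggregate(demand, eps_mixed)
print(f"mixed   : relative aggregation error = {abs(est - true_total) / true_total:.3%}")
```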

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. The traceability research field has long aimed to recover links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. A large difference between the two sets might indicate considerable ageing of the documentation and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to sets of key terms, each containing the concepts of one of the sampled systems. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different approaches for set comparison to measure how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we found that the cosine distance has excellent comparative power, depending on the pre-training of the underlying embedding model: the spaCy and FastText embeddings yield similarity scores of up to 80% and 90%, respectively. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy of one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
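    To make the two families of comparison concrete, the sketch below computes an exact Jaccard index over two key-term sets and an embedding-based cosine similarity between their mean vectors, here using spaCy's en_core_web_md vectors as one example (FastText would be analogous); the term sets are hypothetical and the centroid-cosine formulation is an assumption rather than the paper's exact procedure.

```python
# Hedged sketch: set-based Jaccard vs embedding-based cosine similarity of key terms.
import numpy as np
import spacy

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over the two key-term sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def centroid_cosine(terms_a, terms_b, nlp):
    """Cosine similarity between the mean embedding vectors of the two term sets."""
    va = np.mean([nlp(t).vector for t in terms_a], axis=0)
    vb = np.mean([nlp(t).vector for t in terms_b], axis=0)
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Hypothetical key-term sets extracted from code and from documentation.
code_terms = ["parser", "token", "syntax tree", "visitor"]
doc_terms = ["parsing", "tokens", "abstract syntax tree", "traversal"]

print("Jaccard:", jaccard(code_terms, doc_terms))
nlp = spacy.load("en_core_web_md")   # a spaCy model with word vectors is required
print("Cosine :", centroid_cosine(code_terms, doc_terms, nlp))
```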