231,537 research outputs found

    Framework to Automatically Determine the Quality of Open Data Catalogs

    Full text link
    Data catalogs play a crucial role in modern data-driven organizations by facilitating the discovery, understanding, and utilization of diverse data assets. However, ensuring their quality and reliability is complex, especially in open and large-scale data environments. This paper proposes a framework to automatically determine the quality of open data catalogs, addressing the need for efficient and reliable quality assessment mechanisms. Our framework can analyze various core quality dimensions, such as accuracy, completeness, consistency, scalability, and timeliness, offer several alternatives for the assessment of compatibility and similarity across such catalogs as well as the implementation of a set of non-core quality dimensions such as provenance, readability, and licensing. The goal is to empower data-driven organizations to make informed decisions based on trustworthy and well-curated data assets. The source code that illustrates our approach can be downloaded from https://www.github.com/jorge-martinez-gil/dataq/.Comment: 25 page

    A literature review of expert problem solving using analogy

    Get PDF
    We consider software project cost estimation from a problem solving perspective. Taking a cognitive psychological approach, we argue that the algorithmic basis for CBR tools is not representative of human problem solving and this mismatch could account for inconsistent results. We describe the fundamentals of problem solving, focusing on experts solving ill-defined problems. This is supplemented by a systematic literature review of empirical studies of expert problem solving of non-trivial problems. We identified twelve studies. These studies suggest that analogical reasoning plays an important role in problem solving, but that CBR tools do not model this in a biologically plausible way. For example, the ability to induce structure and therefore find deeper analogies is widely seen as the hallmark of an expert. However, CBR tools fail to provide support for this type of reasoning for prediction. We conclude this mismatch between experts’ cognitive processes and software tools contributes to the erratic performance of analogy-based prediction

    Ethical Challenges in Data-Driven Dialogue Systems

    Full text link
    The use of dialogue systems as a medium for human-machine interaction is an increasingly prevalent paradigm. A growing number of dialogue systems use conversation strategies that are learned from large datasets. There are well documented instances where interactions with these system have resulted in biased or even offensive conversations due to the data-driven training process. Here, we highlight potential ethical issues that arise in dialogue systems research, including: implicit biases in data-driven systems, the rise of adversarial examples, potential sources of privacy violations, safety concerns, special considerations for reinforcement learning systems, and reproducibility concerns. We also suggest areas stemming from these issues that deserve further investigation. Through this initial survey, we hope to spur research leading to robust, safe, and ethically sound dialogue systems.Comment: In Submission to the AAAI/ACM conference on Artificial Intelligence, Ethics, and Societ

    Transferable knowledge for Low-cost Decision Making in Cloud Environments

    Get PDF
    Users of Infrastructure as a Service (IaaS) are increasingly overwhelmed with the wide range of providers and services offered by each provider. As such, many users select services based on description alone. An emerging alternative is to use a decision support system (DSS), which typically relies on gaining insights from observational data in order to assist a customer in making decisions regarding optimal deployment of cloud applications. The primary activity of such systems is the generation of a prediction model (e.g. using machine learning), which requires a significantly large amount of training data. However, considering the varying architectures of applications, cloud providers, and cloud offerings, this activity is not sustainable as it incurs additional time and cost to collect data to train the models. We overcome this through developing a Transfer Learning (TL) approach where knowledge (in the form of a prediction model and associated data set) gained from running an application on a particular IaaS is transferred in order to substantially reduce the overhead of building new models for the performance of new applications and/or cloud infrastructures. In this paper, we present our approach and evaluate it through extensive experimentation involving three real world applications over two major public cloud providers, namely Amazon and Google. Our evaluation shows that our novel two-mode TL scheme increases overall efficiency with a factor of 60% reduction in the time and cost of generating a new prediction model. We test this under a number of cross-application and cross-cloud scenario
    • …
    corecore