Framework to Automatically Determine the Quality of Open Data Catalogs
Data catalogs play a crucial role in modern data-driven organizations by
facilitating the discovery, understanding, and utilization of diverse data
assets. However, ensuring their quality and reliability is complex, especially
in open and large-scale data environments. This paper proposes a framework to
automatically determine the quality of open data catalogs, addressing the need
for efficient and reliable quality assessment mechanisms. Our framework analyzes core quality dimensions such as accuracy, completeness, consistency, scalability, and timeliness; offers several alternatives for assessing compatibility and similarity across such catalogs; and implements a set of non-core quality dimensions such as provenance, readability, and licensing. The goal is to empower data-driven organizations to
make informed decisions based on trustworthy and well-curated data assets. The
source code that illustrates our approach can be downloaded from
https://www.github.com/jorge-martinez-gil/dataq/.
Comment: 25 pages
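As a rough illustration of how such dimensions can be scored automatically, the sketch below computes completeness and timeliness for a single catalog record. The field names, scoring rules, and thresholds are assumptions made for illustration only and are not taken from the dataq code base.

```python
from datetime import datetime, timedelta, timezone

# Illustrative metadata fields a catalog record might carry; the fields the
# dataq framework actually checks may differ.
EXPECTED_FIELDS = ["title", "description", "license", "publisher", "modified"]

def completeness(record: dict) -> float:
    """Fraction of expected metadata fields that are present and non-empty."""
    filled = sum(1 for field in EXPECTED_FIELDS if record.get(field))
    return filled / len(EXPECTED_FIELDS)

def timeliness(record: dict, max_age_days: float = 365.0) -> float:
    """Score in [0, 1] that decays linearly with the age of the last update."""
    if not record.get("modified"):
        return 0.0
    age = (datetime.now(timezone.utc)
           - datetime.fromisoformat(record["modified"])).days
    return max(0.0, 1.0 - age / max_age_days)

record = {
    "title": "Air quality measurements",
    "description": "Hourly PM2.5 readings for one monitoring station",
    "license": "CC-BY-4.0",
    "publisher": None,  # a missing field lowers the completeness score
    "modified": (datetime.now(timezone.utc) - timedelta(days=90)).isoformat(),
}
print(f"completeness={completeness(record):.2f}  timeliness={timeliness(record):.2f}")
```

Scores for the remaining dimensions would be aggregated in a similar per-record fashion before being rolled up to the catalog level.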
A literature review of expert problem solving using analogy
We consider software project cost estimation from a problem solving perspective. Taking a cognitive psychological approach, we argue that the algorithmic basis for CBR tools is not representative of human problem solving, and that this mismatch could account for inconsistent results. We describe the fundamentals of problem solving, focusing on experts solving ill-defined problems. This is supplemented by a systematic literature review of empirical studies of expert problem solving of non-trivial problems. We identified twelve studies. These studies suggest that analogical reasoning plays an important role in problem solving, but that CBR tools do not model this in a biologically plausible way. For example, the ability to induce structure and therefore find deeper analogies is widely seen as the hallmark of an expert. However, CBR tools fail to provide support for this type of reasoning for prediction. We conclude that this mismatch between experts’ cognitive processes and software tools contributes to the erratic performance of analogy-based prediction.
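For readers unfamiliar with how such CBR tools make predictions, the sketch below shows the typical mechanism: retrieve the k most similar past cases by surface features and average their known outcomes. The features, case data, and distance measure are invented for illustration. Note that similarity here is purely surface-level, which is exactly the gap with expert structural analogy that the review identifies.

```python
import math

# Historical projects: surface features (size in KLOC, team size) and known
# effort in person-months. Values are illustrative, not from a real dataset.
cases = [
    {"kloc": 12.0, "team": 4, "effort_pm": 30.0},
    {"kloc": 45.0, "team": 9, "effort_pm": 110.0},
    {"kloc": 8.0,  "team": 3, "effort_pm": 22.0},
]

def distance(a: dict, b: dict) -> float:
    """Euclidean distance over surface features only -- no structural analogy.
    Real tools would normalize features to a common scale first."""
    return math.hypot(a["kloc"] - b["kloc"], a["team"] - b["team"])

def predict_effort(target: dict, k: int = 2) -> float:
    """CBR-style prediction: mean effort of the k most similar past cases."""
    nearest = sorted(cases, key=lambda c: distance(c, target))[:k]
    return sum(c["effort_pm"] for c in nearest) / k

print(predict_effort({"kloc": 10.0, "team": 4}))  # analogy by surface features
```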
Ethical Challenges in Data-Driven Dialogue Systems
The use of dialogue systems as a medium for human-machine interaction is an
increasingly prevalent paradigm. A growing number of dialogue systems use
conversation strategies that are learned from large datasets. There are well
documented instances where interactions with these systems have resulted in
biased or even offensive conversations due to the data-driven training process.
Here, we highlight potential ethical issues that arise in dialogue systems
research, including: implicit biases in data-driven systems, the rise of
adversarial examples, potential sources of privacy violations, safety concerns,
special considerations for reinforcement learning systems, and reproducibility
concerns. We also suggest areas stemming from these issues that deserve further
investigation. Through this initial survey, we hope to spur research leading to
robust, safe, and ethically sound dialogue systems.
Comment: In submission to the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society
Transferable knowledge for Low-cost Decision Making in Cloud Environments
Users of Infrastructure as a Service (IaaS) are increasingly overwhelmed by the wide range of providers and the services each offers. As such, many users select services based on description alone. An emerging alternative is to use a decision support system (DSS), which
typically relies on gaining insights from observational data in order to assist a customer in making decisions regarding optimal deployment of cloud
applications. The primary activity of such systems is the generation of a prediction model (e.g. using machine learning), which requires a significantly
large amount of training data. However, considering the varying architectures of applications, cloud providers, and cloud offerings, this activity is
not sustainable as it incurs additional time and cost to collect data to train the models. We overcome this through developing a Transfer Learning (TL)
approach where knowledge (in the form of a prediction model and associated data set) gained from running an application on a particular IaaS is
transferred in order to substantially reduce the overhead of building new models for the performance of new applications and/or cloud infrastructures.
In this paper, we present our approach and evaluate it through extensive experimentation involving three real world applications over two major public
cloud providers, namely Amazon and Google. Our evaluation shows that our novel two-mode TL scheme increases overall efficiency, yielding a 60% reduction in the time and cost of generating a new prediction model. We test this under a number of cross-application and cross-cloud scenarios.
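The abstract does not detail the two-mode scheme itself, so the sketch below illustrates only the general idea with one simple alternative, instance transfer: pool abundant source-cloud measurements with a handful of target-cloud samples, up-weighting the latter so they correct the cross-cloud shift. All features, data, and weights are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical source-cloud data: deployment features (vCPUs, memory GB,
# input size) -> measured runtime. Plentiful on the source provider.
X_src = rng.uniform([1, 2, 1], [16, 64, 100], size=(500, 3))
y_src = 50 / X_src[:, 0] + 0.4 * X_src[:, 2] + rng.normal(0, 1, 500)

# Only a handful of measurements on the target provider, whose hardware is
# assumed (for this sketch) to shift runtimes by a roughly constant factor.
X_tgt = rng.uniform([1, 2, 1], [16, 64, 100], size=(10, 3))
y_tgt = 1.3 * (50 / X_tgt[:, 0] + 0.4 * X_tgt[:, 2]) + rng.normal(0, 1, 10)

# Instance transfer: pool source data with the few target samples, weighting
# the target samples heavily so they dominate the cross-cloud correction.
X_pool = np.vstack([X_src, X_tgt])
y_pool = np.concatenate([y_src, y_tgt])
weights = np.concatenate([np.ones(len(y_src)), 50 * np.ones(len(y_tgt))])

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_pool, y_pool, sample_weight=weights)
print(model.predict(X_tgt[:3]))  # runtime estimates for the target cloud
```

Reusing the 500 source samples means the target provider needs only 10 new measurements, which is the kind of data-collection saving the paper's TL approach targets.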