Search CORE

140,286 research outputs found

A case-based reasoning system for recommendation of data cleaning algorithms in classification and regression tasks

Author: Corrales Muñoz David Camilo
Corrales Juan Carlos
Ledezma Espino Agapito Ismael
Publication venue: 'Elsevier BV'
Publication date: 18/02/2020
Field of study

Recently, advances in Information Technologies (social networks, mobile applications, Internet of Things, etc.) generate a deluge of digital data; but to convert these data into useful information for business decisions is a growing challenge. Exploiting the massive amount of data through knowledge discovery (KD) process includes identifying valid, novel, potentially useful and understandable patterns from a huge volume of data. However, to prepare the data is a non-trivial refinement task that requires technical expertise in methods and algorithms for data cleaning. Consequently, the use of a suitable data analysis technique is a headache for inexpert users. To address these problems, we propose a case-based reasoning system (CBR) to recommend data cleaning algorithms for classification and regression tasks. In our approach, we represent the problem space by the meta-features of the dataset, its attributes, and the target variable. The solution space contains the algorithms of data cleaning used for each dataset. We represent the cases through a Data Cleaning Ontology. The case retrieval mechanism is composed of a filter and similarity phases. In the first phase, we defined two filter approaches based on clustering and quartile analysis. These filters retrieve a reduced number of relevant cases. The second phase computes a ranking of the retrieved cases by filter approaches, and it scores a similarity between a new case and the retrieved cases. The retrieval mechanism proposed was evaluated through a set of judges. The panel of judges scores the similarity between a query case against all cases of the case-base (ground truth). The results of the retrieval mechanism reach an average precision on judges ranking of 94.5% in top 3, for top 7 84.55%, while in top 10 78.35%.The authors are grateful to the research groups: Control Learning Systems Optimization Group (CAOS) of the Carlos III University of Madrid and Telematics Engineering Group (GIT) of the University of Cauca for the technical support. In addition, the authors are grateful to COLCIENCIAS for PhD scholarship granted to PhD. David Camilo Corrales. This work has been also supported by: Project Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT financed by Convocatoria 04C-2018 Banco de Proyectos Conjuntos UEES-Sostenibilidad of Project Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca, ID-3848. The Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R)

Universidad Carlos III de Madrid e-Archivo

On the role of pre and post-processing in environmental data mining

Author: Athanasiadis Ioannis
Comas Joaquim
Gibert Karina
Holmes Geoffrey
Izquierdo Joaquin
Sanchez-Marre Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2008
Field of study

The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

Research Commons@Waikato

The Virtues of Redundancy in Legal Thought

Author: Barnett Randy E
Publication venue: Scholarship @ GEORGETOWN LAW
Publication date: 01/01/1990
Field of study

Redundancy has a bad reputation among legal intellectuals. When someone says, for example, that the ninth and tenth amendments are redundant, we can be pretty sure that this person attaches little importance to these constitutional provisions. Listen to one of the definitions of redundant provided by the Oxford English Dictionary: superabundant, superfluous, excessive. \u27 In this essay, the author proposes that legal theorists pay serious attention to the concept of redundancy used by engineers. He explains how redundancy--in this special sense--is essential to any intellectual enterprise in which we try to reach action-guiding conclusions, including the enterprise of law. The author describes the virtues of redundancy in legal thought

bepress Legal Repository

Georgetown Law Scholarly Commons

Cleveland-Marshall College of Law

Recommended from our members

An architecture for certification-aware service discovery

Author: Bezzi M.
Sabetta A.
Spanoudakis G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2011
Field of study

Service-orientation is an emerging paradigm for building complex systems based on loosely coupled components, deployed and consumed over the network. Despite the original intent of the paradigm, its current instantiations are limited to a single trust domain (e.g., a single organization). Also, some of the key promises of service-orientation - such as the dynamic orchestration of externally provided software services, using runtime service discovery and deployment - are still unachieved. One of the main reasons for this is the trust gap that normally arises when software services, offered by previously unknown providers, are to be selected at run-time, without any human intervention. To close this gap, the concept of machine-readable security certificates (called asserts) has been recently introduced, which paves the way to automated processing about security properties of services. Similarly to current security certification schemes, the assessment of the security properties of a service is delegated to an independent third party (certification authority), who issues a corresponding assert, bound to the service. In this paper, we propose an architecture, which exploits the assert concept to realise a certification-aware service discovery framework. The architecture supports the discovery of single services based on certified security properties (in additional to the usual functional properties), as well as the dynamic synthesis of service compositions, that satisfy the given security properties. The architecture is extensible, thus allowing for a range of domain specific matchmaking components, to cover dimensions related to, e.g., performance, cost and other non-functional characteristics

City Research Online

Crossref

A Semantic Grid Oriented to E-Tourism

Author: A. Maedche
A. Negri
D.D. Roure
F. Bellifemine
F. Lee
H. Zhuge
K. Prantner
M. Wooldridge
M.J. Murphy
O. Corcho
P. Alper
P.A. Buhler
S-MDS
T. Berners-Lee
X.M. Zhang
X.M. Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

With increasing complexity of tourism business models and tasks, there is a clear need of the next generation e-Tourism infrastructure to support flexible automation, integration, computation, storage, and collaboration. Currently several enabling technologies such as semantic Web, Web service, agent and grid computing have been applied in the different e-Tourism applications, however there is no a unified framework to be able to integrate all of them. So this paper presents a promising e-Tourism framework based on emerging semantic grid, in which a number of key design issues are discussed including architecture, ontologies structure, semantic reconciliation, service and resource discovery, role based authorization and intelligent agent. The paper finally provides the implementation of the framework.Comment: 12 PAGES, 7 Figure

arXiv.org e-Print Archive

Crossref

A feature-similarity model for product line engineering

Author: Kaindl Herman
Mannion Mike
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2014
Field of study

ResearchOnline@GCU

Towards improving web service repositories through semantic web techniques

Author: Pan Jeff
Sabou Marta
Publication venue
Publication date: 01/01/2005
Field of study

The success of the Web services technology has brought topicsas software reuse and discovery once again on the agenda of software engineers. While there are several efforts towards automating Web service discovery and composition, many developers still search for services via online Web service repositories and then combine them manually. However, from our analysis of these repositories, it yields that, unlike traditional software libraries, they rely on little metadata to support service discovery. We believe that the major cause is the difficulty of automatically deriving metadata that would describe rapidly changing Web service collections. In this paper, we discuss the major shortcomings of state of the art Web service repositories and, as a solution, we report on ongoing work and ideas on how to use techniques developed in the context of the Semantic Web (ontology learning, mapping, metadata based presentation) to improve the current situation

CiteSeerX

Open Research Online (The Open University)

Knowledge data discovery and data mining in a design environment

Author: Duffy Alex
Haffey Mark
Publication venue
Publication date: 01/01/2000
Field of study

Designers, in the process of satisfying design requirements, generally encounter difficulties in, firstly, understanding the problem and secondly, finding a solution [Cross 1998]. Often the process of understanding the problem and developing a feasible solution are developed simultaneously by proposing a solution to gauge the extent to which the solution satisfies the specific requirements. Support for future design activities has long been recognised to exist in the form of past design cases, however the varying degrees of similarity and dissimilarity found between previous and current design requirements and solutions has restrained the effectiveness of utilising past design solutions. The knowledge embedded within past designs provides a source of experience with the potential to be utilised in future developments provided that the ability to structure and manipulate that knowledgecan be made a reality. The importance of providing the ability to manipulate past design knowledge, allows the ranging viewpoints experienced by a designer, during a design process, to be reflected and supported. Data Mining systems are gaining acceptance in several domains but to date remain largely unrecognised in terms of the potential to support design activities. It is the focus of this paper to introduce the functionality possessed within the realm of Data Mining tools, and to evaluate the level of support that may be achieved in manipulating and utilising experiential knowledge to satisfy designers' ranging perspectives throughout a product's development

University of Strathclyde Institutional Repository

Data mining as a tool for environmental scientists

Author: Athanasiadis Ioannis
Comas Joaquim
Frank Eibe
Gibert Karina
Letcher Rebecca
Spate Jessica
Sànchez-Marrè Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2006
Field of study

Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogenous

Research Commons@Waikato