140,286 research outputs found
A case-based reasoning system for recommendation of data cleaning algorithms in classification and regression tasks
Recent advances in information technologies (social networks, mobile applications, the Internet of Things, etc.) generate a deluge of digital data, but converting these data into useful information for business decisions is a growing challenge. Exploiting massive amounts of data through the knowledge discovery (KD) process involves identifying valid, novel, potentially useful, and understandable patterns in huge volumes of data. However, preparing the data is a non-trivial refinement task that requires technical expertise in data cleaning methods and algorithms; consequently, choosing a suitable data analysis technique is a headache for inexperienced users. To address these problems, we propose a case-based reasoning (CBR) system to recommend data cleaning algorithms for classification and regression tasks. In our approach, the problem space is represented by the meta-features of the dataset, its attributes, and the target variable, while the solution space contains the data cleaning algorithms used for each dataset. Cases are represented through a Data Cleaning Ontology. The case retrieval mechanism comprises a filtering phase and a similarity phase. In the first phase, we define two filter approaches, based on clustering and quartile analysis, which retrieve a reduced number of relevant cases. The second phase ranks the retrieved cases by scoring the similarity between the new case and each retrieved case. The proposed retrieval mechanism was evaluated by a panel of judges, who scored the similarity between a query case and all cases in the case base (ground truth).
The retrieval mechanism reaches an average precision, with respect to the judges' ranking, of 94.5% in the top 3, 84.55% in the top 7, and 78.35% in the top 10.

The authors are grateful to the research groups Control Learning Systems Optimization Group (CAOS) of the Carlos III University of Madrid and Telematics Engineering Group (GIT) of the University of Cauca for their technical support, and to COLCIENCIAS for the PhD scholarship granted to David Camilo Corrales. This work has also been supported by the project "Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT", financed by Convocatoria 04C-2018 Banco de Proyectos Conjuntos UEES-Sostenibilidad of the project "Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca", ID-3848, and by the Spanish Ministry of Economy, Industry and Competitiveness (projects TRA2015-63708-R and TRA2016-78886-C3-1-R).
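The two-phase retrieval described in the abstract (a quartile-based filter followed by similarity ranking over dataset meta-features) can be illustrated with a minimal sketch. All meta-feature names, values, and cleaning algorithms below are hypothetical, invented for illustration; they are not taken from the paper.

```python
import math

# Each case: hypothetical dataset meta-features plus the cleaning
# algorithms applied to that dataset (the "solution" part of the case).
case_base = [
    {"n_rows": 1000, "n_attrs": 12, "missing_ratio": 0.10, "solution": ["imputation"]},
    {"n_rows": 50000, "n_attrs": 40, "missing_ratio": 0.02, "solution": ["outlier_removal"]},
    {"n_rows": 1200, "n_attrs": 10, "missing_ratio": 0.12, "solution": ["imputation", "dedup"]},
    {"n_rows": 800, "n_attrs": 15, "missing_ratio": 0.09, "solution": ["imputation"]},
]

FEATURES = ["n_rows", "n_attrs", "missing_ratio"]

def quartile_filter(query, cases, feature="n_rows"):
    """Phase 1: keep cases whose `feature` falls in the same quartile band as the query."""
    values = sorted(c[feature] for c in cases)
    q1 = values[len(values) // 4]
    q3 = values[(3 * len(values)) // 4]
    def band(v):
        return 0 if v <= q1 else (2 if v > q3 else 1)
    return [c for c in cases if band(c[feature]) == band(query[feature])]

def similarity(query, case):
    """Phase 2: inverse Euclidean distance over raw meta-features
    (a real system would normalise the features first)."""
    d = math.sqrt(sum((query[f] - case[f]) ** 2 for f in FEATURES))
    return 1.0 / (1.0 + d)

def retrieve(query, cases, top_k=3):
    candidates = quartile_filter(query, cases)
    return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)[:top_k]

query = {"n_rows": 950, "n_attrs": 11, "missing_ratio": 0.11}
best = retrieve(query, case_base)
print(best[0]["solution"])  # → ['imputation']
```

The filter cheaply discards most of the case base before the more expensive pairwise similarity scoring, which is the point of the two-phase design.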
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, and even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the overall process of knowledge discovery in environmental systems.
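The kinds of data-quality problems the abstract lists (noise, uncertainty, redundancy) can be sketched with a toy pre-processing step. The sensor readings, field names, and validity thresholds below are hypothetical, chosen only to illustrate each problem class.

```python
# Hypothetical environmental sensor readings, one per station.
raw_readings = [
    {"station": "A", "ph": 7.1},
    {"station": "A", "ph": 7.1},   # exact duplicate (redundancy)
    {"station": "B", "ph": None},  # missing measurement (uncertainty)
    {"station": "C", "ph": 42.0},  # physically impossible pH (noise/error)
    {"station": "D", "ph": 6.8},
]

def preprocess(readings, lo=0.0, hi=14.0):
    """Drop duplicates, missing values, and out-of-range measurements."""
    seen, clean = set(), []
    for r in readings:
        key = (r["station"], r["ph"])
        if key in seen:
            continue               # redundancy
        seen.add(key)
        if r["ph"] is None:
            continue               # missing data
        if not (lo <= r["ph"] <= hi):
            continue               # out-of-range noise
        clean.append(r)
    return clean

clean = preprocess(raw_readings)
print([r["station"] for r in clean])  # → ['A', 'D']
```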
The Virtues of Redundancy in Legal Thought
Redundancy has a bad reputation among legal intellectuals. When someone says, for example, that the Ninth and Tenth Amendments are redundant, we can be pretty sure that this person attaches little importance to these constitutional provisions. Listen to one of the definitions of redundant provided by the Oxford English Dictionary: 'superabundant, superfluous, excessive.' In this essay, the author proposes that legal theorists pay serious attention to the concept of redundancy used by engineers. He explains how redundancy, in this special sense, is essential to any intellectual enterprise in which we try to reach action-guiding conclusions, including the enterprise of law. The author describes the virtues of redundancy in legal thought.
An architecture for certification-aware service discovery
Service-orientation is an emerging paradigm for building complex systems from loosely coupled components deployed and consumed over the network. Despite the original intent of the paradigm, its current instantiations are limited to a single trust domain (e.g., a single organization). Moreover, some of the key promises of service-orientation, such as the dynamic orchestration of externally provided software services using runtime service discovery and deployment, remain unachieved. One of the main reasons for this is the trust gap that arises when software services offered by previously unknown providers are to be selected at run-time, without any human intervention. To close this gap, the concept of machine-readable security certificates (called asserts) has recently been introduced, which paves the way for automated reasoning about the security properties of services. As in current security certification schemes, the assessment of the security properties of a service is delegated to an independent third party (a certification authority), which issues a corresponding assert bound to the service. In this paper, we propose an architecture that exploits the assert concept to realise a certification-aware service discovery framework. The architecture supports the discovery of single services based on certified security properties (in addition to the usual functional properties), as well as the dynamic synthesis of service compositions that satisfy given security properties. The architecture is extensible, allowing for a range of domain-specific matchmaking components covering dimensions such as performance, cost, and other non-functional characteristics.
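The core matchmaking idea, selecting services whose certified security properties cover a requester's requirements, can be sketched as a simple set-cover check. The service names and property labels below are invented for illustration; a real assert would be a signed, structured certificate, not a plain set of strings.

```python
# Toy catalogue: each service advertises the security properties an
# independent certification authority has certified for it (hypothetical).
services = {
    "payment-v1": {"confidentiality", "integrity"},
    "payment-v2": {"confidentiality", "integrity", "non-repudiation"},
    "lookup":     {"integrity"},
}

def discover(required, catalogue):
    """Return services whose certified properties cover all requirements."""
    return sorted(name for name, props in catalogue.items()
                  if required <= props)  # subset test: requirements covered

print(discover({"confidentiality", "integrity"}, services))
# → ['payment-v1', 'payment-v2']
```

In the architecture described above, such a property check would be one matchmaking component among others (functional matching, performance, cost), and the properties would be verified against the issuing certification authority rather than trusted as declared.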
A Semantic Grid Oriented to E-Tourism
With the increasing complexity of tourism business models and tasks, there is a clear need for a next-generation e-Tourism infrastructure that supports flexible automation, integration, computation, storage, and collaboration. Several enabling technologies, such as the semantic Web, Web services, agents, and grid computing, have been applied in different e-Tourism applications, but there is no unified framework able to integrate all of them. This paper therefore presents a promising e-Tourism framework based on the emerging semantic grid, in which a number of key design issues are discussed, including architecture, ontology structure, semantic reconciliation, service and resource discovery, role-based authorization, and intelligent agents. The paper finally provides the implementation of the framework.

Comment: 12 pages, 7 figures
Towards improving web service repositories through semantic web techniques
The success of Web services technology has brought topics such as software reuse and discovery once again onto the agenda of software engineers. While there are several efforts towards automating Web service discovery and composition, many developers still search for services via online Web service repositories and then combine them manually. However, our analysis of these repositories shows that, unlike traditional software libraries, they rely on little metadata to support service discovery. We believe that the major cause is the difficulty of automatically deriving metadata that would describe rapidly changing Web service collections. In this paper, we discuss the major shortcomings of state-of-the-art Web service repositories and, as a solution, report on ongoing work and ideas on how to use techniques developed in the context of the Semantic Web (ontology learning, mapping, metadata-based presentation) to improve the current situation.
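One of the ideas mentioned, automatically deriving metadata for rapidly changing service collections, can be sketched as a trivial vocabulary-matching tagger. The vocabulary, service names, and descriptions below are invented for illustration; real ontology learning would go well beyond keyword lookup.

```python
# Hypothetical controlled vocabulary of service-domain terms.
VOCABULARY = {"weather", "forecast", "currency", "conversion", "geocoding"}

def derive_tags(description):
    """Tag a free-text service description with vocabulary terms it mentions."""
    words = {w.strip(".,").lower() for w in description.split()}
    return sorted(words & VOCABULARY)

repo = {
    "WeatherService": "Returns a five-day weather forecast for a city.",
    "FxService": "Currency conversion at daily reference rates.",
}
metadata = {name: derive_tags(desc) for name, desc in repo.items()}
print(metadata["WeatherService"])  # → ['forecast', 'weather']
```

Even metadata this shallow would let a repository support faceted browsing or tag-based search; mapping the derived terms onto an ontology is the harder step the paper's Semantic Web techniques address.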
Knowledge data discovery and data mining in a design environment
Designers, in the process of satisfying design requirements, generally encounter difficulties in, firstly, understanding the problem and, secondly, finding a solution [Cross 1998]. Often the processes of understanding the problem and developing a feasible solution proceed simultaneously: a solution is proposed in order to gauge the extent to which it satisfies the specific requirements. It has long been recognised that support for future design activities exists in the form of past design cases; however, the varying degrees of similarity and dissimilarity between previous and current design requirements and solutions have restrained the effectiveness of utilising past design solutions. The knowledge embedded within past designs provides a source of experience with the potential to be utilised in future developments, provided that the ability to structure and manipulate that knowledge can be made a reality. The ability to manipulate past design knowledge allows the varying viewpoints a designer experiences during a design process to be reflected and supported. Data Mining systems are gaining acceptance in several domains but to date remain largely unrecognised in terms of their potential to support design activities. The focus of this paper is to introduce the functionality of Data Mining tools and to evaluate the level of support they may achieve in manipulating and utilising experiential knowledge to satisfy designers' varying perspectives throughout a product's development.
Data mining as a tool for environmental scientists
Over recent years, a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than classical modelling approaches and could usefully be applied to data-rich environmental problems. Certain techniques, such as Artificial Neural Networks, clustering, Case-Based Reasoning, and, more recently, Bayesian Decision Networks, have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.
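Association rule extraction, one of the under-used techniques the abstract names, reduces to computing support and confidence over itemsets. The transactions below are hypothetical co-occurring conditions in environmental records, invented purely to make the arithmetic concrete.

```python
# Hypothetical transactions: conditions observed together in water samples.
transactions = [
    {"low_ph", "high_turbidity"},
    {"low_ph", "high_turbidity", "algae"},
    {"low_ph"},
    {"high_turbidity", "algae"},
]

def support(itemset, data):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in data) / len(data)

def confidence(antecedent, consequent, data):
    """P(consequent | antecedent): joint support over antecedent support."""
    return support(antecedent | consequent, data) / support(antecedent, data)

# Candidate rule: low_ph → high_turbidity
print(support({"low_ph", "high_turbidity"}, transactions))  # → 0.5
print(confidence({"low_ph"}, {"high_turbidity"}, transactions))
```

Algorithms such as Apriori add efficient candidate generation on top of exactly these two measures, pruning itemsets whose support falls below a threshold.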