14 research outputs found

    KDD, SEMMA and CRISP-DM: a parallel overview

    In recent years the Data Mining field has grown and consolidated considerably, and several efforts seek to establish standards in the area. Among these are SEMMA and CRISP-DM. Both have grown into industry standards and define a sequence of steps intended to guide the implementation of data mining applications. This raises the question of whether they differ substantially from the traditional KDD process. This paper draws a parallel between these two standards and the KDD process and examines the similarities among them.

    An architecture for an effective usage of data mining in business intelligence systems

    Business Intelligence (BI) is an emerging area of the Decision Support Systems (DSS) discipline, and it has evolved considerably in recent years. Over the same period, the Data Mining (DM) field has also grown and consolidated. DM is being used successfully in BI systems, but a true integration of DM with BI is still lacking, so some BI systems do not use DM effectively. This paper presents an architecture intended to lead to an effective usage of DM in BI.

    Ontology of core data mining entities

    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for describing the mining of structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications and use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices of ontology engineering, is fully interoperable with many domain resources and is easy to extend.

    Learning with configurable operators and RL-based heuristics

    In this paper, we push forward the idea of machine learning systems for which the operators can be modified and fine-tuned for each problem. This allows us to propose a learning paradigm where users can write (or adapt) their operators according to the problem, the data representation and the way the information should be navigated. To achieve this goal, data instances, background knowledge, rules, programs and operators are all written in the same functional language, Erlang. Since changing operators affects how the search space needs to be explored, heuristics are learnt as the result of a decision process based on reinforcement learning, where each action is defined as a choice of operator and rule. As a result, the architecture can be seen as a 'system for writing machine learning systems' or as a way to explore new operators.
    This work was supported by the MEC projects CONSOLIDER-INGENIO 26706 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, and the REFRAME project granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economía y Competitividad in Spain. F. Martínez-Plumed is also supported by FPI-ME grant BES-2011-045099.
    Martínez Plumed, F.; Ferri Ramírez, C.; Hernández Orallo, J.; Ramírez Quintana, MJ. (2013). Learning with configurable operators and RL-based heuristics. In New Frontiers in Mining Complex Patterns. Springer Verlag (Germany). 7765:1-16. https://doi.org/10.1007/978-3-642-37382-4_1
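
    The idea of treating each action as a choice of operator and rule can be illustrated with a small tabular Q-learning sketch. This is not the authors' system (which is written in Erlang and uses its own state description); the operator names, rule names, single state "s0" and toy reward below are all hypothetical, chosen only to show the shape of the decision process.

```python
import random
from collections import defaultdict


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice among (operator, rule) pairs."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update for the chosen (operator, rule) pair."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Hypothetical operators and rules; an action is one (operator, rule) pair.
operators = ["generalise", "specialise"]
rules = ["r1", "r2"]
actions = [(op, r) for op in operators for r in rules]


def toy_reward(action):
    """Assumed reward: only one operator/rule combination is useful."""
    return 1.0 if action == ("generalise", "r2") else 0.0


rng = random.Random(0)
q = defaultdict(float)  # Q-values default to 0.0
state = "s0"            # single abstract state, for illustration only
for _ in range(200):
    action = choose_action(q, state, actions, epsilon=0.2, rng=rng)
    q_update(q, state, action, toy_reward(action), state, actions)

best = max(actions, key=lambda a: q[(state, a)])
print("preferred (operator, rule):", best)
```

    In a realistic setting the state would summarise the current rules and programs, and the reward would reflect how much the applied operator improves them; both are left abstract here.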

    SEMMA and CRISP-DM: a parallel overview

    In recent years the Data Mining field has grown and consolidated considerably, and several efforts seek to establish standards in the area. Among these are SEMMA and CRISP-DM. Both have grown into industry standards and define a sequence of steps intended to guide the implementation of data mining applications. This raises the question of whether they differ substantially from the traditional KDD process. This paper draws a parallel between these two standards and the KDD process and examines the similarities among them.

    Supporting scientific knowledge discovery with extended, generalized Formal Concept Analysis

    In this paper we fuse Wille's Landscapes of Knowledge with Exploratory Data Analysis by leveraging Formal Concept Analysis (FCA) to support data-induced scientific enquiry and discovery. We extend FCA in two ways: first, by allowing K-valued entries in the incidence to accommodate non-binary types of data, and second, by using different modes of creating formal concepts to accommodate diverse conceptualizing phenomena. With these extensions we demonstrate the versatility of the Landscapes of Knowledge metaphor in helping to create new scientific and engineering knowledge, providing several successful use cases of our techniques that support scientific hypothesis-making and discovery in a range of domains: semiring theory, perceptual studies, natural language semantics, and gene expression data analysis. While doing so, we also capture the affordances that justify the use of FCA and its extensions in scientific discovery.
    FJVA and AP were partially supported by EU FP7 project LiMoSINe (contract 288024) for this research. CPM was partially supported by the Spanish Ministry of Economics and Competitiveness projects TEC2014-61729-EXP and TEC2014-53390-P.
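
    For readers unfamiliar with FCA, the following sketch enumerates the formal concepts of a small binary context by brute force. It covers only standard binary FCA, not the K-valued extension described in the abstract, and the example context (objects and attributes) is invented for illustration.

```python
from itertools import combinations


def derive_objects(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def derive_attributes(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset.

    Exponential in the number of attributes, so only suitable for tiny contexts.
    """
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), r):
            extent = derive_objects(context, frozenset(subset))
            intent = derive_attributes(context, extent, all_attrs)
            concepts.add((extent, intent))
    return concepts


# Hypothetical binary context: object -> set of attributes it has.
context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "trout": {"swims"},
}

concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

    Each printed pair is a formal concept: a set of objects together with exactly the attributes they all share, such that no further object has all of those attributes.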

    Data mining languages for business intelligence

    Doctoral thesis in Information Systems and Technologies (area of Engineering and Management Information Systems).
    Since Luhn first used the term Business Intelligence (BI) in 1958, major transformations have taken place in the field of information systems and technologies, especially in the area of decision support systems. Nowadays, BI systems are widely used in organizations and their strategic importance is clearly recognized. These systems are essential for a complete knowledge of the business and are an irreplaceable tool in supporting decision making. The dissemination of data mining (DM) tools is increasing in the BI field, as is the recognition of the relevance of their use in enterprise BI systems. BI tools are user-friendly, iterative and interactive, giving business users easy access. The user can thus manipulate data directly and extract all the business value contained in it. One of the problems noted in the use of DM in BI is that DM models are generally too complex to be manipulated directly by business users, unlike other BI tools. In this context, the absence of BI tools that allow business users to manipulate DM models directly was identified as the research problem, since business users who cannot manipulate DM models directly may be unable to extract all the potential value those models contain. This is particularly relevant in a business world where competition grows stronger every day and where knowledge of the business, of the variables involved and of the possible scenarios plays a fundamental role in allowing organizations to compete in an extremely demanding market.
    Considering that most BI systems are built on top of operational systems that mainly use the relational database model, the research was inspired by the concepts related to this model and its associated languages, in particular Query-By-Example (QBE) languages. These languages are widely used by business users in business environments; they have a strong interactive component, are user-friendly and allow for iteration. Efforts are under way to create standards and rules in the field of DM, with great relevance given to the subject of inductive databases and, within that context, to the so-called DM languages. These concepts were also an inspiration for this research. Despite their importance, DM languages are not oriented to business users in business environments. Linking concepts related to QBE languages and to DM languages, a new DM language for BI, named Query-Models-By-Example (QMBE), was conceived and implemented. This new language is by nature user-friendly, iterative and interactive; it has the same characteristics as the usual BI tools, allowing business users to manipulate DM models directly and thereby access the potential value of these models, with all the advantages that may follow. The language was implemented, tested and conceptually evaluated using a BI system prototype, and it was verified to possess the desired properties, namely being user-friendly, iterative and interactive. The language was later evaluated by business users who were already experienced in using DM within the context of BI. According to these users, using the language has advantages over the traditional use of DM within BI.
