9 research outputs found

    EverMiner - towards Fully Automated KDD Process

    Practical Aspects of Data Mining Using LISp-Miner

    The paper describes some practical aspects of using LISp-Miner for data mining. LISp-Miner is a software tool under development at the University of Economics, Prague. We review the different types of knowledge patterns the system discovers and discuss their applicability to various data mining tasks. We also compare LISp-Miner 18.16 with Weka 3.6.9 and RapidMiner 5.3.

    Spark solutions for discovering fuzzy association rules in Big Data

    The computational cost of mining fuzzy association rules grows significantly with very large data sets, in many cases triggering a memory overflow error that makes the experiment fail before it completes. In such cases, Big Data techniques can help the experiment run to completion. This paper therefore proposes several Spark algorithms for handling massive fuzzy data and discovering interesting association rules. The algorithms rely on a decomposition of interestingness measures in terms of α-cuts, and we show experimentally that considering only 10 equidistributed α-cuts is sufficient to mine all significant fuzzy association rules. All proposals are compared and analysed in terms of efficiency and speed-up on several datasets, including a real dataset of sensor measurements from an office building. The research reported in this paper was partially supported by the COPKIT project of the Horizon 2020 (8th Framework Programme) research and innovation programme (grant agreement No 786687) and by the BIGDATAMED projects with references B-TIC-145-UGR18 and P18-RT-2947.
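    One reason an α-cut decomposition suits Spark is that each α-level reduces to a crisp count, and crisp counts computed on different partitions simply add up. The sketch below illustrates this in plain Python under stated assumptions, not as the paper's algorithm: (1) the 10 equidistributed α-levels are 1.0, 0.9, ..., 0.1; (2) an itemset's membership in a transaction is the minimum of its items' memberships; (3) fuzzy support is the sum of the crisp supports at each α-level, weighted by the gap to the next level. The function names and toy data are hypothetical, the list of partitions only stands in for a Spark RDD, and the paper's exact measures may differ.

```python
from functools import reduce

# Assumption: 10 equidistributed alpha levels 1.0, 0.9, ..., 0.1
ALPHAS = [k / 10 for k in range(10, 0, -1)]


def itemset_membership(transaction, itemset):
    """Degree to which a transaction contains a fuzzy itemset (min t-norm, an assumption)."""
    return min(transaction.get(item, 0.0) for item in itemset)


def partition_counts(partition, itemset):
    """Map step: for each alpha level, count transactions whose membership reaches it."""
    counts = [0] * len(ALPHAS)
    for transaction in partition:
        mu = itemset_membership(transaction, itemset)
        for i, alpha in enumerate(ALPHAS):
            if mu >= alpha:
                counts[i] += 1
    return counts


def fuzzy_support(partitions, itemset):
    """Combine per-partition crisp counts, then weight each alpha-cut's crisp support."""
    per_partition = [partition_counts(p, itemset) for p in partitions]
    # Reduce step: element-wise sum of the per-partition count vectors
    totals = reduce(lambda a, b: [x + y for x, y in zip(a, b)], per_partition)
    n = sum(len(p) for p in partitions)
    levels = ALPHAS + [0.0]
    # Each crisp support is weighted by the gap between its alpha level and the next one
    return sum(
        (levels[i] - levels[i + 1]) * totals[i] / n for i in range(len(ALPHAS))
    )


# Toy data (not from the paper): two "partitions" of fuzzy transactions
partitions = [
    [{"temp_high": 0.8, "humidity_low": 0.5}, {"temp_high": 1.0, "humidity_low": 1.0}],
    [{"temp_high": 0.3, "humidity_low": 0.9}],
]
print(round(fuzzy_support(partitions, ("temp_high", "humidity_low")), 3))
```

    With 10 equal-width levels, this weighted sum equals the mean itemset membership after rounding each membership down to the nearest tenth (here (0.5 + 1.0 + 0.3) / 3 = 0.6), so it approximates the usual sigma-count support while only ever counting crisp events, which is what lets partitions be processed independently.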

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its wide public popularity create a need for a comprehensive text on the subject. The book series entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond a thorough treatment of each section, the two books offer useful hints and strategies for solving the problems presented in the following chapters. The contributing authors highlight many future research directions that should foster multidisciplinary collaboration and hence lead to significant developments in the field of data mining.

    A new strategy for case-based reasoning retrieval using classification based on association

    Case-Based Reasoning (CBR) is an important area of research in Artificial Intelligence. It aims to solve new problems by adapting the solutions that were used to solve similar previous ones. Of its four typical phases (retrieve, reuse, revise, and retain), retrieval is key, because retrieving the wrong cases can lead to wrong decisions. To perform retrieval, a CBR system uses Similarity-Based Retrieval (SBR). However, SBR depends strongly on similarity knowledge and ignores other forms of knowledge that could further improve retrieval performance. The aim of this study is to integrate class association rules (CARs), a special case of association rules (ARs) whose mined rule set can form an accurate classifier over a database. CARs are an efficient way to build a classifier when the target class is predetermined. The research question is whether CARs can be integrated into a CBR system. A new strategy is proposed that mines class association rules from previous cases in order to strengthen SBR. The question can be answered by adapting the pattern of CARs so that it can be compared with the output of the retrieval phase. Previous experiments and their results to date show a link between CARs and CBR cases, and this link has been developed to achieve the aims and objectives of the study. A novel strategy, Case-Based Reasoning using Association Rules (CBRAR), is proposed to improve the performance of SBR and to disambiguate wrongly retrieved cases in CBR. CBRAR uses CARs to generate an optimum frequent pattern tree (FP-tree) that holds a value at each node. The potential advantage is more efficient results when SBR returns uncertain answers. CBRAR has been evaluated using two CBR frameworks, jCOLIBRI and FreeCBR.
Experimental evaluation on real datasets indicates that CBRAR is a better approach than conventional CBR systems, offering higher accuracy and a lower error rate.
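    The abstract combines two ingredients, CAR mining and similarity-based retrieval. The following is a minimal, hypothetical illustration of how they can interact; it is not CBRAR itself, which builds an FP-tree. It mines CARs by brute-force enumeration with support and confidence thresholds, retrieves the top-k cases by Jaccard similarity, and consults the strongest matching CAR only when the retrieved cases disagree on the class. The names `mine_cars` and `retrieve`, the Jaccard measure, the thresholds, and the toy cases are all assumptions.

```python
from itertools import combinations

# A case is (frozenset of "attribute=value" items, class label).


def mine_cars(cases, min_support=0.2, min_confidence=0.6, max_len=2):
    """Brute-force class association rule mining: antecedent itemset -> class label."""
    n = len(cases)
    items = sorted({item for features, _ in cases for item in features})
    labels = sorted({label for _, label in cases})
    rules = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            antecedent = frozenset(combo)
            # Class labels of the cases that contain the whole antecedent
            covered = [label for features, label in cases if antecedent <= features]
            if not covered:
                continue
            for label in labels:
                hits = covered.count(label)
                support = hits / n
                confidence = hits / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, label, support, confidence))
    # Strongest rules first: higher confidence, then higher support
    rules.sort(key=lambda rule: (-rule[3], -rule[2]))
    return rules


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, cases, rules, k=3):
    """Similarity-based retrieval; CARs decide only when the top-k cases disagree."""
    ranked = sorted(cases, key=lambda case: jaccard(query, case[0]), reverse=True)
    top = ranked[:k]
    candidate_labels = {label for _, label in top}
    if len(candidate_labels) == 1:
        return top[0][1]
    for antecedent, label, _, _ in rules:
        if antecedent <= query and label in candidate_labels:
            return label
    return top[0][1]  # no applicable rule: keep the most similar case's label


# Toy case base (not from the study)
cases = [
    (frozenset({"fever=yes", "cough=yes"}), "flu"),
    (frozenset({"fever=yes", "cough=no"}), "flu"),
    (frozenset({"fever=no", "cough=yes"}), "cold"),
    (frozenset({"fever=no", "cough=no"}), "healthy"),
    (frozenset({"fever=yes", "cough=yes"}), "flu"),
]
rules = mine_cars(cases)
print(retrieve(frozenset({"fever=yes", "cough=no"}), cases, rules))
```

    Restricting the rule to the labels already among the top-k cases keeps SBR in charge and uses the CARs only to disambiguate, which mirrors the role the abstract gives them when SBR returns uncertain answers.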

    The exposure models required for purchasing advertising in the television sector (Os modelos de exposição necessários à aquisição de publicidade no sector televisivo)

    Television advertising investment depends on the development of ratings models or on the identification of alternative methodological approaches for predicting television exposure. In this study, we evaluate the contribution of Symbolic Analysis and Data Mining to the construction of quantitative exposure models that support media planning. Audience measurement (ratings) databases contain information with considerable explanatory power over the evolution of ratings, reaching up to 90%. However, the predictive potential of univariate and multivariate linear and non-linear regression analyses is considerably lower, at most in the 70% to 80% range. We tested several methodologies: neural networks (MLP and RBF), regression trees (CART and CHAID), instance-based learning (IBL), segmentation and clustering of time series, and local regression models. Building explanatory models of "structural" television viewing behaviour showed that the panel exhibits low to moderate audience duplication, that all loyalty behaviours are present, and that there is some tendency towards audience specialisation.
The development of a structural explanatory model of television exposure reveals the multiple contexts of intentional and unintentional exposure and supports an alternative approach to building exposure models, using symbolic methodologies, sequential, temporal, and multi-relational data mining, and Bayesian and non-linear regression algorithms. This approach applies where ratings data are most irregular or when new content is broadcast. For audience segments with strongly irregular exposure, we propose building association and sequential rules to identify the most suitable slots for broadcasting the advertising message, followed by Bayesian networks and multi-relational classification rules to reduce the uncertainty of the results over a given period. Where viewers have established viewing habits, sequential data mining, binomial logistic models, or Bayes classification may suffice. For broadcasts of sports events, where building causal models is very difficult, temporal rules should be used to identify relevant information in the multivariate ratings time series, enabling better negotiation with the TV stations.