20 research outputs found

    An Overview of Issues in Developing Industrial Data Mining and Knowledge Discovery Applications

    This paper surveys the growing number of industrial applications of data mining and knowledge discovery. We look at the existing tools, describe some representative applications, and discuss the major issues and problems for building and deploying successful applications and their adoption by business users. Finally, we examine how to assess the potential of a knowledge discovery application.

    Using a Neuro-Fuzzy-Genetic Data Mining Architecture to Determine a Marketing Strategy in a Charitable Organization's Donor Database

    This paper describes the use of a neuro-fuzzy-genetic data mining architecture for finding hidden knowledge and modeling the data of the 1997 donation campaign of an American charitable organization. This data was used during the 1998 KDD Cup competition. In the architecture, all input variables are first preprocessed and all continuous variables are fuzzified. Principal component analysis (PCA) is then applied to reduce the dimensions of the input variables by finding combinations of variables, or factors, that describe major trends in the data. The reduced dimensions of the input variables are then used to train probabilistic neural networks (PNN) to classify the dataset according to the groups considered. A rule extraction technique is then applied to extract hidden knowledge from the trained neural networks and represent it in the form of crisp and fuzzy if-then rules. In the final stage a genetic algorithm is used as a rule-pruning module to eliminate weak rules that are still in the rule base while ensuring that the classification accuracy of the rule base is improved or unchanged. The pruned rule base helps the charitable organization to maximize donations and to understand the characteristics of the respondents of the direct mail fundraising campaign.
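
The fuzzification step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which membership functions the authors used, so the triangular shape, the three set names, and the breakpoints below are all assumptions chosen for illustration.

```python
def triangular(x, left, peak, right):
    """Degree (0..1) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for a variable scaled to 0..100 (not from the paper).
FUZZY_SETS = {
    "low": (-50.0, 0.0, 50.0),
    "medium": (0.0, 50.0, 100.0),
    "high": (50.0, 100.0, 150.0),
}


def fuzzify(x, sets=FUZZY_SETS):
    """Map one crisp value to a membership degree per fuzzy set."""
    return {name: triangular(x, *bounds) for name, bounds in sets.items()}


print(fuzzify(25.0))
```

Each continuous input becomes a small vector of membership degrees, which is the kind of representation a later dimensionality-reduction or classification stage could consume.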

    Data Mining for Browsing Patterns in Weblog Data by Art Neural Networks

    Categorising visitors based on their interaction with a website is a key problem in Web content usage. The clickstreams generated by various users often follow distinct patterns, the knowledge of which may help in providing customised content. This paper proposes an approach to clustering weblog data, based on ART2 neural networks. Due to the characteristics of the ART2 neural network model, the proposed approach can be used for unsupervised and self-learning data mining, which makes it adaptable to dynamically changing websites.

    Relationship of data analytics in the productivity and growth of organizations

    More and more organizations worldwide understand the importance of effectively managing the internal and external information that is produced around the value chain of their operations. However, this discipline still faces many challenges in Latin America, as in the case of Colombia, where it has been applied more in its descriptive than in its predictive form, so organizations fail to take advantage of methods and techniques that would allow them to achieve a competitive advantage. The general objective of this analysis is to establish the relationship between data analytics and the productivity and growth of organizations. To this end, a qualitative study was carried out under an exhaustive documentary design, with references from recognized databases, which made it possible not only to define the benefits of implementing data analytics in any type of company or organization, but also to visualize the future scenario presented by digital transformation and technological advances for the development of the global economy. The study makes clear that the relationship of data analytics to the productivity and growth of organizations lies in its capacity to predict or anticipate changes in world economies, adapting easily and efficiently to new processes, most of which end up digitized on a 4.0 platform. In this way organizations capitalize on their greatest resource, "human talent", the people who must manage data effectively and thereby support effective decision making. Universidad Libre Seccional Cúcuta - Facultad de Ingenierías -- Ingeniería en Tecnología de la Información y las Comunicaciones.

    Predictive Modeling of Fuel Efficiency of Trucks

    This research studied the behavior of several controllable variables that affect the fuel efficiency of trucks. Re-routing is the process of modifying the parameters of the routes for a set of trips to optimize fuel consumption and to increase customer satisfaction through efficient deliveries. It is an important process undertaken by a food distribution company to adapt trips to immediate necessities. A predictive model was developed to calculate the change in miles per gallon (MPG) whenever a re-route is performed on a region of a particular distribution area. The data came from the Dallas center, one of the distribution centers owned by the company. A consistent model that could provide relatively accurate predictions across five distribution centers had to be developed. It was found that the model built using the data from the Corporate center was the most consistent one. The data used to build the model covered May 2013 through December 2013. About 88% of the model's predictions on this data fell within the 0-10% error group, an improvement over the 43% obtained with the linear regression and K-means clustering models. The model was also validated on the data for January 2014 through the first two weeks of March 2014, where about 81% of its predictions fell within the 0-10% error group. The average overall error was around 10%, the lowest among the approaches explored in this research. Weight, stop count and stop time were identified as the most significant factors influencing the fuel efficiency of the trucks. Further, a neural network architecture was built to improve the predictions of the MPG. The model can be used to predict the average change in MPG for a set of trips whenever a re-route is performed. Since the aim of re-routing is to reduce miles and trips, extra load will be added to the remaining trips. Although the MPG would decrease because of this extra load, the decrease would be offset by the savings due to the drop in miles and trips. The net fuel savings can then be translated into the amount of money saved.
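
Two calculations in this abstract can be made concrete with a short sketch: the share of predictions inside an error group, and the net fuel effect of a re-route. The abstract does not give its exact error formula, so absolute percentage error is assumed here, and every number below is made up for illustration rather than taken from the study.

```python
def share_within_error(actual, predicted, max_pct_error=10.0):
    """Fraction of predictions whose absolute percentage error is at most max_pct_error."""
    hits = sum(
        1
        for a, p in zip(actual, predicted)
        if abs(p - a) / abs(a) * 100.0 <= max_pct_error
    )
    return hits / len(actual)


def fuel_used(miles, mpg):
    """Gallons of fuel needed to drive `miles` at a fuel efficiency of `mpg`."""
    return miles / mpg


# Hypothetical MPG values for four trips (not data from the study).
actual_mpg = [6.0, 7.0, 8.0, 5.0]
predicted_mpg = [6.3, 7.0, 9.0, 5.4]
print(share_within_error(actual_mpg, predicted_mpg))

# Hypothetical re-route: fewer miles, but heavier loads lower the MPG.
gallons_before = fuel_used(miles=1000.0, mpg=6.0)
gallons_after = fuel_used(miles=900.0, mpg=5.8)
print(round(gallons_before - gallons_after, 2))
```

In this made-up case the drop in miles outweighs the drop in MPG, so the re-route still saves fuel, which is the trade-off the abstract describes.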

    Application of knowledge discovery in databases: automating manual tasks

    Businesses have large volumes of data stored in databases and data warehouses that are beyond the scope of traditional analysis methods. Knowledge discovery in databases (KDD) has been applied to gain insight from this business data. In this study, I investigated the application of KDD to automate two manual tasks in a Finnish company that provides financial automation solutions. The objective of the study was to develop models from historical data and use the models to handle future transactions to minimize or omit the manual tasks. Historical data about the manual tasks was extracted from the database. The data was prepared and three machine learning methods were used to develop classification models from the data: decision tree, Naïve Bayes, and k-nearest neighbor. The developed models were evaluated on test data, based on accuracy and prediction rate. Overall, decision tree had the highest accuracy while k-nearest neighbor had the highest prediction rate. However, there were significant differences in performance across datasets. Overall, the results show that there are patterns in the data that can be used to automate the manual tasks. Due to time constraints, data preparation was not done thoroughly. In future iterations, better data preparation could lead to better results. Moreover, further study is required to determine the effect of transaction type on modeling. It can be concluded that knowledge discovery methods and tools can be used to automate the manual tasks.
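
The evaluation described in this abstract can be sketched with a tiny k-nearest-neighbor classifier. The abstract does not define "prediction rate", so the sketch assumes it means the share of test items for which the model is confident enough to answer, with the rest left to manual handling. The `min_agreement` threshold, the labels, and the one-dimensional data are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3, min_agreement=0.0):
    """Majority label among the k nearest training rows, or None to abstain."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    label, votes = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label if votes / k >= min_agreement else None


def evaluate(train, test, k=3, min_agreement=0.0):
    """Return (accuracy on answered items, share of items answered)."""
    preds = [(knn_predict(train, x, k, min_agreement), y) for x, y in test]
    answered = [(p, y) for p, y in preds if p is not None]
    prediction_rate = len(answered) / len(test)
    accuracy = sum(p == y for p, y in answered) / len(answered) if answered else 0.0
    return accuracy, prediction_rate


# Hypothetical one-feature transactions labelled with the handling they received.
train = [((1.0,), "auto"), ((2.0,), "auto"), ((3.0,), "auto"),
         ((7.0,), "manual"), ((8.0,), "manual"), ((9.0,), "manual")]
test = [((1.5,), "auto"), ((8.5,), "manual"), ((5.4,), "auto")]

print(evaluate(train, test))                     # answer every item
print(evaluate(train, test, min_agreement=1.0))  # answer only unanimous votes
```

Under this assumed definition, raising the threshold trades prediction rate for accuracy, which is one way two models could rank differently on the two measures.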

    A Proposal for the Protection of Digital Databases in Sri Lanka

    Economic development in Sri Lanka has relied heavily on foreign and domestic investment. Digital databases are a new and attractive area for this investment. This thesis argues that investment needs protection and that this is crucial to attract future investment. The thesis therefore proposes a digital database protection mechanism with a view to attracting investment in digital databases to Sri Lanka. The research examines various existing protection measures whilst mainly focusing on the sui generis right protection, which confirms the protection of qualitative and/or quantitative substantial investment in the obtaining, verification or presentation of the contents of digital databases. In digital databases, this process is carried out by computer programs which establish meaningful and useful data patterns through their data mining process, and subsequently use those patterns in Knowledge Discovery within database processes. Those processes enhance the value and/or usefulness of the data/information. Computer programs need to be protected, as this thesis proposes, by patents, because the process carried out by computer programs is a technical process, an area for which patent protection is particularly suitable. All intellectual property concepts under the existing mechanisms address the issue of investment in databases in different ways. These include Copyright, Contract, Unfair Competition law and Misappropriation, and Sui generis right protection. Since the primary objective of the thesis is to introduce a protection system for encouraging qualitative and quantitative investment in digital databases in Sri Lanka, this thesis suggests a set of mechanisms and rights which comprises existing intellectual protection mechanisms for databases. The ultimate goal of the proposed protection mechanisms and rights is to improve the laws pertaining to the protection of digital databases in Sri Lanka in order to attract investment, to protect the rights and duties of the digital database users and owners/authors and, eventually, to bring positive economic effects to the country. Since digital database protection is a new concept in the Sri Lankan legal context, this research will provide guidelines for policy-makers, judges and lawyers in Sri Lanka and throughout the South Asian region.

    Discovery of factors associated with performance on the Saber 5 tests using descriptive data mining techniques

    The objective of this research was to discover factors associated with academic performance in the generic competencies of the Saber 5° tests among students of educational institutions in Colombia who took these tests from 2014 to 2016, using descriptive data mining techniques. The socioeconomic, academic and institutional data stored in the databases of the Instituto Colombiano para la Evaluación de la Educación (ICFES) were used. The CRISP-DM methodology was applied, one of the most widely used models in academic and industrial settings and the most widely applied reference guide for developing this type of project. Following the phases of this methodology, a clean and transformed data repository was first obtained for the competencies of Language, Mathematics, Natural Sciences and Citizenship Competencies. The descriptive data mining techniques used to discover factors associated with academic performance were association rules with the Apriori algorithm and clustering with the k-means algorithm. The great majority of the factors are associated with minimum performance in the competencies evaluated. The knowledge discovered will be added to existing knowledge and can be integrated into the decision-making processes of ICFES and of the governmental and academic institutions that oversee the quality of education in the country.
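
The association-rule technique named in this abstract can be sketched as a small level-wise frequent-itemset search in the style of Apriori. This is a simplified illustration, not the authors' implementation: the attribute names, values, and the support threshold are hypothetical and do not come from the ICFES data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    rows = [frozenset(t) for t in transactions]
    n = len(rows)
    items = sorted({item for row in rows for item in row})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates of size k that occur often enough.
        level = {}
        for cand in candidates:
            support = sum(1 for row in rows if cand <= row) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical student records encoded as attribute=value items.
records = [
    {"internet=no", "school=public", "performance=minimum"},
    {"internet=no", "school=public", "performance=minimum"},
    {"internet=yes", "school=private", "performance=advanced"},
    {"internet=no", "school=public", "performance=satisfactory"},
]
frequent = apriori(records, min_support=0.5)

# Confidence of the rule {internet=no, school=public} -> {performance=minimum}.
antecedent = frozenset({"internet=no", "school=public"})
rule_items = antecedent | {"performance=minimum"}
print(frequent[rule_items] / frequent[antecedent])
```

A rule's confidence is the support of the full itemset divided by the support of its antecedent, so rules pointing to a given performance level can be ranked once the frequent itemsets are known.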