    Decision Tree Induction & Clustering Techniques In SAS Enterprise Miner, SPSS Clementine, And IBM Intelligent Miner A Comparative Analysis

    Decision tree induction and clustering are two of the most prevalent data mining techniques, used separately or together in many business applications. Most commercial data mining software tools provide these two techniques, but few of them satisfy business needs. Many criteria and factors bear on choosing the most appropriate software for a particular organization. This paper provides a comparative analysis of three popular data mining software tools, SAS® Enterprise Miner, SPSS Clementine, and IBM DB2® Intelligent Miner, based on four main criteria: performance, functionality, usability, and auxiliary task support.

    git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

    Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership, e.g. which exact lines of code have been authored by which developers, that is contained in the commit log of software projects. Addressing this issue, we introduce git2net, a scalable Python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns. Comment: MSR 2019, 12 pages, 10 figures
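
The kind of network described in the abstract above can be illustrated with a minimal sketch. This is not git2net's API or the authors' implementation: the function `build_coediting_network`, the record layout `(editor, original_author, timestamp)`, and the sample data are all hypothetical, and skipping self-edits is an assumption made here for simplicity. The sketch only shows how edit records could be aggregated into directed, weighted, time-stamped links.

```python
from collections import defaultdict
from datetime import datetime


def build_coediting_network(edits):
    """Aggregate edit records into a directed, weighted, time-stamped network.

    `edits` is an iterable of (editor, original_author, timestamp) tuples
    (a hypothetical input format). A link editor -> original_author means the
    editor changed a block of code originally written by original_author.
    """
    timestamps_by_link = defaultdict(list)
    for editor, original_author, timestamp in edits:
        if editor == original_author:
            continue  # assumption: edits to one's own code create no link
        timestamps_by_link[(editor, original_author)].append(timestamp)

    # Weight = number of co-edits on the link; timestamps kept in time order.
    return {
        link: {"weight": len(stamps), "timestamps": sorted(stamps)}
        for link, stamps in timestamps_by_link.items()
    }


# Hypothetical sample data, for illustration only.
sample_edits = [
    ("alice", "bob", datetime(2019, 1, 3)),
    ("alice", "bob", datetime(2019, 1, 1)),
    ("bob", "alice", datetime(2019, 1, 2)),
    ("carol", "carol", datetime(2019, 1, 4)),
]
network = build_coediting_network(sample_edits)
```

Keeping the links as ordered pairs preserves direction, so `("alice", "bob")` and `("bob", "alice")` are distinct links, and storing the raw timestamps leaves room for time-windowed analysis later.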

    Credit Risk Management Using Automatic Machine Learning

    The article presents the basic techniques of data mining implemented in typical commercial software. They were used to assess the risk of credit card debt repayment. The article assesses the quality of classification models derived from data mining techniques and compares their results with the traditional approach of using a logit model to assess credit risk. The data mining models achieve classification accuracy similar to that of the logit model, but they require much less work and facilitate automating the process of building scoring models.

    Free Open Source Software for Business Intelligence

    Free Open Source Software (FOSS) has recently grown, becoming a significant part of the IT market. We use the word “FOSS” to refer to software under a license which grants the right to access the source code and use, study, and change the software. We must not confuse FOSS with “non-commercial software”: antonyms of FOSS are “closed” and “proprietary” software. The first purpose of this paper is to maintain an unbiased position. The analysis begins with a general overview of the FOSS world and then moves focus to business intelligence: in recent years, several tools have finally entered the market, becoming actual competitors to proprietary software. Although FOSS still needs to grow, a large number of companies are already deploying or at least testing some FOSS solutions. In addition, the research world has shown interest, providing several market surveys and software analyses. After illustrating the selection criteria used, the paper describes the most interesting FOSS tools for each of the following business intelligence subcategories: database management systems (DBMS), data integration tools, analytical tools, and business intelligence suites. In addition, the FOSS data mining solutions RapidMiner and KNIME are evaluated and tested on a set of data. Although the two programs are not as widespread as the proprietary data mining tools, they can be considered actual competitors to the proprietary software.

    Implementing Service Oriented Architecture for Data Mining

    With Web technology, data on the Internet has become increasingly large and complex. Not every user needs all of this data, and the data available on the Web is not always useful information or knowledge. Hence, Web data mining is necessary to meet this demand. Web data mining can extract unstructured, previously undiscovered data that is potentially useful information and knowledge from the large volume of incomplete, noisy, ambiguous, and random application-related data on the WWW. It is an emerging commercial information and data mining technology. Its main characteristic is extracting key data from business databases to support business decision making through extraction, conversion, analysis, and other transaction models. A Web service is an object or component deployed on the Web that, through a series of protocols, provides a platform for distributed application software. The Web service platform provides a set of standard type systems, rules, techniques, and Internet service-oriented applications for communication between different platforms, different programming languages, and different types of systems to achieve interoperability. This paper presents a practical application of Web services for data mining: we build a data mining model based on Web services, which could in future support a new data mining solution for security configuration. This has been achieved using prototypes of a dynamic Web service-based data mining system. DOI: 10.17762/ijritcc2321-8169.15079

    A systematic approach for performance assessment using process mining. An industrial experience report

    Software performance engineering is a mature field that offers methods to assess system performance. Process mining is a promising research field applied to gain insight into system processes. The interplay of these two fields opens promising applications in industry. In this work, we report our experience applying a methodology, based on process mining techniques, for the performance assessment of a commercial data-intensive software application. The methodology has successfully assessed the scalability of future versions of this system. Moreover, it has identified bottleneck components and replication needs for fulfilling business rules. The system, an integrated port operations management system, has been developed by Prodevelop, a medium-sized software enterprise with high expertise in geospatial technologies. The performance assessment has been carried out by a team composed of practitioners and researchers. Finally, the paper offers a deep discussion of the lessons learned during the experience, which will be useful for practitioners adopting the methodology and for researchers seeking new routes.

    Web Mining for Web Personalization

    Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used as well as technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.
