Decision Tree Induction & Clustering Techniques In SAS Enterprise Miner, SPSS Clementine, And IBM Intelligent Miner A Comparative Analysis
Decision tree induction and clustering are two of the most prevalent data mining techniques, used separately or together in many business applications. Most commercial data mining software tools provide these two techniques, but few of them satisfy business needs. There are many criteria and factors for choosing the most appropriate software for a particular organization. This paper aims to provide a comparative analysis of three popular data mining software tools, SAS® Enterprise Miner, SPSS Clementine, and IBM DB2® Intelligent Miner, based on four main criteria: performance, functionality, usability, and auxiliary task support.
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable python software that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns. Comment: MSR 2019, 12 pages, 10 figures
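
The network construction described in this abstract can be illustrated with a minimal sketch. It does not use git2net or its API; the edit records, the function name `build_coediting_network`, the edge direction (editor to original author), the choice of changed lines as the edge weight, and the decision to skip self-edits are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical edit records (not real data):
# (editor, original_author, timestamp, lines_changed)
edits = [
    ("alice", "bob", datetime(2019, 1, 5), 3),
    ("alice", "bob", datetime(2019, 2, 1), 2),
    ("carol", "alice", datetime(2019, 2, 3), 4),
    ("bob", "bob", datetime(2019, 2, 4), 1),  # self-edit, skipped below
]


def build_coediting_network(records):
    """Aggregate edit records into directed, weighted, time-stamped edges.

    Returns a dict mapping (editor, original_author) to a dict holding the
    summed weight and the list of timestamps at which the edits happened.
    """
    network = defaultdict(lambda: {"weight": 0, "timestamps": []})
    for editor, author, when, n_lines in records:
        if editor == author:
            # Assumption of this sketch: editing one's own code adds no link.
            continue
        edge = network[(editor, author)]
        edge["weight"] += n_lines
        edge["timestamps"].append(when)
    return dict(network)


network = build_coediting_network(edits)
for (editor, author), edge in sorted(network.items()):
    print(editor, "->", author, edge["weight"], len(edge["timestamps"]))
```

Each key is a directed link from the developer who made the edit to the developer who originally wrote the code, and the stored timestamps keep the temporal ordering of the edits.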
Credit Risk Management Using Automatic Machine Learning
The article presents the basic techniques of data mining implemented in typical commercial software. They were used to assess the risk of credit card debt repayment. The article assesses the quality of classification models derived from data mining techniques and compares their results with the traditional approach using a logit model to assess credit risk. It turns out that data mining models provide similar accuracy of classification compared to the logit model, but they require much less work and facilitate the automation of the process of building scoring models
Free Open Source Software for Business Intelligence
Free Open Source Software (FOSS) has recently grown, becoming a significant part of the IT market. We use the word “FOSS” to refer to software under a license which grants the right to access the source code and use, study, and change the software. We must not confuse FOSS with “non-commercial software”: antonyms of FOSS are “closed” and “proprietary” software.
The first purpose of this paper is to maintain an unbiased position. The analysis begins with a general overview of the FOSS world and then moves focus to business intelligence: during the last years, several tools have finally entered the market, becoming actual competitors to proprietary software.
Although FOSS still needs to grow, a large number of companies are already deploying or at least testing some FOSS solutions. In addition, the research world has shown interest, providing several market surveys and software analyses.
After illustrating the selection criteria used, the paper describes the most interesting FOSS tools for each of the following business intelligence subcategories: database management systems (DBMS), data integration tools, analytical tools and business intelligence suites.
In addition, the FOSS data mining solutions RapidMiner and KNIME are evaluated and tested on a set of data. Although the two programs are not as widespread as the proprietary data mining tools, they can be considered actual competitors to the proprietary software
Implementing Service Oriented Architecture for Data Mining
With Web technology, data on the internet has become increasingly large and complex. Not every internet user needs all of this data. Moreover, the data available on the web is not always useful information or knowledge. Hence web data mining is necessary to fulfill this demand. Web data mining can extract unstructured, previously undiscovered data that may be useful information and knowledge from the large amount of incomplete, noisy, ambiguous, random, practical-application-related data on the WWW. It is a newly emerging commercial information/data mining technology. Its main characteristic is the extraction of key data from business databases to support business decision making, through the use of extraction, conversion, analysis, and other transaction models. A Web service is an object or component deployed on the web to provide a distributed application software platform through a series of protocols. The Web service platform provides a set of standard type systems, rules, techniques, and internet service-oriented applications for communication between different platforms, programming languages, and types of systems in order to achieve interoperability. This paper presents an actual, practical application of web services for data mining: we build a data mining model based on Web services, and going forward it is possible to implement the new data mining solution for security configuration. This has been achieved with the use of prototypes of a dynamic web-service-based data mining system.
DOI: 10.17762/ijritcc2321-8169.15079
A systematic approach for performance assessment using process mining. An industrial experience report
Software performance engineering is a mature field that offers methods to assess system performance. Process mining is a promising research field applied to gain insight into system processes. The interplay of these two fields opens promising applications in industry. In this work, we report our experience applying a methodology, based on process mining techniques, for the performance assessment of a commercial data-intensive software application. The methodology has successfully assessed the scalability of future versions of this system. Moreover, it has identified bottleneck components and replication needs for fulfilling business rules. The system, an integrated port operations management system, has been developed by Prodevelop, a medium-sized software enterprise with high expertise in geospatial technologies. The performance assessment has been carried out by a team composed of practitioners and researchers. Finally, the paper offers a deep discussion of the lessons learned during the experience, which will be useful for practitioners adopting the methodology and for researchers looking for new routes.
Web Mining for Web Personalization
Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used as well as technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.