
    Mining developer communication data streams

    This paper explores the concept of modelling a software development project as a process that results in the creation of a continuous stream of data. In the Jazz repository used in this research, one aspect of that stream is developer communication. Such data can be used to create an evolving social network characterized by a range of metrics. This paper presents the application of data stream mining techniques to identify the most useful metrics for predicting build outcomes. Results are presented from applying the Hoeffding Tree classification method in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift. The results indicate that only a small number of the available metrics have any significance for predicting the outcome of a build.
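As a rough illustration of the Hoeffding Tree idea mentioned in this abstract, the sketch below computes the standard Hoeffding bound and uses it to decide whether enough stream examples have been seen to split a leaf. It is not the paper's implementation: the function names, the `tie_threshold` parameter, and the example gain values are assumptions made for illustration only.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability at least 1 - delta, the true mean of a variable with
    range R lies within epsilon of the mean observed over n samples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf has seen enough examples to split.

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon is so small that the two are effectively tied
    (tie_threshold is a hypothetical tuning parameter).
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Hypothetical gains for two candidate metrics after 1000 stream examples.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(should_split(0.30, 0.10, 1.0, 1e-7, 1000))
```

Because epsilon shrinks as more examples arrive, a leaf whose two best attributes are close simply waits for more data before committing to a split, which is what lets the tree learn from a stream in a single pass.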

    Hybrid model using logit and nonparametric methods for predicting micro-entity failure

    Following calls in the bankruptcy literature, a parsimonious hybrid bankruptcy model is developed in this paper by combining parametric and non-parametric approaches. To this end, the variables with the highest predictive power to detect bankruptcy are selected using logistic regression (LR). Subsequently, alternative non-parametric methods (Multilayer Perceptron, Rough Set, and Classification-Regression Trees) are applied, in turn, to firms classified as either “bankrupt” or “not bankrupt”. Our findings show that hybrid models, particularly those combining LR and Multilayer Perceptron, offer better accuracy and interpretability and converge faster than each method implemented in isolation. Moreover, the authors demonstrate that the introduction of non-financial and macroeconomic variables complements financial ratios for bankruptcy prediction.

    Mining data streams using option trees (revised edition, 2004)

    The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over time within these constraints. Additionally, the model must be able to be used for data mining at any point in time. This paper describes a data stream classification algorithm using an ensemble of option trees. The ensemble of trees is induced by boosting and iteratively combined into a single interpretable model. The algorithm is evaluated using benchmark datasets for accuracy against state-of-the-art algorithms that make use of the entire dataset.

    Competitive Positioning in International Logistics: Identifying a System of Attributes Through Neural Networks and Decision Trees

    Firms involved in international logistics must develop a system of service attributes that allows them to be profitable and to satisfy customers’ needs at the same time. How customers trade off these various attributes in forming satisfaction with competing international logistics providers has not been explored well in the literature. This study explores the ocean freight shipping sector to identify the system of attributes that maximizes customers’ satisfaction. Data were collected from shipping managers in Singapore using personal interviews to identify the chief concerns in choosing and evaluating ocean freight services. The data were then examined using neural networks and decision trees, among other approaches, to identify the system of attributes that is connected with customer satisfaction. The results illustrate the power of these methods in understanding how industrial customers with global operations process attributes to derive satisfaction. Implications are discussed.