Mining developer communication data streams
This paper explores the concept of modelling a software development project
as a process that results in the creation of a continuous stream of data. In
the Jazz repository used in this research, one aspect of that stream is
developer communication. Such data can be used to create an evolving social
network characterized by a range of metrics. This paper presents the
application of data stream mining techniques to identify the metrics most
useful for predicting build outcomes. Results are presented from applying the
Hoeffding Tree classification method in conjunction with the Adaptive Sliding
Window (ADWIN) method for detecting concept drift. The results indicate that
only a small number of the metrics considered have any significance for
predicting the outcome of a build.
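A Hoeffding Tree decides when a leaf has seen enough stream examples to split by using the Hoeffding bound: with probability 1 - δ, the true mean of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of the mean observed over n examples. The following is a minimal Python sketch of that split test. The function names and the 0.05 tie-breaking threshold are illustrative assumptions, not taken from the paper, and the ADWIN drift-detection component is not shown.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean observed
    over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Hoeffding-tree split test: split when the best attribute beats the
    runner-up by more than epsilon, or when epsilon has shrunk below a
    tie-breaking threshold (assumed value), meaning the two are effectively tied."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold
```

With two classes, the information-gain range R is 1. At δ = 1e-7, ε is about 0.20 after 200 examples and about 0.063 after 2,000, so a 0.1 gain advantage triggers a split only at the larger count.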
Hybrid model using logit and nonparametric methods for predicting micro-entity failure
Following calls in the bankruptcy literature, this paper develops a parsimonious hybrid bankruptcy model
by combining parametric and non-parametric approaches. To this end, the variables with the highest predictive power to
detect bankruptcy are selected using logistic regression (LR). Subsequently, alternative non-parametric methods
(Multilayer Perceptron, Rough Set, and Classification and Regression Trees) are applied, in turn, to firms classified as
either “bankrupt” or “not bankrupt”. Our findings show that hybrid models, particularly those combining LR and
Multilayer Perceptron, offer better accuracy and interpretability and converge faster than each method
implemented in isolation. Moreover, we demonstrate that non-financial and macroeconomic variables
complement financial ratios for bankruptcy prediction.
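The first stage of the hybrid model uses LR to pick the most predictive variables before a non-parametric learner is trained on them. The abstract does not state the exact selection criterion, so the following minimal Python sketch assumes one: it fits a plain gradient-descent logistic regression on z-scored variables and keeps the k variables with the largest absolute coefficients. The function names, learning rate, and epoch count are hypothetical choices. Stage 2 is not shown; it would train the Multilayer Perceptron, rough-set classifier, or CART model on the selected columns only.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(rows):
    # Z-score each column so coefficient magnitudes are comparable.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Plain batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def select_top_k(X, y, names, k):
    # Stage 1 of the hybrid: keep the k variables with the largest
    # absolute standardized LR coefficients (an assumed criterion).
    w, _ = fit_logistic(standardize(X), y)
    ranked = sorted(zip(names, w), key=lambda t: abs(t[1]), reverse=True)
    return [name for name, _ in ranked[:k]]
```

Standardizing first matters here. Without it, a coefficient's size would reflect the variable's units as much as its predictive power.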
Mining data streams using option trees (revised edition, 2004)
The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over time within these constraints. Additionally, the model must be usable for data mining at any point in time.
This paper describes a data stream classification algorithm using an ensemble of option trees. The ensemble of trees is induced by boosting and iteratively combined into a single interpretable model. The algorithm is evaluated using benchmark datasets for accuracy against state-of-the-art algorithms that make use of the entire dataset.
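The stream constraints listed above (each example seen briefly, bounded memory, a model usable at any moment) are often illustrated with a test-then-train ("prequential") loop. In this loop, each arriving example is first used to test the current model and then to update it. The following minimal Python sketch shows that loop only. It is a generic stream-evaluation pattern, not the paper's evaluation protocol, and it does not implement option trees or boosting. The `MajorityClassLearner` class and its `predict`/`learn_one` methods are hypothetical stand-ins for any incremental learner.

```python
from collections import Counter


class MajorityClassLearner:
    """Trivial incremental learner (hypothetical stand-in): predicts the most
    frequent label seen so far, storing only one counter per distinct label."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Features are ignored; returns None before any label has been seen.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        # Single-pass update: the example is not stored.
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test-then-train: predict each example before learning from it, so the
    model is always in a usable state and every example is seen only once."""
    correct = total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        total += 1
        learner.learn_one(x, y)
    return correct / total if total else 0.0
```

For the four-example stream `a, a, b, a`, the learner misses the first example (no model yet) and the third, so the prequential accuracy is 0.5.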
Competitive Positioning in International Logistics: Identifying a System of Attributes Through Neural Networks and Decision Trees
Firms involved in international logistics must develop a system of service attributes that allows them to be profitable while satisfying customers’ needs. How customers trade off these attributes in forming satisfaction with competing international logistics providers has not been well explored in the literature. This study examines the ocean freight shipping sector to identify the system of attributes that maximizes customer satisfaction. Data were collected from shipping managers in Singapore through personal interviews to identify their chief concerns in choosing and evaluating ocean freight services. The data were then examined using neural networks and decision trees, among other approaches, to identify the system of attributes associated with customer satisfaction. The results illustrate the power of these methods in understanding how industrial customers with global operations process attributes to derive satisfaction. Implications are discussed.
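A decision tree ranks attributes by how well each one splits the responses into purer groups. The following minimal Python sketch shows this using information gain (the ID3-style criterion), which is the reduction in label entropy after grouping respondents by an attribute's value. The abstract does not say which tree algorithm the study used; CART, for example, uses Gini impurity instead. The attribute names in the usage example are hypothetical, not drawn from the study's data.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    # Shannon entropy (bits) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    # Entropy before the split minus the size-weighted entropy of each
    # group formed by the attribute's values.
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    # Attributes sorted from most to least informative about the label.
    scored = [(a, information_gain(rows, labels, a)) for a in rows[0]]
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Hypothetical survey responses: is the shipper satisfied?
rows = [{"reliability": "high", "price": "low"},
        {"reliability": "high", "price": "high"},
        {"reliability": "low", "price": "low"},
        {"reliability": "low", "price": "high"}]
labels = ["yes", "yes", "no", "no"]
print(rank_attributes(rows, labels))  # reliability gain 1.0, price gain 0.0
```

In this toy data, reliability separates satisfied from unsatisfied respondents perfectly, while price carries no information, so the tree would split on reliability first.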
- …