The Business Insight Index – Evaluating Customer Insights through Hybrid Models

Author: Bratel Jonathan
Publication venue: Lunds universitet/Institutionen för datavetenskap
Publication date: 01/01/2015

Customer segmentation and target analysis are two essential tasks when identifying a company's customers. To perform these tasks, this thesis develops and applies hybrid data-mining models that integrate clustering and decision trees. The hybrid models are applied to the life-logging camera company Narrative in order to gain insight into its customer data. Previous research showed that these hybrid models lacked a means of evaluating how much insight they offer to decision makers. For this reason, we created, tested, and validated a new evaluation measure: the Description Tree Index. Through experiments on five separate datasets, we conclude that the measure enables decision makers to evaluate the insight gained through the hybrid model. In each case, the index generates its best result for the expected number of segments. We then integrated the Description Tree Index with existing evaluation models to form a Business Insight Index. This index evaluates customer segmentation and target analysis from both a business and a data-mining perspective. By applying the index to the Narrative data, we found that four customer segments yielded the most insight.
Understanding their customers better has become an increasingly important task for today's companies. To give a better picture of customers, this thesis develops a new measure that assesses customer information. An increasingly common problem for businesses is rising sales and marketing costs, because today's customers have increasingly diverse purchasing behaviours. Prioritising the most value-creating customers now requires segmentation. Segmenting customers means dividing a market into smaller parts according to customer characteristics such as age, gender, or income level. Segmentation has, however, become an increasingly complex task, as information about customers' characteristics and behaviour has grown explosively in recent years. On the social network Facebook alone, more than 500 terabytes of data are uploaded every day. With data mining, segmentation and prioritisation can be carried out even on large volumes of data. Data mining consists of tools and techniques for finding patterns, relationships, and trends in data. Decision makers can then use these insights to create competitive advantages.
The Business Insight Index: To evaluate customer insights, the thesis creates a new measure, the Business Insight Index (BII). This measure can be used to determine whether one customer segmentation is of higher quality than another. By evaluating the amount of information made available to decision makers, the measure can improve the customer segmentation process. BII shows good results when tested on five data files for which the segmentation is known in advance. For each data file, the measure produces its best result for the expected segments.
Traditional evaluation methods fall short: Segmentation in data mining is usually evaluated by measuring how similar the members of each segment are to one another relative to how different the segments are from each other. These measures do not, however, take into account the amount of information conveyed to salespeople and marketers. Better customer information can in the long run lead to competitive advantages and increased sales. It is therefore important to evaluate both the qualitative properties of the segments and the degree to which they can be understood and communicated.
Narrative: BII is used to help the company Narrative segment its customers. Narrative is a Linköping-based company that markets life-logging cameras. The cameras take a picture every 30 seconds, and the pictures can be uploaded to the company's servers. Customers can then access the photos through the company's mobile app. By dividing the company's customers into segments and then evaluating them, Narrative learns which value drivers customers see in the product. Is high image quality, for example, more important than options for adapting to social media? Or is the capture frequency the most important factor? Once the company has identified the camera's value drivers, the product can be developed and marketed to the different segments.
Integrating segmentation and decision trees: Analysing the segments in decision trees reveals which characteristics distinguish the customers. The decision tree predicts which values are required for a customer to be placed in a specific segment. This tool is advantageous because it enables a visualisation of the customers that decision makers can easily understand and have explained to them.
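
The hybrid idea summarised above (cluster first, then let a decision tree describe the clusters) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which clustering or tree algorithm the thesis uses, so plain k-means and a one-level tree (a single Gini-based split) stand in here, and the customer features `photos_per_day` and `app_sessions_per_week` are hypothetical. The Description Tree Index itself is not defined in the abstract and is not reproduced.

```python
from collections import Counter

# Hypothetical toy customers: (photos_per_day, app_sessions_per_week).
FEATURES = ["photos_per_day", "app_sessions_per_week"]
customers = [(5, 1), (8, 2), (6, 1), (40, 9), (45, 8), (50, 10)]

def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means; returns one segment label per point."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centroids[c])))
            for pt in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(points, labels, feature_names):
    """Return (feature, threshold, weighted Gini) of the best single split."""
    best = None
    for f, name in enumerate(feature_names):
        values = sorted({pt[f] for pt in points})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for pt, lab in zip(points, labels) if pt[f] <= threshold]
            right = [lab for pt, lab in zip(points, labels) if pt[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best

# Step 1: segment the customers. Step 2: describe the segments with a rule.
segments = kmeans(customers, [customers[0], customers[3]])
rule = best_split(customers, segments, FEATURES)
print(f"segments: {segments}")
print(f"rule: {rule[0]} <= {rule[1]} separates the segments (impurity {rule[2]:.2f})")
```

The point of the second step is readability: the clustering yields only labels, while the split turns them into a rule a decision maker can read, which is the property the abstract says conventional cluster-quality measures do not capture.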

Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

Author: Mpofu Bongeka
Publication date: 01/12/2018

Software quality ensures that developed applications are failure free. Some modern systems are intricate, due to the complexity of their information processes. Software fault prediction is an important quality assurance activity, since it predicts the defect proneness of modules and classifies them, which saves resources, time and developers' effort. In this study, a model that selects relevant features for use in defect prediction was proposed. A review of the literature revealed that process metrics, which are based on the history of the source code over time, are better predictors of defects in versioned systems. These metrics are extracted from the source-code module and include, for example, the number of additions to and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product lines (SPL), hence process metrics were chosen. Data sets used in defect prediction may contain non-significant and redundant attributes that can reduce the accuracy of machine-learning algorithms. To improve the prediction accuracy of classification models, only features that are significant for defect prediction are used. In machine learning, feature selection techniques are applied to identify the relevant data. Feature selection is a pre-processing step that helps to reduce the dimensionality of the data. Feature selection techniques include information-theoretic methods based on the concept of entropy. This study evaluated the efficiency of the feature selection techniques experimentally. It was found that software defect prediction using significant attributes improves prediction accuracy.
A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. MICFastCR achieved the highest prediction accuracy as reported by various performance measures.
School of Computing, Ph.D. (Computer Science)
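
As an illustration of the entropy-based family of feature selection methods mentioned above, the sketch below computes symmetrical uncertainty, the correlation measure that FCBF relies on, for discrete features. It is a toy under stated assumptions, not the MICFastCR model: MIC is not implemented, and the features `high_churn` and `odd_file_id` and their values are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy H(X) in bits of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """H(X | Y): entropy of x within each group of equal y, weighted by group size."""
    n = len(x)
    total = 0.0
    for y_value, count in Counter(y).items():
        subset = [xi for xi, yi in zip(x, y) if yi == y_value]
        total += (count / n) * entropy(subset)
    return total

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) - H(X | Y)) / (H(X) + H(Y)), ranging from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    return 2.0 * (hx - conditional_entropy(x, y)) / (hx + hy)

# Hypothetical, already discretised data for eight modules.
defective = [1, 1, 0, 0, 1, 1, 0, 0]
features = {
    "high_churn": [1, 1, 0, 0, 1, 1, 0, 0],   # mirrors the defect label
    "odd_file_id": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the defect label
}

# Rank features by how much information they share with the class label.
ranking = sorted(
    ((symmetrical_uncertainty(values, defective), name)
     for name, values in features.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: SU = {score:.2f}")
```

A filter of this kind would keep features whose score against the class is high and, in an FCBF-style redundancy step, drop a feature when it shares more information with an already selected feature than with the class.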

Unisa Institutional Repository

A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition

Authors: Delen Dursun, Kasap Nihat, Meesad Phayung, Thammasiri Dech
Publication venue: Elsevier BV
Publication date: 01/08/2013

Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data is common in the field of student retention, mainly because many students register but comparatively few drop out. Classification techniques applied to imbalanced datasets can yield deceptively high prediction accuracy, because overall accuracy is usually driven by the majority class at the expense of very poor performance on the crucial minority class. In this study, we compared different data-balancing techniques to improve predictive accuracy on the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques (oversampling, under-sampling and synthetic minority over-sampling, SMOTE) along with four popular classification methods (logistic regression, decision trees, neural networks and support vector machines). We used a large, feature-rich institutional student dataset (covering the years 2005 to 2011) to assess the efficacy of both the balancing techniques and the prediction methods. The results indicated that the support vector machine combined with the SMOTE data-balancing technique achieved the best classification performance, with 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved the prediction accuracy for the minority class. By applying sensitivity analyses to the developed models, we also identified the most important variables for accurate prediction of student attrition. These models have the potential to predict at-risk students accurately and help reduce student dropout rates.
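
Two points from this abstract lend themselves to a small worked example: why overall accuracy misleads on imbalanced data, and how SMOTE-style over-sampling creates new minority samples by interpolating between existing ones. The sketch below is a simplified illustration and not the authors' pipeline; the student features (`gpa`, `credit_hours`), the class sizes, and the helper names are all hypothetical.

```python
import random

def majority_baseline(labels, minority=1):
    """Accuracy and minority recall of a model that always predicts the majority class."""
    majority_count = sum(1 for y in labels if y != minority)
    accuracy = majority_count / len(labels)
    recall = 0.0  # the baseline never predicts the minority class
    return accuracy, recall

def smote_like(minority_points, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples on segments between near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority_points)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority_points if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical cohort: 16 retained students (label 0) and 4 dropouts (label 1).
labels = [0] * 16 + [1] * 4
accuracy, recall = majority_baseline(labels)
print(f"always predict 'retained': accuracy {accuracy:.0%}, dropout recall {recall:.0%}")

# Hypothetical dropout records as (gpa, credit_hours).
dropouts = [(2.0, 12.0), (2.4, 9.0), (1.8, 10.0), (2.2, 11.0)]
new_dropouts = smote_like(dropouts, n_synthetic=12)
print(f"dropout class grows from {len(dropouts)} to {len(dropouts) + len(new_dropouts)}")
```

The baseline scores 80% accuracy while identifying no dropouts at all, which is why minority-class performance has to be reported separately. After over-sampling, both classes have 16 training examples; the synthetic points should be added to the training folds only, never to the held-out data.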

Sabanci University Research Database

Learning from accidents: machine learning for safety at railway stations

Authors: Alawad H, An M, Kaewunruen S
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/12/2019

In railway systems, station safety is a critical aspect of the overall structure, and yet, accidents at stations still occur. It is time to learn from these errors and improve conventional methods by utilizing the latest technology, such as machine learning (ML), to analyse accidents and enhance safety systems. ML has been employed in many fields, including engineering systems, and it interacts with us throughout our daily lives. Thus, we must consider the available technology in general and ML in particular in the context of safety in the railway industry. This paper explores the employment of the decision tree (DT) method in safety classification and the analysis of accidents at railway stations to predict the traits of passengers affected by accidents. The critical contribution of this study is the presentation of ML and an explanation of how this technique is applied for ensuring safety, utilizing automated processes, and gaining benefits from this powerful technology. To apply and explore this method, a case study has been selected that focuses on the fatalities caused by accidents at railway stations. An analysis of some of these fatal accidents as reported by the Rail Safety and Standards Board (RSSB) is performed and presented in this paper to provide a broader summary of the application of supervised ML for improving safety at railway stations. Finally, this research shows the vast potential of the innovative application of ML in safety analysis for the railway industry

University of Salford Institutional Repository

University of Birmingham Research Portal