Customer churn prediction in telecom using machine learning and social network analysis in big data platform
Customer churn is a major problem and one of the most important concerns for
large companies. Due to the direct effect on the revenues of the companies,
especially in the telecom field, companies are seeking to develop means to
predict which customers are likely to churn. Therefore, finding factors that increase
customer churn is important to take necessary actions to reduce this churn. The
main contribution of our work is to develop a churn prediction model which
helps telecom operators predict the customers who are most likely to
churn. The model developed in this work uses machine learning techniques on a big
data platform and introduces a new approach to feature engineering and selection. In
order to measure the performance of the model, the Area Under Curve (AUC)
standard measure is adopted, and the AUC value obtained is 93.3%. Another main
contribution is to use customer social network in the prediction model by
extracting Social Network Analysis (SNA) features. The use of SNA enhanced the
performance of the model from 84% to 93.3% on the AUC measure. The model was
prepared and tested in a Spark environment by working on a large dataset
created by transforming big raw data provided by SyriaTel telecom company. The
dataset contained all customers' information over 9 months, and was used to
train, test, and evaluate the system at SyriaTel. Four algorithms were tested:
Decision Tree, Random Forest, Gradient Boosted Machine Tree (GBM),
and Extreme Gradient Boosting (XGBOOST). The best results were
obtained with the XGBOOST algorithm, which was used for
classification in this churn prediction model.
Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK
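As an illustration of the AUC measure reported in the abstract above, here is a minimal Python sketch. It is not the authors' Spark pipeline. It computes AUC as the fraction of (churner, non-churner) pairs in which the churner receives the higher predicted score, counting ties as half. The function name `roc_auc` and the toy labels and scores are assumptions made for this example only.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve for binary labels (1 = churner).

    Uses the pairwise definition: the probability that a randomly chosen
    positive is scored above a randomly chosen negative (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one churner and one non-churner")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: six customers, three of whom churned.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of customers, so it suits small samples; on a dataset of the size described in the abstract, a rank-based or library implementation would be the practical choice.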
Determining the Data Needs for Decision Making in Public Libraries
Library decision makers evaluate community needs and library capabilities in order to select the appropriate services offered by their particular institution. Evaluations of the programs and services may indicate that some are ineffective or inefficient, or that formerly popular services are no longer needed. The internal and external conditions used for decision making change. Monitoring these conditions and evaluations allows the library to make new decisions that maintain its relevance to the community.
Administrators must have ready access to appropriate data that will give them the information they need for library decision making. Today’s computer-based libraries accumulate electronic data in their integrated library systems (ILS) and other operational databases; however, these systems do not provide tools for examining the data to reveal trends and patterns, nor do they have any means of integrating important information from other programs and files where the data are stored in incompatible formats. These restrictions are overcome by use of a data warehouse and a set of analytical software tools, forming a decision support system. The data warehouse must be tailored to specific needs and users to succeed. Libraries that wish to pursue decision support can begin by performing a needs analysis to determine the most important use of the proposed warehouse and to identify the data elements needed to support this use.
The purpose of this study is to complete the needs analysis phase for a data warehouse for a certain public library that is interested in using its electronic data for data mining and other analytical processes. This study is applied research. Data on users’ needs were collected through two rounds of face-to-face interviews. Participants were selected purposively. The phase one interviews were semi-structured, designed to discover the uses of the data warehouse, and then what data were required for those uses. The phase two interviews were structured, and presented selected data elements from the ILS to interviewees who were asked to evaluate how they would use each element in decision making.
Analysis of these interviews showed that the library needs data from sources that vary in physical format, in summary levels, and in data definitions. The library should construct data marts, carefully designed for future integration into a data warehouse. The only data source that is ready for a data mart is the bibliographic database of the integrated library system. Entities and relationships from the ILS are identified for a circulation data mart. The entities and their attributes are described.
A second data mart is suggested for integrating vendor reports for the online databases. Vendor reports vary widely in how they define their variables and in the summary levels of their statistics. Unified data definitions need to be created for the variables of importance so that online database usage may be compared with other data on use of library resources, reflected in the circulation data mart.
Administrators need data to address a number of other decision situations. These decisions require data from other library sources that are not optimized for data warehousing, or that are external to the library. Suggestions are made for future development of data marts using these sources.
The study concludes by recommending that libraries wishing to undertake similar studies begin with a pre-assessment of the entire institution, its data sources, and its management structure, conducted by a consultant. The needs assessment itself should include a focus group session in addition to the interviews.
Integrating E-Commerce and Data Mining: Architecture and Challenges
We show that the e-commerce domain can provide all the right ingredients for
successful data mining and claim that it is a killer domain for data mining. We
describe an integrated architecture, based on our experience at Blue Martini
Software, for supporting this integration. The architecture can dramatically
reduce the pre-processing, cleaning, and data understanding effort often
documented to take 80% of the time in knowledge discovery projects. We
emphasize the need for data collection at the application server layer (not the
web server) in order to support logging of data and metadata that is essential
to the discovery process. We describe the data transformation bridges required
from the transaction processing systems and customer event streams (e.g.,
clickstreams) to the data warehouse. We detail the mining workbench, which
needs to provide multiple views of the data through reporting, data mining
algorithms, visualization, and OLAP. We conclude with a set of challenges.
Comment: KDD workshop: WebKDD 200
Open issues in semantic query optimization in relational DBMS
After two decades of research into Semantic Query Optimization (SQO) there is clear agreement as to the efficacy of SQO. However, although there are some experimental implementations, there are still no commercial implementations. We first present a thorough analysis of research into SQO. We identify three problems which inhibit the effective use of SQO in Relational Database Management Systems (RDBMS). We then propose solutions to these problems and describe first steps towards the implementation of an effective semantic query optimizer for relational databases.
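To make the idea of SQO concrete, the sketch below shows two classic textbook transformations in Python: detecting a query whose predicate contradicts an integrity constraint (so the answer is known to be empty without touching the data), and dropping a predicate that the constraint already implies. This is an illustrative assumption, not the optimizer proposed in the abstract above. The names `Range` and `simplify`, the `salary` column, and the constraint values are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A closed numeric interval [low, high] on one column."""
    low: float = -math.inf
    high: float = math.inf

    def intersect(self, other):
        return Range(max(self.low, other.low), min(self.high, other.high))

    def is_empty(self):
        return self.low > self.high


def simplify(predicate, constraints):
    """Return (always_empty, residual_predicate).

    predicate and constraints map column names to Range objects.
    always_empty is True when some predicate contradicts a constraint.
    Predicates implied by a constraint are dropped from the residual.
    """
    residual = {}
    for column, wanted in predicate.items():
        allowed = constraints.get(column, Range())
        if wanted.intersect(allowed).is_empty():
            return True, {}  # contradiction: no row can qualify
        if allowed.low >= wanted.low and allowed.high <= wanted.high:
            continue  # constraint implies the predicate: redundant
        residual[column] = wanted
    return False, residual


# Hypothetical CHECK constraint: 0 <= salary <= 200000.
constraints = {"salary": Range(0, 200_000)}

contradictory = simplify({"salary": Range(low=500_000)}, constraints)
redundant = simplify({"salary": Range(low=-10)}, constraints)
selective = simplify({"salary": Range(100, 300_000)}, constraints)
```

Here `contradictory` reports an empty answer, `redundant` comes back with no remaining predicate, and `selective` keeps its predicate because the constraint alone does not decide it.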
A Data Centric Privacy Preserved Mining Model for Business Intelligence Applications
In the present-day competitive scenario, techniques such as data warehousing and on-line
analytical processing (OLAP) have become a very significant approach to decision support in data centric
applications and industries. In fact, decision support places somewhat different requirements
on database technology than OLAP based applications do. Data centric decision support schemes
(DSS) such as data warehouses can play a significant role in extracting details from various areas and in
standardizing data throughout the organization to achieve a single way of presenting detail. Such
efficient data presentation provides information for decision making in business intelligence (BI)
applications in various industrial services. To enhance the effectiveness and robustness of computation in
BI applications, optimization of data mining and its processing is a must. On the other hand, in a
multiuser scenario, the security of data in the warehouse is also a critical issue, which has not been explored to date.
In this paper, a data centric and service oriented privacy preserved model for BI applications is
proposed. The optimization in data mining is accomplished by means of the C5.0 classification
algorithm, and a comparative study is carried out against the C4.5 algorithm. Implementing the enhanced C5.0
algorithm with the BI model would provide much better performance, along with privacy preservation, for
Business Intelligence applications.
A unified view of data-intensive flows in business intelligence systems: a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.
Peer Reviewed. Postprint (author's final draft)
TOWARD A DISTRIBUTED DATA MINING SYSTEM FOR TOURISM INDUSTRY
Romania has huge tourism potential, but it is currently undervalued and underexploited. As a result, one of the strategic developments of the economy targets the tourism industry. The strategic decisions are based on different trends obtained from soph…
tourism industry, data mining techniques, distributed databases