3,502 research outputs found

    Predicting customer's gender and age depending on mobile phone data

    In the age of data-driven solutions, customer demographic attributes such as gender and age play a core role: they may enable companies to improve their service offers and target the right customer at the right time and place. In a marketing campaign, companies want to target the real user of the GSM (global system for mobile communications) line, not the line owner, since the two are sometimes not the same person. This work proposes a method that predicts users' gender and age based on their behavior, services and contract information. We used call detail records (CDRs), customer relationship management (CRM) and billing information as data sources to analyze telecom customer behavior, and applied different types of machine learning algorithms to provide marketing campaigns with more accurate information about customer demographic attributes. The model was built using a reliable data set of 18,000 users provided by SyriaTel Telecom Company, for training and testing. It was implemented using big data technology and achieved 85.6% accuracy for user gender prediction and 65.5% for user age prediction. The main contributions of this work are the improved accuracy of user gender and age prediction based on mobile phone data, and an end-to-end solution that approaches customer data from multiple aspects in the telecom domain.

    A unified view of data-intensive flows in business intelligence systems : a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that are still to be addressed, and how the current solutions can be applied for addressing these challenges. Peer reviewed. Postprint (author's final draft).

    Data engineering and best practices

    Bologna Master's degree in Data Analytics for Business. This report presents the results of a study on the current state of data engineering at LGG Advisors. Analyzing existing data, we identified several key trends and challenges facing data engineers in this field. Our study's key findings include a lack of standardization and best practices for data engineering processes, a growing need for data security and for more sophisticated data management and analysis tools, and a lack of trained and experienced data engineers to meet the increasing demand for data-driven solutions. Based on these findings, we recommend several steps that organizations at LGG Advisors can take to improve their data engineering capabilities, including investing in training and education programs, adopting best practices for data management and analysis, and collaborating with other organizations to share knowledge and resources. Data security is also an essential concern for data engineers, as data breaches can have significant consequences for organizations, including financial losses, reputational damage, and regulatory penalties. In this thesis, we review and evaluate some of the best software tools for securing data in data engineering environments. We discuss these tools' key features and capabilities, as well as their strengths and limitations, to help data engineers choose the best software for protecting their data. The tools we consider include encryption software, access control systems, network security tools, and data backup and recovery solutions. We also discuss best practices for implementing and managing these tools to ensure data security in data engineering environments. We engineer data using intuition and rules of thumb. Many of these rules are folklore. Given the rapid technological changes, these rules must be constantly reevaluated.

    The Data Lakehouse: Data Warehousing and More

    Relational Database Management Systems designed for Online Analytical Processing (RDBMS-OLAP) have been foundational to democratizing data and enabling analytical use cases such as business intelligence and reporting for many years. However, RDBMS-OLAP systems present some well-known challenges. They are primarily optimized only for relational workloads, they lead to a proliferation of data copies that can become unmanageable, and because the data is stored in proprietary formats, they can lead to vendor lock-in, restricting access to engines, tools, and capabilities beyond what the vendor offers. As the demand for data-driven decision making surges, the need for a more robust data architecture to address these challenges becomes ever more critical. Cloud data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but they present their own set of challenges. More recently, organizations have often followed a two-tier architectural approach to take advantage of both these platforms, leveraging both cloud data lakes and RDBMS-OLAP systems. However, this approach brings additional challenges, complexities, and overhead. This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits as an RDBMS-OLAP and a cloud data lake combined, while also providing additional advantages. We take today's data warehousing and break it down into implementation-independent components, capabilities, and practices. We then take these aspects and show how a lakehouse architecture satisfies them. Then, we go a step further and discuss what additional capabilities and benefits a lakehouse architecture provides over an RDBMS-OLAP.

    The Use of Business Analytics Systems: An Empirical Investigation in Taiwan’s Hospitals

    This paper aims to develop a research model to examine the mechanisms by which business analytics capabilities in healthcare units indirectly influence decision-making effectiveness through the mediating role of absorptive capacity. We employed a survey method to collect primary data from Taiwan's hospitals. Structural equation modeling (SEM) was used for path analysis. This study conceptualizes, operationalizes, and measures business analytics (BA) capability as a multi-dimensional construct formed by capturing the functionalities of BA systems in healthcare. The results show that healthcare units are likely to obtain valuable knowledge when they use data interpretation tools effectively. In addition, the effective use of data analysis and interpretation tools in healthcare units indirectly influences decision-making effectiveness, an impact that is mediated by absorptive capacity.

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought forth challenges and opportunities for Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities for using big data in geospatial applications. Crucial to the advancements highlighted in this book is the integration of computational thinking and spatial thinking and the transformation of abstract ideas and models into concrete data structures and algorithms.