Predicting customer's gender and age depending on mobile phone data
In the age of data-driven solutions, customer demographic attributes such as gender and age play a core role, enabling companies to enhance their service offers and target the right customer at the right time and place. In marketing campaigns, companies want to target the actual user of the GSM (Global System for Mobile Communications) line, not the line owner, since the two are sometimes not the same. This work proposes a method that predicts users' gender and age based on their behavior, services, and contract information. We used call detail records (CDRs), customer relationship management (CRM) data, and billing information as data sources to analyze telecom customer behavior, and applied different types of machine learning algorithms to provide marketing campaigns with more accurate information about customer demographic attributes. The model was built, trained, and tested on a reliable data set of 18,000 users provided by SyriaTel Telecom Company. It was implemented using big data technology and achieved 85.6% accuracy for user gender prediction and 65.5% for user age prediction. The main contributions of this work are the improved accuracy of user gender and age prediction based on mobile phone data, and an end-to-end solution that approaches customer data from multiple aspects in the telecom domain.
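The approach described above — training classifiers on aggregated call-behavior and contract features — can be sketched roughly as follows. This is a minimal illustration on synthetic data; the feature names and the random-forest choice are assumptions for the example, not the paper's actual CDR/CRM schema or pipeline.

```python
# Hedged sketch: predicting a binary demographic label (e.g. gender)
# from aggregated call-behavior features. Features and labels here are
# synthetic stand-ins, not real subscriber data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 2000

# Illustrative features one might derive from CDR/CRM/billing sources.
X = np.column_stack([
    rng.poisson(30, n),          # calls per week
    rng.exponential(120, n),     # mean call duration (seconds)
    rng.integers(0, 2, n),       # postpaid contract flag
    rng.uniform(0, 1, n),        # share of night-time traffic
])
# Synthetic label loosely correlated with call frequency.
y = (X[:, 0] + rng.normal(0, 5, n) > 30).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.3f}")
```

In practice the features would be engineered from raw CDRs (call counts, durations, locations, top-up patterns) joined with CRM and billing records, and the model evaluated on a held-out subset, as the abstract describes.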
A unified view of data-intensive flows in business intelligence systems: a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that are still to be addressed, and how the current solutions can be applied for addressing these challenges.
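The batched ETL pattern the survey discusses can be illustrated with a minimal extract-transform-load sketch; the source records, schema, and transformation below are invented purely for the example.

```python
# Minimal sketch of a batched extract-transform-load (ETL) flow:
# extract rows from a source, conform them into an analysis-ready
# shape, and load them into a warehouse table (an in-memory SQLite
# database stands in for the data warehouse here).
import sqlite3

def extract():
    # Stand-in for reading from an operational source system.
    return [
        {"customer": " Alice ", "amount": "10.50"},
        {"customer": "Bob",     "amount": "3.00"},
    ]

def transform(rows):
    # Clean and normalize records (trim names, parse amounts).
    return [(r["customer"].strip(), float(r["amount"])) for r in rows]

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 13.5
```

Real ETL flows add scheduling, incremental loads, and error handling, and the more operational flows the survey contrasts with would perform similar transformations at query time rather than in a periodic batch.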
Data engineering and best practices
Bologna Master's in Data Analytics for Business. This report presents the results of a study on the current state of data engineering at LGG Advisors. Analyzing existing data, we identified several key trends and challenges facing data engineers in this field. Our study's key findings include a lack of standardization and best practices for data engineering processes; a growing need for more sophisticated data management, analysis, and security tools; and a lack of trained and experienced data engineers to meet the increasing demand for data-driven solutions. Based on these findings, we recommend several steps LGG Advisors can take to improve its data engineering capabilities, including investing in training and education programs, adopting best practices for data management and analysis, and collaborating with other organizations to share knowledge and resources. Data security is also an essential concern for data engineers, as data breaches can have significant consequences for organizations, including financial losses, reputational damage, and regulatory penalties. In this thesis, we review and evaluate some of the best software tools for securing data in data engineering environments. We discuss these tools' key features and capabilities, as well as their strengths and limitations, to help data engineers choose the best software for protecting their data. The tools we consider include encryption software, access control systems, network security tools, and data backup and recovery solutions. We also discuss best practices for implementing and managing these tools to ensure data security in data engineering environments. We engineer data using intuition and rules of thumb; many of these rules are folklore. Given the rapid pace of technological change, these rules must be constantly reevaluated.
The Data Lakehouse: Data Warehousing and More
Relational Database Management Systems designed for Online Analytical Processing (RDBMS-OLAP) have been foundational to democratizing data and enabling analytical use cases such as business intelligence and reporting for many years. However, RDBMS-OLAP systems present some well-known challenges. They are primarily optimized only for relational workloads, they lead to a proliferation of data copies which can become unmanageable, and since the data is stored in proprietary formats they can lead to vendor lock-in, restricting access to engines, tools, and capabilities beyond what the vendor offers. As the demand for data-driven decision making surges, the need for a more robust data architecture to address these challenges becomes ever more critical. Cloud data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but they present their own set of challenges. More recently, organizations have often followed a two-tier architectural approach to take advantage of both of these platforms, leveraging both cloud data lakes and RDBMS-OLAP systems. However, this approach brings additional challenges, complexities, and overhead. This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits as an RDBMS-OLAP system and a cloud data lake combined, while also providing additional advantages. We take today's data warehousing and break it down into implementation-independent components, capabilities, and practices. We then show how a lakehouse architecture satisfies them. Finally, we go a step further and discuss what additional capabilities and benefits a lakehouse architecture provides over an RDBMS-OLAP system.
The Use of Business Analytics Systems: An Empirical Investigation in Taiwan’s Hospitals
This paper aims to develop a research model to examine the mechanisms by which business analytics capabilities in healthcare units indirectly influence decision-making effectiveness through the mediating role of absorptive capacity. We employed a survey method to collect primary data from Taiwan's hospitals. Structural equation modeling (SEM) was used for path analysis. This study conceptualizes, operationalizes, and measures business analytics (BA) capability as a multi-dimensional construct formed by capturing the functionalities of BA systems in healthcare. The results show that healthcare units are likely to obtain valuable knowledge when they use data interpretation tools effectively. Also, the effective use of data analysis and interpretation tools in healthcare units indirectly influences decision-making effectiveness, an impact that is mediated by absorptive capacity.
Big Data Computing for Geospatial Applications
The convergence of big data and geospatial computing has brought forth challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while demonstrating opportunities for using big data in geospatial applications. Crucial to the advancements highlighted in this book is the integration of computational thinking and spatial thinking, and the transformation of abstract ideas and models into concrete data structures and algorithms.