64 research outputs found

Bringing Salary Transparency to the World: Computing Robust Compensation Insights via LinkedIn Salary

Author: Callegaro M.
Duerr A.
Fielding R. T.
Groves R. M.
Harris S.
Jessen R. J.
Job
Sandler R.
Statistical Methods SEMATECH
Sumbaly R.
Weiner J.
Zhang L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/09/2017

The recently launched LinkedIn Salary product has been designed with the goal of providing compensation insights to the world's professionals and thereby helping them optimize their earning potential. We describe the overall design and architecture of the statistical modeling system underlying this product. We focus on the unique data mining challenges while designing and implementing the system, and describe the modeling components such as Bayesian hierarchical smoothing that help to compute and present robust compensation insights to users. We report on extensive evaluation with nearly one year of de-identified compensation data collected from over one million LinkedIn users, thereby demonstrating the efficacy of the statistical models. We also highlight the lessons learned through the deployment of our system at LinkedIn. Comment: Conference information: ACM International Conference on Information and Knowledge Management (CIKM 2017)

arXiv.org e-Print Archive

Crossref

STUDY OF BIG DATA ARHITECTURE LAMBDA ARHITECTURE

Author: Katkar Jaideep
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2015

The lambda architecture introduced by Marz is a generic, scalable, and fault-tolerant data processing architecture. It aims to satisfy the need for a robust system that is fault-tolerant against both hardware failures and human mistakes, and that can serve a wide range of workloads and use cases. The architecture proposal decomposes the problem into three layers: a) the batch layer focuses on fault tolerance and optimizes for precise results; b) the speed layer is optimized for short response times and takes into account only the most recent data; and c) the serving layer provides low-latency views of the results of the batch layer. The reason for dividing the architecture into three layers is the flexibility this offers to potential applications. The fast but possibly inaccurate results of the speed layer are eventually replaced by the precise results of the batch layer. The evaluation of the designed architecture measured its capabilities based on the DEBS Grand Challenge 2014 and percentile calculation for milestones task. As part of the project, we implement the lambda architecture in different ways (i.e., using different systems). We compare these implementations and derive the strengths and weaknesses of each system used in the lambda architecture.

SJSU ScholarWorks

QUERY PERFORMANCE EVALUATION OVER HEALTH DATA

Author: Pinarer Ozgun
Turhan Sultan
Publication venue: 'IADIS - International Association for the Development of the Information Society'
Publication date: 17/07/2019

In recent years, there has been a significant increase in the number and variety of application scenarios studied under e-health. Each application generates an immense amount of data that is growing constantly. In this context, it becomes an important challenge to store and analyze the data efficiently and economically via conventional database management tools. Traditional relational database systems may sometimes not meet the requirements of the increased type, volume, velocity and dynamic structure of the new datasets. Effective healthcare data management and its transformation into information/knowledge are therefore challenging issues. So organizations, especially hospitals and medical centers that deal with immense data, either have to purchase new systems or re-tool what they already have. The new so-called NoSQL data models and their management tool, the Hadoop Distributed File System, are replacing RDBMSs, especially in real-time healthcare data analytics processes. It becomes a real challenge to perform complex reporting in these applications as the size of the data grows exponentially. Along with that, customers demand complex analysis and reporting on those data. Compared to traditional DBs, the Hadoop framework is designed to process a large volume of data. In this study, we examine the query performance of traditional DBs and Big Data platforms on healthcare data. We try to explore whether it is really necessary to invest in a big data environment to run queries on high-volume data, or whether this can also be done with current relational database management systems and their supporting hardware infrastructure. We present our experience and a comprehensive performance evaluation of data management systems in the context of application performance.

Crossref

HAL

Hal-Diderot

An evaluation of non-relational database management systems as suitable storage for user generated text-based content in a distributed environment

Author: Du Toit Petrus
Publication date: 07/10/2016

Non-relational database management systems address some of the limitations relational database management systems have when storing large volumes of unstructured, user generated text-based data in distributed environments. They follow different approaches through the data model they use, their ability to scale data storage over distributed servers and the programming interface they provide. An experimental approach was followed to measure the capabilities these alternative database management systems present in their approach to address the limitations of relational databases in terms of their capability to store unstructured text-based data, data warehousing capabilities, ability to scale data storage across distributed servers and the level of programming abstraction they provide. The results of the research highlighted the limitations of relational database management systems. The different database management systems do address certain limitations, but not all. Document-oriented databases provide the best results and successfully address the need to store large volumes of user generated text-based data in a distributed environment. School of Computing. M. Sc. (Computer Science)

Unisa Institutional Repository

Towards a big data reference architecture

Author: Maier M.
Publication date: 01/01/2013

Repository TU/e

Pure OAI Repository