64 research outputs found

    Bringing Salary Transparency to the World: Computing Robust Compensation Insights via LinkedIn Salary

    The recently launched LinkedIn Salary product has been designed with the goal of providing compensation insights to the world's professionals and thereby helping them optimize their earning potential. We describe the overall design and architecture of the statistical modeling system underlying this product. We focus on the unique data mining challenges encountered while designing and implementing the system, and describe modeling components such as Bayesian hierarchical smoothing that help to compute and present robust compensation insights to users. We report on an extensive evaluation with nearly one year of de-identified compensation data collected from over one million LinkedIn users, thereby demonstrating the efficacy of the statistical models. We also highlight the lessons learned through the deployment of our system at LinkedIn. Comment: Conference information: ACM International Conference on Information and Knowledge Management (CIKM 2017).
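    To illustrate the general idea behind hierarchical smoothing for sparse cohorts, the sketch below uses plain normal-normal (conjugate) shrinkage on log-salaries: a cohort with few reports is pulled toward the estimate of a broader parent cohort, and the pull weakens as the number of reports grows. This is a generic textbook technique under assumed inputs, not the model described in the paper; the function name, the parent estimate, and both variance parameters are hypothetical.

```python
import math
import statistics


def shrink_log_mean(samples, parent_log_mean, within_var, prior_var):
    """Normal-normal shrinkage of a cohort's mean log-salary toward its parent cohort.

    samples:         reported salaries for the (possibly sparse) cohort
    parent_log_mean: assumed mean log-salary of the broader parent cohort
    within_var:      assumed variance of individual log-salaries within a cohort
    prior_var:       assumed variance of cohort means around the parent mean
    """
    logs = [math.log(s) for s in samples]
    n = len(logs)
    if n == 0:
        return parent_log_mean  # no data: fall back entirely to the parent estimate
    sample_mean = statistics.fmean(logs)
    # Precision-weighted average: more samples put more weight on the cohort's own data.
    data_precision = n / within_var
    prior_precision = 1.0 / prior_var
    return (data_precision * sample_mean + prior_precision * parent_log_mean) / (
        data_precision + prior_precision
    )


# Hypothetical sparse cohort with three reported salaries and an assumed parent cohort.
cohort = [95_000, 105_000, 120_000]
estimate = shrink_log_mean(cohort, math.log(90_000), within_var=0.1, prior_var=0.05)
print(f"smoothed median estimate: {math.exp(estimate):,.0f}")
```

    Working on the log scale keeps salaries positive, and under a log-normal assumption exponentiating the smoothed mean log-salary gives an estimate of the cohort median.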

    STUDY OF BIG DATA ARCHITECTURE: LAMBDA ARCHITECTURE

    The lambda architecture introduced by Marz is a generic, scalable and fault-tolerant data processing architecture. It aims to satisfy the need for a robust system that is fault-tolerant against both hardware failures and human mistakes and that can serve a wide range of workloads and use cases. The architecture decomposes the problem into three layers: (a) the batch layer focuses on fault tolerance and optimizes for precise results; (b) the speed layer is optimized for short response times and only takes the most recent data into account; and (c) the serving layer provides low-latency views of the results of the batch layer. Dividing the architecture into three layers gives potential applications flexibility: the fast but possibly inaccurate results of the speed layer are eventually replaced by the precise results of the batch layer. The capabilities of the designed architecture were evaluated using the DEBS Grand Challenge 2014 and a percentile-calculation task for milestones. As part of the project, we implement the lambda architecture in different ways (i.e., using different systems), compare these implementations, and derive the strengths and weaknesses of each system used in the lambda architecture.
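    The interplay of the three layers can be sketched in a few lines. The toy class below is a hypothetical, single-process illustration that counts events per key rather than computing percentiles: every event is appended to an immutable master dataset and also updates an incremental realtime view (speed layer); a batch run recomputes the batch view from the whole master dataset and discards the realtime results it now covers; a query merges the two views (serving). Real deployments distribute each layer across separate systems, which this sketch does not attempt.

```python
from collections import Counter


class LambdaCounter:
    """Toy lambda architecture that counts events per key (hypothetical example)."""

    def __init__(self):
        self.master = []                # immutable, append-only master dataset
        self.batch_view = Counter()     # precomputed by the batch layer
        self.realtime_view = Counter()  # incremental view maintained by the speed layer

    def ingest(self, key):
        self.master.append(key)       # batch layer keeps every raw event
        self.realtime_view[key] += 1  # speed layer updates incrementally, low latency

    def run_batch(self):
        # Batch layer: recompute from scratch over the entire master dataset.
        self.batch_view = Counter(self.master)
        # The precise batch result now covers all data, so the approximate
        # realtime results for that data are replaced (discarded).
        self.realtime_view = Counter()

    def query(self, key):
        # Serving: merge the precomputed batch view with recent realtime results.
        return self.batch_view[key] + self.realtime_view[key]


lc = LambdaCounter()
lc.ingest("plug-1")
lc.ingest("plug-1")
lc.run_batch()
lc.ingest("plug-1")  # arrives after the batch run, visible only via the speed layer
print(lc.query("plug-1"))
```

    Because the batch view is always recomputed from the raw master data, a bug or bad write in the speed layer is corrected at the next batch run, which is the human-fault tolerance the abstract refers to.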

    QUERY PERFORMANCE EVALUATION OVER HEALTH DATA

    In recent years, there has been a significant increase in the number and variety of application scenarios studied in e-health. Each application generates an immense and constantly growing amount of data. In this context, storing and analyzing that data efficiently and economically with conventional database management tools becomes an important challenge. Traditional relational database systems may not meet the requirements imposed by the variety, volume, velocity and dynamic structure of the new datasets. Effective healthcare data management and the transformation of that data into information and knowledge are therefore challenging issues. Organizations that deal with immense data, especially hospitals and medical centers, must either purchase new systems or re-tool the ones they already have. New data models, the so-called NoSQL systems, together with management tools such as the Hadoop Distributed File System (HDFS), are replacing RDBMSs, especially in real-time healthcare data analytics. Performing complex reporting in these applications becomes a real challenge as the size of the data grows exponentially, while customers also demand complex analysis and reporting on that data. Unlike traditional databases, the Hadoop framework is designed to process large volumes of data. In this study, we examine the query performance of traditional databases and Big Data platforms on healthcare data, exploring whether it is really necessary to invest in a big data environment to run queries on high-volume data, or whether this can also be done with current relational database management systems and their supporting hardware infrastructure. We present our experience and a comprehensive performance evaluation of data management systems in the context of application performance.
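    A query-performance comparison of this kind rests on a repeatable timing harness. The sketch below is a minimal, assumed setup rather than the study's benchmark: it uses Python's built-in sqlite3 as a stand-in relational engine, a hypothetical `visits` table filled with synthetic rows, and reports the median of several runs of a reporting query. Comparing against a Big Data platform would mean swapping in that platform's own client call, which is not shown here.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), repeats=5):
    """Run a query several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch every row so the full query cost is measured
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)  # median damps outliers from caching or scheduling


# Hypothetical table of patient visits, filled with synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id INTEGER, department TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    (
        (i % 1000, ("cardiology", "oncology", "radiology")[i % 3], float(i % 500))
        for i in range(50_000)
    ),
)
report = "SELECT department, COUNT(*), AVG(cost) FROM visits GROUP BY department"
print(f"median time: {time_query(conn, report):.4f} s")
```

    Repeating each query and taking the median, instead of a single run, makes results from different engines easier to compare when caches warm up unevenly.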

    An evaluation of non-relational database management systems as suitable storage for user generated text-based content in a distributed environment

    Non-relational database management systems address some of the limitations that relational database management systems have when storing large volumes of unstructured, user-generated text-based data in distributed environments. They differ in the data model they use, their ability to scale data storage over distributed servers, and the programming interface they provide. An experimental approach was followed to measure how well these alternative database management systems address the limitations of relational databases in four areas: storing unstructured text-based data, data warehousing capabilities, scaling data storage across distributed servers, and the level of programming abstraction they provide. The results highlighted the limitations of relational database management systems. The alternative database management systems each address certain limitations, but not all. Document-oriented databases provide the best results and successfully address the need to store large volumes of user-generated text-based data in a distributed environment.
    School of Computing, M.Sc. (Computer Science)
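    Two of the properties evaluated above, schema-less storage of user-generated text and scaling storage across servers, can be illustrated with a toy document store. The class below is a hypothetical sketch: each dictionary stands in for one server, documents are JSON-serialized so their fields may differ, and a stable hash of the document id picks the shard with simple modulo partitioning. Production document databases use their own partitioning schemes (for example range-based or consistent-hash sharding), which this sketch does not reproduce.

```python
import hashlib
import json


class ShardedDocumentStore:
    """Toy document-oriented store spreading schema-less documents over N shards."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]  # each dict stands in for one server

    def _shard_for(self, doc_id):
        # Stable hash of the document id selects the shard (modulo partitioning).
        digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, doc_id, document):
        # Documents are stored as JSON text; fields can differ from one document to the next.
        self.shards[self._shard_for(doc_id)][doc_id] = json.dumps(document)

    def get(self, doc_id):
        raw = self.shards[self._shard_for(doc_id)].get(doc_id)
        return None if raw is None else json.loads(raw)

    def find(self, field, value):
        # Scatter-gather query: scan every shard and combine the matches.
        results = []
        for shard in self.shards:
            for raw in shard.values():
                doc = json.loads(raw)
                if doc.get(field) == value:
                    results.append(doc)
        return results


store = ShardedDocumentStore(num_shards=3)
store.put("post-1", {"author": "alice", "text": "Hello, world", "tags": ["intro"]})
store.put("comment-7", {"author": "bob", "text": "Welcome!", "reply_to": "post-1"})
print(store.find("author", "bob"))
```

    Lookups by id touch a single shard, while queries on other fields must visit every shard; this trade-off is one reason the data model and query interface matter when judging how well a store scales.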

    Towards a big data reference architecture

    • …