
    Building custom plugin for Kibana to visualise Oracle database audit logs

    Project Specification: A central platform providing a highly scalable, secure and central repository for consolidated audit data from Oracle databases is being implemented at CERN. Among other purposes, this platform will be used for reporting. The reports will provide a holistic view of activity across all Oracle databases and will include compliance reports, activity reports and privilege reports. With guidance and support from the supervisor, the candidate will customize these reports to improve the visualizations and graphs they contain, using Kibana, a flexible analytics and visualization platform. The candidate will gain experience in building custom visualizations, aggregations and styling in Kibana, along with mastering real-time summaries and charting of streaming data.

    Abstract: This report describes an Openlab Summer Student Programme project: building a custom plugin for Kibana to visualise Oracle database audit logs. Chapter one explains the bigger picture behind the project. Chapter two gives more detail about the project architecture and the technologies used, Elasticsearch and Kibana; this matters because it explains the problems with the current technologies and the motivation for the project. Chapter three describes the student's work: the technologies used, the plugin structure, the current state of the plugin, and the difficulties encountered along the way. Since some readers may want to try the plugin, extend it, or even create their own plugin, chapter three also explains how to do so, and it closes with ideas for further work on the project.
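    As a flavour of the aggregations such a Kibana plugin builds on, here is a minimal sketch that summarises audit events with the official Elasticsearch Python client. The index pattern (oracle-audit-*) and the field names (action, @timestamp) are hypothetical, not taken from the report.

```python
# Hedged sketch: daily counts of Oracle audit events per action type.
# Index pattern and field names are assumptions; adjust to the real mapping.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = {
    "size": 0,  # only aggregation buckets, no individual documents
    "aggs": {
        "events_over_time": {
            "date_histogram": {"field": "@timestamp", "calendar_interval": "day"},
            "aggs": {
                "by_action": {"terms": {"field": "action", "size": 10}}
            },
        }
    },
}

resp = es.search(index="oracle-audit-*", body=query)
for bucket in resp["aggregations"]["events_over_time"]["buckets"]:
    actions = {b["key"]: b["doc_count"] for b in bucket["by_action"]["buckets"]}
    print(bucket["key_as_string"], actions)
```

    A custom visualization plugin would issue a query of this shape and render the returned buckets as a chart instead of printing them.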

    Hadoop tutorials: HBase - NoSQL on Hadoop


    Evaluation of the Suitability of Alluxio for Hadoop Processing Frameworks

    Alluxio is an open-source, memory-speed virtual distributed storage platform. It sits between the storage layer and the processing-framework layer of a big data stack and claims to substantially improve performance when data must be written or read at high throughput, for example when a dataset is used by many jobs simultaneously. This report evaluates the viability of using Alluxio at CERN for Hadoop processing frameworks.
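    The sketch below illustrates the access pattern Alluxio targets, assuming a Spark deployment with the Alluxio client jar on the classpath and a master at alluxio-master:19998; both the host name and the paths are invented for illustration.

```python
# Hedged sketch: stage a dataset once through Alluxio, then let subsequent
# jobs read the (ideally memory-resident) copy instead of the backing store.
# Host, port and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("alluxio-eval").getOrCreate()

# First job materialises the dataset through Alluxio's memory tier.
df = spark.read.parquet("hdfs://namenode/warehouse/events")
df.write.mode("overwrite").parquet("alluxio://alluxio-master:19998/cache/events")

# Later jobs read via the same alluxio:// URI at memory speed.
cached = spark.read.parquet("alluxio://alluxio-master:19998/cache/events")
print(cached.count())
```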

    Spark - a modern approach for distributed analytics

    The Hadoop ecosystem is the leading open-source platform for distributed storage and processing of big data, and a very popular basis for implementing data warehouses and data lakes. Spark has also emerged as one of the leading engines for data analytics. The Hadoop platform is available at CERN as a central service provided by the IT department. By attending the session, a participant will acquire the essential concepts needed to benefit from the parallel data processing offered by the Spark framework. The session is structured around practical examples and tutorials. Main topics:
    - Architecture overview: work distribution, concepts of a worker and a driver
    - Computing concepts of transformations and actions (illustrated in the sketch below)
    - Data processing APIs: RDD, DataFrame, and SparkSQL
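    As a taste of those topics, here is a short PySpark sketch contrasting lazy transformations with actions across the RDD, DataFrame and SparkSQL APIs; the data and column names are invented for illustration.

```python
# Minimal sketch of the session's core concepts, on invented data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-intro").getOrCreate()

# RDD API: filter/map are lazy transformations; count is an action that
# triggers the actual distributed computation on the workers.
rdd = spark.sparkContext.parallelize(range(1_000_000))
evens = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)  # nothing runs yet
print(evens.count())  # the action launches the job

# DataFrame API: same laziness, with a declarative, optimised interface.
df = spark.createDataFrame([(i, i % 10) for i in range(1000)], ["value", "bucket"])
agg = df.groupBy("bucket").count()  # transformation: builds a query plan
agg.show()                          # action: executes the plan

# SparkSQL: query the same data with SQL.
df.createOrReplaceTempView("numbers")
spark.sql("SELECT bucket, COUNT(*) AS n FROM numbers GROUP BY bucket").show()
```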

    Web-Based Analysis at CERN

    Scientists often use IT services every day without appreciating the details of their architecture and operation. Symmetrically, IT experts, while providing reliable services to the scientific community, often do not have the opportunity to dive into the usage patterns of their infrastructures. These two lectures aim to bridge the gap between service users and service providers through a particular service in the CERN portfolio, SWAN, which, in providing an interface for web-based data analysis, federates several other production CERN services. The first lecture is dedicated to a general description of the SWAN service, its architecture and its components. Concepts that are generic to any service are stressed and distilled into general knowledge applicable in areas beyond SWAN. Special attention is given to the way the service is packaged in containerized form and how this affects its operation. Basic concepts of synchronized and shared storage, software distribution, and interactive and mass processing are reviewed through concrete examples collected during the first three years of the service. The second lecture is oriented towards interactive web-based data analysis. Attendees are exposed to the basic principles that lead to a statistical interpretation of a dataset, focusing on examples from collider physics and accelerator physics as well as more traditional big data analysis related to IT infrastructure monitoring. Concepts such as the identification of datasets, their cleaning and their effective processing are explored according to state-of-the-art technology trends. Academic Training, Feb: https://cern.zoom.us/j/63644909754?pwd=NVArRUNyKzl1UnhrQytIaDVFNWZnZz0

    Apache Spark usage and deployment models for scientific computing

    This talk shares our recent experience in providing a data analytics platform based on Apache Spark for High Energy Physics, the CERN accelerator logging system and infrastructure monitoring. The Hadoop service has started to expand its user base to researchers who want to perform analyses with big data technologies. Among many frameworks, Apache Spark currently gets the most traction from various user communities, and new ways to deploy Spark, such as on Apache Mesos or on Kubernetes, have started to evolve rapidly. Meanwhile, notebook web applications such as Jupyter offer the ability to perform interactive data analytics and visualization without installing additional software. CERN already provides a web platform, SWAN (Service for Web-based ANalysis), where users can write and run their analyses in the form of notebooks, seamlessly accessing the data and software they need. The first part of the presentation covers several recent integrations and optimizations of the Apache Spark computing platform to enable HEP data processing and CERN accelerator logging system analytics. These include, but are not limited to, access to kerberized resources, an XRootD connector enabling remote access to EOS storage, and integration with SWAN for interactive data analysis, forming a truly unified analytics platform. The second part of the talk touches upon the evolution of the Apache Spark data analytics platform, in particular the recent work done to run Spark on Kubernetes on the virtualized and container-based infrastructure in OpenStack. This deployment model allows elastic scaling of data analytics workloads, enabling efficient, on-demand utilization of resources in private or public clouds.
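    To give a concrete idea of what the XRootD/EOS integration enables, here is a hedged sketch of reading EOS data into Spark via a root:// URI. It assumes the Hadoop-XRootD connector jar is on Spark's classpath and valid Kerberos credentials are in place; the dataset path is hypothetical.

```python
# Hedged sketch: reading a dataset from EOS over XRootD into Spark.
# Assumes the Hadoop-XRootD connector is configured; path is invented.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("eos-read").getOrCreate()

# With the connector registered, root:// URIs behave like any Hadoop filesystem.
df = spark.read.parquet("root://eospublic.cern.ch//eos/opendata/some/dataset")
df.printSchema()
print(df.count())
```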