689 research outputs found

    Managed Query Processing within the SAP HANA Database Platform

    The SAP HANA database extends the scope of traditional database engines as it supports data models beyond regular tables, e.g. text, graphs or hierarchies. Moreover, SAP HANA also provides developers with a more fine-grained control to define their database application logic, e.g. exposing specific operators which are difficult to express in SQL. Finally, the SAP HANA database implements efficient communication to dedicated client applications using more effective communication mechanisms than available with standard interfaces like JDBC or ODBC. These features of the HANA database are complemented by the extended scripting engine–an application server for server-side JavaScript applications–that is tightly integrated into the query processing and application lifecycle management. As a result, the HANA platform offers more concise models and code for working with the HANA platform and provides superior runtime performance. This paper describes how these specific capabilities of the HANA platform can be consumed and gives a holistic overview of the HANA platform starting from query modeling, to the deployment, and efficient execution. As a distinctive feature, the HANA platform integrates most steps of the application lifecycle, and thus makes sure that all relevant artifacts stay consistent whenever they are modified. The HANA platform also covers transport facilities to deploy and undeploy applications in a complex system landscape

    On the Integration of Electrical/Electronic Product Data in the Automotive Domain

    The recent innovation of modern cars has mainly been driven by the development of new as well as the continuous improvement of existing electrical and electronic (E/E) components, including sensors, actuators, and electronic control units. This trend has been accompanied by an increasing complexity of E/E components and their numerous interdependencies. In addition, external impact factors (e.g., changes of regulations, product innovations) demand more sophisticated E/E product data management (E/E-PDM). Since E/E product data is usually scattered over a large number of distributed, heterogeneous IT systems, application-spanning use cases are difficult to realize (e.g., ensuring the consistency of artifacts corresponding to different development phases, plausibility of logical connections between electronic control units). To tackle this challenge, the partial integration of E/E product data as well as corresponding schemas becomes necessary. This paper presents the properties of a typical IT system landscape related to E/E-PDM, reveals challenges emerging in this context, and elicits requirements for E/E-PDM. Based on this, insights into our framework, which targets the partial integration of E/E product data, are given. Such an integration will foster E/E product data integration and hence contribute to an improved E/E product quality

    OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data

    Government initiatives for more transparency and participation have led to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can assist in pointing out social or economic issues and raising people’s awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support. On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address the semantic ambiguity inherent in it. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. To do so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects from both classic approaches. In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users to our system, for example through keyword queries. Furthermore, we address the challenge of indexing Open Data

    Heterogeneity-Aware Operator Placement in Column-Store DBMS

    Due to the tremendous increase in the amount of data efficiently managed by current database systems, optimization is still one of the most challenging issues in database research. Today’s query optimizers determine the most efficient composition of physical operators to execute a given SQL query, whereas the underlying hardware consists of a multi-core CPU. However, hardware systems are more and more shifting towards heterogeneity, combining a multi-core CPU with various computing units, e.g., GPU or FPGA cores. In order to efficiently utilize the provided performance capability of such heterogeneous hardware, the assignment of physical operators to computing units gains importance. In this paper, we propose a heterogeneity-aware physical operator placement strategy (HOP) for in-memory columnar database systems in a heterogeneous environment. Our placement approach takes operators from the physical query execution plan as an input and assigns them to computing units using a cost model at runtime. To enable this runtime decision, our cost model uses the characteristics of the computing units, execution properties of the operators, as well as runtime data to estimate execution costs for each unit. We evaluated our approach on full TPC-H queries within a prototype database engine. As we are going to show, the placement in a heterogeneous hardware system has a high influence on query performance

    Database (Lecture) Streams on the Cloud: Experience Report on Teaching an Undergrad Database Lecture During a Pandemic

    This is an experience report on teaching the undergrad lecture Big Data Engineering at Saarland University in summer term 2020 online. We describe our teaching philosophy, the tools used, what worked and what did not work. As we received extremely positive feedback from the students, we will continue to use the same teaching model for other lectures in the future

    Towards Integrated Data Analytics: Time Series Forecasting in DBMS

    Integrating sophisticated statistical methods into database management systems is gaining more and more attention in research and industry in order to be able to cope with increasing data volume and increasing complexity of the analytical algorithms. One important statistical method is time series forecasting, which is crucial for decision making processes in many domains. The deep integration of time series forecasting offers additional advanced functionalities within a DBMS. More importantly, however, it allows for optimizations that improve the efficiency, consistency, and transparency of the overall forecasting process. To enable efficient integrated forecasting, we propose to enhance the traditional 3-layer ANSI/SPARC architecture of a DBMS with forecasting functionalities. This article gives a general overview of our proposed enhancements and presents how forecast queries can be processed using an example from the energy data management domain. We conclude with open research topics and challenges that arise in this area

    Visuelle Exploration multivariater Daten im Rahmen eines medizinischen Anwendungsszenarios (Visual Exploration of Multivariate Data in a Medical Application Scenario)

    This paper presents an approach that, building on techniques of visual data exploration and semantics-based fusion, enables the use of analysis methods such as data mining and visualization techniques for knowledge generation in distributed, cooperative environments. Using ontologies to semantically describe distributed sources makes it possible to fuse the data and analysis methods from these sources. The core of the architecture is the gateway component, which allows the analyst to use data and analysis methods in a distributed environment. The presented components were evaluated in a medical application scenario

    Automated Multilingual Detection of Pro-Kremlin Propaganda in Newspapers and Telegram Posts

    The full-scale conflict between the Russian Federation and Ukraine generated an unprecedented amount of news articles and social media data reflecting opposing ideologies and narratives. These polarized campaigns have led to mutual accusations of misinformation and fake news, shaping an atmosphere of confusion and mistrust for readers worldwide. This study analyses how the media affected and mirrored public opinion during the first month of the war using news articles and Telegram news channels in Ukrainian, Russian, Romanian, French and English. We propose and compare two methods of multilingual automated pro-Kremlin propaganda identification, based on Transformers and linguistic features. We analyse the advantages and disadvantages of both methods, their adaptability to new genres and languages, and ethical considerations of their usage for content moderation. With this work, we aim to lay the foundation for further development of moderation tools tailored to the current conflict