
    Geometric, Feature-based and Graph-based Approaches for the Structural Analysis of Protein Binding Sites: Novel Methods and Computational Analysis

    This thesis studies protein binding sites. To extract information from the space of protein binding sites, these sites must first be mapped onto a mathematical space, for example by representing them as vectors, graphs, or point clouds. To impose structure on that space, a distance measure is required; such a measure is introduced in this thesis and can then be used to extract information by means of data mining techniques.
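    To make the idea of a distance measure on binding-site representations concrete, here is a minimal sketch of a symmetric Hausdorff distance between two binding sites represented as 3D point clouds. It is purely illustrative and not the measure introduced in the thesis; the function name and the use of NumPy are assumptions.

        import numpy as np

        def hausdorff_distance(A, B):
            """Symmetric Hausdorff distance between two point clouds.

            A, B: arrays of shape (n, 3) and (m, 3) holding atom coordinates.
            """
            # Pairwise Euclidean distances between all points of A and B.
            diff = A[:, None, :] - B[None, :, :]
            d = np.sqrt((diff ** 2).sum(axis=-1))
            # Directed distances: the farthest nearest-neighbor in each direction.
            d_ab = d.min(axis=1).max()
            d_ba = d.min(axis=0).max()
            return max(d_ab, d_ba)

        # Example: two toy "binding sites" of 4 and 3 points.
        site_a = np.random.rand(4, 3)
        site_b = np.random.rand(3, 3)
        print(hausdorff_distance(site_a, site_b))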

    Accelerating Event Stream Processing in On- and Offline Systems

    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomenon. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it, together with the measurement time, whenever it changes substantially from the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the cost of reading the entire stream from non-volatile storage quickly dominates the overall processing cost. Second, modern SPEs focus on scaling out computations across the nodes of a cluster but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data read from non-volatile storage, and we show that this significantly improves the overall query runtime. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize the resource utilization of single nodes by exploiting the capabilities of modern hardware. To this end, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that, in terms of resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores.
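    As a minimal illustration of one of the operators named above, windowed aggregation, the following sketch computes per-window averages over an ordered event stream using tumbling windows. It is a toy single-threaded model, not the thesis's index-based or CPU-GPU techniques; all names are assumptions.

        from collections import namedtuple

        Event = namedtuple("Event", ["timestamp", "value"])

        def tumbling_window_avg(events, window_size):
            """Average event values per tumbling window of `window_size` time units.

            Assumes events arrive ordered by timestamp, as in an online stream.
            Yields (window_start, average) pairs as each window closes.
            """
            window_start, total, count = None, 0.0, 0
            for e in events:
                if window_start is None:
                    window_start = e.timestamp - (e.timestamp % window_size)
                while e.timestamp >= window_start + window_size:
                    if count:
                        yield window_start, total / count
                    window_start += window_size
                    total, count = 0.0, 0
                total += e.value
                count += 1
            if count:
                yield window_start, total / count

        stream = [Event(1, 20.0), Event(4, 22.0), Event(11, 21.0)]
        print(list(tumbling_window_avg(stream, window_size=10)))
        # [(0, 21.0), (10, 21.0)]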

    Constructive Reasoning for Semantic Wikis

    One of the main design goals of social software, such as wikis, is to support and facilitate interaction and collaboration. This dissertation explores challenges that arise from extending social software with advanced facilities such as reasoning and semantic annotations, and presents tools in the form of a conceptual model, structured tags, a rule language, and a set of novel forward chaining and reason maintenance methods for processing such rules that help to overcome these challenges. Wikis and semantic wikis have usually been developed in an ad hoc manner, without much thought about the underlying concepts. A conceptual model suitable for a semantic wiki that takes advanced features such as annotations and reasoning into account is proposed. Moreover, so-called structured tags are proposed as a semi-formal knowledge representation step between informal and formal annotations. The focus of rule languages for the Semantic Web has been predominantly on expert users and on the interplay of rule languages and ontologies. KWRL, the KiWi Rule Language, is proposed as a rule language for a semantic wiki: it is easily understandable for users, as it is aware of the conceptual model of a wiki and is inconsistency-tolerant, and it can be evaluated efficiently, as it builds upon Datalog concepts. The requirement for fast response times of interactive software translates in our work to bottom-up evaluation (materialization) of rules (views) ahead of time, that is, when rules or data change, not when they are queried. Materialized views have to be updated when data or rules change. While incremental view maintenance was studied intensively in the past and literature on the subject is abundant, the existing methods have surprisingly many disadvantages: they do not provide all the information desirable for explaining derived information; they require evaluation of possibly substantially larger Datalog programs with negation; they recompute the whole extension of a predicate even if only a small part of it is affected by a change; and they require adaptation to handle general rule changes. A particular contribution of this dissertation consists in a set of forward chaining and reason maintenance methods with a simple declarative description that are efficient and that derive and maintain the information necessary for reason maintenance and explanation. The reasoning methods and most of the reason maintenance methods are described in terms of a set of extended immediate consequence operators, the properties of which are proven in the classical logic programming framework. In contrast to existing methods, the reason maintenance methods in this dissertation work by evaluating the original Datalog program (they do not introduce negation if it is not present in the input program), and only the affected part of a predicate's extension is recomputed. Moreover, our methods directly handle changes in both data and rules; a rule change does not need to be handled as a special case. A framework of support graphs, a data structure inspired by the justification graphs of classical reason maintenance, is proposed. Support graphs enable a unified description and a formal comparison of the various reasoning and reason maintenance methods, and they define a notion of a derivation such that the number of derivations of an atom is always finite, even in the recursive Datalog case. A practical approach to implementing reasoning, reason maintenance, and explanation in the KiWi semantic platform is also investigated. It is shown how an implementation may benefit from using a graph database instead of, or along with, a relational database.
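    To illustrate what bottom-up evaluation (materialization) of Datalog rules means, here is a sketch of naive forward chaining to a fixpoint for a transitive-closure program. It deliberately omits the dissertation's reason maintenance machinery and support graphs; all names and the rule encoding are illustrative.

        def forward_chain(facts, rules):
            """Naive bottom-up Datalog evaluation: apply rules until fixpoint.

            facts: set of tuples like ("edge", "a", "b").
            rules: functions mapping the current fact set to newly derived facts.
            """
            derived = set(facts)
            changed = True
            while changed:
                changed = False
                for rule in rules:
                    for fact in rule(derived):
                        if fact not in derived:
                            derived.add(fact)
                            changed = True
            return derived

        # path(X, Y) :- edge(X, Y).
        # path(X, Z) :- edge(X, Y), path(Y, Z).
        def path_base(facts):
            return {("path", x, y) for (p, x, y) in facts if p == "edge"}

        def path_step(facts):
            paths = {(x, y) for (p, x, y) in facts if p == "path"}
            edges = {(x, y) for (p, x, y) in facts if p == "edge"}
            return {("path", x, z) for (x, y) in edges for (y2, z) in paths if y == y2}

        facts = {("edge", "a", "b"), ("edge", "b", "c")}
        print(sorted(forward_chain(facts, [path_base, path_step])))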

    Managing Uncertainty and Vagueness in Semantic Web

    The Semantic Web aims at carrying out tasks in computing systems without human intervention. In this context, the notion of machine-processable information has been introduced. In most Semantic Web tasks, we encounter incomplete information, namely uncertainty and vagueness, so a method that represents uncertainty and vagueness under a common framework has to be defined. Semantic Web technologies are organized in the Semantic Web Stack and rest on a clear formal foundation; any representation scheme should therefore be aligned with these technologies and be formally defined. As ontologies are the central means of representing knowledge in the Semantic Web, it is desirable that any such framework be built upon them. In this thesis, we define an approach for representing uncertainty and vagueness under a common framework: uncertainty is represented through the Dempster-Shafer model, whereas vagueness is represented through fuzzy logic and fuzzy sets. To this end, we define a theoretical framework that combines the classical crisp description logic ALC with a Dempster-Shafer module, and as a next step we add fuzziness to this model. Throughout our work, we have implemented metaontologies for representing uncertain and vague concepts, and we have tested our methodology in real-world applications.
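    For reference, the standard Dempster-Shafer belief and plausibility functions over a mass assignment m on a frame of discernment \Theta (textbook definitions, not the thesis's specific DL combination) are:

        \mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \qquad
        \mathrm{Pl}(A)  = \sum_{B \cap A \neq \emptyset} m(B),

        \text{where } m : 2^{\Theta} \to [0,1], \quad
        \sum_{B \subseteq \Theta} m(B) = 1, \quad m(\emptyset) = 0.

    Vagueness, by contrast, is captured by a fuzzy set, i.e., a membership function \mu : X \to [0,1] assigning each element a degree of membership rather than a probability.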

    Keyword Searches and Schema Transformation for Multi-Model Databases

    The variety of data is driving the evolution and development of databases; one notable outcome is the emergence of multi-model databases. The database community has so far proposed quite a few multi-model databases to support different data models, but these systems adopt diverse methods for data storage and querying: novices must learn each system's query language separately, and users must manage complex, dynamically evolving schemas in order to formulate queries. Considering this, we present our first research topic: how to employ keyword search as an alternative way to explore and query multi-model databases. Besides, compared with the mature and robust relational databases dominating the current market, multi-model databases, which cannot yet match them in transaction management, query optimization, security, etc., still need time to perfect their mathematical foundations and boost performance. Considering this, we present our second research topic: how to use relational databases as an alternative way to store and query well-structured data and NoSQL data uniformly. For the first research problem, we utilize the probabilistic formalism of quantum physics to cast the problem into vector spaces and exploit non-classical probabilities to find the top-k most relevant results. The second research topic requires designing a good relational schema to store such varied data in relational databases; the challenge is to bridge the structural difference between flat relational tables and complex multi-model data. To address this problem, we review all relevant work, analyze existing methods, and give a literature review. We find that existing work focuses on handling a single data model with relational databases; no prior research handles multi-model data. Against this challenge, we employ reinforcement learning, because this method can automatically obtain an excellent relational schema from the given multi-model data and queries by interacting with the environment. To make this idea work in the field of databases, we define the input, goal, reward, policy, and observation accordingly. In addition, we present a Double Q-tables algorithm that helps decrease the complexity of the learning process.
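    The abstract does not specify the Double Q-tables algorithm itself. As a rough illustration of the general idea of maintaining two Q-tables, below is a minimal tabular double Q-learning update (van Hasselt's scheme), with states and actions standing in for hypothetical schema-design decisions; every name, action, and reward here is an assumption, not the thesis's method.

        import random
        from collections import defaultdict

        def double_q_update(qa, qb, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
            """One tabular double Q-learning update.

            The two tables are updated alternately: one selects the best
            next action, the other evaluates it, reducing overestimation.
            """
            if random.random() < 0.5:
                best = max(actions, key=lambda x: qa[(s_next, x)])
                qa[(s, a)] += alpha * (reward + gamma * qb[(s_next, best)] - qa[(s, a)])
            else:
                best = max(actions, key=lambda x: qb[(s_next, x)])
                qb[(s, a)] += alpha * (reward + gamma * qa[(s_next, best)] - qb[(s, a)])

        qa, qb = defaultdict(float), defaultdict(float)
        actions = ["split_table", "merge_table"]  # hypothetical schema actions
        # Hypothetical transition: schema state s0, action taken, observed query cost as reward.
        double_q_update(qa, qb, "s0", "split_table", reward=1.0, s_next="s1", actions=actions)
        print(dict(qa), dict(qb))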

    Query Answering in Probabilistic Data and Knowledge Bases

    Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art for storing and processing such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations not only lead to unwanted consequences but also put such systems on weak footing in important tasks, query answering being a central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means of querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties, identify sources of (in)tractability, and design practical, scalable query answering algorithms whenever possible. To achieve this, the current work brings together recent paradigms from logic, probabilistic inference, and database theory.
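    To ground the tuple-independence assumption mentioned above: in a tuple-independent probabilistic database, each tuple is present independently with its own marginal probability, so the probability of a query answer with several independent derivations is one minus the product of the failure probabilities. A minimal sketch under that assumption (names are illustrative):

        def answer_probability(derivation_probs):
            """Probability that at least one independent derivation holds.

            derivation_probs: marginal probabilities of the tuples (or
            disjoint tuple combinations) that each derive the answer.
            """
            p_none = 1.0
            for p in derivation_probs:
                p_none *= 1.0 - p
            return 1.0 - p_none

        # An answer derived by two independent tuples with probabilities 0.8 and 0.5:
        print(answer_probability([0.8, 0.5]))  # 0.9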

    A Labeling DOM-Based Tree Walking Algorithm for Mapping XML Documents into Relational Databases

    XML has emerged as the standard format for representing and exchanging data on the World Wide Web. For practical purposes, efficient mechanisms to store and query XML data are critical to exploiting the full power of this technology. Several researchers have proposed using relational databases to store and query XML data. With an understanding of the limitations of current approaches, this thesis proposes an algorithm for automatically mapping XML documents to an RDBMS, with an XML API as a database utility. The algorithm uses a best-fit auto-mapping technique and dynamic shredding for a specified XML document type (data-centric, document-centric, or mixed documents). The proposed algorithm uses the DOM (Document Object Model) as a warehouse and a stack as its data structure for mapping an XML document into a relational database and reconstructing the XML document from the relational database. The experimental study shows that the algorithm maps a document and reconstructs it again correctly. Finally, compared with other algorithms, it performs well in time and efficiency, and its complexity is O(11n + 2), i.e., linear in the document size.
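    As a rough illustration of the described approach (an explicit stack driving a DOM tree walk that shreds elements into parent-linked relational rows), here is a sketch using Python's standard xml.dom.minidom. The labeling scheme and row layout are simplifications for illustration, not the thesis's algorithm.

        from xml.dom import minidom

        def shred(xml_text):
            """Walk a DOM tree with an explicit stack, emitting relational rows.

            Each element becomes a (node_id, parent_id, tag, text) row, a
            simplified parent-edge labeling of the document tree.
            """
            doc = minidom.parseString(xml_text)
            rows, next_id = [], 0
            stack = [(doc.documentElement, None)]  # (element, parent_id)
            while stack:
                elem, parent_id = stack.pop()
                node_id, next_id = next_id, next_id + 1
                text = "".join(c.data for c in elem.childNodes
                               if c.nodeType == c.TEXT_NODE).strip()
                rows.append((node_id, parent_id, elem.tagName, text))
                for child in elem.childNodes:
                    if child.nodeType == child.ELEMENT_NODE:
                        stack.append((child, node_id))
            return rows

        for row in shred("<book><title>XML</title><year>2004</year></book>"):
            print(row)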