200 research outputs found

    Intensional Query Processing in Deductive Database Systems.

    This dissertation addresses the problem of deriving a set of non-ground first-order logic formulas (intensional answers), rather than a set of facts (extensional answers), as the answer set to a given query in deductive database (DDB) systems based on non-recursive Horn clauses. Previous work in this area uses resolution to derive intensional answers. It leaves, however, several important problems open: no specific resolution strategy is given; no specific methodology for formalizing meaningful intensional answers is given; no solution is given to handle large facts in extensional databases (EDBs); and no strategy is given to avoid deriving meaningless intensional answers. As a solution, a three-stage formalization process (pre-resolution, resolution, and post-resolution) for deriving meaningful intensional answers is proposed, which addresses all of the problems mentioned above. A specific resolution strategy called SLD-RC resolution is proposed, which can derive a set of meaningful intensional answers. The notions of relevant literals and relevant clauses are introduced to avoid deriving meaningless intensional answers. The soundness and completeness of SLD-RC resolution for intensional query processing are proved. An algorithm for the three-stage formalization process is presented, and its correctness is proved. Furthermore, it is shown that there are two relationships between intensional answers and extensional answers. In the syntactic relationship, intensional answers are sufficient conditions for deriving extensional answers. In the semantic relationship, intensional answers are necessary and sufficient conditions for deriving extensional answers. Based on these relationships, the notions of global and local completeness of an intensional database (IDB) are defined. It is proved that all incomplete IDBs can be transformed into globally complete IDBs, in which all extensional answers can be generated by evaluating intensional answers against an EDB. We claim that intensional query processing provides a new methodology for query processing in DDBs and thus, by extending the categories of queries, will greatly increase our insight into the nature of DDBs.

    Investigating Information Structure of Phishing Emails Based on Persuasive Communication Perspective

    Current phishing filters classify messages based on textually discernible features, such as IP-based URLs or domain names, because those features can be easily extracted from a given phishing message. However, by the same token, those easily perceptible features can be easily manipulated by sophisticated phishers. Therefore, it is important to identify universal patterns of phishing messages for feature extraction to serve as a basis for text classification. In this paper, we demonstrate that user perception of phishing messages can be identified in the central and peripheral routes of information processing. We also present a method of formulating a quantitative model that can represent the persuasive information structure in phishing messages. This paper contributes to phishing classification research by presenting the idea of a universal information structure in terms of persuasive communication theories.

    A Logical Model and Data Placement Strategies for MEMS Storage Devices

    MEMS storage devices are new non-volatile secondary storage devices that have outstanding advantages over magnetic disks. MEMS storage devices, however, differ considerably from magnetic disks in structure and access characteristics. They have thousands of heads called probe tips and provide the following two major access facilities: (1) flexibility: freely selecting a set of probe tips for accessing data, and (2) parallelism: simultaneously reading and writing data with the selected set of probe tips. Due to these characteristics, it is nontrivial to find data placements that fully utilize the capability of MEMS storage devices. In this paper, we propose a simple logical model called the Region-Sector (RS) model that abstracts the major characteristics affecting data retrieval performance, such as flexibility and parallelism, from the physical MEMS storage model. We also suggest heuristic data placement strategies based on the RS model and derive new data placements for relational data and two-dimensional spatial data by using those strategies. Experimental results show that, compared with existing data placements, the proposed data placements improve data retrieval performance by up to 4.0 times for relational data and by up to 4.8 times for two-dimensional spatial data of approximately 320 Mbytes. Further, these improvements are expected to be more marked as the database size grows.

    An alternative view on data processing pipelines from the DOLAP 2019 perspective

    Data science requires constructing data processing pipelines (DPPs), which span diverse phases such as data integration, cleaning, pre-processing, and analysis. However, current solutions lack a strong data engineering perspective. As a consequence, DPPs are error-prone, inefficient w.r.t. human effort, and inefficient w.r.t. execution time. We claim that DPP design, development, testing, deployment, and execution should benefit from a standardized DPP architecture and from well-known data engineering solutions. This claim is supported by our experience in real projects and by trends in the field, and it opens new paths for research and technology. In this spirit, we outline five research opportunities that represent novel trends towards building DPPs. Finally, we highlight that the best DOLAP 2019 papers selected for the DOLAP 2019 Information Systems Special Issue fall into this category and underline the relevance of advanced data engineering for data science.

    A UML profile for multidimensional modeling in data warehouses

    Multidimensional (MD) modeling, which is the foundation of data warehouses (DWs), MD databases, and On-Line Analytical Processing (OLAP) applications, is based on several properties different from those in traditional database modeling. In the past few years, there have been some proposals, each providing its own formal and graphical notation, for representing the main MD properties at the conceptual level. However, none of them has been accepted as a standard for conceptual MD modeling. In this paper, we present an extension of the Unified Modeling Language (UML) using a UML profile. This profile is defined by a set of stereotypes, constraints, and tagged values to elegantly represent the main MD properties at the conceptual level. We make use of the Object Constraint Language (OCL) to specify the constraints attached to the defined stereotypes, thereby avoiding an arbitrary use of these stereotypes. We have based our proposal on UML for two main reasons: (i) UML is a well-known standard modeling language familiar to most database designers, so designers can avoid learning a new notation, and (ii) UML can be easily extended so that it can be tailored to a specific domain with concrete peculiarities, such as multidimensional modeling for data warehouses. Moreover, our proposal is Model Driven Architecture (MDA) compliant, and we use the Query View Transformation (QVT) approach for the automatic generation of the implementation in a target platform. Throughout the paper, we describe how to easily accomplish the MD modeling of DWs at the conceptual level. Finally, we show how to use our extension in Rational Rose for MD modeling. This work has been partially supported by the METASIGN project (TIN2004-00779) from the Spanish Ministry of Education and Science, by the DADASMECA project (GV05/220) from the Regional Government of Valencia, and by the MESSENGER (PCC-03-003-1) and DADS (PBC-05-012-2) projects from the Regional Science and Technology Ministry of Castilla-La Mancha (Spain).

    Focused multi-document summarization: Human summarization activity vs. automated systems techniques

    Focused Multi-Document Summarization (MDS) is concerned with summarizing documents in a collection with a concentration toward a particular external request (e.g., a query, question, or topic), or focus. Although the current state of the art provides somewhat decent performance for DUC/TAC-like evaluations (i.e., government and news concerns), other considerations need to be explored. This paper briefly explores the state of the art in automatic systems techniques and compares it with human summarization activity.

    Applying UML and XML for designing and interchanging information for data warehouses and OLAP applications

    Multidimensional (MD) modeling is the basis for data warehouses (DW), multidimensional databases (MDB), and on-line analytical processing (OLAP) applications. In this paper, we present how the Unified Modeling Language (UML) can be successfully used to represent both structural and dynamic properties of these systems at the conceptual level. The structure of the system is specified by means of a UML class diagram that considers the main properties of MD modeling with minimal use of constraints and extensions of the UML. If the system to be modeled is so complex that it leads to a considerable number of classes and relationships, we describe how to use the package grouping mechanism provided by the UML to simplify the final model. Furthermore, we provide a UML-compliant class notation (called cube class) to represent OLAP users’ initial requirements. We also describe how the UML state and interaction diagrams can be used to model the behavior of a data warehouse system. To facilitate the interchange of conceptual MD models, we provide a Document Type Definition (DTD) that allows us to represent the same MD modeling properties that can be considered by using our approach. From this DTD, we can directly generate valid eXtensible Markup Language (XML) documents that represent MD models at the conceptual level. We believe that our innovative approach provides a theoretical foundation for simplifying the conceptual design of MD systems, and the examples included in this paper clearly illustrate the use of our approach.