41 research outputs found

    Adaptive Merging on Phase Change Memory

    Full text link
    Indexing is a well-known database technique used to facilitate data access and speed up query processing. Nevertheless, building and modifying indexes is expensive. In traditional approaches, all records in a database table are covered equally by the index. This is inefficient, since some records may be queried very often and others never. To avoid this problem, adaptive merging has been introduced. The key idea is to create the index adaptively and incrementally as a side-product of query processing. As a result, the database table is indexed partially, depending on the query workload. This paper addresses the problem of adaptive merging for phase change memory (PCM). The most important characteristics of this memory type are limited write endurance and high write latency. As a consequence, adaptive merging has to be redesigned from scratch. We solve this problem in two steps. First, we apply several PCM optimization techniques to the traditional adaptive merging approach and show that the resulting method (eAM) outperforms the traditional approach by 60%. We then introduce a framework for adaptive merging (PAM) and a new PCM-optimized index, which further improve system performance by 20% for databases where search queries interleave with data modifications.
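    The core idea of adaptive merging, building the index incrementally as a by-product of range queries, can be sketched as follows. This is an illustrative toy (the class name, run size, and in-memory lists are invented here), and it omits the paper's PCM-specific optimizations such as write reduction:

```python
import bisect

class AdaptiveMergeIndex:
    """Toy sketch of adaptive merging: the index is built as a
    side-effect of range queries, so only queried key ranges end up
    in the final, fully sorted partition."""

    def __init__(self, records, run_size=4):
        # Initial pass: split the unindexed table into sorted runs.
        self.runs = [sorted(records[i:i + run_size])
                     for i in range(0, len(records), run_size)]
        self.final = []  # merged, fully indexed keys

    def _slice(self, run, lo, hi):
        # Remove and return the keys in [lo, hi] from one sorted run.
        i, j = bisect.bisect_left(run, lo), bisect.bisect_right(run, hi)
        hit, run[i:j] = run[i:j], []
        return hit

    def range_query(self, lo, hi):
        # Serve the query AND merge the touched keys into the final index,
        # so repeated queries over this range never touch the runs again.
        for run in self.runs:
            for key in self._slice(run, lo, hi):
                bisect.insort(self.final, key)
        i = bisect.bisect_left(self.final, lo)
        j = bisect.bisect_right(self.final, hi)
        return self.final[i:j]

idx = AdaptiveMergeIndex([9, 2, 7, 4, 8, 1, 5, 3])
hits = idx.range_query(3, 6)
```

    After the first query, the keys 3, 4, and 5 have migrated into the final partition, so a repeated query over the same range is answered from the index alone.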

    SHARE : A Framework for Personalized and Healthy Recipe Recommendations

    Get PDF
    This paper presents a personalized recommendation system that suggests recipes to users based on their health history and similar users' preferences. Specifically, the system utilizes collaborative filtering to determine other users with similar dietary preferences and exploits this information to identify suitable recipes for an individual. The system is able to handle a wide range of health constraints, preferences, and specific diet plans, such as low-carb or vegetarian. We demonstrate the usability of the system through a series of experiments on a large real-world data set of recipes. The results indicate that our system is able to provide highly personalized and accurate recommendations.
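    The collaborative-filtering step described above can be sketched in a few lines. This is a minimal user-user variant on invented toy data (the function names, the ratings, and the `allowed` health filter are assumptions for illustration, not the paper's implementation):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two {recipe: rating} dicts."""
    shared = set(u) & set(v)
    num = sum(u[r] * v[r] for r in shared)
    den = (sqrt(sum(x * x for x in u.values()))
           * sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def recommend(target, others, allowed, k=2):
    """User-user collaborative filtering: score recipes the target has
    not tried by the similarity-weighted ratings of the k most similar
    users, keeping only recipes that pass the health filter."""
    neighbours = sorted(others.values(),
                        key=lambda ratings: cosine(target, ratings),
                        reverse=True)[:k]
    scores = {}
    for ratings in neighbours:
        w = cosine(target, ratings)
        for recipe, rating in ratings.items():
            if recipe not in target and allowed(recipe):
                scores[recipe] = scores.get(recipe, 0.0) + w * rating
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: ratings on a 1-5 scale; pizza is excluded by the health filter.
alice = {"salad": 5, "soup": 4}
others = {"bob": {"salad": 5, "soup": 4, "stew": 5},
          "eve": {"pizza": 5}}
recs = recommend(alice, others, allowed=lambda r: r != "pizza")
```

    Here `alice` overlaps strongly with `bob`, so his unseen, diet-compatible recipe is recommended first.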

    The GDPR enforcement fines at glance

    Get PDF
    The General Data Protection Regulation (GDPR) came into force in 2018. Since then, many fines have been imposed by national data protection authorities in Europe. This paper examines the individual GDPR articles referenced in the enforcement decisions, and predicts the amounts of the enforcement fines from available meta-data and text mining features extracted from the enforcement decision documents. According to the results, three articles, related to the general principles, lawfulness, and information security, have been the most frequently referenced ones. Although the amounts of the fines imposed vary across the articles referenced, these three particular articles do not stand out. Stronger statistical evidence is available from other meta-data features, including information about the particular European countries in which the enforcements were made. Accurate predictions are attainable even with simple machine learning techniques for regression analysis, and basic text mining features outperform the meta-data features in this regard. In addition to these results, the paper reflects on the GDPR's enforcement against public administration obstacles in the European Union (EU) and discusses the use of automatic decision-making systems in the judiciary.
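    The "simple machine learning techniques for regression analysis" mentioned above amount to fitting a linear model over extracted features. A minimal sketch, on invented synthetic data (the function name, learning rate, and the single toy feature are assumptions, not the paper's actual feature set or model):

```python
def ridge_gd(X, y, lam=0.0, lr=0.05, epochs=5000):
    """Linear/ridge regression fitted by plain gradient descent.
    X is a list of feature rows (e.g. counts of referenced GDPR
    articles or text mining features); y holds the fine amounts."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # L2 penalty gradient
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Synthetic demo: the target is exactly 2 * feature + 1.
w, b = ridge_gd([[1.0], [2.0], [3.0], [4.0]], [3.0, 5.0, 7.0, 9.0])
```

    With real enforcement decisions, the feature rows would be replaced by the paper's meta-data or bag-of-words features; the fitting procedure stays the same.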

    Computational and human-based methods for knowledge discovery over knowledge graphs

    Get PDF
    The modern world has evolved, accompanied by a huge exploitation of data and information. Daily, increasing volumes of data from various sources and in various formats are stored, making it challenging to manage and integrate them to discover new knowledge. The appropriate use of data in various sectors of society, such as education, healthcare, e-commerce, and industry, provides advantages for decision support in these areas. However, knowledge discovery becomes challenging since data may come from heterogeneous sources with important information hidden in them. Thus, new approaches that adapt to the challenges of knowledge discovery in such heterogeneous data environments are required. The semantic web and knowledge graphs (KGs) are becoming increasingly relevant on the road to knowledge discovery. This thesis tackles the problem of knowledge discovery over KGs built from heterogeneous data sources. We provide a neuro-symbolic artificial intelligence system that integrates symbolic and sub-symbolic frameworks to exploit the semantics encoded in a KG and its structure. The symbolic system relies on existing approaches from deductive databases to make explicit the implicit knowledge encoded in a KG. The proposed deductive database, DSDS, can derive new statements for ego networks given an abstract target prediction; DSDS thus minimizes data sparsity in KGs. In addition, a sub-symbolic system relies on knowledge graph embedding (KGE) models. KGE models are commonly applied in the KG completion task to represent the entities of a KG in a low-dimensional vector space. However, KGE models are known to suffer from data sparsity, and the symbolic system assists in overcoming this limitation. The proposed approach discovers knowledge given a target prediction in a KG and extracts unknown implicit information related to the target prediction.
    As a proof of concept, we have implemented the neuro-symbolic system on top of a KG for lung cancer to predict polypharmacy treatment effectiveness. The symbolic system implements a deductive system that deduces pharmacokinetic drug-drug interactions encoded in a set of rules through a Datalog program. Additionally, the sub-symbolic system predicts treatment effectiveness using a KGE model, which preserves the KG structure. An ablation study of the components of our approach is conducted, considering state-of-the-art KGE methods. The observed results provide evidence for the benefits of the neuro-symbolic integration: the neuro-symbolic system exhibits improved results for an abstract target prediction because the symbolic system increases the prediction capacity of the sub-symbolic system. Moreover, the proposed neuro-symbolic artificial intelligence system is also evaluated in Industry 4.0 (I4.0), demonstrating its effectiveness in determining relatedness among standards and in analyzing their properties to detect unknown relations in the I4.0KG. The results achieved allow us to conclude that the proposed neuro-symbolic approach for an abstract target prediction improves the prediction capability of KGE models by minimizing data sparsity in KGs.
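    The Datalog-style deduction of drug-drug interactions can be illustrated with naive forward chaining over triples. The fact and predicate names below are invented for illustration; the thesis's actual rule set and DSDS implementation are not reproduced here:

```python
def forward_chain(facts, rules):
    """Naive Datalog-style forward chaining: repeatedly apply every
    rule to the known facts until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = set(rule(facts)) - facts
            if new:
                facts |= new
                changed = True
    return facts

# Hypothetical rule: if drug A inhibits an enzyme that metabolises
# drug B, derive a pharmacokinetic interaction between A and B.
def ddi_rule(facts):
    for p1, a, e1 in facts:
        if p1 != "inhibits":
            continue
        for p2, b, e2 in facts:
            if p2 == "metabolised_by" and e2 == e1 and b != a:
                yield ("interacts", a, b)

facts = {("inhibits", "drugA", "CYP3A4"),
         ("metabolised_by", "drugB", "CYP3A4")}
derived = forward_chain(facts, [ddi_rule])
```

    The derived `interacts` statements are exactly the kind of implicit knowledge the symbolic system makes explicit before the KGE model is trained.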

    Course Recommendation based on Sequences: An Evolutionary Search of Emerging Sequential Patterns

    Get PDF
    Open Access funding was provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the Spanish Ministry of Science and Innovation (project PID2020-115832GBI00) and the University of Cordoba (project UCO-FEDER 18 REF.1263116 MOD.A); both projects were also supported by the European Regional Development Fund. Providing a good study plan is key to avoiding student failure. Academic advising based on students' preferences, the complexity of the semester, or even background knowledge is usually considered to reduce the dropout rate. This article aims to provide a good course index for recommending courses to students based on the sequence of courses each student has already taken. Hence, unlike existing long-term course planning methods, it models courses based on graduate students rather than on external factors that might introduce bias into the process. The proposal includes a novel sequential pattern mining algorithm, called (ES)2P (Evolutionary Search of Emerging Sequential Patterns), that identifies paths followed by good students but not by weaker ones, as a long-term course planning approach. A major feature of the proposed (ES)2P algorithm is its ability to extract the best k solutions, that is, those with the best recommendation index scores, instead of returning the whole set of solutions above a predefined threshold. A real case study including more than 13,000 students from 13 faculties demonstrates the usefulness of the proposal, not only for recommending study plans but also for giving advice at different stages of the students' learning process.
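    The notion of an emerging sequential pattern, a course subsequence frequent among good students but rare among weaker ones, can be illustrated with a brute-force stand-in for the evolutionary search. All names and toy course sequences below are invented; (ES)2P itself explores the candidate space evolutionarily rather than exhaustively:

```python
from itertools import combinations

def is_subseq(pattern, seq):
    """True if `pattern` appears in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(course in it for course in pattern)

def support(pattern, sequences):
    return sum(is_subseq(pattern, s) for s in sequences) / len(sequences)

def top_k_emerging(good, bad, length=2, k=3):
    """Rank order-preserving course subsequences by their growth rate
    support(good) / support(bad), breaking ties by support in `good`,
    and return the k best (cf. the best-k extraction in (ES)2P)."""
    candidates = {p for s in good for p in combinations(s, length)}
    def score(p):
        sg, sb = support(p, good), support(p, bad)
        growth = (float("inf") if sg > 0 else 0.0) if sb == 0 else sg / sb
        return (growth, sg)
    return sorted(candidates, key=score, reverse=True)[:k]

# Toy data: sequences of courses taken by good and weaker students.
good = [["algebra", "calculus", "ml"], ["algebra", "ml"]]
bad = [["algebra", "calculus", "history"], ["calculus", "history"]]
top = top_k_emerging(good, bad, length=2, k=1)
```

    The pattern followed by every good student and no weaker one surfaces first, which is the kind of path the recommender would then suggest.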

    A Knowledge Graph Based Integration Approach for Industry 4.0

    Get PDF
    The fourth industrial revolution, Industry 4.0 (I40), aims at creating smart factories that employ, among other technologies, Cyber-Physical Systems (CPS), the Internet of Things (IoT), and Artificial Intelligence (AI). Realizing smart factories according to the I40 vision requires intelligent human-to-machine and machine-to-machine communication. To achieve this communication, CPS, along with their data, need to be described, and interoperability conflicts arising from various representations need to be resolved. For establishing interoperability, industry communities have created standards and standardization frameworks. Standards describe the main properties of entities, systems, and processes, as well as the interactions among them. Standardization frameworks classify, align, and integrate industrial standards according to their purposes and features. Despite being published by official international organizations, different standards may contain divergent definitions for similar entities. Further, when utilizing the same standard for the design of a CPS, different views can generate interoperability conflicts. Albeit expressive, standardization frameworks may also represent divergent categorizations of the same standard. To support effective and efficient communication in smart factories, these interoperability conflicts need to be resolved: data need to be semantically integrated and existing conflicts conciliated. This problem has been extensively studied in the literature, and the results obtained can be applied to general integration problems. However, current approaches fail to consider the specific interoperability conflicts that occur between entities in I40 scenarios. In this thesis, we tackle the problem of semantic data integration in I40 scenarios and present a knowledge graph-based approach that integrates entities in I40 while considering their semantics.
    To achieve this integration, challenges need to be addressed on different conceptual levels: firstly, defining mappings between standards and standardization frameworks; secondly, representing knowledge of entities in I40 scenarios described by standards; thirdly, integrating perspectives of CPS design while solving semantic heterogeneity issues; and finally, determining real industry applications for the presented approach. We first devise a knowledge-driven approach that integrates standards and standardization frameworks into an Industry 4.0 knowledge graph (I40KG). The standards ontology is used to represent the main properties of standards and standardization frameworks, as well as the relationships among them. The I40KG makes it possible to integrate standards and standardization frameworks while solving specific semantic heterogeneity conflicts in the domain. Further, we semantically describe standards in knowledge graphs. To this end, standards of core importance for I40 scenarios are considered, i.e., the Reference Architectural Model for I40 (RAMI4.0), AutomationML, and the Supply Chain Operation Reference Model (SCOR). In addition, different perspectives of entities describing CPS are integrated into the knowledge graphs. To evaluate the proposed methods, we rely on empirical evaluations as well as on the development of concrete use cases. The attained results provide evidence that a knowledge graph approach enables the effective data integration of entities in I40 scenarios while solving semantic interoperability conflicts, thus empowering communication in smart factories.
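    Integrating divergent definitions of the same entity across standards boils down to equivalence resolution over KG triples. A minimal sketch on invented triples (the entity names and the `sameAs` predicate stand in for the I40KG's actual mapping vocabulary):

```python
def saturate_same_as(triples):
    """Toy KG integration: treat `sameAs` links between standard
    entities as equivalences (union-find) and rewrite every triple
    onto one canonical representative per equivalence class."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for s, p, o in triples:
        if p == "sameAs":
            union(s, o)
    return {(find(s), p, find(o)) for s, p, o in triples if p != "sameAs"}

# Hypothetical triples: RAMI4.0 and AutomationML describe the same entity.
triples = {("RAMI4.0:Asset", "sameAs", "AML:Object"),
           ("RAMI4.0:Asset", "type", "I40:Entity"),
           ("AML:Object", "definedBy", "AutomationML")}
merged = saturate_same_as(triples)
```

    After saturation, the facts attached to both aliases live under a single canonical node, which is the basic effect a knowledge-graph integration of standards aims for.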

    Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data

    Full text link
    We study black-box model stealing attacks, where the attacker can query a machine learning model only through publicly available APIs. Specifically, our aim is to design a black-box model extraction attack that uses a minimal number of queries to create an informative and distributionally equivalent replica of the target model. First, we define distributionally equivalent and max-information model extraction attacks. Then, we reduce both attacks to a variational optimisation problem. The attacker solves this problem to select the most informative queries, which simultaneously maximise the entropy and reduce the mismatch between the target and the stolen models. This leads us to an active sampling-based query selection algorithm, Marich. We evaluate Marich on different text and image data sets and different models, including BERT and ResNet18. Marich extracts models that achieve 69-96% of the true model's accuracy while using 1,070-6,950 samples from publicly available query datasets, which are different from the private training datasets. Models extracted by Marich yield prediction distributions that are ∼2-4× closer to the target's distribution than those of existing active sampling-based algorithms. The extracted models also attain 85-95% accuracy under membership inference attacks. The experimental results validate that Marich is query-efficient and capable of performing task-accurate, high-fidelity, and informative model extraction. Comment: Presented in the Privacy-Preserving AI (PPAI) workshop at AAAI 2023 as a spotlight talk.
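    The entropy-maximising query selection at the heart of such active sampling can be sketched as follows. This toy (function names and the surrogate's probability table are invented) only shows the selection criterion, not Marich's full variational objective or distribution matching:

```python
from math import log

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * log(p) for p in probs if p > 0)

def select_queries(pool, surrogate, budget):
    """Entropy-based active sampling, in the spirit of Marich's query
    selection: from a public pool, pick the `budget` inputs on which
    the current surrogate (stolen) model is most uncertain; only these
    are then sent to the target API, keeping the query count low."""
    return sorted(pool,
                  key=lambda x: entropy(surrogate(x)),
                  reverse=True)[:budget]

# Toy surrogate: confident on inputs 0 and 2, uncertain on input 1.
def toy_surrogate(x):
    return {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.95, 0.05]}[x]

chosen = select_queries([0, 1, 2], toy_surrogate, budget=1)
```

    In the full attack this selection runs in a loop: query the target on the chosen inputs, retrain the surrogate on the returned labels, and reselect.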