7 research outputs found

    Integrating prior knowledge into factorization approaches for relational learning

    Get PDF
    An efficient way to represent domain knowledge is relational data, where information is recorded in the form of relationships between entities. Relational data has become ubiquitous for knowledge representation because much real-world data is inherently interlinked. Some well-known examples of relational data are: the World Wide Web (WWW), a system of interlinked hypertext documents; the Linked Open Data (LOD) cloud of the Semantic Web, a collection of published data and their interlinks; and the Internet of Things (IoT), a network of physical objects with internal states and communication abilities. Relational data has been addressed by many different machine learning approaches; the most promising ones are in the area of relational learning, which is the focus of this thesis. While conventional machine learning algorithms consider entities as independent instances randomly sampled from some statistical distribution and represented as data points in a vector space, relational learning takes the overall network environment into account when predicting the label of an entity, an attribute value of an entity, or the existence of a relationship between entities. An important feature is that relational learning can exploit contextual information that is more distant in the relational network. As the volume and structural complexity of relational data increase constantly in the era of Big Data, scalability and modeling power become crucial for relational learning algorithms. Previous relational learning algorithms either provide an intuitive representation of the model, such as Inductive Logic Programming (ILP) and Markov Logic Networks (MLNs), or assume a set of latent variables to explain the observed data, such as the Infinite Hidden Relational Model (IHRM), the Infinite Relational Model (IRM), and factorization approaches. Models with intuitive representations often involve some form of structure learning, which leads to scalability problems due to a typically large search space. Factorizations are among the best-performing approaches for large-scale relational learning, since the algebraic computations can easily be parallelized and since they can exploit data sparsity. Previous factorization approaches exploit only patterns in the relational data itself, and the focus of this thesis is to investigate how additional prior information (comprehensive information), either in the form of unstructured data (e.g., texts) or structured patterns (e.g., rules), can be incorporated into factorization approaches. The goal is, on the one hand, to enhance the predictive power of factorization approaches by involving prior knowledge in the learning and, on the other hand, to reduce the model complexity for efficient learning. This thesis contains two main contributions. The first contribution presents a general and novel framework for predicting relationships in multirelational data using a set of matrices describing the various instantiated relations in the network. The instantiated relations, derived or learnt from prior knowledge, are integrated as entities' attributes or entity pairs' attributes into different adjacency matrices for the learning. All the available information is then combined in an additive way. Efficient learning is achieved using an alternating least squares approach that exploits sparse matrix algebra and low-rank approximation.
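As a rough illustration of the alternating least squares scheme just described, the sketch below factorizes a sparse adjacency matrix whose columns have been augmented with attribute columns derived from prior knowledge. All names, shapes, and hyperparameters are illustrative assumptions, not the thesis's actual implementation.

# Minimal ALS sketch (assumed setup): factorize X = [adjacency | attributes]
# as X ~ A @ B.T with a small rank r, exploiting sparse matrix algebra.
import numpy as np
import scipy.sparse as sp

def als_factorize(X, rank, reg=0.1, iters=30, seed=0):
    """Alternating least squares for X ~ A @ B.T, X sparse of shape (n, m)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = rng.standard_normal((n, rank))
    B = rng.standard_normal((m, rank))
    I = reg * np.eye(rank)
    for _ in range(iters):
        # Fix B and solve the regularized least-squares problem for A.
        A = (X @ B) @ np.linalg.inv(B.T @ B + I)
        # Fix A and solve the corresponding problem for B.
        B = (X.T @ A) @ np.linalg.inv(A.T @ A + I)
    return A, B

# Toy usage: a 5-entity adjacency block plus 2 attribute columns,
# combined additively into one matrix before factorization.
adj = sp.random(5, 5, density=0.4, random_state=0, format="csr")
attrs = sp.random(5, 2, density=0.5, random_state=1, format="csr")
X = sp.hstack([adj, attrs], format="csr")
A, B = als_factorize(X, rank=2)
scores = A @ B.T   # reconstruction; unobserved adjacency cells act as link predictions
print(scores[:, :5])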
As an illustration, several algorithms are proposed to include information extraction, deductive reasoning, and contextual information in matrix factorizations for Semantic Web scenarios and for recommendation systems. Experiments on various data sets are conducted for each proposed algorithm to show the improvement in predictive power gained by combining matrix factorizations with prior knowledge in a modular way. In contrast to a matrix, a 3-way tensor is a more natural representation for multirelational data in which entities are connected by different types of relations. A 3-way tensor is a three-dimensional array that represents the multirelational data by using the first two dimensions for entities and the third dimension for the different types of relations. In the thesis, an analysis of the computational complexity of tensor models shows that the decomposition rank is key to the success of an efficient tensor decomposition algorithm, and that the factorization rank can be reduced by including observable patterns. Based on these theoretical considerations, the second contribution of this thesis develops a novel tensor decomposition approach, the Additive Relational Effects (ARE) model, which combines the strengths of factorization approaches and prior knowledge in an additive way to discover different relational effects in the relational data. As a result, ARE consists of a decomposition part, which derives the strong relational learning effects from the highly scalable tensor decomposition approach RESCAL, and a Tucker-1 tensor, which integrates the prior knowledge as instantiated relations. An efficient least squares approach is proposed to compute the combined ARE model. The additive model contains weights that reflect the degree of reliability of the prior knowledge, as evaluated by the data. Experiments on several benchmark data sets show that the inclusion of prior knowledge can lead to better-performing models at a low tensor rank, with significant benefits for run-time and storage requirements. In particular, the results show that ARE outperforms state-of-the-art relational learning algorithms, including intuitive models such as MRC, an approach based on Markov Logic with structure learning; factorization approaches such as Tucker, CP, Bayesian Clustered Tensor Factorization (BCTF), the Latent Factor Model (LFM), and RESCAL; and other latent models such as the IRM. A final experiment on the Cora data set for paper topic classification shows the improvement of ARE over RESCAL in both predictive power and runtime, since ARE requires a significantly lower rank.
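The abstract describes ARE's structure only in words: a RESCAL term A R_k A^T per relation slice plus a Tucker-1-style weighted combination of prior-knowledge matrices, with the weights estimated from the data. The sketch below is a minimal reading of that description; every name and shape is a hypothetical assumption, not the thesis's algorithm.

# Assumed additive structure of ARE: slice k is modeled as the latent
# RESCAL effect A @ R[k] @ A.T plus a data-weighted sum of
# prior-knowledge matrices M[j] (the Tucker-1 part).
import numpy as np

def are_scores(A, R, M, W, k):
    """Predicted slice k: latent factorization effect + weighted priors."""
    latent = A @ R[k] @ A.T
    prior = sum(W[k, j] * M[j] for j in range(len(M)))
    return latent + prior

def fit_prior_weights(X, A, R, M):
    """With the latent factors held fixed, fit per-slice weights W by
    least squares, so the data itself rates each prior's reliability."""
    F = np.stack([m.ravel() for m in M], axis=1)   # (n*n, n_priors)
    W = np.zeros((len(R), len(M)))
    for k in range(len(R)):
        residual = (X[k] - A @ R[k] @ A.T).ravel()
        W[k], *_ = np.linalg.lstsq(F, residual, rcond=None)
    return W

# Toy usage with 4 entities, rank 2, two relations, three priors.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))
R = [rng.standard_normal((2, 2)) for _ in range(2)]
M = [rng.integers(0, 2, (4, 4)).astype(float) for _ in range(3)]
X = [(rng.standard_normal((4, 4)) > 0.5).astype(float) for _ in range(2)]
W = fit_prior_weights(X, A, R, M)
print(are_scores(A, R, M, W, k=0).round(2))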

    A conceptual model of green skills for Malaysian polytechnics

    Get PDF
    Green skills are the skills required in the green job market, which is grounded in a green economy. In Malaysia, however, green skills are still a new topic, and the types and forms of these skills remain to be explored. This study was therefore conducted to explore and produce a Green Skills Conceptual Model for Malaysian Polytechnics (Model Konsep Kemahiran Hijau Politeknik Malaysia) to be embedded in the mechanical, civil, and electrical engineering curricula. The study used an exploratory sequential mixed-methods design involving three phases, focused on the fields of mechanical, civil, and electrical engineering. In phase 1, data related to green skills were obtained through qualitative methods: focus group discussions with curriculum experts and face-to-face interviews with TVET experts. Expert consensus was obtained through the Fuzzy Delphi Technique, and phase 1 identified 31 green skill elements. Before phase 2 began, a questionnaire instrument was developed based on the phase 1 findings. After a pilot study, 8 green skill elements were removed because they did not satisfy the diagnostics performed, and 23 elements were retained for phase 2. In phase 2, the questionnaire was used to collect quantitative data from a sample of 30 industry workers and 320 Malaysian polytechnic lecturers from selected zones of peninsular Malaysia. The quantitative findings were analysed using Winsteps V3.69.1.11. Overall, industry respondents showed a high level of agreement on the importance of the green skill elements (workplace construct mean score = 4.47, academic construct mean score = 4.06, personal effectiveness construct mean score = 4.35). Polytechnic lecturers likewise showed a high level of agreement (workplace construct mean = 4.46, academic construct mean = 4.31, personal effectiveness construct mean = 4.43). However, the analysis revealed significant differences for the elements basic computer skills (t = -2.28), lifelong learning (t = -2.14), social responsibility (t = -2.11), and ethics and professionalism (t = 2.21). The third phase used PCA to validate the green skills conceptual model framework. The PCA showed that the Polytechnic Green Skills Conceptual Model could be validated empirically (workplace construct: RVEM = 42.3%, UV1stC = 2.4, 10%; academic construct: RVEM = 48.3%, UV1stC = 2.7, 9.8%; personal effectiveness construct: RVEM = 41.9%, UV1stC = 2.9, 9.4%).
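The study reports Winsteps' Rasch-based PCA figures (RVEM, UV1stC). As a loose analogy only, the generic PCA check below computes the share of variance captured by the first component of a block of rating items; all data here are simulated and do not come from the study.

# Generic variance-explained check on simulated 1-5 ratings
# (an analogy to, not a reproduction of, the Rasch PCA in the thesis).
import numpy as np

rng = np.random.default_rng(0)
# 320 respondents rating 8 items of one construct on a 1-5 scale.
latent = rng.normal(size=(320, 1))                # shared trait
items = latent @ rng.uniform(0.6, 1.0, (1, 8))    # item loadings
noise = rng.normal(scale=0.8, size=(320, 8))
ratings = np.clip(np.round(3 + items + noise), 1, 5)

centered = ratings - ratings.mean(axis=0)
eigvals = np.linalg.svd(centered, compute_uv=False) ** 2
explained = eigvals / eigvals.sum()
print(f"variance explained by 1st component: {explained[0]:.1%}")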

    Scalable Task Schedulers for Many-Core Architectures

    Get PDF
    This thesis develops schedulers for many-core architectures with different optimization objectives. The proposed schedulers are designed to scale as the number of cores in many-core processors increases, while continuing to provide guarantees on the quality of the schedule.

    Prediction, evolution and privacy in social and affiliation networks

    Get PDF
    In the last few years, there has been growing interest in studying online social and affiliation networks, leading to a new category of inference problems that consider actor characteristics and their social environments. These problems have a variety of applications, from creating more effective marketing campaigns to designing better personalized services. Predictive statistical models allow hidden information in these networks to be learned automatically, but they also raise many privacy concerns. Three of the main challenges that I address in my thesis are understanding 1) how the complex observed and unobserved relationships among actors can help in building better behavior models and in designing more accurate predictive algorithms, 2) what processes drive network growth and link formation, and 3) what the implications of predictive algorithms are for the privacy of users who share content online. The majority of previous work on prediction, evolution, and privacy in online social networks has concentrated on single-mode networks, which form around user-user links such as friendship and email communication. However, single-mode networks often co-exist with two-mode affiliation networks, in which users are linked to other entities, such as social groups, online content, and events. We study the interplay between these two types of networks and show that analyzing these higher-order interactions can reveal dependencies that are difficult to extract from the pairwise interactions alone. In particular, we present our contributions to the challenging problems of collective classification, link prediction, network evolution, anonymization, and privacy preservation in social and affiliation networks. We evaluate our models on real-world data sets from well-known online social networks such as Flickr, Facebook, Dogster, and LiveJournal.
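As a toy illustration of the two-mode signal described above, the sketch below scores potential user-user links by counting shared affiliations; the data and names are invented for illustration and are not from the thesis.

# Score candidate user-user links in a two-mode (user-affiliation)
# network: pairs with more shared affiliations get higher scores.
from collections import defaultdict
from itertools import combinations

# user -> set of affiliations (groups, events, content)
affiliations = {
    "ana":  {"photography", "hiking"},
    "ben":  {"photography", "chess"},
    "cara": {"hiking", "photography"},
    "dan":  {"chess"},
}

def shared_affiliation_scores(affil):
    """Score each user pair by its number of co-memberships."""
    scores = defaultdict(int)
    for u, v in combinations(sorted(affil), 2):
        scores[(u, v)] = len(affil[u] & affil[v])
    return scores

for pair, s in sorted(shared_affiliation_scores(affiliations).items(),
                      key=lambda kv: -kv[1]):
    print(pair, s)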

    Adaptive monitoring and control framework in Application Service Management environment

    Get PDF
    The economics of data centres and cloud computing services have pushed hardware and software requirements to the limits, leaving only a very small performance overhead before systems reach saturation. For Application Service Management (ASM), this carries a growing risk of impacting the execution times of various processes. In order to deliver a stable service at times of great demand for computational power, enterprise data centres and cloud providers must implement fast and robust control mechanisms that are capable of adapting to changing operating conditions while satisfying service-level agreements. In ASM practice, there are normally two methods for dealing with increased load: increasing computational power or releasing load. The first approach typically involves allocating additional machines, which must be available, waiting idle, to deal with high-demand situations. The second approach is implemented by terminating incoming actions that are less important under the new activity demand patterns, by throttling, or by rescheduling jobs. Although most modern cloud platforms and operating systems do not allow adaptive or automatic termination of processes, tasks, or actions, it is common practice for administrators to manually end or stop tasks or actions at any level of the system, such as at the level of a node, function, or process, or to kill a long session that is executing on a database server. In this context, adaptive control of action termination remains a significantly underutilised subject in Application Service Management and deserves further consideration. For example, this approach may be eminently suitable for systems with harsh execution-time service-level agreements, such as real-time systems, systems running under hard pressure on power supplies, systems running under variable priority, or systems running under constraints set by the green computing paradigm. Along this line of work, the thesis investigates the potential of dimension relevance and metric-signal decomposition as methods that would enable more efficient action termination. These methods are integrated into adaptive control emulators and actuators powered by neural networks, which are used to adjust the operation of the system to changing conditions in environments with goals defined from both system performance and economic perspectives. The behaviour of the proposed control framework is evaluated using complex load and service-agreement scenarios for systems compatible with the requirements of on-premises deployments, elastic compute cloud deployments, serverless computing, and microservices architectures.
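The thesis's controllers are driven by neural-network emulators and actuators; the toy threshold policy below is only a stand-in that shows the shape of the "release load by terminating actions" decision, with entirely hypothetical names and numbers.

# Toy "release load" policy: when measured utilisation threatens the
# SLA ceiling, terminate the least-important running actions first.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    priority: int   # higher = more important
    load: float     # fraction of capacity the action consumes

def release_load(actions, utilisation, sla_ceiling=0.9):
    """Drop lowest-priority actions until utilisation fits the SLA."""
    survivors = sorted(actions, key=lambda a: a.priority)
    terminated = []
    while utilisation > sla_ceiling and survivors:
        victim = survivors.pop(0)          # least important first
        utilisation -= victim.load
        terminated.append(victim.name)
    return terminated, utilisation

running = [Action("report-batch", 1, 0.25),
           Action("db-session", 2, 0.20),
           Action("checkout-api", 5, 0.30)]
killed, util = release_load(running, utilisation=1.05)
print(killed, round(util, 2))   # ['report-batch'] 0.8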

    EUROCOMB 21 Book of extended abstracts

    Get PDF

    ILP approaches to the blockmodel problem

    No full text