Integrating prior knowledge into factorization approaches for relational learning
Relational data, in which information is recorded in the form of relationships between entities, is an efficient way to represent domain knowledge. It has become ubiquitous for knowledge representation because much real-world data is inherently interlinked. Well-known examples of relational data are the World Wide Web (WWW), a system of interlinked hypertext documents; the Linked Open Data (LOD) cloud of the Semantic Web, a collection of published data sets and their interlinks; and the Internet of Things (IoT), a network of physical objects with internal states and the ability to communicate. Relational data has been addressed by many different machine learning approaches; the most promising ones lie in the area of relational learning, which is the focus of this thesis. While conventional machine learning algorithms treat entities as independent instances, randomly sampled from some statistical distribution and represented as data points in a vector space, relational learning takes the overall network environment into account when predicting the label of an entity, an attribute value of an entity, or the existence of a relationship between entities. An important feature is that relational learning can exploit contextual information that is more distant in the relational network. As the volume and structural complexity of relational data increase constantly in the era of Big Data, scalability and modeling power become crucial for relational learning algorithms.
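The per-relation adjacency-matrix view of relational data described above can be sketched as follows; the entities, relation names, and triples are made-up toy values for illustration only:

```python
# Minimal sketch: encoding multirelational data as one adjacency matrix per
# relation type. Entities index rows and columns; a 1 at (i, j) records an
# observed relationship. All names and triples below are hypothetical.

entities = ["alice", "bob", "carol"]
relations = ["knows", "worksWith"]
idx = {e: i for i, e in enumerate(entities)}

# Observed (subject, relation, object) triples -- illustrative data.
triples = [("alice", "knows", "bob"),
           ("bob", "knows", "carol"),
           ("alice", "worksWith", "carol")]

n = len(entities)
adjacency = {r: [[0] * n for _ in range(n)] for r in relations}
for s, r, o in triples:
    adjacency[r][idx[s]][idx[o]] = 1

print(adjacency["knows"])  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

In practice such matrices are extremely sparse, which is exactly what the factorization approaches discussed in the abstract exploit.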
Previous relational learning algorithms either provide an intuitive representation of the model, such as Inductive Logic Programming (ILP) and Markov Logic Networks (MLNs), or assume a set of latent variables to explain the observed data, such as the Infinite Hidden Relational Model (IHRM), the Infinite Relational Model (IRM), and factorization approaches. Models with intuitive representations often involve some form of structure learning, which leads to scalability problems due to a typically large search space. Factorizations are among the best-performing approaches for large-scale relational learning, since the algebraic computations can easily be parallelized and since they can exploit data sparsity. Previous factorization approaches exploit only patterns in the relational data itself; the focus of this thesis is to investigate how additional prior information, either in the form of unstructured data (e.g., texts) or of structured patterns (e.g., rules), can be incorporated into factorization approaches. The goal is, on the one hand, to enhance the predictive power of factorization approaches by involving prior knowledge in the learning and, on the other hand, to reduce the model complexity for efficient learning.
This thesis contains two main contributions:
The first contribution presents a general and novel framework for predicting relationships in multirelational data using a set of matrices that describe the various instantiated relations in the network. The instantiated relations, derived or learned from prior knowledge, are integrated into different adjacency matrices as attributes of entities or of entity pairs. All available information is then combined in an additive way. Efficient learning is achieved using an alternating least squares approach that exploits sparse matrix algebra and low-rank approximation. As an illustration, several algorithms are proposed that incorporate information extraction, deductive reasoning, and contextual information into matrix factorizations for Semantic Web scenarios and recommendation systems. Experiments on various data sets are conducted for each proposed algorithm to show the improvement in predictive power gained by combining matrix factorizations with prior knowledge in a modular way.
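The alternating least squares idea mentioned above can be illustrated with a rank-1 toy factorization X ≈ a bᵀ, where each factor has a closed-form ridge update while the other is held fixed. This is only a sketch of the alternation principle; the thesis works at higher ranks with sparse algebra and additional attribute matrices, and the matrix below is made up:

```python
# Rank-1 alternating least squares sketch: alternately solve the regularized
# least squares problem for a (with b fixed) and for b (with a fixed).
# X is an illustrative symmetric adjacency matrix.

X = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
n, lam = len(X), 0.01          # lam: small ridge term for stability
a, b = [1.0] * n, [1.0] * n    # factor initialization

def resid(X, a, b):
    """Squared reconstruction error of the rank-1 model a b^T."""
    return sum((X[i][j] - a[i] * b[j]) ** 2 for i in range(n) for j in range(n))

for _ in range(50):
    bb = sum(v * v for v in b)
    a = [sum(X[i][j] * b[j] for j in range(n)) / (bb + lam) for i in range(n)]
    aa = sum(v * v for v in a)
    b = [sum(X[i][j] * a[i] for i in range(n)) / (aa + lam) for j in range(n)]

start = resid(X, [1.0] * n, [1.0] * n)
print(resid(X, a, b) < start)  # True: each alternation step lowers the error
```

Because every update is a least squares solve, the same scheme extends naturally to sparse matrices and to stacking extra attribute matrices next to X, which is the modular combination the abstract refers to.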
In contrast to a matrix, a 3-way tensor is a more natural representation for multirelational data in which entities are connected by different types of relations: a three-dimensional array whose first two dimensions index the entities and whose third dimension indexes the relation types. In the thesis, an analysis of the computational complexity of tensor models shows that the decomposition rank is key to the success of an efficient tensor decomposition algorithm, and that the factorization rank can be reduced by including observable patterns. Based on these theoretical considerations, the second contribution of this thesis develops a novel tensor decomposition approach, the Additive Relational Effects (ARE) model, which combines the strengths of factorization approaches and prior knowledge in an additive way to discover different relational effects in the relational data. ARE consists of a decomposition part, which derives the strong relational learning effects from the highly scalable tensor decomposition approach RESCAL, and a Tucker-1 tensor, which integrates the prior knowledge as instantiated relations. An efficient least squares approach is proposed to compute the combined ARE model. The additive model contains weights that reflect the degree of reliability of the prior knowledge, as evaluated on the data. Experiments on several benchmark data sets show that the inclusion of prior knowledge can lead to better-performing models at a low tensor rank, with significant benefits for run-time and storage requirements.
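The additive scoring structure described above can be sketched for one relation slice k: a RESCAL-style bilinear term A Rₖ Aᵀ plus a weighted prior-knowledge matrix Mₖ. The factors, core matrix, prior slice, and weight below are made-up toy values, not learned quantities:

```python
# Sketch of additive relational scoring: latent bilinear part + weighted prior.
# All numbers are illustrative; in ARE the factors and weights are estimated
# from data by a least squares procedure.

A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]          # entity factors (3 entities, rank 2)
R = [[0.0, 1.0],
     [0.0, 0.0]]          # relation-specific core for one relation k
M = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]           # prior-knowledge predictions for relation k
w = 0.5                   # reliability weight of the prior, data-dependent

def matmul(P, Q):
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

bilinear = matmul(matmul(A, R), transpose(A))   # A R_k A^T
score = [[bilinear[i][j] + w * M[i][j] for j in range(3)] for i in range(3)]
print(score[0][1])  # 1.5: latent score 1.0 plus 0.5 * prior
```

Because the prior enters additively, the latent part only has to model what the prior does not already explain, which is why a lower tensor rank can suffice.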
In particular, the results show that ARE outperforms state-of-the-art relational learning algorithms, including intuitive models such as MRC (an approach based on Markov Logic with structure learning), factorization approaches such as Tucker, CP, Bayesian Clustered Tensor Factorization (BCTF), the Latent Factor Model (LFM), and RESCAL, and other latent models such as the IRM. A final experiment on the Cora data set for paper topic classification shows the improvement of ARE over RESCAL in both predictive power and runtime, since ARE requires a significantly lower rank.
Structural and Functional Changes in the Cerebellum in Sporadic Ataxias
Sporadic ataxias are a group of progressive neurodegenerative diseases that can be subdivided into two groups: sporadic adult-onset ataxia (SAOA) and the cerebellar type of multiple system atrophy (MSA-C). In the first years after ataxia onset, a reliable distinction between MSA-C and SAOA is often not possible. In particular, some SAOA conditions may convert to MSA-C, and it is known that this conversion becomes very unlikely once a patient with SAOA has had the illness for more than 10 years. In this thesis, MSA-C was compared with SAOA>10y (defined as SAOA with a disease duration longer than 10 years) in an attempt to identify the essential difference between the two conditions. To this end, 16 patients with MSA-C, 13 patients with SAOA>10y, and 49 healthy controls were included. Chapter 1 first introduces the reader to the concept of sporadic ataxias and gives an overview of MSA-C and SAOA. This is followed by a review of the current state of knowledge on how neuroimaging technology aids the understanding of sporadic ataxias. Chapter 2 outlines the general methodologies used in the presented studies. Chapter 3 contains four study results. In study 1, the structural changes of the cerebellum in the two ataxia groups were examined, showing abnormal gray matter volume in the bilateral anterior part and right posterior part of the cerebellum in both groups and additional atrophy of the cerebellar vermis in the SAOA>10y group. In study 2, the intracerebellar functional connectivity affected by local atrophy was investigated in the two sporadic ataxias using the amplitude of low-frequency fluctuation (ALFF) and degree centrality (DC). An intact functional connectivity pattern was observed in the atrophic cerebellum in the MSA-C and SAOA groups, the atrophic cerebellum being characterized by high ALFF and high DC compared with the non-atrophic cerebellum. In study 3, the topological features of the functional cerebellar network were assessed by graph theory analysis.
It was found that a well-organized small-world network organization and intact global and regional properties existed in the functional cerebellar system in the ataxia groups. In study 4, the connectivity between different cerebellar parts and cerebral regions was explored, taking every functional cerebellar module as a region of interest. It was observed that in the healthy group the activities of the cerebellar modules were positively correlated with the thalamus and negatively connected to the postcentral and precentral gyri. Compared with the healthy control (HC) group, altered connections between the cerebellum and the visual, motor, and frontal cortices were found in the MSA-C but not in the SAOA>10y group. Chapter 4 contains a summary discussion of all four studies, discusses the limitations of the current research, and offers an outlook on future research perspectives.
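The graph-theoretical measures behind the small-world analysis mentioned above, a clustering coefficient and a characteristic (average shortest) path length, can be sketched on a toy graph; the adjacency structure below is illustrative, not derived from imaging data:

```python
# Sketch of two graph-theory metrics used in small-world analyses:
# local clustering coefficient and average shortest path length (via BFS).
# The 5-node graph is a made-up example.
from collections import deque

graph = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}

def clustering(g, v):
    """Fraction of possible links among v's neighbours that actually exist."""
    nb = list(g[v])
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nb[j] in g[nb[i]])
    return 2.0 * links / (k * (k - 1))

def avg_path_length(g):
    """Mean BFS distance over all ordered node pairs (connected graph)."""
    total, pairs = 0, 0
    for src in g:
        dist, q = {src: 0}, deque([src])
        while q:
            u = q.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            if v != src:
                total += d
                pairs += 1
    return total / pairs

print(clustering(graph, 0))   # 1.0: both neighbours of node 0 are linked
print(avg_path_length(graph))  # 1.5
```

A small-world network combines a high average clustering coefficient with a short characteristic path length relative to a random graph of the same size and density.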
OTIEA: Ontology-enhanced Triple Intrinsic-Correlation for Cross-lingual Entity Alignment
Cross-lingual and cross-domain knowledge alignment without sufficient external resources is a fundamental and crucial task for fusing irregular data. As the element-wise fusion process that aims to discover equivalent objects across different knowledge graphs (KGs), entity alignment (EA) has attracted great interest from industry and academic research in recent years. Most existing EA methods explore the correlation between entities and relations through neighbor nodes, structural information, and external resources. However, the complex intrinsic interactions among triple elements and the role information are rarely modeled in these methods, which may lead to an inadequate representation of triples. In addition, external resources are often unavailable, especially in cross-lingual and cross-domain applications, which limits the scalability of these methods. To tackle these shortcomings, this paper proposes OTIEA, a novel universal EA framework based on ontology pairs and a role-enhancement mechanism via triple-aware attention, which does not require external resources. Specifically, an ontology-enhanced triple encoder is designed that mines intrinsic correlations and ontology-pair information instead of treating elements independently. In addition, EA-oriented representations are obtained in a triple-aware entity decoder by fusing role diversity. Finally, a bidirectional iterative alignment strategy is deployed to expand the seed entity pairs. Experimental results on three real-world datasets show that our framework achieves competitive performance compared with baselines.
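One common realization of a bidirectional iterative alignment step, such as the seed-expansion strategy mentioned above, is to promote entity pairs that are mutual nearest neighbours across the two KGs' embedding spaces. The sketch below uses tiny made-up embeddings and plain cosine similarity; OTIEA's actual encoder and criterion are more elaborate:

```python
# Sketch of bidirectional seed expansion: a pair (u, v) is kept only if v is
# u's nearest neighbour in KG2 *and* u is v's nearest neighbour in KG1.
# Embeddings are illustrative 2-d vectors.
import math

kg1 = {"e1": [1.0, 0.0], "e2": [0.0, 1.0], "e3": [0.7, 0.7]}
kg2 = {"f1": [0.9, 0.1], "f2": [0.1, 0.9], "f3": [0.6, 0.8]}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def mutual_pairs(kg1, kg2):
    best12 = {u: max(kg2, key=lambda v: cos(kg1[u], kg2[v])) for u in kg1}
    best21 = {v: max(kg1, key=lambda u: cos(kg1[u], kg2[v])) for v in kg2}
    # Keep only pairs where both directions agree.
    return {(u, v) for u, v in best12.items() if best21[v] == u}

print(sorted(mutual_pairs(kg1, kg2)))
# [('e1', 'f1'), ('e2', 'f2'), ('e3', 'f3')]
```

Iterating this step, retraining on the enlarged seed set and expanding again, is what makes the strategy "iterative"; the mutuality check keeps one-sided, low-confidence matches out of the seeds.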
Lurasidone hydrochloride
In the crystal structure of the title compound, C28H37N4O2S+·Cl− [systematic name: 4-(1,2-benzothiazol-3-yl)-1-({2-[(3,5-dioxo-4-azatricyclo[5.2.1.02,6]decan-4-yl)methyl]cyclohexyl}methyl)piperazin-1-ium chloride], the anions and cations are linked by N—H⋯Cl hydrogen bonds. The crystal structure is further stabilized by C—H⋯π and C—H⋯O interactions.
Determination of 59 Illegally Added Drugs in Health Foods with Hypoglycemic, Hypolipidemic and Antihypertensive Activity by Ultra-high Performance Liquid Chromatography-Tandem Mass Spectrometry
An analytical method using ultra-high performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) was established for the detection of 59 illegally added drugs in health foods with hypoglycemic, hypolipidemic and antihypertensive activity. Samples were extracted with methanol, and the extract was purified by the quick, easy, cheap, effective, rugged, and safe (QuEChERS) method and blown to dryness under nitrogen gas. The residue was dissolved in 1 mL of 40% (V/V) aqueous methanol. Chromatographic separation was performed by reverse-phase chromatography on an ACQUITY UPLC HSS T3 column with gradient elution, using a mobile phase consisting of acetonitrile and 0.1% aqueous formic acid (containing 5 mmol/L ammonium acetate). The mass spectrometer was operated in both positive and negative ion modes using multiple reaction monitoring (MRM), and quantitative analysis was performed with a matrix-matched external standard method. The results showed that the calibration curves for the 59 illegally added drugs were linear, with coefficients of determination (R2) greater than 0.980. Recoveries ranged from 60.2% to 119.5%, and relative standard deviations (RSDs) ranged from 1.2% to 15.0%. The method is characterized by simple pre-treatment, short analysis time, good sensitivity, high accuracy, and low impurity interference, and can be used for the detection of multiple illegal drugs in health foods with hypoglycemic, hypolipidemic and antihypertensive activity.
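The quantitation step described above, fitting a calibration curve and checking its linearity against the R² > 0.980 criterion, can be sketched with an ordinary least squares line; the concentration levels and detector responses below are made-up numbers, not data from the study:

```python
# Sketch: least squares calibration line and coefficient of determination.
# Values are illustrative; real work would use matrix-matched standards.

conc = [1.0, 2.0, 5.0, 10.0, 20.0]           # standard levels (arbitrary units)
area = [105.0, 198.0, 510.0, 990.0, 2010.0]  # detector response (illustrative)

n = len(conc)
mx, my = sum(conc) / n, sum(area) / n
sxx = sum((x - mx) ** 2 for x in conc)
sxy = sum((x - mx) * (y - my) for x, y in zip(conc, area))
slope = sxy / sxx
intercept = my - slope * mx

ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(conc, area))
ss_tot = sum((y - my) ** 2 for y in area)
r2 = 1 - ss_res / ss_tot
print(r2 > 0.980)  # True for this toy data: meets the linearity criterion
```

An unknown sample's concentration is then read off by inverting the fitted line, i.e. (response − intercept) / slope.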
Tunable nonlinear optical bistability based on Dirac semimetal in photonic crystal Fabry-Perot cavity
In this paper, we study nonlinear optical bistability (OB) in a symmetrical multilayer structure constructed by embedding a nonlinear three-dimensional Dirac semimetal (3D DSM) into a solution-filled one-dimensional photonic crystal Fabry-Perot cavity. The OB stems from the third-order nonlinear conductivity of the 3D DSM, and the local field of the resonance mode enhances the nonlinearity and reduces the OB thresholds. The structure achieves tunability of the OB because the transmittance can be modulated by the Fermi energy; both the OB threshold and the threshold width can be remarkably reduced by increasing the Fermi energy. Moreover, the OB curve is found to depend heavily on the angle of incidence of the incoming light, the structural parameters of the Fabry-Perot cavity, and the position of the 3D DSM inside the cavity. After parameter optimization, we obtain OB with a threshold of 10^6 V/m. We believe this simple structure provides a reference for realizing low-threshold, tunable all-optical switching devices.
Keywords: Optical bistability, Dirac semimetal, Fabry-Perot cavity
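The S-shaped input-output behavior behind optical bistability can be illustrated with the textbook dispersive Kerr-cavity steady state, I_in = I_out·(1 + (d − I_out)²) in dimensionless units with detuning d; this is a generic model, not the paper's DSM/photonic-crystal structure, and all numbers are illustrative:

```python
# Generic dispersive optical bistability sketch: for detuning d > sqrt(3) the
# steady-state curve I_in(I_out) is non-monotonic, so some inputs admit three
# intracavity solutions -- the hysteresis exploited in all-optical switching.

d = 3.0  # cavity detuning (dimensionless, illustrative)

def input_for(i_out):
    """Steady-state input intensity producing output i_out (Kerr cavity)."""
    return i_out * (1.0 + (d - i_out) ** 2)

def n_branches(i_in, tol=0.05):
    """Count solutions I_out of input_for(I_out) = i_in by a grid scan."""
    roots, last = 0, False
    for k in range(8000):
        i_out = k * 0.001
        hit = abs(input_for(i_out) - i_in) < tol
        if hit and not last:   # entering a new solution interval
            roots += 1
        last = hit
    return roots

print(n_branches(4.0))   # 3 solutions: inside the bistable window
print(n_branches(30.0))  # 1 solution: monostable regime
```

For d = 3 the turning points of the curve lie near I_in ≈ 2.9 and ≈ 5.1, so inputs in between are multivalued; sweeping the input up and down then traces the hysteresis loop characteristic of OB.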