8,433 research outputs found

    Genomic prediction in plants: opportunities for ensemble machine learning based approaches [version 2; peer review: 1 approved, 2 approved with reservations]

    Background: Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler, parametric methods is still lacking. The predictive performance of GP models may depend on a plethora of factors including sample size, number of markers, population structure and genetic architecture. Methods: Here, we investigate which problem and dataset characteristics are related to good performance of ML methods for genomic prediction. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits under different levels of genetic complexity determined by the number of Quantitative Trait Loci (QTLs), heritability (h2 and h2e), population structure and linkage disequilibrium between causal nucleotides and other SNPs. Results: Decision-tree-based ensemble ML methods are a better choice for nonlinear phenotypes and are comparable to Bayesian methods for linear phenotypes in the case of large-effect Quantitative Trait Nucleotides (QTNs). Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive to low linkage disequilibrium than linear parametric methods. Conclusions: Overall, this study provides insights into the role of ML in GP as well as guidelines for practitioners.
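    To illustrate the kind of comparison reported here, the following is a minimal sketch, not the study's actual pipeline: it contrasts a tree-based ensemble with a ridge-penalised linear regression (used here as a crude GBLUP-like stand-in) on a simulated genotype matrix. The simulation settings (sample size, number of markers and QTNs, heritability) are invented for illustration.

```python
# Minimal sketch: tree-based ensemble vs. linear whole-genome regression on
# simulated SNP data. All simulation settings are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n, p, n_qtn = 500, 2000, 20                            # individuals, markers, causal loci
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # SNP dosages coded 0/1/2
beta = np.zeros(p)
beta[rng.choice(p, n_qtn, replace=False)] = rng.normal(0, 1, n_qtn)
g = X @ beta                                           # additive genetic values
h2 = 0.5                                               # target heritability
e = rng.normal(0, np.sqrt(g.var() * (1 - h2) / h2), n)
y = g + e                                              # simulated phenotype

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "RandomForest": RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0),
    # Ridge is only a rough proxy for GBLUP; the penalty would normally be
    # tuned or derived from variance components.
    "Ridge (GBLUP-like)": Ridge(alpha=1.0),
}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=cv)
    # predictive ability = correlation between observed and predicted phenotype
    print(name, round(np.corrcoef(y, pred)[0, 1], 3))
```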

    Direct biomass gasification for fuel gas production (Gasificação direta de biomassa para produção de gás combustível)

    The excessive consumption of fossil fuels to satisfy the world's needs for energy and commodities has led to the emission of large amounts of greenhouse gases in recent decades, contributing significantly to the greatest environmental threat of the 21st century: climate change. The answer to this man-made disaster is not simple and is only possible if distinct stakeholders and governments cooperate and work together. This is mandatory if we want to move to a more sustainable economy based on renewable materials, whose energy is provided by perpetual natural sources (e.g., wind, solar). In this regard, biomass can play a major role as an adjustable and renewable feedstock that allows the replacement of fossil fuels in various applications, and conversion by gasification provides the flexibility needed for that purpose. In fact, fossil fuels are just biomass that underwent extreme pressure and heat for millions of years. Furthermore, biomass is a resource that, if not used or managed, increases wildfire risks; consequently, we also have an obligation to valorize and use this resource. In this work, new scientific knowledge was obtained to support the development of direct (air) gasification of biomass in bubbling fluidized bed reactors to obtain a fuel gas with properties suitable to replace natural gas in industrial gas burners. This is the first step towards the integration and development of gasification-based biorefineries, which will produce a diverse range of value-added products from biomass and compete with current petrochemical refineries in the future. In this regard, solutions for the improvement of the raw producer gas quality and of process efficiency parameters were defined and analyzed. First, the addition of superheated steam as a primary measure increased the H2 concentration and the H2/CO molar ratio in the producer gas without compromising the stability of the process. However, this measure mainly showed potential for the direct (air) gasification of high-density biomass (e.g., pellets), because char must accumulate in the reactor bottom bed for char-steam reforming reactions to occur. Second, the addition of refuse-derived fuel to the biomass feedstock led to enhanced gasification products, revealing itself as a highly promising strategy for the economic viability and environmental benefits of future gasification-based biorefineries, owing to the high availability and low cost of wastes. Nevertheless, integrated techno-economic and life cycle analyses must be performed to fully characterize the process. Third, the application of low-cost catalysts as a primary measure showed potential by improving the producer gas quality (e.g., H2 and CO concentration, lower heating value) and process efficiency parameters with distinct solid materials; in particular, the application of concrete, synthetic fayalite and wood pellet chars showed promising results. Finally, the economic viability of integrating direct (air) biomass gasification processes in the pulp and paper industry was also demonstrated, although the resulting parameters are not yet attractive to potential investors.
In this context, the role of government policies and appropriate economic instruments is of major relevance to increasing the implementation of these projects. This work was funded by The Navigator Company and by national funds through the Fundação para a Ciência e a Tecnologia (FCT), within the Doctoral Programme in Refining, Petrochemical and Chemical Engineering.
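    To make the gas-quality metrics mentioned above concrete, here is a small sketch of how the lower heating value and the H2/CO molar ratio of a dry producer gas follow from its composition. The component heating values are approximate textbook figures and the example composition is invented for illustration; neither comes from this thesis.

```python
# Sketch: lower heating value (LHV) and H2/CO ratio of a dry producer gas
# from its volumetric composition. Component LHVs are approximate textbook
# values in MJ/Nm3; the example composition is illustrative only.
LHV_MJ_PER_NM3 = {"H2": 10.8, "CO": 12.6, "CH4": 35.8}

def producer_gas_quality(vol_frac):
    """vol_frac: volume fractions of the dry gas (N2 and CO2 are inert)."""
    lhv = sum(vol_frac.get(sp, 0.0) * q for sp, q in LHV_MJ_PER_NM3.items())
    h2_co = vol_frac["H2"] / vol_frac["CO"]
    return lhv, h2_co

# Example composition typical of air-blown gasification (made up for illustration)
gas = {"H2": 0.12, "CO": 0.16, "CH4": 0.04, "CO2": 0.15, "N2": 0.53}
lhv, ratio = producer_gas_quality(gas)
print(f"LHV = {lhv:.1f} MJ/Nm3, H2/CO = {ratio:.2f}")   # ~4.7 MJ/Nm3, 0.75
```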

    Cost-effective non-destructive testing of biomedical components fabricated using additive manufacturing

    Biocompatible titanium-alloys can be used to fabricate patient-specific medical components using additive manufacturing (AM). These novel components have the potential to improve clinical outcomes in various medical scenarios. However, AM introduces stability and repeatability concerns, which are potential roadblocks for its widespread use in the medical sector. Micro-CT imaging for non-destructive testing (NDT) is an effective solution for post-manufacturing quality control of these components. Unfortunately, current micro-CT NDT scanners require expensive infrastructure and hardware, which translates into prohibitively expensive routine NDT. Furthermore, the limited dynamic range of these scanners can cause severe image artifacts that may compromise the diagnostic value of the non-destructive test. Finally, the cone-beam geometry of these scanners makes them susceptible to the adverse effects of scattered radiation, which is another source of artifacts in micro-CT imaging. In this work, we describe the design, fabrication, and implementation of a dedicated, cost-effective micro-CT scanner for NDT of AM-fabricated biomedical components. Our scanner reduces the limitations of costly image-based NDT by optimizing the scanner's geometry and the image acquisition hardware (i.e., X-ray source and detector). Additionally, we describe two novel techniques to reduce image artifacts caused by photon starvation and scattered radiation in cone-beam micro-CT imaging. Our cost-effective scanner was designed to match the image requirements of medium-size titanium-alloy medical components. We optimized the image acquisition hardware by using an 80 kVp low-cost portable X-ray unit and developing a low-cost lens-coupled X-ray detector. Image artifacts caused by photon starvation were reduced by implementing dual-exposure high-dynamic-range radiography. For scatter mitigation, we describe the design, manufacturing, and testing of a large-area, highly-focused, two-dimensional, anti-scatter grid. Our results demonstrate that cost-effective NDT using low-cost equipment is feasible for medium-sized, titanium-alloy, AM-fabricated medical components. Our proposed high-dynamic-range strategy improved the penetration capability of an 80 kVp micro-CT imaging system by 37% for a total X-ray path length of 19.8 mm. Finally, our novel anti-scatter grid provided a 65% improvement in CT number accuracy and a 48% improvement in low-contrast visualization. Our proposed cost-effective scanner and artifact reduction strategies have the potential to improve patient care by accelerating the widespread use of patient-specific, bio-compatible, AM-manufactured medical components.
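    As a generic illustration of dual-exposure high-dynamic-range radiography, the sketch below normalizes a short- and a long-exposure projection by exposure time and falls back on the short exposure wherever the long exposure is clipped. This is a simple, commonly used combination scheme, not necessarily the authors' exact algorithm; all array sizes, exposure times and saturation levels are placeholders.

```python
import numpy as np

def combine_hdr(proj_short, proj_long, t_short, t_long, saturation_level):
    """Combine a short- and a long-exposure projection (same geometry) into
    one extended-dynamic-range image in counts per unit exposure."""
    norm_long = proj_long.astype(float) / t_long
    norm_short = proj_short.astype(float) / t_short
    hdr = norm_long.copy()
    # Where the long exposure is clipped (open beam / thin paths), substitute
    # the exposure-normalized short exposure; elsewhere keep the long
    # exposure, which has better photon statistics behind thick metal.
    clipped = proj_long >= saturation_level
    hdr[clipped] = norm_short[clipped]
    return hdr

# Usage with made-up 16-bit projections and exposure times (illustrative only)
rng = np.random.default_rng(1)
proj_short = rng.integers(0, 20000, size=(64, 64))
proj_long = np.minimum(proj_short * 4, 65535)
hdr = combine_hdr(proj_short, proj_long, t_short=0.25, t_long=1.0,
                  saturation_level=65535)
```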

    A Comparative Study on Students’ Learning Expectations of Entrepreneurship Education in the UK and China

    Entrepreneurship education has become a critical subject in academic research and educational policy design, occupying a central role in contemporary education globally. However, a review of the literature indicates that research on entrepreneurship education is still at a relatively early stage, and to date little is known about how entrepreneurship education learning is affected by the environmental context. Therefore, combining the institutional context and focusing on students' learning expectations as a novel perspective, the main aim of the thesis is to address this knowledge gap by developing an original conceptual framework that advances understanding of the dynamic learning process of entrepreneurship education through the lens of self-determination theory. The author adopted an epistemological positivism philosophy and a deductive approach. This study gathered 247 valid questionnaires from the UK (84) and China (163). It asked students to recall their learning expectations before attending their entrepreneurship courses and to assess their perceptions of learning outcomes after taking the courses. It was found that entrepreneurship education policy is an antecedent that influences students' learning expectations, reflected in differences in student autonomy. British students, learning actively under a voluntary education policy, have higher autonomy than Chinese students, learning passively under a compulsory education policy; they therefore have higher learning expectations, which leads to higher satisfaction. The positive relationship between autonomy and learning expectations is established, which adds a new dimension to self-determination theory. Furthermore, it is also revealed that the change in students' entrepreneurial intentions before and after their entrepreneurship courses is explained by understanding the process of a business start-up (positive), hands-on business start-up opportunities (positive), students' actual input (positive) and tutors' academic qualification (negative). The thesis makes contributions to both theory and practice, and the findings have far-reaching implications for different parties, including policymakers, educators, practitioners and researchers. Understanding and shaping students' learning expectations is a critical first step in optimising entrepreneurship education teaching and learning. On the one hand, understanding students' learning expectations of entrepreneurship and entrepreneurship education can help the government with educational interventions and policy reform, as well as improving the quality and delivery of university-based entrepreneurship education. On the other hand, entrepreneurship education can assist students in establishing correct and realistic learning expectations and entrepreneurial conceptions, which will benefit their future entrepreneurial activities and/or employment. An important implication is that this study connects multiple stakeholders by bridging the national-level institutional context, organisational-level university entrepreneurship education, and individual-level entrepreneurial learning to promote student autonomy based on an understanding of students' learning expectations. This can help develop graduates' capacity for autonomous learning and autonomous entrepreneurial behaviour.
The results of this study remind students that it is they, the learners, and their expectations and input that can make the difference between the success and failure of their studies. This applies not only to entrepreneurship education but also to other fields of study. One key message from this study is that education can be encouraged and supported but cannot be "forced". Mandatory entrepreneurship education is not a quick fix for the lack of innovation and entrepreneurship among university students. More resources must be invested in enhancing the enterprise culture, thus making entrepreneurship education desirable for students.

    Growth trends and site productivity in boreal forests under management and environmental change: insights from long-term surveys and experiments in Sweden

    Under a changing climate, up-to-date information on tree and stand growth is indispensable for assessing the carbon sink strength of boreal forests. Important questions regarding tree growth are to what extent management and environmental change have influenced it, and how it might respond in the future. In this thesis, results from five studies (Papers I-V) covering growth trends, site productivity, heterogeneity in managed forests, and the potential for carbon storage in forests and harvested wood products under differing management strategies are presented. The studies were based on observations from national forest inventories and long-term experiments in Sweden. The annual height growth of Scots pine (Pinus sylvestris) and Norway spruce (Picea abies) has increased, especially after the millennium shift, while basal area growth has remained stable over the last 40 years (Papers I-II). A positive response of height growth to increasing temperature was observed. The results generally imply changing growing conditions and stand composition. In Paper III, the yield capacity of conifers was analysed and compared with existing functions. The results showed that there is a bias in site productivity estimates and that the new functions give better predictions of yield capacity in Sweden. In Paper IV, the variability in stand composition was modelled as indices of heterogeneity to calibrate the relationship between basal area and leaf area index in managed stands of Norway spruce and Scots pine. The results show that the effects of stand structural heterogeneity are of such a magnitude that they cannot be neglected in the implementation of hybrid growth models, especially those based on light interception and light-use efficiency. In the long term, the net climate benefits of Swedish forests may be maximized through active forest management with high harvest levels and efficient product utilization, rather than by increasing carbon storage in standing forests through land set-asides for nature conservation (Paper V). In conclusion, this thesis offers support for the development of evidence-based policy recommendations for site-adapted and sustainable management of Swedish forests in a changing climate.

    Foundations for programming and implementing effect handlers

    First-class control operators provide programmers with an expressive and efficient means for manipulating control through reification of the current control state as a first-class object, enabling programmers to implement their own computational effects and control idioms as shareable libraries. Effect handlers provide a particularly structured approach to programming with first-class control by naming control-reifying operations and separating them from their handling. This thesis is composed of three strands of work in which I develop operational foundations for programming and implementing effect handlers as well as exploring the expressive power of effect handlers. The first strand develops a fine-grain call-by-value core calculus of a statically typed programming language with a structural notion of effect types, as opposed to the nominal notion of effect types that dominates the literature. With the structural approach, effects need not be declared before use. The usual safety properties of statically typed programming are retained by making crucial use of row polymorphism to build and track effect signatures. The calculus features three forms of handlers: deep, shallow, and parameterised. They each offer a different approach to manipulating the control state of programs. Traditional deep handlers are defined by folds over computation trees, and are the original construct proposed by Plotkin and Pretnar. Shallow handlers are defined by case splits (rather than folds) over computation trees. Parameterised handlers are deep handlers extended with a state value that is threaded through the folds over computation trees. To demonstrate the usefulness of effects and handlers as a practical programming abstraction I implement the essence of a small UNIX-style operating system complete with multi-user environment, time-sharing, and file I/O. The second strand studies continuation passing style (CPS) and abstract machine semantics, which are foundational techniques that admit a unified basis for implementing deep, shallow, and parameterised effect handlers in the same environment. The CPS translation is obtained through a series of refinements of a basic first-order CPS translation for a fine-grain call-by-value language into an untyped language. Each refinement moves toward a more intensional representation of continuations, eventually arriving at the notion of generalised continuation, which admits simultaneous support for deep, shallow, and parameterised handlers. The initial refinement adds support for deep handlers by representing stacks of continuations and handlers as a curried sequence of arguments. The image of the resulting translation is not properly tail-recursive, meaning some function application terms do not appear in tail position. To rectify this the CPS translation is refined once more to obtain an uncurried representation of stacks of continuations and handlers. Finally, the translation is made higher-order in order to contract administrative redexes at translation time. The generalised continuation representation is used to construct an abstract machine that provides simultaneous support for all three kinds of effect handlers. The third strand explores the expressiveness of effect handlers.
First, I show that the deep, shallow, and parameterised notions of handlers are interdefinable by way of typed macro-expressiveness, which provides a syntactic notion of expressiveness that affirms the existence of encodings between handlers but provides no information about the computational content of the encodings. Second, using a semantic notion of expressiveness, I show that for a class of programs a programming language with first-class control (e.g. effect handlers) admits asymptotically faster implementations than is possible in a language without first-class control.
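    To make the deep/shallow distinction concrete, here is a minimal sketch using an explicit computation-tree encoding in Python. The encoding, function names and example handler are illustrative only; the thesis works with a typed calculus, not this representation.

```python
# Illustrative encoding of effectful computations as trees:
#   ("return", v)          -- a finished computation returning v
#   ("op", name, arg, k)   -- perform operation `name` with argument `arg`;
#                             k maps the operation's result to the rest of
#                             the computation.

def deep_handle(comp, clauses, ret):
    """Deep handler: a fold over the computation tree. The resumption passed
    to each clause handles the remainder with the *same* handler."""
    if comp[0] == "return":
        return ret(comp[1])
    _, name, arg, k = comp
    resume = lambda x: deep_handle(k(x), clauses, ret)
    return clauses[name](arg, resume)

def shallow_handle(comp, clauses, ret):
    """Shallow handler: a case split. The clause receives the *unhandled*
    continuation, so the caller decides how the rest is handled."""
    if comp[0] == "return":
        return ret(comp[1])
    _, name, arg, k = comp
    return clauses[name](arg, k)

# Example: interpret a single nondeterministic 'Choose' operation by
# collecting the results of both branches.
prog = ("op", "Choose", (), lambda b: ("return", "heads" if b else "tails"))
both = deep_handle(prog,
                   {"Choose": lambda _, resume: resume(True) + resume(False)},
                   lambda v: [v])
print(both)  # ['heads', 'tails']
```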

    Synthesis and Characterisation of Low-cost Biopolymeric/mineral Composite Systems and Evaluation of their Potential Application for Heavy Metal Removal

    Heavy metal pollution and waste management are two major environmental problems faced in the world today. Anthropogenic sources of heavy metals, especially effluent from industries, pose serious environmental and health concerns by polluting surface and ground waters. Similarly, on a global scale, thousands of tonnes of industrial and agricultural waste are discarded into the environment annually. There are several conventional methods to treat industrial effluents, including reverse osmosis, oxidation, filtration, flotation, chemical precipitation, ion exchange resins and adsorption. Among them, adsorption and ion exchange are known to be effective mechanisms for removing heavy metal pollution, especially if low-cost materials can be used. This thesis is a study of materials that can be used to remove heavy metals from water using low-cost feedstocks. Low-cost composite matrices were synthesized from agricultural and industrial by-products and low-cost organic and mineral sources. The feedstock materials considered include chitosan (generated from industrial seafood waste), coir fibre (an agricultural by-product), spent coffee grounds (a by-product from coffee machines), hydroxyapatite (HAp, from bovine bone), and naturally sourced aluminosilicate minerals such as zeolite. The novel composite adsorbents were prepared using commercially sourced HAp and bovine-sourced HAp, with two types of adsorbents being synthesized: two- and three-component composites. Standard synthetic methods such as precipitation were developed to synthesize these materials, followed by characterization of their structural, physical, and chemical properties (using FTIR, TGA, SEM, EDX and XRD). The synthesized materials were then evaluated for their ability to remove metal ions from solutions of heavy metals using single-metal-ion and two-metal-ion solution systems, using model ion solutions, with quantification of their removal efficiency. This was followed by experiments using the synthesized adsorbents for metal ion removal in complex systems, such as an industrial input stream solution obtained from a local timber treatment company. Two-component composites were used as controls against which the removal efficiency of the three-component composites was compared. The heavy metal removal experiments were conducted under a range of experimental conditions (e.g., pH, sorbent dose, initial metal ion concentration, contact time). Of the four metal ion systems considered in this study (Cd2+, Pb2+, Cu2+ and Cr as chromate ions), Pb2+ removal by the composites was found to be the highest in single-metal and two-metal-ion solution systems, while chromate removal was found to be the lowest. The bovine bone-based hydroxyapatite (bHAp) composites were more efficient at removing the metal cations than composites formed from commercially sourced hydroxyapatite (cHAp). In industrial input stream solution systems (containing Cu, Cr and As), Cu2+ removal was the highest, which aligned with the observations recorded in the single- and two-metal-ion solution systems. Arsenate was removed to a higher extent than chromate using the three-component composites, while chromate removal was higher than arsenate removal when using the two-component composites (i.e., the control system).
The project also aimed to elucidate the removal mechanisms of these synthesized composite materials by using appropriate adsorption and kinetic models. The adsorption of metal ions exhibited a range of behaviours, as both models (Langmuir and Freundlich) were found to fit most of the data recorded in the different adsorption systems studied. The pseudo-second-order model was found to best describe the kinetics of heavy metal ion adsorption in all the composite adsorbent systems studied, in both single-metal-ion and two-metal-ion solution systems. Ion exchange was considered one of the dominant mechanisms for the removal of cations (in single-metal and two-metal-ion solution systems) and arsenate ions (in industrial input stream solution systems), along with other adsorption mechanisms. In contrast, electrostatic attraction was considered to be the dominant removal mechanism for chromate ions.
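    For reference, the standard textbook forms of the models named above, with $q_e$ the equilibrium adsorption capacity, $C_e$ the equilibrium concentration, $q_t$ the amount adsorbed at time $t$, and $K_L$, $K_F$, $n$, $k_2$ the fitted constants:

```latex
% Langmuir isotherm (monolayer adsorption on energetically homogeneous sites)
q_e = \frac{q_{\max} K_L C_e}{1 + K_L C_e}
% Freundlich isotherm (empirical, heterogeneous surfaces)
q_e = K_F\, C_e^{1/n}
% Pseudo-second-order kinetic model (linearised form)
\frac{t}{q_t} = \frac{1}{k_2 q_e^{2}} + \frac{t}{q_e}
```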

    The making of the NEAM Tsunami Hazard Model 2018 (NEAMTHM18)

    The NEAM Tsunami Hazard Model 2018 (NEAMTHM18) is a probabilistic hazard model for tsunamis generated by earthquakes. It covers the coastlines of the North-eastern Atlantic, the Mediterranean, and connected seas (NEAM). NEAMTHM18 was designed as a three-phase project. The first two phases were dedicated to model development and hazard calculations, following a formalized decision-making process based on a multiple-expert protocol. The third phase was dedicated to documentation and dissemination. The hazard assessment workflow was structured in Steps and Levels. There are four Steps: Step-1) probabilistic earthquake model; Step-2) tsunami generation and modeling in deep water; Step-3) shoaling and inundation; Step-4) hazard aggregation and uncertainty quantification. Each Step includes a different number of Levels. Level-0 always describes the input data; the other Levels describe the intermediate results needed to proceed from one Step to the next. Alternative datasets and models were considered in the implementation. The epistemic hazard uncertainty was quantified through an ensemble modeling technique accounting for alternative models' weights and yielding a distribution of hazard curves represented by the mean and various percentiles. Hazard curves were calculated at 2,343 Points of Interest (POIs) distributed at an average spacing of ∼20 km. Precalculated probability maps for five maximum inundation heights (MIH) and hazard intensity maps for five average return periods (ARP) were produced from the hazard curves. In the entire NEAM region, MIHs of several meters are rare but not impossible. Considering a 2% probability of exceedance in 50 years (ARP ≈ 2,475 years), fewer than 1% of the POIs have MIH > 5 m, and all of them are in the Mediterranean, on the coasts of Libya, Egypt, Cyprus, and Greece. In the North-East Atlantic, POIs with MIH > 3 m are on the coasts of Mauritania and the Gulf of Cadiz. Overall, 30% of the POIs have MIH > 1 m. NEAMTHM18 results and documentation are available through the TSUMAPS-NEAM project website (http://www.tsumaps-neam.eu/), featuring an interactive web mapper. Although NEAMTHM18 cannot substitute for in-depth analyses at local scales, it represents a first step towards local and more detailed hazard and risk assessments and contributes to the design of evacuation maps for tsunami early warning.
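    For readers unfamiliar with the conversion, the quoted average return period follows from the probability of exceedance under the usual Poisson assumption for event occurrence:

```latex
% Probability of exceedance p over an exposure time T for a Poisson process
% with average return period ARP:
p = 1 - e^{-T/\mathrm{ARP}}
\quad\Longrightarrow\quad
\mathrm{ARP} = \frac{-T}{\ln(1 - p)} = \frac{-50}{\ln(0.98)} \approx 2{,}475~\text{years}
```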

    Machine learning for managing structured and semi-structured data

    As the digitalization of the private, commercial, and public sectors advances rapidly, an increasing amount of data is becoming available. In order to gain insights or knowledge from these enormous amounts of raw data, deep analysis is essential. The immense volume requires highly automated processes with minimal manual interaction. In recent years, machine learning methods have taken on a central role in this task. In addition to the individual data points, their interrelationships often play a decisive role, e.g. whether two patients are related to each other or whether they are treated by the same physician. Hence, relational learning is an important branch of research, which studies how to harness this explicitly available structural information between different data points. Recently, graph neural networks have gained importance. These can be considered an extension of convolutional neural networks from regular grids to general (irregular) graphs. Knowledge graphs play an essential role in representing facts about entities in a machine-readable way. While great efforts are made to store as many facts as possible in these graphs, they often remain incomplete, i.e., true facts are missing. Manual verification and expansion of the graphs is becoming increasingly difficult due to the large volume of data and must therefore be assisted or substituted by automated procedures that predict missing facts. The field of knowledge graph completion can be roughly divided into two categories: Link Prediction and Entity Alignment. In Link Prediction, machine learning models are trained to predict unknown facts between entities based on the known facts. Entity Alignment aims at identifying shared entities between graphs in order to link several such knowledge graphs based on some provided seed alignment pairs. In this thesis, we present important advances in the field of knowledge graph completion. For Entity Alignment, we show how to reduce the number of required seed alignments while maintaining performance through novel active learning techniques. We also discuss the power of textual features and show that graph-neural-network-based methods have difficulties with noisy alignment data. For Link Prediction, we demonstrate how to improve the prediction for entities unknown at training time by exploiting additional metadata on individual statements, often available in modern graphs. Supported by results from a large-scale experimental study, we present an analysis of the effect of individual components of machine learning models, e.g., the interaction function or loss criterion, on the task of link prediction. We also introduce a software library that simplifies the implementation and study of such components and makes them accessible to a wide research community, ranging from relational learning researchers to applied fields such as the life sciences. Finally, we propose a novel metric for evaluating ranking results, as used for both completion tasks. It allows for easier interpretation and comparison, especially in cases with different numbers of ranking candidates, as encountered in the de-facto standard evaluation protocols for both tasks.
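    As context for the ranking-based evaluation mentioned above, here is a small sketch of the conventional mean reciprocal rank and hits@k computation for link prediction. It illustrates the standard metrics the thesis builds on, not the new metric it proposes; the model scores below are random placeholders.

```python
# Sketch of standard rank-based link-prediction evaluation: for each test
# triple, the model scores the true entity against all candidate entities and
# we record the rank of the true entity among them.
import numpy as np

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and hits@k from an array of ranks (1 = best)."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MRR": float(np.mean(1.0 / ranks))}
    for k in ks:
        metrics[f"hits@{k}"] = float(np.mean(ranks <= k))
    return metrics

rng = np.random.default_rng(0)
num_test, num_entities = 200, 1000
scores = rng.normal(size=(num_test, num_entities))        # placeholder model scores
true_idx = rng.integers(0, num_entities, size=num_test)   # index of the true entity
# rank = 1 + number of candidates scored strictly higher than the true entity
true_scores = scores[np.arange(num_test), true_idx][:, None]
ranks = 1 + (scores > true_scores).sum(axis=1)
print(mrr_and_hits(ranks))
```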