721 research outputs found
High level synthesis of RDF queries for graph analytics
In this paper we present a set of techniques that enable the synthesis of efficient custom accelerators for memory-intensive, irregular applications. To address the challenges of irregular applications (large memory footprint, unpredictable fine-grained data accesses, and high synchronization intensity), and to exploit their opportunities (thread-level parallelism, memory-level parallelism), we propose a novel accelerator design that employs an adaptive Distributed Controller (DC) architecture and a Memory Interface Controller (MIC) that supports concurrent and atomic memory operations on a multi-ported/multi-banked shared memory. Among the multitude of algorithms that may benefit from our solution, we focus on the acceleration of graph analytics applications and, in particular, on the synthesis of SPARQL queries on Resource Description Framework (RDF) databases. We achieve this objective by incorporating the synthesis techniques into Bambu, an open-source high-level synthesis tool, and interfacing it with GEMS, the Graph database Engine for Multithreaded Systems. The GEMS front-end generates optimized C implementations of the input queries, modeled as graph pattern matching algorithms, which are then automatically synthesized by Bambu. We validate our approach by synthesizing several SPARQL queries from the Lehigh University Benchmark (LUBM).
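The idea of executing a SPARQL query as a graph pattern matching routine can be sketched in plain Python. This is an illustrative toy, not the GEMS or Bambu tool-chain: the triple set, the LUBM-style pattern, and the function name are all invented for the example.

```python
# A toy RDF graph as a set of (subject, predicate, object) triples.
triples = {
    ("alice", "memberOf", "dept1"),
    ("bob",   "memberOf", "dept1"),
    ("dept1", "subOrgOf", "univ0"),
    ("dept2", "subOrgOf", "univ0"),
}

def match_member_of_university(triples):
    """Pattern match for: ?x memberOf ?d . ?d subOrgOf ?u
    (an LUBM-style query), executed as two nested scans over the graph."""
    results = []
    for (x, p1, d) in triples:
        if p1 != "memberOf":
            continue
        for (d2, p2, u) in triples:
            if p2 == "subOrgOf" and d2 == d:
                results.append((x, d, u))
    return sorted(results)

print(match_member_of_university(triples))
# → [('alice', 'dept1', 'univ0'), ('bob', 'dept1', 'univ0')]
```

The inner scans for different bindings are independent of one another, which is the thread-level and memory-level parallelism the proposed accelerator design exploits.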
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data has driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and a complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves comparable
performance as specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use.
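GraphX's central claim, that graph-parallel operators can be cast in relational algebra, can be sketched in a few lines of Python. The table layout and the function name below are illustrative, not the GraphX API: one message-passing step (send rank/out-degree along each edge, sum at the target) is expressed as a join of an edge table with a vertex table.

```python
# Vertex and edge "tables" as plain Python collections.
vertices = {1: 1.0, 2: 1.0, 3: 1.0}   # vertex id -> rank
edges = [(1, 2), (1, 3), (2, 3)]      # (src, dst) pairs

def aggregate_messages(vertices, edges):
    """One 'send rank/out-degree to neighbors, sum at target' step,
    implemented as a join of the edge table with the vertex table."""
    out_degree = {}
    for src, _ in edges:
        out_degree[src] = out_degree.get(src, 0) + 1
    msgs = {}
    for src, dst in edges:   # join edges with vertices on src
        msgs[dst] = msgs.get(dst, 0.0) + vertices[src] / out_degree[src]
    return msgs

print(aggregate_messages(vertices, edges))
# → {2: 0.5, 3: 1.5}
```

Because the step is just a join followed by an aggregation, a relational optimizer can apply standard techniques (such as the automatic join rewrites the abstract mentions) to it.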
Aspects of semantic ETL
Thesis in cotutelle: Universitat Politècnica de Catalunya and Aalborg Universitet.

Business Intelligence tools support making better business decisions by analyzing available organizational data. Data Warehouses (DWs), typically structured with the Multidimensional (MD) model, are used to store data from different internal and external sources, processed using Extract-Transformation-Load (ETL) processes. On-Line Analytical Processing (OLAP) queries are applied to DWs to derive important business-critical knowledge. DW and OLAP technologies perform efficiently when they are applied to data that is static in nature and well organized in structure. Nowadays, Semantic Web technologies and the Linked Data principles inspire organizations to publish their semantic data, which allows machines to understand the meaning of data, using the Resource Description Framework (RDF) model. In addition to traditional (non-semantic) data sources, the incorporation of semantic data sources into a DW raises the additional challenges of schema derivation, semantic heterogeneity, and schema and data management beyond what traditional ETL tools support. Furthermore, most SW data provided by business, academic and governmental organizations include facts and figures, which raise new requirements for BI tools to enable OLAP-like analyses over those semantic (RDF) data. In this thesis, we 1) propose a layer-based ETL framework for handling diverse semantic and non-semantic data sources by addressing the challenges mentioned above, 2) propose a set of high-level ETL constructs for processing semantic data, and 3) implement appropriate environments (both programmable and GUI) to facilitate ETL processes and evaluate the proposed solutions. Our ETL framework is a semantic ETL framework because it integrates data semantically. We propose SETL, a unified framework for semantic ETL. The framework is divided into three layers: the Definition Layer, the ETL Layer, and the Data Warehouse Layer.

In the Definition Layer, the semantic DW (SDW) schema, sources, and the mappings among the sources and the target are defined. In the ETL Layer, ETL processes to populate the SDW from sources are designed. The Data Warehouse Layer manages the storage of transformed semantic data. The framework supports the inclusion of semantic (RDF) data in DWs in addition to relational data. It allows users to define an ontology of a DW and annotate it with MD constructs (such as dimensions, cubes, levels, etc.) using the Data Cube for OLAP (QB4OLAP) vocabulary. It supports traditional transformation operations and provides a method to generate semantic data from the source data according to the semantics encoded in the ontology. It also provides a method to connect internal SDW data with external knowledge bases. On top of SETL, we propose SETLCONSTRUCT, where we define a set of high-level ETL tasks/operations to process semantic data sources. We divide the integration process into two layers: the Definition Layer and the Execution Layer. The Definition Layer includes two tasks that allow DW designers to define target (SDW) schemas and the mappings between (intermediate) sources and the (intermediate) target. To create mappings among the sources and target constructs, we provide a mapping vocabulary called S2TMAP. Differently from other ETL tools, we propose a new paradigm: we characterize the ETL flow transformations at the Definition Layer instead of independently within each ETL operation (in the Execution Layer). This way, the designer has an overall view of the process, which generates metadata (the mapping file) that the ETL operators read and use to parametrize themselves automatically. In the Execution Layer, we propose a set of high-level ETL operations to process semantic data sources. Finally, we develop a GUI-based semantic BI system, SETLBI, to define, process, integrate, and query semantic and non-semantic data.
In addition to the Definition Layer and the ETL Layer, SETLBI has the OLAP Layer, which provides an interactive interface to enable OLAP analysis over the semantic DW.

Business Intelligence (BI) tools support better business decision-making by analyzing the available organizational data. Data Warehouses (DWs), typically structured following the Multidimensional (MD) model, are used to store data from different internal and external sources, processed through Extract-Transformation-Load (ETL) processes. On-Line Analytical Processing (OLAP) queries are applied to DWs to extract business-critical knowledge. DW and OLAP technologies perform efficiently when applied to data that is static in nature and well structured. Nowadays, Semantic Web (SW) technologies and the Linked Data (LD) principles inspire organizations to publish their data in semantic formats, which allow machines to understand the meaning of the data, using the Resource Description Framework (RDF). One of the reasons semantic data has been so successful is that it can be managed and made available to third parties with little effort, and it does not depend on sophisticated data schemas.

In addition to traditional (non-semantic) data sources, the incorporation of semantic data sources into a DW raises additional challenges such as schema derivation, semantic heterogeneity, and the representation of schema and data through ETL tools. Moreover, most SW data provided by business, academic or governmental organizations include facts and figures, which pose new challenges for BI tools in enabling OLAP analysis over semantic (RDF) data. In this thesis, we 1) propose a layer-based ETL framework for managing diverse semantic and non-semantic data sources while addressing the challenges mentioned above, 2) propose a set of ETL operations to process semantic data, and 3) create appropriate development environments (both programmatic and GUI-based) to facilitate the creation and management of semantic DWs and ETL processes, as well as evaluate the proposed solutions. Our ETL framework is a semantic ETL framework because it is able to consider and integrate data semantically. The following paragraphs elaborate on these contributions.

We propose SETL, a unified framework for semantic ETL. The framework is divided into three layers: the Definition Layer, the ETL Layer, and the DW Layer. In the Definition Layer, the semantic DW (SDW) schema, the sources, and the mappings between the sources and the DW schema are defined. In the ETL Layer, ETL processes to populate the SDW from the sources are designed. In the DW Layer, the storage of the transformed semantic data is managed. Our framework supports the inclusion of semantic (RDF) data in DWs in addition to relational data. It allows users to define an ontology of a DW and annotate it with MD constructs (such as dimensions, cubes, levels, etc.) using the Data Cube for OLAP (QB4OLAP) vocabulary. It also supports traditional transformation operations and provides a method to generate semantic data from the source data according to the semantics encoded in the ontology. It further provides a method to connect the SDW with external knowledge bases, thereby creating a knowledge base, composed of an ontology and its instances, where the data is semantically connected with other external/internal data. To do so, we develop a high-level, Python-based programmatic method to perform the aforementioned tasks. A comprehensive experimental evaluation has been carried out comparing SETL with a solution built with traditional tools (which require much more coding). As a use case, we used the Danish Agricultural dataset, and the results show that SETL provides better performance, improved programmer productivity, and higher knowledge base quality. The comparison between SETL and Pentaho Data Integration (PDI) shows that SETL is 13.5% faster than PDI. Besides being faster than PDI, it treats semantic data as first-class citizens, whereas PDI contains no operators specific to semantic data.

On top of SETL, we propose SETLCONSTRUCT, where we define a set of high-level ETL tasks/operations to process semantic data sources, aimed at encapsulating and facilitating the creation of the semantic ETL. We divide the integration process into two layers: the Definition Layer and the Execution Layer. The Definition Layer includes two tasks that allow DW designers to define target (SDW) schemas and the mappings between sources (or intermediate results) and the SDW (potentially, other intermediate results). To create mappings between the sources and the SDW, we provide a mapping vocabulary called Source-To-Target Mapping (S2TMAP). Differently from other ETL tools, we propose a new paradigm: the ETL flow transformations are characterized in the Definition Layer, rather than independently within each ETL operation (in the Execution Layer). This new paradigm gives the designer an overall view of the process, which generates metadata (the mapping file) that the individual ETL operators read and use to parametrize themselves automatically. In the Execution Layer, we propose a set of high-level ETL operations to process semantic data sources. Beyond cleansing, joining, and transforming semantic data, we propose operations to generate multidimensional semantics at the data level and to update the SDW to reflect changes in the sources. In addition, we extend SETLCONSTRUCT to enable automatic generation of ETL execution flows (we call it SETLAUTO). Finally, we provide an extensive evaluation comparing the productivity, development time, and performance of SETLCONSTRUCT and SETLAUTO with the previous framework, SETL. The evaluation demonstrates that SETLCONSTRUCT improves considerably over SETL in terms of productivity, development time, and performance. The evaluation shows that 1) SETLCONSTRUCT uses 92% fewer typed characters (NOTC) than SETL, and SETLAUTO further reduces the number of used concepts (NOUC) by another 25%; 2) using SETLCONSTRUCT, development time is cut almost in half compared to SETL, and is reduced by another 27% using SETLAUTO; 3) SETLCONSTRUCT is scalable and has performance similar to SETL.

Finally, we develop a GUI-based semantic BI system, SETLBI, to define, process, integrate, and query semantic and non-semantic data. In addition to the Definition Layer and the ETL Layer, SETLBI has an OLAP Layer, which provides an interactive interface to enable self-service OLAP analysis over the semantic DW. Each layer is composed of a set of operations/tasks. To formalize the intra- and inter-layer connections between the components of each layer, we use an ontology. The ETL Layer extends the Execution Layer of SETLCONSTRUCT by adding operations to process non-semantic data sources. Lastly, we demonstrate the final system using the Bangladesh population census (2011) dataset.

The final outcome of this thesis is the tool SETLBI. SETLBI enables (1) DW designers with little or no SW knowledge to semantically integrate (semantic or non-semantic) data and analyze it using OLAP, and (2) SW users to define views over semantic data, integrate them with non-semantic sources, visualize them according to the MD model, and perform OLAP analysis. In addition, SW users can enrich the generated SDW schema with RDFS/OWL constructs. Taking this framework as a starting point, researchers can use it to create SDWs interactively and automatically. This project builds a bridge between BI and SW technologies, and opens the door to further research opportunities, such as developing machine-understandable DW and ETL techniques.
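The core SETL step of generating semantic (RDF) data from relational source data according to a mapping can be sketched as a small triple generator. Everything below is an illustrative stand-in: the table rows, the prefixes and URIs, and the column-to-property mapping (a toy substitute for an S2TMAP file) are invented, and this is not the actual SETL API.

```python
# Source rows from a hypothetical relational table.
rows = [
    {"farm_id": 7, "municipality": "Aalborg", "year": 2011},
    {"farm_id": 8, "municipality": "Aarhus",  "year": 2011},
]

# A minimal column -> RDF property mapping (stand-in for an S2TMAP file).
mapping = {
    "municipality": "ex:locatedIn",
    "year":         "ex:inYear",
}

def rows_to_triples(rows, mapping):
    """Emit (subject, predicate, object) triples for each source row,
    the basic transformation a semantic ETL layer automates."""
    triples = []
    for row in rows:
        subject = f"ex:farm/{row['farm_id']}"
        triples.append((subject, "rdf:type", "ex:Farm"))
        for column, prop in mapping.items():
            triples.append((subject, prop, str(row[column])))
    return triples

for t in rows_to_triples(rows, mapping):
    print(t)
```

Keeping the mapping as data rather than code is the paradigm the thesis describes: the ETL operator reads the mapping metadata and parametrizes itself, so the designer specifies the flow once in the Definition Layer.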
Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web
The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author's and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties.

In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962: "Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent." (McLuhan 1962, p.5, emphasis in original)

Over forty years later, the seamless interplay that McLuhan demanded between our
technologies is still barely visible. McLuhan’s predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. Yet
the integration of electronic systems as open systems remains in its infancy.

Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, to research and create the
services and technologies that such unification will require. Halfway through its six-year span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, and to articulate the challenges and issues that remain.

The WWW provided the original context that made the AKT approach to knowledge management (KM) possible. When AKT was initially proposed in 1999, it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The
combination of this expertise, and the time and space afforded the consortium by the
IRC structure, suggested the opportunity for a concerted effort to develop an approach
to advanced knowledge technologies, based on the WWW as a basic infrastructure.

The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself, with the development of the Semantic Web (SW), which foresaw much more intelligent manipulation and querying of knowledge. The opportunities that the SW provided, e.g. for more intelligent retrieval, put AKT at the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central to the exploitation of those opportunities.

The SW, as an extension of the WWW, provides an interesting set of constraints to
the knowledge management services AKT tries to provide. As a medium for the
semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the
provision of knowledge management services delivered over the web, as opposed to the creation and provision of technologies to manage knowledge.

AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management.

Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different
applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing
ontologies to create a third). Ontology mapping, and the elimination of conflicts of
reference, will be important tasks. All of these issues are discussed along with our
proposed technologies.

Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. The brokering meta-services that are envisaged will have to deal with this heterogeneity.

The emerging picture of the SW is one of great opportunity, but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play in bringing much of this context together; AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough. Complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which
semantic hygiene prevails interesting enough to reason in? These and many other
questions need to be addressed if we are to provide effective knowledge technologies
for our content on the web.
Device Information Modeling in Automation - A Computer-Scientific Approach
This thesis presents an approach to device information modeling that is meant to ease the challenges of device manufacturers in the automation domain. The basis for this approach is semantic models of the application domain. The author discusses the challenges for integration in the automation domain, especially regarding field devices, device description languages, and fieldbuses. A method for the generation of semantic models is presented, and an approach is discussed that is meant to support the generation of device descriptions for different device description languages. The approach is then evaluated.
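The idea of deriving several device description formats from a single semantic device model can be sketched in Python. The model fields, the two output formats, and all names below are invented for illustration and do not correspond to any real device description language or to the thesis's implementation.

```python
# One semantic model of a field device (illustrative structure).
device = {
    "name": "TempSensor01",
    "vendor": "ExampleCorp",
    "parameters": [
        {"id": "temperature", "type": "float", "unit": "degC"},
        {"id": "interval",    "type": "int",   "unit": "ms"},
    ],
}

def to_xml_description(device):
    """Render the model as a generic XML-style device description."""
    lines = [f'<Device name="{device["name"]}" vendor="{device["vendor"]}">']
    for p in device["parameters"]:
        lines.append(
            f'  <Parameter id="{p["id"]}" type="{p["type"]}" unit="{p["unit"]}"/>'
        )
    lines.append("</Device>")
    return "\n".join(lines)

def to_key_value_description(device):
    """Render the same model in a flat key-value format, showing that
    one semantic model can feed multiple description languages."""
    out = [f"device.name={device['name']}", f"device.vendor={device['vendor']}"]
    for p in device["parameters"]:
        out.append(f"param.{p['id']}={p['type']}:{p['unit']}")
    return "\n".join(out)

print(to_xml_description(device))
print(to_key_value_description(device))
```

The point of the sketch is the single source of truth: each target format is a projection of the one model, so a manufacturer maintains the model rather than each description separately.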
py4DSTEM: a software package for multimodal analysis of four-dimensional scanning transmission electron microscopy datasets
Scanning transmission electron microscopy (STEM) allows for imaging,
diffraction, and spectroscopy of materials on length scales ranging from
microns to atoms. By using a high-speed, direct electron detector, it is now
possible to record a full 2D image of the diffracted electron beam at each
probe position, typically a 2D grid of probe positions. These 4D-STEM datasets
are rich in information, including signatures of the local structure,
orientation, deformation, electromagnetic fields and other sample-dependent
properties. However, extracting this information requires complex analysis
pipelines, from data wrangling to calibration to analysis to visualization, all
while maintaining robustness against imaging distortions and artifacts. In this
paper, we present py4DSTEM, an analysis toolkit for measuring material
properties from 4D-STEM datasets, written in the Python language and released
with an open source license. We describe the algorithmic steps for dataset
calibration and various 4D-STEM property measurements in detail, and present
results from several experimental datasets. We have also implemented a simple
and universal file format appropriate for electron microscopy data in py4DSTEM,
which uses the open-source HDF5 standard. We hope this tool will benefit the research community and help to advance the developing standards for data and computational methods in electron microscopy, and we invite the community to contribute to this ongoing, fully open-source project.
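One of the basic 4D-STEM measurements described above, a virtual image formed by summing each diffraction pattern over a detector region, can be sketched with plain NumPy. This uses synthetic data and is not the py4DSTEM API; the function name and dataset layout are assumptions for the example.

```python
import numpy as np

# Synthetic 4D-STEM dataset: (scan_y, scan_x, detector_y, detector_x).
rng = np.random.default_rng(0)
data = rng.random((4, 5, 16, 16))

def virtual_image(data, center, radius):
    """Sum the diffraction intensity inside a circular virtual detector
    for every probe position, yielding a 2D (scan_y, scan_x) image."""
    qy, qx = np.indices(data.shape[2:])
    mask = (qy - center[0]) ** 2 + (qx - center[1]) ** 2 <= radius ** 2
    # Boolean-mask indexing over the two detector axes flattens them,
    # leaving one intensity sum per probe position.
    return data[..., mask].sum(axis=-1)

img = virtual_image(data, center=(8, 8), radius=4)
print(img.shape)  # → (4, 5)
```

Placing the virtual detector over the central (bright-field) disk or over scattered (dark-field) regions gives the different contrast mechanisms that 4D-STEM analysis toolkits expose.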
- …