206 research outputs found

    Modern Interfaces for Knowledge Representation and Processing Systems Based on Markup Technologies

    The usage of markup technologies to specify knowledge to be processed in a specific field of application is a common technique. Representation techniques based on the markup language paradigm for describing various types of knowledge, including graph-based models, are considered, and details on using Knowledge Representation and Processing (KRP) systems in education are presented. XML and VoiceXML were selected to implement a smart interface for KRP systems.
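
    The paper does not reproduce its interface code, so the following is only a minimal Python sketch of how a VoiceXML prompt could be generated around the answer of a KRP query; the build_voicexml_prompt helper and the sample answer are assumptions for illustration, not material from the paper.

        # Illustrative sketch only: wraps a (hypothetical) KRP answer
        # in a minimal VoiceXML document for a voice interface.
        import xml.etree.ElementTree as ET

        def build_voicexml_prompt(answer: str) -> str:
            """Wrap a KRP system's textual answer in a minimal VoiceXML document."""
            vxml = ET.Element("vxml", version="2.1",
                              xmlns="http://www.w3.org/2001/vxml")
            form = ET.SubElement(vxml, "form")
            block = ET.SubElement(form, "block")
            prompt = ET.SubElement(block, "prompt")
            prompt.text = answer
            return ET.tostring(vxml, encoding="unicode")

        # Hypothetical usage; the query answer is invented.
        print(build_voicexml_prompt("The shortest path from A to C passes through B."))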

    Schema Languages & Internationalization Issues: A survey

    Many XML-related activities (e.g. the creation of a new schema) already address issues with different languages, scripts, and cultures. Nevertheless, a need exists for additional mechanisms and guidelines for more effective internationalization (i18n) and localization (l10n) in XML-related content and processes. The W3C Internationalization Tag Set Working Group (W3C ITS WG) addresses this need and works on data categories, representation mechanisms and guidelines related to i18n and l10n support in the XML realm. This paper describes initial findings from the W3C ITS WG. Furthermore, the paper discusses how these findings relate to specific schema languages and to complementary technologies such as namespace sectioning, schema annotation and the description of processing chains. The paper exemplifies why certain requirements can only be met by a combination of technologies, and discusses these technologies.
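
    As a flavour of the data categories involved, here is a minimal sketch assuming ITS's Translate category (the namespace is ITS's; the sample document and the check are invented, not taken from the paper) of how a localization chain could pick out content that must not be translated.

        # Minimal sketch: find element content marked its:translate="no",
        # which a localization chain should leave untranslated.
        import xml.etree.ElementTree as ET

        ITS_NS = "http://www.w3.org/2005/11/its"

        doc = ET.fromstring(
            '<text xmlns:its="http://www.w3.org/2005/11/its">'
            '<p>Please restart the <code its:translate="no">httpd</code> service.</p>'
            '</text>'
        )

        protected = [el.text for el in doc.iter()
                     if el.get(f"{{{ITS_NS}}}translate") == "no"]
        print(protected)  # ['httpd']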

    Integrating deep and shallow natural language processing components: representations and hybrid architectures

    We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is to improve the robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processing tasks. We introduce XML standoff markup as an additional abstraction layer that eases the integration of NLP components, and propose the use of XSLT as a standardized and efficient transformation language for online NLP integration. In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that build on these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely a type hierarchy and typed feature structures. WHITEBOARD is the first hybrid architecture to integrate not only part-of-speech tagging but also named entity recognition and topological parsing with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes WHITEBOARD along various dimensions such as configurability, multilinguality and flexible processing strategies. We describe various applications that have been implemented using the hybrid frameworks, such as structured named entity recognition, information extraction, creative document authoring support and deep question analysis, as well as evaluations. In WHITEBOARD, for example, shallow pre-processing was shown to increase both the coverage and the efficiency of deep parsing by a factor of more than two. Heart of Gold not only forms the basis for applications that utilize semantics-oriented natural language analysis, but also constitutes a complex research instrument for experimenting with novel processing strategies combining deep and shallow methods, and it eases the replication and comparability of results.
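
    The thesis's own annotation schema is not reproduced here, so the following Python fragment only sketches the standoff idea under simplified, invented conventions: annotations from different NLP components point into the shared raw text via character offsets instead of wrapping it in inline tags, so components cannot interfere with each other's markup.

        # Standoff sketch: annotations reference offsets into a shared base text.
        import xml.etree.ElementTree as ET

        text = "Heart of Gold integrates shallow and deep parsing."

        standoff = ET.Element("annotations")
        for comp, label, start, end in [
            ("ner", "NAME", 0, 13),   # "Heart of Gold" from a named entity recognizer
            ("pos", "VBZ", 14, 24),   # "integrates" from a part-of-speech tagger
        ]:
            ET.SubElement(standoff, "ann", component=comp, type=label,
                          start=str(start), end=str(end))

        # Each layer resolves its spans against the same base text.
        for ann in standoff:
            s, e = int(ann.get("start")), int(ann.get("end"))
            print(ann.get("component"), ann.get("type"), repr(text[s:e]))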

    XML-Based Automatic Test Data Generation

    Software engineering aims at increasing the quality and reliability of software while decreasing its cost. Testing is one of the most time-consuming phases of the software development lifecycle, so improvements in software testing reduce cost and increase quality. Automation is one of the most popular ways to cut testing cost and improve reliability. In our work we propose a system called XML-based automatic test data generation that generates test data automatically according to a given data definition, and we also propose a test data definition language for describing the test data to be generated. This system reduces testing time compared to manual test data generation, and it increases testing reliability compared to random test data generation by eliminating meaningless test data.
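
    The thesis's definition language is not quoted above, so the sketch below invents a tiny XML dialect of <field> elements purely to illustrate the idea of deriving test data from a declarative definition; the element names and attributes are assumptions, not the thesis's syntax.

        # Sketch: generate one record of test data from an XML data definition.
        import random
        import string
        import xml.etree.ElementTree as ET

        definition = ET.fromstring("""
        <testdata>
          <field name="age"  type="int"    min="0" max="120"/>
          <field name="code" type="string" length="8"/>
        </testdata>
        """)

        def generate(defn):
            record = {}
            for field in defn.findall("field"):
                if field.get("type") == "int":
                    record[field.get("name")] = random.randint(
                        int(field.get("min")), int(field.get("max")))
                elif field.get("type") == "string":
                    record[field.get("name")] = "".join(random.choices(
                        string.ascii_letters, k=int(field.get("length"))))
            return record

        print(generate(definition))  # e.g. {'age': 57, 'code': 'QhTrWxyz'}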

    Web and Semantic Web Query Languages

    A number of techniques have been developed to facilitate powerful data retrieval on the Web and Semantic Web. Three categories of Web query languages can be distinguished according to the format of the data they can retrieve: XML, RDF and Topic Maps. This article introduces the spectrum of languages falling into these categories and summarises their salient aspects. The languages are introduced using common sample data and query types, and key aspects of the query languages considered are stressed in the conclusion.
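
    As a flavour of the XML category, here is a small XPath-style query over invented sample data, evaluated with the XPath subset supported by Python's standard library; the article's own sample data and queries are not reproduced.

        # Select the titles of all English-language books via an XPath subset.
        import xml.etree.ElementTree as ET

        catalogue = ET.fromstring("""
        <books>
          <book lang="en"><title>Web Queries</title></book>
          <book lang="de"><title>Semantik</title></book>
        </books>
        """)

        for title in catalogue.findall("./book[@lang='en']/title"):
            print(title.text)  # Web Queries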

    Regular Rooted Graph Grammars

    This thesis investigates a pragmatic approach to typing, static analysis and static optimization of Web query languages, in particular the Web query language Xcerpt. The approach is pragmatic in the sense that no restrictions are placed on the modellable types for decidability or efficiency reasons; instead, precision is given up where necessary. On the dynamic side, pragmatics means using types not only to ensure the validity of the objects operated on, but also to influence query selection based on types. A typing language for graph-structured data on the Web is introduced. The graphs in question are so-called rooted graphs, built from a spanning tree and cross-references; the typing language is based on regular tree grammars extended with typed references. Besides ordered data in the spirit of XML, unordered data (in the spirit of the Xcerpt data model or RDF) can be modelled using regular expressions under an unordered interpretation; this approach is new. An operational semantics for ordered and unordered types is given, based on specialized regular tree automata and counting constraints (which in turn are based on Presburger arithmetic formulae). Static type checking and type inference for Xcerpt query and construct terms are introduced, as well as optimization of Xcerpt queries based on type information.
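
    The following is a deliberately minimal sketch of checking a tree against a regular tree grammar, under the simplest possible encoding: each production maps a nonterminal to a node label plus a fixed sequence of child nonterminals. The thesis's typed references, unordered interpretations and counting constraints are well beyond this illustration, and all names here are invented.

        # Trees as (label, [children]) tuples.
        tree = ("person", [("name", []), ("email", [])])

        # Grammar: Person -> person(Name, Email); Name -> name(); Email -> email().
        grammar = {
            "Person": [("person", ["Name", "Email"])],
            "Name":   [("name", [])],
            "Email":  [("email", [])],
        }

        def conforms(node, nonterminal):
            """Check top-down whether `node` derives from `nonterminal`."""
            label, children = node
            for prod_label, child_nts in grammar[nonterminal]:
                if prod_label == label and len(child_nts) == len(children):
                    if all(conforms(c, nt) for c, nt in zip(children, child_nts)):
                        return True
            return False

        print(conforms(tree, "Person"))  # True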

    Commonalities, differences and limitations of text analysis software: the results of a review

    This paper discusses, on the one hand, tendencies in the functionality and technology of software for text analysis and reflects, on the other hand, on the areas where more development is needed. The basis for this discussion is a comprehensive review (Alexa & Zuell, in press) of fifteen currently available software packages for text analysis: AQUAD, ATLAS.ti, CoAN, Code-A-Text, DICTION, DIMAP-MCCA, HyperRESEARCH, KEDS, NUD*IST, QED, TATOE, TEXTPACK, TextSmart, WinMAXpro, and WordStat. In the review each package was presented in a detailed and extensive manner. Here we only delineate our methodology and the criteria used to select the programs reviewed; we concentrate on the types of support the selected programs offer and the commonalities and differences in their functionality, point to some of their shortcomings, and put forward suggestions for future development.

    Creating a readable language for checking XML

    Sharing data is ubiquitous today. Doctors, for example, might want to share patient journal information, but patient journals may contain sensitive information that must not be shared, so the journals need to be checked first. In this thesis, data and journals are encoded in XML, and checking them amounts to validating XML. Validating XML documents is usually done by a validator, which processes the documents and checks that they follow a set of validation rules. The issue with most validators today is that they cannot compare arbitrary elements of an XML document with each other, and they offer no mathematical operations to support such comparisons. Sometimes the validation rules themselves need to be verified, possibly by someone with little programming skill, so the validator has to be readable enough that such a person can confirm it matches the requirements. This thesis attempts to address these shortcomings of existing solutions by creating a readable language for validating XML documents. The work proceeds in three steps: investigating similar solutions, implementing a validator, and testing the readability of the validator in a usability test.
    Today we share a great deal of information with each other, and sometimes we need to make sure that the right kind of information is shared. What if, for example, you accidentally send your personal identity number instead of your telephone number? This thesis is about developing a simple tool that confirms that shared information is correctly formed. As it becomes more important to be certain that shared information is well formed, more people will come into contact with validating information before sharing it. There are tools that can validate information, but in some cases they are not enough. One application area is hospital journals: a doctor may want to share a specific case, for instance for educational purposes, by sending a patient journal to a colleague. A patient journal contains a great deal of information about the patient that the doctor may not be able or willing to share, such as the patient's identity, so the doctor uses a program generated with the tool from this thesis to confirm that no private (or otherwise potentially unnecessary) information remains in the journal to be sent. The company at which the thesis work was carried out, Advenica, has a test case that requires more complex computations than today's tools can handle. When it matters that information is validated securely, someone must be able to inspect the tool and see that it really does what it is supposed to do; most of today's tools are hard to read, which makes this difficult. The thesis resulted in a tool that creates programs which confirm whether information to be shared is correctly formed. A programmer with knowledge of the domain writes rules, which are then used to generate a validating program. The rules are designed to be easy to read, so that people without a programming background can understand them and check that they are correctly written; they do not have to write the rules themselves, the programmer does that. The generated program takes the information and tells the user whether it is correctly formed; if not, the user must change the information until the program accepts it. The tool has two properties that other tools lack: complex mathematical computations during validation, and easily readable rules. With the mathematical computations, information can be checked against more complex requirements; for example, the distance between two coordinates on Earth can be computed. And because the tool's rules are easy to inspect, an outside person can look at them and say whether they are the right rules for the purpose.
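
    The thesis's rule language is not reproduced above, so the Python fragment below only sketches the kind of check it targets: comparing arbitrary elements and applying mathematics during validation. The sample journal, the 50 km rule and the haversine helper are all invented for illustration.

        # Sketch: validate an XML document with a cross-element, mathematical rule.
        import math
        import xml.etree.ElementTree as ET

        doc = ET.fromstring("""
        <journal>
          <clinic  lat="55.71" lon="13.19"/>
          <patient lat="55.60" lon="13.00"/>
        </journal>
        """)

        def distance_km(a, b):
            """Great-circle (haversine) distance between two lat/lon elements."""
            lat1, lon1 = math.radians(float(a.get("lat"))), math.radians(float(a.get("lon")))
            lat2, lon2 = math.radians(float(b.get("lat"))), math.radians(float(b.get("lon")))
            h = (math.sin((lat2 - lat1) / 2) ** 2
                 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
            return 2 * 6371 * math.asin(math.sqrt(h))

        # Example rule: the patient must live within 50 km of the clinic.
        ok = distance_km(doc.find("clinic"), doc.find("patient")) <= 50
        print("valid" if ok else "invalid")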

    Prosper: developing web applications strongly integrated with Prolog

    Separating presentation from application logic, defining presentation declaratively and automating recurring tasks are fundamental issues in rapid web application development. Although Prolog is widely employed in intelligent systems and knowledge discovery, creating a web interface for Prolog has been a cumbersome task producing poorly maintainable code, which hinders harnessing the power of Prolog in information systems. This paper presents a framework called Prosper that facilitates developing new, or extending existing, Prolog applications with a presentation front-end. The framework relies on Prolog to the greatest possible extent, supports code re-use, and integrates easily with web servers. As a result, Prosper simplifies the creation of complex, maintainable web applications that run either independently or as part of a heterogeneous system, without leaving the Prolog domain.