206 research outputs found

    Modern Interfaces for Knowledge Representation and Processing Systems Based on Markup Technologies

    The usage of markup technologies to specify knowledge to be processed in a specific field of application is a common technique. Representation techniques based on the markup language paradigm for describing various types of knowledge, including graph-based models, are considered, and details on using Knowledge Representation and Processing (KRP) systems in education are presented. XML and VoiceXML were selected to implement a smart interface for KRP systems.
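
    The paper does not reproduce its interface code, so the following is only a minimal Python sketch of how a VoiceXML prompt could be generated around the answer of a KRP query; the build_voicexml_prompt helper and the sample answer are assumptions for illustration, not material from the paper.

        # Illustrative sketch only: wraps a (hypothetical) KRP answer
        # in a minimal VoiceXML document for a voice interface.
        import xml.etree.ElementTree as ET

        def build_voicexml_prompt(answer: str) -> str:
            """Wrap a KRP system's textual answer in a minimal VoiceXML document."""
            vxml = ET.Element("vxml", version="2.1",
                              xmlns="http://www.w3.org/2001/vxml")
            form = ET.SubElement(vxml, "form")
            block = ET.SubElement(form, "block")
            prompt = ET.SubElement(block, "prompt")
            prompt.text = answer
            return ET.tostring(vxml, encoding="unicode")

        # Hypothetical usage; the query answer is invented.
        print(build_voicexml_prompt("The shortest path from A to C passes through B."))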

    Schema Languages & Internationalization Issues: A survey

    Many XML-related activities (e.g. the creation of a new schema) already address issues with different languages, scripts, and cultures. Nevertheless, a need exists for additional mechanisms and guidelines for more effective internationalization (i18n) and localization (l10n) in XML-related content and processes. The W3C Internationalization Tag Set Working Group (W3C ITS WG) addresses this need and works on data categories, representation mechanisms and guidelines related to i18n and l10n support in the XML realm. This paper describes initial findings from the W3C ITS WG. Furthermore, the paper discusses how these findings relate to specific schema languages and to complementary technologies such as namespace sectioning, schema annotation and the description of processing chains. The paper exemplifies why certain requirements can only be met by a combination of technologies, and discusses these technologies.
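
    As a flavour of the data categories involved, here is a minimal sketch assuming ITS's Translate category (the namespace is ITS's; the sample document and the check are invented, not taken from the paper) of how a localization chain could pick out content that must not be translated.

        # Minimal sketch: find element content marked its:translate="no",
        # which a localization chain should leave untranslated.
        import xml.etree.ElementTree as ET

        ITS_NS = "http://www.w3.org/2005/11/its"

        doc = ET.fromstring(
            '<text xmlns:its="http://www.w3.org/2005/11/its">'
            '<p>Please restart the <code its:translate="no">httpd</code> service.</p>'
            '</text>'
        )

        protected = [el.text for el in doc.iter()
                     if el.get(f"{{{ITS_NS}}}translate") == "no"]
        print(protected)  # ['httpd']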

    Integrating deep and shallow natural language processing components: representations and hybrid architectures

    We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is to improve the robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processing tasks. We introduce XML standoff markup as an additional abstraction layer that eases the integration of NLP components, and propose the use of XSLT as a standardized and efficient transformation language for online NLP integration. In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that build on these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely a type hierarchy and typed feature structures. WHITEBOARD is the first hybrid architecture to integrate not only part-of-speech tagging but also named entity recognition and topological parsing with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes WHITEBOARD along various dimensions such as configurability, multilinguality and flexible processing strategies. We describe various applications that have been implemented using the hybrid frameworks, such as structured named entity recognition, information extraction, creative document authoring support and deep question analysis, as well as evaluations. In WHITEBOARD, for example, shallow pre-processing was shown to increase both the coverage and the efficiency of deep parsing by a factor of more than two. Heart of Gold not only forms the basis for applications that utilize semantics-oriented natural language analysis, but also constitutes a complex research instrument for experimenting with novel processing strategies combining deep and shallow methods, and it eases the replication and comparability of results.
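
    The thesis's own annotation schema is not reproduced here, so the following Python fragment only sketches the standoff idea under simplified, invented conventions: annotations from different NLP components point into the shared raw text via character offsets instead of wrapping it in inline tags, so components cannot interfere with each other's markup.

        # Standoff sketch: annotations reference offsets into a shared base text.
        import xml.etree.ElementTree as ET

        text = "Heart of Gold integrates shallow and deep parsing."

        standoff = ET.Element("annotations")
        for comp, label, start, end in [
            ("ner", "NAME", 0, 13),   # "Heart of Gold" from a named entity recognizer
            ("pos", "VBZ", 14, 24),   # "integrates" from a part-of-speech tagger
        ]:
            ET.SubElement(standoff, "ann", component=comp, type=label,
                          start=str(start), end=str(end))

        # Each layer resolves its spans against the same base text.
        for ann in standoff:
            s, e = int(ann.get("start")), int(ann.get("end"))
            print(ann.get("component"), ann.get("type"), repr(text[s:e]))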

    XML-Based Automatic Test Data Generation

    Software engineering aims at increasing the quality and reliability of software while decreasing its cost. Testing is one of the most time-consuming phases of the software development lifecycle, so improvements in software testing reduce cost and increase quality. Automation is one of the most popular ways to cut testing cost and improve reliability. In our work we propose a system called XML-based automatic test data generation that generates test data automatically according to a given data definition, and we also propose a test data definition language for describing the test data to be generated. This system reduces testing time compared to manual test data generation, and it increases testing reliability compared to random test data generation by eliminating meaningless test data.
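
    The thesis's definition language is not quoted above, so the sketch below invents a tiny XML dialect of <field> elements purely to illustrate the idea of deriving test data from a declarative definition; the element names and attributes are assumptions, not the thesis's syntax.

        # Sketch: generate one record of test data from an XML data definition.
        import random
        import string
        import xml.etree.ElementTree as ET

        definition = ET.fromstring("""
        <testdata>
          <field name="age"  type="int"    min="0" max="120"/>
          <field name="code" type="string" length="8"/>
        </testdata>
        """)

        def generate(defn):
            record = {}
            for field in defn.findall("field"):
                if field.get("type") == "int":
                    record[field.get("name")] = random.randint(
                        int(field.get("min")), int(field.get("max")))
                elif field.get("type") == "string":
                    record[field.get("name")] = "".join(random.choices(
                        string.ascii_letters, k=int(field.get("length"))))
            return record

        print(generate(definition))  # e.g. {'age': 57, 'code': 'QhTrWxyz'}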

    Web and Semantic Web Query Languages

    A number of techniques have been developed to facilitate powerful data retrieval on the Web and Semantic Web. Three categories of Web query languages can be distinguished according to the format of the data they can retrieve: XML, RDF and Topic Maps. This article introduces the spectrum of languages falling into these categories and summarises their salient aspects. The languages are introduced using common sample data and query types, and key aspects of the query languages considered are stressed in the conclusion.
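
    As a flavour of the XML category, here is a small XPath-style query over invented sample data, evaluated with the XPath subset supported by Python's standard library; the article's own sample data and queries are not reproduced.

        # Select the titles of all English-language books via an XPath subset.
        import xml.etree.ElementTree as ET

        catalogue = ET.fromstring("""
        <books>
          <book lang="en"><title>Web Queries</title></book>
          <book lang="de"><title>Semantik</title></book>
        </books>
        """)

        for title in catalogue.findall("./book[@lang='en']/title"):
            print(title.text)  # Web Queries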

    Regular Rooted Graph Grammars

    This thesis investigates a pragmatic approach to typing, static analysis and static optimization of Web query languages, in particular the Web query language Xcerpt. The approach is pragmatic in the sense that no restrictions are placed on the modellable types for decidability or efficiency reasons; instead, precision is given up where necessary. On the dynamic side, pragmatics means using types not only to ensure the validity of the objects operated on, but also to influence query selection based on types. A typing language for graph-structured data on the Web is introduced. The graphs in question are so-called rooted graphs, built from a spanning tree and cross-references; the typing language is based on regular tree grammars extended with typed references. Besides ordered data in the spirit of XML, unordered data (in the spirit of the Xcerpt data model or RDF) can be modelled using regular expressions under an unordered interpretation; this approach is new. An operational semantics for ordered and unordered types is given, based on specialized regular tree automata and counting constraints (which in turn are based on Presburger arithmetic formulae). Static type checking and type inference for Xcerpt query and construct terms are introduced, as well as optimization of Xcerpt queries based on type information.
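
    The following is a deliberately minimal sketch of checking a tree against a regular tree grammar, under the simplest possible encoding: each production maps a nonterminal to a node label plus a fixed sequence of child nonterminals. The thesis's typed references, unordered interpretations and counting constraints are well beyond this illustration, and all names here are invented.

        # Trees as (label, [children]) tuples.
        tree = ("person", [("name", []), ("email", [])])

        # Grammar: Person -> person(Name, Email); Name -> name(); Email -> email().
        grammar = {
            "Person": [("person", ["Name", "Email"])],
            "Name":   [("name", [])],
            "Email":  [("email", [])],
        }

        def conforms(node, nonterminal):
            """Check top-down whether `node` derives from `nonterminal`."""
            label, children = node
            for prod_label, child_nts in grammar[nonterminal]:
                if prod_label == label and len(child_nts) == len(children):
                    if all(conforms(c, nt) for c, nt in zip(children, child_nts)):
                        return True
            return False

        print(conforms(tree, "Person"))  # True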

    Commonalities, differences and limitations of text analysis software: the results of a review

    This paper discusses, on the one hand, tendencies in the functionality and technology of software for text analysis and reflects, on the other hand, on the areas where more development is needed. The basis for this discussion is a comprehensive review (Alexa & Zuell, in press) of fifteen currently available software packages for text analysis: AQUAD, ATLAS.ti, CoAN, Code-A-Text, DICTION, DIMAP-MCCA, HyperRESEARCH, KEDS, NUD*IST, QED, TATOE, TEXTPACK, TextSmart, WinMAXpro, and WordStat. In the review each package was presented in a detailed and extensive manner. Here we only delineate our methodology and the criteria used to select the programs reviewed; we concentrate on the types of support the selected programs offer and the commonalities and differences in their functionality, point to some of their shortcomings, and put forward suggestions for future development.

    Creating a readable language for checking XML

    Sharing data is ubiquitous today. Doctors, for example, might want to share patient journal information, but patient journals may contain sensitive information that must not be shared, so the journals need to be checked first. In this thesis, data and journals are encoded in XML, and checking them amounts to validating XML. Validating XML documents is usually done by a validator, which processes the documents and checks that they follow a set of validation rules. The issue with most validators today is that they cannot compare arbitrary elements of an XML document with each other, and they offer no mathematical operations to support such comparisons. Sometimes the validation rules themselves need to be verified, possibly by someone with little programming skill, so the validator has to be readable enough that such a person can confirm it matches the requirements. This thesis attempts to address these shortcomings of existing solutions by creating a readable language for validating XML documents. The work proceeds in three steps: investigating similar solutions, implementing a validator, and testing the readability of the validator in a usability test.
    Today we share a great deal of information with each other, and sometimes we need to make sure that the right kind of information is shared. What if, for example, you accidentally send your personal identity number instead of your telephone number? This thesis is about developing a simple tool that confirms that shared information is correctly formed. As it becomes more important to be certain that shared information is well formed, more people will come into contact with validating information before sharing it. There are tools that can validate information, but in some cases they are not enough. One application area is hospital journals: a doctor may want to share a specific case, for instance for educational purposes, by sending a patient journal to a colleague. A patient journal contains a great deal of information about the patient that the doctor may not be able or willing to share, such as the patient's identity, so the doctor uses a program generated with the tool from this thesis to confirm that no private (or otherwise potentially unnecessary) information remains in the journal to be sent. The company at which the thesis work was carried out, Advenica, has a test case that requires more complex computations than today's tools can handle. When it matters that information is validated securely, someone must be able to inspect the tool and see that it really does what it is supposed to do; most of today's tools are hard to read, which makes this difficult. The thesis resulted in a tool that creates programs which confirm whether information to be shared is correctly formed. A programmer with knowledge of the domain writes rules, which are then used to generate a validating program. The rules are designed to be easy to read, so that people without a programming background can understand them and check that they are correctly written; they do not have to write the rules themselves, the programmer does that. The generated program takes the information and tells the user whether it is correctly formed; if not, the user must change the information until the program accepts it. The tool has two properties that other tools lack: complex mathematical computations during validation, and easily readable rules. With the mathematical computations, information can be checked against more complex requirements; for example, the distance between two coordinates on Earth can be computed. And because the tool's rules are easy to inspect, an outside person can look at them and say whether they are the right rules for the purpose.
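
    The thesis's rule language is not reproduced above, so the Python fragment below only sketches the kind of check it targets: comparing arbitrary elements and applying mathematics during validation. The sample journal, the 50 km rule and the haversine helper are all invented for illustration.

        # Sketch: validate an XML document with a cross-element, mathematical rule.
        import math
        import xml.etree.ElementTree as ET

        doc = ET.fromstring("""
        <journal>
          <clinic  lat="55.71" lon="13.19"/>
          <patient lat="55.60" lon="13.00"/>
        </journal>
        """)

        def distance_km(a, b):
            """Great-circle (haversine) distance between two lat/lon elements."""
            lat1, lon1 = math.radians(float(a.get("lat"))), math.radians(float(a.get("lon")))
            lat2, lon2 = math.radians(float(b.get("lat"))), math.radians(float(b.get("lon")))
            h = (math.sin((lat2 - lat1) / 2) ** 2
                 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
            return 2 * 6371 * math.asin(math.sqrt(h))

        # Example rule: the patient must live within 50 km of the clinic.
        ok = distance_km(doc.find("clinic"), doc.find("patient")) <= 50
        print("valid" if ok else "invalid")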

    Prosper: developing web applications strongly integrated with Prolog

    Separating presentation from application logic, defining presentation declaratively and automating recurring tasks are fundamental issues in rapid web application development. Although Prolog is widely employed in intelligent systems and knowledge discovery, creating a web interface for Prolog has been a cumbersome task producing poorly maintainable code, which hinders harnessing the power of Prolog in information systems. This paper presents a framework called Prosper that facilitates developing new, or extending existing, Prolog applications with a presentation front-end. The framework relies on Prolog to the greatest possible extent, supports code re-use, and integrates easily with web servers. As a result, Prosper simplifies the creation of complex, maintainable web applications that run either independently or as part of a heterogeneous system, without leaving the Prolog domain.