9 research outputs found

    10. Interuniversitäres Doktorandenseminar Wirtschaftsinformatik Juli 2009

    Get PDF
    Begonnen im Jahr 2000, ist das Interuniversitäre Wirtschaftsinformatik-Doktorandenseminar mittlerweile zu einer schönen Tradition geworden. Zunächst unter Beteiligung der Universitäten Leipzig und Halle-Wittenberg gestartet. Seit 2003 wird das Seminar zusammen mit der Jenaer Universität durchgeführt, in diesem Jahr sind erstmals auch die Technische Universität Dresden und die TU Bergakademie Freiberg dabei. Ziel der Interuniversitären Doktorandenseminare ist der über die eigenen Institutsgrenzen hinausgehende Gedankenaustausch zu aktuellen, in Promotionsprojekten behandelten Forschungsthemen. Indem der Schwerpunkt der Vorträge auch auf das Forschungsdesign gelegt wird, bietet sich allen Doktoranden die Möglichkeit, bereits in einer frühen Phase ihrer Arbeit wichtige Hinweise und Anregungen aus einem breiten Hörerspektrum zu bekommen. In den vorliegenden Research Papers sind elf Beiträge zum diesjährigen Doktorandenseminar in Jena enthalten. Sie stecken ein weites Feld ab - vom Data Mining und Wissensmanagement über die Unterstützung von Prozessen in Unternehmen bis hin zur RFID-Technologie. Die Wirtschaftsinformatik als typische Bindestrich-Informatik hat den Ruf einer thematischen Breite. Die Dissertationsprojekte aus fünf Universitäten belegen dies eindrucksvoll.

    Scalable statistical learning for relation prediction on structured data

    Get PDF
    Relation prediction seeks to predict unknown but potentially true relations by revealing missing relations in available data, by predicting future events based on historical data, and by making predicted relations retrievable by query. The approach developed in this thesis can be used for a wide variety of purposes, including to predict likely new friends on social networks, attractive points of interest for an individual visiting an unfamiliar city, and associations between genes and particular diseases. In recent years, relation prediction has attracted significant interest in both research and application domains, partially due to the increasing volume of published structured data and background knowledge. In the Linked Open Data initiative of the Semantic Web, for instance, entities are uniquely identified such that the published information can be integrated into applications and services, and the rapid increase in the availability of such structured data creates excellent opportunities as well as challenges for relation prediction. This thesis focuses on the prediction of potential relations by exploiting regularities in data using statistical relational learning algorithms and applying these methods to relational knowledge bases, in particular in Linked Open Data in particular. We review representative statistical relational learning approaches, e.g., Inductive Logic Programming and Probabilistic Relational Models. While logic-based reasoning can infer and include new relations via deduction by using ontologies, machine learning can be exploited to predict new relations (with some degree of certainty) via induction, purely based on the data. Because the application of machine learning approaches to relation prediction usually requires handling large datasets, we also discuss the scalability of machine learning as a solution to relation prediction, as well as the significant challenge posed by incomplete relational data (such as social network data, which is often much more extensive for some users than others). The main contribution of this thesis is to develop a learning framework called the Statistical Unit Node Set (SUNS) and to propose a multivariate prediction approach used in the framework. We argue that multivariate prediction approaches are most suitable for dealing with large, sparse data matrices. According to the characteristics and intended application of the data, the approach can be extended in different ways. We discuss and test two extensions of the approach--kernelization and a probabilistic method of handling complex n-ary relationships--in empirical studies based on real-world data sets. Additionally, this thesis contributes to the field of relation prediction by applying the SUNS framework to various domains. We focus on three applications: 1. In social network analysis, we present a combined approach of inductive and deductive reasoning for recommending movies to users. 2. In the life sciences, we address the disease gene prioritization problem. 3. In the recommendation system, we describe and investigate the back-end of a mobile app called BOTTARI, which provides personalized location-based recommendations of restaurants.Die Beziehungsvorhersage strebt an, unbekannte aber potenziell wahre Beziehungen vorherzusagen, indem fehlende Relationen in verfügbaren Daten aufgedeckt, zukünftige Ereignisse auf der Grundlage historischer Daten prognostiziert und vorhergesagte Relationen durch Anfragen abrufbar gemacht werden. Der in dieser Arbeit entwickelte Ansatz lässt sich für eine Vielzahl von Zwecken einschließlich der Vorhersage wahrscheinlicher neuer Freunde in sozialen Netzen, der Empfehlung attraktiver Sehenswürdigkeiten für Touristen in fremden Städten und der Priorisierung möglicher Assoziationen zwischen Genen und bestimmten Krankheiten, verwenden. In den letzten Jahren hat die Beziehungsvorhersage sowohl in Forschungs- als auch in Anwendungsbereichen eine enorme Aufmerksamkeit erregt, aufgrund des Zuwachses veröffentlichter strukturierter Daten und von Hintergrundwissen. In der Linked Open Data-Initiative des Semantischen Web werden beispielsweise Entitäten eindeutig identifiziert, sodass die veröffentlichten Informationen in Anwendungen und Dienste integriert werden können. Diese rapide Erhöhung der Verfügbarkeit strukturierter Daten bietet hervorragende Gelegenheiten sowie Herausforderungen für die Beziehungsvorhersage. Diese Arbeit fokussiert sich auf die Vorhersage potenzieller Beziehungen durch Ausnutzung von Regelmäßigkeiten in Daten unter der Verwendung statistischer relationaler Lernalgorithmen und durch Einsatz dieser Methoden in relationale Wissensbasen, insbesondere in den Linked Open Daten. Wir geben einen Überblick über repräsentative statistische relationale Lernansätze, z.B. die Induktive Logikprogrammierung und Probabilistische Relationale Modelle. Während das logikbasierte Reasoning neue Beziehungen unter der Nutzung von Ontologien ableiten und diese einbeziehen kann, kann maschinelles Lernen neue Beziehungen (mit gewisser Wahrscheinlichkeit) durch Induktion ausschließlich auf der Basis der vorliegenden Daten vorhersagen. Da die Verarbeitung von massiven Datenmengen in der Regel erforderlich ist, wenn maschinelle Lernmethoden in die Beziehungsvorhersage eingesetzt werden, diskutieren wir auch die Skalierbarkeit des maschinellen Lernens sowie die erhebliche Herausforderung, die sich aus unvollständigen relationalen Daten ergibt (z. B. Daten aus sozialen Netzen, die oft für manche Benutzer wesentlich umfangreicher sind als für Anderen). Der Hauptbeitrag der vorliegenden Arbeit besteht darin, ein Lernframework namens Statistical Unit Node Set (SUNS) zu entwickeln und einen im Framework angewendeten multivariaten Prädiktionsansatz einzubringen. Wir argumentieren, dass multivariate Vorhersageansätze am besten für die Bearbeitung von großen und dünnbesetzten Datenmatrizen geeignet sind. Je nach den Eigenschaften und der beabsichtigten Anwendung der Daten kann der Ansatz auf verschiedene Weise erweitert werden. In empirischen Studien werden zwei Erweiterungen des Ansatzes--ein kernelisierter Ansatz sowie ein probabilistischer Ansatz zur Behandlung komplexer n-stelliger Beziehungen-- diskutiert und auf realen Datensätzen untersucht. Ein weiterer Beitrag dieser Arbeit ist die Anwendung des SUNS Frameworks auf verschiedene Bereiche. Wir konzentrieren uns auf drei Anwendungen: 1. In der Analyse sozialer Netze stellen wir einen kombinierten Ansatz von induktivem und deduktivem Reasoning vor, um Benutzern Filme zu empfehlen. 2. In den Biowissenschaften befassen wir uns mit dem Problem der Priorisierung von Krankheitsgenen. 3. In den Empfehlungssystemen beschreiben und untersuchen wir das Backend einer mobilen App "BOTTARI", das personalisierte ortsbezogene Empfehlungen von Restaurants bietet

    An ontology-based approach toward the configuration of heterogeneous network devices

    Get PDF
    Despite the numerous efforts of standardization, semantic issues remain in effect in many subfields of networking. The inability to exchange data unambiguously between information systems and human resources is an issue that hinders technology implementation, semantic interoperability, service deployment, network management, technology migration, among many others. In this thesis, we will approach the semantic issues in two critical subfields of networking, namely, network configuration management and network addressing architectures. The fact that makes the study in these areas rather appealing is that in both scenarios semantic issues have been around from the very early days of networking. However, as networks continue to grow in size and complexity current practices are becoming neither scalable nor practical. One of the most complex and essential tasks in network management is the configuration of network devices. The lack of comprehensive and standard means for modifying and controlling the configuration of network elements has led to the continuous and extended use of proprietary Command Line Interfaces (CLIs). Unfortunately, CLIs are generally both, device and vendor-specific. In the context of heterogeneous network infrastructures---i.e., networks typically composed of multiple devices from different vendors---the use of several CLIs raises serious Operation, Administration and Management (OAM) issues. Accordingly, network administrators are forced to gain specialized expertise and to continuously keep knowledge and skills up to date as new features, system upgrades or technologies appear. Overall, the utilization of proprietary mechanisms allows neither sharing knowledge consistently between vendors' domains nor reusing configurations to achieve full automation of network configuration tasks---which are typically required in autonomic management. Due to this heterogeneity, CLIs typically provide a help feature which is in turn an useful source of knowledge to enable semantic interpretation of a vendor's configuration space. The large amount of information a network administrator must learn and manage makes Information Extraction (IE) and other forms of natural language analysis of the Artificial Intelligence (AI) field key enablers for the network device configuration space. This thesis presents the design and implementation specification of the first Ontology-Based Information Extraction (OBIE) System from the CLI of network devices for the automation and abstraction of device configurations. Moreover, the so-called semantic overload of IP addresses---wherein addresses are both identifiers and locators of a node at the same time---is one of the main constraints over mobility of network hosts, multi-homing and scalability of the routing system. In light of this, numerous approaches have emerged in an effort to decouple the semantics of the network addressing scheme. In this thesis, we approach this issue from two perspectives, namely, a non-disruptive (i.e., evolutionary) solution to the current Internet and a clean-slate approach for Future Internet. In the first scenario, we analyze the Locator/Identifier Separation Protocol (LISP) as it is currently one of the strongest solutions to the semantic overload issue. However, its adoption is hindered by existing problems in the proposed mapping systems. Herein, we propose the LISP Redundancy Protocol (LRP) aimed to complement the LISP framework and strengthen feasibility of deployment, while at the same time, minimize mapping table size, latency time and maximize reachability in the network. In the second scenario, we explore TARIFA a Next Generation Internet architecture and introduce a novel service-centric addressing scheme which aims to overcome the issues related to routing and semantic overload of IP addresses.A pesar de los numerosos esfuerzos de estandarización, los problemas de semántica continúan en efecto en muchas subáreas de networking. La inabilidad de intercambiar data sin ambiguedad entre sistemas es un problema que limita la interoperabilidad semántica. En esta tesis, abordamos los problemas de semántica en dos áreas: (i) la gestión de configuración y (ii) arquitecturas de direccionamiento. El hecho que hace el estudio en estas áreas de interés, es que los problemas de semántica datan desde los inicios del Internet. Sin embargo, mientras las redes continúan creciendo en tamaño y complejidad, los mecanismos desplegados dejan de ser escalabales y prácticos. Una de las tareas más complejas y esenciales en la gestión de redes es la configuración de equipos. La falta de mecanismos estándar para la modificación y control de la configuración de equipos ha llevado al uso continuado y extendido de interfaces por líneas de comando (CLI). Desafortunadamente, las CLIs son generalmente, específicos por fabricante y dispositivo. En el contexto de redes heterogéneas--es decir, redes típicamente compuestas por múltiples dispositivos de distintos fabricantes--el uso de varias CLIs trae consigo serios problemas de operación, administración y gestión. En consecuencia, los administradores de red se ven forzados a adquirir experiencia en el manejo específico de múltiples tecnologías y además, a mantenerse continuamente actualizados en la medida en que nuevas funcionalidades o tecnologías emergen, o bien con actualizaciones de sistemas operativos. En general, la utilización de mecanismos propietarios no permite compartir conocimientos de forma consistente a lo largo de plataformas heterogéneas, ni reutilizar configuraciones con el objetivo de alcanzar la completa automatización de tareas de configuración--que son típicamente requeridas en el área de gestión autonómica. Debido a esta heterogeneidad, las CLIs suelen proporcionar una función de ayuda que fundamentalmente aporta información para la interpretación semántica del entorno de configuración de un fabricante. La gran cantidad de información que un administrador debe aprender y manejar, hace de la extracción de información y otras formas de análisis de lenguaje natural del campo de Inteligencia Artificial, potenciales herramientas para la configuración de equipos en entornos heterogéneos. Esta tesis presenta el diseño y especificaciones de implementación del primer sistema de extracción de información basada en ontologías desde el CLI de dispositivos de red, para la automatización y abstracción de configuraciones. Por otra parte, la denominada sobrecarga semántica de direcciones IP--en donde, las direcciones son identificadores y localizadores al mismo tiempo--es una de las principales limitaciones sobre mobilidad, multi-homing y escalabilidad del sistema de enrutamiento. Por esta razón, numerosas propuestas han emergido en un esfuerzo por desacoplar la semántica del esquema de direccionamiento de las redes actuales. En esta tesis, abordamos este problema desde dos perspectivas, la primera de ellas una aproximación no-disruptiva (es decir, evolucionaria) al problema del Internet actual y la segunda, una nueva propuesta en torno a futuras arquitecturas del Internet. En el primer escenario, analizamos el protocolo LISP (del inglés, Locator/Identifier Separation Protocol) ya que es en efecto, una de las soluciones con mayor potencial para la resolucion del problema de semántica. Sin embargo, su adopción está limitada por problemas en los sistemas de mapeo propuestos. En esta tesis, proponemos LRP (del inglés, LISP Redundancy Protocol) un protocolo destinado a complementar LISP e incrementar la factibilidad de despliegue, a la vez que, reduce el tamaño de las tablas de mapeo, tiempo de latencia y maximiza accesibilidad. En el segundo escenario, exploramos TARIFA una arquitectura de red de nueva generación e introducimos un novedoso esquema de direccionamiento orientado a servicios

    Towards a system of concepts for Family Medicine. Multilingual indexing in General Practice/ Family Medicine in the era of Semantic Web

    Get PDF
    UNIVERSITY OF LIÈGE, BELGIUM Executive Summary Faculty of Medicine Département Universitaire de Médecine Générale. Unité de recherche Soins Primaires et Santé Doctor in biomedical sciences Towards a system of concepts for Family Medicine. Multilingual indexing in General Practice/ Family Medicine in the era of SemanticWeb by Dr. Marc JAMOULLE Introduction This thesis is about giving visibility to the often overlooked work of family physicians and consequently, is about grey literature in General Practice and Family Medicine (GP/FM). It often seems that conference organizers do not think of GP/FM as a knowledge-producing discipline that deserves active dissemination. A conference is organized, but not much is done with the knowledge shared at these meetings. In turn, the knowledge cannot be reused or reapplied. This these is also about indexing. To find knowledge back, indexing is mandatory. We must prepare tools that will automatically index the thousands of abstracts that family doctors produce each year in various languages. And finally this work is about semantics1. It is an introduction to health terminologies, ontologies, semantic data, and linked open data. All are expressions of the next step: Semantic Web for health care data. Concepts, units of thought expressed by terms, will be our target and must have the ability to be expressed in multiple languages. In turn, three areas of knowledge are at stake in this study: (i) Family Medicine as a pillar of primary health care, (ii) computational linguistics, and (iii) health information systems. Aim • To identify knowledge produced by General practitioners (GPs) by improving annotation of grey literature in Primary Health Care • To propose an experimental indexing system, acting as draft for a standardized table of content of GP/GM • To improve the searchability of repositories for grey literature in GP/GM. 1For specific terms, see the Glossary page 257 x Methods The first step aimed to design the taxonomy by identifying relevant concepts in a compiled corpus of GP/FM texts. We have studied the concepts identified in nearly two thousand communications of GPs during conferences. The relevant concepts belong to the fields that are focusing on GP/FM activities (e.g. teaching, ethics, management or environmental hazard issues). The second step was the development of an on-line, multilingual, terminological resource for each category of the resulting taxonomy, named Q-Codes. We have designed this terminology in the form of a lightweight ontology, accessible on-line for readers and ready for use by computers of the semantic web. It is also fit for the Linked Open Data universe. Results We propose 182 Q-Codes in an on-line multilingual database (10 languages) (www.hetop.eu/Q) acting each as a filter for Medline. Q-Codes are also available under the form of Unique Resource Identifiers (URIs) and are exportable in Web Ontology Language (OWL). The International Classification of Primary Care (ICPC) is linked to Q-Codes in order to form the Core Content Classification in General Practice/Family Medicine (3CGP). So far, 3CGP is in use by humans in pedagogy, in bibliographic studies, in indexing congresses, master theses and other forms of grey literature in GP/FM. Use by computers is experimented in automatic classifiers, annotators and natural language processing. Discussion To the best of our knowledge, this is the first attempt to expand the ICPC coding system with an extension for family physician contextual issues, thus covering non-clinical content of practice. It remains to be proven that our proposed terminology will help in dealing with more complex systems, such as MeSH, to support information storage and retrieval activities. However, this exercise is proposed as a first step in the creation of an ontology of GP/FM and as an opening to the complex world of Semantic Web technologies. Conclusion We expect that the creation of this terminological resource for indexing abstracts and for facilitating Medline searches for general practitioners, researchers and students in medicine will reduce loss of knowledge in the domain of GP/FM. In addition, through better indexing of the grey literature (congress abstracts, master’s and doctoral theses), we hope to enhance the accessibility of research results and give visibility to the invisible work of family physicians

    Towards Machine Learning on the Semantic Web

    No full text
    Abstract. In this paper we explore some of the opportunities and chal-lenges for machine learning on the Semantic Web. The Semantic Web provides standardized formats for the representation of both data and ontological background knowledge. Semantic Web standards are used to describe meta data but also have great potential as a general data for-mat for data communication and data integration. Within a broad range of possible applications machine learning will play an increasingly im-portant role: Machine learning solutions have been developed to support the management of ontologies, for the semi-automatic annotation of un-structured data, and to integrate semantic information into web mining. Machine learning will increasingly be employed to analyze distributed data sources described in Semantic Web formats and to support approx-imate Semantic Web reasoning and querying. In this paper we discuss existing and future applications of machine learning on the Semantic Web with a strong focus on learning algorithms that are suitable for the relational character of the Semantic Web’s data structure. We discuss some of the particular aspects of learning that we expect will be of rele-vance for the Semantic Web such as scalability, missing and contradicting data, and the potential to integrate ontological background knowledge. In addition we review some of the work on the learning of ontologies and on the population of ontologies, mostly in the context of textual data.
    corecore