
    Knowledge extraction from semi-structured sources (Extração de conhecimento a partir de fontes semi-estruturadas)

    The increasing number of small, cheap devices full of sensing capabilities has created an untapped source of data that can be explored to improve and optimize multiple systems, from small-scale home automation to large-scale applications such as agriculture monitoring, traffic flow and industrial maintenance prediction. Yet hand in hand with this growth goes an increasing difficulty in collecting, storing and organizing all these new data. The lack of standard context representation schemes is one of the main struggles in this area. Furthermore, conventional methods for extracting knowledge from data rely on standard representations or a priori relations. These a priori relations add latent information to the underlying model, in the form of context representation schemes, table relations, or even ontologies. Nonetheless, these relations are created and maintained by human users. While feasible for small-scale scenarios or specific areas, this becomes increasingly difficult to maintain when considering the potential dimension of IoT and M2M scenarios. This thesis addresses the problem of storing and organizing context information from IoT/M2M scenarios in a meaningful way, without imposing a representation scheme or requiring a priori relations. This work proposes a d-dimension organization model optimized for IoT/M2M data. The model relies on machine learning features to identify similar context sources. These features are then used to learn relations between data sources automatically, providing the foundations for automatic knowledge extraction, on which machine learning, or even conventional methods, can rely to extract knowledge from potentially relevant datasets. During this work, two different machine learning techniques were tackled: semantic and stream similarity. Semantic similarity estimates the similarity between concepts (in textual form). This thesis proposes an unsupervised learning method for semantic features based on distributional profiles, without requiring any specific corpus. This allows the organizational model to organize data based on concept similarity instead of string matching. Another advantage is that the learning method does not require input from users, making it ideal for massive IoT/M2M scenarios. Stream similarity metrics estimate the similarity between two streams of data. Although these methods have been extensively researched for DNA sequencing, they commonly rely on variants of the longest common sub-sequence. This thesis proposes a generative model for stream characterization, specially optimized for IoT/M2M data. The model can be used to generate statistically significant data streams and to estimate the similarity between streams. This is then used by the context organization model to identify context sources with similar stream patterns. The work proposed in this thesis was extensively discussed, developed and published in several international publications.
The multiple contributions to projects and collaborations with fellow colleagues, where parts of the work developed were used successfully, support the claim that although the context organization model (and the subsequent similarity features) was optimized for IoT/M2M data, it can potentially be extended to deal with any kind of context information in a wide array of applications.
The present study was developed in the scope of the Smart Green Homes Project [POCI-01-0247-FEDER-007678], a co-promotion between Bosch Termotecnologia S.A. and the University of Aveiro. It is financed by Portugal 2020 under the Competitiveness and Internationalization Operational Program, and by the European Regional Development Fund. Programa Doutoral em Informática.
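
The abstract leaves the semantic-feature learning at a high level. As a minimal sketch of the distributional-profile idea, assuming plain bag-of-words co-occurrence windows and cosine similarity (both assumptions for illustration, not the thesis's actual method), concept similarity could be estimated like this:

```python
# Illustrative sketch only: distributional profiles as raw co-occurrence
# counts, compared with cosine similarity. Corpus, window size and
# tokenization are all assumptions, not the thesis's actual pipeline.
from collections import Counter
from math import sqrt

def profile(term: str, sentences: list[str], window: int = 2) -> Counter:
    """Count words co-occurring with `term` within +/- `window` tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == term:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for t in tokens[lo:hi] if t != term)
    return counts

def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two sparse co-occurrence vectors."""
    dot = sum(p[t] * q[t] for t in set(p) & set(q))
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

corpus = [
    "the temperature sensor reports degrees every minute",
    "the thermometer reports degrees in the living room",
    "the door contact reports open or closed events",
]
print(f"{cosine(profile('sensor', corpus), profile('thermometer', corpus)):.2f}")
```

Two terms that never match as strings ('sensor' vs. 'thermometer') still score high because they occur in similar contexts, which is exactly the property the organization model needs in order to match concepts instead of strings.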

    Adaptive and Reactive Rich Internet Applications

    In this thesis we present the client-side approach of Adaptive and Reactive Rich Internet Applications as the main result of our research into how to bring in-time adaptivity to Rich Internet Applications. Our approach leverages previous work on adaptive hypermedia, event processing and other research disciplines. We present a holistic framework covering the design-time as well as the run-time aspects of Adaptive and Reactive Rich Internet Applications, focusing especially on the run-time aspects.

    Continuous Process Auditing (CPA): an Audit Rule Ontology Approach to Compliance and Operational Audits

    Continuous Auditing (CA) has been investigated over time and is, to some extent, already in practice within financial and transactional auditing as part of continuous assurance and monitoring. Enterprise Information Systems (EIS) that run their activities in the form of processes require continuous auditing of a process that invokes the action(s) specified in the policies and rules in a continuous manner and/or sometimes in real time. This leads to the question: how closely can continuous auditing mimic the actual auditing procedures performed by auditing professionals? We investigate some of these questions through Continuous Process Auditing (CPA), relying on the heterogeneous activities of processes in the EIS, as well as detecting exceptions and evidence in current and historical databases to provide audit assurance.
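
As a rough sketch of how such audit rules might be made machine-checkable over process events, assuming rules reduce to predicates over event records (the event fields, rule texts and thresholds below are invented for illustration, not taken from the paper):

```python
# Hedged illustration, not the paper's system: audit rules as predicates
# over process events, evaluated continuously to surface exceptions as
# audit evidence. All field names and thresholds are invented.
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple

@dataclass
class Event:
    process: str
    actor: str
    approver: str
    amount: float

RULES: list[Tuple[str, Callable[[Event], bool]]] = [
    ("segregation of duties: the actor may not approve their own transaction",
     lambda e: e.actor != e.approver),
    ("payments above 10000 must go through the payment-approval process",
     lambda e: e.amount <= 10_000 or e.process == "payment-approval"),
]

def audit(stream: Iterable[Event]) -> Iterator[Tuple[Event, str]]:
    """Yield (event, violated rule) pairs as evidence for the audit trail."""
    for event in stream:
        for description, holds in RULES:
            if not holds(event):
                yield event, description

events = [
    Event("payment-approval", "alice", "bob", 25_000),   # compliant
    Event("fast-track", "carol", "carol", 900),          # self-approval
]
for event, rule in audit(events):
    print(f"EXCEPTION: {rule} -> {event}")
```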

    The AdCIM framework : extraction, integration and persistence of the configuration of distributed systems

    [Abstract] This abstract consists of an introduction explaining the approach and context of the thesis, followed by a section on its organization into parts and chapters. An enumeration of the contributions it collects follows, ending with the conclusions and future work. Introduction: system administrators have to work with the great diversity of hardware and software present in today's organizations. From the administrator's point of view, homogeneous infrastructures are much simpler to manage and therefore more desirable. But, apart from the intrinsic difficulty of maintaining that homogeneity while technology progresses, and the consequences of being tied to a single vendor, homogeneity itself carries risks; for example, monoculture installations are more vulnerable to viruses and trojans, and securing them requires introducing random differences in system calls to create artificial diversity, a measure that can cause instability (see Birman and Schneider). This makes heterogeneity itself almost inevitable, and a characteristic of real systems that is hard to ignore. But it does bring more complexity. In many installations, a mix of Windows and Unix derivatives is usual, either combined or clearly divided into clients and servers. Administration tasks differ between the two systems because of differences in ecosystem and in how computer systems are conceptualized, accumulated over years of divergence in interfaces, configuration systems, commands and abstractions. Over time there have been many attempts to close that gap, some of them by emulating or porting the time-proven Unix tools. For example, Microsoft's solution, Windows Services for Unix, allows the use of NIS, the Network File System (NFS), Perl and the Korn shell on Windows, but does not really integrate them into Windows, as it is oriented more toward application migration. Cygwin supports more tools, such as Bash and the GNU Autotools, but focuses on the direct translation to Windows of POSIX-based Unix programs using gcc. Outwit is a very interesting port of the Unix toolset that integrates Unix pipelines into Windows and gives access to the Registry, ODBC drivers and the clipboard from Unix shells, but scripts developed for this system are not directly usable on Unix systems. The separation therefore persists despite these attempts. In this thesis we present a framework, called AdCIM, for managing the configuration of heterogeneous systems. As such, its goal is to integrate and unify the administration of these systems by abstracting their differences, while at the same time being flexible and easy to adapt so as to support new systems quickly. To achieve these goals, the architecture of AdCIM follows the model-driven paradigm, which proposes designing applications from an initial model that is transformed into various ''artifacts'', such as code, documentation, database schemas, etc., which make up the application. In the case of AdCIM, the model is CIM, and the transformations are performed using the declarative language XSLT, which can express transformations over XML data.
AdCIM performs all its transformations with XSLT, except the initial conversion of plain-text files to XML, which is done with a special text-to-XML parser. XSLT programs, also called stylesheets, match and transform specific parts of the input XML tree and support recursive execution, forming a declarative-functional programming model with great expressive power. The model chosen to represent the administration domains covered by the framework is CIM (Common Information Model), a standard, extensible, object-oriented model created by the Distributed Management Task Force (DMTF). Using CIM schemas, the many different configuration formats and administration data are translated by the AdCIM infrastructure into CIM instances. The CIM schemas also serve as the basis for generating web forms and other specific schemas for data validation and persistence. The development of AdCIM as a model-driven framework evolved from our previous work, which extracted configuration data and stored it in an LDAP repository using Perl scripts. Subsequent work adopted the model-driven approach and demonstrated the adaptive nature of the framework through adaptations to Grid environments and Wireless Mesh Networks. The approach and implementation of this framework are novel, and it uses technologies defined as standards by international organizations such as the IETF, the DMTF and the W3C. We see the use of these technologies as an advantage rather than a limitation on the framework's possibilities. Their use adds generality and applicability to the framework, especially compared with ad hoc or very specific-purpose solutions. Despite this flexibility, we have tried as far as possible to define and pin down all implementation aspects, define appropriate usage practices, and evaluate the impact of the choice of the various standard technologies on the framework's performance and scalability.
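
To make the transformation step concrete, here is a minimal sketch of turning an already-parsed configuration fragment into a CIM-XML-style instance with an XSLT stylesheet, driven from Python via the third-party lxml package. The stylesheet and the mapping to CIM_ComputerSystem are simplified stand-ins, not AdCIM's actual schemas:

```python
# Simplified stand-in for AdCIM's XSLT stage: a config fragment, already
# converted to XML, becomes a CIM-XML-style INSTANCE document.
from lxml import etree

config_xml = etree.XML("<config><entry key='hostname' value='node01'/></config>")

stylesheet = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/config">
    <INSTANCE CLASSNAME="CIM_ComputerSystem">
      <xsl:for-each select="entry">
        <PROPERTY NAME="{@key}">
          <VALUE><xsl:value-of select="@value"/></VALUE>
        </PROPERTY>
      </xsl:for-each>
    </INSTANCE>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(etree.tostring(transform(config_xml), pretty_print=True).decode())
```

Because the mapping lives in a declarative stylesheet rather than in code, supporting a new configuration format amounts to writing a new stylesheet, which is the flexibility argument made above.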

    Topic driven testing

    Modern interactive applications offer so many interaction opportunities that automated exploration and testing becomes practically impossible without some domain-specific guidance towards relevant functionality. In this dissertation, we present a novel fundamental graphical user interface testing method called topic-driven testing. We mine the semantic meaning of interactive elements, guide testing, and identify core functionality of applications. The semantic interpretation is close to human understanding and allows us to learn specifications and transfer knowledge across multiple applications independent of the underlying device, platform, programming language, or technology stack, which is, to the best of our knowledge, a unique feature of our technique. Our tool ATTABOY is able to take an existing Web application test suite, say from Amazon, execute it on ebay, and thus guide testing to relevant core functionality. Tested on different application domains such as eCommerce, news pages, and mail clients, it can transfer on average sixty percent of the tested application behavior to new apps, without any human intervention. On top of that, topic-driven testing can work with even vaguer instructions such as how-to descriptions or use-case descriptions. Given an instruction, say "add item to shopping cart", it tests the specified behavior in an application, both in a browser as well as in mobile apps. It thus improves state-of-the-art UI testing frameworks, creates change-resilient UI tests, and lays the foundation for learning, transferring, and enforcing common application behavior. The prototype is up to five times faster than existing random testing frameworks and tests functions that are hard to cover by non-trained approaches.
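
As a toy illustration of matching a natural-language instruction to interactive elements (ATTABOY's actual semantic matching is far richer; the labels, stopword list and overlap scoring below are invented for the example):

```python
# Toy sketch of the matching idea only: score visible UI labels against an
# instruction by content-word overlap, then act on the best match.
STOPWORDS = {"the", "to", "a", "an", "item"}

def tokens(text: str) -> set[str]:
    return {t for t in text.lower().split() if t not in STOPWORDS}

def best_match(instruction: str, labels: list[str]) -> str:
    """Return the UI label sharing the most content words with the instruction."""
    want = tokens(instruction)
    return max(labels, key=lambda label: len(want & tokens(label)))

labels = ["Add to Cart", "Checkout", "Save for later", "Search"]
print(best_match("add item to shopping cart", labels))  # -> Add to Cart
```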

    BioIMAX : a Web2.0 approach to visual data mining in bioimage data

    Loyek C. BioIMAX: a Web2.0 approach to visual data mining in bioimage data. Bielefeld: Universität Bielefeld; 2012.

    Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project

    Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Service

    Enabling Technologies for Web 3.0: A Comprehensive Survey

    Web 3.0 represents the next stage of Internet evolution, aiming to empower users with increased autonomy, efficiency, quality, security, and privacy. This evolution can potentially democratize content access by utilizing the latest developments in enabling technologies. In this paper, we conduct an in-depth survey of enabling technologies in the context of Web 3.0, such as blockchain, the semantic web, the 3D interactive web, the Metaverse, virtual and augmented reality, and Internet of Things technology, and their roles in shaping Web 3.0. We commence by providing a comprehensive background of Web 3.0, including its concept, basic architecture, potential applications, and industry adoption. Subsequently, we examine recent breakthroughs in IoT, 5G, and blockchain technologies that are pivotal to Web 3.0 development. Following that, other enabling technologies, including AI, the semantic web, and the 3D interactive web, are discussed. Utilizing these technologies can effectively address the critical challenges in realizing Web 3.0, such as ensuring decentralized identity, platform interoperability, and data transparency, reducing latency, and enhancing the system's scalability. Finally, we highlight significant challenges associated with Web 3.0 implementation, emphasizing potential solutions and providing insights into future research directions in this field.

    Automatic Generation of Personalized Recommendations in eCoaching

    This thesis concerns eCoaching for personalized, real-time lifestyle support using information and communication technology. The challenge is to design, develop and technically evaluate a prototype of an intelligent eCoach that automatically generates personalized, evidence-based recommendations for a better lifestyle. The developed solution focuses on improving physical activity. The prototype uses wearable medical activity sensors. The collected data are semantically represented, and artificial-intelligence algorithms automatically generate meaningful, personalized and context-based recommendations for reducing sedentary time. The thesis uses the well-established design science research methodology to develop theoretical foundations and practical implementations. Overall, this research focuses on technological verification rather than clinical evaluation.
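
As a minimal sketch of the recommendation-triggering idea, assuming a single sedentary-time threshold and one context flag (both invented for illustration, not the thesis's algorithms):

```python
# Invented toy rule, not the thesis's method: once sensed sedentary time
# passes a threshold, emit a recommendation adapted to a context flag.
def recommend(sedentary_minutes: int, raining: bool) -> str | None:
    if sedentary_minutes < 60:
        return None  # no nudge needed yet
    if raining:
        return "Inactive for over an hour: try some indoor stretching."
    return "Inactive for over an hour: take a short walk outside."

print(recommend(75, raining=True))
```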

    Resource discovery in heterogeneous digital content environments

    The concept of 'resource discovery' is central to our understanding of how users explore, navigate, locate and retrieve information resources. This submission for a PhD by Published Works examines a series of 11 related works which explore topics pertaining to resource discovery, each demonstrating heterogeneity in its digital discovery context. The assembled works are prefaced by nine chapters which review and critically analyse the contribution of each work, as well as provide contextualization within the wider body of research literature. A series of conceptual sub-themes is used to organize and structure the works and the accompanying critical commentary. The thesis begins by examining issues in distributed discovery contexts by studying collection-level metadata (CLM), its application in 'information landscaping' techniques, and its relationship to the efficacy of federated item-level search tools. This research narrative continues but expands in the later works and commentary to consider the application of Knowledge Organization Systems (KOS), particularly within Semantic Web and machine interface contexts, with investigations of semantically aware terminology services in distributed discovery. The necessary modelling of data structures to support resource discovery, and its associated functionalities within digital libraries and repositories, is then considered within the novel context of technology-supported curriculum design repositories, where questions of human-computer interaction (HCI) are also examined. The final works studied as part of the thesis are those which investigate and evaluate the efficacy of open repositories in exposing knowledge commons to resource discovery via web search agents. Through the analysis of the collected works it is possible to identify a unifying theory of resource discovery, with the proposed concept of (meta)data alignment described and presented with a visual model. This analysis assists in the identification of a number of topics worthy of further research; but it also highlights an incremental transition by the present author, from using research to inform the development of technologies designed to support or facilitate resource discovery, particularly at a 'meta' level, to the application of specific technologies to address resource discovery issues in a local context. Despite this variation, the research narrative has remained focused on topics surrounding resource discovery in heterogeneous digital content environments and is noted as having generated a coherent body of work. Separate chapters consider the methodological approaches adopted in each work and the contribution made to research knowledge and professional practice.