127 research outputs found

    Change Detection in XML Documents for Fixed Structures using Exclusive-Or (XOR)

    XML is an emerging standard for the representation and exchange of Internet data. Its characteristics, tree-structured (i.e. a collection of nodes) and self-descriptive, facilitate the detection of changes in an XML document in minute detail, at a finer grain than is obtainable at the document level. Furthermore, for a fixed schema (structure), changes frequently occur in the content or data values of XML documents on the Web. We propose a method that effectively detects these content or data value changes. Rather than inspecting all nodes between two versions of an XML document, we propose an effective algorithm, called top-down, which detects changes by exploring only a subset of nodes in the tree. We want to be certain that if a leaf node changes, the algorithm will detect the change, not only by inspecting the node itself but also its parent node, grandparent node, and so on. For this, a signature is constructed for each node, which is essentially an abstraction of the information stored in the node. There are several ways to construct such a signature; we choose exclusive-or (XOR), which prevents a user from receiving irrelevant changes while making certain that the user does not miss relevant ones. Note that on the Web, alongside access to huge quantities of information, the relevance of the reported changes is more important than the occasional miss of a relevant change. Accordingly, we propose an automatic change detection algorithm that identifies changes between two versions of an XML document based on these signatures using XOR. Our proposed algorithm traverses the least number of nodes necessary to detect the changes. We demonstrate that our algorithm outperforms the traditional algorithm, which exhaustively searches the entire space, and we also demonstrate analytically and empirically that the rate of missed relevant changes is within a tolerable range.
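The signature idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: each node's signature is assumed to be the XOR of a hash of its own content with its children's signatures, so a leaf change propagates up to the root, and a top-down comparison can prune any subtree whose signature is unchanged.

```python
# Sketch (assumed details): XOR node signatures for pruning change detection.
import hashlib

class Node:
    def __init__(self, label, value="", children=None):
        self.label = label
        self.value = value
        self.children = children or []

def content_hash(node):
    # Hash a node's own label and value into a 64-bit integer.
    h = hashlib.md5(f"{node.label}:{node.value}".encode()).digest()
    return int.from_bytes(h[:8], "big")

def signature(node):
    sig = content_hash(node)
    for child in node.children:
        sig ^= signature(child)  # XOR folds child signatures into the parent's
    return sig

def detect_changes(old, new, changed):
    if signature(old) == signature(new):
        return  # identical signature: prune this whole subtree
    if not old.children and not new.children:
        changed.append(old.label)  # a changed leaf was located
        return
    for o, n in zip(old.children, new.children):
        detect_changes(o, n, changed)

old = Node("doc", children=[Node("price", "10"), Node("qty", "3")])
new = Node("doc", children=[Node("price", "12"), Node("qty", "3")])
out = []
detect_changes(old, new, out)
print(out)  # only the changed leaf is reported
```

Only the root and the changed branch are visited; the unchanged `qty` subtree is skipped, which is the source of the claimed savings over exhaustive comparison.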

    Introductory Computer Forensics

    INTERPOL (the International Criminal Police Organization) built cybercrime programs to keep up with emerging cyber threats, and aims to coordinate and assist international operations for fighting crimes involving computers. Although significant international efforts are being made in dealing with cybercrime and cyber-terrorism, finding effective, cooperative, and collaborative ways to deal with complicated cases that span multiple jurisdictions has proven difficult in practice.

    RUPEE: A Big Data Approach to Indexing and Searching Protein Structures

    Title from PDF of title page viewed July 7, 2021. Yugyung Lee. Vita. Includes bibliographical references (pages 149-158). Thesis (Ph.D.)--School of Computing and Engineering and Department of Mathematics and Statistics, University of Missouri--Kansas City, 2021.
    Given the close relationship between protein structure and function, protein structure searches have long played an established role in bioinformatics. Despite their maturity, existing protein structure searches either compromise the quality of results to obtain faster response times or suffer from longer response times to provide better quality results. Existing protein structure searches that focus on faster response times often use sequence clustering or depend on other simplifying assumptions not based on structure alone. In the case of sequence clustering, strong structure similarities are often hidden behind cluster representatives. Existing protein structure searches that focus on better quality results often perform full pairwise protein structure alignments with the query structure against every available structure in the searched database, which can take as long as a full day to complete. The poor response times of these protein structure searches prevent the easy and efficient exploration of relationships between protein structures, which is the norm in other areas of inquiry. To address these trade-offs between faster response times and quality results, we have developed RUPEE, a fast and accurate purely geometric protein structure search combining a novel approach to encoding sequences of torsion angles with established techniques from information retrieval and big data. RUPEE can compare the query structure to every available structure in the searched database with fast response times. To accomplish this, first, we introduce a new polar plot of torsion angles to help identify separable regions of torsion angles and derive a simple encoding of torsion angles based on the identified regions. 
    Then, we introduce a heuristic to encode sequences of torsion angles called Run Position Encoding to increase the specificity of our encoding within regular secondary structures, alpha-helices and beta-strands. Once we have a linear encoding of protein structures based on their torsion angles, we use min-hashing and locality sensitive hashing, established techniques from information retrieval and big data, to compare the query structure to every available structure in the searched database with fast response times. Moreover, because RUPEE is a purely geometric protein structure search, it does not depend on protein sequences. RUPEE also does not depend on other simplifying assumptions not based on structure alone. As such, RUPEE can be used effectively to search on protein structures with low sequence and structure similarity to known structures, such as predicted structures that result from protein structure prediction algorithms. Comparing our results to the mTM-align, SSM, CATHEDRAL, and VAST protein structure searches, RUPEE has set a new bar for protein structure searches. RUPEE produces better quality results than the best available protein structure searches and does so with the fastest response times.
    Contents: Introduction -- Encoding Torsion Angles -- Indexing Protein Structures -- Searching Protein Structures -- Results and Evaluation -- Using RUPEE -- Conclusion -- Appendix A. Benchmarks of Known Protein Structures -- Appendix B. Benchmarks of Protein Structure Prediction
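The min-hashing step mentioned above can be sketched in a few lines. This is an illustrative sketch, not RUPEE's implementation: torsion angles are assumed to have been mapped to discrete region codes (the letter strings below are hypothetical), the code string is shingled into n-grams, and a min-hash signature lets two structures' shingle sets be compared cheaply, with the fraction of matching signature slots estimating their Jaccard similarity.

```python
# Sketch: min-hashing region-code strings to estimate structural similarity.
import hashlib

def shingles(code_string, n=3):
    # Break the encoding into overlapping n-grams.
    return {code_string[i:i + n] for i in range(len(code_string) - n + 1)}

def minhash(shingle_set, num_hashes=64):
    # One minimum per seeded hash function forms the compact signature.
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set))
    return sig

def estimated_jaccard(sig_a, sig_b):
    # Matching slots estimate |A ∩ B| / |A ∪ B|.
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical region-code strings for two similar structures
a = minhash(shingles("AAAABBBBAAAACCCC"))
b = minhash(shingles("AAAABBBBAAAACCCD"))
print(round(estimated_jaccard(a, b), 2))  # close to the true Jaccard similarity
```

Locality sensitive hashing then bands these signatures so that only candidates sharing a band are compared at all, which is what allows every database structure to be considered within fast response times.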

    Component-based control system development for agile manufacturing machine systems

    It is now common knowledge that manufacturers of the 21st century, including machine suppliers and system integrators, will need to compete in global marketplaces that are frequently shifting and fragmenting, with new technologies continuously emerging. Future production machines and manufacturing systems need to offer the "agility" required to respond to product changes and the ability to reconfigure. The primary aim of this research is to advance studies in machine control system design, in the context of the European project VIR-ENG - "Integrated Design, Simulation and Distributed Control of Agile Modular Machinery".

    Information exchange between medical databases through automated identification of concept equivalence

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2002. Includes bibliographical references (p. 123-127).
    The difficulty of exchanging information between heterogeneous medical databases remains one of the chief obstacles to achieving a unified patient medical record. Although methods have been developed to address differences in data formats, system software, and communication protocols, automated data exchange between disparate systems still remains an elusive goal. The Medical Information Acquisition and Transmission Enabler (MEDIATE) system identifies semantically equivalent concepts between databases to facilitate information exchange. MEDIATE employs a semantic network representation to model the underlying native databases and to serve as an interface for database queries. This representation generates a semantic context for data concepts that can subsequently be exploited to perform automated concept matching between disparate databases. To test the feasibility of this system, medical laboratory databases from two different institutions were represented within MEDIATE and automated concept matching was performed. The experimental results show that concepts that existed in both laboratory databases were always correctly recognized as candidate matches. In addition, concepts that existed in only one database could often be matched with more "generalized" concepts in the other database that could still provide useful information. The architecture of MEDIATE offers advantages in system scalability and robustness. Since concept matching is performed automatically, the only work required to enable data exchange is construction of the semantic network representation. No pre-negotiation is required between institutions to identify data that is compatible for exchange, and there is no additional overhead to add more databases to the exchange network. Because the concept matching occurs dynamically at the time of information exchange, the system is robust to modifications in the underlying native databases as long as the semantic network representations are appropriately updated. By Yao Sun, Ph.D.
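The idea of matching concepts by their semantic context can be sketched minimally. All names and the similarity threshold below are hypothetical, and the overlap measure is a plain Jaccard score rather than MEDIATE's actual method: each concept carries the set of related concepts surrounding it in the semantic network, and concepts from two databases are paired when those contexts overlap strongly enough.

```python
# Sketch: pairing concepts across databases by semantic-context overlap.

def context_similarity(ctx_a, ctx_b):
    # Jaccard overlap of the two concepts' semantic contexts.
    return len(ctx_a & ctx_b) / len(ctx_a | ctx_b)

def match_concepts(db_a, db_b, threshold=0.5):
    matches = []
    for name_a, ctx_a in db_a.items():
        # Pick the candidate in db_b whose context overlaps most.
        best = max(db_b, key=lambda name_b: context_similarity(ctx_a, db_b[name_b]))
        score = context_similarity(ctx_a, db_b[best])
        if score >= threshold:
            matches.append((name_a, best, round(score, 2)))
    return matches

# Hypothetical laboratory concepts from two institutions
hospital_a = {"serum_glucose": {"blood", "chemistry", "glucose", "test"}}
hospital_b = {"glucose_level": {"blood", "chemistry", "glucose", "result"},
              "hematocrit":    {"blood", "hematology", "cell", "result"}}
print(match_concepts(hospital_a, hospital_b))
```

Because the matching runs at exchange time over whatever contexts the networks currently contain, adding a database or renaming a concept requires only updating its semantic network representation, which mirrors the scalability argument in the abstract.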

    Spatial ontologies for architectural heritage

    Informatics and artificial intelligence have generated new requirements for digital archiving, information, and documentation. Semantic interoperability has become fundamental for the management and sharing of information. The constraints on data interpretation enable both database interoperability, for data and schema sharing and reuse, and information retrieval in large datasets. Another challenging issue is the exploitation of automated reasoning possibilities. The solution is the use of domain ontologies as a reference for data modelling in information systems. The architectural heritage (AH) domain is considered in this thesis. The documentation in this field, particularly complex and multifaceted, is well known to be critical for the preservation, knowledge, and promotion of monuments. For these reasons, digital inventories, also exploiting standards and new semantic technologies, are developed by international organisations (Getty Institute, UN, European Union). Geometric and geographic information is an essential part of a monument. It comprises a number of aspects (spatial, topological, and mereological relations; accuracy; multi-scale representation; time; etc.). Currently, geomatics makes it possible to obtain very accurate and dense 3D models (possibly enriched with textures) and derived products, in both raster and vector format. Many standards have been published for the geographic field or the cultural heritage domain. However, the former are limited in the representation scales they foresee (the maximum is achieved by OGC CityGML), and their semantic values do not capture the full semantic richness of AH. The latter (especially the core ontology CIDOC-CRM, the Conceptual Reference Model of the Documentation Committee of the International Council of Museums) have been employed to document museums' objects. 
    Even though CIDOC-CRM was recently extended to standing buildings and a spatial extension was included, the integration of complex 3D models has not yet been achieved. In this thesis, the aspects (especially spatial issues) to consider in the documentation of monuments are analysed. In light of these, OGC CityGML is extended for the management of AH complexity. An approach 'from the landscape to the detail' is used, considering the monument within a wider system, which is essential for analysis and reasoning about such complex objects. An implementation test is conducted on a case study, with a preference for open source applications.

    Exploring visual representation of sound in computer music software through programming and composition

    Developments of a practice in which the acts of programming and composition are intrinsically connected are presented through contextualisation of the portfolio works. This practice-based research (conducted 2009–2013) explores visual representation of sound in computer music software. Towards greater understanding of composing with the software medium, initial questions are taken as stimulus to explore the subject through artistic practice and critical thinking. The project begins by asking: how might the ways in which sound is visually represented influence the choices that are made while those representations are being manipulated and organised as music? Which aspects of sound are represented visually, and how are those aspects shown? Recognising sound as a psychophysical phenomenon, the physical and psychological aspects of aesthetic interest to my work are identified. Technological factors of mediating these aspects for the interactive visual domain of software are considered, and a techno-aesthetic understanding developed. Through compositional studies of different approaches to the problem of looking at sound in software, on screen, a number of conceptual themes emerge in this work: the idea of software as substance, both as a malleable material (such as in live coding) and in terms of outcome artefacts; the direct mapping between audio data and screen pixels; the use of colour that maintains awareness of its discrete (as opposed to continuous) basis; the need for integrated display of parameter controls with their target data; and the tildegraph concept, which began as a conceptual model of a gramophone and is a spatio-visual sound synthesis technique related to wave terrain synthesis. The spiroid-frequency-space representation is introduced, contextualised, and combined both with those themes and with a bespoke geometrical drawing system (named thisis) to create a new modular computer music software environment named sdfsys.

    THE DEVELOPMENT OF NOVEL TECHNIQUES FOR CHARACTERISATION OF MARINE ZOOPLANKTON OVER VERY LARGE SPATIAL SCALES

    Marine zooplankton play an important role in the transfer of CO2 from the atmosphere/ocean system to deeper waters and the sediments. They also provide food for much of the world's fish stocks, and in some nutrient-depleted areas of the ocean they sustain phytoplankton growth by recycling nutrients. They therefore have a profound effect on the carbon cycle and upon life in the oceans. There is a perceived lack of information about the global distributions of zooplankton needed to validate ecosystem dynamics models, and the traditional methods of survey are inadequate to provide this information. There is a need to develop new technologies for the large-scale survey of zooplankton, which should provide data either suitable for quick and easy subsequent processing or, better still, processed in real time. New technologies for large-scale zooplankton survey fall into three main categories: acoustic, optical and video. No single method is capable of providing continuous real-time data at the level of detail required. A combination of two of the new technologies (optical and video) has the potential to provide broad-scale data on abundance, size and species distributions of zooplankton routinely, reliably, rapidly and economically. Such a combined method has been developed in this study. The optical plankton counter (OPC) is a fairly well established instrument in marine and freshwater zooplankton survey. A novel application of the benchtop version of this instrument (OPC-1L) for real-time data gathering at sea over ocean-basin scales has been developed in this study. A new automated video zooplankton analyser (ViZA) has been designed and developed to operate together with the OPC-1L. The two devices are eventually to be deployed in tandem on the Undulating Oceanographic Recorder (UOR) for large-scale ocean survey of zooplankton. 
    During the initial development of the system, the two devices were used in benchtop flow-through mode using the ship's uncontaminated sea water supply. The devices have been deployed on four major oceanographic cruises in the North and South Atlantic, covering almost 40,000 km of transect. Used in benchtop mode, it has been shown that the OPC can simply and reliably survey thousands of kilometres of ocean surface waters for zooplankton abundance and size distribution in the size range 250 µm to 11.314 mm in real time. The ViZA system can add the dimension of shape to the OPC size data, and provide supporting data on size distributions and abundance. The sampling rate in oligotrophic waters and image quality problems are the two main limitations to current ViZA performance that must be addressed, but where sufficient abundance exists and good quality images are obtained, the initial version of the ViZA system is shown to be able to classify zooplankton reliably into six major groups. The four deployments have shown that data on zooplankton distributions on oceanic scales can be obtained without the delays and prohibitive costs associated with sample analysis for traditional sampling methods. The results of these deployments are presented, together with an assessment of the performance of the system and proposals for improvements to meet the requirements specified before a full in-situ system is deployed. Plymouth Marine Laboratory.

    Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements

    Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency of change of each amino acid by another one and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn → Asp, Phe → Tyr, Lys → Arg, Gln → Glu, Ile → Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr as structurally idiosyncratic residues. We also found a high average correlation ($\overline{R}$ = 0.85) between thirty amino acid mutability scales and the mutational inertia ($I_X$), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally efficient principles: (a) amino acid interchangeability privileges secondary structural similarity, and (b) amino acid mutability depends directly on the biosynthetic energy cost and inversely on the frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)

    Building blocks for semantic data organization on the desktop

    The organization of (multimedia) data on current desktop systems is done to a large part by arranging files in hierarchical file systems, but also by specialized applications (e.g., music or photo organizing software) that make use of file-related metadata for this task. These metadata are predominantly stored in embedded file headers, using a multitude of mainly proprietary formats. Generally, metadata and links play the key roles in advanced data organization concepts. 
Their limited support in prevalent file system implementations, however, hinders the adoption of such concepts on the desktop: First, non-uniform access interfaces require metadata consuming applications to understand both a file’s format and its metadata scheme; second, separate data/metadata access is not possible, and third, metadata cannot be attached to multiple files or to file folders although the latter are the primary constructs for file organization. As a consequence of this, current desktops suffer, inter alia, from (i) limited data organization possibilities, (ii) limited navigability, (iii) limited data findability, and (iv) metadata fragmentation. Although there were attempts to improve this situation, e.g., by introducing semantic file systems, most of these issues were successfully addressed and solved in the Web and in particular in the Semantic Web and reusing these solutions on the desktop, a central hub of data and metadata manipulation, is clearly desirable. In this thesis a novel, backwards-compatible metadata model that addresses the above-mentioned issues is introduced. This model is based on stable file identifiers and external, file-related, semantic metadata descriptions that are represented using the generic RDF graph model. Descriptions are accessible via a uniform Linked Data interface and can be linked with other descriptions and resources. In particular, this model enables semantic linking between local file system objects and remote resources on the Web or the emerging Web of Data, thereby enabling the integration of these data spaces. As the model crucially relies on the stability of these links, we contribute two algorithms that preserve their integrity in local and in remote environments. This means that links between file system objects, metadata descriptions and remote resources do not break even if their addresses change, e.g., when files are moved or Linked Data resources are re-published using different URIs. 
    Finally, we contribute a prototypical implementation of the proposed metadata model that demonstrates how these building blocks sum up to constitute a metadata layer that may act as a foundation for semantic data organization on the desktop.
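The stable-identifier building block described above can be sketched in a few lines. This is an illustrative sketch, not the thesis's implementation: metadata descriptions are assumed to link to an unchanging file identifier rather than a path, so moving a file updates a single mapping while every link to the file stays valid.

```python
# Sketch: metadata links reference stable IDs, not paths (names illustrative).
import uuid

class FileRegistry:
    def __init__(self):
        self.id_to_path = {}

    def register(self, path):
        file_id = str(uuid.uuid4())  # stable identifier, independent of location
        self.id_to_path[file_id] = path
        return file_id

    def move(self, file_id, new_path):
        # Only the mapping changes; links that use the ID are untouched.
        self.id_to_path[file_id] = new_path

    def resolve(self, file_id):
        return self.id_to_path[file_id]

registry = FileRegistry()
photo_id = registry.register("/home/alice/photos/trip.jpg")
# An external metadata description links to the ID, not the path:
metadata = {photo_id: {"dc:title": "Trip to Vienna"}}
registry.move(photo_id, "/home/alice/archive/2010/trip.jpg")
print(registry.resolve(photo_id))  # the metadata link still resolves
```

The same indirection is what lets links between file system objects, RDF descriptions, and remote Linked Data resources survive address changes, as the abstract's integrity-preserving algorithms require.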