102 research outputs found

    Design and performance analysis of a relational replicated database system

    The hardware organization and software structure of a new database system are presented. This system, the relational replicated database system (RRDS), is based on a set of replicated processors operating on a partitioned database. Performance improvements and capacity growth can be obtained by adding more processors to the configuration. Based on the design goals, a set of hardware and software design questions was developed. The system then evolved according to a five-phase process, based on simulation and analysis, which addressed and resolved the design questions. Strategies and algorithms were developed for data access, data placement, and directory management for the hardware organization. A predictive performance analysis was conducted to determine the extent to which the original design goals were satisfied. The predictive performance results, along with an analytical comparison with three other relational multi-backend systems, provided information about the strengths and weaknesses of our design as well as a basis for future research.
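    The abstract does not spell out the data-placement strategy, so here is a minimal sketch of one common approach to placing tuples of a partitioned database across a set of processors. The modulo-CRC hashing scheme and processor counts are illustrative assumptions, not the RRDS algorithm described in the paper.

```python
import zlib
from collections import defaultdict

# Illustrative hash-based data placement across replicated processors.
# The CRC-modulo scheme below is an assumption for illustration only;
# the paper develops its own placement and directory-management strategies.

def place(key: str, n_processors: int) -> int:
    """Deterministically map a tuple's key to one of n_processors."""
    return zlib.crc32(key.encode()) % n_processors

def partition(keys, n_processors):
    """Group tuple keys by the processor that will store them."""
    parts = defaultdict(list)
    for k in keys:
        parts[place(k, n_processors)].append(k)
    return dict(parts)

keys = [f"row{i}" for i in range(100)]
on_three = partition(keys, 3)   # current configuration
on_four = partition(keys, 4)    # capacity growth: one processor added
```

    Adding a processor simply widens the modulus, which is one way the "capacity growth by adding processors" property can be realised in principle.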

    A formal context for closures of acyclic hypergraphs

    Database constraints in the relational database model (RDBM) can be viewed as a set of rules that apply to a dataset, or as a set of axioms that can generate a (closed) set of those constraints. In this paper, we use Formal Concept Analysis to characterize the axioms of Acyclic Hypergraphs (in the RDBM they are called Acyclic Join Dependencies). This paper complements and generalizes previous work on FCA and database constraints.

    Motor-Language Cascades: How Fine Motor Relates to Language Outcomes Across Early Development

    The current dissertation examined the role of motor skills in children’s language outcomes across early development. For study one, a systematic review was conducted to examine differences in how gross and fine motor skills foster language development from 0 to 5 years of age. Results based on 22 articles indicated that while both gross and fine motor skills are related to language outcomes, too few studies have measured fine motor skills to conclusively determine how gross and fine motor skills differentially relate to language outcomes. The aim of study two was to investigate whether gross or fine motor skills were predictive of language growth during the second year of life, while accounting for other common predictors of language skill. Both gross and fine motor skills were assessed in a sample of 95 infants at 12 months old, with expressive language growth measured from 12 to 24 months old. Hierarchical regression analyses indicated that fine motor skills at 12 months old predicted language growth above and beyond gross motor skills, maternal education, infant sex, baseline language, visual reception, and gesture skills. Study three assessed the role of fine motor skills in language outcomes via individual differences in handedness for role-differentiated bimanual manipulation (RDBM). Hand preference for RDBM was measured monthly from 18 to 24 months old (N = 90). Receptive and expressive language skills were assessed at 5 years old. Latent class growth analysis identified three toddler hand preference trajectories: left hand preference with moderate right hand use (left-moderate right), right hand preference with moderate left hand use (right-moderate left), and right hand preference with only mild left hand use (right-mild left).
Analyses indicate that toddlers in the right-mild left handedness trajectory scored significantly higher on receptive and expressive language at 5 years old compared to children with a left-moderate right hand preference. Children with a right-mild left RDBM hand preference also scored significantly higher on receptive language compared to children with a right-moderate left RDBM hand preference. Children with a left-moderate right and children with a right-moderate left RDBM hand preference as toddlers did not differ in receptive or expressive language at 5 years old. Implications and suggestions for future work are discussed.

    Highlighting matched and mismatched segments in translation memory output through sub-tree alignment

    In recent years it has become increasingly clear that the localisation industry does not have the necessary manpower to satisfy the increasing demand for high-quality translation. This has fuelled the search for new and existing technologies that would increase translator throughput. As Translation Memory (TM) systems are the tool most commonly employed by translators, a number of enhancements are available to assist them in their job. One such enhancement is to show the translator which parts of the sentence that needs to be translated match which parts of the fuzzy match suggested by the TM. For this information to be used, however, the translators have to carry it over to the TM translation themselves. In this paper, we present a novel methodology that can automatically detect and highlight the segments that need to be modified in a TM-suggested translation. We base it on state-of-the-art sub-tree alignment technology (Zhechev, 2010) that can produce aligned phrase-based tree pairs from unannotated data. Our system operates in a three-step process. First, the fuzzy match selected by the TM and its translation are aligned. This lets us know which segments of the source-language sentence correspond to which segments in its translation. In the second step, the fuzzy match is aligned to the input sentence that is currently being translated. This tells us which parts of the input sentence are available in the fuzzy match and which still need to be translated. In the third step, the fuzzy match is used as an intermediary, through which the alignments between the input sentence and the TM translation are established. In this way, we can detect with precision the segments in the suggested translation that the translator needs to edit and highlight them appropriately to set them apart from the segments that are already good translations for parts of the input sentence.
Additionally, we can show the alignments, as detected by our system, between the input and the translation, which will make it even easier for the translator to post-edit the TM suggestion. This alignment information can additionally be used to pre-translate the mismatched segments, further reducing the post-editing load.
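    The third step above composes two alignments through the fuzzy match. A minimal sketch of that composition, using toy word-index pairs in place of the sub-tree alignments the paper actually uses (all indices below are invented for illustration):

```python
# Toy sketch: compose input→fuzzy-match and fuzzy-match→translation
# alignments to find which translation positions are already covered.
# Integer word indices stand in for the paper's sub-tree alignments.

def compose(a: set, b: set) -> set:
    """Compose alignment a (input→match) with b (match→translation)."""
    return {(i, t) for (i, m1) in a for (m2, t) in b if m1 == m2}

input_to_match = {(0, 0), (1, 1), (3, 2)}   # input word -> match word
match_to_trans = {(0, 0), (1, 2), (2, 3)}   # match word -> translation word

input_to_trans = compose(input_to_match, match_to_trans)
covered = {t for (_, t) in input_to_trans}  # translation words already good
# translation positions NOT in `covered` would be highlighted for editing
```

    The complement of `covered` is exactly the set of segments the system would mark for post-editing.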

    Seeding statistical machine translation with translation memory output through tree-based structural alignment

    With the steadily increasing demand for high-quality translation, the localisation industry is constantly searching for technologies that would increase translator throughput, with the current focus on the use of high-quality Statistical Machine Translation (SMT) as a supplement to the established Translation Memory (TM) technology. In this paper we present a novel modular approach that utilises state-of-the-art sub-tree alignment to pick out pre-translated segments from a TM match and use them to seed an SMT system to produce a final translation. We show that the presented system can outperform pure SMT when a good TM match is found. It can also be used in a Computer-Aided Translation (CAT) environment to present almost perfect translations to the human user, with markup highlighting the segments of the translation that need to be checked manually for correctness.

    Lessons learnt on the analysis of large sequence data in animal genomics

    The ’omics revolution has made a large amount of sequence data available to researchers and the industry. This has had a profound impact on the field of bioinformatics, stimulating unprecedented advancements in this discipline. This is usually looked at from the perspective of human ’omics, in particular human genomics. Plant and animal genomics, however, have also been deeply influenced by next‐generation sequencing technologies, with several genomics applications now popular among researchers and the breeding industry. Genomics tends to generate huge amounts of data, and genomic sequence data account for an increasing proportion of big data in biological sciences, due largely to decreasing sequencing and genotyping costs and to large‐scale sequencing and resequencing projects. The analysis of big data poses a challenge to scientists, as data gathering currently takes place at a faster pace than data processing and analysis, and the associated computational burden is increasingly taxing, making even simple manipulation, visualization and transferring of data cumbersome operations. The time consumed by processing and analysing huge data sets may come at the expense of data quality assessment and critical interpretation. Additionally, when analysing lots of data, something is likely to go awry (the software may crash or stop) and it can be very frustrating to track down the error. We herein review the most relevant issues related to tackling these challenges and problems from the perspective of animal genomics, and provide researchers who lack extensive computing experience with guidelines that will help when processing large genomic data sets.
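    One guideline implied by the review, keeping memory use bounded by streaming over a large sequence file instead of loading it whole, can be sketched as follows. The FASTA-like format and the record-counting task are illustrative assumptions, not an example taken from the paper.

```python
import os
import tempfile

# Illustrative guideline: stream a large sequence file line by line
# rather than reading it into memory at once; memory use stays constant
# regardless of file size.

def count_records(path: str) -> int:
    """Count '>'-headed FASTA records by streaming over the file."""
    n = 0
    with open(path) as fh:
        for line in fh:              # one line in memory at a time
            if line.startswith(">"):
                n += 1
    return n

# tiny demo file standing in for a multi-gigabyte sequence file
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as f:
    f.write(">seq1\nACGT\n>seq2\nGGCC\n")
    demo = f.name
records = count_records(demo)
os.remove(demo)
```

    The same pattern (iterate, never slurp) applies to VCF, FASTQ and other line-oriented genomic formats.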

    Design and implementation of a search system for brownfield data

    This diploma thesis describes the development of a software search system for data concerning brownfields and their revitalization methodology. Since brownfield data consist primarily of text, the core aim of the thesis was to create a search system focused on retrieving usable results from text data, mainly using full-text search procedures and technologies. The results of the thesis are going to be used by a research group working with brownfield revitalization methodology data. The author analyses the requirements of the research group in order to propose specific solutions and an implementation of the full-text search system. The first proposed solution uses the Microsoft SQL Server relational database to provide the full-text search capability; specific implementation procedures, pricing and technology deployment options for this solution are considered and included in the thesis. The second proposed solution uses the Elastic Stack, a set of technologies consisting of Elasticsearch, Kibana and Logstash, to develop a full-text search system for the brownfield revitalization methodology data. Implementation details, technology deployment options and pricing of the Elastic Stack implementation are described as well. The author also includes a comparison of both proposed solutions, together with a recommendation on their usage that considers both current and potential future requirements of the members of the research group, the end users of the full-text search system.
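    As a sketch of the core operation of the Elasticsearch-based solution, here is the shape of a full-text query body in the Elasticsearch query DSL. The index name `brownfields`, the field `description` and the query string are illustrative assumptions; executing it requires a running cluster, e.g. via the official Python client's `Elasticsearch.search` method.

```python
# Illustrative Elasticsearch query body for full-text search over
# brownfield revitalization texts. Index and field names are assumptions,
# not taken from the thesis.
query_body = {
    "query": {
        "match": {                      # analysed full-text match
            "description": {
                "query": "industrial site revitalization",
                "fuzziness": "AUTO",    # tolerate minor typos
            }
        }
    },
    "highlight": {"fields": {"description": {}}},  # mark matched fragments
}
# With a live cluster, roughly:
#   Elasticsearch("http://localhost:9200").search(
#       index="brownfields", body=query_body)
```

    The `highlight` clause is what would let an end user see which fragments of a methodology document matched the query.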

    A Big Data Lake for Multilevel Streaming Analytics

    Large organizations are seeking to create new architectures and scalable platforms to effectively handle the data management challenges posed by the explosive growth of data rarely seen in the past. These data management challenges largely stem from the availability of streaming data at high velocity from various sources in multiple formats. The changes in data paradigm have led to the emergence of new data analytics and management architectures. This paper focuses on storing high-volume, high-velocity and high-variety data in raw formats in a data storage architecture called a data lake. First, we present our study on the limitations of traditional data warehouses in handling recent changes in data paradigms. We discuss and compare different open source and commercial platforms that can be used to develop a data lake. We then describe our end-to-end data lake design and implementation approach using the Hadoop Distributed File System (HDFS) on the Hadoop Data Platform (HDP). Finally, we present a real-world data lake development use case for data stream ingestion, staging, and multilevel streaming analytics which combines structured and unstructured data. This study can serve as a guide for individuals or organizations planning to implement a data lake solution for their use cases.
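    The ingestion-then-staging flow described above can be sketched as routing a record from a raw landing zone into a partitioned staging path. The zone names and the date-partitioned layout are illustrative assumptions, not the paper's exact design.

```python
# Illustrative data-lake path layout: the raw zone keeps the record
# exactly as received; the staging zone adds source/date partitioning
# so downstream analytics can prune by partition.
import json
from datetime import date

def raw_path(source: str, filename: str) -> str:
    """Landing location for an untouched incoming record."""
    return f"/datalake/raw/{source}/{filename}"

def staged_path(source: str, d: date) -> str:
    """Partitioned staging location for the same data."""
    return f"/datalake/staged/{source}/year={d.year}/month={d.month:02d}/part.json"

record = {"sensor": "s1", "value": 7}
landing = raw_path("sensors", "event-001.json")
staged = staged_path("sensors", date(2020, 5, 17))
payload = json.dumps(record)   # stored unchanged in the raw zone
```

    On HDFS these paths would be created with the usual filesystem tooling; the point is that raw data stays immutable while staged copies carry query-friendly structure.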

    Music information retrieval: conceptual framework, annotation and user behaviour

    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has been given increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. In order to make a music information retrieval system appealing and useful to the user, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users’ experiences. This thesis is based on a user-centred approach, taking into account the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users, and the shortage of user-dependent knowledge with respect to high-level features of music. In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework - the first of its kind - is conceived as a coordinating structure between the automatic description of low-level music content and the description of high-level content by the system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies. The results from these studies show that manually annotated music files can be of great help in the development of accurate analysis tools for music information retrieval. Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified.
In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study has uncovered relationships between the users’ demographical background and their perception of expressive and structural features of music. Such a multi-level approach is exceptional as it included a large sample of the population of real users of interactive music systems. Tests have shown that the findings of this study are representative of the targeted population. Finally, the multi-purpose material provided by the theoretical background and the results from the empirical investigations are put into practice in three music information retrieval applications: a prototype of a user interface based on a taxonomy, an annotated database of experimental findings and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, the use of knowledge about users can significantly improve the quality of music content analysis. This thesis demonstrates that an informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections.

    Domain Completeness of Model Transformations and Synchronisations

    Get PDF
    The intrinsic question of most activities in information science, in practice or science, is “Does a given system satisfy the requirements regarding its application?” Commonly, requirements are expressed and accessible by means of models, mostly in a diagrammatic representation by visual models. The requirements may change over time and are often defined from different perspectives and within different domains. This implies that models may be transformed either within the same domain-specific visual modelling language or into models in another language. Furthermore, model updates may be synchronised between different models. Most types of visual models can be represented by graphs where model transformations and synchronisations are performed by graph transformations. The theory of graph transformations emerged from its origins in the late 1960s and early 1970s as a generalisation of term and tree rewriting systems to an important field in (theoretical) computer science with applications particularly in visual modelling techniques, model transformations, synchronisations and behavioural specifications of models. Its formal foundations but likewise visual notation enable both precise definitions and proofs of important properties of model transformations and synchronisations from a theoretical point of view and an intuitive approach for specifying transformations and model updates from an engineer’s point of view. The recent results were presented in the EATCS monographs “Fundamentals of Algebraic Graph Transformation” (FAGT) in 2006 and its sequel “Graph and Model Transformation: General Framework and Applications” (GraMoT) in 2015. This thesis concentrates on one important property of model transformations and synchronisations, i.e., syntactical completeness. 
Syntactical completeness of model transformations means that, given a specification for transforming models from a source modelling language into models in a target language, all source models can be completely transformed into corresponding target models. In the same context, syntactical completeness of model synchronisations means that all source model updates can be completely synchronised, resulting in corresponding target model updates. This work is essentially based on the GraMoT book and mainly extends its results for model transformations and synchronisations based on triple graph grammars by a new, more general notion of syntactical completeness, namely domain completeness, together with corresponding verification techniques. Furthermore, the results are instantiated for the verification of the syntactical completeness of software transformations and synchronisations. The well-known transformation of UML class diagrams into relational database models and the transformation of programs of a small object-oriented programming language into class diagrams serve as running examples. The existing AGG tool is used to support the verification of the given examples in practice.