
    Research data management education for future curators

    Science has progressed by “standing on the shoulders of giants”, and for centuries research and knowledge have been shared through the publication and dissemination of books, papers and scholarly communications. Moving forward, much of our understanding builds on (large-scale) datasets, which have been collected or generated as part of the scientific process of discovery. How will these be made available for future generations? How will we ensure that, once collected or generated, others can stand on the shoulders of the data we produce? Educating students about the challenges and opportunities of data management is a key part of the solution and helps the researchers of the future to start thinking about these problems early in their careers. We have compiled a set of case studies to show the similarities and differences in data between disciplines, and produced a booklet for students containing the case studies and an introduction to the data lifecycle and other data management practices. This has already been used at the University of Southampton within the Faculty of Engineering and is now being adopted centrally for use in other faculties. In this paper, we provide an overview of the case studies and the guide, and reflect on the reception the guide has had to date.

    Defining Asymptotic Parallel Time Complexity of Data-dependent Algorithms

    The scientific research community has reached a stage of maturity where its strong need for high-performance computing has also diffused into everyday engineering and industrial algorithms. In efforts to satisfy this need, parallel computers provide an efficient and economical way to solve large-scale and/or time-constrained problems. As a consequence, the end-users of these systems have a vested interest in defining the asymptotic time complexity of parallel algorithms in order to predict their performance on a particular parallel computer. The asymptotic parallel time complexity of data-dependent algorithms depends on the number of processors, the data size, and other parameters. Discovering these other parameters is a challenging problem and the key to obtaining a good estimate of the performance order. Typical examples of such applications are sorting algorithms, searching algorithms and solvers of the traveling salesman problem (TSP). This article covers all the knowledge-discovery aspects of the problem of defining the asymptotic parallel time complexity of data-dependent algorithms. The knowledge-discovery methodology begins by designing a considerable number of experiments and measuring their execution times. An interactive, iterative process then explores the data in search of patterns and/or relationships, detecting parameters that affect performance. Once the key parameters characterising the time complexity are known, a hypothesis can be formed, the process restarted, and an improved time complexity model produced. Finally, the methodology predicts the performance order for new data sets on a particular parallel computer by numerically identifying the model's constants. As a case study, a global-pruning traveling salesman problem implementation (GP-TSP) has been chosen to analyze the influence of indeterminism on performance prediction for data-dependent parallel algorithms, and to show the usefulness of the proposed knowledge-discovery methodology. The hypotheses generated to define the asymptotic parallel time complexity of the TSP were corroborated one by one. The experimental results confirm the expected capability of the proposed methodology: the predicted performance time orders matched real execution times well (on the order of 85%).
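    As a rough illustration of the model-fitting and prediction steps described above (not the paper's actual GP-TSP implementation), the sketch below fits an assumed complexity form T(n, p) ≈ c · n^a / p to measured execution times by least squares and then predicts the runtime for a new data size; the measurements, the functional form, and the parameter names n, p, c and a are all illustrative assumptions.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical measurements: data size n, processor count p, runtime (s).
    # In the described methodology these come from a designed set of experiments.
    n = np.array([1000, 2000, 4000, 1000, 2000, 4000], dtype=float)
    p = np.array([2, 2, 2, 8, 8, 8], dtype=float)
    t = np.array([1.9, 7.8, 31.5, 0.5, 2.0, 8.1])

    def model(np_pair, c, a):
        """Assumed complexity form T(n, p) = c * n**a / p."""
        n, p = np_pair
        return c * n**a / p

    # Numerical identification of the model constants from the experiments.
    (c, a), _ = curve_fit(model, (n, p), t, p0=(1e-6, 2.0))
    print(f"Estimated performance order: O(n^{a:.2f} / p), constant c = {c:.2e}")

    # Predict the runtime for a new data set on the same parallel computer.
    print("Predicted runtime for n=8000, p=4:", model((8000.0, 4.0), c, a))
    ```

    In the paper's iterative loop, a poor fit at this stage would prompt a new hypothesis about which parameters drive performance, followed by a fresh round of experiments.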

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing alongside the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenges have not been recognized, and a methodology of its own has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and the scientific paradigm? What are the nature and fundamental challenges of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.

    Topic-based analysis for technology intelligence

    University of Technology Sydney, Faculty of Engineering and Information Technology.
    Over the past several decades, scientific literature, patents and other semi-structured technology indicators have been generated and accumulated at a very rapid rate. Their growth provides a wealth of information regarding technology development in both the public and private domains. However, it has also caused increasingly severe information overload, whereby researchers, analysts and decision makers are unable to read, summarize and understand massive technical documents and records manually. The concepts and tools of technology intelligence aim to handle this issue. In current technology intelligence research, one of the big challenges is that existing frameworks and applications conduct semantic content analysis and temporal trend estimation separately, lacking a comprehensive perspective on trend analysis of the detailed content within an area. In addition, existing technology intelligence research is mainly built on the semantic properties of semi-structured technology indicators; however, single keywords and their rankings alone are too general or ambiguous to represent complex concepts and their corresponding temporal patterns. Thirdly, systematic post-processing, forecasting and evaluation of both content analysis and trend identification outputs are still in great demand for diverse and flexible technological decision support and opportunity discovery. This research addresses these three challenges in both theoretical and practical terms. It first quantitatively defines the temporal characteristics and semantic properties of typical semi-structured technology indicators. The thesis then proposes a topic-based technology intelligence framework with three main functionalities (data-driven trend identification, topic discovery and comprehensive topic evaluation) to synthetically process and analyse the publication count sequences, textual data and metadata of target technology indicators. To achieve these three functionalities, this research proposes: an empirical technology trend analysis method that extracts temporal trend turning points and trend segments, helping to produce a more reasonable time-based measure; a topic-based technological forecasting method that discovers and characterizes the semantic knowledge underlying the massive textual data of technology indicators while estimating the future trends of the discovered topics; and a comprehensive topic evaluation method that links metadata with discovered topics to provide an integrated landscape and in-depth technological insight. To demonstrate the proposed topic-based technology intelligence framework and the related methods, this research presents case studies with both patents and scientific literature. Experimental results on Australian patents, United States patents and scientific papers from the Web of Science database show that the proposed framework and methods are well suited to analysing semi-structured technology indicators and can provide valuable topic-based knowledge to facilitate further technological decision making and opportunity discovery, with good performance.
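    As a minimal sketch of the topic-discovery step only (the thesis's own pipeline is richer, combining trend segmentation, forecasting and metadata linkage), the following uses off-the-shelf LDA from scikit-learn on a toy corpus of abstracts; the corpus, the topic count and the n-gram setting are illustrative assumptions, not the thesis's configuration.

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Hypothetical corpus: abstracts of patents or papers (placeholder texts).
    abstracts = [
        "solar cell efficiency improvement using perovskite layers",
        "lithium ion battery electrode material for energy storage",
        "deep neural network model for image classification tasks",
        "convolutional network architecture accelerating inference",
    ]

    # Bag-of-words representation; including bigrams helps reduce the
    # ambiguity of single keywords that the thesis points out.
    vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words="english")
    X = vectorizer.fit_transform(abstracts)

    # Discover latent topics; the number of topics is an assumed parameter.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(X)

    # Report each topic's top terms as a compact, human-readable label.
    terms = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [terms[i] for i in weights.argsort()[::-1][:5]]
        print(f"Topic {k}: {', '.join(top)}")
    ```

    In a fuller pipeline along the lines the thesis describes, each document-topic distribution would then be aggregated by publication year to estimate per-topic trends, and linked to metadata such as assignees or authors for evaluation.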