
    Data wars over data stores: challenges in medical data linkage

    A primary concern of the medical e-research community is the availability of data sets suitable for their analysis requirements. The quantity and dubious quality of available data present significant barriers to the application of many automated analysis technologies, including data mining, to the medical and health domain. Publicly available data is frequently poorly coded, incomplete, out-of-date or simply not applicable to the analysis or algorithm being applied. Work has been done to overcome these issues through the application of data linking processes, but further complications have been encountered, resulting in slow progress. The use of locally held medical data is difficult enough due to its structural complexity and non-standardised language; linking data from disparate electronic sources adds the further challenges of privacy, security, semantic compatibility, provenance and governance, each with its own inherent issues. A focal requirement is a mechanism for sharing medical and health data across multiple sites that incorporates careful management of the semantics and limitations of the data sets whilst maintaining functional relevance for the end user. Our paper addresses this requirement by exploring recent conceptual modelling and data evaluation methodologies that facilitate effective data linking whilst ensuring that the semantics of the data are maintained and the individual needs of the end user are met.
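    As a rough illustration of the kind of linking step the paper problematises, the sketch below pairs records from two hypothetical sources on an exact date-of-birth match plus an approximate name match. All field names, values and the similarity threshold are invented for illustration; real medical linkage would use calibrated probabilistic comparators and the privacy, provenance and governance safeguards discussed above.

```python
# Toy linkage of two hypothetical patient record sources. Field names,
# values and the 0.85 threshold are invented assumptions, not taken from
# the paper.
from difflib import SequenceMatcher

source_a = [{"id": "A1", "name": "Jane Smyth", "dob": "1980-02-11"}]
source_b = [{"id": "B7", "name": "Jane Smith", "dob": "1980-02-11"}]

def name_similarity(a: str, b: str) -> float:
    # Crude string similarity, standing in for phonetic or probabilistic
    # (Fellegi-Sunter style) comparators used in real linkage systems.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link(recs_a, recs_b, threshold=0.85):
    # Pair records that share a date of birth and have similar names.
    links = []
    for ra in recs_a:
        for rb in recs_b:
            if ra["dob"] == rb["dob"] and \
               name_similarity(ra["name"], rb["name"]) >= threshold:
                links.append((ra["id"], rb["id"]))
    return links

print(link(source_a, source_b))  # [('A1', 'B7')]
```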

    Designing Reusable Systems that Can Handle Change - Description-Driven Systems: Revisiting Object-Oriented Principles

    In the age of the Cloud and so-called Big Data, systems must be increasingly flexible, reconfigurable and adaptable to change, in addition to being developed rapidly. As a consequence, designing systems to cater for evolution is becoming critical to their success. To be able to cope with change, systems must have the capability of reuse and the ability to adapt as and when necessary to changes in requirements. Allowing systems to be self-describing is one way to facilitate this. To address the issues of reuse in designing evolvable systems, this paper proposes a so-called description-driven approach to systems design. This approach enables new versions of data structures and processes to be created alongside the old, thereby providing a history of changes to the underlying data models and enabling the capture of provenance data. The efficacy of the description-driven approach is exemplified by the CRISTAL project. CRISTAL is based on description-driven design principles; it uses versions of stored descriptions to define various versions of data which can be stored in diverse forms. This paper discusses the need for capturing holistic system description when modelling large-scale distributed systems.
    Comment: 8 pages, 1 figure and 1 table. Accepted by the 9th Int Conf on the Evaluation of Novel Approaches to Software Engineering (ENASE'14), Lisbon, Portugal, April 2014.
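    A minimal sketch of the description-driven idea, under the assumption of invented class and field names rather than CRISTAL's actual API: items carry a reference to a versioned description of their own structure, so new description versions can be created alongside old ones and the history of changes is retained.

```python
# Items hold a reference to a versioned description of their structure,
# making them self-describing. Names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Description:
    # A versioned description (meta-data) of a kind of item.
    name: str
    version: int
    fields: tuple  # attribute names valid in this version

@dataclass
class Item:
    # An instance that records which description version it conforms to,
    # preserving provenance across schema changes.
    described_by: Description
    values: dict

# Version 1 of a 'Detector' description and an item created under it.
detector_v1 = Description("Detector", 1, ("serial", "voltage"))
old_item = Item(detector_v1, {"serial": "D-001", "voltage": 1500})

# Requirements evolve: version 2 adds a field. The new description is
# created alongside the old one, so existing items stay valid against v1.
detector_v2 = Description("Detector", 2, ("serial", "voltage", "firmware"))
new_item = Item(detector_v2,
                {"serial": "D-002", "voltage": 1450, "firmware": "2.1"})

for item in (old_item, new_item):
    d = item.described_by
    print(f"{d.name} v{d.version}: {item.values}")
```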

    Towards active conceptual modelling for sudden events

    There are a number of issues for information systems that are required to collect data urgently which are not well accommodated by current conceptual modelling methodologies; as a result, the modelling step (and the use of databases) is often omitted. Such issues include the following:
    • the number of instances for each entity is relatively low, so data definition takes a disproportionate amount of effort;
    • the storage of data and the retrieval of information must take priority over the full definition of a schema describing that data;
    • such systems undergo regular structural change and are thus subject to information loss resulting from changes to the schema's information capacity;
    • the structure of the information is likely to be only partially known, or there may be multiple, perhaps contradictory, competing hypotheses as to the underlying structure.
    This paper presents the Low Instance-to-Entity Ratio (LItER) Model, which attempts to circumvent some of the problems encountered by these types of application and to provide a platform and modelling technique to handle rapidly occurring phenomena. The two-part LItER modelling process possesses an overarching architecture which provides hypothesis, knowledge base and ontology support together with a common conceptual schema. This allows data to be stored immediately and a more refined conceptual schema to be developed later. LItER modelling also aims to facilitate later translation to EER, ORM and UML models and the use of (a form of) SQL. Moreover, an additional benefit of the model is that it provides a partial solution to a number of outstanding issues in current conceptual modelling systems.
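    The "store immediately, refine the schema later" idea can be illustrated with a generic entity-attribute-value store. The sketch below works under that assumption and is not the LItER formalism itself; all entity and attribute names are invented.

```python
# Generic (entity, attribute, value) store: data is captured immediately
# with no schema defined up front; a candidate schema is proposed later.
from collections import defaultdict

triples = []  # (entity_id, attribute, value)

def record(entity_id, **attrs):
    # Store whatever attributes arrive, immediately and without loss.
    for attr, value in attrs.items():
        triples.append((entity_id, attr, value))

# Data from a sudden event arrives with only partially known structure.
record("case-1", location="Dhaka", severity=3)
record("case-2", location="Khulna", onset="2007-11-15")

def candidate_schema(entity_name="Case"):
    # Later refinement step: propose attributes from accumulated instances.
    attrs = defaultdict(set)
    for _, attr, _ in triples:
        attrs[entity_name].add(attr)
    return dict(attrs)

print(candidate_schema())  # e.g. {'Case': {'location', 'severity', 'onset'}}
```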

    CRISTAL: A practical study in designing systems to cope with change

    Software engineers frequently face the challenge of developing systems whose requirements are likely to change in order to adapt to organizational reconfigurations or other external pressures. Evolving requirements present difficulties, especially in environments in which business agility demands shorter development times and responsive prototyping. This paper uses a study from CERN in Geneva to address these challenges by employing a 'description-driven' approach that is responsive to changes in user requirements and that facilitates dynamic system reconfiguration. The study describes how handling descriptions of objects in practice alongside their instances (making the objects self-describing) can mediate the effects of evolving user requirements on system development. This paper reports on and draws lessons from the practical use of a description-driven system over time. It also identifies lessons that can be learned from adopting such a self-describing description-driven approach in future software development. © 2014 Elsevier Ltd

    A Unit Test Approach for Database Schema Evolution

    Context: The constant changes in today's business requirements demand continuous database revisions. Hence, database structures, not unlike software applications, deteriorate during their lifespan and thus require refactoring in order to achieve a longer lifespan. Although unit tests support changes to application programs and refactoring, there is currently a lack of testing strategies for database schema evolution.
    Objective: This work examines the challenges of database schema evolution and explores the possibility of using various testing strategies to assist with it. Specifically, the work proposes a novel unit test approach for application code that accesses databases, with the objective of proactively evaluating the code against the altered database.
    Method: The approach was validated through the implementation of a testing framework in conjunction with a sample application and a relatively simple database schema. Although the database schema in this study was simple, it was nevertheless able to demonstrate the advantages of the proposed approach.
    Results: After changes in the database schema, the proposed approach identified all SELECT statements, and the majority of other statements, that required modification in the application code. Owing to its efficiency with SELECT statements, the approach is expected to be especially successful with data warehouse applications, where SELECT statements are dominant.
    Conclusion: The unit test approach for code that accesses databases has proven successful in evaluating application code against the evolved database. In particular, the approach is simple and straightforward to implement, which makes it easily adoptable in practice.
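    A minimal sketch of the general strategy, not the paper's actual framework: replay the application's SQL statements against an evolved copy of the schema and flag those that no longer resolve. The schema, the renamed column and the statements below are invented for illustration.

```python
# Replay application SQL against an evolved schema held in an in-memory
# SQLite database; statements referring to dropped or renamed columns
# fail at execution time and are reported. Everything here is invented.
import sqlite3

EVOLVED_SCHEMA = """
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    full_name TEXT,   -- renamed from 'name' during schema evolution
    email TEXT
);
"""

APPLICATION_STATEMENTS = [
    "SELECT full_name, email FROM customer",  # updated to the new schema
    "SELECT name FROM customer",              # stale: column was renamed
]

def statements_needing_update(schema, statements):
    # Running each statement against the empty evolved database exercises
    # parsing and column resolution without needing any data.
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    failures = []
    for sql in statements:
        try:
            conn.execute(sql)
        except sqlite3.OperationalError as exc:
            failures.append((sql, str(exc)))
    return failures

for sql, err in statements_needing_update(EVOLVED_SCHEMA,
                                          APPLICATION_STATEMENTS):
    print(f"NEEDS UPDATE: {sql!r} ({err})")
```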

    Measuring the internationalisation of EU corporate R&D: a novel complementary use of statistical sources

    The report summarises the main results of a research activity aimed at testing a novel approach to the measurement of EU business R&D internationalisation. The approach is based on the complementary use of two different sources of data: on the one hand, statistical data on private R&D expenditure taken from national surveys (BERD); on the other, data collected from companies' annual reports and accounts (as in the EU Industrial R&D Investment Scoreboard). The main objectives of the study were: i) to explore the methodological rationale for comparing the two sets of data; ii) to test the robustness of the novel methodology through an analysis applied to four EU countries (Belgium, Finland, Germany and Italy); iii) to provide indications of possible further research and follow-up activities. The main results of the project are as follows:
    • BERD and Scoreboard values, though addressing slightly different concepts, are comparable and can be used in a complementary way.
    • Data regarding top EU R&D performers (that is, companies included in Scoreboard rankings, which are the active part of the R&D internationalisation process) should be treated as the starting point of such complementary use, rather than as final country-level data resulting from official statistics.
    • Using top R&D performers' global values and adding aggregate values from national R&D statistics yields novel insights into the R&D internationalisation process, at least for the four EU countries involved.
    • Further research could rely on the forthcoming Euro-Group Register under development at EUROSTAT to obtain a clear view of intra-EU cross-country R&D flows.
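    A toy sketch of the complementary use described above, with entirely invented figures: company-level R&D from a Scoreboard-style ranking is set against national BERD aggregates to gauge what share of a country's business R&D its top performers account for.

```python
# All numbers below are invented placeholders, not actual BERD or
# Scoreboard data; only the combination step illustrates the method.
import pandas as pd

# National BERD totals (millions of euro, invented).
berd = pd.DataFrame({
    "country": ["Belgium", "Finland", "Germany", "Italy"],
    "berd": [5000, 4000, 45000, 9000],
})

# Scoreboard-style company R&D aggregated to the home country (invented).
scoreboard = pd.DataFrame({
    "country": ["Belgium", "Finland", "Germany", "Italy"],
    "top_rd": [3200, 3500, 30000, 4500],
})

# Complementary use: share of national business R&D accounted for by
# the top performers in the ranking.
merged = berd.merge(scoreboard, on="country")
merged["top_share_pct"] = (100 * merged["top_rd"] / merged["berd"]).round(1)
print(merged)
```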

    Are ICT, Human Capital and Organizational Capital Complementary in Production? Evidence from Italian Panel Data

    Information and communication technologies (ICT) are considered to play a central role in determining productivity. The discussion of the impact of ICT on growth and productivity was stimulated by Robert Solow's famous remark (1987): “You can see the computer age everywhere but in the productivity statistics” (the so-called Solow paradox or productivity paradox). This quote expressed concern that, while investment in ICT during the eighties and early nineties was growing exponentially in the U.S. and quality-indexed prices for computers were rapidly (and exponentially) falling, productivity in the service industry, in which about 80% of IT investment is made, was actually stagnating. Trying to resolve the productivity paradox, some scholars (mainly Brynjolfsson and co-authors) have argued that ICT capital does not, per se, increase productivity; rather, productivity increases when investments are made in a set of complementary assets: ICT capital, organizational capital and human capital.
    In this paper we explore the ICT-organizational innovation-human capital complementarity issue for the manufacturing sector in Italy. We use data from the 7th, 8th and 9th waves of the “Indagine sulle Imprese Manifatturiere Italiane” (Survey of Italian Manufacturing Firms) by Unicredit (previously managed by Capitalia-Mediocredito Centrale), which contains information on ICT investments, organizational innovations, the skill composition of the workforce and many other variables measured at the firm level. From these three waves we create an unbalanced panel made up of firms observed either in waves 7 and 8, in waves 8 and 9, or in waves 7, 8 and 9. After generating values for real product and real capital, we take the wave-to-wave variation in the log of productivity and regress it on a series of explanatory variables, including ICT investment, the presence of organizational innovations, the skill composition of the workforce and their interactions. By taking first differences (wave-to-wave differences) we are able to control for unobserved fixed effects which might be related to the endogenous variable (labor productivity) and to some explanatory variables.
    On these differenced data we run OLS and find no evidence for the complementarity hypothesis between ICT investment and organizational innovations, which is in itself an interesting result, because significant evidence of complementarity exists for many other (European) countries. This is perhaps due to 1) the focus on manufacturing firms and 2) the fact that most firms in our dataset are small to medium-sized (i.e. organizational change is more complementary with ICT investment for large firms). Our data also signal that the skill composition of the workforce is a strong determinant of productivity, both alone and when interacted with other potentially complementary assets. Finally, ICT investment is a complement to human capital, given that more ICT interacts positively with a high fraction of educated workers to stimulate productivity growth.
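    A compact sketch of the estimation strategy described above, run on synthetic data: the variables are assumed to be already first-differenced (removing firm fixed effects), and productivity growth is regressed on ICT, organizational innovation, skills and their interactions. Only the specification mirrors the abstract; none of the numbers do.

```python
# First-difference OLS with interaction terms on synthetic panel data.
# All variables and coefficients are invented; the interaction terms are
# what would test the complementarity hypothesis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "ict": rng.normal(size=n),      # change in ICT investment
    "org": rng.integers(0, 2, n),   # organizational innovation dummy
    "skill": rng.normal(size=n),    # change in share of educated workers
})
# Synthetic productivity growth with an ICT x skill complementarity term.
df["dlprod"] = (0.05 * df["ict"] + 0.02 * df["org"] + 0.04 * df["skill"]
                + 0.03 * df["ict"] * df["skill"] + rng.normal(0, 0.1, n))

# 'ict * org' expands to ict + org + ict:org; the ict:org and ict:skill
# coefficients carry the complementarity evidence (or lack of it).
model = smf.ols("dlprod ~ ict * org + ict * skill", data=df).fit()
print(model.summary().tables[1])
```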

    Population Geometries of Europe: The Topologies of Data Cubes and Grids

    The political integration of the European Union (EU) is fragile for many reasons, not least the reassertion of nationalism. That said, if we examine specific practices and infrastructures, a more complicated story emerges. We juxtapose the political fragility of the EU with the formation of data infrastructures in official statistics that take part in post-national enactments of Europe’s populations and territories. We develop this argument through an analysis of transformations in how European populations are enacted through new technological infrastructures that seek to integrate national census data in ‘cubes’ of cross-tabulated social topics and spatial ‘grids’ of maps. In doing so, these infrastructures give meaning to what ‘is’ Europe in ways that are both old and new. Through the standardisation and harmonisation of social and geographical spaces, ‘old’ geometries of organising and mapping populations are deployed along with ‘new’ topological arrangements that mix and fold categories of population. Furthermore, we consider how grids and cubes are generative of methodological topologies by closing the distances or differences between methods and making their data equivalent. By paying attention to these practices and infrastructures we examine how they enable the reconfiguring of what is known and imagined as ‘Europe’ and how it is governed.