Data wars over data stores: challenges in medical data linkage
A primary concern of the medical e-research community is the availability of suitable data sets for their analysis requirements. The quantity and dubious quality of data present significant barriers to the application of many automated analysis technologies, including data mining, to the medical and health domain. Publicly available data is frequently poorly coded, incomplete, out-of-date or simply not applicable to the analysis or algorithm being applied. Work has been done to overcome these issues through the application of data linking processes but further complications have been encountered resulting in slow progress.
The use of locally held medical data is difficult enough due to its structural complexity and non-standardised language; however, linking data from disparate electronic sources adds the challenges of privacy, security, semantic compatibility, provenance, and governance, each with its own inherent issues. A focal requirement is a mechanism for sharing medical and health data across multiple sites that incorporates careful management of the semantics and limitations of the data sets whilst maintaining functional relevance for the end user. Our paper addresses this requirement by exploring recent conceptual modelling and data evaluation methodologies that facilitate effective data linking whilst ensuring the semantics of the data are maintained and the individual needs of the end user are met.
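The data-linking step the abstract refers to can be made concrete with a minimal sketch. This is not the paper's method, only an illustration of deterministic record linkage between two hypothetical registers; all field names, records and match rules are invented for the example, and real medical linkage would also need probabilistic matching and privacy controls.

```python
# Deterministic record linkage between two hypothetical data sources.
# Normalisation handles the "non-standardised language" problem in a tiny way:
# semantically equal values compare equal after cleaning.
import re

def normalise(record):
    """Lower-case and strip punctuation so equivalent values compare equal."""
    return {k: re.sub(r"[^a-z0-9]", "", str(v).lower()) for k, v in record.items()}

def link(source_a, source_b, keys=("surname", "dob", "postcode")):
    """Return pairs of records whose normalised key fields agree exactly."""
    index = {}
    for rec in source_b:
        n = normalise(rec)
        index[tuple(n[k] for k in keys)] = rec
    matches = []
    for rec in source_a:
        n = normalise(rec)
        partner = index.get(tuple(n[k] for k in keys))
        if partner is not None:
            matches.append((rec, partner))
    return matches

# Hypothetical registers: same person, differently formatted fields.
register_a = [{"surname": "O'Brien", "dob": "1970-01-02", "postcode": "2000", "dx": "J45"}]
register_b = [{"surname": "OBRIEN", "dob": "1970/01/02", "postcode": "2000", "rx": "salbutamol"}]
print(link(register_a, register_b))  # one linked pair despite formatting differences
```

Exact matching on a few cleaned keys is the simplest possible strategy; the semantic-compatibility and governance issues the paper discusses arise precisely because real sources rarely agree this neatly.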
Designing Reusable Systems that Can Handle Change - Description-Driven Systems: Revisiting Object-Oriented Principles
In the age of the Cloud and so-called Big Data, systems must be increasingly flexible, reconfigurable and adaptable to change, in addition to being developed rapidly. As a consequence, designing systems to cater for evolution is becoming critical to their success. To be able to cope with change, systems must have the capability of reuse and the ability to adapt as and when necessary to changes in requirements. Allowing systems to be self-describing is one way to facilitate this. To address the issues of reuse in designing evolvable systems, this paper proposes a so-called description-driven approach to systems design. This approach enables new versions of data structures and processes to be created alongside the old, thereby providing a history of changes to the underlying data models and enabling the capture of provenance data. The efficacy of the description-driven approach is exemplified by the CRISTAL project. CRISTAL is based on description-driven design principles; it uses versions of stored descriptions to define various versions of data which can be stored in diverse forms. This paper discusses the need for capturing holistic system description when modelling large-scale distributed systems.
Comment: 8 pages, 1 figure and 1 table. Accepted by the 9th Int Conf on the Evaluation of Novel Approaches to Software Engineering (ENASE'14), Lisbon, Portugal, April 2014.
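The core description-driven idea, versioned descriptions stored alongside the items they describe, can be sketched in a few lines. This is a simplified illustration, not CRISTAL's actual API; the class and attribute names (`Description`, `Repository`, `_described_by`) are invented for the example.

```python
# Descriptions are themselves versioned data; every stored item records which
# description version shaped it, giving a history of model changes and a
# provenance trail, as the abstract describes.
from dataclasses import dataclass

@dataclass(frozen=True)
class Description:
    name: str
    version: int
    fields: tuple  # field names this version of the structure defines

class Repository:
    def __init__(self):
        self.descriptions = {}  # (name, version) -> Description
        self.items = []

    def describe(self, name, fields):
        """Register a new version of a description alongside the older ones."""
        version = 1 + max((v for (n, v) in self.descriptions if n == name), default=0)
        desc = Description(name, version, tuple(fields))
        self.descriptions[(name, version)] = desc
        return desc

    def store(self, desc, **values):
        """Store an item tagged with the description version that produced it."""
        item = {k: values.get(k) for k in desc.fields}
        item["_described_by"] = (desc.name, desc.version)
        self.items.append(item)
        return item

repo = Repository()
v1 = repo.describe("Sample", ["id", "mass"])
v2 = repo.describe("Sample", ["id", "mass", "operator"])  # evolves alongside v1
repo.store(v1, id=1, mass=0.5)                            # old data stays valid
repo.store(v2, id=2, mass=0.7, operator="alice")
```

Because old description versions are never overwritten, items captured under `v1` remain interpretable after the model evolves to `v2`, which is the reuse-under-change property the paper argues for.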
Towards active conceptual modelling for sudden events
There are a number of issues for information systems which are required to collect data urgently that are not well accommodated by current conceptual modelling methodologies, and as a result the modelling step (and the use of databases) is often omitted. Such issues include the fact that:

• the number of instances for each entity is relatively low, so data definition takes a disproportionate amount of effort;
• the storage of data and the retrieval of information must take priority over the full definition of a schema describing that data;
• the systems undergo regular structural change and are thus subject to information loss as a result of changes to the schema's information capacity;
• finally, the structure of the information is likely to be only partially known, or there may be multiple, perhaps contradictory, competing hypotheses as to the underlying structure.

This paper presents the Low Instance-to-Entity Ratio (LItER) Model, which attempts to circumvent some of the problems encountered by these types of application and to provide a platform and modelling technique to handle rapidly occurring phenomena. The two-part LItER modelling process possesses an overarching architecture which provides hypothesis, knowledge base and ontology support together with a common conceptual schema. This allows data to be stored immediately and a more refined conceptual schema to be developed later. LItER modelling also aims to facilitate later translation to EER, ORM and UML models and the use of (a form of) SQL. Moreover, an additional benefit of the model is that it provides a partial solution to a number of outstanding issues in current conceptual modelling systems.
Sydney, NS
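The "store immediately, refine the schema later" idea can be illustrated with a generic attribute-value store. This is only a sketch of the storage side under that assumption; the LItER model itself is richer (hypothesis, knowledge base and ontology support), and the table and attribute names here are invented.

```python
# Capture data about sudden events without first committing to a schema:
# each fact is an (instance, attribute, value) triple, so entities with few
# instances and partially known structure cost nothing to define.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (instance_id TEXT, attribute TEXT, value TEXT)")

def record(instance_id, **attrs):
    """Store whatever attributes are known right now; structure may vary."""
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [(instance_id, a, str(v)) for a, v in attrs.items()])

# Two events reported with different, partial structure:
record("event1", location="coast", magnitude=7.1)
record("event2", location="inland", casualties=3, source="radio report")

# Later, once a refined conceptual schema is agreed, conventional relational
# views can be derived from the accumulated facts:
rows = conn.execute(
    "SELECT value FROM facts WHERE attribute = 'location' ORDER BY instance_id"
).fetchall()
print(rows)
```

The trade-off is the usual one for attribute-value designs: immediate capture and schema flexibility in exchange for weaker typing and more work at query time, which is why the LItER process pairs the store with a later, more refined conceptual schema.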
CRISTAL: A practical study in designing systems to cope with change
Software engineers frequently face the challenge of developing systems whose requirements are likely to change in order to adapt to organizational reconfigurations or other external pressures. Evolving requirements present difficulties, especially in environments in which business agility demands shorter development times and responsive prototyping. This paper uses a study from CERN in Geneva to address these research questions by employing a 'description-driven' approach that is responsive to changes in user requirements and that facilitates dynamic system reconfiguration. The study describes how handling descriptions of objects in practice alongside their instances (making the objects self-describing) can mediate the effects of evolving user requirements on system development. This paper reports on and draws lessons from the practical use of a description-driven system over time. It also identifies lessons that can be learned from adopting such a self-describing description-driven approach in future software development. © 2014 Elsevier Ltd
A Unit Test Approach for Database Schema Evolution
Context: The constant changes in today's business requirements demand continuous database revisions. Hence, database structures, not unlike software applications, deteriorate over time and thus require refactoring in order to achieve a longer lifespan. Although unit tests support changes to application programs and refactoring, there is currently a lack of testing strategies for database schema evolution.
Objective: This work examines the challenges for database schema evolution and explores the possibility of using various testing strategies to assist with schema evolution. Specifically, the work proposes a novel unit test approach for the application code that accesses databases with the objective of proactively evaluating the code against the altered database.
Method: The approach was validated through the implementation of a testing framework in conjunction with a sample application and a relatively simple database schema. Although the database schema in this study was simple, it was nevertheless able to demonstrate the advantages of the proposed approach.
Results: After changes in the database schema, the proposed approach found all SELECT statements, as well as the majority of other statements, requiring modification in the application code. Due to its efficiency with SELECT statements, the proposed approach is expected to be more successful with data warehouse applications, where SELECT statements are dominant.
Conclusion: The unit test approach for code that accesses databases has proven successful in evaluating the application code against the evolved database. In particular, the approach is simple and straightforward to implement, which makes it easily adoptable in practice.
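The gist of such a test, checking the application's SQL against the evolved schema before deployment, can be sketched as follows. This is a hypothetical illustration of the general idea, not the paper's framework; the table, column and query strings are invented, and SQLite stands in for whatever database the application uses.

```python
# Proactively evaluate application SQL against an evolved schema: each query
# is executed against a scratch copy of the new schema, and queries that no
# longer bind (e.g. they reference a renamed column) are reported.
import sqlite3

APPLICATION_QUERIES = [
    "SELECT id, full_name FROM customer",              # stale: column was renamed
    "INSERT INTO customer (id, name) VALUES (1, 'x')"  # still valid
]

# Evolved schema: full_name has been renamed to name.
EVOLVED_SCHEMA = "CREATE TABLE customer (id INTEGER, name TEXT)"

def find_broken_queries(schema_sql, queries):
    """Run each query against the evolved schema and collect the failures."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    broken = []
    for sql in queries:
        try:
            conn.execute(sql)
        except sqlite3.OperationalError:
            broken.append(sql)
    return broken

broken = find_broken_queries(EVOLVED_SCHEMA, APPLICATION_QUERIES)
print(broken)  # the stale SELECT referencing full_name is reported
```

A check like this slots naturally into an ordinary unit-test suite, which is why the abstract can claim the approach is straightforward to adopt: no new tooling is needed beyond a scratch database built from the evolved schema.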
Measuring the internationalisation of EU corporate R&D: a novel complementary use of statistical sources
The report summarises the main results of a research activity aimed at testing a novel approach for the measurement of EU business R&D internationalisation. This approach is based on the complementary use of two different sources of data: on the one hand, statistical data on private R&D expenditure taken from national surveys (BERD); on the other hand, data collected from companies' annual reports and accounts (as in the EU Industrial R&D Investment Scoreboard). The main objectives of the study were: i) to explore the methodological rationale for comparing the two sets of data; ii) to test the robustness of the novel methodology through an analysis applied to four EU countries (Belgium, Finland, Germany and Italy); iii) to provide indications of possible further research and follow-up activities.
The main results from the project are as follows:
- BERD and Scoreboard values, though addressing slightly different concepts, are comparable and can be used in a complementary way.
- Data regarding top EU R&D performers (that is, companies included in Scoreboard rankings, who are the active part of the R&D internationalisation process) have to be considered the starting point of such complementary use, rather than as final country-level data resulting from official statistics.
- Using top R&D performers' global values and adding aggregate values from national R&D statistics allows novel insights on the R&D internationalisation process to be given, at least for the four EU countries involved.
- Further research could rely on the forthcoming Euro-Group Register under development at EUROSTAT, to obtain a clear view of intra-EU cross-country R&D flows.
JRC.DDG.J.3-Knowledge for Growth
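The complementary-use idea in the results above can be made concrete with a toy calculation. All figures here are invented for illustration, and the arithmetic is a deliberate simplification of the report's methodology: take the Scoreboard firms' worldwide R&D, then add the domestic R&D of the remaining firms from the BERD aggregate.

```python
# Combine company-level (Scoreboard-style) and aggregate (BERD-style) data.
# Hypothetical values, EUR millions; "IT" stands in for one of the four countries.
scoreboard = [  # top performers with worldwide R&D from annual reports
    {"company": "AlphaCo", "country": "IT", "rd_world": 900},
    {"company": "BetaSpA", "country": "IT", "rd_world": 400},
]
berd_domestic = {"IT": 2500}           # national BERD aggregate (domestic R&D)
top_performers_domestic = {"IT": 700}  # domestic share of the Scoreboard firms' R&D

def combined_rd(country):
    """Top performers' global R&D plus the remaining firms' domestic R&D."""
    top_global = sum(c["rd_world"] for c in scoreboard if c["country"] == country)
    rest_domestic = berd_domestic[country] - top_performers_domestic[country]
    return top_global + rest_domestic

print(combined_rd("IT"))  # 900 + 400 + (2500 - 700) = 3100
```

The point of the subtraction is to avoid double counting: the Scoreboard firms' domestic R&D is already inside the BERD total, so only the remainder of the aggregate is added to their global figure.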
Are ICT, Human Capital and Organizational Capital Complementary in Production? Evidence from Italian Panel Data
Information and communication technologies (ICT) are considered to play a central role in determining productivity. The discussion on the impact of ICT on growth and productivity was stimulated by the famous sentence of Robert Solow (1987): "You can see the computer age everywhere but in the productivity statistics" (the so-called Solow paradox or productivity paradox). This quote was actually expressing concern that, while investment in ICT during the eighties and early 90s was growing exponentially in the U.S. and quality-indexed prices for computers were rapidly (and exponentially) falling, productivity in the service industry, in which about 80% of IT investment is made, was actually stagnating. Trying to provide a solution to the productivity paradox, some scholars (mainly Brynjolfsson and co-authors) have argued that ICT capital does not, per se, increase productivity. In fact, productivity increases when investments in a set of complementary assets are made. These assets are ICT capital, organizational capital and human capital.
In this paper we explore the ICT-organizational innovation-human capital complementarities issue for the manufacturing sector in Italy. We use data from the 7th, 8th and 9th waves of the "Indagine sulle Imprese Manifatturiere Italiane" by Unicredit (previously managed by Capitalia-Mediocredito Centrale), which contains information on ICT investments, organizational innovations, the skill composition of the work-force and many other variables (measured at the firm level). From these three waves we create an unbalanced panel, made up of firms observed either in waves 7 and 8, in waves 8 and 9, or in waves 7, 8 and 9. After generating values for real product and real capital, we take the wave-to-wave variation in the log of productivity and regress it on a series of explanatory variables, including ICT investment, the presence of organizational innovations, the skill composition of the work force and their interactions. By taking first differences (wave-to-wave differences) we are able to control for unobserved fixed effects which might be related to the endogenous variable (labor productivity) and to some explanatory variables.
On these differenced data we run OLS and find no evidence of the complementarity hypothesis between ICT investment and organizational innovations, which is per se an interesting result, because for many other (European) countries there exists significant evidence of complementarity. This is perhaps due to 1) the focus on manufacturing firms and 2) the fact that most firms in our dataset are medium-small firms (i.e. organizational change is more complementary with ICT investment for large firms). Our data also signal that the skill composition of the work-force is a strong determinant of productivity (both alone and when interacted with other potentially complementary assets). Finally, ICT investment is a complement to human capital, given that more ICT positively interacts with a high fraction of educated workers to stimulate productivity growth.
JRC.J.3-Information Society
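The reason first differencing controls for unobserved fixed effects can be shown with a toy synthetic panel. This is not the paper's data or estimation code, only a minimal demonstration under invented numbers: each firm's log productivity carries an unobserved firm-specific constant, and taking wave-to-wave differences makes that constant cancel, leaving only the effect of the changing regressor.

```python
# Synthetic three-wave firm panel: log productivity = alpha_firm + beta * ICT.
# First differences remove alpha_firm, the unobserved fixed effect.
import random

random.seed(0)
beta_ict = 0.3  # true effect of ICT investment on log productivity

panel = []
for firm in range(5):
    alpha = random.uniform(-1, 1)     # unobserved firm fixed effect
    ict = {7: 0.0, 8: 1.0, 9: 2.0}    # ICT rises by one unit each wave
    for wave in (7, 8, 9):
        panel.append({"firm": firm, "wave": wave,
                      "log_prod": alpha + beta_ict * ict[wave]})

# Wave-to-wave differences: alpha drops out, leaving beta_ict * d(ICT).
diffs = []
for firm in range(5):
    obs = sorted((r for r in panel if r["firm"] == firm), key=lambda r: r["wave"])
    for prev, cur in zip(obs, obs[1:]):
        diffs.append(cur["log_prod"] - prev["log_prod"])

# Every difference recovers beta_ict (up to rounding), whatever alpha was.
print(diffs)
```

In the paper the differenced equation also contains organizational-innovation and skill terms and their interactions, and OLS is run on the differences; the mechanics of why the fixed effect disappears are exactly those shown here.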
Population Geometries of Europe: The Topologies of Data Cubes and Grids
The political integration of the European Union (EU) is fragile for many reasons, not least the reassertion of nationalism. That said, if we examine specific practices and infrastructures, a more complicated story emerges. We juxtapose the political fragility of the EU with the formation of data infrastructures in official statistics that take part in post-national enactments of Europe's populations and territories. We develop this argument through an analysis of transformations in how European populations are enacted through new technological infrastructures that seek to integrate national census data in 'cubes' of cross-tabulated social topics and spatial 'grids' of maps. In doing so, these infrastructures give meaning to what 'is' Europe in ways that are both old and new. Through standardisation and harmonisation of social and geographical spaces, 'old' geometries of organising and mapping populations are deployed along with 'new' topological arrangements that mix and fold categories of population. Furthermore, we consider how grids and cubes are generative of methodological topologies by closing the distances or differences between methods and making their data equivalent. By paying attention to these practices and infrastructures we examine how they enable reconfiguring what is known and imagined as 'Europe' and how it is governed.