Clinical data was captured and stored using natural language (NL) in order to
describe the human organs, their attributes and behaviour (Olsen et al., 1998). Although this was an
accurate form of data representation, it created information overload, space complexity, inconsistency
and erroneous data. To address the issue of data inconsistency and standardisation, clinical coding
such as UMLS was used, while HL7 was introduced for clinical interoperability and data exchange
between users. A survey conducted by de Keizer et al. (2000a) revealed that these methods are
inadequate for clinical data representation; hence the data re-representation technique
(Haimowitz et al., 1988) was introduced and used for modelling CIS with Entity Relationship
Diagrams (ERD) and first-order logic (FOL) (de Keizer et al., 2000b). However, this model does not
address the issue of information overload and space
complexity. Hence, this paper presents an alternative approach in which UML is used to capture human
organs, their attributes and relationships. A new framework with a built-in algorithm converts the
multiple attributes modelled in the class diagram into a mathematical formalisation using the CMAUT.
The logical expression serves as input to the optimisation algorithm to determine the optimal amount
of data that must be retrieved for primary healthcare investigation. To evaluate the framework,
mathematical operations were performed, which revealed that the space complexity when using the
CMA re-representation technique is Θ(n + 1), compared to Θ(2n) for non-CMA. This means less
space is needed when CMA with the AND connector is used, but for substitutable organs with the OR
connector the CMA and non-CMA representations have the same exponential space complexity of
Θ(2^n). A t-test conducted on the amount of data required for investigation before and after
optimisation gave a p-value of 0.000 (that is, p < 0.001), which means there is a significant difference
between the two data sets. For epidemiological analysis the output of the framework was
benchmarked against the output of a web-based heart risk calculator, and the single-sample t-test
conducted gave a p-value of 0.686, meaning there is no statistically significant difference between
the two outputs. Thus this framework with data re-representation occupies less space than the
alternatives and can be used to calculate the risk factor of a heart patient.
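
To make the space comparison concrete, the following minimal Python sketch tabulates the three growth rates quoted above for a small number of attributes n. It assumes, purely for illustration, that each bound can be read as an exact count of stored units; the function names (cma_and_units, non_cma_and_units, or_units, growth_table) are hypothetical and are not part of the framework described here.

```python
# Illustrative only: treats the quoted bounds as exact unit counts (an assumption).

def cma_and_units(n: int) -> int:
    """Units for n AND-connected attributes with CMA re-representation: n + 1."""
    return n + 1


def non_cma_and_units(n: int) -> int:
    """Units for n AND-connected attributes without CMA: 2n."""
    return 2 * n


def or_units(n: int) -> int:
    """Units for n OR-connected (substitutable) attributes, CMA or non-CMA: 2^n."""
    return 2 ** n


def growth_table(max_n: int) -> list[tuple[int, int, int, int]]:
    """Rows of (n, CMA with AND, non-CMA with AND, OR) for n = 1..max_n."""
    return [
        (n, cma_and_units(n), non_cma_and_units(n), or_units(n))
        for n in range(1, max_n + 1)
    ]


# Print a small comparison table.
print(f"{'n':>3} {'CMA/AND':>8} {'nonCMA/AND':>11} {'OR':>6}")
for n, cma, non_cma, either_or in growth_table(8):
    print(f"{n:>3} {cma:>8} {non_cma:>11} {either_or:>6}")
```

Under this reading, the saving from CMA with the AND connector grows with n, while the OR case grows exponentially regardless of the representation used.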