
    Investigating the attainment of optimum data quality for EHR Big Data: proposing a new methodological approach

    The value derivable from the use of data has been increasing continuously for some years. Both commercial and non-commercial organisations have realised the immense benefits that might be derived if all data at their disposal could be analysed and used as the basis of decision-making. The technological tools required to produce, capture, store, transmit and analyse huge amounts of data form the background to the development of the phenomenon of Big Data. With Big Data, the aim is to generate value from huge amounts of data, often in non-structured formats and produced extremely frequently. However, the potential value derivable depends on the general level of data governance, and more precisely on the quality of the data. The field of data quality is well researched for traditional data uses but is still in its infancy in the Big Data context. This dissertation investigated effective methods to enhance data quality for Big Data. The principal deliverable of this research is a methodological approach which can be used to optimise the level of data quality in the Big Data context. Since data quality is contextual (that is, not a generalisable field), this research study applies the methodological approach to one use case: Electronic Health Records (EHR). The first main contribution to knowledge is a systematic investigation of which data quality dimensions (DQDs) are most important for EHR Big Data. The two most important dimensions ascertained by the research methods applied in this study are accuracy and completeness. These are two well-known dimensions, and this study confirms that they are also very important for EHR Big Data. The second important contribution is an investigation into whether Artificial Intelligence, with a special focus on machine learning, could improve the detection of dirty data, focusing on the two data quality dimensions of accuracy and completeness. Based on the experiments carried out, regression and clustering algorithms proved most adequate for accuracy- and completeness-related issues, respectively. However, the limits of implementing and using machine learning algorithms for detecting data quality issues in Big Data were also revealed and discussed. It can safely be deduced from this part of the research study that the use of machine learning to enhance the detection of data quality issues is a promising area but not yet a panacea which automates the entire process. The third important contribution is a proposed guideline for undertaking data repairs most efficiently for Big Data; this involved surveying and comparing existing data cleansing algorithms against a prototype developed for data reparation. Weaknesses of existing algorithms are highlighted and identified as areas which efficient data reparation algorithms must address. These three contributions form the nucleus of a new data quality methodological approach which can be used to optimise Big Data quality, as applied in the context of EHR. Many of the activities and techniques discussed in the proposed methodological approach can be transposed to other industries and use cases. The proposed data quality methodological approach can be used by practitioners of Big Data quality who follow a data-driven strategy. In contrast to existing Big Data quality frameworks, the proposed methodological approach has the advantage of being more precise and specific: it gives clear and proven methods for undertaking the main identified stages of a Big Data quality lifecycle and can therefore be applied by practitioners in the area. This research study provides promising results and deliverables, and it paves the way for further research in the area. Big Data technology is evolving rapidly, and future research should focus on new representations of Big Data, the real-time streaming aspect, and replicating the same research methods on new technologies to validate the current results.
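    The abstract reports that regression worked best for accuracy issues and clustering for completeness issues, but does not include the experimental code. As a minimal sketch of that idea, on a synthetic tabular extract rather than real EHR data, one might flag accuracy suspects by regression residuals and completeness suspects by clustering per-record missingness profiles:

```python
# Minimal sketch (not the dissertation's code): detect suspected accuracy
# and completeness issues with regression residuals and clustering,
# mirroring the two dimensions the study retained. All data is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic records: weight roughly predictable from height, plus corrupted rows.
height = rng.normal(170, 10, 500)
weight = 0.9 * height - 90 + rng.normal(0, 5, 500)
weight[:10] = rng.normal(500, 50, 10)            # injected accuracy errors

# Accuracy: fit a regression and flag records with unusually large residuals.
reg = LinearRegression().fit(height.reshape(-1, 1), weight)
residuals = np.abs(weight - reg.predict(height.reshape(-1, 1)))
accuracy_suspects = np.where(residuals > 3 * residuals.std())[0]

# Completeness: cluster per-record missingness masks; the cluster with the
# highest missing-field rate is a candidate set for review.
missing = rng.random((500, 8)) < 0.05            # synthetic missingness mask
missing[:10, :6] = True                          # records missing most fields
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(missing)
sparse = np.argmax([missing[km.labels_ == k].mean() for k in range(2)])
completeness_suspects = np.where(km.labels_ == sparse)[0]

print(len(accuracy_suspects), "accuracy suspects,",
      len(completeness_suspects), "completeness suspects")
```

    The abstract's caveat applies here too: such detectors surface candidates for review, they do not automate the whole quality process.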

    Predicting hospital admissions to reduce crowding in emergency departments of the Integral Healthcare System for Public Use in Catalonia

    Objective: This study analyzed data from Emergency Departments (EDs) of more than 60 centers embedded in the Integral Healthcare System for Public Use in Catalonia (SISCAT) to predict hospital admissions based on information readily available at the moment of arrival at the ED. The predictive models might help reduce overcrowding in EDs and improve the service delivered to patients. Method: A retrospective analysis was conducted using SISCAT data collected during 2018. A gradient boosting machine was used to train and test the predictive models in R, splitting the data into a 70/30 partition. Variable importance was analyzed for each model. Receiver Operating Characteristic (ROC) curves were created, and the Area Under the Curve (AUC) was obtained from each as a measure of predictive performance. The first part of the study aimed to obtain models with high accuracy and AUC, while the second part aimed to obtain models with a sensitivity > 0.975 and analyzed the possible benefits of applying such models. Results: Of the 3,189,204 ED visits included in the study, 11.02% ended in admission to the hospital. The gradient boosting machine proved to be a good method for predicting a binary outcome of either admission or discharge. The best performance for all models was obtained at a 0.5 probability-of-admission threshold. The largest AUC was obtained for the complete dataset: 0.8938, with a 95% CI of 0.8929-0.8948. The best results in the sensitivity tests were obtained with the adults' dataset, with a model that gave 0.4344 specificity and 0.5033 accuracy at the 0.975 sensitivity level. Conclusion: This study reaffirms the belief that the gradient boosting machine is a powerful tool for binary outcome predictive models. It shows that data collected at the moment of arrival at the ED can be used to predict hospital admissions accurately, and that a model including data from a comprehensive hospital network has better predictive performance than a similar model developed with data from a single health center. It discusses the large potential the obtained models could have in fighting ED crowding by allowing an early start of the bed allocation process, making it possible to carry out the procedures required for admission in parallel with the doctor's visit rather than sequentially after it, which unnecessarily crowds ED rooms and generates a non-optimal use of available ED resources. The study also suggests applying this predictive technique to develop models with proven high sensitivity, digitalizing the patient-hospital relationship to allow a first contact between both parties before the visit to the ED, which could regulate the inflow of patients and reduce ED overcrowding significantly.
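    The study trained its models in R, which the abstract names but does not show. The following is an illustrative Python analogue on synthetic triage-time features (age, triage level, arrival hour are assumed stand-ins, not the study's variables), reproducing the 70/30 split, AUC evaluation, and the high-sensitivity threshold search:

```python
# Illustrative sketch only: an analogous gradient boosting pipeline on
# synthetic arrival-time features, with a 70/30 split, AUC, and a search
# for the largest threshold that keeps sensitivity >= 0.975.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 20_000
age = rng.integers(0, 100, n)
triage_level = rng.integers(1, 6, n)             # 1 = most urgent
arrival_hour = rng.integers(0, 24, n)
# Synthetic admission probability rises with age and urgency.
p = 1 / (1 + np.exp(-(0.03 * age - 0.8 * triage_level + 0.5)))
admitted = rng.random(n) < p

X = np.column_stack([age, triage_level, arrival_hour])
X_tr, X_te, y_tr, y_te = train_test_split(X, admitted, test_size=0.3,
                                          random_state=42)
model = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]
print("AUC:", round(roc_auc_score(y_te, scores), 4))

# Second part of the study: pick the largest threshold whose sensitivity
# (true positive rate) still exceeds 0.975, then read off the specificity.
fpr, tpr, thresholds = roc_curve(y_te, scores)
ok = tpr >= 0.975
print("threshold:", thresholds[ok][0], "specificity:", 1 - fpr[ok][0])
```

    The trade-off the study reports follows directly from this search: forcing sensitivity above 0.975 pushes the threshold down, so specificity and overall accuracy drop.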

    An Adaptive Technique for Crime Rate Prediction using Machine Learning Algorithms

    Any country must give the investigation and prevention of crime top priority. A rising number of cases remain pending due to the rapid increase in criminal cases in India and elsewhere, and classifying and addressing this growing caseload is proving difficult. Understanding a place's trends in criminal activity is essential to preventing crime from occurring, and crime-solving organisations will be more effective if they have a clear awareness of the patterns of criminal behavior present in a particular area. Women's safety and protection are of the highest importance, given the serious and persistent problem of crime against them. This study uses ensemble methods to predict the kinds of crimes that might occur in a particular location, which facilitates the timely categorization of criminal proceedings and subsequent action. We apply machine learning methods such as KNN, linear regression, SVM, Lasso, decision tree, and random forest, and compare them to find the highest accuracy.
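    A minimal sketch of the comparison the abstract describes, on synthetic data rather than the paper's crime records. Note that linear regression and Lasso are regressors, so this sketch keeps only the listed classifiers; the dataset shape and parameters are assumptions:

```python
# Hedged sketch (not the paper's code): compare several of the listed
# methods by cross-validated accuracy on a synthetic multi-class dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for location/time features and a four-way crime-type label.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           n_classes=4, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")
```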

    Data Mining

    The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining.
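    As a minimal, self-contained illustration of one task in that list (clustering), assuming nothing beyond scikit-learn and synthetic points:

```python
# Minimal illustration of clustering: group unlabeled points with k-means
# and inspect the discovered structure. Data is synthetic.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```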

    Data mining Twitter for cancer, diabetes, and asthma insights

    Twitter may serve as a data resource to support healthcare research, but the literature on the potential of Twitter data for healthcare is still limited. The purpose of this study was to contrast the processes by which a large collection of unstructured disease-related tweets could be converted into structured data for further analysis, with the objective of gaining insights into the content and behavioral patterns associated with disease-specific communications on Twitter. Twelve months of Twitter data related to cancer, diabetes, and asthma were collected to form a baseline dataset containing over 34 million tweets. As Twitter data in its raw form would have been difficult to manage, three separate data reduction methods were contrasted to identify a method for generating analysis files that maximizes classification precision and data retention. Each of the disease files was then run through a chi-square automatic interaction detector (CHAID) analysis to demonstrate how user behavior insights vary by disease. CHAID, a technique created by Gordon V. Kass in 1980, is used to discover relationships between variables. This study followed the standard CRISP-DM data mining approach and demonstrates how the practice of mining Twitter data fits into this six-stage iterative framework. The study produced insights that provide a new lens into the potential of Twitter data as a valuable healthcare data source, as well as the nuances involved in working with the data.
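    CHAID grows a tree by repeatedly splitting on the categorical predictor most significantly associated with the outcome under a chi-square test of independence. A minimal sketch of that split criterion follows; it is not the study's pipeline, and the tweet fields (`has_link`, `daypart`, `retweeted`) are hypothetical:

```python
# Sketch of CHAID's core split criterion: cross-tabulate each categorical
# predictor against the outcome and pick the most significant chi-square.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({
    "has_link": rng.integers(0, 2, n),
    "daypart": rng.choice(["morning", "evening"], n),
})
# Synthetic outcome loosely tied to has_link.
df["retweeted"] = (rng.random(n) < 0.2 + 0.3 * df["has_link"]).astype(int)

for col in ["has_link", "daypart"]:
    table = pd.crosstab(df[col], df["retweeted"])
    chi2, p, _, _ = chi2_contingency(table)
    print(f"{col}: chi2={chi2:.1f}, p={p:.3g}")
# CHAID would split on the predictor with the smallest p-value, then
# recurse on each resulting subset (with merging and Bonferroni steps
# omitted here for brevity).
```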

    Neuro-critical multimodal Edge-AI monitoring algorithm and IoT system design and development

    In recent years, with the continuous development of neurocritical medicine, the success rate of treatment of patients with traumatic brain injury (TBI) has continued to increase, and the prognosis has also improved. TBI patients' condition is usually very complicated, and after treatment, patients often need an extended time to recover; the degree of recovery is also related to prognosis. However, as a young discipline, neurocritical medicine still has many shortcomings. In most hospitals, the condition of the Neuro-intensive Care Unit (NICU) is uneven, the equipment has limited functionality, and there is no unified data specification. Most of the instruments are cumbersome and expensive, and patients often need to pay high medical expenses. Recent years have seen rapid development of big data and artificial intelligence (AI) technology, which is advancing the medical IoT field, although further development and a wider range of applications are needed to achieve widespread adoption. Based on the above premises, the main contributions of this thesis are the following. First, the design and development of a multi-modal brain monitoring system acquiring 8-channel electroencephalography (EEG) signals, dual-channel near-infrared spectroscopy (NIRS) signals, and intracranial pressure (ICP) signals. Furthermore, an integrated display platform was designed to display and analyze multi-modal physiological data in real time. The thesis also introduces the use of the Qt signal-and-slot event processing mechanism and multi-threading to raise the real-time performance of data processing. In addition, multi-modal electrophysiological data storage and processing was realized on a cloud server. The system includes a custom-built Django cloud server which realizes real-time transmission between the server and a WeChat applet; based on the WebSocket protocol, the data transmission delay is less than 10 ms. The analysis platform can be equipped with deep learning models to monitor patients with epileptic seizures and assess the level of consciousness of patients with Disorders of Consciousness (DOC). The thesis combines the standard open-source CHB-MIT data set, a clinical data set provided by Huashan Hospital, and additional data collected by the system described here. These data sets are merged to build deep learning network models and develop related applications for automatic disease diagnosis in smart medical IoT systems. This mainly includes using the clinical data to analyze the characteristics of the EEG signals of DOC patients and building a CNN model to evaluate the patient's level of consciousness automatically. Epilepsy is also a common condition in neuro-intensive care; in this regard, the thesis analyzes how various deep learning models perform on the CHB-MIT data set versus the clinical data set for epilepsy monitoring, in order to select the most appropriate model for the system being designed. Finally, the thesis verifies the AI-assisted analysis models. The results show that the CNN model for evaluating disorders of consciousness reaches 82% accuracy on the clinical data set, and the CNN+STFT model for epilepsy monitoring reaches 90% accuracy on clinical data. The multi-modal brain monitoring system built is also fully verified: the EEG signal collected by the system has a high signal-to-noise ratio and strong anti-interference ability, and the system performs well in real time and is stable.
Keywords: TBI, neurocritical care, multi-modal, consciousness assessment, seizure detection, deep learning, CNN, IoT
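    The abstract names a CNN+STFT model but gives no architecture, so the following shows only the general pattern: the STFT turns each EEG window into a time-frequency image, which a small CNN then classifies. The channel count, sampling rate, window length, and layer sizes are all assumptions for illustration:

```python
# Illustrative CNN+STFT sketch (not the thesis's model): classify a
# short multi-channel EEG window as seizure vs. normal. Shapes assumed:
# 8 channels, 256 Hz, 4 s windows, two conv layers.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft

fs = 256
eeg = np.random.randn(8, fs * 4)                 # 8 channels, 4 s window

# STFT turns each channel into a time-frequency image: (8, freq, time).
_, _, Z = stft(eeg, fs=fs, nperseg=128)
spec = torch.tensor(np.abs(Z), dtype=torch.float32).unsqueeze(0)  # add batch

class SeizureCNN(nn.Module):
    def __init__(self, in_ch=8, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),             # shape-agnostic pooling
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = SeizureCNN()(spec)                      # (1, 2): seizure vs. normal
print(logits.shape)
```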

    Predicting hospital admissions to reduce crowding in the emergency departments

    Having an increasing number of patients in the emergency department constitutes an obstacle to the admissions process and hinders the emergency department (ED)'s ability to deal with the continuously arriving demand for new admissions. In addition, forecasting is an important aid in many areas of hospital management, including elective surgery scheduling, bed management, and staff resourcing. Therefore, this paper aims to develop a precise prediction model for admissions in the Integral Healthcare System for Public Use in Catalonia. These models assist in reducing overcrowding in emergency rooms and improve the quality of care offered to patients. Data from 60 EDs were analyzed to determine the likelihood of hospital admission based on information readily available at the time of arrival in the ED. The first part of the study aimed to obtain models with high accuracy and area under the curve (AUC), while the second part aimed to obtain models with a sensitivity higher than 0.975 and analyzed the possible benefits of applying such models. Of the 3,189,204 ED visits included in the study, 11.02% ended in admission to the hospital. The gradient boosting machine method was used to predict a binary outcome of either admission or discharge. This research was funded by Ministerio de Ciencia e Innovación Torres Quevedo grant number PTQ2021-012147.

    Essays on Historical Political Economy

    This dissertation centers on the idea that the significance of "empires" persists long after their collapse, exerting influence both overtly and subtly, a phenomenon referred to as persistent post-imperial syndrome. This syndrome fosters insular ideologies, xenophobia, and a yearning for past grandeur. Delving into the origins of these ideologies, however, requires an exploration of the historical context and factors that led to the actual decline of empires. Consequently, my research centers on Eastern Europe and Russia, a region that witnessed the downfall of two empires: the Russian Empire and the Soviet Union. This region also stands out for its involvement in significant social experiments with far-reaching effects on the populace. These experiments encompass the eradication of serfdom, partial liberalization efforts, the ascent of the Bolsheviks during the 1917 Revolution, the enforced industrialization that propelled the Soviet Union to global superpower status, albeit at a tremendous human cost, and the dramatic disintegration of the Soviet empire (Zhuravskaya et al. 2021, p. 1). The dissertation is structured into three chapters: two empirical and one theoretical. The two empirical chapters examine the political economy's impact on the labor market in the Russian Empire and the Soviet Union, although precise demarcations are difficult to establish due to fluid territorial boundaries. The theoretical chapter provides a more intricate grasp of the mechanisms of transmission and persistence through a comprehensive theoretical exposition centered on the regime shift from Nazi Germany to the German Democratic Republic, a state closely aligned with the Soviet Union. In essence, this research assesses the efficacy of state and strategic decision-making mechanisms in exerting control over specific populations via methods such as forced deportations, state surveillance, and targeted indoctrination. The ultimate objective is to furnish a holistic comprehension of the enduring consequences of empires and the factors contributing to their decline, using Eastern Europe and Russia as illustrative examples. Throughout the analysis, the figure of Joseph Vissarionovich Dzhugashvili, known as Stalin, recurs consistently, assuming a pivotal role in each chapter: he emerges as a left-wing extremist and possible informant in the archives of the tsarist secret police, a dictator for whom ethnically motivated violence constituted a rehearsed aspect of governance, and the mastermind behind the division between East and West Germany.

    Modelling blue-light ambulance mobility in the London metropolitan area

    Actions taken immediately following a life-threatening incident are critical for the survival of the patient. In particular, the timely arrival of the ambulance crew often makes the difference between life and death. As a consequence, ambulance services are under persistent pressure to achieve rapid emergency response. Meeting stringent performance requirements poses special challenges in metropolitan areas, where higher population density results in high rates of life-threatening incidents, compounded by lower response speeds due to traffic congestion. A key ingredient of data-driven approaches to these challenges is the effective modelling of ambulance movement, enabling accurate prediction of the expected arrival time of a crew at the site of an incident. Ambulance mobility patterns, however, are distinct and in particular differ from civilian traffic: crews travelling with flashing blue lights and sirens are by law exempt from certain traffic regulations, and ambulance journeys are triggered by emergency incidents that follow distinct spatial and temporal patterns. We use a large historical dataset of incidents and ambulance location traces to model route selection and arrival times. Working on a road routing network modified to reflect the differences between emergency and regular vehicle traffic, we develop a methodology for matching ambulances' Global Positioning System (GPS) coordinates to road segments, allowing the reconstruction of ambulance routes with precise speed data. We demonstrate how a road speed model that exploits this information achieves the best predictive performance by implicitly capturing route-specific patterns in changing traffic conditions. We then present a hybrid model that achieves a high route similarity score while minimising journey duration error; this hybrid model outperforms alternative mobility models. To the best of our knowledge, this study represents the first attempt to apply data-driven methodologies to route selection and estimation of arrival times for ambulances travelling with blue lights and sirens.
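    A toy sketch of the map-matching step the abstract describes (not the authors' method): snap each GPS fix to the nearest road segment by perpendicular projection. The segment coordinates here are hypothetical; a production matcher would also use timestamps, heading, and typically a hidden-Markov-model decoder over candidate segments:

```python
# Toy map-matching sketch: snap a GPS fix to the nearest road segment
# by projecting the point onto each segment and keeping the closest.
import numpy as np

def project(p, a, b):
    """Closest point to p on segment a-b, and the distance to it."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / np.dot(ab, ab), 0.0, 1.0)
    q = a + t * ab
    return q, np.linalg.norm(p - q)

# Hypothetical road segments as (start, end) coordinate pairs.
segments = [(np.array([0.0, 0.0]), np.array([1.0, 0.0])),
            (np.array([1.0, 0.0]), np.array([1.0, 1.0]))]

gps_fix = np.array([0.6, 0.1])
best = min((project(gps_fix, a, b) + (i,)
            for i, (a, b) in enumerate(segments)),
           key=lambda r: r[1])
print("matched segment:", best[2], "snapped point:", best[0])
```

    Once consecutive fixes are matched to segments, segment-level speeds follow from the distance between snapped points divided by the time between fixes, which is the raw material for the road speed model described above.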