
    Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10618-014-0378-6. Knowledge discovery on biomedical data can be based on online data-stream analyses or on retrospective, timestamped, offline datasets. In both cases, changes through time in the processes that generate the data, or in their quality features, may hinder either the knowledge discovery process or the generalization of past knowledge. These problems can be seen as a lack of data temporal stability. This work establishes temporal stability as a data quality dimension and proposes new methods for its assessment based on a probabilistic framework. Concretely, methods are proposed for (1) monitoring changes and (2) characterizing changes and trends and detecting temporal subgroups. First, a probabilistic change detection algorithm is proposed based on the Statistical Process Control of the posterior Beta distribution of the Jensen–Shannon distance, with a memoryless forgetting mechanism. This algorithm (PDF-SPC) classifies the degree of current change into three states: In-Control, Warning, and Out-of-Control. Second, a novel method is proposed to visualize and characterize the temporal changes of data, based on the projection of a non-parametric information-geometric statistical manifold of time windows. This projection facilitates the exploration of temporal trends using the proposed IGT-plot and, by means of unsupervised learning methods, the discovery of conceptually related temporal subgroups. Methods are evaluated using real and simulated data based on the National Hospital Discharge Survey (NHDS) dataset.

    The work by C. Sáez has been supported by an Erasmus Lifelong Learning Programme 2013 Grant. This work has been supported by own IBIME funds. The authors thank Dr. Gregor Stiglic, from the University of Maribor, Slovenia, for his support on the NHDS data.

    Sáez Silvestre, C.; Pereira Rodrigues, P.; Gama, J.; Robles Viejo, M.; García Gómez, JM. (2014). Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality. Data Mining and Knowledge Discovery. 29(4):950-975. doi:10.1007/s10618-014-0378-6
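
    The two methods in this record lend themselves to a compact illustration. Below is a hedged Python sketch, not the authors' reference implementation: a PDF-SPC-style monitor that tracks the Jensen-Shannon distance of each data batch to a reference window through a moment-matched Beta distribution with exponential (memoryless) forgetting, plus an IGT-style 2-D embedding of pairwise distances between time windows. Window handling, the forgetting factor, the control quantiles, and the use of sklearn's metric MDS (rather than the paper's eigensolver projection) are illustrative assumptions.

```python
# Hedged sketch of PDF-SPC-style monitoring and an IGT-style projection.
# Not the authors' implementation; thresholds and smoothing are assumptions.
import numpy as np
from scipy.spatial.distance import jensenshannon  # sqrt of JS divergence
from scipy.stats import beta
from sklearn.manifold import MDS

def pdf_spc(batches, bins, lam=0.05, warn_q=0.95, out_q=0.99):
    """Label each batch In-Control / Warning / Out-of-Control."""
    ref, states = None, []
    m, v = 0.5, 0.05  # forgetful running moments of the JS distance
    for batch in batches:
        p, _ = np.histogram(batch, bins=bins)
        if ref is None:
            ref = p                              # first batch as reference
            states.append("In-Control")
            continue
        d = jensenshannon(ref, p, base=2)        # bounded in [0, 1]
        k = max(m * (1 - m) / max(v, 1e-6) - 1, 1e-3)
        a, b = m * k, (1 - m) * k                # moment-matched Beta
        if d > beta.ppf(out_q, a, b):
            states.append("Out-of-Control")
        elif d > beta.ppf(warn_q, a, b):
            states.append("Warning")
        else:
            states.append("In-Control")
        m = (1 - lam) * m + lam * d              # memoryless forgetting
        v = (1 - lam) * v + lam * (d - m) ** 2
    return states

def igt_embedding(batches, bins):
    """2-D projection of pairwise JS distances between time windows."""
    P = [np.histogram(b, bins=bins)[0] for b in batches]
    D = np.array([[jensenshannon(p, q, base=2) for q in P] for p in P])
    return MDS(n_components=2, dissimilarity="precomputed",
               random_state=0).fit_transform(D)
```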

    Multisource and temporal variability in Portuguese hospital administrative datasets: Data quality implications

    [EN] Background: Unexpected variability across healthcare datasets may indicate data quality issues and thereby affect the credibility of these data for reuse. No gold-standard reference dataset or methods for variability assessment are usually available for these datasets. In this study, we aim to describe the process of discovering data quality implications by applying a set of methods for assessing variability between sources and over time in a large hospital database. Methods: We described and applied a set of multisource and temporal variability assessment methods in a large Portuguese hospitalization database, in which variation in condition-specific hospitalization ratios derived from clinically coded data was assessed between hospitals (sources) and over time. We identified condition-specific admissions using the Clinical Classifications Software (CCS), developed by the Agency for Healthcare Research and Quality. A Statistical Process Control (SPC) approach based on funnel plots of condition-specific standardized hospitalization ratios (SHR) was used to assess multisource variability, whereas temporal heat maps and Information-Geometric Temporal (IGT) plots were used to assess temporal variability by displaying abrupt temporal changes in data distributions. Results were presented for the 15 most common inpatient conditions (CCS) in Portugal. Main findings: Funnel plot assessment allowed the detection of several outlying hospitals whose SHRs were much lower or higher than expected. Adjusting SHR for hospital characteristics, beyond age and sex, considerably affected the degree of multisource variability for most diseases. Overall, probability distributions changed over time for most diseases, although heterogeneously. Abrupt temporal changes in data distributions for acute myocardial infarction and congestive heart failure coincided with the periods comprising the transition to the International Classification of Diseases, 10th revision, Clinical Modification, whereas changes in the Diagnosis-Related Groups software seem to have driven changes in data distributions for both acute myocardial infarction and liveborn admissions. The analysis of heat maps also allowed the detection of several discontinuities at the hospital level over time, in some cases also coinciding with the aforementioned factors. Conclusions: This paper described the successful application of a set of reproducible, generalizable and systematic methods for variability assessment, including visualization tools that can be useful for detecting abnormal patterns in healthcare data, while also addressing some limitations of common approaches. The presented method for multisource variability assessment is based on SPC, an advantage given the lack of a gold standard for such a process. Properly controlling for hospital characteristics and differences in case mix when estimating SHR is critical for isolating data quality-related variability among data sources. The use of IGT plots provides an advantage over common methods for temporal variability assessment due to its suitability for multitype and multimodal data, which are common characteristics of healthcare data. The novelty of this work is the use of a set of methods to discover new data quality insights in healthcare data.

    The authors would like to thank the Central Authority for Health Services, I.P. (ACSS) for providing access to the data. The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: this work was financed by FEDER (Fundo Europeu de Desenvolvimento Regional) funds through the COMPETE 2020 Operational Programme for Competitiveness and Internationalisation (POCI) and by Portuguese funds through FCT (Fundação para a Ciência e a Tecnologia) in the framework of the project POCI-01-0145-FEDER-030766 ("1st.IndiQare - Quality indicators in primary health care: validation and implementation of quality indicators as an assessment and comparison tool"). In addition, we would like to thank the projects GEMA (SBPLY/17/180501/000293) - Generation and Evaluation of Models for Data Quality - and ADAGIO (SBPLY/21/180501/000061) - Alarcos Data Governance framework and systems generation - both funded by the Department of Education, Culture and Sports of the JCCM and FEDER, and the AETHER-UCLM project: A smart data holistic approach for context-aware data analytics focused on Quality and Security (Ministerio de Ciencia e Innovación, PID2020-112540RB-C42). CSS thanks the Universitat Politècnica de València contract no. UPV-SUB.2-1302 and FONDO SUPERA COVID-19 by CRUE-Santander Bank grant "Severity Subgroup Discovery and Classification on COVID-19 Real World Data through Machine Learning and Data Quality assessment (SUBCOVERWD-19)."

    Souza, J.; Caballero, I.; Vasco Santos, J.; Lobo, M.; Pinto, A.; Viana, J.; Sáez Silvestre, C.... (2022). Multisource and temporal variability in Portuguese hospital administrative datasets: Data quality implications. Journal of Biomedical Informatics. 136:1-11. https://doi.org/10.1016/j.jbi.2022.104242
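
    As a concrete illustration of the funnel-plot step, the sketch below plots each hospital's SHR (observed/expected admissions) against its expected count, with approximate 95% and 99.8% control limits around the target of 1 under a Poisson approximation. This is a generic SPC funnel plot, not the paper's exact formulation; the z-values and the evaluation grid are assumptions.

```python
# Illustrative funnel plot for standardized hospitalization ratios (SHR).
# A sketch under a Poisson approximation, not the paper's exact method.
import numpy as np
import matplotlib.pyplot as plt

def funnel_plot(observed, expected):
    """observed, expected: 1-D arrays of admission counts per hospital."""
    observed, expected = np.asarray(observed, float), np.asarray(expected, float)
    shr = observed / expected
    e_grid = np.linspace(max(expected.min(), 1.0), expected.max(), 200)
    fig, ax = plt.subplots()
    for z, style in [(1.96, "--"), (3.09, ":")]:   # ~95% and ~99.8% limits
        ax.plot(e_grid, 1 + z / np.sqrt(e_grid), style, color="grey")
        ax.plot(e_grid, np.clip(1 - z / np.sqrt(e_grid), 0, None),
                style, color="grey")
    ax.axhline(1.0, color="black")                 # target ratio
    ax.scatter(expected, shr)                      # one point per hospital
    ax.set_xlabel("Expected admissions (case-mix adjusted)")
    ax.set_ylabel("SHR")
    return ax
```

    Hospitals falling outside the 99.8% funnel are the "outlying" sources the abstract refers to; the shrinking limits with growing expected counts reflect the smaller sampling variability of large hospitals.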

    Applying probabilistic temporal and multi-site data quality control methods to a public health mortality registry in Spain: A systematic approach to quality control of repositories

    OBJECTIVE: To assess the variability in data distributions among data sources and over time through a case study of a large multisite repository, as a systematic approach to data quality (DQ). MATERIALS AND METHODS: Novel probabilistic DQ control methods based on information theory and geometry are applied to the Public Health Mortality Registry of the Region of Valencia, Spain, with 512,143 entries from 2000 to 2012, disaggregated into 24 health departments. The methods provide DQ metrics and exploratory visualizations for (1) assessing the variability among multiple sources and (2) monitoring and exploring changes over time. The methods are suited to big data and to multitype, multivariate, and multimodal data. RESULTS: The repository was partitioned into 2 probabilistically separated temporal subgroups following a change in the Spanish National Death Certificate in 2009. Punctual temporal anomalies were noticed, due to punctual increments in missing data, along with outlying and clustered health departments, due to differences in populations or in practices. DISCUSSION: Changes in protocols, differences in populations, biased practices, and other systematic DQ problems affected data variability. Even if semantic and integration aspects are addressed in data-sharing infrastructures, probabilistic variability may still be present. Solutions include fixing or excluding data and analyzing different sites or time periods separately. A systematic approach to assessing temporal and multisite variability is proposed. CONCLUSION: Multisite and temporal variability in data distributions affects DQ, hindering data reuse, and an assessment of such variability should be part of systematic DQ procedures.

    This work was supported by the Spanish Ministry of Economy and Competitiveness grant numbers RTC-2014-1530-1 and TIN-2013-43457-R, and by the Universitat Politècnica de València grant number SP20141432.

    Sáez Silvestre, C.; Zurriaga, O.; Pérez-Panadés, J.; Melchor, I.; Robles Viejo, M.; García Gómez, JM. (2016). Applying probabilistic temporal and multi-site data quality control methods to a public health mortality registry in Spain: A systematic approach to quality control of repositories. Journal of the American Medical Informatics Association. 23(6):1085-1095. https://doi.org/10.1093/jamia/ocw010
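
    A minimal sketch of the multisource assessment idea follows: per-source distributions of a variable are compared by Jensen-Shannon distance, both pairwise (to spot clustered departments) and against the pooled distribution (to spot outlying ones). The paper's actual metrics are based on simplicial projections of distribution distances; this proxy, with hypothetical input naming, only illustrates the concept.

```python
# Simplified multisource variability check, in the spirit of the paper's
# probabilistic DQ metrics; a proxy, not the paper's GPD formulation.
import numpy as np
from scipy.spatial.distance import jensenshannon

def source_variability(counts_by_source):
    """counts_by_source: dict mapping source id -> histogram (counts) of
    one variable; histograms must share the same bin/category order."""
    ids = sorted(counts_by_source)
    P = {s: np.asarray(counts_by_source[s], float) for s in ids}
    pooled = sum(P.values())                       # global distribution
    # Distance of each source to the pooled distribution: outlier score.
    dev = {s: jensenshannon(P[s], pooled, base=2) for s in ids}
    # Full pairwise matrix, useful for spotting clustered sources.
    D = np.array([[jensenshannon(P[a], P[b], base=2) for b in ids]
                  for a in ids])
    return dev, D
```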

    Kinematics of Big Biomedical Data to characterize temporal variability and seasonality of data repositories: Functional Data Analysis of data temporal evolution over non-parametric statistical manifolds

    [EN] Aim: The increasing availability of Big Biomedical Data is leading to large research data samples collected over long periods of time. We propose analyzing the kinematics of data probability distributions over time towards the characterization of data temporal variability. Methods: First, we propose a kinematic model based on the estimation of a continuous data temporal trajectory, using Functional Data Analysis over the embedding of a non-parametric statistical manifold whose points represent data temporal batches: the Information Geometric Temporal (IGT) plot. This model allows measuring the velocity and acceleration of data changes. Next, we propose a coordinate-free method to characterize the oriented seasonality of data based on the parallelism of lagged velocity vectors of the data trajectory throughout the IGT space: the Auto-Parallelism of Velocity Vectors (APVV) and the APVVmap. Finally, we automatically explain the maximum-variance components of the IGT space coordinates by correlating data points with known temporal factors from the application domain. Materials: Methods are evaluated on the US National Hospital Discharge Survey open dataset, consisting of 3.25M hospital discharges between 2000 and 2010. Results: Seasonal and abrupt behaviours were present in the estimated multivariate and univariate data trajectories. The kinematic analysis revealed seasonal effects and punctual increments in data celerity, the latter mainly related to abrupt changes in coding. The APVV and APVVmap revealed oriented seasonal changes in data trajectories. For most variables, their distributions tended to change in the same direction with a 12-month period, with a peak of change of directionality at mid- and end-of-year. Diagnosis and Procedure codes also included a 9-month periodic component. The kinematics and APVV methods were able to detect seasonal effects on extreme temporally subgrouped data, such as in the Procedure code, where Fourier and autocorrelation methods were not. The automated explanation of IGT space coordinates was consistent with the results provided by the kinematic and seasonal analyses. Coordinates received different meanings according to trajectory trend, seasonality and abrupt changes. Discussion: Treating data as a particle moving over time through a multidimensional probabilistic space and studying the kinematics of its trajectory has turned out to be a novel temporal variability methodology. Its results on the NHDS were aligned with the dataset and population descriptions found in the literature, contributing a novel temporal variability characterization. We have demonstrated that the APVV and APVVmap are an appropriate tool for the coordinate-free and oriented analysis of trajectories or complex multivariate signals. Conclusion: The proposed methods comprise an exploratory methodology for the characterization of data temporal variability, which may be useful for the reliable reuse of Big Biomedical Data repositories acquired over long periods of time.

    This work was supported by UPV grant No. PAID-00-17, and projects DPI2016-80054-R and H2020-SC1-2016-CNECT No. 727560.

    Sáez, C.; Garcia-Gomez, JM. (2018). Kinematics of Big Biomedical Data to characterize temporal variability and seasonality of data repositories: Functional Data Analysis of data temporal evolution over non-parametric statistical manifolds. International Journal of Medical Informatics. 119:109-124. https://doi.org/10.1016/j.ijmedinf.2018.09.015
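
    The kinematic and APVV ideas can be sketched briefly: fit a smooth function per IGT coordinate over the batch index, differentiate it to obtain velocity and acceleration, and measure the parallelism (cosine similarity) of velocity vectors a fixed lag apart; values near 1 at a 12-month lag indicate oriented seasonality. Spline order, smoothing factor, and the default lag below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of trajectory kinematics and APVV over IGT coordinates.
import numpy as np
from scipy.interpolate import UnivariateSpline

def kinematics(igt_xy, s=0.5):
    """igt_xy: (n_batches, n_dims) IGT coordinates ordered in time."""
    igt_xy = np.asarray(igt_xy, float)
    t = np.arange(len(igt_xy), dtype=float)
    vel, acc = np.empty_like(igt_xy), np.empty_like(igt_xy)
    for d in range(igt_xy.shape[1]):       # functional fit per coordinate
        sp = UnivariateSpline(t, igt_xy[:, d], k=4, s=s)
        vel[:, d] = sp.derivative(1)(t)    # first derivative: velocity
        acc[:, d] = sp.derivative(2)(t)    # second derivative: acceleration
    return vel, acc

def apvv(vel, lag=12):
    """Cosine similarity between velocity vectors `lag` steps apart;
    values near 1 indicate oriented (e.g., seasonal) parallelism."""
    v1, v2 = vel[:-lag], vel[lag:]
    num = (v1 * v2).sum(axis=1)
    den = np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1)
    return num / np.maximum(den, 1e-12)
```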

    Temporal variability analysis reveals biases in electronic health records due to hospital process reengineering interventions over seven years

    [EN] Objective: To evaluate the effects of process-reengineering interventions on the Electronic Health Records (EHR) of a hospital over 7 years. Materials and methods: Temporal Variability Assessment (TVA) based on probabilistic data quality assessment was applied to the historical monthly-batched admission data of Hospital La Fe, Valencia, Spain, from 2010 to 2016. Routine healthcare data with a complete EHR were expanded with derived variables such as the Charlson Comorbidity Index. Results: Four process-reengineering interventions were detected by quantifiable effects on the EHR: (1) the hospital relocation in 2011 involved a progressive reduction of admissions during the next four months; (2) the reconfiguration of hospital services increased the number of inter-service transfers; (3) the redistribution of care services led to transfers between facilities; and (4) the assignment to the hospital of a new area with 80,000 patients in 2015 prompted the discharge to home for follow-up and the update of the pre-surgery planned-admissions protocol, which produced a significant decrease in patient length of stay. Discussion: TVA provides an indicator of the effect of process-reengineering interventions on healthcare practice. Evaluating the effect of facility relocation and of the increment of citizens (findings 1, 3 and 4), the impact of strategies (findings 2 and 3), and gradual changes in protocols (finding 4) may help hospital management by optimizing interventions based on their effect on EHRs or on data reuse. Conclusions: The effects of process-reengineering interventions on a hospital's EHR can be evaluated using the TVA methodology. Being aware of conditioned variations in the EHR is of the utmost importance for the reliable reuse of routine hospitalization data.

    F.J.P.B., C.S., J.M.G.G. and J.A.C. were funded by the Universitat Politècnica de València, project "ANALISIS DE LA CALIDAD Y VARIABILIDAD DE DATOS MEDICOS" (www.upv.es). J.M.G.G. is also partially supported by: Ministerio de Economía y Competitividad of Spain through the MTS4up project (National Plan for Scientific and Technical Research and Innovation 2013-2016, No. DPI2016-80054-R); and European Commission projects H2020-SC1-2016-CNECT (No. 727560) and H2020-SC1-BHC-2018-2020 (No. 825750). The funders did not play any role in the study design, data collection and analysis, decision to publish, nor preparation of the manuscript.

    Perez-Benito, FJ.; Sáez Silvestre, C.; Conejero, JA.; Tortajada, S.; Valdivieso, B.; Garcia-Gomez, JM. (2019). Temporal variability analysis reveals biases in electronic health records due to hospital process reengineering interventions over seven years. PLoS ONE. 14(8):1-19. https://doi.org/10.1371/journal.pone.0220369
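
    To make the monthly-batched TVA idea concrete, here is a hedged sketch that groups admissions by month, builds each month's distribution of one coded variable, and flags months whose distribution moves abruptly away from the previous one by Jensen-Shannon distance. Column names and the flagging threshold are hypothetical, and the paper's assessment is richer (IGT plots, multivariate distributions); this only illustrates the batching-and-compare step.

```python
# Hedged sketch: flag abrupt month-to-month distribution shifts in an EHR
# extract. Threshold and column names are illustrative assumptions.
import numpy as np
import pandas as pd
from scipy.spatial.distance import jensenshannon

def abrupt_months(df, date_col="admission_date", var_col="service",
                  threshold=0.2):
    """df: admissions table with a datetime column and a coded variable."""
    cats = df[var_col].astype("category").cat.categories
    groups = df.groupby(df[date_col].dt.to_period("M"))[var_col]
    hists = {m: g.value_counts().reindex(cats, fill_value=0).to_numpy()
             for m, g in groups}
    months = sorted(hists)
    flags = []
    for prev, cur in zip(months, months[1:]):
        d = jensenshannon(hists[prev], hists[cur], base=2)
        if d > threshold:                  # abrupt change candidate
            flags.append((str(cur), round(float(d), 3)))
    return flags
```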

    Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset

    [EN] Objective: The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning. Materials and Methods: We used the publicly available nCov2019 dataset, which includes patient-level data from several countries. We aimed at the discovery and classification of severity subgroups using symptoms and comorbidities. Results: Cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect to the model's target populations and increase model complexity at the risk of overfitting. Conclusions: Data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning.

    This work was supported by Universitat Politècnica de València contract no. UPV-SUB.2-1302 and FONDO SUPERA COVID-19 by CRUE-Santander Bank grant "Severity Subgroup Discovery and Classification on COVID-19 Real World Data through Machine Learning and Data Quality assessment (SUBCOVERWD-19)."

    Sáez Silvestre, C.; Romero, N.; Conejero, JA.; Garcia-Gomez, JM. (2021). Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset. Journal of the American Medical Informatics Association. 28(2):360-364. https://doi.org/10.1093/jamia/ocaa258
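
    One way to surface the bias discussed here is to contrast within-source and cross-source performance of the same model, as in the hedged sketch below: a marked AUC drop when moving from held-out cases of the training country to cases from another country suggests source variability. Feature, label, and country names are hypothetical, and logistic regression is just a stand-in classifier, not the paper's subgroup-discovery method.

```python
# Hedged sketch: quantify cross-source generalization loss.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def cross_source_auc(X, y, source, a="CountryA", b="CountryB"):
    """X: binary symptom/comorbidity matrix (numpy array);
    y: severity labels; source: array of country labels per case."""
    Xa, ya = X[source == a], y[source == a]
    Xb, yb = X[source == b], y[source == b]
    Xtr, Xte, ytr, yte = train_test_split(
        Xa, ya, test_size=0.3, random_state=0, stratify=ya)
    model = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    within = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
    across = roc_auc_score(yb, model.predict_proba(Xb)[:, 1])
    return within, across   # a large gap hints at source-variability bias
```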

    Probing resting-state functional connectivity in the infant brain: methods and potentiality

    Early brain development is characterized by rapid growth and perpetual reconfiguration, driven by a dynamic milieu of heterogeneous processes. Moreover, potent postnatal brain plasticity engenders increased vulnerability to environmental stimuli. However, little is known regarding the ontogeny and temporal manifestations of the inter- and intra-regional functional connectivity that comprises functional brain networks. Recently, resting-state functional magnetic resonance imaging (fMRI) has emerged as a promising non-invasive neuroinvestigative tool, measuring spontaneous fluctuations in the blood oxygen level dependent (BOLD) signal at rest that reflect baseline neuronal activity. Its application has expanded to infant populations in the past decade, providing unprecedented insight into the functional organization of the developing brain, as well as early biomarkers of abnormal/disease states. However, the rapid extension of the resting-state technique to infant populations leaves many methodological issues that need to be resolved prior to standardization of the technique. The purpose of this thesis is to describe a protocol for intrinsic functional connectivity analysis and the extraction of resting-state networks (RSNs) in infants <12 months of age using independent component analysis (ICA), a data-driven approach. To begin, we review the evolution of resting-state fMRI application in infant populations, including the biological premise for neural networks. Next, we present a protocol designed such that investigators without previous knowledge in the field can implement the analysis and reliably obtain viable results consistent with previous literature. The presented protocol provides a detailed, albeit basic, framework for RSN analysis, with an interwoven discussion of the basic theory behind each technique, as well as the rationale behind selecting parameters. The overarching goal is to catalyze efforts towards the development of robust, infant-specific acquisition and preprocessing pipelines, as well as to promote greater transparency by researchers regarding the methods used. Finally, we review the literature, current methodological challenges and potential future directions for the field of infant resting-state fMRI.
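
    The core ICA step of such a protocol can be illustrated in a few lines. The hedged sketch below runs spatial ICA on a preprocessed (time x voxels) BOLD matrix, with voxels treated as samples so that the unmixed sources are spatial maps. The component count, variable names, and the absence of the full pipeline (motion correction, filtering, registration, component classification) are simplifying assumptions, not the thesis protocol itself.

```python
# Toy illustration of the spatial-ICA step of an RSN analysis.
import numpy as np
from sklearn.decomposition import FastICA

def extract_spatial_maps(bold, n_components=20, seed=0):
    """bold: 2-D array, shape (n_timepoints, n_voxels), preprocessed."""
    bold = bold - bold.mean(axis=0)           # remove voxelwise mean
    ica = FastICA(n_components=n_components, random_state=seed,
                  max_iter=500)
    sources = ica.fit_transform(bold.T)       # voxels as samples: spatial ICA
    maps = sources.T                          # (n_components, n_voxels)
    timecourses = ica.mixing_                 # (n_timepoints, n_components)
    return maps, timecourses
```

    Each row of `maps`, reshaped back to brain space, is a candidate resting-state network; the paired column of `timecourses` is its temporal signature.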

    MEG/EEG source reconstruction, statistical evaluation, and visualization with NUTMEG.

    NUTMEG is a source analysis toolbox geared towards cognitive neuroscience researchers using MEG and EEG, including intracranial recordings. Evoked and unaveraged data can be imported into the toolbox for source analysis in either the time or time-frequency domain. NUTMEG offers several variants of adaptive beamformers, probabilistic reconstruction algorithms, and minimum-norm techniques to generate functional maps of spatiotemporal neural source activity. Lead fields can be calculated from single- or overlapping-sphere head models or imported from other software. Group averages and statistics can be calculated as well. In addition to data analysis tools, NUTMEG provides a unique and intuitive graphical interface for the visualization of results. Source analyses can be superimposed onto a structural MRI or head shape to provide a convenient visual correspondence to anatomy. These results can also be navigated interactively, with the spatial maps and source time series or spectrogram linked accordingly. Animations can be generated to view the evolution of neural activity over time. NUTMEG can also display brain renderings and perform spatial normalization of functional maps using SPM's engine. As a MATLAB package, NUTMEG allows the end user to easily link with other toolboxes or add customized functions.
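
    NUTMEG itself is a MATLAB toolbox, but the core of one family of methods it offers, adaptive (LCMV-type) beamformers, is compact enough to sketch in Python for illustration: unit-gain weights W = R^-1 L (L^T R^-1 L)^-1 computed from the sensor covariance R and the lead field L at each candidate source location. This is the textbook LCMV solution, not NUTMEG's code, and the regularization scheme is a generic assumption.

```python
# Illustrative LCMV beamformer weights (textbook formulation).
import numpy as np

def lcmv_weights(R, L, reg=1e-9):
    """R: (n_channels, n_channels) sensor covariance;
    L: (n_channels, n_orientations) lead field at one voxel."""
    n = len(R)
    Rinv = np.linalg.inv(R + reg * np.trace(R) / n * np.eye(n))
    RiL = Rinv @ L
    W = RiL @ np.linalg.inv(L.T @ RiL)   # (n_channels, n_orientations)
    return W

# Source power at the voxel can then be estimated as np.trace(W.T @ R @ W),
# scanned over all voxels to build a functional map.
```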