118 research outputs found
On the challenges and opportunities in visualization for machine learning and knowledge extraction: A research agenda
We describe a selection of challenges at the intersection of machine learning and data visualization and outline a subjective research agenda based on professional and personal experience. The unprecedented increase in the amount, variety, and value of data has significantly transformed the way scientific research is carried out and businesses operate. Data science has emerged as a practice that enables this data-intensive innovation by gathering and advancing knowledge from fields such as statistics, machine learning, knowledge extraction, data management, and visualization. Within data science, visualization plays a unique, perhaps the ultimate, role: it facilitates cooperation between human and computer, and in particular enables the analysis of diverse and heterogeneous data with complex computational methods whose results are challenging to interpret and operationalize. Whilst algorithm development is surely at the center of the whole pipeline in disciplines such as Machine Learning and Knowledge Discovery, it is visualization that ultimately makes the results accessible to the end user. Visualization can thus be seen as a mapping from arbitrarily high-dimensional abstract spaces to lower dimensions, and it plays a central and critical role in interacting with machine learning algorithms, particularly in interactive machine learning (iML) with the human in the loop. The central goal of the CD-MAKE VIS workshop is to spark discussions at this intersection of visualization, machine learning, and knowledge discovery and to bring together experts from these disciplines. This paper discusses a perspective on the challenges and opportunities in the integration of these disciplines and presents a number of directions and strategies for further research.
Developing Artificial Intelligence tools to investigate the phenotypes and correlates of Chronic Kidney Disease patients in West Virginia
Marzieh Amiri Shahbazi
Chronic kidney disease (CKD) disrupts the lives of 37 million people in the USA alone, about 1 in 7 adults. CKD results in a gradual loss of kidney function over time, and it often produces no significant symptoms until it reaches an advanced stage. Acute kidney injury (AKI), by contrast, is a sudden decline in kidney function: the kidneys fail to filter waste materials from the blood, causing an increase in blood pressure. High blood pressure can cause heart disease and, in the long term, induce CKD. The literature to date indicates that AKI leads to long-term adverse kidney outcomes and is linked to CKD; an AKI diagnosis, its severity, treatment, and recovery process have a major impact on the likelihood of a future diagnosis of CKD. This research attempts to understand the patient's trajectory toward developing CKD after an AKI diagnosis and the key triggers contributing to this trajectory, and ultimately to develop an artificial-intelligence-based prognosis tool. To comprehend the role of AKI and prior hospitalization in the progression of CKD, three cohorts of CKD patients are created: i) AKI after hospitalization before CKD, ii) random AKI before CKD, and iii) no AKI before CKD. Prior comorbidities, medications, lab results, and pertinent procedures are considered, and for each cohort the most prevalent phenotypes are identified. The cohorts are generated from CKD patients residing in West Virginia, with data provided by TriNetX, a global network platform. K-means clustering and latent class analysis (LCA) are used to identify and group the phenotypes of CKD for each cohort, and the high-risk patient groups generated by the clustering algorithms are compared with each other. These results will help clinicians understand the risk factors of CKD and the overall trajectory of its development.
This research suggests that a single method of care does not work for all patients: phenotypes vary across distinct groups of patients, and categorizing patients into distinct groups allows different resources and care strategies to be allocated to each group. It is also evident from this research that patients' risk profiles change over the years before CKD develops. There are similarities as well as differences across the cohorts for each year, which suggests that CKD risk factors may be linked to prior AKI, hospitalization, or inpatient care.
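The phenotype grouping step above relies on K-means clustering. As a minimal, self-contained sketch of that technique (the patient vectors and binary phenotype indicators below are hypothetical illustrations, not data from the study), assigning patient phenotype vectors to k centroids could look like:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means with deterministic farthest-point initialisation:
    start from the first row, repeatedly add the row farthest from all
    chosen centroids, then alternate assignment and mean-update steps."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # distance of every patient vector to every centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# Toy cohort: rows are patients, columns are hypothetical binary
# phenotype indicators (e.g. diabetes, hypertension, prior AKI).
X = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
labels = kmeans(X, k=2)
# Patients 0 and 1 end up in one cluster, patients 2 and 3 in the other.
```

In practice a library implementation (with multiple restarts and a convergence check) would be preferred; this sketch only shows the mechanics.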
Visualisation Methods of Hierarchical Biological Data: A Survey and Review
The sheer amount of high-dimensional biomedical data requires machine learning and advanced data visualization techniques to make the data understandable for human experts. Most biomedical data today lives in arbitrarily high-dimensional spaces and is not directly accessible to the human expert for a visual and interactive analysis process. To cope with this challenge, the application of machine learning and knowledge extraction methods is indispensable throughout the entire data analysis workflow. Nevertheless, human experts need to understand and interpret the data and experimental results. Appropriate understanding is typically supported by visualizing the results adequately, which is not a simple task. Consequently, data visualization is one of the most crucial steps in conveying biomedical results; it can and should be considered a critical part of the analysis pipeline. Still, as of today, 2D representations dominate, and human perception is limited to these lower dimensions when trying to understand the data. This makes visualizing the results in an understandable and comprehensive manner a grand challenge.
This paper reviews the current state of visualization methods in a biomedical context. It focuses on hierarchical biological data as a source for visualization and gives a comprehensive review of the available techniques.
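As a minimal illustration of the survey's subject, the simplest visual encoding of hierarchical biological data is an indented tree. The toy taxonomy below is hypothetical and stands in for real data:

```python
def render_tree(node, name="root", depth=0, lines=None):
    """Render a nested-dict hierarchy as an indented text tree,
    the most basic 2D visual encoding of hierarchical data."""
    if lines is None:
        lines = []
    lines.append("  " * depth + name)
    for child, sub in node.items():
        render_tree(sub, child, depth + 1, lines)
    return lines

# Hypothetical toy taxonomy, for illustration only.
taxonomy = {"Bacteria": {"Firmicutes": {}, "Proteobacteria": {"E. coli": {}}}}
print("\n".join(render_tree(taxonomy, "Life")))
```

Richer encodings surveyed in such reviews (dendrograms, treemaps, icicle plots) follow the same recursive traversal, differing only in how depth and subtree size are mapped to screen space.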
Synthetic Interventions
Consider a setting where there are heterogeneous units (e.g., individuals, sub-populations) and interventions (e.g., socio-economic policies). Our goal is to learn the potential outcome associated with every intervention on every unit (i.e., the causal parameters). Towards this, we present a causal framework, synthetic interventions (SI), to infer these causal parameters while only observing each unit under at most two interventions, independent of the total number of interventions. This can be significant as the number of interventions, i.e., the level of personalization, grows. Importantly, our estimator also allows for latent confounders that determine how interventions are assigned. Theoretically, under a novel tensor factor model across units, measurements, and interventions, we formally establish an identification result for each of these causal parameters and establish finite-sample consistency and asymptotic normality of our estimator. The estimator is furnished with a data-driven test to verify its suitability. Empirically, we validate our framework through both experimental and observational case studies; namely, a large-scale A/B test performed on an e-commerce platform, and an evaluation of mobility restrictions on morbidity outcomes due to COVID-19. We believe this has important implications for program evaluation and the design of data-efficient RCTs with heterogeneous units and multiple interventions.
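The idea behind such estimators can be sketched in the style of synthetic controls: learn weights that express a target unit as a combination of donor units during a common observation period, then apply those weights to the donors' outcomes under the intervention of interest. This is a simplified least-squares sketch, not the paper's actual estimator, and all numbers are hypothetical:

```python
import numpy as np

# Pre-intervention outcomes: rows are time periods, columns are units.
# Hypothetical numbers, chosen so unit 2 equals unit 0 plus unit 1.
pre = np.array([[1.0, 2.0, 3.0],
                [2.0, 3.0, 5.0],
                [3.0, 4.0, 7.0],
                [4.0, 5.0, 9.0]])
target = pre[:, 2]        # unit whose counterfactual we want
donors = pre[:, :2]       # units later observed under intervention A

# Learn weights expressing the target as a combination of donors.
w, *_ = np.linalg.lstsq(donors, target, rcond=None)

# Donor outcomes under intervention A (again hypothetical). The weighted
# combination estimates what the target unit would have seen under A.
post_A = np.array([5.0, 6.0])
estimate = post_A @ w  # -> 11.0, since w recovers [1, 1] exactly
```

In this toy example the linear relationship is exact, so the weights are recovered perfectly; real data would require regularization or dimension reduction before regression.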
Signatures of T cell immunity revealed using sequence similarity with TCRDivER algorithm
Changes in T cell receptor (TCR) repertoires have become important markers for monitoring disease or therapy progression. With the rise of immunotherapy in cancer, infectious, and autoimmune disease, accurate assessment and comparison of the "state" of the TCR repertoire has become paramount. One important driver of change within the repertoire is T cell proliferation following immunisation. A way of monitoring this is to investigate large clones of individual T cells believed to bind epitopes connected to the disease. However, as a single target can be bound by many different TCRs, monitoring individual clones cannot fully account for T cell cross-reactivity. Moreover, T cells responding to the same target often exhibit higher sequence similarity, which highlights the importance of accounting for TCR similarity within the repertoire. This complexity of the binding relationship between a TCR and its target complicates comparisons of immune responses between individuals and of TCR repertoires at different timepoints. Here we propose the TCRDivER algorithm (T cell Receptor Diversity Estimates for Repertoires), a global method of T cell repertoire comparison using diversity profiles sensitive to both clone size and sequence similarity. This approach allowed for distinction between spleen TCR repertoires of immunised and non-immunised mice, showing the need to include both facets of repertoire change simultaneously. The analysis revealed biologically interpretable relationships between sequence similarity and clonality, which aid in understanding differences and separation of repertoires stemming from different biological contexts. With the increasing availability of sequencing data we expect our tool to find broad usage in clinical and research applications.
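Diversity profiles that account for both clone size and sequence similarity are commonly built on similarity-sensitive Hill diversities in the sense of Leinster and Cobbold. A minimal sketch of that quantity follows; TCRDivER's exact formulation may differ, and the clone frequencies and similarity matrix below are toy values:

```python
import numpy as np

def diversity_profile(p, Z, qs):
    """Similarity-sensitive diversity D_q^Z: p holds clone frequencies,
    Z pairwise sequence similarities in [0, 1]. (Zp)_i is the
    'ordinariness' of clone i, and for q != 1 the effective diversity is
    D_q = (sum_i p_i * (Zp)_i^(q-1))^(1/(1-q)).
    (q = 1 is defined by a limit and omitted here.)"""
    zp = Z @ p
    return [float((p @ zp ** (q - 1)) ** (1.0 / (1.0 - q))) for q in qs]

# Two equally sized, completely dissimilar clones: Z is the identity,
# so every order q reports an effective diversity of 2.
p = np.array([0.5, 0.5])
Z = np.eye(2)
print(diversity_profile(p, Z, qs=[0.0, 2.0]))  # -> [2.0, 2.0]
```

Sweeping q from 0 upward traces the profile: low q weights rare clones heavily, high q emphasizes dominant clones, and making Z less than the identity collapses similar clones into a smaller effective count.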
Determination of correlates of protection against tuberculosis in nonhuman primate models
Tuberculosis (TB) is one of the greatest global health challenges society faces. BCG, the only licensed vaccine for TB, has profoundly variable efficacy and does not prevent the spread of TB. Because there is no effective vaccine, there are no correlates of protection to use in vaccine development. The goal of this dissertation was to develop new tools for pre-clinical and clinical trials of TB vaccines, including new outcome measures and predictive markers of efficacy. Development of these tools will expedite down-selection of vaccine candidates, reducing their ultimate cost and hastening the reduction and eventual elimination of this disease. BCG afforded the best levels of protection in the rhesus macaque model of TB, which closely resembled TB disease in human infants. Boosting BCG with protein antigens or adenoviral vectored antigens did not improve, and in some cases worsened, outcome. A T cell signature in the lung-draining lymph nodes (LN) at necropsy, early gamma interferon (IFN-γ) ELISPOT results, and early PET-CT markers correlated with improved outcome in this model. We further characterized the protection afforded by an experimental boost to BCG, H56, which has been shown to prevent reactivation TB in cynomolgus macaques. BCG/H56 prevented establishment of disease in the lung-draining LN. BCG/H56 also mitigated lung inflammation, which reduced the apparent risk of reactivation TB by PET-CT. Early control of disease in the lung-draining LN, as well as a T cell signature, was associated with reduced risk of reactivation TB. Both studies provided evidence that PET-CT markers correlate with outcome. We thus built a holistic outcome score based strictly on quantifiable outcomes (gross pathology and bacterial burden determined at necropsy) and constructed models that robustly predict this outcome score early using PET-CT markers. Altogether, these studies highlight the importance of the lung-draining LN as a site of bacterial persistence and the ability of PET-CT to assess disease and predict vaccine efficacy. Further work will build upon these studies to determine the best site of vaccination to prevent disease and to develop a blood-signature correlate for use in clinical trials.
CAIPI in Practice: Towards Explainable Interactive Medical Image Classification
Would you trust physicians if they could not explain their decisions to you? Medical diagnostics using machine learning has gained enormously in importance within the last decade. However, without further enhancements, many state-of-the-art machine learning methods are not suitable for medical application. The most important reasons are insufficient data set quality and the black-box behavior of machine learning algorithms such as Deep Learning models. Consequently, end-users cannot correct the model's decisions or the corresponding explanations. The latter is crucial for the trustworthiness of machine learning in the medical domain. The research field of explainable interactive machine learning searches for methods that address both shortcomings. This paper extends the explainable and interactive CAIPI algorithm and provides an interface to simplify human-in-the-loop approaches for image classification. The interface enables the end-user (1) to investigate and (2) to correct the model's prediction and explanation, and (3) to influence the data set quality. After CAIPI optimization with only a single counterexample per iteration, the model achieves an accuracy of on the Medical MNIST and on the Fashion MNIST. This accuracy is approximately equal to state-of-the-art Deep Learning optimization procedures. Besides, CAIPI reduces the labeling effort by approximately .
Comment: Manuscript accepted at IFIP AIAI 202
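To make the human-in-the-loop idea concrete, here is a heavily simplified, hypothetical sketch of a CAIPI-style correction loop. The model, the oracle, and the one-dimensional decision rule are stand-ins for illustration, not the paper's actual implementation:

```python
class ThresholdModel:
    """Hypothetical stand-in classifier: predicts 1 iff x > threshold,
    and 'explains' itself by reporting the rule it applied."""
    def __init__(self):
        self.threshold = 0.0

    def predict_explain(self, x):
        return int(x > self.threshold), f"x > {self.threshold}"

    def fit(self, train):
        # Refit the threshold from the counterexamples seen so far.
        ones = [x for x, y in train if y == 1]
        zeros = [x for x, y in train if y == 0]
        if ones and zeros:
            self.threshold = (max(zeros) + min(ones)) / 2
        elif zeros:
            self.threshold = max(zeros) + 1
        elif ones:
            self.threshold = min(ones) - 1


def oracle(x, y_hat, explanation):
    """Simulated human: the (hypothetical) true rule is x > 5. Returns a
    corrected counterexample when the prediction is wrong, else None."""
    y_true = int(x > 5)
    return (x, y_true) if y_true != y_hat else None


def interactive_loop(model, pool, oracle):
    """CAIPI-style round: the model predicts and explains each instance;
    when the human rejects the output, a single counterexample is added
    to the training set and the model is refit."""
    train = []
    for x in pool:
        y_hat, explanation = model.predict_explain(x)
        feedback = oracle(x, y_hat, explanation)
        if feedback is not None:
            train.append(feedback)   # one counterexample per iteration
            model.fit(train)
    return model


model = interactive_loop(ThresholdModel(), pool=[2, 8], oracle=oracle)
```

After the loop, the model classifies both seen regions correctly from a single correction; the full algorithm additionally lets the human reject a correct prediction made for the wrong reasons, which this sketch omits.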
Data science, analytics and artificial intelligence in e-health : trends, applications and challenges
More than ever, healthcare systems can use data, predictive models, and intelligent algorithms to optimize their operations and the service they provide. This paper reviews the existing literature regarding the use of data science/analytics methods and artificial intelligence algorithms in healthcare. The paper also discusses how healthcare organizations can benefit from these tools to efficiently deal with a myriad of new possibilities and strategies. Examples of real applications are discussed to illustrate the potential of these methods. Finally, the paper highlights the main challenges regarding the use of these methods in healthcare, as well as some open research lines.
Acknowledgments: This work has been partially supported by the Divina Pastora Seguros company.