Multi-objective Optimization for Incremental Decision Tree Learning
Abstract. Decision tree learning can be roughly classified into two categories: static and incremental induction. Static tree induction applies greedy search in its splitting tests to obtain a globally optimal model. Incremental tree induction constructs a decision model by analyzing data in short segments; during each segment a locally optimal tree structure is formed. Very Fast Decision Tree [4] is a typical incremental tree induction method that bases its node-splitting test on the Hoeffding bound, but it does not work well on noisy data. In this paper, we propose a new incremental tree induction model called incrementally Optimized Very Fast Decision Tree (iOVFDT), which uses a multi-objective incremental optimization method. iOVFDT also integrates four classifiers at the leaf level. The proposed model is tested with a large volume of data streams contaminated with noise. Under such noisy data, we investigate how iOVFDT, which represents incremental induction working with local optima, compares to C4.5, which loads the whole dataset to build a globally optimal decision tree. Our experimental results show that iOVFDT achieves similar, though slightly lower, accuracy, while its decision tree size and induction time are much smaller than those of C4.5.
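The Hoeffding-bound splitting test mentioned in the abstract can be illustrated with a minimal sketch. It shows only the standard bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and the usual comparison between the two best split candidates. It does not reproduce iOVFDT's multi-objective optimization, and the function and parameter names (`hoeffding_bound`, `should_split`, `value_range`, `delta`) are illustrative assumptions rather than names from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split heuristic, delta is the
    allowed probability of error, and n is the number of examples seen
    at the node. All three names are assumptions made for this sketch.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
clear_winner = should_split(0.30, 0.15, 1.0, 1e-7, 1000)
close_call = should_split(0.30, 0.25, 1.0, 1e-7, 1000)
```

Because the bound shrinks as n grows, a node that cannot decide between two close candidates simply waits for more examples before splitting.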
DEVELOPING NOVEL COMPUTER-AIDED DETECTION AND DIAGNOSIS SYSTEMS OF MEDICAL IMAGES
Reading medical images to detect and diagnose diseases is often difficult and subject to large inter-reader variability. To address this issue, the development of computer-aided detection and diagnosis (CAD) schemes for medical images has attracted broad research interest over the last several decades. Despite great effort and significant progress in previous studies, only a limited number of CAD schemes have been used in clinical practice. Thus, developing new CAD schemes remains an active research topic in the field of medical imaging informatics. In this dissertation, I investigate the feasibility of developing several new CAD schemes for different applications. First, to predict breast tumor response to neoadjuvant chemotherapy and reduce unnecessarily aggressive surgery, I developed two CAD schemes for breast magnetic resonance imaging (MRI) that generate quantitative image markers based on quantitative analysis of global kinetic features. Using the image marker computed from breast MRI acquired before chemotherapy, the CAD scheme can predict radiographic complete response (CR) of breast tumors to neoadjuvant chemotherapy; using the image marker based on the fusion of kinetic and texture features extracted from breast MRI performed after neoadjuvant chemotherapy, the CAD scheme can better predict the pathologic complete response (pCR) of the patients. Second, quantifying brain hemorrhage and ventricular cerebrospinal fluid depicted on brain CT images can play an important role in predicting the prognosis of stroke patients more accurately. For this purpose, I developed a new interactive CAD tool that segments hemorrhage regions and extracts radiological imaging markers to quantitatively determine the severity of aneurysmal subarachnoid hemorrhage at presentation, correlate the estimate with various homeostatic/metabolic derangements, and predict clinical outcome.
Third, to improve the efficiency of primary antibody screening in new cancer drug development, I developed a CAD scheme to automatically identify the non-negative tissue slides, which indicate reactive antibodies in digital pathology images. Last, to improve the efficiency and reliability of storing digital pathology image data, I developed a CAD scheme that uses an optical character recognition algorithm to automatically extract metadata from tissue slide label images and reduce manual entry for slide tracking and archiving in tissue pathology laboratories.
In summary, in these studies, we developed and tested several innovative approaches to identify quantitative imaging markers with high discriminatory power. For all CAD schemes, graphical user interface-based visual aid tools were also developed and implemented. Study results demonstrated the feasibility of applying CAD technology to several new application fields, which has the potential to help radiologists, oncologists and pathologists improve accuracy and consistency in disease diagnosis and prognosis assessment using medical images.
Psychometrics in Practice at RCEC
A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to ideas for an integrated formative use of data-driven decision making, assessment for learning and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically.
All authors are connected to RCEC as researchers. They present one of their current research topics and provide some insight into the focus of RCEC. The topics were selected and edited so that the book is of special interest to educational researchers, psychometricians and practitioners in educational assessment.
Assessment of a multi-measure functional connectivity approach
Efforts to find differences in the brain activity patterns of subjects with neurological and psychiatric disorders that could help in their diagnosis and prognosis have been increasing in recent years and promise to revolutionise clinical practice and our understanding of such illnesses. Resting-state functional magnetic resonance imaging (rs-fMRI) data have been increasingly used to evaluate this activity and to characterize the connectivity between distinct brain regions, commonly organized in functional connectivity (FC) matrices. Here, machine learning methods were used to assess the extent to which combining multiple FC matrices, each determined with a different statistical method, changes classification performance relative to using only one matrix, as is common practice.
The statistical methods used include correlation, coherence, mutual information, transfer entropy and non-linear correlation, as implemented in the MULAN toolbox. Classification was performed using random forest and support vector machine (SVM) classifiers. Besides the previously mentioned objective, this study had three other goals: to investigate individually which of these statistical methods yielded better classification performance, to confirm the importance of the blood-oxygen-level-dependent (BOLD) signal in the frequency range 0.009-0.08 Hz for FC-based classification, and to assess the impact of feature selection in SVM classifiers. Publicly available rs-fMRI data from the Addiction Connectome Preprocessed Initiative (ACPI) and the ADHD-200 databases were used to classify controls vs subjects with Attention-Deficit/Hyperactivity Disorder (ADHD). Maximum accuracy and macro-averaged F-measure values of 0.744 and 0.677, respectively, were achieved on the ACPI dataset, and of 0.678 and 0.648 on the ADHD-200 dataset. Results show that combining matrices could significantly improve classification accuracy and macro-averaged F-measure if feature selection is performed. The results also suggest that mutual information methods might play an important role in FC-based classification, at least when classifying subjects with ADHD.
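As a rough illustration of how FC matrices become classifier inputs, the sketch below builds a correlation-based FC matrix from regional time series, flattens its upper triangle into a feature vector, and concatenates the vectors of several matrices. It is a simplification under stated assumptions: it uses plain Pearson correlation in pure Python rather than the MULAN toolbox, the helper names (`pearson`, `fc_matrix`, `upper_triangle`, `combine_features`) are hypothetical, and the toy series are made up for demonstration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fc_matrix(series):
    """Symmetric region-by-region correlation matrix (one series per region)."""
    k = len(series)
    return [[1.0 if i == j else pearson(series[i], series[j])
             for j in range(k)] for i in range(k)]


def upper_triangle(matrix):
    """Flatten the strictly upper triangle: one feature per region pair."""
    k = len(matrix)
    return [matrix[i][j] for i in range(k) for j in range(i + 1, k)]


def combine_features(matrices):
    """Concatenate the feature vectors of several FC matrices."""
    features = []
    for m in matrices:
        features.extend(upper_triangle(m))
    return features


# Toy data: three "regions" with four time points each (illustrative only).
toy_series = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
fc = fc_matrix(toy_series)
single = upper_triangle(fc)
combined = combine_features([fc, fc])
```

With k regions, each matrix contributes k(k-1)/2 features, so combining several matrices multiplies the feature count, which is consistent with the abstract's observation that feature selection matters when matrices are combined.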
Statistical and Machine Learning Techniques Applied to Algorithm Selection for Solving Sparse Linear Systems
There are many applications and problems in science and engineering that require large-scale numerical simulations and computations. Choosing an appropriate method to solve these problems is a very common issue, but it is not a trivial one, mainly because the decision is often too hard for humans to make or requires a certain degree of expertise and knowledge in the particular discipline or in mathematics. Thus, a methodology that facilitates or automates this process and helps in understanding the problem would be of great interest. The proposal is to use various statistically based machine learning and data mining techniques to analyze and automate the process of choosing an appropriate numerical algorithm for solving a specific set of problems (sparse linear systems) based on their individual properties.
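To make "individual properties" concrete, the sketch below extracts a few structural features from a sparse matrix stored as a dictionary of nonzero entries. The abstract does not say which properties the work actually uses, so the three chosen here (density, symmetry, weak diagonal dominance) are assumptions picked for illustration, as are the function name `matrix_features` and the storage format. A learned model would take such a feature vector as input; no solver choice is made here.

```python
def matrix_features(entries, n, tol=1e-12):
    """Compute simple structural features of an n x n sparse matrix.

    entries maps (row, col) -> value for the stored nonzeros.
    The feature set is an illustrative assumption, not the paper's.
    """
    nnz = sum(1 for v in entries.values() if v != 0)
    density = nnz / (n * n)

    # Symmetric if every stored entry matches its mirrored entry.
    symmetric = all(abs(v - entries.get((j, i), 0.0)) <= tol
                    for (i, j), v in entries.items())

    # Weak row diagonal dominance: |a_ii| >= sum of |a_ij| for j != i.
    off_diag = [0.0] * n
    for (i, j), v in entries.items():
        if i != j:
            off_diag[i] += abs(v)
    diag_dominant = all(abs(entries.get((i, i), 0.0)) >= off_diag[i]
                        for i in range(n))

    return {"nnz": nnz, "density": density,
            "symmetric": symmetric, "diag_dominant": diag_dominant}


# 3 x 3 tridiagonal example with 2 on the diagonal and -1 off it.
tridiag = {(0, 0): 2.0, (0, 1): -1.0,
           (1, 0): -1.0, (1, 1): 2.0, (1, 2): -1.0,
           (2, 1): -1.0, (2, 2): 2.0}
feats = matrix_features(tridiag, 3)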
Information technologies for pain management
Millions of people around the world suffer from acute or chronic pain, which raises the
importance of its screening, assessment and treatment. The importance of pain is attested by
the fact that it is considered the fifth vital sign for indicating basic bodily functions, health
and quality of life, together with the four other vital signs: blood pressure, body
temperature, pulse rate and respiratory rate. However, while these four signs represent
objective physical parameters, pain expresses an emotional state that exists in the mind of
each individual and is therefore highly subjective, which makes its management and
evaluation difficult. For this reason, self-report is considered the most accurate pain
assessment method, in which patients are asked to periodically rate their pain severity and
related symptoms. Thus, in recent years computerised systems based on mobile and web
technologies have been increasingly used to enable patients to report their pain, which has
led to the development of electronic pain diaries (ED). This approach may give health care
professionals (HCP) and patients the ability to interact with the system anywhere and at any
time, which thoroughly changes the coordinates of time and place and offers invaluable
opportunities for healthcare delivery. However, most of these systems were designed to
interact directly with patients without the presence of a healthcare professional or without
evidence of reliability and accuracy. In fact, observation of the existing systems revealed a
lack of integration with mobile devices, limited use of web-based interfaces and reduced
interaction with patients in terms of obtaining and viewing information. In addition, the
reliability and accuracy of computerised systems for pain management are rarely proven, and
their effects on HCP and patient outcomes remain understudied.
This thesis focuses on technology for pain management and proposes a monitoring system
that includes ubiquitous interfaces specifically oriented to either patients or HCP, using
mobile devices and the Internet, so as to allow decisions based on the knowledge obtained
from the analysis of the collected data. With interoperability and cloud computing
technologies in mind, this system uses web services (WS) to manage data, which are stored in a
Personal Health Record (PHR).
A Randomised Controlled Trial (RCT) was implemented to determine the effectiveness
of the proposed computerised monitoring system. The six-week RCT evidenced the
advantages of ubiquitous access for HCP and patients, who were able to
interact with the system anywhere and at any time using WS to send and receive data. In
addition, the collected data were stored in a PHR, which offers integrity and security as well
as permanent online accessibility to both patients and HCP. The study evidenced not only
that the majority of participants recommend the system, but also that they recognize its
suitability for pain management without requiring advanced skills or experienced users.
Furthermore, the system enabled the definition and management of patient-oriented
treatments with reduced therapist time. The study also revealed that the guidance of HCP at
the beginning of the monitoring is crucial to patients' satisfaction and experience with
the system, as evidenced by the high correlation between the
recommendation of the application and its suitability to improve pain management and to
provide medical information. There were no significant differences in
improvements in the quality of pain treatment between the intervention group and the control group.
Based on the data collected during the RCT, a clinical decision support system (CDSS) was
developed to offer tailored alarms, reports, and clinical guidance. This
CDSS, called Patient Oriented Method of Pain Evaluation System (POMPES), is based on the
combination of several statistical models (one-way ANOVA, Kruskal-Wallis and Tukey-Kramer)
with an imputation model based on linear regression. The system achieved full accuracy
in its suggested decisions when compared with the medical diagnosis, and
therefore revealed its suitability for managing pain. Lastly, based on the capability of
aerospace systems to deal with different complex data sources of varied complexity and
accuracy, an innovative model was proposed. This model is characterized by a qualitative
analysis stemming from the data fusion method combined with a quantitative model based on
the comparison of the standard deviation together with the values of mathematical
expectations. This model aimed to compare the effects of technological and pen-and-paper
systems when applied to different dimensions of pain, such as pain intensity, anxiety,
catastrophizing, depression, disability and interference. It was observed that pen-and-paper
and technology produced equivalent effects on anxiety, depression, interference and pain
intensity. In contrast, technology evidenced favourable effects in terms of
catastrophizing and disability. The proposed method proved to be suitable, intelligible, easy
to implement, and undemanding in time and resources. Further work is needed to evaluate the
proposed system by following up participants for longer periods of time, including a
complementary RCT encompassing patients with chronic pain symptoms. Finally, additional
studies should determine the economic effects not only on patients but also
on the healthcare system.
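Two of the building blocks named for POMPES, linear-regression imputation and one-way ANOVA, can be sketched in a few lines. This is a textbook illustration under stated assumptions, not the thesis's implementation: it assumes missing diary entries are marked with `None` and imputed from a straight-line fit over the time index, the function names are hypothetical, and the pain scores and groups are invented for demonstration.

```python
def impute_linear(series):
    """Fill None entries using a least-squares line fitted to the observed points.

    The x-axis is the position in the series (e.g. day index); this choice
    of predictor is an assumption made for the sketch.
    """
    pts = [(i, y) for i, y in enumerate(series) if y is not None]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y if y is not None else intercept + slope * i
            for i, y in enumerate(series)]


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Invented pain ratings with two missing diary entries.
filled = impute_linear([2.0, None, 4.0, 5.0, None])
# Invented ratings for three groups of patients.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F statistic indicates that group means differ by more than the within-group spread would explain; deciding significance would additionally require comparing it against the F distribution, which is omitted here.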
Computer aided drug design: Drug target directed in silico approaches
Ph.D., Doctor of Philosophy
The Biases of Decision Tree Pruning Strategies
Post-pruning of decision trees has been a successful approach in many real-world experiments, but over all possible concepts it does not bring any inherent improvement to an algorithm's performance. This work explores how a PAC-proven decision tree learning algorithm fares in comparison with two variants of the normal top-down induction of decision trees. The algorithm does not prune its hypothesis per se, but it can be understood to do pre-pruning of the evolving tree. We study a backtracking search algorithm, called Rank, for learning rank-minimal decision trees. Our experiments closely follow those performed by Schaffer [20]. They confirm the main findings of Schaffer: when learning concepts with a simple description, pruning works; for concepts with a complex description, and when all concepts are equally likely, pruning is injurious, rather than beneficial, to the average performance of the greedy top-down induction of decision trees. Pre-pruning, as a gentler technique, settles in the ..
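The notion of tree rank behind "rank-minimal decision trees" can be shown with a short sketch. It uses the standard recursive definition for binary decision trees (a leaf has rank 0; an internal node takes the larger subtree rank when the two differ, and that rank plus one when they are equal). It only computes the rank of a given tree and does not implement the backtracking search of the Rank algorithm; the nested-tuple tree encoding and the function name `tree_rank` are assumptions made for illustration.

```python
def tree_rank(tree):
    """Rank of a binary decision tree.

    A leaf is any non-tuple value (e.g. a class label); an internal node
    is a tuple (attribute, left_subtree, right_subtree). This encoding is
    an assumption made for the sketch.
    """
    if not isinstance(tree, tuple):
        return 0
    _, left, right = tree
    r_left, r_right = tree_rank(left), tree_rank(right)
    if r_left != r_right:
        return max(r_left, r_right)
    return r_left + 1


# A chain-shaped tree (decision list) keeps rank 1 however deep it is.
chain = ("x1", 0, ("x2", 0, ("x3", 0, 1)))
# A complete tree of depth 2 has rank 2.
complete = ("x1", ("x2", 0, 1), ("x3", 1, 0))

chain_rank = tree_rank(chain)
complete_rank = tree_rank(complete)
```

Rank therefore measures how "bushy" a tree is rather than how deep it is, which is why restricting the rank acts as a structural bias comparable to pre-pruning.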