10 research outputs found

    Multi-objective Optimization for Incremental Decision Tree Learning

    Abstract. Decision tree learning can be roughly classified into two categories: static and incremental induction. Static tree induction applies greedy search in the splitting test to obtain a globally optimal model. Incremental tree induction constructs a decision model by analyzing data in short segments; during each segment a locally optimal tree structure is formed. Very Fast Decision Tree [4] is a typical incremental tree induction method that uses the Hoeffding bound for the node-splitting test, but it does not work well on noisy data. In this paper, we propose a new incremental tree induction model called incrementally Optimized Very Fast Decision Tree (iOVFDT), which uses a multi-objective incremental optimization method. iOVFDT also integrates four classifiers at the leaf level. The proposed model is tested on a large volume of data streams contaminated with noise. Under such noisy data, we investigate how iOVFDT, which represents incremental induction working with local optima, compares to C4.5, which loads the whole dataset to build a globally optimal decision tree. Our experimental results show that iOVFDT achieves similar though slightly lower accuracy, but its decision tree size and induction time are much smaller than those of C4.5.
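
The abstract names the Hoeffding bound as the basis of the node-splitting test but does not spell it out. As a rough illustration only, the sketch below uses the standard textbook form of the bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and a split rule of the kind commonly described for Hoeffding trees. The function names, parameters and example numbers are assumptions made here, not taken from the paper, and this is not the iOVFDT method itself.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within this distance of the mean
    observed over n independent samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Hypothetical split test: split when the observed gap between the two best
    candidate attributes exceeds the Hoeffding bound for the samples seen so far."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Illustrative numbers only: 1000 samples at a leaf, gain range 1.0, delta 1e-7.
    eps = hoeffding_bound(1.0, 1e-7, 1000)
    print(f"epsilon = {eps:.4f}")
    print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
    print(should_split(0.30, 0.25, 1.0, 1e-7, 1000))
```

The bound shrinks as more samples arrive, so a leaf that cannot separate its two best attributes at first may be able to do so later without revisiting earlier data.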

    University of Helsinki Department of Computer Science Annual Report 1999


    DEVELOPING NOVEL COMPUTER-AIDED DETECTION AND DIAGNOSIS SYSTEMS OF MEDICAL IMAGES

    Reading medical images to detect and diagnose diseases is often difficult and has large inter-reader variability. To address this issue, developing computer-aided detection and diagnosis (CAD) schemes or systems for medical images has attracted broad research interest in the last several decades. Despite great effort and significant progress in previous studies, only a limited number of CAD schemes have been used in clinical practice. Thus, developing new CAD schemes is still a hot research topic in the field of medical imaging informatics. In this dissertation, I investigate the feasibility of developing several new CAD schemes for different application purposes. First, to predict breast tumor response to neoadjuvant chemotherapy and reduce unnecessary aggressive surgery, I developed two CAD schemes for breast magnetic resonance imaging (MRI) that generate quantitative image markers based on quantitative analysis of global kinetic features. Using the image marker computed from breast MRI acquired pre-chemotherapy, the CAD scheme can predict radiographic complete response (CR) of breast tumors to neoadjuvant chemotherapy, while using the imaging marker based on the fusion of kinetic and texture features extracted from breast MRI performed after neoadjuvant chemotherapy, the CAD scheme can better predict the pathologic complete response (pCR) of the patients. Second, to predict the prognosis of stroke patients more accurately, quantifying brain hemorrhage and ventricular cerebrospinal fluid depicted on brain CT images can play an important role. For this purpose, I developed a new interactive CAD tool to segment hemorrhage regions and extract a radiological imaging marker to quantitatively determine the severity of aneurysmal subarachnoid hemorrhage at presentation, correlate the estimate with various homeostatic/metabolic derangements, and predict clinical outcome.
Third, to improve the efficiency of primary antibody screening processes in new cancer drug development, I developed a CAD scheme to automatically identify the non-negative tissue slides, which indicate reactive antibodies in digital pathology images. Last, to improve the operational efficiency and reliability of storing digital pathology image data, I developed a CAD scheme that uses an optical character recognition algorithm to automatically extract metadata from tissue slide label images and reduce manual entry for slide tracking and archiving in tissue pathology laboratories. In summary, in these studies, we developed and tested several innovative approaches to identify quantitative imaging markers with high discriminatory power. For all CAD schemes, graphical user interface-based visual aid tools were also developed and implemented. Study results demonstrated the feasibility of applying CAD technology to several new application fields, which has the potential to assist radiologists, oncologists and pathologists in improving accuracy and consistency in disease diagnosis and prognosis assessment using medical images.

    Psychometrics in Practice at RCEC

    A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to ideas for an integrated formative use of data-driven decision making, assessment for learning and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically. All authors are connected to RCEC as researchers. They present one of their current research topics and provide some insight into the focus of RCEC. The topics were selected and edited so that the book should be of special interest to educational researchers, psychometricians and practitioners in educational assessment.

    Assessment of a multi-measure functional connectivity approach

    Efforts to find differences in the brain activity patterns of subjects with neurological and psychiatric disorders that could help in their diagnosis and prognosis have been increasing in recent years and promise to revolutionise clinical practice and our understanding of such illnesses in the future. Resting-state functional magnetic resonance imaging (rs-fMRI) data has been increasingly used to evaluate this activity and to characterize the connectivity between distinct brain regions, commonly organized in functional connectivity (FC) matrices. Here, machine learning methods were used to assess the extent to which multiple FC matrices, each determined with a different statistical method, could change classification performance relative to using only one matrix, as is common practice. The statistical methods used include correlation, coherence, mutual information, transfer entropy and non-linear correlation, as implemented in the MULAN toolbox. Classification was performed using random forests and support vector machine (SVM) classifiers. Besides this objective, the study had three other goals: to investigate individually which of these statistical methods yielded better classification performance, to confirm the importance of the blood-oxygen-level-dependent (BOLD) signal in the frequency range 0.009-0.08 Hz for FC-based classification, and to assess the impact of feature selection on SVM classifiers. Publicly available rs-fMRI data from the Addiction Connectome Preprocessed Initiative (ACPI) and the ADHD-200 databases was used to classify controls vs subjects with Attention-Deficit/Hyperactivity Disorder (ADHD). Maximum accuracy and macro-averaged F-measure values of 0.744 and 0.677, respectively, were achieved on the ACPI dataset, and of 0.678 and 0.648 on the ADHD-200 dataset.
Results show that combining matrices could significantly improve classification accuracy and macro-averaged F-measure if feature selection is performed. The results of this study also suggest that mutual information methods might play an important role in FC-based classification, at least when classifying subjects with ADHD.
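
The abstract reports macro-averaged F-measure alongside accuracy without defining it. The sketch below shows one common definition, assumed here rather than confirmed by the source: compute an F1 score per class and take the unweighted mean. Some authors instead take the F-measure of macro-averaged precision and recall. The function name and the toy labels are hypothetical and are not data from the study.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores (one common macro-F definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred), key=repr)
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration: 0 = control, 1 = ADHD.
    truth = [0, 0, 1, 1]
    predicted = [0, 1, 1, 1]
    print(round(macro_f1(truth, predicted), 4))
```

Because each class contributes equally regardless of its size, macro-averaging penalizes a classifier that does well only on the majority class, which matters when controls and patients are unevenly represented.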

    Statistical and Machine Learning Techniques Applied to Algorithm Selection for Solving Sparse Linear Systems

    Many applications and problems in science and engineering require large-scale numerical simulations and computations. Choosing an appropriate method to solve these problems is a common issue, but not a trivial one, principally because the decision is often too hard for humans to make, or because it requires a certain degree of expertise and knowledge in the particular discipline or in mathematics. Thus, a methodology that can facilitate or automate this process and help to understand the problem would be of great interest. The proposal is to use various statistically based machine-learning and data mining techniques to analyze and automate the process of choosing an appropriate numerical algorithm for solving a specific set of problems (sparse linear systems) based on their individual properties.

    Information technologies for pain management

    Millions of people around the world suffer from acute or chronic pain, which raises the importance of its screening, assessment and treatment. The importance of pain is attested by the fact that it is considered the fifth vital sign for indicating basic bodily functions, health and quality of life, together with the four other vital signs: blood pressure, body temperature, pulse rate and respiratory rate. However, while these four signs represent objective physical parameters, the occurrence of pain expresses an emotional status that happens inside the mind of each individual and is therefore highly subjective, which makes its management and evaluation difficult. For this reason, self-report is considered the most accurate pain assessment method, wherein patients are asked to periodically rate their pain severity and related symptoms. Thus, in recent years computerised systems based on mobile and web technologies have been increasingly used to enable patients to report their pain, which has led to the development of electronic pain diaries (ED). This approach may give health care professionals (HCP) and patients the ability to interact with the system anywhere and at any time, which thoroughly changes the coordinates of time and place and offers invaluable opportunities for healthcare delivery. However, most of these systems were designed to interact directly with patients without the presence of a healthcare professional or without evidence of reliability and accuracy. In fact, observation of the existing systems revealed a lack of integration with mobile devices, limited use of web-based interfaces and reduced interaction with patients in terms of obtaining and viewing information. In addition, the reliability and accuracy of computerised systems for pain management are rarely proved, and their effects on HCP and patient outcomes remain understudied.
This thesis focuses on technology for pain management and proposes a monitoring system which includes ubiquitous interfaces specifically oriented to either patients or HCP, using mobile devices and the Internet, so as to allow decisions based on the knowledge obtained from the analysis of the collected data. With interoperability and cloud computing technologies in mind, this system uses web services (WS) to manage data, which are stored in a Personal Health Record (PHR). A Randomised Controlled Trial (RCT) was conducted to determine the effectiveness of the proposed computerised monitoring system. The six-week RCT evidenced the advantages of ubiquitous access for HCP and patients, who were able to interact with the system anywhere and at any time using WS to send and receive data. In addition, the collected data were stored in a PHR, which offers integrity and security as well as permanent online accessibility to both patients and HCP. The study evidenced not only that the majority of participants recommend the system, but also that they recognize its suitability for pain management without requiring advanced skills or experienced users. Furthermore, the system enabled the definition and management of patient-oriented treatments with reduced therapist time. The study also revealed that the guidance of HCP at the beginning of the monitoring is crucial to patients' satisfaction and experience with the system, as evidenced by the high correlation between the recommendation of the application and its suitability to improve pain management and to provide medical information. There were no significant differences in improvements in the quality of pain treatment between the intervention group and the control group. Based on the data collected during the RCT, a clinical decision support system (CDSS) was developed to offer tailored alarms, reports, and clinical guidance.
This CDSS, called Patient Oriented Method of Pain Evaluation System (POMPES), is based on the combination of several statistical models (one-way ANOVA, Kruskal-Wallis and Tukey-Kramer) with an imputation model based on linear regression. The decisions suggested by the system agreed fully with the medical diagnosis, which revealed its suitability for managing pain. Lastly, based on the capability of aerospace systems to deal with different complex data sources with varied complexities and accuracies, an innovative model was proposed. This model is characterized by a qualitative analysis stemming from the data fusion method combined with a quantitative model based on the comparison of the standard deviation together with the values of mathematical expectations. This model aimed to compare the effects of technological and pen-and-paper systems when applied to different dimensions of pain, such as pain intensity, anxiety, catastrophizing, depression, disability and interference. It was observed that pen-and-paper and technology produced equivalent effects on anxiety, depression, interference and pain intensity. By contrast, technology evidenced favourable effects in terms of catastrophizing and disability. The proposed method proved to be suitable, intelligible, easy to implement and undemanding in time and resources. Further work is needed to evaluate the proposed system by following up participants for longer periods of time, including a complementary RCT encompassing patients with chronic pain symptoms. Finally, additional studies should determine the economic effects not only on patients but also on the healthcare system.

    Computer aided drug design: Drug target directed in silico approaches

    Ph.D. (Doctor of Philosophy)

    The Biases of Decision Tree Pruning Strategies

    Post-pruning of decision trees has been a successful approach in many real-world experiments, but over all possible concepts it does not bring any inherent improvement to an algorithm's performance. This work explores how a PAC-proven decision tree learning algorithm fares in comparison with two variants of the normal top-down induction of decision trees. The algorithm does not prune its hypothesis per se, but it can be understood to do pre-pruning of the evolving tree. We study a backtracking search algorithm, called Rank, for learning rank-minimal decision trees. Our experiments closely follow those performed by Schaffer [20]. They confirm the main findings of Schaffer: in learning concepts with a simple description, pruning works; for concepts with a complex description, and when all concepts are equally likely, pruning is injurious, rather than beneficial, to the average performance of the greedy top-down induction of decision trees. Pre-pruning, as a gentler technique, settles in the ..