1,120 research outputs found

    Classification of Cardiotocography Data with WEKA

    Get PDF
    Cardiotocography (CTG) records fetal heart rate (FHR) and uterine contractions (UC) simultaneously. Cardiotocography trace patterns help doctors to understand the state of the fetus. Even after the introduction of cardiotocograph, the capacity to predict is still inaccurate. This paper evaluates some commonly used classification methods using WEKA. Precision,Recall, F-Measrue and ROC curve have been used as the metric to evaluate the performance of classifiers. As opposed to some of the earlier research works that were unable to identify Suspicious and Pathologic patterns, the results obtained from the study in this paper could precisely identify pathologic and Suspicious cases. Best results were obtained from J48, Random Forest and Classification via Regression

    Reducing the loss of information through annealing text distortion

    Full text link
    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Granados, A. ;Cebrian, M. ; Camacho, D. ; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7 pp. 1090 - 1102, July 2011Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by performing an experimental evaluation of the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy. In fact, we show how the nondistorted text clustering can be improved by means of annealing text distortion. The experimental results shown in this paper are consistent using different data sets, and different compression algorithms belonging to the most important compression families: Lempel-Ziv, Statistical and Block-Sorting.This work was supported by the Spanish Ministry of Education and Science under TIN2010-19872 and TIN2010-19607 projects

    Classification of Cardiotocography Data with WEKA

    Get PDF
    Cardiotocography (CTG) records fetal heart rate (FHR) and uterine contractions (UC) simultaneously. Cardiotocography trace patterns help doctors to understand the state of the fetus. Even after the introduction of cardiotocograph, the capacity to predict is still inaccurate. This paper evaluates some commonly used classification methods using WEKA. Precision,Recall, F-Measrue and ROC curve have been used as the metric to evaluate the performance of classifiers. As opposed to some of the earlier research works that were unable to identify Suspicious and Pathologic patterns, the results obtained from the study in this paper could precisely identify pathologic and Suspicious cases. Best results were obtained from J48, Random Forest and Classification via Regression

    A Fast Quartet Tree Heuristic for Hierarchical Clustering

    Get PDF
    The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the 3(n4)3{n \choose 4} weighted quartet topologies on nn objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic has been extensively used for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains and across domains with heterogeneous data. We also present a greatly improved heuristic, reducing the running time by a factor of order a thousand to ten thousand. All this is implemented and available, as part of the CompLearn package. We compare performance and running time of the original and improved versions with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package on genomic data for which the latter are optimized. Keywords: Data and knowledge visualization, Pattern matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering, Global optimization, Quartet tree, Randomized hill-climbing,Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with arXiv:cs/0606048 in cs.D

    Computational intelligence methods for predicting fetal outcomes from heart rate patterns

    Get PDF
    In this thesis, methods for evaluating the fetal state are compared to make predictions based on Cardiotocography (CTG) data. The first part of this research is the development of an algorithm to extract features from the CTG data. A feature extraction algorithm is presented that is capable of extracting most of the features in the SISPORTO software package as well as late and variable decelerations. The resulting features are used for classification based on both U.S. National Institutes of Health (NIH) categories and umbilical cord pH data. The first experiment uses the features to classify the results into three different categories suggested by the NIH and commonly being used in practice in hospitals across the United States. In addition, the algorithms developed here were used to predict cord pH levels, the actual condition that the three NIH categories are used to attempt to measure. This thesis demonstrates the importance of machine learning in Maternal and Fetal Medicine. It provides assistance for the obstetricians in assessing the state of the fetus better than the category methods, as only about 30% of the patients in the Pathological category suffer from acidosis, while the majority of acidotic babies were in the suspect category, which is considered lower risk. By predicting the direct indicator of acidosis, umbilical cord pH, this work demonstrates a methodology to achieve a more accurate prediction of fetal outcomes using Fetal Heartrate and Uterine Activity with accuracies of greater than 99.5% for predicting categories and greater than 70% for fetal acidosis based on pH values --Abstract, page iii

    Foetal echocardiographic segmentation

    Get PDF
    Congenital heart disease affects just under one percentage of all live births [1]. Those defects that manifest themselves as changes to the cardiac chamber volumes are the motivation for the research presented in this thesis. Blood volume measurements in vivo require delineation of the cardiac chambers and manual tracing of foetal cardiac chambers is very time consuming and operator dependent. This thesis presents a multi region based level set snake deformable model applied in both 2D and 3D which can automatically adapt to some extent towards ultrasound noise such as attenuation, speckle and partial occlusion artefacts. The algorithm presented is named Mumford Shah Sarti Collision Detection (MSSCD). The level set methods presented in this thesis have an optional shape prior term for constraining the segmentation by a template registered to the image in the presence of shadowing and heavy noise. When applied to real data in the absence of the template the MSSCD algorithm is initialised from seed primitives placed at the centre of each cardiac chamber. The voxel statistics inside the chamber is determined before evolution. The MSSCD stops at open boundaries between two chambers as the two approaching level set fronts meet. This has significance when determining volumes for all cardiac compartments since cardiac indices assume that each chamber is treated in isolation. Comparison of the segmentation results from the implemented snakes including a previous level set method in the foetal cardiac literature show that in both 2D and 3D on both real and synthetic data, the MSSCD formulation is better suited to these types of data. All the algorithms tested in this thesis are within 2mm error to manually traced segmentation of the foetal cardiac datasets. This corresponds to less than 10% of the length of a foetal heart. In addition to comparison with manual tracings all the amorphous deformable model segmentations in this thesis are validated using a physical phantom. The volume estimation of the phantom by the MSSCD segmentation is to within 13% of the physically determined volume

    A Pipeline for Clustering by Compression with Application to Patient Stratification in Spondyloarthritis

    Get PDF
    Funding Information: The authors acknowledge Fundação para a Ciência e Tecnologia, LASIGE Research Unit, ref. UIDB/00408/2020 and ref. UIDP/00408/2020 and Instituto de Telecomunicações Research Unit, ref. UIDB/50008/2020, and UIDP/50008/2020. The authors also acknowledge the Project PREDICT (PTDC/CCI-CIF/29877/2017), funded by Fundo Europeu de Desenvolvimento Regional (FEDER), through Programa Operacional Regional LISBOA (LISBOA2020), and by national funds, through Fundacção para a Ciência e Tecnologia (FCT), and projects MATISSE (DSAIPA/DS/0026/2019), MONET (PTDC/CCI-BIO/4180/2020) and SmartGlauco (PTDC/CTM-REF/2679/2020). Publisher Copyright: © 2023 by the authors.The normalized compression distance (NCD) is a similarity measure between a pair of finite objects based on compression. Clustering methods usually use distances (e.g., Euclidean distance, Manhattan distance) to measure the similarity between objects. The NCD is yet another distance with particular characteristics that can be used to build the starting distance matrix for methods such as hierarchical clustering or K-medoids. In this work, we propose Zgli, a novel Python module that enables the user to compute the NCD between files inside a given folder. Inspired by the CompLearn Linux command line tool, this module iterates on it by providing new text file compressors, a new compression-by-column option for tabular data, such as CSV files, and an encoder for small files made up of categorical data. Our results demonstrate that compression by column can yield better results than previous methods in the literature when clustering tabular data. Additionally, the categorical encoder shows that it can augment categorical data, allowing the use of the NCD for new data types. One of the advantages is that using this new feature does not require knowledge or context of the data. Furthermore, the fact that the new proposed module is written in Python, one of the most popular programming languages for machine learning, potentiates its use by developers to tackle problems with a new approach based on compression. This pipeline was tested in clinical data and proved a promising computational strategy by providing patient stratification via clusters aiding in precision medicine.publishersversionpublishe

    Normalized Web Distance and Word Similarity

    Get PDF
    There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalizedis a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available for all by using any search engine that can return aggregate page-count estimates for a large range of search-queries. In the paper introducing the NWD it was called `normalized Google distance (NGD),' but since Google doesn't allow computer searches anymore, we opt for the more neutral and descriptive NWD. web distance (NWD) method to determine similarity between words and phrases. ItComment: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN 978-142008592
    corecore