1,127 research outputs found
Classification of Cardiotocography Data with WEKA
Cardiotocography (CTG) records fetal heart rate (FHR) and uterine contractions (UC) simultaneously. Cardiotocography trace patterns help doctors understand the state of the fetus. Even after the introduction of the cardiotocograph, prediction of the fetal state remains inaccurate. This paper evaluates some commonly used classification methods using WEKA. Precision, Recall, F-Measure and the ROC curve are used as the metrics to evaluate the performance of the classifiers. In contrast to some earlier research works that were unable to identify Suspicious and Pathologic patterns, the results obtained in this study could precisely identify Pathologic and Suspicious cases. The best results were obtained from J48, Random Forest and Classification via Regression.
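The metrics named in this abstract can be illustrated with a minimal Python sketch. It is not the paper's WEKA workflow; the function name `per_class_metrics` and the toy labels (N, S, P for Normal, Suspicious, Pathologic) are assumptions made only for illustration.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f_measure)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics

# Hypothetical toy labels, not data from the paper.
y_true = ["N", "N", "S", "P", "P", "S"]
y_pred = ["N", "S", "S", "P", "N", "S"]
metrics = per_class_metrics(y_true, y_pred)
for label, (p, r, f) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Reporting the metrics per class matters here because a classifier can score well overall while still missing the rarer Suspicious and Pathologic cases.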
Reducing the loss of information through annealing text distortion
Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011.
Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by performing an experimental evaluation of the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy. In fact, we show how the nondistorted text clustering can be improved by means of annealing text distortion. The experimental results shown in this paper are consistent using different data sets, and different compression algorithms belonging to the most important compression families: Lempel-Ziv, Statistical and Block-Sorting.
This work was supported by the Spanish Ministry of Education and Science under the TIN2010-19872 and TIN2010-19607 projects.
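One way to picture word-removal distortion is the following Python sketch. It is an assumption-laden illustration, not the authors' procedure: masking the most frequent words with asterisks, the helper names `distort` and `compressed_size`, and the use of zlib are all choices made here for the example.

```python
import re
import zlib
from collections import Counter

def distort(text, n_words):
    """Mask the n_words most frequent words with asterisks of equal length."""
    words = re.findall(r"[a-z]+", text.lower())
    frequent = {w for w, _ in Counter(words).most_common(n_words)}

    def mask(match):
        word = match.group(0)
        return "*" * len(word) if word.lower() in frequent else word

    return re.sub(r"[A-Za-z]+", mask, text)

def compressed_size(text):
    """Compressed length in bytes, used as a rough proxy for complexity."""
    return len(zlib.compress(text.encode("utf-8"), 9))

sample = "the cat and the dog"
for k in range(3):
    distorted = distort(sample, k)
    print(k, repr(distorted), compressed_size(distorted))
```

Increasing `n_words` step by step gives a sequence of progressively distorted texts whose compressed sizes can be compared before clustering.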
A Fast Quartet Tree Heuristic for Hierarchical Clustering
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree
from the 3·(n choose 4) weighted quartet topologies on n objects, where
optimality means that the summed weight of the embedded quartet topologies is
optimal (so it can be the case that the optimal tree embeds all quartets as
nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized
hill climbing, for approximating the optimal weight tree, given the quartet
topology weights. The method repeatedly transforms a dendrogram, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. The problem and the solution heuristic have been
extensively used for general hierarchical clustering of nontree-like
(non-phylogeny) data in various domains and across domains with heterogeneous
data. We also present a greatly improved heuristic, reducing the running time
by a factor of order a thousand to ten thousand. All this is implemented and
available, as part of the CompLearn package. We compare performance and running
time of the original and improved versions with those of UPGMA, BioNJ, and NJ,
as implemented in the SplitsTree package on genomic data for which the latter
are optimized.
Keywords: Data and knowledge visualization, Pattern
matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering,
Global optimization, Quartet tree, Randomized hill-climbing
Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with
arXiv:cs/0606048 in cs.D
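As a small illustration of the input to the Minimum Quartet Tree Cost problem: every set of four objects {a, b, c, d} admits three quartet topologies (ab|cd, ac|bd, ad|bc), so n objects give 3·(n choose 4) weighted topologies. The Python sketch below enumerates them and sums the weights of a chosen set of embedded topologies. The names `quartet_topologies` and `summed_weight` and the uniform toy weights are hypothetical; this is not the CompLearn implementation or the heuristic itself.

```python
from itertools import combinations
from math import comb

def quartet_topologies(objects):
    """Yield the three pairings ab|cd, ac|bd, ad|bc for every 4-subset."""
    for a, b, c, d in combinations(objects, 4):
        yield ((a, b), (c, d))
        yield ((a, c), (b, d))
        yield ((a, d), (b, c))

def summed_weight(weights, embedded):
    """Sum the weights of the quartet topologies embedded by a candidate tree."""
    return sum(weights[topology] for topology in embedded)

objects = "abcde"
topologies = list(quartet_topologies(objects))
print(len(topologies), 3 * comb(len(objects), 4))

# Hypothetical uniform weights; a tree embeds exactly one topology per 4-subset.
weights = {topology: 1.0 for topology in topologies}
embedded = topologies[::3]  # picks ab|cd for every 4-subset, for illustration only
print(summed_weight(weights, embedded))
```

A hill-climbing heuristic of the kind described would repeatedly mutate a candidate tree and keep the mutation when this summed weight improves.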
Computational intelligence methods for predicting fetal outcomes from heart rate patterns
In this thesis, methods for evaluating the fetal state are compared to make predictions based on Cardiotocography (CTG) data. The first part of this research is the development of an algorithm to extract features from the CTG data. A feature extraction algorithm is presented that is capable of extracting most of the features in the SISPORTO software package as well as late and variable decelerations. The resulting features are used for classification based on both U.S. National Institutes of Health (NIH) categories and umbilical cord pH data. The first experiment uses the features to classify the results into the three categories suggested by the NIH and commonly used in practice in hospitals across the United States. In addition, the algorithms developed here were used to predict cord pH levels, the actual condition that the three NIH categories attempt to measure. This thesis demonstrates the importance of machine learning in Maternal and Fetal Medicine. It helps obstetricians assess the state of the fetus better than the category methods do, as only about 30% of the patients in the Pathological category suffer from acidosis, while the majority of acidotic babies were in the Suspect category, which is considered lower risk. By predicting the direct indicator of acidosis, umbilical cord pH, this work demonstrates a methodology to achieve a more accurate prediction of fetal outcomes using fetal heart rate and uterine activity, with accuracies greater than 99.5% for predicting categories and greater than 70% for fetal acidosis based on pH values.
Foetal echocardiographic segmentation
Congenital heart disease affects just under one percent of all live births [1].
Those defects that manifest themselves as changes to the cardiac chamber volumes
are the motivation for the research presented in this thesis.
Blood volume measurements in vivo require delineation of the cardiac chambers and
manual tracing of foetal cardiac chambers is very time consuming and operator
dependent. This thesis presents a multi-region level set snake deformable
model, applied in both 2D and 3D, which can adapt automatically, to some extent,
to ultrasound noise such as attenuation, speckle and partial occlusion artefacts.
The algorithm presented is named Mumford Shah Sarti Collision Detection (MSSCD).
The level set methods presented in this thesis have an optional shape prior term for
constraining the segmentation by a template registered to the image in the presence
of shadowing and heavy noise.
When applied to real data in the absence of the template the MSSCD algorithm is
initialised from seed primitives placed at the centre of each cardiac chamber. The
voxel statistics inside the chamber are determined before evolution. The MSSCD stops
at open boundaries between two chambers as the two approaching level set fronts
meet. This has significance when determining volumes for all cardiac compartments
since cardiac indices assume that each chamber is treated in isolation. Comparison
of the segmentation results from the implemented snakes including a previous level
set method in the foetal cardiac literature shows that in both 2D and 3D, on both real
and synthetic data, the MSSCD formulation is better suited to these types of data.
All the algorithms tested in this thesis are within 2 mm of the manually traced
segmentation of the foetal cardiac datasets. This corresponds to less than 10% of
the length of a foetal heart. In addition to comparison with manual tracings all the
amorphous deformable model segmentations in this thesis are validated using a
physical phantom. The volume estimation of the phantom by the MSSCD
segmentation is to within 13% of the physically determined volume
A Pipeline for Clustering by Compression with Application to Patient Stratification in Spondyloarthritis
Funding Information: The authors acknowledge Fundação para a Ciência e Tecnologia, LASIGE Research Unit, ref. UIDB/00408/2020 and ref. UIDP/00408/2020 and Instituto de Telecomunicações Research Unit, ref. UIDB/50008/2020, and UIDP/50008/2020. The authors also acknowledge the Project PREDICT (PTDC/CCI-CIF/29877/2017), funded by Fundo Europeu de Desenvolvimento Regional (FEDER), through Programa Operacional Regional LISBOA (LISBOA2020), and by national funds, through Fundação para a Ciência e Tecnologia (FCT), and projects MATISSE (DSAIPA/DS/0026/2019), MONET (PTDC/CCI-BIO/4180/2020) and SmartGlauco (PTDC/CTM-REF/2679/2020). Publisher Copyright: © 2023 by the authors.
The normalized compression distance (NCD) is a similarity measure between a pair of finite objects based on compression. Clustering methods usually use distances (e.g., Euclidean distance, Manhattan distance) to measure the similarity between objects. The NCD is yet another distance with particular characteristics that can be used to build the starting distance matrix for methods such as hierarchical clustering or K-medoids. In this work, we propose Zgli, a novel Python module that enables the user to compute the NCD between files inside a given folder. Inspired by the CompLearn Linux command line tool, this module iterates on it by providing new text file compressors, a new compression-by-column option for tabular data, such as CSV files, and an encoder for small files made up of categorical data. Our results demonstrate that compression by column can yield better results than previous methods in the literature when clustering tabular data. Additionally, the categorical encoder shows that it can augment categorical data, allowing the use of the NCD for new data types. One of the advantages is that using this new feature does not require knowledge or context of the data. Furthermore, the fact that the proposed module is written in Python, one of the most popular programming languages for machine learning, encourages its use by developers to tackle problems with a new approach based on compression. This pipeline was tested on clinical data and proved to be a promising computational strategy, providing patient stratification via clusters to aid precision medicine.
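The NCD and the starting distance matrix mentioned above can be sketched with the Python standard library alone. This does not use or reproduce the Zgli API; the helper names `csize`, `ncd` and `ncd_matrix`, the choice of zlib as the compressor, and the toy byte strings are assumptions for illustration. The sketch uses the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length.

```python
import zlib

def csize(data: bytes) -> int:
    """Compressed length in bytes, standing in for C(.) in the NCD formula."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def ncd_matrix(items):
    """Symmetric pairwise NCD matrix; the diagonal is set to 0 by convention."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = ncd(items[i], items[j])
    return matrix

# Hypothetical toy inputs: two similar texts and one unrelated byte pattern.
docs = [
    b"the quick brown fox jumps over the lazy dog " * 20,
    b"the quick brown fox jumps over the lazy cat " * 20,
    bytes(range(256)) * 4,
]
distances = ncd_matrix(docs)
for row in distances:
    print(["%.2f" % value for value in row])
```

A matrix of this shape is what hierarchical clustering or K-medoids would take as input; real compressors make the NCD only approximately symmetric, which is why the sketch computes each pair once and mirrors it.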