Personalized modeling for prediction with decision-path models
Deriving predictive models in medicine typically relies on a population approach, in which a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model, called a decision-path model, that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Trees (CART), a population decision-tree method, for predicting seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score than CART. The personalized approach of learning decision-path models is a new approach to predictive modeling that can perform better than a population approach.
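The core object in this abstract is the decision path: the sequence of tests a particular person's features trigger on the way from root to leaf. The sketch below is a minimal, hypothetical illustration of extracting that path from a population tree; the tree structure, feature names ("age", "bp"), and thresholds are invented for illustration and are not the paper's datasets or algorithms.

```python
# Hypothetical sketch: follow one person of interest through a decision
# tree (nested dicts) and record the tests along their decision path.

def decision_path(tree, person):
    """Walk the tree for one instance; return (list of tests, leaf value)."""
    path, node = [], tree
    while "leaf" not in node:
        feat, thr = node["feature"], node["threshold"]
        go_left = person[feat] <= thr
        path.append((feat, "<=" if go_left else ">", thr))
        node = node["left"] if go_left else node["right"]
    return path, node["leaf"]

# Toy population tree: predicted risk from age and blood pressure.
tree = {
    "feature": "age", "threshold": 50,
    "left": {"leaf": 0.1},
    "right": {
        "feature": "bp", "threshold": 140,
        "left": {"leaf": 0.3},
        "right": {"leaf": 0.8},
    },
}

path, risk = decision_path(tree, {"age": 63, "bp": 150})
print(path)  # [('age', '>', 50), ('bp', '>', 140)]
print(risk)  # 0.8
```

A personalized method could then focus model construction on the region of feature space selected by this path, rather than on the full population.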
A Deep Learning Anomaly Detection Method in Textual Data
In this article, we propose using deep learning and transformer architectures
combined with classical machine learning algorithms to detect and identify
anomalies in textual data. The deep learning model provides crucial contextual
information about the text, converting all textual content into a numerical
representation. We used multiple machine learning methods, including Sentence
Transformers, autoencoders, logistic regression, and distance-based
calculations, to predict anomalies. The methods are tested on text data into
which synthetic data from different sources is injected, either as anomalies or
as targets. We describe several methods and algorithms from the field of
outlier detection and present the results of the best-performing technique.
These results suggest that our algorithm could potentially reduce
false-positive rates compared with the other anomaly detection methods we
tested.
Comment: 8 pages, 4 figures
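The distance-based step described above can be sketched in a few lines: embed each text as a vector, compute the corpus centroid, and score texts by their distance from it. The paper uses Sentence Transformer embeddings; in this self-contained toy, a bag-of-words count vector stands in for the learned embedding, and the corpus and injected anomaly are invented examples.

```python
# Toy sketch of distance-to-centroid anomaly scoring for text.
# Bag-of-words vectors stand in for transformer embeddings.
import math
from collections import Counter

def embed(text, vocab):
    counts = Counter(text.lower().split())
    return [counts.get(w, 0) for w in vocab]

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

texts = [
    "the patient was admitted for observation",
    "the patient was discharged after observation",
    "buy cheap watches now now now",   # injected anomaly
]
vocab = sorted({w for t in texts for w in t.lower().split()})
vecs = [embed(t, vocab) for t in texts]
c = centroid(vecs)
scores = [distance(v, c) for v in vecs]
# The injected text lies furthest from the centroid.
print(scores.index(max(scores)))  # 2
```

In the full pipeline, the same scoring would run on transformer embeddings, with a threshold (or a downstream classifier such as logistic regression) deciding which scores count as anomalies.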
A system for exploring big data: an iterative k-means searchlight for outlier detection on open health data
The interactive exploration of large and evolving datasets is challenging as
relationships between underlying variables may not be fully understood. There
may be hidden trends and patterns in the data that are worthy of further
exploration and analysis. We present a system that methodically explores
multiple combinations of variables using a searchlight technique and identifies
outliers. An iterative k-means clustering algorithm is applied to features
derived through a split-apply-combine paradigm used in the database literature.
Outliers are identified as singleton or small clusters. This algorithm is swept
across the dataset in a searchlight manner. The dimensions that contain
outliers are combined in pairs with other dimensions using a subset scan
technique to gain further insight into the outliers. We illustrate this system
by analyzing open health care data released by New York State. We apply our
iterative k-means searchlight followed by subset scanning. Several anomalous
trends in the data are identified, including cost overruns at specific
hospitals, and increases in diagnoses such as suicides. These constitute novel
findings in the literature, and are of potential use to regulatory agencies,
policy makers and concerned citizens.
Comment: 2018 International Joint Conference on Neural Networks (IJCNN)
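The searchlight idea above, run k-means over one dimension at a time and flag points that land in singleton or very small clusters, can be sketched as follows. This tiny pure-Python k-means and the cost/rate data are illustrative stand-ins, not the paper's pipeline or the New York State dataset.

```python
# Sketch of an iterative k-means searchlight: cluster each column
# separately and report values falling in clusters smaller than min_size.

def kmeans_1d(values, k, iters=20):
    # Seed centers by taking evenly spaced sorted values.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[i].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

def searchlight_outliers(rows, k=2, min_size=2):
    """Sweep each column; collect values from undersized clusters."""
    flagged = {}
    for col in range(len(rows[0])):
        values = [r[col] for r in rows]
        small = [v for c in kmeans_1d(values, k)
                 if 0 < len(c) < min_size for v in c]
        if small:
            flagged[col] = small
    return flagged

rows = [
    (10, 1.0), (11, 1.1), (12, 0.9),   # typical costs / rates
    (10, 1.0), (11, 1.2), (95, 1.1),   # 95 is a cost overrun
]
print(searchlight_outliers(rows))  # {0: [95]}
```

The paper's system then takes the dimensions that produced outliers (column 0 here) and pairs them with other dimensions via subset scanning to characterize the anomaly further.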
RobustSPAM for Inference from Noisy Longitudinal Data and Preservation of Privacy
The availability of complex temporal datasets in social, health and consumer contexts has driven the development of pattern mining techniques that enable the use of classical machine learning tools for model building. In this work we introduce a robust temporal pattern mining framework for finding predictive patterns in complex, timestamped, multivariate, and noisy data. We design an algorithm, RobustSPAM, that enables mining of temporal patterns from data with noisy timestamps. We apply our algorithm to social care data from a local government body and investigate how the efficiency and accuracy of the method depend on the level of noise. We further explore the trade-off between the loss of predictivity due to perturbation of timestamps and the risk of person re-identification.
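One way to picture robustness to noisy timestamps is pattern support with a gap tolerance: an event sequence still counts as supporting a temporal pattern if each inter-event gap is within some tolerance of the pattern's gap. The sketch below illustrates only that idea; it is not the RobustSPAM algorithm, and the event labels, gaps, and greedy left-to-right matching (which never backtracks over skipped candidates) are simplifying assumptions.

```python
# Illustrative gap-tolerant pattern matching over noisy timestamps.
# A real temporal pattern miner would search candidates far more carefully.

def supports(events, pattern, tol):
    """events: [(time, label)] sorted by time.
    pattern: [(gap_from_previous, label)]; the first gap is ignored.
    Greedy match allowing each gap to deviate by up to tol."""
    i, prev_t = 0, None
    for gap, label in pattern:
        while i < len(events):
            t, lbl = events[i]
            i += 1
            ok_gap = prev_t is None or abs((t - prev_t) - gap) <= tol
            if lbl == label and ok_gap:
                prev_t = t
                break
        else:
            return False
    return True

events = [(0, "visit"), (6, "alert"), (15, "visit"), (21, "fall")]
pattern = [(0, "visit"), (7, "alert"), (14, "fall")]  # gaps in days
print(supports(events, pattern, tol=2))  # True: gaps 6 and 15 are within 2 of 7 and 14
print(supports(events, pattern, tol=0))  # False
```

The trade-off studied in the paper then appears directly: larger timestamp perturbation (for privacy) requires a larger tolerance to keep the same patterns, at some cost in predictive precision.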