24 research outputs found

    Infrequent item mining in multiple data streams

    The problem of extracting infrequent patterns from streams and building associations between these patterns is becoming increasingly relevant, as many events of interest, such as attacks in network data or unusual stories in news data, occur rarely. The complexity of the problem is compounded when a system is required to deal with data from multiple streams. To address these problems, we present a framework that combines time-based association mining with a pyramidal structure that allows a rolling analysis of the stream and maintains a synopsis of the data without requiring increasing memory resources. We apply the algorithms and show the usefulness of the techniques. © 2007 Crown Copyright

    Effective anomaly detection in sensor networks data streams

    This paper addresses a major challenge in data mining applications where full information about the underlying processes, such as sensor networks or large online databases, cannot be practically obtained due to physical limitations such as low bandwidth or limited memory, storage, or computing power. Motivated by the recent theory of direct information sampling called compressed sensing (CS), we propose a framework for detecting anomalies in these large-scale data mining applications where the full information is not practically possible to obtain. Exploiting the fact that the intrinsic dimension of the data in these applications is typically small relative to the raw dimension, and the fact that compressed sensing is capable of capturing most of the information with few measurements, our work shows that spectral methods that are used for volume anomaly detection can be directly applied to the CS data with guarantees on performance. Our theoretical contributions are supported by extensive experimental results on large datasets, which show satisfactory performance.
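
The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a Gaussian random measurement matrix (`Phi`), synthetic data with a low intrinsic dimension, and a PCA residual-energy score as the spectral detector. All names and parameter values are hypothetical.

```python
import numpy as np

def residual_scores(X, k):
    """Spectral anomaly score: squared energy of each row outside
    the top-k principal subspace of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]          # projector onto the principal subspace
    R = Xc - Xc @ P                # residual part of each row
    return np.sum(R ** 2, axis=1)

rng = np.random.default_rng(0)
n, d, k, m = 200, 100, 3, 30       # samples, raw dim, intrinsic dim, measurements

# Synthetic data lying close to a k-dimensional subspace (assumption).
basis = rng.standard_normal((k, d))
X = rng.standard_normal((n, k)) @ basis + 0.01 * rng.standard_normal((n, d))
X[17] += 5.0 * rng.standard_normal(d)   # inject a single anomaly at row 17

# Compressed measurements: m << d random projections of each row.
Phi = rng.standard_normal((d, m)) / np.sqrt(m)
Y = X @ Phi

full_top = int(np.argmax(residual_scores(X, k)))  # detector on raw data
cs_top = int(np.argmax(residual_scores(Y, k)))    # same detector on CS data
print(full_top, cs_top)
```

In this toy setting the same detector is run unchanged on the 30-dimensional measurements and on the 100-dimensional raw data, which is the kind of direct reuse the abstract describes.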

    Multi-task transfer learning for in-hospital death prediction for ICU patients

    Multi-Task Transfer Learning (MTTL) is an efficient approach for learning from inter-related tasks with small sample sizes and imbalanced class distributions. Since the intensive care unit (ICU) data set (publicly available in Physionet) has subjects from four different ICU types, we hypothesize that there is an underlying relatedness amongst the various ICU types. Therefore, this study aims to explore an MTTL model for in-hospital mortality prediction of ICU patients. We used a single-task learning (STL) approach on the augmented data as well as on individual ICU data and compared the performance with the proposed MTTL model. As performance metrics, we used sensitivity (Sens), positive predictivity (+Pred), and Score. MTTL with class balancing showed the best performance, with scores of 0.78, 0.73, 0.52 and 0.63 for ICU type 1 (Coronary care unit), 2 (Cardiac surgery unit), 3 (Medical ICU) and 4 (Surgical ICU) respectively. In contrast, the maximum score obtained using the STL approach was 0.40 for ICU types 1 and 2. These results indicate that the performance of in-hospital mortality prediction can be improved by using ICU type information and by balancing the 'non-survivor' class. The findings of the study may be useful for quantifying the quality of ICU care, managing ICU resources and selecting appropriate interventions.
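
The metrics named in this abstract can be sketched as follows. Sensitivity and positive predictivity follow their standard definitions. The abstract does not define Score, so the sketch assumes it is the minimum of the two, as in the PhysioNet/CinC Challenge 2012 scoring; the function names and toy labels are hypothetical.

```python
def sens_and_ppv(y_true, y_pred):
    """Return (sensitivity, positive predictivity) for binary labels,
    where 1 marks the positive ('non-survivor') class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0   # TP / (TP + FN)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0    # TP / (TP + FP)
    return sens, ppv

def score(y_true, y_pred):
    """Assumed definition: min(Sens, +Pred)."""
    return min(sens_and_ppv(y_true, y_pred))

# Toy example: 4 non-survivors, 4 survivors.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
sens, ppv = sens_and_ppv(y_true, y_pred)
print(sens, ppv, score(y_true, y_pred))
```

Taking the minimum penalises a classifier that trades one metric for the other, which matters when the positive class is rare.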

    A framework for classifying online mental health related communities with an interest in depression

    Mental illness has a deep impact on individuals, families, and by extension, society as a whole. Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combinations, e.g., a patient with an anxiety disorder may also develop depression. This co-occurring mental health condition provides the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycho-linguistic features expressed in the posts, using these as inputs to our model. Following a machine learning technique, we have formulated a joint modelling framework in order to classify mental health-related co-occurring online communities from these features. Finally, we performed empirical validation of the model on the crawled dataset, where our model outperforms recent state-of-the-art baselines.

    Imipramine Is an Orally Active Drug against Both Antimony Sensitive and Resistant Leishmania donovani Clinical Isolates in Experimental Infection

    Background: In an endeavor to find an orally active and affordable antileishmanial drug, we tested the efficacy of a cationic amphiphilic drug, imipramine, commonly used for the treatment of depression in humans. The only available orally active antileishmanial drug is miltefosine, whose long half-life and teratogenic potential limit patient compliance. Thus there is a genuine need for an orally active antileishmanial drug. Previously it was shown that imipramine, a tricyclic antidepressant, alters the protonmotive force in promastigotes, but its in vivo efficacy was not reported. Methodology/Principal Findings: Here we show that the drug is highly active against antimony sensitive and resistant Leishmania donovani (LD) in promastigotes, in intracellular amastigotes and in an LD-infected hamster model. The drug was found to decrease the mitochondrial transmembrane potential of LD promastigotes and purified amastigotes after 8 h of treatment, whereas miltefosine effected only a marginal change even after 24 h. The drug restores the defective antigen-presenting ability of the parasitized macrophages. Levels of the host-protective factors TNF-α and IFN-γ and iNOS activity increased, with a concomitant decrease in IL-10 and TGF-β levels, in imipramine-treated infected hamsters, along with the evolution of matured sterile hepatic granuloma. A 10-day therapeutic window as a monotherapy showed about 90% clearance of organ parasites in infected hamsters regardless of their SSG sensitivity. Conclusions: This study showed that imipramine possibly qualifies for a new use of an old drug and can be used as an effective orally active drug for the treatment of Kala-azar.

    Improved risk predictions via sparse imputation of patient conditions in electronic medical records

    Electronic Medical Records (EMR) are increasingly used for risk prediction. EMR analysis is complicated by missing entries. There are two reasons: the "primary reason for admission" is included in the EMR, but the co-morbidities (other chronic diseases) are left uncoded; and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data: unlike many other datasets, EMR is sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values, and use the new representation for prediction of key hospital events. To "fill in" missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (AMI, 2652 admissions). Our results show that the AUC for 3-month admission prediction is improved significantly, from 0.741 to 0.786 for Cancer data and from 0.678 to 0.724 for AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g. emergency presentations and admissions in hospital over 3, 6 and 12 month periods) in an integrated framework. For this model, the AUC averaged over outcomes is improved significantly, from 0.768 to 0.806 for Cancer data and from 0.685 to 0.748 for AMI data.
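
The low-rank factorization step in this abstract can be illustrated with a minimal block coordinate descent sketch. It is an assumption-laden stand-in, not the paper's model: it swaps the sparsity-preserving product regularization for a plain ridge penalty, so each block update has a closed form. The function name, parameters and toy matrix are hypothetical.

```python
import numpy as np

def factorize(X, rank, lam=1e-3, iters=50, seed=0):
    """Approximate X by U @ V.T, minimising
    ||X - U V^T||_F^2 + lam * (||U||_F^2 + ||V||_F^2)
    by alternating over the two blocks U and V."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((d, rank))
    reg = lam * np.eye(rank)
    for _ in range(iters):
        # Block 1: V fixed, ridge solution U = X V (V^T V + lam I)^-1
        U = np.linalg.solve(V.T @ V + reg, V.T @ X.T).T
        # Block 2: U fixed, ridge solution V = X^T U (U^T U + lam I)^-1
        V = np.linalg.solve(U.T @ U + reg, U.T @ X).T
    return U, V

# Toy matrix with exact rank 2 (rows and columns are placeholders for
# the two axes of a feature-patient matrix).
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))

U, V = factorize(X, rank=2)
rel_err = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
print(rel_err)
```

Each update solves a small `rank` by `rank` linear system, which is what makes this style of alternating scheme scale to large matrices. Reproducing the sparse imputation described in the abstract would require replacing the ridge penalty with the paper's own regularizer.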

    Prediction of emergency events: a multi-task multi-label learning approach

    Prediction of patient outcomes is critical for planning resources in a hospital emergency department. We present a method to exploit longitudinal data from Electronic Medical Records (EMR), whilst exploiting multiple patient outcomes. We divide the EMR data into segments where each segment is a task, and all tasks are associated with multiple patient outcomes over 3, 6 and 12 month periods. We propose a model that learns a prediction function for each task-label pair, interacting through two subspaces: the first subspace is used to impose sharing across all tasks for a given label, while the second subspace captures the task-specific variations and is shared across all the labels for a given task. The proposed model is formulated as an iterative optimization problem and solved using a scalable and efficient Block co-ordinate descent (BCD) method. We apply the proposed model to two hospital cohorts, Cancer and Acute Myocardial Infarction (AMI) patients, collected over a two-year period from a large hospital emergency department. We show that the predictive performance of our proposed models is significantly better than that of several state-of-the-art multi-task and multi-label learning methods.

    Improved Subspace Clustering via Exploitation of Spatial Constraints

    We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points, by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. The experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation.

    Multiple task transfer learning with small sample sizes

    Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification of data with small numbers of samples. Conceptually, our solution is a hybrid of multi-task and transfer learning, employing data samples from source tasks as in transfer learning, but considering all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting the data from other tasks. The degree of augmentation depends on the task relatedness and is estimated directly from the data. We apply the model to three diverse real-world data sets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model to online multi-task learning, where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model to exploit sharing across tasks at the data level and joint parameter learning.