3,357 research outputs found

    Partial mixture model for tight clustering of gene expression time-course

    Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, there is little work in the literature dedicated to this area of research. On the other hand, maximum likelihood techniques have been used extensively for model parameter estimation, while the minimum distance estimator has been largely ignored. Results: In this paper we show the inherent robustness of the minimum distance estimator that makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores top in Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms. Conclusion: For the first time, the partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm proves the suitability of the combination of the partial mixture model and the minimum distance estimator in this field. We show that tight clustering is not only capable of generating a more profound understanding of the dataset under study, well in accordance with established biological knowledge, but also presents interesting new hypotheses during interpretation of clustering results. In particular, we provide biological evidence that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion.
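    For illustration, the following is a minimal sketch, not the authors' code, of minimum distance (L2E-style) estimation for a partial one-component normal mixture: the component weight is left free so that scattered points need not be absorbed into the cluster. The synthetic data, the parameterization, and the choice of optimizer are assumptions made for the example.

```python
# A minimal sketch (not the authors' code) of minimum distance (L2E) estimation
# for a partial one-component normal mixture f(x) = w * N(x; mu, sigma^2),
# where w < 1 leaves room for "scattered" points outside the cluster.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def l2e_criterion(params, x):
    """Integrated squared error criterion for a partial normal component:
       ||w*phi||^2 - (2/n) * sum_i w*phi(x_i)   (constant term dropped)."""
    mu, log_sigma, logit_w = params
    sigma = np.exp(log_sigma)
    w = 1.0 / (1.0 + np.exp(-logit_w))              # keep the weight in (0, 1)
    term1 = w**2 / (2.0 * sigma * np.sqrt(np.pi))   # integral of (w*phi)^2
    term2 = 2.0 * w * norm.pdf(x, mu, sigma).mean()
    return term1 - term2

rng = np.random.default_rng(0)
# synthetic data: 80% tight cluster + 20% scattered background
x = np.concatenate([rng.normal(0.0, 0.5, 800), rng.uniform(-10, 10, 200)])

res = minimize(l2e_criterion, x0=[np.median(x), 0.0, 0.0], args=(x,),
               method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
w_hat = 1.0 / (1.0 + np.exp(-res.x[2]))
print(f"mu={mu_hat:.3f}, sigma={sigma_hat:.3f}, weight={w_hat:.3f}")
```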

    A comparative study of the AHP and TOPSIS methods for implementing load shedding scheme in a pulp mill system

    The advancement of technology has encouraged mankind to design and create useful equipment and devices that users can fully utilise in various applications. A pulp mill is one of the heavy industries that consumes a large amount of electricity in its production, so any equipment malfunction can cause massive losses to the company. In particular, the breakdown of one generator would cause the other generators to be overloaded. In the meantime, subsequent loads are shed until the remaining generators are sufficient to supply power to the other loads; once the fault has been fixed, the load shedding scheme can be deactivated. A load shedding scheme is therefore the best way to handle such a condition: selected loads are shed under this scheme in order to protect the generators from being damaged. Multi Criteria Decision Making (MCDM) can be applied to determine the load shedding scheme in the electric power system. In this thesis, two methods, the Analytic Hierarchy Process (AHP) and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), were introduced and applied, and a series of analyses was conducted. Of the two methods, the results show that TOPSIS is the better MCDM method for the load shedding scheme in the pulp mill system, as it achieves the higher percentage effectiveness of load shedding. The results of applying the AHP and TOPSIS analyses to the pulp mill system are very promising.
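    As an illustration of the TOPSIS ranking step described above, the sketch below scores alternatives by their closeness to the ideal solution; the decision matrix, criteria, and weights are hypothetical and are not taken from the thesis.

```python
# A minimal TOPSIS sketch (illustrative only; the criteria, weights, and
# decision matrix below are hypothetical, not the thesis data).
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) against criteria (columns).
    benefit[j] = True if larger is better for criterion j."""
    m = np.asarray(matrix, dtype=float)
    # vector normalisation, then apply criterion weights
    v = (m / np.linalg.norm(m, axis=0)) * weights
    # ideal (best) and negative-ideal (worst) solutions per criterion
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti  = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)   # closeness coefficient: 1 = best, 0 = worst

# hypothetical load-shedding alternatives x criteria (load priority, shed MW, cost)
matrix  = [[0.7, 12.0, 3.1],
           [0.4, 18.5, 2.4],
           [0.9,  9.0, 4.0]]
weights = np.array([0.5, 0.3, 0.2])
benefit = np.array([False, True, False])   # e.g. shedding more MW counts as "better" here
scores = topsis(matrix, weights, benefit)
print("ranking (best first):", np.argsort(-scores))
```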

    Brain Connectivity Networks for the Study of Nonlinear Dynamics and Phase Synchrony in Epilepsy

    Assessing complex brain activity as a function of the type of epilepsy and in the context of the 3D source of seizure onset remains a critical and challenging endeavor. In this dissertation, we tried to extract the attributes of the epileptic brain by looking at the modular interactions in scalp electroencephalography (EEG). A classification algorithm is proposed for the connectivity-based separation of interictal epileptic EEG from normal EEG. Connectivity patterns of interictal epileptic discharges were investigated in different types of epilepsy, and the relation between these patterns and the epileptogenic zone is also explored in focal epilepsy. A nonlinear recurrence-based method is applied to scalp EEG recordings to obtain connectivity maps using phase synchronization attributes. The pairwise connectivity measure is obtained from time-domain data without any conversion to the frequency domain. The phase coupling value, which indicates the broadband interdependence of the input data, is utilized for a graph-theoretic interpretation of local and global connectivity activity. The method is applied to a pediatric population to delineate the epileptic cases from normal controls. A probabilistic approach demonstrated a significant difference between the two groups by successfully separating the individuals with an accuracy of 92.8%. The investigation of connectivity patterns of interictal epileptic discharges (IEDs) originating from focal and generalized seizures revealed a significant difference in the connectivity matrices. It was observed that the functional connectivity maps of focal IEDs showed local activity while generalized cases showed globally activated areas. The investigation of connectivity maps from individuals with temporal lobe epilepsy has shown the temporal and frontal areas to be the most affected regions. In general, functional connectivity measures are considered higher-order attributes that helped the delineation of epileptic individuals in the classification process. The functional connectivity patterns of interictal activities can hence serve as indicators of the seizure type and also specify the irritated regions in focal epilepsy. These findings can indeed enhance the diagnosis process with regard to the type of epilepsy and the effects of the relative location of the 3D source of seizure onset on other brain areas.
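    As a simplified illustration of building a pairwise phase-synchrony connectivity matrix from multichannel EEG, the sketch below uses the Hilbert-transform phase-locking value as a stand-in for the dissertation's recurrence-based phase coupling measure; the synthetic channels and parameters are assumptions made for the example.

```python
# A simplified sketch of a phase-synchrony connectivity matrix for multichannel
# EEG. The Hilbert-transform phase-locking value (PLV) is used here as a
# stand-in for the dissertation's recurrence-based phase coupling measure;
# the channel data are synthetic.
import numpy as np
from scipy.signal import hilbert

def plv_connectivity(eeg):
    """eeg: (n_channels, n_samples) array -> (n_channels, n_channels) PLV matrix."""
    phase = np.angle(hilbert(eeg, axis=1))        # instantaneous phase per channel
    n = eeg.shape[0]
    plv = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            dphi = phase[i] - phase[j]
            plv[i, j] = plv[j, i] = np.abs(np.mean(np.exp(1j * dphi)))
    return plv

rng = np.random.default_rng(1)
t = np.arange(0, 10, 1 / 256.0)                   # 10 s at 256 Hz
base = np.sin(2 * np.pi * 10 * t)                 # shared 10 Hz rhythm
eeg = np.vstack([base + 0.5 * rng.standard_normal(t.size) for _ in range(4)])
conn = plv_connectivity(eeg)                      # graph metrics could be computed on this
print(np.round(conn, 2))
```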

    Machine vibration monitoring for diagnostics through hypothesis testing

    Nowadays, the subject of machine diagnostics is gathering growing interest in the research field, as switching from a programmed to a preventive maintenance regime based on the real health conditions (i.e., condition-based maintenance) can lead to great advantages both in terms of safety and costs. Nondestructive tests monitoring the state of health are fundamental for this purpose. An effective form of condition monitoring is that based on vibration (vibration monitoring), which exploits inexpensive accelerometers to perform machine diagnostics. In this work, statistics and hypothesis testing will be used to build a solid foundation for damage detection by recognition of patterns in a multivariate dataset which collects simple time features extracted from accelerometric measurements. In this regard, data from high-speed aeronautical bearings were analyzed. These were acquired on a test rig built by the Dynamic and Identification Research Group (DIRG) of the Department of Mechanical and Aerospace Engineering at Politecnico di Torino. The proposed strategy was to reduce the multivariate dataset to a single index from which the health condition can be determined. This dimensionality reduction was initially performed using Principal Component Analysis, which proved to be a lossy compression. An improvement was obtained via Fisher’s Linear Discriminant Analysis, which finds the direction with maximum distance between the damaged and healthy indices; this method is still ineffective in highlighting phenomena that develop in directions orthogonal to the discriminant. Finally, a lossless compression was achieved using Mahalanobis distance-based Novelty Indices, which were also able to compensate for possible latent confounding factors. Further, considerations about the confidence, the sensitivity, the curse of dimensionality, and the minimum number of samples were also tackled to ensure statistical significance. The results obtained here were very good not only in terms of the reduced numbers of missed and false alarms, but also considering the speed of the algorithms, their simplicity, and their full independence from human interaction, which make them suitable for real-time implementation and integration in condition-based maintenance (CBM) regimes.
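    The Mahalanobis distance-based Novelty Index mentioned above can be sketched as follows: the healthy baseline features define a mean and covariance, and each new multivariate feature vector is compressed to a single scalar index. The feature values below are synthetic, not the DIRG test-rig data.

```python
# A minimal sketch of a Mahalanobis-distance novelty index: baseline (healthy)
# feature statistics define the metric, and new multivariate feature vectors
# are compressed to a single damage-sensitive index (data are synthetic).
import numpy as np

def mahalanobis_ni(baseline, features):
    """baseline: (n_healthy, d) training features; features: (n, d) vectors to score."""
    mu = baseline.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))
    diff = features - mu
    # sqrt(diff_i^T * Sigma^{-1} * diff_i) for each row i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

rng = np.random.default_rng(2)
healthy = rng.normal(0.0, 1.0, size=(500, 6))        # e.g. RMS, kurtosis, ... per channel
test_ok = rng.normal(0.0, 1.0, size=(5, 6))
test_damaged = rng.normal(1.5, 1.2, size=(5, 6))     # shifted feature distribution
print("healthy NI:", np.round(mahalanobis_ni(healthy, test_ok), 2))
print("damaged NI:", np.round(mahalanobis_ni(healthy, test_damaged), 2))
```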

    Unsupervised learning for anomaly detection in Australian medical payment data

    Fraudulent or wasteful medical insurance claims made by health care providers are costly for insurers. Typically, OECD healthcare organisations lose 3-8% of total expenditure due to fraud. As Australia’s universal public health insurer, Medicare Australia, spends approximately A$34 billion per annum on the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme, wasted spending of A$1–2.7 billion could be expected. However, fewer than 1% of claims to Medicare Australia are detected as fraudulent, below international benchmarks. Variation is common in medicine, and health conditions, along with their presentation and treatment, are heterogeneous by nature. Increasing volumes of data and rapidly changing patterns bring challenges which require novel solutions. Machine learning and data mining are becoming commonplace in this field, but no gold standard is yet available. In this project, requirements are developed for real-world application to compliance analytics at the Australian Government Department of Health and Aged Care (DoH), covering: unsupervised learning; problem generalisation; human interpretability; context discovery; and cost prediction. Three novel methods are presented which rank providers by potentially recoverable costs. These methods use association analysis, topic modelling, and sequential pattern mining to provide interpretable, expert-editable models of typical provider claims. Anomalous providers are identified through comparison to the typical models, using metrics based on the costs of excess or upgraded services. Domain knowledge is incorporated in a machine-friendly way in two of the methods through the use of the MBS as an ontology. Validation by subject-matter experts and comparison to existing techniques show that the methods perform well. The methods are implemented in a software framework which enables rapid prototyping and quality assurance. The code is implemented at the DoH, and further applications as decision-support systems are in progress. The developed requirements will apply to future work in this field.
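    As a toy illustration of the "compare each provider to a typical claims profile and rank by potentially recoverable cost" idea, the sketch below uses simple item frequencies; the item codes, fees, and providers are made up, and the thesis methods (association analysis, topic modelling, sequential pattern mining) are considerably richer.

```python
# A toy sketch of ranking providers by excess cost relative to a typical
# claims profile. Item codes, fees, and provider claims are hypothetical;
# this is not the thesis implementation.
from collections import Counter

fees = {"A": 40.0, "B": 75.0, "C": 120.0}            # hypothetical MBS-like item fees

providers = {
    "P1": ["A"] * 90 + ["B"] * 10,
    "P2": ["A"] * 50 + ["B"] * 20 + ["C"] * 30,       # heavy use of the expensive item C
    "P3": ["A"] * 80 + ["B"] * 20,
}

# "typical" profile: average rate of each item per claim across all providers
all_items = [i for claims in providers.values() for i in claims]
n_claims = sum(len(c) for c in providers.values())
typical = {i: c / n_claims for i, c in Counter(all_items).items()}

def excess_cost(claims):
    """Cost of services above the expected (typical) volume for this provider."""
    expected = {i: typical.get(i, 0.0) * len(claims) for i in fees}
    observed = Counter(claims)
    return sum(max(observed[i] - expected[i], 0.0) * fees[i] for i in fees)

ranking = sorted(providers, key=lambda p: excess_cost(providers[p]), reverse=True)
print([(p, round(excess_cost(providers[p]), 2)) for p in ranking])
```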