1,722 research outputs found

    Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data

    Get PDF
    Biomarkers which predict patient’s survival can play an important role in medical diagnosis and treatment. How to select the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper a novel method is proposed to detect the prognostic biomarkers ofsurvival in colorectal cancer patients using wavelet analysis, genetic algorithm, and Bayes classifier. One dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study one dimensional continuous wavelet transform (CWT) was proposed to extract the features of colorectal cancer data. One dimensional CWT has no ability to reduce dimensionality of data, but captures the missing features of DWT, and is complementary part of DWT. Genetic algorithm was performed on extracted wavelet coefficients to select the optimized features, using Bayes classifier to build its fitness function. The corresponding protein markers were located based on the position of optimized features. Kaplan-Meier curve and Cox regression model 2 were used to evaluate the performance of selected biomarkers. Experiments were conducted on colorectal cancer dataset and several significant biomarkers were detected. A new protein biomarker CD46 was found to significantly associate with survival time

    Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data

    Get PDF
    In the recent years, the use of motion tracking systems for acquisition of functional biomechanical gait data, has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, and this makes the dimensionality of the collected data far higher than the available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons are using robust resampling techniques such as Bootstrap632+632+and k-fold cross-validation. Results from experimentations with lesion subjects suffering from pathological plantar hyperkeratosis, show that the proposed method can lead tosim96sim 96%correct classification rates with less than 10% of the original features

    A Survey on Data Mining Techniques Applied to Energy Time Series Forecasting

    Get PDF
    Data mining has become an essential tool during the last decade to analyze large sets of data. The variety of techniques it includes and the successful results obtained in many application fields, make this family of approaches powerful and widely used. In particular, this work explores the application of these techniques to time series forecasting. Although classical statistical-based methods provides reasonably good results, the result of the application of data mining outperforms those of classical ones. Hence, this work faces two main challenges: (i) to provide a compact mathematical formulation of the mainly used techniques; (ii) to review the latest works of time series forecasting and, as case study, those related to electricity price and demand markets.Ministerio de Economía y Competitividad TIN2014-55894-C2-RJunta de Andalucía P12- TIC-1728Universidad Pablo de Olavide APPB81309

    The 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies

    Get PDF
    This publication comprises the papers presented at the 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies held at the NASA/Goddard Space Flight Center, Greenbelt, Maryland, on May 9-11, 1995. The purpose of this annual conference is to provide a forum in which current research and development directed at space applications of artificial intelligence can be presented and discussed

    Intelligent maintenance management in a reconfigurable manufacturing environment using multi-agent systems

    Get PDF
    Thesis (M. Tech.) -- Central University of Technology, Free State, 2010Traditional corrective maintenance is both costly and ineffective. In some situations it is more cost effective to replace a device than to maintain it; however it is far more likely that the cost of the device far outweighs the cost of performing routine maintenance. These device related costs coupled with the profit loss due to reduced production levels, makes this reactive maintenance approach unacceptably inefficient in many situations. Blind predictive maintenance without considering the actual physical state of the hardware is an improvement, but is still far from ideal. Simply maintaining devices on a schedule without taking into account the operational hours and workload can be a costly mistake. The inefficiencies associated with these approaches have contributed to the development of proactive maintenance strategies. These approaches take the device health state into account. For this reason, proactive maintenance strategies are inherently more efficient compared to the aforementioned traditional approaches. Predicting the health degradation of devices allows for easier anticipation of the required maintenance resources and costs. Maintenance can also be scheduled to accommodate production needs. This work represents the design and simulation of an intelligent maintenance management system that incorporates device health prognosis with maintenance schedule generation. The simulation scenario provided prognostic data to be used to schedule devices for maintenance. A production rule engine was provided with a feasible starting schedule. This schedule was then improved and the process was determined by adhering to a set of criteria. Benchmarks were conducted to show the benefit of optimising the starting schedule and the results were presented as proof. Improving on existing maintenance approaches will result in several benefits for an organisation. Eliminating the need to address unexpected failures or perform maintenance prematurely will ensure that the relevant resources are available when they are required. This will in turn reduce the expenditure related to wasted maintenance resources without compromising the health of devices or systems in the organisation

    FlashProfile: A Framework for Synthesizing Data Profiles

    Get PDF
    We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios. Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters. Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across 153153 tasks over 7575 large real datasets, we observe a median profiling time of only 0.7\sim\,0.7\,s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.Comment: 28 pages, SPLASH (OOPSLA) 201
    corecore