9 research outputs found

    Data Mining Integration with PostgreSQL Extension by K-Means, ID3 and 1R Method

    Data mining is a tool that allows users to quickly access and analyze large amounts of data. The purpose of this study was to analyze the integration of data mining algorithms into the PostgreSQL database management system. The methods used in this research are K-Means, ID3, and 1R; the tools used to implement data mining are RapidMiner and PostgreSQL. In this study, the numbers of rows analyzed are 100,000 records, 500,000 records, and 1,000,000 records. The algorithms were validated using an experimental design that measures analysis time, showing that the analysis time of the algorithms integrated into the DBMS is smaller than that of RapidMiner. As the number of records increases, data analysis becomes difficult using RapidMiner. Keywords: data mining techniques, database management system, partition, response time
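    A minimal sketch of the timing comparison the study describes, using synthetic data and scikit-learn's KMeans in place of the paper's PostgreSQL/RapidMiner setup (the record counts are the paper's; everything else is illustrative):

```python
import time

import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the paper's tables of 100k, 500k, and 1M records.
rng = np.random.default_rng(42)

for n_records in (100_000, 500_000, 1_000_000):
    data = rng.normal(size=(n_records, 4))  # four numeric attributes

    start = time.perf_counter()
    KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
    elapsed = time.perf_counter() - start

    print(f"{n_records:>9,} records: {elapsed:.2f}s")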

    U-Control Chart Based Differential Evolution Clustering for Determining the Number of Cluster in k-Means

    The automatic clustering differential evolution (ACDE) is one of the clustering methods able to determine the cluster number automatically. However, ACDE still relies on a manual strategy to determine the k activation threshold, which affects its performance. In this study, this shortcoming of ACDE is remedied using the u-control chart (UCC), and the cluster number generated by ACDE is then fed to k-means. The performance of the proposed method was tested using six public datasets from the UCI repository on academic efficiency (AE) and evaluated with the Davies-Bouldin Index (DBI) and the Cosine Similarity (CS) measure. The results show that the proposed method yields excellent performance compared to prior studies.
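    The abstract does not spell out the ACDE/UCC procedure itself; the simplified sketch below only illustrates the final evaluation step, scanning candidate cluster numbers for k-means and scoring each with the Davies-Bouldin Index (lower is better), with the Iris dataset standing in for the UCI data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import davies_bouldin_score

# Iris stands in for the study's UCI datasets.
X = load_iris().data

scores = {}
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)

best_k = min(scores, key=scores.get)
print(f"best k by DBI: {best_k} (DBI = {scores[best_k]:.3f})")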

    Improvements for determining the number of clusters in k-means for innovation databases in SMEs

    Automatic Clustering using Differential Evolution (ACDE) is one of the grouping methods capable of automatically determining the number of clusters. However, ACDE still relies on a manual strategy to determine the k activation threshold, which affects its performance. In this study, this problem of ACDE is addressed using the U Control Chart (UCC). The performance of the proposed method was tested using five datasets from the National Administrative Department of Statistics (DANE - Departamento Administrativo Nacional de Estadísticas) and the Ministry of Commerce, Industry, and Tourism of Colombia on the innovative capacity of Small and Medium-sized Enterprises (SMEs), and was assessed with the Davies-Bouldin Index (DBI) and the Cosine Similarity (CS) measure. The results show that for most datasets the proposed method outperforms prior studies, reaching the optimal cluster number with the lowest DBI and CS measures. It can be concluded that the UCC method is able to determine the k activation threshold in ACDE, enabling effective determination of the cluster number for k-means clustering.
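    For reference, the u-control chart underlying the UCC step has the standard limits UCL/LCL = u-bar ± 3·sqrt(u-bar/n); a small sketch with made-up defect counts (how the thesis maps these limits to the k activation threshold is not given in the abstract):

```python
import math

# Hypothetical defect counts for ten samples of inspected units.
defects = [4, 6, 3, 5, 7, 2, 6, 4, 5, 3]
units_per_sample = 10

u_bar = sum(defects) / (len(defects) * units_per_sample)  # average defects per unit
sigma = math.sqrt(u_bar / units_per_sample)

ucl = u_bar + 3 * sigma
lcl = max(0.0, u_bar - 3 * sigma)  # a rate cannot go negative
print(f"u-bar = {u_bar:.3f}, UCL = {ucl:.3f}, LCL = {lcl:.3f}")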

    Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data

    Mass spectrometry (MS) data provide a promising strategy for biomarker discovery, and the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of several metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS), and variable neighborhood search (VNS). To our knowledge, all of these algorithms except GA are applied here for the first time to detect relevant peakbins in MS data. All of these metaheuristic searches are embedded in two different schemes, filter and wrapper, coupled with Naive Bayes and SVM classifiers.
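    A minimal sketch of one wrapper scheme of the kind compared in the paper: a greedy forward, best-first-style search in which each candidate feature subset is scored by the cross-validated accuracy of a Naive Bayes classifier. Synthetic high-dimensional data stands in for the MS peakbins, and the actual search operators of GA, SS, and VNS are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# High-dimensional data with few samples, mimicking the MS setting.
X, y = make_classification(n_samples=80, n_features=200, n_informative=8,
                           random_state=0)

selected: list[int] = []
best_score = 0.0
while True:
    # Try adding each remaining feature; keep the best single addition.
    candidates = [f for f in range(X.shape[1]) if f not in selected]
    scored = []
    for f in candidates:
        cols = selected + [f]
        score = cross_val_score(GaussianNB(), X[:, cols], y, cv=5).mean()
        scored.append((score, f))
    score, f = max(scored)
    if score <= best_score:  # stop when no feature improves the wrapper score
        break
    best_score, selected = score, selected + [f]

print(f"selected peakbin indices: {selected}, CV accuracy: {best_score:.3f}")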

    Economic Design of X-bar Chart Using Genetic Algorithm

    The control chart is a key tool in Statistical Process Control (SPC). It is a statistical tool used to monitor the quality of a process: it gives a visual representation of the status of the process, displaying the variation present so that anyone can easily determine whether the process is in control or out of control. To design an X-bar control chart, we need to find the optimal values of the sample size, the sampling frequency, and the width of the control limits. In our work, we developed a MATLAB program based on a Genetic Algorithm for finding the optimal values of these three parameters so that the total expected cost is minimized. Our results show that the Genetic Algorithm provides better results than others reported in the literature.
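    A sketch of the approach under stated assumptions: a small genetic algorithm searching over (n, h, k), written in Python rather than MATLAB and using a deliberately toy cost function, since the paper's actual expected-cost model is not given in the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_cost(n, h, k):
    """Illustrative stand-in for an hourly cost model of an X-bar chart.

    Penalizes sampling effort (n / h), false alarms (frequent for small k),
    and slow shift detection (for large k).  Not the paper's cost function.
    """
    sampling = 0.5 * n / h
    false_alarm = 50.0 * np.exp(-k)             # wide limits -> fewer false alarms
    detection = 5.0 * k / np.sqrt(n) + 0.2 * h  # tight limits, big n -> faster
    return sampling + false_alarm + detection

# Bounds for sample size n, sampling interval h (hours), and limit width k.
lo = np.array([2.0, 0.5, 1.0])
hi = np.array([25.0, 8.0, 4.0])

pop = lo + rng.random((40, 3)) * (hi - lo)  # random initial population
for _ in range(200):
    cost = np.array([expected_cost(round(p[0]), p[1], p[2]) for p in pop])
    parents = pop[np.argsort(cost)[:20]]            # selection: keep best half
    mates = parents[rng.permutation(20)]
    children = (parents + mates) / 2                # crossover: midpoint
    children += rng.normal(0, 0.1, children.shape) * (hi - lo)  # mutation
    pop = np.clip(np.vstack([parents, children]), lo, hi)

best = pop[np.argmin([expected_cost(round(p[0]), p[1], p[2]) for p in pop])]
print(f"n = {round(best[0])}, h = {best[1]:.2f}h, k = {best[2]:.2f}")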

    Integrating Multiobjective Optimization With The Six Sigma Methodology For Online Process Control

    Over the past two decades, the Define-Measure-Analyze-Improve-Control (DMAIC) framework of the Six Sigma methodology and a host of statistical tools have been brought to bear on process improvement efforts in today’s businesses. However, a major challenge of implementing the Six Sigma methodology is maintaining the process improvements and providing real-time performance feedback and control after solutions are implemented, especially in the presence of multiple process performance objectives. The consideration of a multiplicity of objectives in business and process improvement is commonplace and, quite frankly, necessary. However, balancing the collection of objectives is challenging, as the objectives are inextricably linked and oftentimes in conflict. Previous studies have reported varied success in enhancing the Six Sigma methodology by integrating optimization methods in order to reduce variability. These studies focus primarily on enhancements within the Improve phase of the Six Sigma methodology, optimizing a single objective. The current research and practice of using the Six Sigma methodology and optimization methods do little to address real-time feedback and control for online process control in the case of multiple objectives. This research proposes an innovative integrated Six Sigma multiobjective optimization (SSMO) approach for online process control. It integrates the Six Sigma DMAIC framework with a nature-inspired optimization procedure that iteratively perturbs a set of decision variables providing feedback to the online process, eventually converging to a set of tradeoff process configurations that improves and maintains process stability. For proof of concept, the approach is applied to a general business process model, a well-known inventory management model, that is formally defined and specifies various process costs as objective functions. The proposed SSMO approach and the business process model are programmed and incorporated into a software platform. Computational experiments are performed using both three sigma (3σ)-based and six sigma (6σ)-based process control, and the results reveal that the proposed SSMO approach performs far better than the traditional approaches in improving the stability of the process. This research investigation shows that the benefits of enhancing the Six Sigma method for multiobjective optimization and for online process control are immense.
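    A heavily simplified sketch of the core SSMO idea, iteratively perturbing two decision variables of a toy inventory model and archiving the nondominated tradeoffs between two conflicting cost objectives; the model, names, and numbers are illustrative, not the dissertation's:

```python
import random

random.seed(1)

def objectives(q, r):
    """Toy inventory model: two conflicting costs for order quantity q
    and reorder point r (illustrative, not the dissertation's model)."""
    holding = 0.5 * q + 0.2 * r                 # more stock -> higher holding cost
    shortage = 500.0 / q + 200.0 / (r + 1)      # more stock -> fewer stockouts
    return holding, shortage

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

archive = []  # nondominated (holding, shortage, q, r) configurations
q, r = 50.0, 10.0
for _ in range(2000):
    # Perturb the decision variables, as the SSMO loop does with process feedback.
    q2 = max(1.0, q + random.gauss(0, 5))
    r2 = max(0.0, r + random.gauss(0, 2))
    f2 = objectives(q2, r2)
    if not any(dominates(a[:2], f2) for a in archive):
        archive = [a for a in archive if not dominates(f2, a[:2])]
        archive.append((*f2, q2, r2))
        q, r = q2, r2  # move to the new nondominated configuration

for h, s, qq, rr in sorted(archive)[:5]:
    print(f"Q = {qq:5.1f}, r = {rr:4.1f} -> holding {h:6.1f}, shortage {s:6.1f}")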

    Study on New Sampling Plans and Optimal Integration with Proactive Maintenance in Production Systems

    Sampling plans are statistical process control (SPC) tools used mainly in production processes. They are employed to control processes by monitoring the quality of produced products and alerting for necessary adjustments or maintenance. Sampling is used when an undesirable change (shift) in a process is unobservable and takes time to discover. Basically, a shift occurs when an assignable cause affects the process. Wrong setups, defective raw materials, and degraded components are examples of assignable causes. The assignable cause makes a variable (or attribute) quality characteristic shift from the desired state to an undesired state. The main concern of sampling is to observe a process shift quickly by signaling a true alarm, at which point maintenance is performed to restore the process to its normal operating conditions. While responsive maintenance is performed if a shift is detected, proactive maintenance such as age replacement is integrated with the design of sampling. A sampling plan is designed either economically or economically-statistically. An economic design does not assess system performance, whereas the economic-statistical design includes constraints on system performance such as the average outgoing quality and the effective production rate. The objective of this dissertation is to study sampling plans by attributes. Two studies are conducted. In the first study, a sampling model is developed for attribute inspection in a multistage system with multiple assignable causes that can propagate downstream. In the second study, an integrated model of sampling and maintenance with maintenance at the time of a false alarm is proposed. Most sampling plans are designed based on the occurrence of one assignable cause. Therefore, a sampling plan that allows two assignable causes to occur is developed in the first study. A multistage serial system of two unreliable machines is assumed, with one assignable cause that can occur on each machine, where the joint occurrence of the assignable causes propagates the process's shift to a higher value. As a result, the system state at any time is described by one in-control and three out-of-control states, where the evolution from one state to another depends on the competition between shifts. A stochastic methodology to model all competing scenarios is developed; this methodology forms a base that could be used if the number of machines and/or states increases. In the second study, an integrated model of sampling and scheduled maintenance is proposed. In addition to the two opportunities for maintenance, at the true alarm and at scheduled maintenance, an additional opportunity for preventive maintenance at the time of a false alarm is suggested. Since a false alarm can occur at any sampling time, the preventive maintenance effort is assumed to increase with time. The effectiveness of the proposed model is compared to that of separate models of scheduled maintenance and sampling. Inspired by these studies, several topics in sampling and maintenance are proposed for future research: two topics for integrating sampling with selective maintenance, and a third topic extending the first study to the case where more than two shifts can occur simultaneously.
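    A small Monte Carlo sketch of the competing-shifts structure described in the first study: exponential times to an assignable cause on each of two machines determine whether the system ends a run in control, shifted by one cause, or in the higher, propagated shift state. The rates and horizon are made up:

```python
import random

random.seed(7)

RATE_M1, RATE_M2 = 1 / 40.0, 1 / 60.0  # illustrative shift rates (per hour)
HORIZON = 100.0                        # production run length (hours)
RUNS = 100_000

counts = {"in-control": 0, "M1 only": 0, "M2 only": 0, "both (propagated)": 0}
for _ in range(RUNS):
    # Competing exponential times to the assignable cause on each machine.
    m1 = random.expovariate(RATE_M1) <= HORIZON
    m2 = random.expovariate(RATE_M2) <= HORIZON
    if m1 and m2:
        counts["both (propagated)"] += 1  # joint occurrence -> larger shift
    elif m1:
        counts["M1 only"] += 1
    elif m2:
        counts["M2 only"] += 1
    else:
        counts["in-control"] += 1

for state, c in counts.items():
    print(f"{state:>18}: {c / RUNS:.3f}")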

    Economic Design of Control Charts Using Metaheuristic Approaches

    Statistical Process Control (SPC) is a collection of problem-solving tools useful in achieving process stability and improving capability through the reduction of variability using statistical methods. It can help industries reduce cost, improve quality, and pursue continuous improvement. Among all the SPC tools, the control chart is the most widely used in practice, and out of all the control charts, the X-bar chart is the simplest to use and hence the most popular for monitoring and controlling processes in an industry. A process may go out of control due to a shift in the process mean and/or the process variance. To detect both types of shifts, the R chart is often used along with the X-bar chart. The design of the X-bar chart refers to the selection of three design variables: sample size (n), sampling interval (h), and width of control limits (k). The joint design of X-bar and R charts involves four design variables, i.e., sample size (n), sampling interval (h), and the widths of the control limits for both charts (k1 and k2). There are four types of control chart designs, namely (i) heuristic design, (ii) statistical design, (iii) economic design, and (iv) economic statistical design. In heuristic design, the values of the design variables are selected using rules of thumb. In statistical design, the design variables are selected in such a way that the two statistical errors, namely the Type-I error (α) and the Type-II error (β), are kept at minimum values. In economic design, a cost function is constructed involving various costs such as the cost of sampling and testing, the cost of false alarms, the cost of detecting and eliminating the assignable cause(s), and the cost of producing non-conforming products while the process is operating out of control; the design parameters of the control chart are then selected so that this cost function is minimized. The design based on the combined features of statistical design and economic design is termed economic statistical design, where the cost function is minimized while satisfying statistical constraints. The effectiveness of an economic or economic statistical design depends on the accuracy of the minimization of the cost function, so the use of effectively designed control charts is essential for ensuring quality control at minimum cost. Most researchers have used either approximate or traditional optimization techniques for minimizing the cost function, and with time, more and more efficient optimization methods have been utilized for this purpose. A number of metaheuristic algorithms are reported in the literature for optimization in various types of design problems. Out of these, one from each of two different groups is selected for the present work: simulated annealing (SA) and teaching-learning based optimization (TLBO). SA is a point-to-point-based metaheuristic technique, whereas TLBO is a population-based technique. SA is one of the oldest metaheuristic algorithms and has proved to be among the most robust, whereas TLBO is one of the most recent and promising techniques. The present work requires optimization techniques that can handle non-linear, non-differentiable, multi-variable, unconstrained as well as constrained objective functions; both of the above techniques are capable of optimizing this type of objective function. However, the literature review shows that neither of these two metaheuristic approaches has been applied to the economic or economic statistical design of any type of control chart.
    In this work, both of these metaheuristic techniques (SA and TLBO) have been applied to minimize the cost function from both the economic and the economic statistical design points of view, for the individual X-bar chart and for X-bar and R charts jointly, in the case of continuous as well as discontinuous processes. Thus, the following eight distinct design cases have been considered for optimization:
    1. Economic design of the X-bar chart for a continuous process
    2. Economic design of the X-bar chart for a discontinuous process
    3. Economic statistical design of the X-bar chart for a continuous process
    4. Economic statistical design of the X-bar chart for a discontinuous process
    5. Joint economic design of X-bar and R charts for a continuous process
    6. Joint economic design of X-bar and R charts for a discontinuous process
    7. Joint economic statistical design of X-bar and R charts for a continuous process
    8. Joint economic statistical design of X-bar and R charts for a discontinuous process
    All of the above designs are illustrated through numerical examples taken from the literature using the two metaheuristics, SA and TLBO, separately. The two independent techniques are used to validate each other's results, and their results are found to be superior to those reported earlier in the literature. Thus, eight methodologies based on the SA or TLBO approach are recommended in this thesis for designing control charts from the economic point of view. Sensitivity analysis has been carried out using fractional factorial design of experiments and analysis of variance for each of the eight design cases, to examine the effects of all the cost and process parameters on the output responses, namely sample size, sampling interval, width of control limits, and expected loss cost per unit time. The process parameters that significantly affect the output responses are identified in each of the eight design cases. These results are expected to help quality control personnel identify the significant factors and thereby take the utmost care in choosing their values while designing control charts on an economic basis.
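    A compact sketch of the TLBO half of the approach, with its teacher and learner phases, applied to an illustrative stand-in for the expected-cost function of an X-bar chart design (n, h, k); the thesis's actual cost models and parameter values are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(3)

def cost(x):
    """Illustrative stand-in for the expected cost of a design x = (n, h, k);
    not the thesis's actual model."""
    n, h, k = x
    return 0.5 * n / h + 50.0 * np.exp(-k) + 5.0 * k / np.sqrt(n) + 0.2 * h

lo = np.array([2.0, 0.5, 1.0])
hi = np.array([25.0, 8.0, 4.0])
pop = lo + rng.random((30, 3)) * (hi - lo)

for _ in range(100):
    costs = np.apply_along_axis(cost, 1, pop)

    # Teacher phase: everyone moves toward the best solution (the teacher).
    teacher = pop[np.argmin(costs)]
    tf = rng.integers(1, 3)  # teaching factor, 1 or 2
    trial = np.clip(pop + rng.random(pop.shape) *
                    (teacher - tf * pop.mean(axis=0)), lo, hi)
    better = np.apply_along_axis(cost, 1, trial) < costs
    pop[better] = trial[better]

    # Learner phase: each learner moves toward a better random classmate
    # (or away from a worse one).
    costs = np.apply_along_axis(cost, 1, pop)
    partners = rng.permutation(len(pop))
    step = np.where((costs[partners] < costs)[:, None],
                    pop[partners] - pop, pop - pop[partners])
    trial = np.clip(pop + rng.random(pop.shape) * step, lo, hi)
    better = np.apply_along_axis(cost, 1, trial) < costs
    pop[better] = trial[better]

best = pop[np.argmin(np.apply_along_axis(cost, 1, pop))]
print(f"n = {best[0]:.1f}, h = {best[1]:.2f}, k = {best[2]:.2f}")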

    Políticas de amostragem em controlo estatístico da qualidade (Sampling Policies in Statistical Quality Control)

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Statistics and Econometrics. This dissertation presents and critically studies two new adaptive sampling methods and a new performance measure for sampling methods, in the context of statistical quality control. Taking a Shewhart-type control chart for the mean as a basis, we study their statistical properties and carry out comparative studies, in terms of statistical performance, against some of the most frequently cited methods in the literature. First, we develop a new adaptive sampling method in which the intervals between samples are obtained from the density function of the standard Laplace distribution. This method proves particularly efficient at detecting moderate and large shifts in the mean, is not very sensitive to the lower bound imposed on the sampling interval, and is robust to different non-normality scenarios for the quality characteristic. In certain situations, this method is always more efficient than the method with adaptive sampling intervals, fixed sample sizes, and fixed control limit coefficients. Building on this sampling method and on a method in which the sampling intervals are defined before process monitoring starts, based on the system's cumulative hazard rate, we then present a new sampling method that combines the predefined-interval method with the adaptive-interval method. In this combined method, the sampling instants are defined as the weighted mean of the instants given by the two methods, assigning greater weight to the adaptive method for moderate shifts (where the predefined method is less effective) and greater weight to the predefined method in the remaining cases (where the adaptive method is less effective). In this way, the sampling instants, initially scheduled according to the expected occurrence of a shift based on the system lifetime distribution, are adapted according to the value of the sample statistic computed at the previous instant. This method is always more efficient than the classical periodic method, which no other adaptive scheme achieves, and more efficient than the VSI sampling method for some sampling pairs, positioning it as a strong alternative to the sampling procedures found in the literature. Finally, we present a new performance measure for sampling methods: assuming that the two methods under comparison have the same mean time of malfunction, their performance is compared through the mean number of samples collected while the process is in control. Taking the system lifetime into account, with different hazard rates, this measure proves robust and, in an economic context, allows better control of costs per unit time.
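    One plausible reading of the Laplace-based rule, sketched under that assumption: the next sampling interval is proportional to the standard Laplace density evaluated at the standardized sample mean, so samples arrive sooner as the statistic drifts from the target:

```python
import math

def next_interval(z, h_max=8.0):
    """Adaptive sampling interval from the standard Laplace density.

    A plausible sketch of the idea in the abstract: the interval is
    proportional to f(z) = exp(-|z|) / 2, rescaled so that z = 0 gives the
    maximum interval h_max.  h_max and the scaling are illustrative.
    """
    return h_max * math.exp(-abs(z))

for z in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"|z| = {z:.1f} -> next sample in {next_interval(z):.2f} hours")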