3 research outputs found

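    Klasifikasi Teks menggunakan Genetic Programming dengan Implementasi Web Scraping dan Map Reduce (Text Classification Using Genetic Programming with an Implementation of Web Scraping and Map Reduce)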

    Classifying text documents from online media is a big-data problem that requires automation. Earlier research developed a text classification system whose pre-processing uses map-reduce and whose data is collected by web scraping. This study evaluates text classification performance when genetic programming is combined with map-reduce and web scraping to process large volumes of text. Data were collected by web scraping, and 8126 duplicates were removed from the collected data. The map-reduce stage performed tokenization and stop-word removal, yielding 28507 terms, of which 4306 are unique and 24201 are duplicates. The evaluation shows that a single tree achieves higher accuracy (0.7072) than a decision tree (0.6874), with the multi-tree lowest (0.6726). For genetic programming support values, the multi-tree has the highest average support (0.3854), followed by the decision tree (0.3584), with the single tree lowest (0.3494). In general, the support values do not track the accuracy achieved.
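The tokenization, stop-word removal, and term counting described in this abstract can be sketched in map-reduce style. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stop-word list, the regular-expression tokenizer, and the function names `map_document`, `reduce_counts`, and `term_statistics` are all hypothetical.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical stop-word list; the abstract does not say which list was used.
STOP_WORDS = {"the", "a", "of", "and", "is"}


def map_document(text):
    """Map step: tokenize one document and count its non-stop-word terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def reduce_counts(left, right):
    """Reduce step: merge two partial term-count tables."""
    return left + right


def term_statistics(documents):
    """Return (total terms, unique terms, duplicate terms) over all documents."""
    counts = reduce(reduce_counts, map(map_document, documents), Counter())
    total_terms = sum(counts.values())
    unique_terms = len(counts)
    return total_terms, unique_terms, total_terms - unique_terms
```

Under this reading, duplicate terms are total terms minus unique terms, which agrees with the reported figures (28507 - 4306 = 24201).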

    Combinando Técnicas de Mineração de Dados para Melhorar o Processo de Detecção Automática de Arritmia Cardíaca (Combining Data Mining Techniques to Improve the Automatic Detection of Cardiac Arrhythmia)

    Automatic classification algorithms are promising tools for supporting the diagnosis of cardiac arrhythmia (CA), but they suffer from two problems: (1) the many numerical attributes generated when an electrocardiogram (ECG) is decomposed; and (2) the number of patients with CA is much smaller than the number considered normal (imbalanced datasets). In this work, we combine data mining techniques (i.e., clustering, feature selection, and oversampling) to build more effective classification models. In our evaluations, using a collection from UCI, we significantly improved the effectiveness of the Random Forest algorithm, reaching an accuracy of 88%, higher than the best value previously reported in the literature.
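The class-imbalance problem mentioned in this abstract is commonly addressed by oversampling the minority class. The abstract does not say which oversampling method was used, so the following is only a sketch of plain random oversampling; the function name `random_oversample` and its parameters are assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels
```

Oversampling should be applied to the training split only, so that duplicated records do not leak into the evaluation data.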

    Modeling Price Volatility based on a Genetic Programming Approach

    Business profitability depends heavily on risk management strategies that hedge uncertainty in future cash flows. Commodity price shocks and fluctuations are key risks for companies with global supply chains. The purpose of this paper is to show how Artificial Intelligence (AI) techniques can be used to model the volatility of commodity prices. More specifically, we introduce a new model, LIQ-GARCH, that uses Genetic Programming to forecast volatility. The newly generated model is then used to forecast the volatility of three series: the Commodity Research Bureau (CRB) index, West Texas Intermediate (WTI) oil futures prices, and the Baltic Dry Index (BDI). The empirical performance tests show that the newly generated model is considerably more accurate than the traditional GARCH model. As a result, this model can help businesses design optimal risk management strategies and hedge against price uncertainty.
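For context, the traditional GARCH baseline mentioned in this abstract is usually the GARCH(1,1) recursion, in which the conditional variance follows sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. The sketch below shows that standard recursion only; it is not the LIQ-GARCH model, whose form the abstract does not give, and the function name and parameter values are illustrative assumptions.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances from the standard GARCH(1,1) recursion.

    Returns len(returns) + 1 values; the last one is the one-step-ahead forecast.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")

    # Start from the unconditional (long-run) variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2
```

In practice omega, alpha, and beta are estimated from data by maximum likelihood rather than chosen by hand.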