262 research outputs found

    A Vertical and Horizontal Intelligent Dataset Reduction Approach for Cyber-Physical Power Aware Intrusion Detection Systems

    Cyber-Physical Power Systems (CPPS) have become vital targets for intruders because of the large volume of high-speed heterogeneous data provided by Wide Area Measurement Systems (WAMS). The Nonnested Generalized Exemplars (NNGE) algorithm is one of the most accurate classification techniques that can work with such CPPS data. However, NNGE tends to produce rules that test a large number of input features, which is a problem for large data volumes and hinders the scalability of any detection system. In this paper, we introduce VHDRA, a Vertical and Horizontal Data Reduction Approach, to improve the classification accuracy and speed of the NNGE algorithm and to reduce its computational resource consumption. VHDRA provides two functions: (1) it vertically reduces the dataset features by selecting the most significant features and by reducing the NNGE hyperrectangles; (2) it horizontally reduces the size of the data while preserving the original key events and patterns within the datasets, using an approach called STEM (State Tracking and Extraction Method). The experiments show that VHDRA, using both the vertical and the horizontal reduction, reduces the NNGE hyperrectangles by 29.06%, 37.34%, and 26.76% and improves the accuracy of NNGE by 8.57%, 4.19%, and 3.78% on the Multi-, Binary, and Triple class datasets, respectively. This work was made possible by NPRP Grant # NPRP9-005-1-002 from the Qatar National Research Fund (a member of Qatar Foundation).
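
The abstract does not describe how STEM works, so the following is only a minimal sketch of what a horizontal reduction could look like, assuming that a "key event" is a change of coarse system state between consecutive measurements. The function names, the `step` parameter, and the sample values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize(row: Sequence[float], step: float) -> Tuple[int, ...]:
    """Map a measurement row to a coarse state by rounding each value to a grid."""
    return tuple(round(value / step) for value in row)


def reduce_horizontally(rows: List[Sequence[float]], step: float = 0.5) -> List[Sequence[float]]:
    """Keep only the first row of every run of consecutive rows sharing a state."""
    kept = []
    previous_state = None
    for row in rows:
        state = quantize(row, step)
        if state != previous_state:  # a state transition is treated as a key event
            kept.append(row)
            previous_state = state
    return kept


# Made-up (voltage, frequency) style readings, used only for illustration.
measurements = [
    (1.00, 50.0),
    (1.01, 50.1),
    (1.02, 50.0),
    (0.40, 49.0),
    (0.41, 49.1),
    (1.00, 50.0),
]
reduced = reduce_horizontally(measurements)
print(len(measurements), "rows reduced to", len(reduced))
```

The design choice here is to trade resolution for volume: rows that do not change the quantized state are dropped, while every transition is kept in order.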

    Specimens at the Center: An Informatics Workflow and Toolkit for Specimen-level analysis of Public DNA database data

    Major public DNA databases, namely NCBI GenBank, the DNA DataBank of Japan (DDBJ), and the European Molecular Biology Laboratory (EMBL), are invaluable biodiversity libraries. Systematists and other biodiversity scientists commonly mine these databases for sequence data to use in phylogenetic studies, but such studies generally use only the taxonomic identity of the sequenced tissue, not the specimen identity. Thus studies that use DNA supermatrices to construct phylogenetic trees with species at the tips typically do not take advantage of the fact that for many individuals in the public DNA databases, several DNA regions have been sampled, and that for many species, two or more individuals have been sampled. As a result, these studies typically do not make full use of the multigene datasets in public DNA databases to test species coherence and select optimal sequences to represent a species. In this study, we introduce a set of tools developed in the R programming language to construct individual-based trees from NCBI GenBank data and present a set of trees for the genus Carex (Cyperaceae) constructed using these methods. For the more than 770 species for which we found sequence data, our approach recovered an average of 1.85 gene regions per specimen, up to seven for some specimens, and more than 450 species represented by two or more specimens. Depending on the subset of genes analyzed, we found up to 42% of species monophyletic. We introduce a simple tree statistic, the Taxonomic Disparity Index (TDI), to assist in curating specimen-level datasets and provide code for selecting maximally informative (or, conversely, minimally misleading) sequences as species exemplars. While tailored to the Carex dataset, the approach and code presented in this paper can readily be generalized to constructing individual-level trees from large amounts of data for any species group.
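
The monophyly test mentioned in the abstract can be illustrated with a short sketch. It is written in Python rather than the authors' R toolkit, and it assumes a rooted tree given as nested tuples with tip labels of the hypothetical form `species|specimen`. The TDI itself is not defined in the abstract, so it is not reproduced here.

```python
from collections import defaultdict


def tips(node):
    """Return the set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    result = frozenset()
    for child in node:
        result |= tips(child)
    return result


def clades(node):
    """Yield the tip set of every subtree, including the tree itself."""
    yield tips(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def monophyletic_species(tree):
    """For each species with two or more specimens, report whether its
    specimens form exactly one clade of the rooted tree."""
    all_clades = set(clades(tree))
    by_species = defaultdict(set)
    for tip in tips(tree):
        by_species[tip.split("|")[0]].add(tip)
    return {
        species: frozenset(members) in all_clades
        for species, members in by_species.items()
        if len(members) >= 2
    }


# Toy tree with placeholder species A, B, C; not real Carex data.
toy_tree = ((("A|1", "A|2"), "B|1"), ("B|2", "C|1"))
result = monophyletic_species(toy_tree)
share = sum(result.values()) / len(result)
print(result, share)
```

Species with a single specimen are skipped because monophyly cannot be tested for them, which matches the abstract's emphasis on species represented by two or more specimens.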

    Molecular phylogeny of pearl oysters and their relatives (Mollusca, Bivalvia, Pterioidea)

    Background: The superfamily Pterioidea is a morphologically and ecologically diverse lineage of epifaunal marine bivalves distributed throughout the tropical and subtropical continental shelf regions. This group includes commercially important pearl culture species and model organisms used for medical studies of biomineralization. Recent morphological treatment of selected pterioideans and molecular phylogenetic analyses of higher-level relationships in Bivalvia have challenged the traditional view that pterioidean families are monophyletic. This issue is examined here in light of molecular data sets composed of DNA sequences for nuclear and mitochondrial loci, and a published character data set of anatomical and shell morphological characters. Results: The present study is the first comprehensive species-level analysis of the Pterioidea to produce a well-resolved, robust phylogenetic hypothesis for nearly all extant taxa. The data were analyzed for potential biases due to taxon and character sampling, and idiosyncrasies of different molecular evolutionary processes. The congruence and contribution of different partitions were quantified, and the sensitivity of clade stability to alignment parameters was explored. Conclusions: Four primary conclusions were reached: (1) the results strongly supported the monophyly of the Pterioidea; (2) none of the previously defined families (except for the monotypic Pulvinitidae) were monophyletic; (3) the arrangement of the genera was novel and unanticipated, yet strongly supported and robust to changes in alignment parameters; and (4) optimizing key morphological characters onto topologies derived from the analysis of molecular data revealed many instances of homoplasy and uncovered synapomorphies for major nodes. Additionally, a complete species-level sampling of the genus Pinctada provided further insights into the ongoing controversy regarding the taxonomic identity of major pearl culture species.

    Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy

    Outlier detection is an important research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Most existing outlier detection methods only deal with static data of relatively low dimensionality. Recently, outlier detection for high-dimensional stream data has emerged as a new research problem. A key observation that motivates this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers in high-dimensional stream data is very challenging for several reasons. First, detecting projected outliers is difficult even for high-dimensional static data: the exhaustive search for the outlying subspaces where projected outliers are embedded is an NP problem. Second, algorithms for handling data streams are constrained to a single pass over the streaming data, under space limitations and time criticality. Existing outlier detection methods are found to be ineffective for detecting projected outliers in high-dimensional data streams. In this thesis, we present a new technique, called the Stream Projected Outlier deTector (SPOT), which attempts to detect projected outliers in high-dimensional data streams. SPOT employs an innovative window-based time model to capture dynamic statistics from stream data, and a novel data structure containing a set of top sparse subspaces to detect projected outliers effectively. SPOT also employs a multi-objective genetic algorithm as an effective search method for finding the outlying subspaces where most projected outliers are embedded. The experimental results demonstrate that SPOT is efficient and effective in detecting projected outliers in high-dimensional data streams.
    The main contribution of this thesis is that it provides a backbone for tackling the challenging problem of outlier detection in high-dimensional data streams. SPOT can facilitate the discovery of useful abnormal patterns and can potentially be applied to a variety of high-demand applications, such as sensor network data monitoring and online transaction protection.
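
The idea of a projected outlier can be made concrete with a small single-pass sketch. This is not SPOT: it assumes a fixed grid, an exponentially decayed cell count as a stand-in for the window-based time model, and an exhaustive list of 1-D and 2-D subspaces instead of a genetic search. The class name and every parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class ProjectedCellDetector:
    """Flags a point when its grid cell is sparse in some low-dimensional subspace."""

    def __init__(self, n_dims, cell_width=1.0, decay=0.99, min_density=1.5, max_subspace_dim=2):
        self.subspaces = [
            dims
            for k in range(1, max_subspace_dim + 1)
            for dims in combinations(range(n_dims), k)
        ]
        self.cell_width = cell_width
        self.decay = decay
        self.min_density = min_density
        self.counts = defaultdict(float)  # (subspace, cell) -> decayed count
        self.last_seen = {}               # (subspace, cell) -> tick of last update
        self.tick = 0

    def _cell(self, point, dims):
        return tuple(int(point[d] // self.cell_width) for d in dims)

    def update(self, point):
        """Process one point in a single pass; return the subspaces where it is sparse."""
        self.tick += 1
        sparse = []
        for dims in self.subspaces:
            key = (dims, self._cell(point, dims))
            age = self.tick - self.last_seen.get(key, self.tick)
            # Older observations fade, so the density reflects recent data only.
            self.counts[key] = self.counts[key] * (self.decay ** age) + 1.0
            self.last_seen[key] = self.tick
            if self.counts[key] < self.min_density:
                sparse.append(dims)
        return sparse


detector = ProjectedCellDetector(n_dims=3)
# Warm up with two synthetic clusters, alternating between them.
for i in range(40):
    detector.update((0.5, 0.5, 0.5) if i % 2 == 0 else (5.5, 5.5, 0.5))
# Each coordinate of this point is common on its own; only the pair (0, 1) is unusual.
flagged = detector.update((0.5, 5.5, 0.5))
print(flagged)
```

The last point looks normal in every single dimension and is only sparse in the two-dimensional projection onto dimensions 0 and 1, which is the situation the abstract calls a projected outlier. Enumerating all subspaces is feasible only for a handful of dimensions, which is why a search heuristic is needed at scale.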

    Genetic approach for optimizing ensembles of classifiers

    Proceedings of: Twenty-First International Florida Artificial Intelligence Research Society Conference (FLAIRS), Coconut Grove, Florida. May 15–17, 2008. An ensemble of classifiers is a set of classifiers whose predictions are combined in some way to classify new instances. Early research has shown that, in general, an ensemble of classifiers is more accurate than any of the single classifiers in the ensemble. Usually the gains obtained by combining different classifiers depend more on the classifiers chosen than on the combination method used. Research on this topic commonly selects the combination of classifiers and the method to combine them by hand, whereas the approach presented in this work uses genetic algorithms to select both the classifiers and the combination method. Our approach, GA-Ensemble, is inspired by a previous work called GA-Stacking, a method that uses genetic algorithms to find domain-specific Stacking configurations. The main goal of this work is to improve the efficiency of GA-Stacking and to compare GA-Ensemble with current ensemble building techniques. Preliminary results have shown that the approach finds ensembles of classifiers whose performance is as good as the best techniques, without having to set up the classifiers and the ensemble method manually.
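
A rough sketch of the idea of selecting ensemble members with a genetic algorithm follows. It is not GA-Ensemble: it assumes precomputed predictions, a fixed majority-vote combiner, a bitmask chromosome, and fitness measured on the same toy labels (a real setup would score on held-out data). All names and data are hypothetical.

```python
import random
from collections import Counter


def majority_vote(predictions, mask):
    """predictions[c][i] is classifier c's label for instance i; mask selects classifiers."""
    chosen = [preds for preds, keep in zip(predictions, mask) if keep]
    return [
        Counter(preds[i] for preds in chosen).most_common(1)[0][0]
        for i in range(len(chosen[0]))
    ]


def fitness(mask, predictions, truth):
    """Accuracy of the majority vote of the selected classifiers (0.0 if none selected)."""
    if not any(mask):
        return 0.0
    voted = majority_vote(predictions, mask)
    return sum(p == t for p, t in zip(voted, truth)) / len(truth)


def evolve(predictions, truth, generations=30, seed=0):
    """Evolve a bitmask of classifiers; needs at least two classifiers."""
    rng = random.Random(seed)
    n = len(predictions)

    def score(mask):
        return fitness(mask, predictions, truth)

    # Start from every single-classifier ensemble plus a few random masks.
    population = [tuple(int(i == j) for j in range(n)) for i in range(n)]
    population += [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(n)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: len(population) // 2]  # elitism: best half survives
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = list(a[:cut] + b[cut:])
            flip = rng.randrange(n)            # single-bit mutation
            child[flip] = 1 - child[flip]
            children.append(tuple(child))
        population = parents + children
    return max(population, key=score)


truth = [0, 1, 1, 0, 1, 0, 1, 1]
predictions = [
    [1, 0, 1, 0, 1, 0, 1, 1],  # wrong on instances 0 and 1
    [0, 1, 0, 1, 1, 0, 1, 1],  # wrong on instances 2 and 3
    [0, 1, 1, 0, 0, 1, 1, 1],  # wrong on instances 4 and 5
]
best = evolve(predictions, truth)
print(best, fitness(best, predictions, truth))
```

Because the initial population contains every single classifier and the best half always survives, the returned ensemble is never worse on the scoring data than the best individual classifier.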

    GA-stacking: Evolutionary stacked generalization

    Stacking is a widely used technique for combining classifiers and improving prediction accuracy. Early research on Stacking showed that selecting the right classifiers, their parameters, and the meta-classifier is a critical issue. Most research on this topic hand-picks the combination of classifiers and their parameters. Instead of starting from these strong initial assumptions, our approach uses genetic algorithms to search for good Stacking configurations. Since this can lead to overfitting, one of the goals of this paper is to empirically evaluate the overall efficiency of the approach. A second goal is to compare our approach with the current best Stacking building techniques. The results show that our approach finds Stacking configurations that, in the worst case, perform as well as the best techniques, with the advantage of not having to set up the structure of the Stacking system manually. This work has been partially supported by the Spanish MCyT under projects TRA2007-67374-C02-02 and TIN-2005-08818-C04. Also, it has been supported under MEC grant by TIN2005-08945-C06-05. We thank anonymous reviewers for their helpful comments.
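
To show what a Stacking configuration consists of, here is a minimal sketch in which the meta-classifier is a lookup table from base-classifier prediction patterns to the most frequent true label. This is an illustrative simplification, not the configuration searched by GA-Stacking. In practice the meta-level training data would come from cross-validated base predictions, and the names and data below are hypothetical.

```python
from collections import Counter, defaultdict


def train_meta(level0_predictions, truth):
    """Learn, for each pattern of base predictions, the most frequent true label."""
    table = defaultdict(Counter)
    for i, label in enumerate(truth):
        pattern = tuple(preds[i] for preds in level0_predictions)
        table[pattern][label] += 1
    default = Counter(truth).most_common(1)[0][0]  # fallback for unseen patterns
    learned = {pattern: votes.most_common(1)[0][0] for pattern, votes in table.items()}
    return learned, default


def predict_meta(model, pattern):
    """Classify one instance from the tuple of base-classifier predictions."""
    learned, default = model
    return learned.get(tuple(pattern), default)


truth = [0, 0, 1, 1, 0, 1]
base_a = [1, 1, 0, 1, 1, 0]  # mostly inverted with respect to the truth
base_b = [0, 0, 1, 1, 0, 0]
model = train_meta([base_a, base_b], truth)
print(predict_meta(model, (1, 0)), predict_meta(model, (0, 0)))
```

The point of the meta-level is visible in the toy data: `base_a` is often wrong, yet its output is still informative once the meta-classifier learns how to read it in combination with `base_b`.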

    Modelos híbridos de aprendizaje basados en instancias y reglas para Clasificación Monotónica

    In supervised classification problems, the response attribute depends on certain explanatory input attributes. In many real problems the response attribute takes ordinal values that should increase when some of the explanatory attributes increase. These are called classification problems with monotonicity constraints. In this thesis, we have reviewed the monotonic classifiers proposed in the literature and formalized the nested generalized exemplar learning theory to tackle monotonic classification. Two algorithms were proposed: a greedy one, which requires monotonic data, and one based on evolutionary algorithms, which is able to address imperfect data with monotonicity violations among the instances. Both improve the accuracy, the non-monotonicity index of the predictions, and the simplicity of the models over the state of the art. Tesis Univ. Jaén. Departamento INFORMÁTIC
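
A monotonicity violation can be checked directly from its definition. The sketch below counts pairs of instances in which one instance is no larger on every attribute yet has a strictly larger label. Normalizing by the total number of pairs is one possible definition of a non-monotonicity index and is an assumption here; the thesis may define it differently. The data are made up.

```python
from itertools import combinations


def dominated_by(a, b):
    """True when every attribute of a is <= the matching attribute of b."""
    return all(x <= y for x, y in zip(a, b))


def monotonic_violations(instances):
    """Return pairs (lower, upper) where lower <= upper on all attributes
    but lower carries the strictly larger ordinal label."""
    violations = []
    for (xa, ya), (xb, yb) in combinations(instances, 2):
        if dominated_by(xa, xb) and ya > yb:
            violations.append(((xa, ya), (xb, yb)))
        elif dominated_by(xb, xa) and yb > ya:
            violations.append(((xb, yb), (xa, ya)))
    return violations


def non_monotonicity_index(instances):
    """Share of instance pairs that violate monotonicity (assumed definition)."""
    n = len(instances)
    pairs = n * (n - 1) // 2
    return len(monotonic_violations(instances)) / pairs if pairs else 0.0


# (attributes, ordinal label) pairs, invented for illustration.
data = [
    ((1, 1), 0),
    ((2, 2), 1),
    ((3, 1), 2),
    ((3, 3), 1),
    ((2, 3), 2),
]
violations = monotonic_violations(data)
index = non_monotonicity_index(data)
print(violations, index)
```

Pairs whose attribute vectors are incomparable, such as `(2, 2)` and `(3, 1)`, never count as violations, because monotonicity only constrains instances that are ordered on every attribute.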

    Forecasting daily water consumption: case study of Nobres (Brazil)

    In order to improve the operational efficiency of a water supply system (waterworks, i.e. a combined water purification plant and pumping station), forecasts of water consumption 24 h ahead are required. The objective of this paper is to develop a mathematical model to forecast water consumption 24 h ahead for the city of Nobres, Mato Grosso State, Brazil. The methodology comprises the following steps: (1) literature review; (2) description of the study area; (3) data gathering and analysis (water consumption and climate); (4) proposal of a daily water consumption forecast model; (5) calibration and verification of the proposed model; and (6) application of the model. The mathematical modelling techniques employed were linear regression, Fourier series, and an expert system. The results indicated a mean percentage error of less than 10%, which shows that the model provides a good fit and can be used to predict water consumption. The main conclusion is that the model developed can be used for operational planning of the waterworks studied. The authors thank Empresa de Saneamento de Nobres Ltda. (ESAN) and Instituto Nacional de Meteorologia (INMET) for supplying data and for collaborating in the development of the research. Silva, WTP.; Campos, MM.; Santos, AA. (2016). Previsão consumo diário de água: um estudo de caso de Nobres (Brasil). Ingeniería del Agua. 20(2):73-85. doi:10.4995/ia.2016.4122.
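
How a Fourier series can be fitted to a consumption record and scored with a mean percentage error is sketched below. It covers only the Fourier component, not the paper's regression or expert-system parts. It assumes evenly spaced daily values spanning a whole number of periods, a weekly period, and synthetic data; none of these choices come from the paper.

```python
import math


def fit_fourier(values, period, harmonics):
    """Estimate the mean and (cosine, sine) coefficients by discrete projection.
    Assumes len(values) is a whole number of periods and harmonics < period / 2."""
    n = len(values)
    mean = sum(values) / n
    coefficients = []
    for k in range(1, harmonics + 1):
        angle = 2 * math.pi * k / period
        a = 2 / n * sum(v * math.cos(angle * t) for t, v in enumerate(values))
        b = 2 / n * sum(v * math.sin(angle * t) for t, v in enumerate(values))
        coefficients.append((a, b))
    return mean, coefficients


def predict(model, t, period):
    """Evaluate the fitted series at time index t."""
    mean, coefficients = model
    return mean + sum(
        a * math.cos(2 * math.pi * k * t / period) + b * math.sin(2 * math.pi * k * t / period)
        for k, (a, b) in enumerate(coefficients, start=1)
    )


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


PERIOD = 7  # assumed weekly cycle
# Synthetic four-week record; real consumption data would also contain noise.
history = [
    100 + 10 * math.cos(2 * math.pi * t / PERIOD) + 5 * math.sin(4 * math.pi * t / PERIOD)
    for t in range(28)
]
model = fit_fourier(history, PERIOD, harmonics=2)
fitted = [predict(model, t, PERIOD) for t in range(28)]
error = mean_absolute_percentage_error(history, fitted)
next_day = predict(model, 28, PERIOD)
print(round(error, 6), round(next_day, 3))
```

On noise-free synthetic data the fit is essentially exact, so the error is near zero. With measured consumption the same error measure would be compared against a threshold such as the 10% reported in the abstract.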

    Interactive Search of Rules in Medical Data Using Multiobjective Evolutionary Algorithms

    In this work, we propose an approach for evolving rules from medical data based on an interactive multi-criteria evolutionary search: besides selecting the set of criteria and the sets of potential antecedent and consequent attributes, the user can also intervene in the search process by marking uninteresting rules. The marked rules are then used to estimate a supplementary optimization criterion, which expresses the user's opinion of rule quality and is taken into account in the evolutionary process.
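
One way user feedback could enter a multi-criteria search is sketched below. It assumes three criteria to maximize: support, confidence, and an interest score estimated as one minus the highest Jaccard similarity to any rule the user marked as uninteresting. The abstract does not say how its supplementary criterion is computed, so this estimate, the attribute names, and the numbers are all hypothetical.

```python
def interest(attributes, marked_uninteresting):
    """1 minus the highest Jaccard similarity to any rule marked uninteresting."""
    if not marked_uninteresting:
        return 1.0
    similarity = max(
        len(attributes & marked) / len(attributes | marked)
        for marked in marked_uninteresting
    )
    return 1.0 - similarity


def dominates(a, b):
    """Pareto dominance for maximization: a is no worse everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(rules, marked_uninteresting):
    """Return the names of rules not dominated on (support, confidence, interest)."""
    scored = [
        (name, (support, confidence, interest(attributes, marked_uninteresting)))
        for name, attributes, support, confidence in rules
    ]
    return [
        name
        for name, objectives in scored
        if not any(dominates(other, objectives) for _, other in scored)
    ]


# (name, attributes used by the rule, support, confidence), invented values.
rules = [
    ("r1", frozenset({"age", "bmi"}), 0.30, 0.80),
    ("r2", frozenset({"age", "smoker"}), 0.25, 0.70),
    ("r3", frozenset({"smoker"}), 0.20, 0.65),
    ("r4", frozenset({"smoker", "bp"}), 0.15, 0.60),
]
front_without_feedback = pareto_front(rules, [])
front_with_feedback = pareto_front(rules, [frozenset({"age", "bmi"})])
print(front_without_feedback, front_with_feedback)
```

Without feedback a single rule dominates the rest. Once the user marks the `{"age", "bmi"}` rule as uninteresting, rules that differ from it become non-dominated and can survive in the search.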