
    A Parallel Mining Algorithm for Maximum Erasable Itemset Based on Multi-core Processor

    Mining erasable itemsets is an interesting research domain that has been applied to the problem of using limited funds efficiently to optimise production during an economic crisis. Since the problem was posed, researchers have proposed many algorithms to solve it, and mining maximum erasable itemsets has become a significant research direction. Because every subset of a maximum erasable itemset is itself erasable, all erasable itemsets can be recovered from the maximum ones, which reduces the number of both candidate and result itemsets generated during mining. However, computing the values of many itemsets still consumes substantial CPU time on large datasets, and sequential algorithms struggle to solve the problem quickly. This study therefore presents PAMMEI, a parallel algorithm for mining maximum erasable itemsets on a multi-core processor platform. The algorithm divides the mining task into multiple subtasks, assigns them to different processor cores for parallel execution, and uses an efficient pruning strategy to shrink the search space and speed up mining. To evaluate PAMMEI, the paper compares it with state-of-the-art algorithms. The experimental results show that PAMMEI outperforms the compared algorithms in runtime, memory usage and scalability.
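
    The abstract does not define erasability, so the sketch below assumes the common definition from the erasable-itemset literature: each product is a set of component items with a profit value, gain(X) is the total profit of the products that use at least one item of X, and X is erasable when gain(X) <= ξ × (total profit). A maximum erasable itemset is one with no erasable proper superset. This minimal sketch illustrates only the two ideas the abstract names, splitting the search into independent subtasks (here, one per first item) and pruning non-erasable branches; it is not PAMMEI itself, and the names `gain`, `mine_subtask` and `maximal_erasable` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import FrozenSet, List, Sequence, Set, Tuple

# Hypothetical toy format: a product is (set of component items, profit value).
Product = Tuple[FrozenSet[str], float]


def gain(itemset: FrozenSet[str], products: Sequence[Product]) -> float:
    """Profit lost if every item in `itemset` is erased: products using any of them."""
    return sum(val for comp, val in products if comp & itemset)


def mine_subtask(first: str, items: List[str], products: Sequence[Product],
                 limit: float) -> List[FrozenSet[str]]:
    """Depth-first enumeration of erasable itemsets whose smallest item is `first`."""
    results: List[FrozenSet[str]] = []

    def dfs(current: FrozenSet[str], next_idx: int) -> None:
        results.append(current)
        for j in range(next_idx, len(items)):
            candidate = current | {items[j]}
            # Pruning: gain never decreases when items are added, so a non-erasable
            # candidate has no erasable supersets and its whole branch is skipped.
            if gain(candidate, products) <= limit:
                dfs(candidate, j + 1)

    root = frozenset([first])
    if gain(root, products) <= limit:
        dfs(root, items.index(first) + 1)
    return results


def maximal_erasable(products: Sequence[Product], xi: float,
                     workers: int = 4) -> Set[FrozenSet[str]]:
    """Erasable itemsets (gain <= xi * total profit) with no erasable proper superset."""
    limit = xi * sum(val for _, val in products)
    items = sorted(set().union(*(comp for comp, _ in products)))
    # One subtask per first item; subtasks share no mutable state, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda first: mine_subtask(first, items, products, limit), items)
        erasable = [s for part in parts for s in part]
    return {s for s in erasable if not any(s < t for t in erasable)}
```

    With products ({a,b}, 100), ({a,c}, 200), ({b,d}, 150), ({c}, 50), ({d,e}, 100) and ξ = 0.5, the threshold is 300 and the maximum erasable itemsets are {a}, {b}, {c} and {d,e}. In standard CPython builds the global interpreter lock keeps these threads from running CPU-bound work in parallel, so the thread pool only demonstrates the task split; true multi-core execution would need `ProcessPoolExecutor`, with `functools.partial` in place of the lambda so the callable can be pickled. Subtasks for early items also explore larger branches, so a real implementation would need some form of load balancing.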

    Clustering and Classification of Like-Minded People from Their Tweets

    Several challenges have accompanied the growth of online social networks, among them grouping people with similar interests. Grouping like-minded people is highly important: it enables applications such as link prediction and friend or product recommendation, and helps explain various social phenomena. In this paper, we present two methods for grouping like-minded people based on their textual posts. Compared with three baseline methods, K-Means, LDA and the Scalable Multi-stage Clustering algorithm (SMSC), our algorithms achieve relative improvements on two corpora of tweets.

    Classification of n-th order limit language in formal language classes

    The study of splicing systems and their languages has grown rapidly since Paun developed a splicing system known as a regular splicing scheme, which produces a regular language. Since then, researchers have sought to classify splicing languages into classes of the Chomsky hierarchy, such as context-free, context-sensitive and recursively enumerable languages. Previous studies of the n-th order limit language approached it from a biological perspective; no research has yet examined it from the point of view of language generation. This research presents a generalization of the formal language classes to which the n-th order limit language belongs. The cases that yield an n-th order limit language are revisited and used to determine which classes of the Chomsky hierarchy the resulting n-th order limit languages fall into.
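
    As a concrete illustration of what a splicing system generates, the sketch below assumes Paun's formulation of a splicing rule u1#u2$u3#u4: from words x = x1u1u2x2 and y = y1u3u4y2 it produces x1u1u4y2. Iterating the rules from an initial language yields the splicing language. The sketch caps word length at a hypothetical `max_len` so the loop terminates, so it returns only words reachable through intermediates no longer than `max_len`; it does not compute n-th order limit languages.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

# A splicing rule u1#u2$u3#u4 stored as the tuple (u1, u2, u3, u4).
Rule = Tuple[str, str, str, str]


def occurrences(word: str, site: str) -> Iterable[int]:
    """Yield every start index of `site` in `word`, overlapping matches included."""
    start = word.find(site)
    while start != -1:
        yield start
        start = word.find(site, start + 1)


def splice(x: str, y: str, rule: Rule) -> Set[str]:
    """All words x1 u1 u4 y2 obtainable from x = x1 u1 u2 x2 and y = y1 u3 u4 y2."""
    u1, u2, u3, u4 = rule
    out: Set[str] = set()
    for i in occurrences(x, u1 + u2):
        for j in occurrences(y, u3 + u4):
            out.add(x[: i + len(u1)] + y[j + len(u3):])
    return out


def splicing_language(initial: Set[str], rules: List[Rule], max_len: int) -> Set[str]:
    """Close `initial` under the rules, keeping only words of length <= max_len."""
    lang = set(initial)
    while True:
        new: Set[str] = set()
        # A word may be spliced with itself, so ordered pairs with repetition are used.
        for x, y in product(lang, repeat=2):
            for rule in rules:
                for z in splice(x, y, rule):
                    if len(z) <= max_len and z not in lang:
                        new.add(z)
        if not new:
            return lang
        lang |= new
```

    For example, with the initial word `ab` and the rule a#$#a (u2 and u3 empty), every product has the form a^k b, so `splicing_language({"ab"}, [("a", "", "", "a")], 4)` returns {ab, aab, aaab}, a finite slice of the regular language a^+b.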

    Semantic models in Web based Educational System integration

    Web-based e-Education systems are an important kind of information system that benefit from Web standards for implementation, deployment and integration. In this paper we propose and evaluate a Semantic Web approach to support the features and interoperability of a real industrial e-Education system in production. We show how ontology-based knowledge representation supports the required features, their extension to new ones, the integration of external resources (e.g. official standards) and interoperability with other systems. We designed and implemented a proof of concept in an industrial context, evaluated it qualitatively and quantitatively, and benchmarked different alternatives on real data and real queries. We present a complete evaluation of quality of service and response time in this industrial context and show that, on a real-world testbed, Semantic Web based solutions can meet the industrial requirements in terms of both functionality and efficiency compared with existing operational solutions. We also show that ontology-oriented modelling opens up new opportunities for advanced functionalities that support resource recommendation and adaptive learning.

    On the Construction and Cryptanalysis of Multi-Ciphers

    In this work, which compiles existing techniques, we combine methods from classical cryptography and steganography to construct ciphers that conceal multiple plaintexts in a single ciphertext. We name these multi-ciphers. Most notably, we construct and cryptanalyze a Four-In-One-Cipher: the first cipher that conceals four separate plaintexts in a single ciphertext. Following a brief overview of classical cryptography and steganography, we consider strategies for creatively combining these two fields to construct multi-ciphers. Finally, we cryptanalyze three multi-ciphers constructed with the techniques described in this paper. This cryptanalysis relies both on traditional algorithms used to decode classical ciphers and on new algorithms that we use to extract the additional plaintexts concealed by the multi-ciphers. We implement these algorithms in Python and provide code snippets. The primary goal of this work is to introduce readers who may be unfamiliar with classical cryptography and steganography to both fields from a new perspective that lies at their intersection. The ideas presented in this paper could prove useful in teaching cryptography, statistics, mathematics and computer science to future generations in a unique, interdisciplinary fashion. This work might also serve as a source of creative inspiration for other cipher-making, code-breaking enthusiasts.
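
    The Four-In-One-Cipher itself is not described in the abstract. To illustrate the general idea of a multi-cipher, the toy sketch below (hypothetical, not one of the paper's constructions) conceals two plaintexts in one ciphertext: the first is Caesar-encrypted, and the second is hidden in the letter case of that ciphertext using a 5-bit binary code per letter, in the spirit of Bacon's cipher. It assumes lowercase a-z input without spaces.

```python
import string
from typing import Tuple

ALPHABET = string.ascii_lowercase


def caesar(text: str, shift: int) -> str:
    """Shift each letter of a lowercase a-z string by `shift` positions."""
    return "".join(ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in text)


def encode_two(plain1: str, plain2: str, shift: int) -> str:
    """Caesar-encrypt plain1, then hide plain2 in the letter case, 5 bits per letter."""
    cipher = caesar(plain1, shift)
    bits = "".join(format(ALPHABET.index(c), "05b") for c in plain2)
    if len(bits) > len(cipher):
        raise ValueError("plain1 is too short to carry plain2")
    bits = bits.ljust(len(cipher), "0")  # padding bits become lowercase letters
    # Bit 1 -> uppercase, bit 0 -> lowercase; the letters themselves are unchanged.
    return "".join(c.upper() if b == "1" else c for c, b in zip(cipher, bits))


def decode_two(ciphertext: str, shift: int, len2: int) -> Tuple[str, str]:
    """Recover both plaintexts, given the shift and the hidden message length."""
    plain1 = caesar(ciphertext.lower(), -shift)
    bits = "".join("1" if c.isupper() else "0" for c in ciphertext)
    plain2 = "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, 5 * len2, 5))
    return plain1, plain2


ct = encode_two("attackatdawn", "hi", 3)
print(ct)  # dwWDFnDwgdzq
print(decode_two(ct, 3, 2))  # ('attackatdawn', 'hi')
```

    Decoding needs the shift and the length of the hidden message; the padding bits are ignored. A standard Caesar attack on the lowercased text recovers only the first plaintext; finding the second requires noticing the case pattern.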

    Genetic graph-based in clustering applied to static and streaming data analysis

    Unpublished thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: December 2014.
    Unsupervised learning techniques have been widely used in data mining over the last few years. These techniques try to identify patterns in a dataset blindly. Clustering is one of the most promising fields in unsupervised learning; it consists of grouping data by similarity. This field has produced many research works that address different problems related to pattern extraction and data grouping. One of the most innovative clustering methodologies is shape-based or continuity-based clustering, which groups data according to the shape they define in the space. This dissertation focuses on how to apply Genetic Algorithms to continuity-based clustering problems. Genetic Algorithms have traditionally been used in optimization problems. They are characterized by an encoding, which represents the solution space; a population of chromosomes, which are the potential solutions; and a set of genetic operators, which evolve the solutions in order to find the best chromosome or solution. The main idea is to take advantage of their potential to generate new algorithms that improve on the performance of classical clustering algorithms, and to apply them to static and streaming data. The design of these algorithms builds on the Spectral Clustering algorithm, which studies the spectrum of a similarity graph to define the clusters. The clusters defined by Spectral Clustering usually respect the continuity of the data. Using this idea as a starting point, different graph-based genetic algorithms have been designed to deal with the continuity-based clustering problem. The algorithms developed are divided into three generations.
    The first generation is based on genetic graph-based clustering algorithms. In this generation we combined graph-based clustering and genetic algorithms to generate a graph topology over the data and find the best way to cut the graph; this cutting process determines the final clusters. The main idea is to use hybrid algorithms that combine different metrics drawn from graph theory. To evaluate their performance on real-world problems, these algorithms were also applied to text summarization.
    The second generation is based on multi-objective genetic graph-based clustering algorithms. This generation introduces the Pareto front generated by the different fitness functions used in the genetic search. The Pareto front is used to study the solution space and provides more robust and accurate solutions. In this generation we also used co-evolutionary algorithms to include the number of clusters in the search space.
    Finally, the last generation focuses on large and streaming data analysis. In this generation the previous algorithms were adapted to deal with large data by combining different methodologies, such as online clustering and MapReduce, with the main aim of studying their performance compared with other algorithms.
    The dissertation also describes other graph-based bio-inspired algorithms designed during the thesis, in this case Ant Colony Optimization clustering algorithms, which extend the study to other bio-inspired areas. Finally, to evaluate the algorithms of the different generations, we compared them with relevant and well-known clustering algorithms on synthetic and real-world datasets taken from the literature and from the UCI Machine Learning Repository.
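
    The abstract does not give the thesis's encodings or fitness functions. The sketch below is a hypothetical, single-objective illustration of the first-generation idea of using a genetic algorithm to find a good cut of a similarity graph: a chromosome assigns a cluster label to each point, the graph uses Gaussian edge weights with an assumed `sigma`, and the fitness is the normalized cut (lower is better), evolved with tournament selection, uniform crossover, per-gene mutation and elitism. All parameter values are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def similarity_graph(points: Sequence[Point], sigma: float = 1.0) -> List[List[float]]:
    """Fully connected graph with Gaussian weights exp(-d^2 / (2 sigma^2))."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = (points[i][0] - points[j][0]) ** 2 + (points[i][1] - points[j][1]) ** 2
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def normalized_cut(labels: Sequence[int], w: List[List[float]], k: int) -> float:
    """Sum over clusters A of cut(A, rest) / vol(A); an empty cluster is infeasible."""
    n = len(labels)
    total = 0.0
    for c in range(k):
        members = [i for i in range(n) if labels[i] == c]
        if not members:
            return math.inf
        vol = sum(sum(w[i]) for i in members)
        cut = sum(w[i][j] for i in members for j in range(n) if labels[j] != c)
        if vol > 0:
            total += cut / vol
    return total


def genetic_cut(points: Sequence[Point], k: int = 2, pop_size: int = 40,
                generations: int = 100, mutation: float = 0.05,
                seed: int = 0) -> Tuple[List[int], float]:
    """Evolve label vectors that minimise the normalized cut of the similarity graph."""
    rng = random.Random(seed)
    w = similarity_graph(points)
    n = len(points)

    def fitness(chrom: List[int]) -> float:
        return normalized_cut(chrom, w, k)

    def tournament(pop: List[List[int]]) -> List[int]:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best chromosome always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover, then per-gene mutation to a random label.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [rng.randrange(k) if rng.random() < mutation else g for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
    return best, fitness(best)
```

    On two well-separated groups of points, such as (0,0), (0,1), (1,0) and (10,10), (10,11), (11,10), the best chromosome should place each group in its own cluster, since that labelling has a normalized cut close to zero. Elitism guarantees that the best fitness never worsens from one generation to the next.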

    Enhancing Recommendations in Specialist Search Through Semantic-based Techniques and Multiple Resources

    Information resources abound on the Internet, but mining them is a non-trivial task. This abundance has raised the need to enhance the services provided to users, such as recommendations. The purpose of this work is to explore how better recommendations can be provided to specialists in specific domains such as bioinformatics, by introducing semantic techniques that reason over different resources and by using specialist search techniques. These techniques exploit semantic relations and hidden associations that arise from the information overlap among concepts in multiple bioinformatics resources, such as ontologies, websites and corpora. This work therefore introduces a new method that reasons over different bioinformatics resources and then discovers and exploits relations and information that may not exist in the original resources. Such relations, for example sibling and semantic-similarity relations, may be discovered as a consequence of the information overlap, and are used to improve the accuracy of recommendations of bioinformatics content (e.g. articles). In addition, this research introduces a set of semantic rules that extract semantic information and relations inferred among various bioinformatics resources. The project introduces these semantic-based methods as part of a recommendation service within a content-based system. Moreover, it uses specialists' interests to enhance the recommendations, employing a method that collects user data implicitly. It then represents these data as an adaptive ontological profile for each user based on their preferences, which contributes to more accurate recommendations for each specialist in the field of bioinformatics.
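
    The abstract describes sibling relations, implicit user data and adaptive ontological profiles without giving formulas. The hypothetical sketch below shows one way these pieces could fit together in a content-based recommender: concepts form a single parent hierarchy, siblings are concepts with the same parent, each viewed article reinforces its concepts in the profile while older interests decay, and unseen articles are scored by direct concept matches plus a discounted contribution from sibling concepts. The `decay` and `sibling_weight` values and the toy concept names are assumptions, and the sketch does not reason over multiple resources as the described method does.

```python
from typing import Dict, List, Optional, Set


def siblings(concept: str, parent: Dict[str, str]) -> Set[str]:
    """Concepts that share `concept`'s parent in the hierarchy (excluding itself)."""
    p: Optional[str] = parent.get(concept)
    if p is None:
        return set()
    return {c for c, q in parent.items() if q == p and c != concept}


def update_profile(profile: Dict[str, float], viewed: Set[str],
                   decay: float = 0.8) -> Dict[str, float]:
    """Implicit feedback: decay existing interests, then reinforce the viewed concepts."""
    for c in profile:
        profile[c] *= decay
    for c in viewed:
        profile[c] = profile.get(c, 0.0) + 1.0
    return profile


def score(concepts: Set[str], profile: Dict[str, float], parent: Dict[str, str],
          sibling_weight: float = 0.5) -> float:
    """Direct interest in the article's concepts plus discounted interest in their siblings."""
    total = 0.0
    for c in concepts:
        total += profile.get(c, 0.0)
        total += sibling_weight * sum(profile.get(s, 0.0) for s in siblings(c, parent))
    return total


def recommend(articles: Dict[str, Set[str]], profile: Dict[str, float],
              parent: Dict[str, str], seen: Set[str], top_n: int = 3) -> List[str]:
    """Rank unseen articles by score, highest first."""
    candidates = [a for a in articles if a not in seen]
    return sorted(candidates, key=lambda a: score(articles[a], profile, parent),
                  reverse=True)[:top_n]
```

    For example, with `blast` and `clustal` both children of `sequence_alignment`, a user who has only viewed an article tagged `blast` gets an unseen article tagged `clustal` ranked above one tagged `pdb`, because the sibling relation transfers part of the interest.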