
    Association rules implementation for affinity analysis between elements composing multimedia objects

    Multimedia objects are a constantly growing resource on the World Wide Web, which has created the need for methods and tools for obtaining new knowledge from the information analyzed. Association rules are a data mining technique that searches for correlations between the elements of a data collection, so that identifying and analyzing these correlations can support decision making. Algorithms used for this purpose include Apriori, Frequent Pattern Growth (FP-Growth), the QFP algorithm, CBA, CMAR and CPAR, among others. Multimedia applications, in turn, must process the unstructured data carried by multimedia objects, which are composed of text, images, audio and video. Solutions have been developed for storing, processing and managing multimedia objects that allow efficient search for data of interest to the end user, on the premise that the semantics of a multimedia object must be expressed by all the elements it is composed of. This article analyzes the state of the art in applying association rules to the processing of multimedia objects; the analysis of the consulted literature also raises questions about whether an association rule method could be developed for analyzing these objects.
    Universidad de la Costa, Universidad Pontificia Bolivariana
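The abstract defines association rules only informally. As a minimal sketch, the standard support, confidence and lift measures for a rule A -> C can be computed as below. Treating each multimedia object as a "transaction" made of its element types is an illustrative assumption, and the object data are hypothetical.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent: Itemset, consequent: Itemset, transactions: List[Itemset]) -> float:
    """supp(A ∪ C) / supp(A): how often C appears when A does (A must occur at least once)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent: Itemset, consequent: Itemset, transactions: List[Itemset]) -> float:
    """confidence / supp(C); values above 1 suggest a positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Hypothetical multimedia objects, each described by the element types it contains.
objects = [
    frozenset({"text", "image"}),
    frozenset({"text", "image", "audio"}),
    frozenset({"image", "video"}),
    frozenset({"text", "image", "video"}),
]
text, image = frozenset({"text"}), frozenset({"image"})
print(support(text | image, objects))     # 0.75
print(confidence(text, image, objects))   # 1.0
print(lift(text, image, objects))         # 1.0
```

The example shows why lift is usually reported alongside confidence: the rule text -> image has confidence 1.0, but because every object contains an image, its lift is 1.0, i.e. no association beyond what chance would give.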

    New Approach for Market Intelligence Using Artificial and Computational Intelligence

    Small and medium-sized retailers are central to the private sector and a vital contributor to economic growth, but they often face enormous challenges in unleashing their full potential. Financial pitfalls, inadequate access to markets, and difficulties in exploiting technology have prevented them from achieving optimal productivity. Market Intelligence (MI) is the knowledge extracted from numerous internal and external data sources, aimed at providing a holistic view of the state of the market and at supporting marketing-related decision-making in real time. A related, burgeoning phenomenon and crucial topic in marketing is Artificial Intelligence (AI), which entails fundamental changes to the skill sets marketers require. A vast amount of knowledge is stored in retailers' point-of-sale databases, but the format of this data often makes that knowledge hard to access and identify. As a powerful AI technique, Association Rules Mining identifies frequently associated patterns stored in large databases in order to predict customers' shopping journeys. Consequently, the method has emerged as the key driver of cross-selling and upselling in the retail industry. At the core of this approach is Market Basket Analysis, which captures knowledge from heterogeneous customer shopping patterns and examines the effects of marketing initiatives. Apriori, which enumerates the frequent itemsets purchased together (as market baskets), is the central algorithm in the analysis process. However, Apriori is computationally slow and weak at providing intelligent decision support. As the number of simultaneous database scans grows, the computational cost rises and performance drops dramatically. Decision support is also lacking, especially methods for finding rarely occurring events and for identifying a brand's rising popularity before it peaks.
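For reference, the classic Apriori algorithm named above can be sketched as follows. This is the textbook level-wise version, not the authors' Åbo algorithm; the function name and the basket data are hypothetical. Each pass of the `while` loop rescans every transaction, which is where the cost grows with the number of database scans.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]

def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Textbook level-wise Apriori: return every frequent itemset with its support."""
    n = len(transactions)
    # Level-1 candidates: every distinct single item.
    candidates: Set[Itemset] = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while candidates:
        # One full database scan per level: count baskets containing each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: union pairs of frequent k-itemsets that yield a (k+1)-itemset.
        k += 1
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical point-of-sale baskets.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter"}),
]
result = apriori(baskets, 0.5)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

With a minimum support of 0.5, all three single items and all three pairs are frequent, while the triple {bread, butter, milk} (support 0.25) is dropped.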
Because the objective of this research is to find intelligent ways to help small and medium-sized retailers grow with an MI strategy, we demonstrate the effects of AI through algorithms for data preprocessing, market segmentation, and finding market trends. Using the sales database of a small local retailer, we show how our Åbo algorithm increases mining performance and intelligence, and how it helps extract valuable marketing insights for assessing demand dynamics and product popularity trends. We also show how this results in commercial advantage and a tangible return on investment. Additionally, an enhanced normal distribution method assists data preprocessing and helps explore different types of potential anomalies.
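The abstract does not describe how the enhanced normal distribution method works. As a point of comparison only, a plain normal-distribution (z-score) rule for flagging anomalous daily sales might look like this; the threshold `k`, the function name and the sales figures are hypothetical.

```python
import statistics
from typing import List, Tuple

def flag_anomalies(values: List[float], k: float = 3.0) -> List[Tuple[int, float, float]]:
    """Return (index, value, z-score) for points more than k population std devs from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:  # all values identical: nothing can be anomalous
        return []
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > k:
            flagged.append((i, v, z))
    return flagged

# Hypothetical daily unit sales with one spike.
sales = [10, 12, 11, 13, 12, 11, 10, 12, 95, 11]
print(flag_anomalies(sales, k=2.0))
```

With the default `k = 3.0`, the spike at index 8 is not flagged (its z-score is just under 3), because the outlier itself inflates the standard deviation. This masking effect is a known limitation of the plain rule.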

    Property Based Process and Product Synthesis and Design


    A Planning Approach to Migrating Domain-specific Legacy Systems into Service Oriented Architecture

    The planning carried out before an SOA migration project is implemented is critical to its success. Until now, most of this work has been done manually. This thesis proposes an SOA migration planning approach, based on intelligent information processing methods, that semi-automates this manual work. It investigates the principal research question: "How can we obtain SOA migration planning schemas (semi-)automatically, instead of through traditional manual work, in order to determine whether legacy software systems should be migrated to an SOA computing environment?" The controlled experiment research method guides the research throughout the thesis. Data mining methods are used to analyse SOA migration sources and migration targets, and the mined information supplements the results of traditional analysis. Text similarity measurement methods are used to measure the matching relationship between migration sources and migration targets, providing a quantitative analysis of matching relationships in place of the usual qualitative analysis. Concretely, association rule and sequential pattern mining algorithms are proposed to analyse legacy assets and domain logic in order to establish a Service model and a Component model. These two algorithms can mine all motifs for any minimum support value without assuming any ordering, and they perform better than existing algorithms for establishing Service and Component models in SOA migration scenarios. Two matching strategies, one at the keyword level and one at a superficial semantic level, are described; they effectively calculate the degree of similarity between legacy components and domain services. Two decision-making methods for creating SOA migration planning schemas, one based on a similarity matrix and one on hybrid information, are investigated. Finally, a simple evaluation method is described. Two case studies on migrating legacy e-learning systems to SOA are explored.
The results show that the proposed approach is encouraging and applicable: SOA migration planning schemas can be created semi-automatically, rather than through traditional manual work, by using data mining and text similarity measurement methods.
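A keyword-level matching strategy and a similarity-matrix decision step of the kind described can be sketched with bag-of-words cosine similarity. This is an assumption about how keyword-level similarity might be computed, not the thesis's exact method: the superficial semantic level and the hybrid-information method are not shown, and the stop list, threshold, function names and all component and service names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict

STOPWORDS = {"a", "and", "of", "the", "to"}  # illustrative stop list

def keywords(text: str) -> Counter:
    """Lower-cased keyword counts with stop words removed (keyword level only)."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity_matrix(components: Dict[str, str], services: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Rows: legacy components; columns: domain services; cells: cosine similarity."""
    svc = {name: keywords(desc) for name, desc in services.items()}
    return {c: {s: cosine(keywords(desc), vec) for s, vec in svc.items()}
            for c, desc in components.items()}

def plan_from_matrix(matrix: Dict[str, Dict[str, float]], threshold: float = 0.3) -> Dict[str, str]:
    """Assign each component to its best-matching service if the score reaches `threshold`."""
    plan = {}
    for comp, row in matrix.items():
        service, score = max(row.items(), key=lambda kv: kv[1])
        if score >= threshold:
            plan[comp] = service
    return plan

components = {  # hypothetical legacy e-learning components
    "GradeBook": "store student grade records and compute course grade",
    "Forum": "post message to discussion thread",
}
services = {  # hypothetical target domain services
    "AssessmentService": "record student grade and assessment result",
    "MessagingService": "send message and manage discussion",
}
print(plan_from_matrix(similarity_matrix(components, services)))
```

Note that purely keyword-level matching treats "record" and "records" as different words; closing gaps like this is what a semantic matching level would address.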

    A framework for trend mining with application to medical data

    This thesis presents research work conducted in the field of knowledge discovery. It presents an integrated trend-mining framework and SOMA, an application of that framework to diabetic retinopathy data. Trend mining is the process of identifying and analysing trends in the variation of support of the association/classification rules extracted from longitudinal datasets. The integrated framework covers all major processes, from data preparation to the extraction of knowledge. In the pre-processing stage, data are cleaned, transformed if necessary, and sorted into time-stamped datasets using logic rules. In the next stage, the time-stamped datasets pass through the main processing step, in which an association rule mining (ARM) matrix algorithm is applied to identify frequent rules with acceptable confidence. Mathematical conditions are then applied to classify the sequences of support values into trends. Afterwards, interestingness criteria are applied to obtain interesting knowledge, and a visualization technique is proposed that maps how objects move from one time stamp to the next. A validation and verification (external and internal validation) framework is described that aims to ensure that the results at the intermediate stages of the framework are correct and that the framework as a whole can yield results that demonstrate causality. To evaluate the thesis, SOMA was developed. The dataset is itself also of interest, as it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The Royal Liverpool University Hospital has been a major centre for retinopathy research since 1991. Retinopathy is a generic term used to describe damage to the retina of the eye, which can, in the long term, lead to visual loss.
Diabetic retinopathy data are used to evaluate the framework by determining whether SOMA can extract knowledge that is already known to clinicians. The results show that these datasets can be used to extract knowledge showing causality between patient characteristics, such as age at diagnosis, type of diabetes and duration of diabetes, and diabetic retinopathy.
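The core trend-mining idea (tracking a rule's support across time-stamped datasets and classifying the resulting sequence) can be sketched as follows. The abstract does not give the thesis's mathematical conditions or its matrix ARM algorithm, so the tolerance-based rules in `classify_trend`, the function names and the patient records are all hypothetical.

```python
from typing import FrozenSet, List, Sequence

Record = FrozenSet[str]

def support_series(itemset: Record, epochs: List[List[Record]]) -> List[float]:
    """Support of `itemset` in each time-stamped dataset, in time order."""
    return [sum(1 for r in epoch if itemset <= r) / len(epoch) for epoch in epochs]

def classify_trend(series: Sequence[float], tol: float = 0.01) -> str:
    """Label a support sequence; these tolerance-based conditions are illustrative only."""
    steps = [b - a for a, b in zip(series, series[1:])]
    if all(abs(d) <= tol for d in steps):
        return "constant"
    if all(d >= -tol for d in steps):
        return "increasing"
    if all(d <= tol for d in steps):
        return "decreasing"
    return "fluctuating"

# Hypothetical yearly snapshots of patient attribute sets.
epochs = [
    [frozenset({"type2", "retinopathy"}), frozenset({"type2"}),
     frozenset({"type1"}), frozenset({"type1", "retinopathy"})],
    [frozenset({"type2", "retinopathy"}), frozenset({"type2", "retinopathy"}),
     frozenset({"type1"}), frozenset({"type2"})],
    [frozenset({"type2", "retinopathy"})] * 3 + [frozenset({"type1"})],
]
rule = frozenset({"type2", "retinopathy"})  # itemset of the rule type2 -> retinopathy
series = support_series(rule, epochs)
print(series, classify_trend(series))  # [0.25, 0.5, 0.75] increasing
```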

    A soft computing decision support framework for e-learning

    Thesis by compendium of publications.
    Supported by technological development and its impact on everyday activities, e-Learning and b-Learning (Blended Learning) have experienced rapid growth, mainly in higher education and training. Their inherent ability to overcome both physical and cultural distances, disseminate knowledge and lower the costs of the teaching-learning process allows them to reach anywhere and anyone. The educational community is divided as to their future role. It is believed that by 2019 half of the world's higher education courses will be delivered through e-Learning. Supporters say that this will be the educational mode of the future; detractors argue that it is a fad, that dropout rates are huge, and that its massification and potentially low quality will cause its decline, leaving it mainly a supporting role alongside traditional education. There are, however, two interrelated features on which there seems to be consensus. The first is the enormous amount of information and evidence that Learning Management Systems (LMS) generate during the e-Learning process, which forms the basis of the part of the process that can be automated. The second is the fundamental role of e-tutors and e-trainers as guarantors of educational quality. They are continually overwhelmed by the need to provide timely and effective feedback to students, handle countless individual situations and cases that require decision making, and process stored information. In this respect, the tools that e-Learning platforms currently provide for reporting and some level of follow-up are neither sufficient nor fully adequate. Current LMS developments are centered on this point of convergence between information and trainer, and it is here that the proposed thesis seeks to innovate. This research proposes and develops a platform focused on decision support in e-Learning environments.
Using soft computing and data mining techniques, the platform extracts knowledge from the data produced and stored by e-Learning systems, allowing the extracted knowledge to be classified, analyzed and generalized. It includes tools to identify models of students' learning behavior and, from them, to predict students' future performance and enable trainers to provide adequate feedback. Likewise, students can self-assess, avoid ineffective behavior patterns, and obtain concrete clues about how to improve their performance in the course, through appropriate routes and strategies based on the behavioral model of successful students. The methodological basis of these functionalities is Fuzzy Inductive Reasoning (FIR), which is particularly useful for modeling dynamic systems. During the research, the FIR methodology was improved and extended with several algorithms. The first is CR-FIR, an algorithm that determines the Causal Relevance of the variables involved in modeling student learning and assessment. In this thesis, CR-FIR has been tested on a comprehensive set of classical benchmark data, as well as on real datasets from different areas of knowledge. Second, the detection of atypical behaviors in virtual campuses was approached using Generative Topographic Mapping (GTM), a probabilistic alternative to the well-known Self-Organizing Maps. GTM was used simultaneously for clustering, visualization and detection of atypical data. The core of the platform is an algorithm for extracting linguistic rules in a language understandable to educational experts, which helps them identify patterns in student learning behavior.
To achieve this, the LR-FIR algorithm (Linguistic Rule extraction in FIR) was designed and developed as an extension of FIR that can both characterize general behavior and identify interesting patterns. When the platform was applied to several real e-Learning courses, the results demonstrated its feasibility and originality. Teachers rated the tool's usability very highly and consider that it could be a valuable resource for reducing the time demands that e-Learning courses place on trainers. Expert trainers have validated the usefulness of the student behavior models and the prediction processes. LR-FIR has been applied and evaluated on a wide range of real problems, not all of them educational, with good results. The structure of the platform suggests that it could be valuable in domains where knowledge management plays a predominant role or where decision-making processes are key, such as e-business, e-marketing and customer management, to mention just a few. The soft computing tools used and developed in this research (FIR, CR-FIR, LR-FIR and GTM) have also been applied successfully in other real domains, such as music, medicine and weather behavior.
Postprint (published version)

    Social media analytics and the role of twitter in the 2014 South Africa general election: a case study

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science, 2018.
    Social network sites such as Twitter have created vibrant and diverse communities in which users express their opinions and views on a variety of topics, such as politics. Extensive research has been conducted in countries such as Ireland, Germany and the United States, in which text mining techniques have been used to obtain information from politically oriented tweets. The purpose of this research was to determine whether text mining techniques can uncover meaningful information from a corpus of political tweets collected during the 2014 South African General Election. The Twitter Application Programming Interface was used to collect tweets related to the three major political parties in South Africa: the African National Congress (ANC), the Democratic Alliance (DA) and the Economic Freedom Fighters (EFF). The text mining techniques used in this research are sentiment analysis, clustering, association rule mining and word cloud analysis. In addition, a correlation analysis was performed to determine whether there is a relationship between the total number of tweets mentioning a political party and the total number of votes that party obtained. The VADER (Valence Aware Dictionary for sEntiment Reasoning) sentiment classifier was used to determine the public's sentiment towards the three main political parties. This revealed overwhelmingly neutral public sentiment towards the ANC, DA and EFF. The result produced by the VADER classifier was significantly higher than any of the baselines used in this research. The K-Means clustering algorithm successfully grouped the corpus of political tweets into political-party clusters. Clusters containing tweets relating to the ANC and the EFF were formed.
However, tweets relating to the DA were scattered across multiple clusters. A fairly strong relationship was found between the number of positive tweets mentioning the ANC and the number of votes the ANC received in the election. Owing to a lack of data, no conclusions could be drawn for the DA or the EFF. The Apriori algorithm uncovered numerous association rules, some of which were found to be interesting. The results also demonstrate the usefulness of word cloud analysis in providing easy-to-understand information from the tweet corpus used in this study. This research highlights the many ways in which text mining techniques can be used to obtain meaningful information from a corpus of political tweets. This case study can be seen as a contribution to a research effort that seeks to unlock the information contained in textual data from social network sites.
MT 201
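The correlation analysis between tweet counts and votes can be illustrated with Pearson's r. This sketch assumes paired observations (for example, one pair per region); the pairing, the function name and the numbers below are hypothetical and are not the study's data.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical paired observations (NOT the study's data).
positive_tweets = [120, 340, 90, 410, 260]
votes_thousands = [150, 390, 130, 420, 300]
print(round(pearson_r(positive_tweets, votes_thousands), 3))
```

A high r indicates only a linear association. On its own, it does not establish that tweet volume drives votes.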

    Innovation in manufacturing through digital technologies and applications: Thoughts and Reflections on Industry 4.0

    The rapid pace of developments in digital technologies offers many opportunities to increase the efficiency, flexibility and sophistication of manufacturing processes, including the potential for easier customisation, lower volumes and rapid changeover of products within the same manufacturing cell or line. A number of initiatives on this theme have been launched around the world to support national industries, under names such as Industry 4.0 (Industrie 4.0 in Germany, Made-in-China in China and Made Smarter in the UK). This book presents an overview of the state of the art and of upcoming developments in digital technologies for manufacturing. The starting point is an introduction to Industry 4.0 and its potential for enhancing the manufacturing process. The book then moves on to the design of smart (that is, digitally driven) business processes, which will rely on sensing all relevant parameters; gathering, storing and processing the data from these sensors; and applying computing power and intelligence at the most appropriate points in the digital workflow, including edge computing and parallel processing. A key component of this workflow is the application of Artificial Intelligence, and particularly Machine Learning techniques, to derive actionable information from this data, whether as real-time automated responses such as actuating transducers, as prompts for human operators to follow specified standard operating procedures, or as management data for operational and strategic planning. Consideration must also be given to the properties and behaviours of the machines being controlled and the materials being transformed during the manufacturing process; this domain is sometimes referred to as Operational Technology (OT), as opposed to IT. The digital capture of these properties and behaviours can then be used to define so-called Cyber Physical Systems.
Given the power of these digital technologies, it is of paramount importance that they operate safely and are not vulnerable to malicious interference. Industry 4.0 brings unprecedented cybersecurity challenges to manufacturing and to the industrial sector as a whole, and the case is made here that new codes of practice are needed for the combined Information Technology and Operational Technology worlds, within a framework native to Industry 4.0. Current computing technologies can also go in directions other than supporting the digital "sense to action" process described above. One of these is using digital technologies to enhance the abilities of human operators, who remain essential to the manufacturing process. One such technology, which has recently become accessible for widespread adoption, is Augmented Reality, which provides operators, hands-free, with additional real-time information in situ alongside the machines they interact with in their workspace. Finally, two linked chapters discuss the specific application of digital technologies to High Pressure Die Casting (HPDC) of magnesium components. Optimizing the HPDC process is key to increasing productivity and reducing defective parts, and the first chapter provides an overview of the HPDC process, with attention to the most common defects and their sources. It does this by first looking at real-time process control mechanisms, understanding the various process variables and assessing their impact on end-product quality. This understanding drives the choice of sensing methods and of the associated smart digital workflow, allowing real-time control and mitigation of variation in the identified variables. Data from this workflow can also be captured and used to design optimised dies and associated processes.