
    Swarm intelligence for clustering dynamic data sets for web usage mining and personalization.

    Swarm Intelligence (SI) techniques were inspired by bee swarms, ant colonies, and most recently, bird flocks. Flock-based Swarm Intelligence (FSI) has several unique features, namely decentralized control, collaborative learning, high exploration ability, and inspiration from dynamic social behavior. Thus, FSI offers a natural choice for modeling dynamic social data and solving problems in such domains. One particular case of dynamic social data is online/web usage data, which is rich in information about user activities, interests and choices. This natural analogy between SI and social behavior is the main motivation for the topic of investigation in this dissertation, with a focus on Flock-based systems, which have not been well investigated for this purpose. More specifically, we investigate the use of flock-based SI to solve two related and challenging problems by developing algorithms that form critical building blocks of intelligent personalized websites, namely, (i) providing a better understanding of online users and their activities or interests, for example using clustering techniques that can discover the groups that are hidden within the data; and (ii) reducing information overload by providing guidance to users on websites and services, typically through web personalization techniques such as recommender systems, which aim to recommend items that a user will potentially like.

    To support a better understanding of online user activities, we developed clustering algorithms that address two challenges of mining online usage data: the need for scalability to large data sets and the need to adapt clustering to dynamic data sets. To address the scalability challenge, we developed new clustering algorithms that hybridize traditional Flock-based clustering with faster K-Means-based partitional clustering algorithms. We tested our algorithms on synthetic data, real UCI Machine Learning repository benchmark data, and a data set consisting of real Web user sessions. Having linear complexity with respect to the number of data records, the resulting algorithms are considerably faster than traditional Flock-based clustering (which has quadratic complexity). Moreover, our experiments demonstrate that scalability was gained without sacrificing quality.

    To address the challenge of adapting to dynamic data, we developed a dynamic clustering algorithm that can handle the following dynamic properties of online usage data: (1) new data records can be added at any time (for example, a new user joins the site); (2) existing data records can be removed at any time (for example, an existing user no longer subscribes to a service or is terminated for violating policies); (3) new parts of existing records can arrive at any time, or old parts of an existing record can change (for example, a user's record changes as a result of additional activity such as purchasing new products, returning a product, rating new products, or modifying an existing rating). We tested our dynamic clustering algorithm on synthetic dynamic data and on a data set consisting of real online user ratings for movies. Our algorithm was shown to handle the dynamic nature of the data without sacrificing quality compared to a traditional Flock-based clustering algorithm that is re-run from scratch with each change in the data.
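    To make the hybridization idea concrete, the following minimal Python sketch (with hypothetical function and parameter names, not the dissertation's actual code) lets one agent per record drift toward agents carrying similar records for a few flock iterations and then clusters the resulting agent positions with fast K-Means. Note that this naive sketch builds a full similarity matrix and is therefore quadratic; the algorithms described above achieve linear complexity.

```python
# Illustrative hybrid of flock-style exploration and K-Means refinement.
# Names and update rules here are simplified assumptions, not the thesis code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def flock_kmeans(data, n_clusters=3, flock_iters=20, step=0.05, radius=0.3, seed=0):
    """Place one agent per record on a 2-D panel, let agents drift toward
    agents holding similar records (flock phase), then cluster the final
    agent positions with K-Means (fast partitional phase)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, 1.0, size=(len(data), 2))   # agent positions
    sim = cosine_similarity(data)                      # record-to-record similarity

    for _ in range(flock_iters):
        for i in range(len(data)):
            dist = np.linalg.norm(pos - pos[i], axis=1)
            near = (dist < radius) & (dist > 0)
            if near.any():
                # attract toward similar neighbours, repel from dissimilar ones
                pull = (sim[i, near, None] - 0.5) * (pos[near] - pos[i])
                pos[i] += step * pull.mean(axis=0)

    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(pos)
    return labels, pos
```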
    To support reducing online information overload, we developed a Flock-based recommender system to predict the interests of users, focusing in particular on collaborative filtering, or social, recommender systems. Our Flock-based recommender algorithm (FlockRecom) iteratively adjusts the position and speed of a dynamic flock of agents on a visualization panel, where each agent represents a user. It then generates the top-n recommendations for a user based on the ratings of the users represented by its neighboring agents. Our recommender system was tested on a real data set consisting of online user ratings for a set of jokes and compared to traditional user-based Collaborative Filtering (CF). Our results demonstrated that our recommender system starts at the same level of quality as traditional CF and then, with more iterations for exploration, surpasses CF's recommendation quality in terms of precision and recall. Another unique advantage of our recommender system compared to traditional CF is its ability to generate more variety, or diversity, in the set of recommended items. Our contributions advance the state of the art in Flock-based SI for clustering and making predictions in dynamic Web usage data, and therefore have an impact on improving the quality of online services.
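    A hypothetical sketch of the neighborhood-based top-n step described above, assuming agent positions produced by a flocking phase and a user-item rating matrix (the names and the simple averaging rule are assumptions, not FlockRecom's published details):

```python
import numpy as np

def top_n_from_flock(ratings, pos, user, n=10, radius=0.2):
    """ratings: users x items array with np.nan for unrated items;
    pos: 2-D agent positions produced by a flocking phase."""
    dist = np.linalg.norm(pos - pos[user], axis=1)
    neighbours = np.where((dist < radius) & (dist > 0))[0]
    if neighbours.size == 0:
        return []
    # score each item by the mean rating given by the neighbouring agents' users
    scores = np.nanmean(ratings[neighbours], axis=0)
    scores = np.where(np.isnan(scores), -np.inf, scores)   # no neighbour rated it
    scores[~np.isnan(ratings[user])] = -np.inf             # skip items the user already rated
    ranked = np.argsort(scores)[::-1][:n]
    return [int(i) for i in ranked if np.isfinite(scores[i])]
```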

    Bio-Inspired Multi-Agent Technology for Industrial Applications


    Data Mining Industrial Applications


    Analyzing the Fine Structure of Distributions

    One aim of data mining is the identification of interesting structures in data. For better analytical results, the basic properties of an empirical distribution, such as skewness and eventual clipping, i.e. hard limits in value ranges, need to be assessed. Of particular interest is the question of whether the data originate from one process or contain subsets related to different states of the data-producing process. Data visualization tools should deliver a clear picture of the univariate probability density function (PDF) for each feature. Visualization tools for PDFs typically use kernel density estimates and include the classical histogram as well as modern tools such as ridgeline plots, bean plots and violin plots. If the density estimation parameters remain at their default settings, conventional methods pose several problems when visualizing the PDF of uniform, multimodal and skewed distributions, and of distributions with clipped data. For this reason, a new visualization tool called the mirrored density plot (MD plot), which is specifically designed to discover interesting structures in continuous features, is proposed. The MD plot does not require adjusting any density estimation parameters, which makes it particularly appealing to non-experts. The visualization tools in question are evaluated against statistical tests with regard to typical challenges of explorative distribution analysis. The results of the evaluation are presented using bimodal Gaussian distributions, skewed distributions and several features with already published PDFs. In an exploratory data analysis of 12 features describing quarterly financial statements, where statistical testing poses great difficulty, only the MD plot can identify the structure of their PDFs. In sum, the MD plot outperforms the methods mentioned above. Comment: 66 pages, 81 figures, accepted in PLOS ONE.
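    Since the MD plot is essentially a mirrored density silhouette drawn per feature, the rough Python sketch below approximates that layout with a Gaussian KDE. The published MD plot relies on Pareto density estimation and its own defaults, so this is only an illustration of the mirrored layout, not a re-implementation.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def md_plot(features, labels=None, grid_points=200):
    """features: list of 1-D arrays, one per variable to visualize."""
    fig, ax = plt.subplots()
    for k, x in enumerate(features):
        x = np.asarray(x, dtype=float)
        grid = np.linspace(x.min(), x.max(), grid_points)
        dens = gaussian_kde(x)(grid)
        dens = dens / dens.max() * 0.4                         # scale each silhouette
        ax.fill_betweenx(grid, k - dens, k + dens, alpha=0.6)  # mirror around slot k
    ax.set_xticks(range(len(features)))
    if labels:
        ax.set_xticklabels(labels)
    return ax

# Example: a bimodal Gaussian mixture next to a skewed (lognormal) feature
rng = np.random.default_rng(1)
md_plot([np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)]),
         rng.lognormal(0.0, 0.5, 1000)],
        labels=["bimodal", "skewed"])
plt.show()
```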

    Identification of key research topics in 5G using co-word analysis

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. The aim of this research is to better understand the field of 5G by analyzing the more than 10,000 publications found in the Web of Science database. To achieve this, a co-word analysis was performed to identify research topics based on author keywords, and a strategic diagram was used to measure their level of maturity and relevance to the field. In total, the analysis grouped the articles into seven topics, of which two are mature but peripheral, one is both well developed and central to the field, and the rest are central but underdeveloped. The value of this research lies in applying a well-established technique that has been used in many fields, but never before in the field of 5G, which is growing in relevance.
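    As an illustration of the machinery behind such a study, the Python sketch below (with illustrative names; the study's actual clustering and thresholds are not reproduced) builds a keyword co-occurrence graph from author keywords and computes the centrality and density coordinates used to place topics on a strategic diagram.

```python
from itertools import combinations
from collections import Counter

import networkx as nx

def coword_graph(documents):
    """documents: list of author-keyword lists, one list per publication."""
    cooc = Counter()
    for keywords in documents:
        for a, b in combinations(sorted(set(keywords)), 2):
            cooc[(a, b)] += 1
    g = nx.Graph()
    for (a, b), w in cooc.items():
        g.add_edge(a, b, weight=w)
    return g

def strategic_coordinates(g, communities):
    """communities: mapping of topic name -> iterable of keywords.
    Density = total weight of links inside the topic;
    centrality = total weight of links from the topic to the rest of the network."""
    coords = {}
    for name, members in communities.items():
        members = set(members)
        internal = sum(d["weight"] for u, v, d in g.edges(members, data=True)
                       if u in members and v in members)
        external = sum(d["weight"] for u, v, d in g.edges(members, data=True)
                       if (u in members) != (v in members))
        coords[name] = (external, internal)   # (centrality, density)
    return coords
```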

    Symbiotic Organisms Search Algorithm: theory, recent advances and applications

    The symbiotic organisms search algorithm is a very promising recent metaheuristic algorithm. It has received a great deal of attention from all areas of numerical optimization research, as well as from engineering design practice, and it has since undergone several modifications, either in the form of hybridizations or as other improved variants of the original algorithm. However, despite the remarkable achievements and the rapidly expanding body of literature on the symbiotic organisms search algorithm in the short time since its appearance among swarm intelligence optimization techniques, there has been no collective and comprehensive study of the success of its various implementations. As a way forward, this paper provides an overview of the research conducted on symbiotic organisms search algorithms from their inception to the time of writing, detailing various application scenarios, variants and hybrid implementations, and suggesting directions for future research.
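    For readers unfamiliar with the algorithm, the sketch below shows the three phases of the canonical symbiotic organisms search (mutualism, commensalism and parasitism) for continuous minimization. It follows the commonly cited formulation; individual variants in the literature differ in the details.

```python
# Compact sketch of the three Symbiotic Organisms Search phases for continuous
# minimization; parameter choices and bound handling are simplified assumptions.
import numpy as np

def sos(f, bounds, pop_size=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    dim = lo.size
    pop = rng.uniform(lo, hi, (pop_size, dim))
    fit = np.apply_along_axis(f, 1, pop)

    for _ in range(iters):
        best = pop[fit.argmin()].copy()
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])

            # Mutualism: i and j both benefit from moving toward the best organism
            mutual = (pop[i] + pop[j]) / 2
            bf1, bf2 = rng.integers(1, 3), rng.integers(1, 3)   # benefit factors 1 or 2
            xi = np.clip(pop[i] + rng.random(dim) * (best - mutual * bf1), lo, hi)
            xj = np.clip(pop[j] + rng.random(dim) * (best - mutual * bf2), lo, hi)
            for idx, cand in ((i, xi), (j, xj)):
                fc = f(cand)
                if fc < fit[idx]:
                    pop[idx], fit[idx] = cand, fc

            # Commensalism: i benefits from j, j is unaffected
            xi = np.clip(pop[i] + rng.uniform(-1, 1, dim) * (best - pop[j]), lo, hi)
            fxi = f(xi)
            if fxi < fit[i]:
                pop[i], fit[i] = xi, fxi

            # Parasitism: a partially randomized copy of i tries to replace j
            parasite = pop[i].copy()
            mask = rng.random(dim) < 0.5
            parasite[mask] = rng.uniform(lo, hi)[mask]
            fp = f(parasite)
            if fp < fit[j]:
                pop[j], fit[j] = parasite, fp

    return pop[fit.argmin()], fit.min()

# Example: minimize the sphere function in 5 dimensions
best_x, best_f = sos(lambda x: float(np.sum(x ** 2)),
                     (np.full(5, -5.0), np.full(5, 5.0)))
```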

    The use of computational intelligence for security in named data networking

    Information-Centric Networking (ICN) has recently been considered a promising paradigm for the next-generation Internet, shifting from the sender-driven, end-to-end communication paradigm to a receiver-driven content-retrieval paradigm. In ICN, content (rather than hosts, as in IP-based designs) plays the central role in communication. This change from host-centric to content-centric has several significant advantages, such as reduced network load, low dissemination latency, and scalability. Strong security has been one of the main design requirements of ICN architectures from the very beginning. Named Data Networking (NDN), also referred to as Content-Centric Networking (CCN) or Data-Centric Networking (DCN), is one of these architectures; it is the focus of an ongoing research effort that aims to shape how the Internet will operate in the future. Existing research into the security of NDN is at an early stage, and many designs are still incomplete; to make NDN a fully working system at Internet scale, many missing pieces remain to be filled in. In this dissertation, we study the four most important security issues in NDN: anomaly detection, DoS/DDoS attacks, congestion control, and cache pollution attacks. The aim is to defend against new and potentially unknown forms of attack, ensure privacy, achieve high availability, and block malicious network traffic or at least limit its effectiveness. To protect the NDN infrastructure, we need flexible, adaptable and robust defense systems that can make intelligent, real-time decisions and enable network entities to behave in an adaptive and intelligent manner. In this context, the characteristics of Computational Intelligence (CI) methods, such as adaptation, fault tolerance, high computational speed and resilience to noisy information, make them well suited to the problem of NDN security and highlight promising new research directions. Hence, we suggest new hybrid CI-based methods to make NDN a more reliable and viable architecture for the future Internet.

    Rethinking the Delivery Architecture of Data-Intensive Visualization

    The web has transformed the way people create and consume information. However, data-intensive science applications have rarely been able to take full advantage of the web ecosystem. Analysis and visualization have remained close to large datasets on large servers and desktops, because of the vast resources that data-intensive applications require. This hampers the accessibility and on-demand availability of data-intensive science. In this work, I propose a novel architecture for delivering interactive, data-intensive visualization to the web ecosystem. The proposed architecture, codenamed Fabric, keeps the server side oblivious to application logic by structuring it as a set of scalable microservices that (1) manage data and (2) compute data products. Disconnected from application logic, the services allow interactive data-intensive visualization to be simultaneously accessible to many users. Meanwhile, the client side of this architecture treats visualization applications as an interaction-in, image-out black box whose sole responsibility is to keep track of application state and map interactions into well-defined, structured visualization requests. Fabric essentially provides a separation of concerns that decouples the otherwise tightly coupled client and server seen in traditional data applications. Initial results show that, as a result, Fabric enables high audience scalability and scientific reproducibility, and improves control and protection of data products.
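    The interaction-in, image-out contract can be illustrated with a small sketch. The class and field names below are hypothetical, not Fabric's actual API; they merely show how keying a stateless service on structured requests makes results cacheable, reproducible and shareable across many users.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class VisRequest:
    dataset: str                                   # identifier of a managed dataset
    view: str                                      # e.g. "histogram", "isosurface"
    params: tuple = field(default_factory=tuple)   # hashable (name, value) pairs

class RenderService:
    """Stateless on purpose: the output depends only on the request, so identical
    requests can be cached, replayed for reproducibility, and served to many users."""
    def __init__(self, renderers):
        self._renderers = renderers   # view name -> pure rendering function
        self._cache = {}

    def render(self, req: VisRequest) -> bytes:
        if req not in self._cache:
            self._cache[req] = self._renderers[req.view](req.dataset, dict(req.params))
        return self._cache[req]

# Client side: map an interaction (e.g. a slider change) into a structured request
service = RenderService({"histogram": lambda ds, p: f"png:{ds}:{p}".encode()})
image = service.render(VisRequest("simulation-042", "histogram", (("bins", 64),)))
```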