
    A Bibliographic View on Constrained Clustering

    Full text link
    A keyword search for constrained clustering on Web of Science returned just under 3,000 documents. We ran automatic analyses of those and compiled our own bibliography of 183 papers, which we analysed in more detail based on their topic and experimental study, if any. This paper presents general trends of the area and its sub-topics through Pareto analysis, using citation count and year of publication. We list available software and analyse the experimental sections of our reference collection. We found a notable lack of large comparison experiments. Among the topics we reviewed, application studies were the most abundant recently, alongside deep learning, active learning and ensemble learning. Comment: 18 pages, 11 figures, 177 references
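    The paper's exact procedure is not reproduced here, but the kind of Pareto analysis it describes can be illustrated with a minimal sketch: keep the papers that are not dominated in both citation count and recency. The paper records below are invented.

```python
# Hedged sketch of a Pareto analysis over (year, citation count).
# A paper is kept unless some other paper is at least as recent AND at least as cited.
papers = [
    {"title": "A", "year": 2015, "citations": 250},
    {"title": "B", "year": 2021, "citations": 40},
    {"title": "C", "year": 2019, "citations": 120},
    {"title": "D", "year": 2013, "citations": 90},
]

def pareto_front(items):
    front = []
    for p in items:
        dominated = any(
            q["year"] >= p["year"] and q["citations"] >= p["citations"]
            and (q["year"], q["citations"]) != (p["year"], p["citations"])
            for q in items
        )
        if not dominated:
            front.append(p)
    return front

print([p["title"] for p in pareto_front(papers)])  # ['A', 'B', 'C']; D is dominated by A
```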

    3rd Workshop in Symbolic Data Analysis: book of abstracts

    Get PDF
    This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. The main aim of the event is to bring people together and foster the exchange of ideas across the different fields - Mathematics, Statistics, Computer Science, Engineering and Economics, among others - that contribute to Symbolic Data Analysis.

    On improving the performance of optimistic distributed simulations

    No full text
    This report investigates means of improving the performance of optimistic distributed simulations without affecting simulation accuracy. We argue that existing clustering algorithms are not adequate for application in distributed simulations, and outline some characteristics of an ideal algorithm that could be applied in this field. The report is structured as follows. We start by introducing the area of distributed simulation. Following a comparison of the dominant protocols used in distributed simulation, we elaborate on current approaches to improving simulation performance, such as computationally efficient techniques, exploiting the hardware configuration of processors, and optimizations derived from the simulation scenario. We introduce the core characteristics of clustering approaches and argue that these cannot be applied to real-life distributed simulation problems. We present a typical distributed simulation setting and elaborate on the reasons why existing clustering approaches are not expected to improve the performance of a distributed simulation. We introduce a prototype distributed simulation platform that has been developed within the scope of this research, focusing on the area of emergency response and specifically building evacuation. We continue by outlining our current work on this issue, and finally we conclude the report by outlining the next steps that could be taken in this field.

    Numerical Relativity: A review

    Full text link
    Computer simulations are enabling researchers to investigate systems that are extremely difficult to handle analytically. In the particular case of General Relativity, numerical models have proved extremely valuable for investigating strong-field scenarios and have been crucial in revealing unexpected phenomena. Considerable effort is being spent on astrophysically relevant simulations, on understanding different aspects of the theory, and even on providing insights in the search for a quantum theory of gravity. In this article I review the present status of the field of Numerical Relativity, describe the techniques most commonly used, and discuss open problems and (some) future prospects. Comment: 2 references added, 1 corrected. 67 pages. To appear in Classical and Quantum Gravity. (uses iopart.cls)
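    The review itself is not reproduced here; as one concrete piece of the machinery such simulations rest on, the standard 3+1 (ADM) split of the spacetime metric into a lapse, a shift vector and a spatial metric is sketched below. The notation is the conventional one and is not taken from this particular article.

```latex
% Standard 3+1 (ADM) form of the line element used throughout numerical relativity:
% \alpha is the lapse, \beta^i the shift vector, \gamma_{ij} the spatial metric.
\[
  ds^2 = -\alpha^2\,dt^2 + \gamma_{ij}\left(dx^i + \beta^i\,dt\right)\left(dx^j + \beta^j\,dt\right)
\]
```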

    Greedy routing and virtual coordinates for future networks

    Get PDF
    At the core of the Internet, routers are continuously struggling with ever-growing routing and forwarding tables. Although hardware advances do accommodate such growth, we anticipate new requirements, e.g. in data-oriented networking, where each content piece has to be referenced instead of hosts, such that current approaches relying on global information will no longer be viable, no matter the hardware progress. In this thesis, we investigate greedy routing methods that can achieve routing performance similar to today's while using far fewer resources and relying on local information only. To this end, we add specially crafted name spaces to the network in which virtual coordinates represent the addressable entities. Our scheme enables participating routers to make forwarding decisions using only neighbourhood information, as the overarching pseudo-geometric name space structure already organizes and incorporates "vicinity" at a global level.

    A first challenge to the application of greedy routing on virtual coordinates to future networks is that of "routing dead-ends", local minima caused by the difficulty of consistent coordinate attribution. In this context, we propose a routing recovery scheme based on a multi-resolution embedding of the network in low-dimensional Euclidean spaces. The recovery is performed by routing greedily on a blurrier view of the network, the different detail levels being obtained through the embedding of clustering levels of the graph. When compared with higher-dimensional embeddings of a given network, our method shows a significant reduction in routing failures for similar header and control-state sizes.

    A second challenge to the application of virtual coordinates and greedy routing to future networks is the support of "customer-provider" as well as "peering" relationships between participants, resulting in a differentiated-services environment. Although an application of greedy routing within such a setting would combine two very common fields of today's networking literature, such a scenario has, surprisingly, not been studied so far. We propose two approaches to address it. In the first approach we implement a path-vector protocol similar to that of BGP on top of a greedy embedding of the network. This allows each node to build a spatial map associated with each of its neighbours indicating the accessible regions. Routing is then performed through the use of a decision-tree classifier taking the destination coordinates as input. When applied to a real-world dataset (the CAIDA 2004 AS graph), we demonstrate a compression ratio of up to 40% for the routing control information at the network's core, as well as a computationally efficient decision process comparable to methods such as binary trees and tries. In the second approach, we take inspiration from consensus-finding in the social sciences and transform the three-dimensional distance data structure (where the third dimension encodes the service differentiation) into a two-dimensional matrix on which classical embedding tools can be used. This transformation is achieved by agreeing on a set of constraints on the inter-node distances that guarantee administratively correct greedy routing. The computed distances are also enhanced to encode multipath support. We demonstrate good greedy routing performance, as well as above 90% satisfaction of multipath constraints, when relying on the obtained (non-embedded) distances on synthetic datasets. As the various embeddings of the consensus distances do not fully exploit their multipath potential, the use of compression techniques such as transform coding to approximate the obtained distances allows for better routing performance.
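    The thesis' implementation is not available here; as a rough, hedged sketch of the greedy-forwarding-with-fallback idea described above (not the actual algorithm or data structures of the thesis), one can forward to the neighbour closest to the destination in the virtual coordinate space and, on a dead-end, retry on a coarser set of coordinates. The graph, coordinates and level structure below are invented.

```python
# Sketch: greedy forwarding on virtual coordinates with a multi-resolution fallback.
# coords[level][node] gives the node's coordinates at a given detail level.
import math

def greedy_route(graph, coords, src, dst, levels, max_hops=100):
    path, current, level = [src], src, 0
    while current != dst and len(path) <= max_hops:
        here = math.dist(coords[level][current], coords[level][dst])
        # neighbours that are strictly closer to the destination at this level
        closer = [n for n in graph[current]
                  if math.dist(coords[level][n], coords[level][dst]) < here]
        if closer:
            current = min(closer, key=lambda n: math.dist(coords[level][n], coords[level][dst]))
            path.append(current)
            level = 0              # progress made: return to the finest view
        elif level + 1 < levels:
            level += 1             # dead-end: route on a blurrier view of the network
        else:
            return None            # routing failure
    return path if current == dst else None

# Toy 4-node line topology with a single coordinate level.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
coords = [{0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0)}]
print(greedy_route(graph, coords, src=0, dst=3, levels=1))  # -> [0, 1, 2, 3]
```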

    Multimodal non-linear latent semantic method for information retrieval

    Get PDF
    Multimodal information retrieval is an information-retrieval sub-task in which queries and database target elements are composed of several modalities or views. A modality is a representation of a complex phenomenon, captured and measured by different sensors or information sources, each encoding some information about it. Each modality contains complementary and shared information about the phenomenon of interest, and this additional information can be used to improve the retrieval process. Several methods have been developed to take advantage of the information distributed across different modalities. Some exploit statistical properties of multimodal data to find correlations and implicit relationships, others learn heterogeneous distance functions, and others learn linear and non-linear projections that transform data from the original input space to a common latent semantic space where different modalities are comparable. In spite of the attention dedicated to this issue, multimodal information retrieval is still an open problem. This thesis presents a multimodal information retrieval system that learns several mapping functions to transform multimodal data to a latent semantic space, where different modalities are combined and can be compared to build a multimodal ranking and perform a multimodal retrieval task. Additionally, a multimodal kernelized latent semantic embedding method is proposed to construct a supervised multimodal index, integrating multimodal data and label supervision. This method can map the data to three different spaces in which several retrieval setups can be performed. The proposed system and method were evaluated on a multimodal medical case-based retrieval task in which each case consists of a whole-slide image of a prostate tissue sample, the pathologist's text report, and a Gleason score as a supervised label. Multimodal data and labels were combined to produce a multimodal index, which was used for content-based retrieval and achieved outstanding results compared with previous work on this topic. Non-linear mappings give the proposed model more flexibility and representation capacity, but constructing them on a large dataset with kernel methods is computationally costly; to reduce this cost and allow large-scale applications, a budget technique was used, showing a good trade-off between speed and effectiveness. COLCIENCIAS, Jóvenes investigadores 761/2016. Research line: Computer Science. Master's thesis.
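    The thesis' kernelized, supervised method is not reproduced here. As a rough illustration of the general idea, projecting two modalities into a shared latent space and ranking by similarity, the sketch below uses plain linear CCA from scikit-learn as a stand-in; the feature matrices and dimensions are invented.

```python
# Sketch: project image and text features into a shared latent space with CCA,
# then rank one modality by cosine similarity to a query from the other.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_cases, d_img, d_txt, d_latent = 200, 128, 64, 16   # hypothetical sizes

X_img = rng.normal(size=(n_cases, d_img))   # e.g. tissue-image features
X_txt = rng.normal(size=(n_cases, d_txt))   # e.g. pathology-report features

cca = CCA(n_components=d_latent)
cca.fit(X_img, X_txt)
Z_img, Z_txt = cca.transform(X_img, X_txt)  # both modalities in the common space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Cross-modal query: use the text of case 0 to rank the image collection.
query = Z_txt[0]
ranking = np.argsort([-cosine(query, z) for z in Z_img])
print("top-5 retrieved cases:", ranking[:5])
```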

    Quality of experience and access network traffic management of HTTP adaptive video streaming

    Get PDF
    The thesis focuses on the Quality of Experience (QoE) of HTTP adaptive video streaming (HAS) and on traffic management in access networks to improve the QoE of HAS. First, the QoE impact of adaptation parameters and of the time spent on each quality layer was investigated with subjective crowdsourcing studies. The results were used to compute a QoE-optimal adaptation strategy for given video and network conditions, which allows video service providers to develop and benchmark improved adaptation logics for HAS. Furthermore, the thesis investigated concepts for monitoring video QoE at the application and network layers, which can be used by network providers in the QoE-aware traffic management cycle. Moreover, an analytic and simulative performance evaluation of QoE-aware traffic management on a bottleneck link was conducted. Finally, the thesis investigated socially-aware traffic management for HAS via Wi-Fi offloading of mobile HAS flows. A model for the distribution of public Wi-Fi hotspots and a platform for socially-aware traffic management on private home routers were presented, and a simulative performance evaluation investigated the impact of Wi-Fi offloading on the QoE and energy consumption of mobile HAS.
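    The QoE-optimal strategy computed in the thesis is not reproduced here; the sketch below only illustrates what a simple throughput- and buffer-based HAS adaptation logic looks like. The bitrate ladder and thresholds are invented.

```python
# Sketch: pick the highest representation whose bitrate fits a safety-discounted
# throughput estimate, unless the playout buffer is nearly empty.
BITRATES_KBPS = [400, 1000, 2500, 5000]  # hypothetical representation ladder

def choose_representation(est_throughput_kbps: float,
                          buffer_s: float,
                          safety: float = 0.8,
                          low_buffer_s: float = 5.0) -> int:
    """Return the index of the representation to request for the next segment."""
    if buffer_s < low_buffer_s:
        return 0  # buffer almost drained: request the lowest bitrate
    budget = safety * est_throughput_kbps
    candidates = [i for i, rate in enumerate(BITRATES_KBPS) if rate <= budget]
    return max(candidates) if candidates else 0

print(choose_representation(est_throughput_kbps=3500, buffer_s=12))  # -> 2 (2500 kbps)
```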

    Swarm intelligence for clustering dynamic data sets for web usage mining and personalization.

    Get PDF
    Swarm Intelligence (SI) techniques were inspired by bee swarms, ant colonies and, most recently, bird flocks. Flock-based Swarm Intelligence (FSI) has several unique features, namely decentralized control, collaborative learning, high exploration ability, and inspiration from dynamic social behavior. FSI therefore offers a natural choice for modeling dynamic social data and solving problems in such domains. One particular case of dynamic social data is online/web usage data, which is rich in information about user activities, interests and choices. This natural analogy between SI and social behavior is the main motivation for the topic of investigation in this dissertation, with a focus on Flock-based systems, which have not been well investigated for this purpose. More specifically, we investigate the use of Flock-based SI to solve two related and challenging problems by developing algorithms that form critical building blocks of intelligent personalized websites: (i) providing a better understanding of online users and their activities or interests, for example using clustering techniques that can discover the groups hidden within the data; and (ii) reducing information overload by providing guidance to users on websites and services, typically by means of web personalization techniques such as recommender systems, which aim to recommend items that a user will potentially like. To support a better understanding of online user activities, we developed clustering algorithms that address two challenges of mining online usage data: the need for scalability to large data sets and the need to adapt clustering to dynamic data. To address the scalability challenge, we developed new clustering algorithms that hybridize traditional Flock-based clustering with faster K-Means-based partitional clustering algorithms. We tested our algorithms on synthetic data, real UCI Machine Learning repository benchmark data, and a data set consisting of real Web user sessions. Having linear complexity with respect to the number of data records, the resulting algorithms are considerably faster than traditional Flock-based clustering (which has quadratic complexity), and our experiments demonstrate that scalability was gained without sacrificing quality. To address the challenge of adapting to dynamic data, we developed a dynamic clustering algorithm that can handle the following dynamic properties of online usage data: (1) new data records can be added at any time (for example, a new user joins the site); (2) existing data records can be removed at any time (for example, an existing user no longer subscribes to a service or is terminated for violating policies); (3) new parts of existing records can arrive at any time, or old parts of an existing record can change (a user's record can change as a result of additional activity such as purchasing new products, returning a product, rating new products, or modifying an existing rating). We tested our dynamic clustering algorithm on synthetic dynamic data and on a data set consisting of real online user ratings for movies. Our algorithm was shown to handle the dynamic nature of the data without sacrificing quality compared to a traditional Flock-based clustering algorithm that is re-run from scratch with each change in the data.
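    The dissertation's algorithms are not reproduced here; the sketch below only illustrates the hybrid idea described above, letting each data record steer a 2-D agent toward similar records for a few flocking steps and then running K-Means on the agent positions. The weighting scheme, step counts and toy data are invented.

```python
# Sketch: flock-style agent movement in 2-D driven by data similarity,
# followed by K-Means on the resulting agent positions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def flock_then_kmeans(X, k, steps=20, step_size=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, 1.0, size=(len(X), 2))  # agent positions on a 2-D panel
    sim = cosine_similarity(X)                     # similarity between data records
    np.fill_diagonal(sim, 0.0)
    for _ in range(steps):
        # attraction toward similar records, repulsion from dissimilar ones
        weights = sim - sim.mean()
        target = weights @ pos / (np.abs(weights).sum(axis=1, keepdims=True) + 1e-12)
        pos += step_size * (target - pos)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(pos)

# Toy usage with random "session" vectors.
X = np.random.default_rng(1).random((100, 8))
print(np.bincount(flock_then_kmeans(X, k=4)))  # cluster sizes
```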
To support reducing online information overload, we developed a Flock-based recommender system to predict the interests of users, focusing in particular on collaborative filtering or social recommender systems. Our Flock-based recommender algorithm (FlockRecom) iteratively adjusts the position and speed of dynamic flocks of agents on a visualization panel, where each agent represents a user, and then generates the top-n recommendations for a user based on the ratings of the users represented by its neighboring agents. Our recommendation system was tested on a real data set consisting of online user ratings for a set of jokes and compared to traditional user-based Collaborative Filtering (CF). Our results demonstrated that our recommender system starts at the same level of quality as traditional CF and then, with more iterations for exploration, surpasses CF's recommendation quality in terms of precision and recall. Another unique advantage of our recommendation system compared to traditional CF is its ability to generate more variety or diversity in the set of recommended items. Our contributions advance the state of the art in Flock-based SI for clustering and making predictions in dynamic Web usage data, and therefore have an impact on improving the quality of online services.
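    FlockRecom itself is not reproduced here; the sketch below only shows the neighbourhood step described above: once agents (users) have positions on the panel, recommend the items best rated by a user's nearest agents. The positions and ratings are invented, and the flocking simulation is omitted.

```python
# Sketch: top-n recommendation from the ratings of the nearest agents on the panel.
import numpy as np

def top_n_from_neighbours(user, positions, ratings, n_items=3, k_neighbours=5):
    d = np.linalg.norm(positions - positions[user], axis=1)
    d[user] = np.inf                                   # exclude the user itself
    neighbours = np.argsort(d)[:k_neighbours]
    counts = np.sum(~np.isnan(ratings[neighbours]), axis=0)
    sums = np.nansum(ratings[neighbours], axis=0)
    scores = np.where(counts > 0, sums / np.maximum(counts, 1), -np.inf)
    scores[~np.isnan(ratings[user])] = -np.inf         # skip items the user already rated
    return np.argsort(-scores)[:n_items]

rng = np.random.default_rng(0)
positions = rng.random((50, 2))                        # agent coordinates after flocking
ratings = np.where(rng.random((50, 20)) < 0.3,
                   rng.integers(1, 6, (50, 20)).astype(float), np.nan)
print(top_n_from_neighbours(user=0, positions=positions, ratings=ratings))
```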