
    Exact algorithms for minimum sum-of-squares clustering

    NP-Hardness of Euclidean sum-of-squares clustering -- Computational complexity -- An incorrect reduction from the K-section problem -- A new proof by reduction from the densest cut problem -- Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering -- Reformulation-Linearization Technique for the MSSC -- Branch-and-bound for the MSSC -- An attempt at reproducing computational results -- Breaking symmetry and convex hull inequalities -- A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering -- Equivalence of MSSC to 0-1 SDP -- A branch-and-cut algorithm for the 0-1 SDP formulation -- Computational experiments -- An improved column generation algorithm for minimum sum-of-squares clustering -- Column generation algorithm revisited -- A geometric approach -- Generalization to the Euclidean space -- Computational results
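
    For reference, the criterion all of these chapters target is the minimum sum-of-squares clustering (MSSC) objective: partition points x_1, ..., x_n into k clusters so that the total squared Euclidean distance to the cluster centroids is minimized. In the notation chosen here for illustration:

        \min_{C_1, \dots, C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert_2^2,
        \qquad \mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i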

    Language Models for Novelty Detection in System Call Traces

    Due to the complexity of modern computer systems, novel and unexpected behaviors frequently occur. Such deviations are either normal occurrences, such as software updates and new user activities, or abnormalities, such as misconfigurations, latency issues, intrusions, and software bugs. Regardless, novel behaviors are of great interest to developers, and there is a genuine need for efficient and effective methods to detect them. Nowadays, researchers consider system calls to be the most fine-grained and accurate source of information to investigate the behavior of computer systems. Accordingly, this paper introduces a novelty detection methodology that relies on a probability distribution over sequences of system calls, which can be seen as a language model. Language models estimate the likelihood of sequences, and since novelties by definition deviate from previously observed behaviors, they should be unlikely under the model. Following the success of neural networks for language modeling, three architectures are evaluated in this work: the widespread LSTM, the state-of-the-art Transformer, and the lower-complexity Longformer. However, large neural networks typically require an enormous amount of data to be trained effectively, and to the best of our knowledge, no massive modern datasets of kernel traces are publicly available. This paper addresses this limitation by introducing a new open-source dataset of kernel traces comprising over 2 million web requests with seven distinct behaviors. The proposed methodology requires minimal expert hand-crafting and achieves an F-score and AUROC greater than 95% on most novelties while being data- and task-agnostic. The source code and trained models are publicly available on GitHub, while the datasets are available on Zenodo. Comment: 12 pages, 7 figures, 3 tables.
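
    The scoring step described above can be sketched in a few lines: train a sequence model on normal traces, compute the per-token log-likelihood of a new system-call sequence under it, and flag sequences whose likelihood falls below a threshold as novel. The sketch below uses a small PyTorch LSTM and invented names (SyscallLM, VOCAB_SIZE, NOVELTY_THRESHOLD); it illustrates the general idea only and is not the authors' released code.

        # Minimal sketch of likelihood-based novelty scoring over system-call sequences.
        # Assumes syscalls are already mapped to integer ids; names are illustrative.
        import torch
        import torch.nn as nn

        VOCAB_SIZE = 400          # number of distinct system calls (assumption)
        NOVELTY_THRESHOLD = -3.5  # mean log-likelihood cutoff, tuned on validation data

        class SyscallLM(nn.Module):
            """Autoregressive LSTM language model over system-call ids."""
            def __init__(self, vocab_size=VOCAB_SIZE, emb=64, hidden=256):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb)
                self.lstm = nn.LSTM(emb, hidden, batch_first=True)
                self.head = nn.Linear(hidden, vocab_size)

            def forward(self, seq):                  # seq: (batch, time)
                h, _ = self.lstm(self.embed(seq))
                return self.head(h)                  # logits: (batch, time, vocab)

        def mean_log_likelihood(model, seq):
            """Average per-token log-probability of each sequence under the model."""
            inputs, targets = seq[:, :-1], seq[:, 1:]
            logp = torch.log_softmax(model(inputs), dim=-1)
            token_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
            return token_logp.mean(dim=1)            # one score per sequence

        def is_novel(model, seq):
            """Sequences that are unlikely under the trained model are flagged as novel."""
            with torch.no_grad():
                return mean_log_likelihood(model, seq) < NOVELTY_THRESHOLD

    In practice, such a threshold would be chosen on held-out normal traces to balance false alarms against missed novelties.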

    A sampling-based exact algorithm for the solution of the minimax diameter clustering problem

    We consider the problem of clustering a set of points so as to minimize the maximum intra-cluster dissimilarity, a problem which is strongly NP-hard. Exact algorithms for it can handle datasets containing up to a few thousand observations, largely insufficient for today's needs. The most popular heuristic for this problem, the complete-linkage hierarchical algorithm, provides feasible solutions that are usually far from optimal. We introduce a sampling-based exact algorithm aimed at solving large datasets. The algorithm alternates between an exact procedure run on a small sample of points and a heuristic procedure that attempts to prove the optimality of the current solution. Our computational experience shows that the algorithm can solve to optimality problems containing more than 500,000 observations within moderate time limits; this is two orders of magnitude larger than the limits of previous exact methods.
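
    The abstract only states that the method alternates between an exact solve on a small sample and a heuristic optimality check. One plausible reading of that loop is sketched below; the brute-force sample solver, the greedy extension, and the sample-growth rule are assumptions made for illustration, not the paper's algorithm.

        # Hedged sketch of a sample-and-verify loop for minimax-diameter clustering.
        import itertools
        import numpy as np

        def diameter(pts):
            """Largest pairwise Euclidean distance within a set of points."""
            if len(pts) < 2:
                return 0.0
            d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
            return float(d.max())

        def exact_minimax_on_sample(sample, k):
            """Enumerate all k^n assignments -- only viable for very small samples."""
            best_val, best_assign = np.inf, None
            for assign in itertools.product(range(k), repeat=len(sample)):
                labels = np.array(assign)
                val = max(diameter(sample[labels == c]) for c in range(k))
                if val < best_val:
                    best_val, best_assign = val, assign
            return best_val, best_assign

        def solve(points, k, sample_size=8, seed=0):
            rng = np.random.default_rng(seed)
            idx = set(rng.choice(len(points), size=sample_size, replace=False).tolist())
            while True:
                sample_ids = sorted(idx)
                lower_bound, assign = exact_minimax_on_sample(points[sample_ids], k)
                # The optimum over a subset of the points can only be smaller than the
                # optimum over the full set, so lower_bound bounds the true optimum.
                clusters = [[points[i] for i, c in zip(sample_ids, assign) if c == j]
                            for j in range(k)]
                worst_growth, worst_point = -1.0, None
                for i in range(len(points)):
                    if i in idx:
                        continue
                    # Greedy extension: place the point where it grows the diameter least.
                    growth = [diameter(np.array(cl + [points[i]])) for cl in clusters]
                    j = int(np.argmin(growth))
                    clusters[j].append(points[i])
                    if growth[j] > worst_growth:
                        worst_growth, worst_point = growth[j], i
                objective = max(diameter(np.array(cl)) for cl in clusters)
                if objective <= lower_bound:   # heuristic meets the bound: provably optimal
                    return objective, clusters
                idx.add(worst_point)           # otherwise enlarge the sample and repeat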

    The Covering-Assignment Problem for Swarm-powered Ad-hoc Clouds: A Distributed 3D Mapping Use-case

    The popularity of drones is rapidly increasing across the different sectors of the economy. Aerial capabilities and relatively low costs make drones an ideal way to improve the efficiency of operations that are typically carried out by humans (e.g., building inspection, photo collection). The potential of drone applications can be pushed even further when drones are operated in fleets and in a fully autonomous manner, acting de facto as a drone swarm. Besides automating field operations, a drone swarm can serve as an ad-hoc cloud infrastructure built on top of the computing and storage resources available across the swarm members and other connected elements. Even in the absence of Internet connectivity, this cloud can serve the workloads generated by the swarm members themselves, as well as by the field agents operating within the area of interest. Using the practical example of a swarm-powered 3D reconstruction application, we present a new optimization problem for the efficient generation and execution, on top of a swarm-powered ad-hoc cloud infrastructure, of multi-node computing workloads subject to data geolocation and clustering constraints. The objective is the minimization of the overall computing time, including both the networking delays caused by inter-drone data transmission and the computation delays. We prove that the problem is NP-hard and present two combinatorial formulations to model it. Computational results show that one of the formulations can be used to solve, within the configured time limit, more than 50% of the considered real-world instances involving up to two hundred images and six drones.
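
    The abstract does not spell out how the overall computing time is aggregated; under a makespan-style reading, with all symbols introduced here as assumptions rather than taken from the paper, the objective would look roughly like:

        \min \; \max_{u \in D} \; \sum_{j \,:\, B_j \to u} \left( t^{\mathrm{net}}_{j,u} + t^{\mathrm{cpu}}_{j,u} \right)

    where D is the set of drones, B_j → u means image cluster B_j is assigned to drone u, t^net_{j,u} is the inter-drone transfer time for the data of B_j, and t^cpu_{j,u} its computation time; the data-geolocation and clustering constraints mentioned above would appear as side constraints on the assignment.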

    Heuristics for optimizing 3D mapping missions over swarm-powered ad hoc clouds

    Drones have been getting more and more popular in many sectors of the economy. Both the scientific and industrial communities aim to make the impact of drones even more disruptive by enabling collaborative autonomous behaviors -- also known as swarming behaviors -- within fleets of multiple drones. In swarm-powered 3D mapping missions, unmanned aerial vehicles typically collect the aerial pictures of the target area, whereas the 3D reconstruction process is performed in a centralized manner. However, such approaches do not leverage the computational and storage resources available from the swarm members. We address the optimization of a swarm-powered distributed 3D mapping mission for a real-life humanitarian emergency response application through the exploitation of a swarm-powered ad hoc cloud. Producing the relevant 3D maps in a timely manner, even when cloud connectivity is not available, is crucial to increase the chances of success of the operation. In this work, we present a mathematical programming heuristic based on decomposition and a variable neighborhood search heuristic to minimize the completion time of the 3D reconstruction process necessary in such missions. Our computational results reveal that the proposed heuristics either quickly reach optimality or improve the best known solutions for almost all tested realistic instances comprising up to 1000 images and fifteen drones.
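
    For readers unfamiliar with the metaheuristic named above, the generic variable neighborhood search (VNS) loop can be sketched as follows. The shaking operators, the local search, and the evaluation function for the 3D-mapping schedule are placeholders here, not the paper's actual operators.

        # Generic variable neighborhood search (VNS) skeleton.
        def vns(initial_solution, evaluate, shakes, local_search, max_iters=1000):
            """shakes: list of functions sol -> random neighbor, ordered by strength."""
            best = local_search(initial_solution)
            best_cost = evaluate(best)
            for _ in range(max_iters):
                k = 0
                while k < len(shakes):
                    candidate = local_search(shakes[k](best))  # shake, then descend
                    cost = evaluate(candidate)
                    if cost < best_cost:        # improvement: recentre and restart shakes
                        best, best_cost = candidate, cost
                        k = 0
                    else:                       # no improvement: try a stronger shake
                        k += 1
            return best, best_cost

    For the mission-planning problem, a solution would presumably encode the assignment and ordering of reconstruction tasks on the drones, and evaluate would return the estimated completion time.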

    A polynomial-time algorithm for the discrete facility location problem with limited distances and capacity constraints

    The objective of the facility location problem with limited distances is to minimize the sum of distance functions from the facility to its clients, with a limit on each of these distances beyond which the corresponding function becomes constant. The problem is applicable in situations where the service provided by the facility becomes insensitive to distance beyond given thresholds. In this paper, we propose a polynomial-time algorithm for the discrete version of the problem with capacity constraints on the number of served clients. These constraints are relevant for introducing quality measures into facility location decision processes, as well as for justifying the creation of the facility.
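
    In symbols (notation introduced here for illustration, not taken from the paper), with S the set of candidate facility sites, a_1, ..., a_n the clients, and λ_i the distance threshold of client i, the limited-distance objective reads:

        \min_{y \in S} \; \sum_{i=1}^{n} \min\{\, d(y, a_i),\; \lambda_i \,\}

    Each term stops growing once client i lies beyond its threshold λ_i; the capacity constraints studied in the paper additionally restrict the number of clients the facility may serve.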

    Column generation algorithms for exact modularity maximization in networks

    Finding modules, or clusters, in networks currently attracts much attention in several domains. The most studied criterion for doing so, due to Newman and Girvan [Phys. Rev. E 69, 026113 (2004)], is modularity maximization. Many heuristics have been proposed for maximizing modularity; they rapidly yield near-optimal solutions, or sometimes optimal ones, but without a guarantee of optimality. There are few exact algorithms, prominent among which is that of Xu et al. [Eur. Phys. J. B 60, 231 (2007)]. Modularity maximization can also be expressed as a clique partitioning problem, to which the row generation algorithm of Grötschel and Wakabayashi [Math. Program. 45, 59 (1989)] can be applied. We propose to extend both of these algorithms using the powerful column generation methods for linear and nonlinear integer programming. The performance of the four resulting algorithms is compared on problems from the literature. Instances with up to 512 entities are solved exactly. Moreover, the computing times of previously solved problems are reduced substantially.
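
    For reference, the Newman-Girvan modularity that these algorithms maximize is commonly written as follows, with A the adjacency matrix, k_i the degree of vertex i, m the number of edges, and δ(c_i, c_j) = 1 when vertices i and j belong to the same community and 0 otherwise:

        Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)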

    Algorithms for network modularity maximization

    Networks are often used to represent complex systems arising in a variety of fields. Social networks model interactions among people. Telecommunication networks model communications between them, such as in the World Wide Web. Transportation networks model movements of goods and passengers. Biological networks model interactions between organisms, such as in food networks. A network (or graph) G = (V,E) is composed of a set of vertices, representing the entities of the system under study, and a set of edges joining pairs of vertices and representing a relation holding for such pairs. Identifying communities, or clusters, in complex networks is a topic of particular interest and is currently a very active research domain.

    Less is more: simplified Nelder-Mead method for large unconstrained optimization

    The Nelder-Mead method (NM) for solving continuous nonlinear optimization problems is probably the most cited and most used method in the optimization literature and in practical applications. It belongs to the direct search methods, which do not use first- or second-order derivatives. The popularity of NM is based on its simplicity. In this paper we propose an even simpler algorithm, designed for larger instances, that follows the NM idea. We call it Simplified NM (SNM): instead of generating all n + 1 simplex points in R^n, we perform the search using just q + 1 vertices, where q is usually much smaller than n. Although the results cannot be better than those obtained by working with all n + 1 points as in NM, the significant speed-up allows SNM to be run many times from different starting solutions, usually yielding better results than NM within the same CPU time. Computational analysis is performed on 10 classical convex and non-convex instances, where the number of variables n can be arbitrarily large. The obtained results show that SNM is more effective than the original NM, confirming that the less-is-more approach (LIMA) yields good results when solving a continuous optimization problem.
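
    The abstract does not spell out how the q + 1 vertices are chosen, so the sketch below illustrates one plausible reading: repeatedly run Nelder-Mead restricted to a random q-coordinate subspace around the current best point, within a fixed time budget. The function snm, its parameters, and the subspace rule are assumptions for illustration; only scipy.optimize's standard Nelder-Mead is used.

        # Hedged sketch of a reduced-dimension, multi-restart Nelder-Mead loop.
        import time
        import numpy as np
        from scipy.optimize import minimize

        def snm(f, x0, q=10, time_budget=5.0, rng=np.random.default_rng(0)):
            """Minimize f: R^n -> R by Nelder-Mead over random q-coordinate subspaces."""
            n = len(x0)
            best_x = np.asarray(x0, dtype=float)
            best_f = f(best_x)
            start = time.perf_counter()
            while time.perf_counter() - start < time_budget:
                coords = rng.choice(n, size=min(q, n), replace=False)
                def restricted(z, base=best_x.copy(), coords=coords):
                    y = base.copy()
                    y[coords] = z          # vary only the selected coordinates
                    return f(y)
                res = minimize(restricted, best_x[coords], method="Nelder-Mead",
                               options={"maxiter": 200 * q})
                if res.fun < best_f:       # keep the improved point as the new centre
                    best_x = best_x.copy()
                    best_x[coords] = res.x
                    best_f = res.fun
            return best_x, best_f

        # Illustrative use on a Rosenbrock-style test function (not from the paper).
        rosen = lambda x: float(sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2))
        x_best, f_best = snm(rosen, np.zeros(500), q=10, time_budget=1.0)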

    Predição de abandono de tratamento da tuberculose (Prediction of tuberculosis treatment abandonment)

    Final project on predicting tuberculosis treatment abandonment, from the Machine Learning Bootcamp course, class 1/21. 16 pages. Subject areas: Data Analysis and Science; Health.