7 research outputs found

    Active Learning with Semi-Supervised Support Vector Machines

    Get PDF
    A significant problem in many machine learning tasks is that it is time consuming and costly to gather the necessary labeled data for training the learning algorithm to a reasonable level of performance. In reality, it is often the case that a small amount of labeled data is available and that more unlabeled data could be labeled on demand at a cost. If the labeled data is obtained by a process outside of the control of the learner, then the learner is passive. If the learner picks the data to be labeled, then this becomes active learning. This has the advantage that the learner can pick data to gain specific information that will speed up the learning process. Support Vector Machines (SVMs) have many properties that make them attractive to use as a learning algorithm for many real world applications including classification tasks. Some researchers have proposed algorithms for active learning with SVMs, i.e. algorithms for choosing the next unlabeled instance to get label for. Their approach is supervised in nature since they do not consider all unlabeled instances while looking for the next instance. In this thesis, we propose three new algorithms for applying active learning for SVMs in a semi-supervised setting which takes advantage of the presence of all unlabeled points. The suggested approaches might, by reducing the number of experiments needed, yield considerable savings in costly classification problems in the cases when finding the training data for a classifier is expensive

    Selected Papers from Experimental Stress Analysis 2020

    Get PDF
    This Special Issue consists of selected papers from the Experimental Stress Analysis 2020 conference. Experimental Stress Analysis 2020 was organized with the support of the Czech Society for Mechanics, Expert Group of Experimental Mechanics, and was, for this particular year, held online in 19–22 October 2020. The objectives of the conference included identification of current situation, sharing professional experience and knowledge, discussing new theoretical and practical findings, and the establishment and strengthening of relationships between universities, companies, and scientists from the field of experimental mechanics in mechanical and civil engineering. The topics of the conference were focused on experimental research on materials and structures subjected to mechanical, thermal–mechanical, and dynamic loading, including damage, fatigue, and fracture analyses. The selected papers deal with top-level contemporary phenomena, such as modern durable materials, numerical modeling and simulations, and innovative non-destructive materials’ testing

    Information Management and Market Engineering. Vol. II

    Get PDF
    The research program Information Management and Market Engineering focuses on the analysis and the design of electronic markets. Taking a holistic view of the conceptualization and realization of solutions, the research integrates the disciplines business administration, economics, computer science, and law. Topics of interest range from the implementation, quality assurance, and advancement of electronic markets to their integration into business processes and legal frameworks

    Clustering-based Algorithms for Big Data Computations

    Get PDF
    In the age of big data, the amount of information that applications need to process often exceeds the computational capabilities of single machines. To cope with this deluge of data, new computational models have been defined. The MapReduce model allows the development of distributed algorithms targeted at large clusters, where each machine can only store a small fraction of the data. In the streaming model a single processor processes on-the-fly an incoming stream of data, using only limited memory. The specific characteristics of these models combined with the necessity of processing very large datasets rule out, in many cases, the adoption of known algorithmic strategies, prompting the development of new ones. In this context, clustering, the process of grouping together elements according to some proximity measure, is a valuable tool, which allows to build succinct summaries of the input data. In this thesis we develop novel algorithms for some fundamental problems, where clustering is a key ingredient to cope with very large instances or is itself the ultimate target. First, we consider the problem of approximating the diameter of an undirected graph, a fundamental metric in graph analytics, for which the known exact algorithms are too costly to use for very large inputs. We develop a MapReduce algorithm for this problem which, for the important class of graphs of bounded doubling dimension, features a polylogarithmic approximation guarantee, uses linear memory and executes in a number of parallel rounds that can be made sublinear in the input graph's diameter. To the best of our knowledge, ours is the first parallel algorithm with these guarantees. Our algorithm leverages a novel clustering primitive to extract a concise summary of the input graph on which to compute the diameter approximation. We complement our theoretical analysis with an extensive experimental evaluation, finding that our algorithm features an approximation quality significantly better than the theoretical upper bound and high scalability. Next, we consider the problem of clustering uncertain graphs, that is, graphs where each edge has a probability of existence, specified as part of the input. These graphs, whose applications range from biology to privacy in social networks, have an exponential number of possible deterministic realizations, which impose a big-data perspective. We develop the first algorithms for clustering uncertain graphs with provable approximation guarantees which aim at maximizing the probability that nodes be connected to the centers of their assigned clusters. A preliminary suite of experiments, provides evidence that the quality of the clusterings returned by our algorithms compare very favorably with respect to previous approaches with no theoretical guarantees. Finally, we deal with the problem of diversity maximization, which is a fundamental primitive in big data analytics: given a set of points in a metric space we are asked to provide a small subset maximizing some notion of diversity. We provide efficient streaming and MapReduce algorithms with approximation guarantees that can be made arbitrarily close to the ones of the best sequential algorithms available. The algorithms crucially rely on the use of a k-center clustering primitive to extract a succinct summary of the data and their analysis is expressed in terms of the doubling dimension of the input point set. Moreover, unlike previously known algorithms, ours feature an interesting tradeoff between approximation quality and memory requirements. Our theoretical findings are supported by the first experimental analysis of diversity maximization algorithms in streaming and MapReduce, which highlights the tradeoffs of our algorithms on both real-world and synthetic datasets. Moreover, our algorithms exhibit good scalability, and a significantly better performance than the approaches proposed in previous works

    Notes in Pure Mathematics & Mathematical Structures in Physics

    Full text link
    These Notes deal with various areas of mathematics, and seek reciprocal combinations, explore mutual relations, ranging from abstract objects to problems in physics.Comment: Small improvements and addition

    Verifying reliability properties using the hyperball abstract domain

    No full text
    Modern systems are increasingly susceptible to soft errors that manifest themselves as bit flips and possibly alter the semantics of an application. We would like to measure the quality degradation on semantics due to such bit flips, and thus we introduce a Hyperball abstract domain that allows us to determine the worst-case distance between expected and actual results. Similar to intervals, hyperballs describe a connected and dense space. The semantics of low-level code in the presence of bit flips is hard to accurately describe in such a space. We therefore combine the Hyperball domain with an existing affine system abstract domain that we extend to handle bit flips, which are introduce as disjunctions. Bit-flips can reduce the precision of our analysis, and we therefor introduce the Scale domain as a disjunctive refinement to minimize precision loss. This domain bounds the number of disjunctive elements by quantifying the over-approximation of different partitions and uses submodular optimization to find a good partitioning (within a bound of optimal).We evaluate these domains to show benefits and potential problems. For the application we examine here, adding the Scale domain to the Hyperball abstraction improves accuracy by up to two orders of magnitude. Our initial results demonstrate the feasibility of this approach, although we would like to further improve execution efficiency
    corecore