
    Enhancing Stratified Graph Sampling Algorithms based on Approximate Degree Distribution

    Sampling techniques have become a recent research focus in graph-related fields. Most existing graph sampling algorithms tend to over-sample high-degree or low-degree nodes in complex networks because such networks are scale-free, meaning that node degrees follow a power-law distribution; as a result, the degrees of the sampled nodes differ widely. In this paper, we propose the idea of an approximate degree distribution and use it to devise a stratified strategy for complex networks. We also develop two graph sampling algorithms that combine the node selection method with this stratified strategy. Experimental results show that our sampling algorithms preserve several properties of different graphs and are more accurate than other algorithms. Further, we prove that the proposed algorithms are superior to off-the-shelf algorithms in terms of degree unbiasedness, and more efficient than the state-of-the-art FFS and ES-i algorithms.
    Comment: 10 pages, 23 figures; keywords: approximate degree distribution, scale-free networks, graph sampling methods, stratified technolog
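The stratified idea described above can be sketched in a minimal, generic form: rank nodes by degree, split them into equal-size strata, and let each stratum contribute to the sample in proportion to its size, so both high- and low-degree nodes are represented. This is an illustration of degree-based stratified node sampling, not the authors' algorithm; all names (`stratified_node_sample`, `num_strata`) are ours.

```python
import random
from collections import defaultdict

def stratified_node_sample(degrees, sample_size, num_strata=3, seed=0):
    """Sample nodes stratified by degree.

    degrees: dict mapping node -> degree. Nodes are split into
    `num_strata` strata by degree rank; each stratum contributes a
    share proportional to its size.
    """
    rng = random.Random(seed)
    ranked = sorted(degrees, key=degrees.get)          # low to high degree
    strata = defaultdict(list)
    for i, node in enumerate(ranked):
        strata[i * num_strata // len(ranked)].append(node)
    sample = []
    for nodes in strata.values():
        k = max(1, round(sample_size * len(nodes) / len(ranked)))
        sample.extend(rng.sample(nodes, min(k, len(nodes))))
    return sample[:sample_size]

# Toy power-law-ish degree sequence: many low-degree nodes, a few hubs.
degrees = {f"n{i}": d for i, d in enumerate([1, 1, 2, 2, 3, 5, 8, 13, 40, 100])}
sample = stratified_node_sample(degrees, 5)  # 5 nodes spanning all strata
```

Proportional allocation keeps the sample's degree mix close to the graph's, which is the bias the abstract targets.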

    Approximate Data Analytics Systems

    Today, most modern online services use big data analytics systems to extract useful information from raw digital data. The data normally arrives as a continuous stream at high speed and in huge volumes, and the cost of handling this massive data can be significant. Providing interactive latency when processing the data is often impractical because the data grows exponentially, even faster than Moore's law predicts. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications accept an approximate rather than an exact output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset instead of the entire input data. Unfortunately, advances in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees for stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed, large-scale stream data to achieve low latency and efficient resource utilization. To achieve these goals, we designed and built the following approximate data analytics systems:
    • StreamApprox: a data stream analytics system for approximate computing. It supports approximate computing for low-latency stream analytics in a transparent way and can adapt to rapid fluctuations in input data streams. For this system, we designed an online adaptive stratified reservoir sampling algorithm that produces approximate output with bounded error.
    • IncApprox: a data analytics system for incremental approximate computing. It combines approximate and incremental computing in stream processing to achieve high throughput and low latency with efficient resource utilization. For this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error.
    • PrivApprox: a data stream analytics system for privacy-preserving approximate computing. It supports high-utility, low-latency data analytics while preserving users' privacy, combining privacy-preserving data analytics with approximate computing.
    • ApproxJoin: an approximate distributed join system. It improves the performance of joins, which are critical but expensive operations in big data systems. For this system, we employed a sketching technique (Bloom filters) to avoid shuffling non-joinable data items through the network, and proposed a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output.
    Our evaluation, based on micro-benchmarks and real-world case studies, shows that these systems achieve significant speedups over state-of-the-art systems while tolerating negligible accuracy loss in the analytics output. In addition, our systems let users systematically trade off accuracy against throughput and latency, and require no or only minor modifications to existing applications.
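ApproxJoin's use of a Bloom filter to avoid shuffling non-joinable items can be illustrated with a minimal sketch: build a filter over the left side's keys, ship only right-side rows whose keys might match, then join locally. This is a generic illustration under our own names (`BloomFilter`, `filtered_join`), not the thesis's implementation; a Bloom filter can yield false positives (harmlessly shipping a few extra rows) but never drops a joinable row.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: k hash functions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = 1

    def might_contain(self, item):
        return all(self.bits[h] for h in self._hashes(item))

def filtered_join(left, right):
    """Join (key, value) lists, shipping only the right rows whose keys
    might appear on the left."""
    bf = BloomFilter()
    for key, _ in left:
        bf.add(key)
    shipped = [(k, v) for k, v in right if bf.might_contain(k)]
    lookup = {}
    for k, v in left:
        lookup.setdefault(k, []).append(v)
    return [(k, lv, rv) for k, rv in shipped for lv in lookup.get(k, [])]
```

In a distributed setting, `shipped` is the only data crossing the network, which is where the savings come from.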

    Adaptive memory-based single distribution resampling for particle filter

    The restrictions associated with using single distribution resampling on certain computing devices' memory give developers several difficulties, as more effort and time are needed to develop a particle filter. A new sequential resampling algorithm is therefore needed that is flexible enough to be used with various computing devices. This paper formulates a new single distribution resampling called adaptive memory size-based single distribution resampling (AMSSDR), which integrates traditional variation resampling and traditional resampling in one architecture. The algorithm switches resampling algorithms based on the memory of the computing device, helping the developer formulate a particle filter without over-considering the device's memory utilisation while developing different particle filters. At the start of the operational process, the AMSSDR selector chooses an appropriate resampling algorithm (for example, rounding copy resampling or systematic resampling) based on the computing device's current physical memory. If it chooses systematic resampling, the resampler samples every particle in every cycle; if it chooses rounding copy resampling, the resampler samples more than one copy of each particle per cycle. This illustrates that the proposed method (AMSSDR) can switch resampling algorithms to suit various physical memory requirements. The authors aim to extend this research in the future by applying the proposed method in various emerging applications such as real-time locator systems or medical applications.
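Systematic resampling, one of the two candidate algorithms the AMSSDR selector can choose between, can be sketched in its textbook form (this is the standard algorithm, not the paper's code; the function name is ours): draw one uniform offset, then sweep evenly spaced positions through the cumulative weight distribution.

```python
import random

def systematic_resample(weights, seed=0):
    """Systematic resampling: draw one uniform offset, then step through
    the cumulative weight distribution at n evenly spaced positions.
    Returns the indices of the particles to keep (with repetition)."""
    n = len(weights)
    total = sum(weights)
    u = random.Random(seed).random()
    positions = [(u + i) / n for i in range(n)]
    indices, cumulative, j = [], weights[0] / total, 0
    for p in positions:
        while p > cumulative:            # advance to the particle whose
            j += 1                       # cumulative weight covers p
            cumulative += weights[j] / total
        indices.append(j)
    return indices
```

Heavily weighted particles are duplicated and light ones dropped, which keeps the particle count fixed while concentrating it where the posterior mass is.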

    Computational approaches for engineering effective teams

    The performance of a team depends not only on the abilities of its individual members, but also on how these members interact with each other. Inspired by this premise and motivated by a large number of applications in educational, industrial and management settings, this thesis studies a family of problems, known as team-formation problems, that aim to engineer teams that are effective and successful. The major challenge in this family of problems is dealing with the complexity of the human participants: each individual has their own objectives, demands, and constraints, which may conflict with the desired team objective. Furthermore, different collaboration models lead to different instances of team-formation problems. In this thesis, we introduce several such models and describe techniques and efficient algorithms for various instantiations of the team-formation problem. The thesis consists of two main parts. In the first part, we examine three distinct team-formation problems that are of significant interest in (i) educational settings, (ii) industrial organizations, and (iii) management settings, respectively. What constitutes an effective team in each of these settings depends entirely on the objective of the team. For instance, the performance of a team (or a study group) in an educational setting can be measured by the amount of learning and collaboration that takes place inside the team. In industrial organizations, desirable teams are those that are cost-effective and highly profitable. Finally, in management settings, a body of research shows that teams with faultlines are prone to performance decrements; the challenge is thus to form teams free of faultlines, that is, teams that are robust and less likely to break apart due to disagreements. The first part of the thesis discusses approaches for formalizing these problems and presents efficient computational methods for solving them.
    In the second part of the thesis, we consider the problem of improving the functioning of existing teams. More precisely, we show how models from social theory can capture the dynamics of the interactions between team members, and we discuss how teams can be modified so that these interaction dynamics lead to desirable outcomes such as higher levels of agreement or less tension and conflict among the team members.

    A System for Programming Anisotropic Physical Behaviour in Cloth Fabric

    We propose a method to alter the tensile properties of cloth in a user-defined and purposeful manner with the help of computer-controlled embroidery. Our system can infuse non-uniform stiffening in local regions of the cloth. This has numerous applications in the manufacture of high-performance smart textiles for the medical industry, sports goods, comfort-wear, etc., where pressure needs to be redistributed and the cloth needs to deform correctly under a given load. We make three contributions to accomplish this: a decomposition scheme that expresses user-desired stiffness as a density map and a directional map; a novel stitch-planning algorithm that produces a series of stitches adhering to the input stiffness maps; and an inverse-design optimization, driven by a cloth simulator, that automatically computes stiffness maps from user-specified performance criteria. We perform multiple tests on physically manufactured cloth samples to show how embroidery affects the resultant fabric and to demonstrate the efficacy of our approach.

    Learn to Unlearn: A Survey on Machine Unlearning

    Machine Learning (ML) models have been shown to potentially leak sensitive information, raising privacy concerns in ML-driven applications. This has inspired recent research on removing the influence of specific data samples from a trained ML model. Such efficient removal would enable ML to comply with the "right to be forgotten" in many jurisdictions, and could also address performance bottlenecks caused by low-quality or poisoned samples. In this context, machine unlearning methods have been proposed to erase the contributions of designated data samples from trained models, as an alternative to the often impracticable approach of retraining models from scratch. This article presents a comprehensive review of recent machine unlearning techniques, verification mechanisms, and potential attacks. We further highlight emerging challenges and prospective research directions (e.g. resilience and fairness concerns). We aim for this paper to provide valuable resources for integrating privacy, equity, and resilience into ML systems and help them "learn to unlearn".
    Comment: 10 pages, 5 figures, 1 tabl
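One widely studied alternative to full retraining in this literature is sharded retraining (as in the SISA approach): train independent sub-models on disjoint data shards, so that deleting a sample only requires retraining the one shard that contained it. A toy sketch, with a one-dimensional nearest-centroid "learner" standing in for a real model and all names ours:

```python
from statistics import mean

def train_shard(shard):
    """'Train' a toy sub-model on one shard: per-class feature means
    (a nearest-centroid classifier stands in for a real learner)."""
    groups = {}
    for x, label in shard:
        groups.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in groups.items()}

def train(shards):
    return [train_shard(s) for s in shards]

def predict(models, x):
    # Each shard model votes for its nearest centroid; majority wins.
    votes = [min(m, key=lambda lbl: abs(m[lbl] - x)) for m in models]
    return max(set(votes), key=votes.count)

def unlearn(shards, models, shard_idx, sample):
    """Remove `sample` and retrain only its shard, leaving the other
    sub-models (and their training cost) untouched."""
    shards[shard_idx] = [s for s in shards[shard_idx] if s != sample]
    models[shard_idx] = train_shard(shards[shard_idx])

shards = [[(1.0, 0), (2.0, 0), (9.0, 1)], [(1.5, 0), (8.0, 1), (9.5, 1)]]
models = train(shards)
unlearn(shards, models, 0, (9.0, 1))  # forget one sample; only shard 0 retrains
```

The cost of forgetting a sample is one shard's training time rather than the whole dataset's, which is the trade-off such exact-unlearning designs exploit.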

    Quality of Service Aware Data Stream Processing for Highly Dynamic and Scalable Applications

    Huge amounts of georeferenced data streams arrive daily at data stream management systems deployed to serve highly scalable and dynamic applications. There are innumerable ways in which these streams can be exploited to gain deep insights in various domains, and decision makers require interactive visualization of such data in the form of maps and dashboards for decision making and strategic planning. Data streams normally exhibit fluctuation and oscillation in arrival rates, as well as skewness; these are the two predominant factors that most affect the overall quality of service. Data stream management systems must therefore be attuned to these factors, in addition to the spatial shape of the data, which may exaggerate their negative impact. Current systems do not natively support services with quality guarantees for dynamic scenarios, leaving the handling of those logistics to the user, which is challenging and cumbersome. Three workloads are predominant for any data stream: batch processing, scalable storage, and stream processing. In this thesis, we design a quality-of-service-aware system, SpatialDSMS, comprising several subsystems that cover those workloads and any mixed load that results from combining them. Most importantly, we natively incorporate quality-of-service optimizations for processing avalanches of georeferenced data streams in highly dynamic application scenarios. This is achieved transparently on top of the codebases of emerging de facto standard, best-in-class representatives, relieving users in the presentation layer from having to reason about those services. Instead, users express their queries with quality goals, and our system optimizer compiles them down into query plans with an embedded quality guarantee, leaving logistic handling to the underlying layers.
    We have developed standards-compliant prototypes for all the subsystems that constitute SpatialDSMS.

    Enhancing soft computing techniques to actively address imbalanced regression problems

    This paper has been supported in part by the ERDF A way of making Europe/Health Institute Carlos III/Spanish Ministry of Science, Innovation and Universities (grant number PI20/00711), by the ERDF A way of making Europe/Regional Government of Andalusia/Ministry of Economic Transformation, Industry, Knowledge and Universities (grant numbers P18-RT-2248 and B-CTS-536-UGR20) and by the MCIN/AEI/10.13039/50110001103 (grant numbers PID2019-107793GB-I00 and PID2020-119478GB-I00). Funding for open access charge: Universidad de Granada / CBUA

    Improving the matching of registered unemployed to job offers through machine learning algorithms

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
    Due to the existence of a double-sided asymmetric-information problem in the labour market, characterized by a mutual lack of trust between employers and unemployed people, not enough job matches are facilitated by public employment services (PES), which seem to be caught in a low-end equilibrium. To act as a reliable third party, PES need to build a solid reputation among their main clients by offering better and less time-consuming pre-selection services. The use of machine-learning, data-driven relevancy algorithms that score the viability of a specific candidate for a particular job opening is becoming increasingly popular in this field. Based on the Portuguese PES databases (CVs, vacancies, pre-selection and matching results), complemented by relevant external data published by Statistics Portugal and the European Classification of Skills/Competences, Qualifications and Occupations (ESCO), this thesis evaluates the potential application of models such as Random Forests, Gradient Boosting, Support Vector Machines, Neural Network Ensembles and other tree-based ensembles to the job-matching activities carried out by the Portuguese PES, in order to understand the extent to which the latter can be improved by adopting automated processes. The results seem promising and point to the possible use of robust algorithms such as Random Forests in the pre-selection of suitable candidates, owing to their advantages at various levels, namely accuracy, the capacity to handle large datasets with thousands of variables, including badly imbalanced ones, as well as extensive missing values and many-valued categorical variables.
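The matching setup this thesis describes, scoring candidate-vacancy pairs with a tree ensemble, can be illustrated with a toy randomized-stump ensemble. This is a stand-in for a real Random Forest (a production version would use a library such as scikit-learn), and the feature names and data are invented for illustration:

```python
import random
from collections import Counter

def majority(labels, default):
    return Counter(labels).most_common(1)[0][0] if labels else default

def train_forest(X, y, n_trees=25, seed=0):
    """Train an ensemble of randomized one-level trees (stumps): each
    stump picks a random feature and a random threshold, then labels
    each side of the split by majority vote over the training data."""
    rng = random.Random(seed)
    overall = majority(y, None)
    stumps = []
    for _ in range(n_trees):
        f = rng.randrange(len(X[0]))
        t = X[rng.randrange(len(X))][f]
        left = majority([yi for xi, yi in zip(X, y) if xi[f] <= t], overall)
        right = majority([yi for xi, yi in zip(X, y) if xi[f] > t], overall)
        stumps.append((f, t, left, right))
    return stumps

def predict(stumps, x):
    votes = [l if x[f] <= t else r for f, t, l, r in stumps]
    return majority(votes, None)

# Toy candidate-vacancy pairs: [skill_overlap, experience_fit] -> hired?
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.2]]
y = [1, 1, 1, 0, 0]
model = train_forest(X, y)
```

Ranking all candidates for a vacancy by such a score is the pre-selection step the thesis targets; the ensemble's vote-averaging is what gives tree ensembles their robustness to noisy features.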