31 research outputs found

    Privacy Preserving Utility Mining: A Survey

    In the big data era, collected data usually contain rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of data containing sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining, and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms in detail, along with their advantages and deficiencies. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages
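
    As a brief, hedged illustration of the utility-mining setting this survey covers, the sketch below computes the utility of candidate itemsets over a toy transaction database with per-item quantities and unit profits. The data, the helper `itemset_utility`, and the `min_util` threshold are hypothetical and only illustrate the general notion of high-utility itemsets, not any specific PPUM algorithm from the paper.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "c": 2},
    {"b": 4, "c": 1, "d": 2},
]
# External utility (e.g., unit profit) per item -- also hypothetical.
profit = {"a": 5, "b": 2, "c": 1, "d": 4}

def itemset_utility(itemset, db):
    """Sum quantity * profit of the itemset over transactions containing all its items."""
    total = 0
    for t in db:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total

# Enumerate small itemsets and keep the high-utility ones.
min_util = 15  # assumed threshold
items = sorted(profit)
for size in (1, 2):
    for itemset in combinations(items, size):
        u = itemset_utility(itemset, transactions)
        if u >= min_util:
            print(itemset, u)
```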

    Dynamic Algorithms and Asymptotic Theory for Lp-norm Data Analysis

    The focus of this dissertation is the development of outlier-resistant stochastic algorithms for Principal Component Analysis (PCA) and the derivation of novel asymptotic theory for Lp-norm Principal Component Analysis (Lp-PCA). Modern machine learning and signal processing applications employ sensors that collect large volumes of data measurements, stored in the form of data matrices that are often massive and need to be processed efficiently so that machine learning algorithms can perform effective underlying pattern discovery. One such commonly used matrix analysis technique is PCA. Over the past century, PCA has been extensively used in areas such as machine learning, deep learning, pattern recognition, and computer vision, to name a few. PCA's popularity can be attributed to its intuitive formulation on the L2-norm, the availability of an elegant solution via the singular value decomposition (SVD), and asymptotic convergence guarantees. However, PCA has been shown to be highly sensitive to faulty measurements (outliers) because of its reliance on the outlier-sensitive L2-norm. Arguably, the most straightforward approach to impart robustness against outliers is to replace the outlier-sensitive L2-norm with the outlier-resistant L1-norm, thus formulating what is known as L1-PCA. Exact and approximate solvers for L1-PCA have been proposed in the literature. On the other hand, in this big-data era, the data matrix may be very large and/or the data measurements may arrive in streaming fashion, and traditional L1-PCA algorithms are not suitable in this setting. In order to efficiently process streaming data while remaining resistant against outliers, we propose a stochastic L1-PCA algorithm that computes the dominant principal component (PC) with formal convergence guarantees. We further generalize our stochastic L1-PCA algorithm to find multiple components by proposing a new PCA framework that maximizes the recently proposed Barron loss. Leveraging the Barron loss yields a stochastic algorithm with a tunable robustness parameter that allows the user to control the amount of outlier-resistance required in a given application. We demonstrate the efficacy and robustness of our stochastic algorithms on synthetic and real-world datasets. Our experimental studies include online subspace estimation, classification, video surveillance, and image conditioning, among others. Last, we focus on the development of asymptotic theory for Lp-PCA. In general, Lp-PCA for p < 2 has been shown to outperform PCA in the presence of outliers owing to its outlier resistance. However, unlike PCA, Lp-PCA is perceived as a "robust heuristic" by the research community due to the lack of theoretical asymptotic convergence guarantees. In this work, we strive to shed light on the topic by developing asymptotic theory for Lp-PCA. Specifically, we show that, for a broad class of data distributions, the Lp-PCs span the same subspace as the standard PCs asymptotically, and moreover, we prove that the Lp-PCs are specific rotated versions of the PCs. Finally, we demonstrate the asymptotic equivalence of PCA and Lp-PCA with a wide variety of experimental studies.
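
    To make the contrast between the L2 and L1 formulations concrete, here is a minimal sketch that computes the dominant L1 principal component with the well-known fixed-point (sign) iteration, which maximizes ||X^T w||_1 over unit-norm w, and compares it with the SVD-based L2 PC on data containing a few gross outliers. This is a generic L1-PCA heuristic with made-up data, not the dissertation's stochastic algorithm.

```python
import numpy as np

def l1_pc_fixed_point(X, n_iter=100, seed=0):
    """Dominant L1 principal component of X (features x samples) via the
    classic fixed-point iteration w <- X s / ||X s||, s = sign(X^T w).
    Generic heuristic for illustration only."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        s = np.sign(X.T @ w)          # optimal signs for the current direction
        s[s == 0] = 1.0               # break ties away from zero
        w_new = X @ s
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):     # converged to a fixed point
            break
        w = w_new
    return w

# Toy comparison on rank-1 data with a few corrupted samples (values hypothetical).
rng = np.random.default_rng(1)
X = np.outer([1.0, 0.2], rng.standard_normal(200)) + 0.05 * rng.standard_normal((2, 200))
X[:, :5] += 20.0 * rng.standard_normal((2, 5))    # 5 gross outliers

l2_pc = np.linalg.svd(X)[0][:, 0]                 # standard (L2) PC via SVD
l1_pc = l1_pc_fixed_point(X)
print("L2 PC:", l2_pc, " L1 PC:", l1_pc)
```

    On such data the L1 direction typically stays closer to the clean rank-1 signal, which is the qualitative robustness effect the dissertation formalizes.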

    Enabling Deep Learning on Edge Devices

    Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, and reinforcement learning. High-performing DNNs, however, rely on intensive resource consumption. For example, training a DNN requires high dynamic memory, a large-scale dataset, and a large number of computations (a long training time); even inference with a DNN demands a large amount of static storage, computations (a long inference time), and energy. Therefore, state-of-the-art DNNs are often deployed on cloud servers with a large number of super-computers, a high-bandwidth communication bus, a shared storage infrastructure, and a high power supply. Recently, emerging intelligent applications, e.g., AR/VR, mobile assistants, and the Internet of Things, require us to deploy DNNs on resource-constrained edge devices. Compared to a cloud server, edge devices often have a rather small amount of resources. To deploy DNNs on edge devices, we need to reduce their size, i.e., we target a better trade-off between resource consumption and model accuracy. In this dissertation, we studied four edge intelligence scenarios, i.e., Inference on Edge Devices, Adaptation on Edge Devices, Learning on Edge Devices, and Edge-Server Systems, and developed different methodologies to enable deep learning in each scenario. Since current DNNs are often over-parameterized, our goal is to find and reduce the redundancy of the DNNs in each scenario. Comment: PhD thesis at ETH Zurich
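
    One common way to exploit over-parameterization and shrink a network, in the spirit described above, is magnitude-based weight pruning: the smallest-magnitude weights are zeroed out so the layer can be stored and executed sparsely. The sketch below applies this to a toy fully connected layer; the layer sizes, sparsity level, and helper name are assumptions for illustration and are not one of the dissertation's methods.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.
    Returns the pruned weights and the boolean mask that was applied."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Hypothetical fully connected layer: 256 inputs -> 128 outputs.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((128, 256))

W_pruned, mask = magnitude_prune(W, sparsity=0.7)
print(f"kept {mask.mean():.1%} of weights")        # roughly 30% remain
```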

    Cavity-based negative images in molecular docking

    In drug development, computer-based methods are constantly evolving as a result of increasing computing power and the cumulative costs of generating new pharmaceuticals. With virtual screening (VS), it is possible to screen even hundreds of millions of compounds and select the best molecule candidates for in vitro testing, instead of investing time and resources in analysing all molecules systematically in laboratories. However, there is a constant need to develop more reliable and effective software for VS. For example, molecular docking, one of the most central methods in structure-based VS, can be very successful for certain targets while failing completely with others. It is not necessarily the docking sampling but the scoring of the docking poses that is the bottleneck. In this thesis, a novel rescoring method, negative image-based rescoring (R-NiB), is introduced: it generates a negative image of the ligand-binding cavity and compares the shape and electrostatic similarity between the generated model and the docked molecule pose. The performance of the method is tested comprehensively using several different protein targets, benchmarking sets and docking software, and it is compared to other rescoring methods. R-NiB is shown to be a fast and effective method for rescoring docking poses, producing a notable improvement in active-molecule recognition. Furthermore, a NIB model optimization method based on a greedy algorithm is introduced that uses a set of known active and inactive molecules as a training set. This approach, brute force negative image-based optimization (BR-NiB), is shown to work remarkably well, producing impressive in silico results even with very limited active-molecule training sets. Importantly, the results suggest that the in silico hit rates of the optimized models in docking rescoring are at the level needed in real-world VS and drug discovery projects.
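
    Conceptually, the rescoring step compares a point-based negative image of the binding cavity with a docked pose in terms of shape and charge complementarity. The sketch below gives a deliberately simplified, hypothetical version of such a score on random point clouds: a Gaussian shape-overlap term weighted by a charge-sign agreement term. The weights, sigma, and data are assumptions for illustration only; the actual R-NiB comparison is performed with dedicated shape/electrostatics similarity software, not this formula.

```python
import numpy as np

def nib_like_score(model_xyz, model_q, pose_xyz, pose_q, sigma=1.0, w_esp=0.5):
    """Toy shape + electrostatics similarity between a cavity 'negative image'
    (points with partial charges) and a docked pose (atoms with partial charges)."""
    # Pairwise distances between cavity points and pose atoms.
    d = np.linalg.norm(model_xyz[:, None, :] - pose_xyz[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)                                   # closest atom to each cavity point
    shape = np.exp(-(d.min(axis=1) ** 2) / (2 * sigma ** 2))     # Gaussian shape overlap
    esp = (np.sign(model_q) == np.sign(pose_q[nearest])).astype(float)  # charge-sign match
    return float(((1 - w_esp) * shape + w_esp * shape * esp).mean())

# Hypothetical data: 50 cavity points and one docked pose with 30 atoms.
rng = np.random.default_rng(0)
cavity_xyz, cavity_q = rng.normal(size=(50, 3)), rng.normal(size=50)
pose_xyz, pose_q = rng.normal(size=(30, 3)), rng.normal(size=30)

print("rescore:", nib_like_score(cavity_xyz, cavity_q, pose_xyz, pose_q))
```

    Ranking docked poses (or docked molecules) by such a score is what "rescoring" means here; the BR-NiB step then greedily edits the cavity model so that known actives rank above inactives on a training set.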

    Enhanced applicability of loop transformations


    International Conference on Civil Infrastructure and Construction (CIC 2020)

    This is the proceedings of the CIC 2020 Conference, which was held under the patronage of His Excellency Sheikh Khalid bin Khalifa bin Abdulaziz Al Thani in Doha, Qatar, from 2 to 5 February 2020. The goal of the conference was to provide a platform to discuss next-generation infrastructure and its construction among key players such as researchers, industry professionals and leaders, local government agencies, clients, construction contractors and policymakers. The conference gathered industry and academia to disseminate their research and field experiences in multiple areas of civil engineering. It was also a unique opportunity for companies and organizations to show the most recent advances in the field of civil infrastructure and construction. The conference covered a wide range of timely topics that address the needs of the construction industry all over the world, and particularly in Qatar. All papers were peer reviewed by experts in their field and edited for publication. The conference accepted a total of 127 papers, submitted by authors from five different continents, under the following four themes: Theme 1: Construction Management and Process; Theme 2: Materials and Transportation Engineering; Theme 3: Geotechnical, Environmental, and Geo-environmental Engineering; Theme 4: Sustainability, Renovation, and Monitoring of Civil Infrastructure. The list of sponsors is provided on page 1.

    Cartographic modelling for automated map generation
