
    Error Resilient Video Coding Using Bitstream Syntax And Iterative Microscopy Image Segmentation

    There has been a dramatic increase in the amount of video traffic over the Internet in the past several years. For applications such as real-time video streaming and video conferencing, retransmission of lost packets is often not permitted. Popular video coding standards such as H.26x and VPx exploit spatial-temporal correlations for compression, which typically makes the compressed bitstreams vulnerable to errors. We propose several adaptive spatial-temporal error concealment approaches for subsampling-based multiple description video coding. These adaptive methods are based on motion and mode information extracted from the H.26x video bitstreams. We also present an error resilience method using data duplication in VPx video bitstreams. A recent challenge in image processing is the analysis of biomedical images acquired using optical microscopy. Due to the size and complexity of the images, automated segmentation methods are required to obtain quantitative, objective, and reproducible measurements of biological entities. In this thesis, we present two techniques for microscopy image analysis. Our first method, “Jelly Filling”, is intended to provide 3D segmentation of biological images that contain incompleteness in dye labeling. Intuitively, this method is based on filling disjoint regions of an image with jelly-like fluids to iteratively refine segments that represent separable biological entities. Our second method selectively uses a shape-based function optimization approach and a 2D marked point process simulation to quantify nuclei by their locations and sizes. Experimental results show that our proposed methods are effective in addressing the aforementioned challenges.
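
    The subsampling-based multiple description idea above can be sketched concretely: split a frame into two descriptions by subsampling, and if one description is lost in transit, conceal it spatially from the one that arrived. The sketch below assumes numpy; the function names and the column-interleaving scheme are illustrative, not taken from the thesis.

```python
# A minimal sketch of subsampling-based multiple description coding (MDC)
# with simple spatial error concealment. Names are illustrative.
import numpy as np

def split_descriptions(frame):
    """Split a frame into two descriptions by taking alternate pixel columns."""
    return frame[:, 0::2], frame[:, 1::2]

def conceal_from_neighbor(received, lost_shape):
    """Estimate a lost description from the received one (nearest-neighbour copy)."""
    concealed = np.zeros(lost_shape, dtype=received.dtype)
    cols = min(lost_shape[1], received.shape[1])
    concealed[:, :cols] = received[:, :cols]
    return concealed

def reconstruct(d0, d1, width):
    """Interleave the two descriptions back into a full-width frame."""
    frame = np.zeros((d0.shape[0], width), dtype=d0.dtype)
    frame[:, 0::2] = d0
    frame[:, 1::2] = d1
    return frame

# Example: description 1 is lost and concealed from description 0.
frame = np.arange(16 * 16, dtype=np.uint8).reshape(16, 16)
d0, d1 = split_descriptions(frame)
d1_concealed = conceal_from_neighbor(d0, d1.shape)
approx = reconstruct(d0, d1_concealed, frame.shape[1])
```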

    An Experiment to Create Awareness in People concerning Social Engineering Attacks

    Social Engineering is the technique of fraudulently obtaining confidential information from users with the purpose of using it against the users themselves or against the organizations where they work. This study presents an experiment focused on raising awareness of the consequences of this type of attack by executing a controlled attack on trusted people. To accomplish this, we carried out a set of activities and tricks that attackers commonly use to obtain information, appealing to the curiosity of social network contacts so that they would visit a personal blog with fictitious information. In addition to this human interaction, a hidden plug-in was installed to collect user information such as IP address, country, operating system, and browser type. With the information collected, a penetration test was run against ports 80 (web server) and 22 (SSH server) to gather more information. Finally, the results were shown to the victims. After the attack, users were also surveyed about their knowledge of phishing and social engineering. The results show that only 2% of the people suspected or asked about the real reason for visiting the blog. Furthermore, they reveal that the people who visited the blog have no knowledge or awareness of how sensitive information can be compromised in a relatively simple way.
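
    For context on the port-probing step, the following is a minimal sketch of a TCP connect check against ports 80 and 22, assuming only Python's standard socket module; the target address is a placeholder, not the host used in the experiment.

```python
# A minimal TCP connect check of ports 80 and 22, for illustration only.
import socket

def probe_port(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    target = "192.0.2.10"  # placeholder address (TEST-NET-1), not a real victim
    for port, service in [(80, "web server"), (22, "SSH server")]:
        state = "open" if probe_port(target, port) else "closed/filtered"
        print(f"port {port} ({service}): {state}")
```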

    Progressively communicating rich telemetry from autonomous underwater vehicles via relays

    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, June 2012. As analysis of imagery and environmental data plays a greater role in mission construction and execution, there is an increasing need for autonomous marine vehicles to transmit these data to the surface. Without access to the data acquired by a vehicle, surface operators cannot fully understand the state of the mission. Communicating imagery and high-resolution sensor readings to surface observers remains a significant challenge; as a result, current telemetry from free-roaming autonomous marine vehicles remains limited to ‘heartbeat’ status messages, with minimal scientific data available until after recovery. Increasing the challenge, long-distance communication may require relaying data across multiple acoustic hops between vehicles, yet fixed infrastructure is not always appropriate or possible. In this thesis I present an analysis of the unique considerations facing telemetry systems for free-roaming Autonomous Underwater Vehicles (AUVs) used in exploration. These considerations include high-cost vehicle nodes with persistent storage and significant computation capabilities, combined with human surface operators monitoring each node. I then propose mechanisms for interactive, progressive communication of data across multiple acoustic hops. These mechanisms include wavelet-based embedded coding methods and a novel image compression scheme based on texture classification and synthesis. The specific characteristics of underwater communication channels, including high latency, intermittent communication, the lack of instantaneous end-to-end connectivity, and a broadcast medium, inform these proposals. Human feedback is incorporated by allowing operators to identify segments of data that warrant higher-quality refinement, ensuring efficient use of limited throughput. I then analyze the performance of these mechanisms relative to current practices. Finally, I present CAPTURE, a telemetry architecture that builds on this analysis. CAPTURE draws on advances in compression and delay-tolerant networking to enable progressive transmission of scientific data, including imagery, across multiple acoustic hops. In concert with a physical layer, CAPTURE provides an end-to-end networking solution for communicating science data from autonomous marine vehicles. Automatically selected imagery, sonar, and time-series sensor data are progressively transmitted across multiple hops to surface operators. Human operators can request arbitrarily high-quality refinement of any resource, up to an error-free reconstruction. The components of this system are then demonstrated through three field trials in diverse environments on SeaBED, OceanServer, and Bluefin AUVs, each running a different software architecture. Thanks to the National Science Foundation and the National Oceanic and Atmospheric Administration for their funding of my education and this work.
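
    The coarse-to-fine behavior of wavelet-based embedded coding can be illustrated with a small sketch: decompose an image, then reconstruct it from progressively more detail bands, as if refinement packets arrived one at a time. The sketch below assumes numpy and PyWavelets (pywt); it is illustrative and not the thesis's actual codec.

```python
# Progressive, coarse-to-fine reconstruction from a wavelet decomposition.
import numpy as np
import pywt

def make_stages(image, wavelet="haar", levels=3):
    """Yield reconstructions of increasing quality: approximation first, then details."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    for keep in range(1, len(coeffs) + 1):
        # Keep the approximation and the `keep - 1` coarsest detail bands,
        # zero the rest, as if only those packets had arrived so far.
        partial = [coeffs[0]] + [
            tuple(band if i < keep - 1 else np.zeros_like(band) for band in bands)
            for i, bands in enumerate(coeffs[1:])
        ]
        yield pywt.waverec2(partial, wavelet)

image = np.random.rand(64, 64)
for stage, approx in enumerate(make_stages(image)):
    err = float(np.mean((approx - image) ** 2))
    print(f"stage {stage}: mean squared error {err:.6f}")
```

    Each stage corresponds to receiving one more refinement layer, so the mean squared error decreases monotonically and the final stage is an essentially error-free reconstruction.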

    Novel Architectures for Offloading and Accelerating Computations in Artificial Intelligence and Big Data

    Due to the end of Moore's Law and Dennard Scaling, performance gains in general-purpose architectures have slowed significantly in recent years. While raising the number of cores has been a viable approach for further performance increases, Amdahl's Law and its implications for parallelization also limit further gains. Consequently, research has shifted towards different approaches, including domain-specific custom architectures tailored to specific workloads. This has led to a new golden age for computer architecture, as noted in the Turing Award Lecture by Hennessy and Patterson, and has spawned several new architectures and architectural advances specifically targeted at today's dominant workloads, including Machine Learning. This thesis introduces a hierarchy of architectural improvements ranging from minor incremental changes, such as High-Bandwidth Memory, to more complex architectural extensions that offload workloads from the general-purpose CPU towards more specialized accelerators. Finally, we introduce novel architectural paradigms, namely Near-Data and In-Network Processing, as the most complex architectural improvements. This cumulative dissertation then investigates several architectural improvements to accelerate Sum-Product Networks, a novel Machine Learning approach from the class of Probabilistic Graphical Models. Furthermore, we use these improvements as case studies to discuss the impact of novel architectures, showing that both minor and major architectural changes can significantly increase performance in Machine Learning applications. In addition, this thesis presents recent work on Near-Data Processing, which introduces Smart Storage Devices as a novel architectural paradigm that is especially interesting in the context of Big Data. We discuss how Near-Data Processing can be applied to improve performance in different database settings by offloading database operations to smart storage devices. Offloading data-reductive operations, such as selections, reduces the amount of data transferred, thus improving performance and alleviating bandwidth-related bottlenecks. Using Near-Data Processing as a use case, we also discuss how Machine Learning approaches, such as Sum-Product Networks, can improve novel architectures. Specifically, we introduce an approach for offloading Cardinality Estimation using Sum-Product Networks that could enable more intelligent decision-making in smart storage devices. Overall, we show that Machine Learning can benefit from the development of novel architectures, while also showing that Machine Learning can be applied to improve the applications of novel architectures.
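
    For readers unfamiliar with the accelerated workload, the following is a minimal sketch of evaluating a tiny Sum-Product Network over two binary variables; the structure and weights are made up for illustration and are not from the dissertation.

```python
# A toy Sum-Product Network: sum nodes mix, product nodes factorize, leaves model variables.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Leaf:
    var: int        # index of the variable this leaf models
    p_true: float   # Bernoulli parameter
    def prob(self, x):
        return self.p_true if x[self.var] else 1.0 - self.p_true

@dataclass
class Product:
    children: List
    def prob(self, x):
        result = 1.0
        for child in self.children:
            result *= child.prob(x)
        return result

@dataclass
class Sum:
    children: List[Tuple[float, object]]  # (weight, child) pairs; weights sum to 1
    def prob(self, x):
        return sum(w * child.prob(x) for w, child in self.children)

# P(X0, X1) as a mixture of two fully factorized components.
spn = Sum([
    (0.6, Product([Leaf(0, 0.9), Leaf(1, 0.2)])),
    (0.4, Product([Leaf(0, 0.1), Leaf(1, 0.7)])),
])
print(spn.prob([True, False]))  # joint probability of X0=1, X1=0
```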

    Proposta de preservació de dades científiques en accés obert mitjançant tècniques d’anàlisi forense digital

    It has long been the case that research funding agencies require researchers to share the research data produced in funded projects, which must be available in open access, generally through a repository. Digital preservation centres are therefore facing the challenge of preserving research data and storing it in the long term. The purpose of this thesis is to prove that digital forensics techniques are valid for effectively preserving research data in the social sciences and humanities. To test this hypothesis, a preservation workflow has been created that provides a technical solution to centres without the means to run data repositories, since the model uses the DSpace open source software. The methodology involved, firstly, analysing the literature on open research data, on research funding agencies, on digital forensics use cases in libraries and archives, and on organizations specialized in depositing data. Secondly, a series of interviews with people responsible for DSpace repositories were conducted to gather their opinions on the application of the model. Lastly, a series of tests were carried out to develop the proposal. Once these tests were completed, the workflow of the preservation model was defined using OAIS terminology. The theoretical basis of the model was the study of diverse digital forensics use cases, from which different methods were adapted. The last step was the study of the DSpace software, in which some tests on a local repository were performed. The final conclusions are that the preservation model meets the various open-access requirements of research funding agencies, while digital forensic analysis techniques make it possible to safeguard the integrity of the data, perform diverse data analyses, and identify and block personally identifiable information. The DSpace software allows the ingest of large volumes of data, but it is necessary to enable its FTP ingest function.
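
    One core digital forensics technique used in preservation workflows of this kind is the fixity check: hash every file at ingest and re-hash it later to detect corruption or tampering. The sketch below assumes Python's standard hashlib and pathlib; the paths and manifest format are illustrative, not part of the proposed model.

```python
# Fixity checking: build a SHA-256 manifest at ingest, verify it during later audits.
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map every file under `root` to its SHA-256 digest (ingest-time manifest)."""
    return {str(p): sha256_of(p) for p in Path(root).rglob("*") if p.is_file()}

def verify(manifest):
    """Return the files whose current digest no longer matches the manifest."""
    return [path for path, digest in manifest.items() if sha256_of(path) != digest]

# manifest = build_manifest("dataset/")   # at ingest
# changed = verify(manifest)              # during a later audit; expect an empty list
```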

    Less is More: Restricted Representations for Better Interpretability and Generalizability

    Deep neural networks are prevalent in supervised learning for a large number of tasks such as image classification, machine translation, and even scientific discovery. Their success often comes at the expense of interpretability and generalizability. The increasing complexity of models and the involvement of the pre-training process make the lack of explainability more pressing. Outstanding performance when labeled data are abundant, but a tendency to overfit when labeled data are limited, demonstrates how difficult it is for deep neural networks to generalize to different datasets. This thesis aims to improve interpretability and generalizability by restricting representations. We approach interpretability by focusing on attribution analysis to understand which features contribute to BERT's predictions, and we approach generalizability by focusing on effective methods in a low-data regime. We consider two strategies for restricting representations: (1) adding a bottleneck, and (2) introducing compression. Given input x, suppose we want to learn y with the latent representation z (i.e., x→z→y); adding a bottleneck means adding a function R such that L(R(z)) < L(z), and introducing compression means adding a function R such that L(R(y)) < L(y), where L refers to the number of bits. In other words, the restriction is added either in the middle of the pipeline or at its end. We first introduce how adding an information bottleneck can help attribution analysis and apply it to investigate BERT's behavior on text classification in Chapter 3. We then extend this attribution method to analyze passage reranking in Chapter 4, where we conduct a detailed analysis to understand cross-layer and cross-passage behavior. Adding a bottleneck can not only provide insight into deep neural networks but can also be used to increase generalizability. In Chapter 5, we demonstrate the equivalence between adding a bottleneck and doing neural compression. We then leverage this finding in a framework called Non-Parametric learning by Compression with Latent Variables (NPC-LV) and show how optimizing neural compressors can be used for non-parametric image classification with few labeled data. To further investigate how compression alone helps non-parametric learning without latent variables (NPC), we carry out experiments with the universal compressor gzip on text classification in Chapter 6. In Chapter 7, we elucidate methods for adopting the perspective of compression, but without the actual process of compression, using T5. Using experimental results in passage reranking, we show that our method is highly effective in a low-data regime where only one thousand query-passage pairs are available. In addition to the weakly supervised scenario, we also extend our method to large language models like GPT under almost no supervision, in one-shot and zero-shot settings. The experiments show that without extra parameters or in-context learning, GPT can be used for semantic similarity, text classification, and text ranking, outperforming strong baselines; this is presented in Chapter 8. The thesis proposes to tackle two big challenges in machine learning, "interpretability" and "generalizability", through restricted representations. We provide both theoretical derivations and empirical results to show the effectiveness of information-theoretic approaches. We not only design new algorithms but also provide numerous insights into why and how "compression" is so important for understanding deep neural networks and improving generalizability.
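
    The compressor-based non-parametric classification mentioned for Chapter 6 can be sketched as normalized compression distance (NCD) with gzip plus a nearest-neighbour vote; the toy data and function names below are illustrative, not the thesis code.

```python
# Parameter-free text classification with a universal compressor (gzip) and 1-NN over NCD.
import gzip

def clen(text: str) -> int:
    """Compressed length of a string, in bytes."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(query, labeled):
    """Label the query with the class of its nearest neighbour under NCD."""
    return min(labeled, key=lambda item: ncd(query, item[0]))[1]

train = [
    ("the team won the championship game last night", "sports"),
    ("the striker scored twice in the second half", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("stock markets fell after the earnings report", "finance"),
]
print(classify("the goalkeeper saved a penalty in extra time", train))
```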

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, through data summarization and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and to how scalability can be enhanced on diverse computing architectures ranging from embedded systems to large computing clusters.
