9,857 research outputs found

    Procedures and tools for acquisition and analysis of volatile memory on android smartphones

    Mobile phone forensics has become more prominent as mobile phones have become ubiquitous in both personal and business use. Android smartphones show tremendous growth in global market share. Many studies describe procedures and techniques for acquiring and analysing the non-volatile memory in mobile phones. The physical memory (RAM) of a smartphone, however, may also retain incriminating evidence that an examiner can acquire and analyse. This study presents a proper procedure for acquiring the volatile memory of an Android smartphone and discusses the use of Linux Memory Extraction (LiME) for dumping the volatile memory. The study also discusses the analysis of the memory image with Volatility 2.3, in particular the tool's analysis capabilities. Despite these advances, there are two major concerns with both tools. First, the examiner has to gain root privileges before executing LiME. Second, neither tool offers a generic solution or approach. At present, however, there is no other tool or option that would give the same results as LiME and Volatility 2.3.
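
    The workflow described above can be sketched end to end from an examiner's workstation. The Python sketch below is only an illustration under stated assumptions, not the study's tooling: it assumes a rooted device, a LiME module (lime.ko) compiled for the device's exact kernel, and a Volatility 2.x profile built for that kernel; the paths and the profile name LinuxAndroidARM are placeholders.

        import subprocess

        def run(cmd):
            """Run one step of the acquisition/analysis workflow, echoing it for an audit trail."""
            print("+", " ".join(cmd))
            subprocess.run(cmd, check=True)

        # 1. Push the LiME kernel module (built for this device's kernel) to the phone.
        run(["adb", "push", "lime.ko", "/data/local/tmp/lime.ko"])

        # 2. Load LiME as root; it dumps physical memory to the given path in the 'lime' format.
        run(["adb", "shell", "su", "-c",
             "insmod /data/local/tmp/lime.ko 'path=/sdcard/mem.lime format=lime'"])

        # 3. Pull the memory image back to the workstation for offline analysis.
        run(["adb", "pull", "/sdcard/mem.lime", "mem.lime"])

        # 4. Analyse the image with Volatility 2.x; the profile name is a placeholder
        #    for a profile built from the device kernel. linux_pslist lists processes.
        run(["python", "vol.py", "-f", "mem.lime",
             "--profile=LinuxAndroidARM", "linux_pslist"])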

    STT-MRAM Based NoC Buffer Design

    As Chip Multiprocessor (CMP) design moves toward many-core architectures, communication delay in the Network-on-Chip (NoC) is a major bottleneck in CMP design. STT-MRAM (Spin-Torque Transfer Magnetic RAM), an emerging non-volatile memory, provides substantial power and area savings, near-zero leakage power, and higher memory density than conventional SRAM. However, STT-MRAM suffers from inherent drawbacks such as multi-cycle write latency and high write power consumption, and these problems have to be addressed for an efficient design that uses STT-MRAM for the NoC input buffer in place of a traditional SRAM-based design. Motivated by the short intra-router latency, a previously proposed write-latency reduction technique that sacrifices retention time is explored, and a hybrid input-buffer design using both SRAM and STT-MRAM to "hide" the long write latency efficiently is proposed. Since simple data migration in the hybrid buffer consumes more dynamic power than SRAM, a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer is also proposed.
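
    To make the hybrid-buffer idea concrete, the following Python sketch models an input buffer whose front slots are fast SRAM and whose back slots are STT-MRAM with a long write. It is an illustrative toy model under assumed parameters (slot counts, one migration per cycle, the name HybridInputBuffer), not the paper's microarchitecture: incoming flits always land in SRAM so the long STT-MRAM write is hidden, and lazy migration moves flits to STT-MRAM only when SRAM occupancy crosses a threshold instead of at every opportunity.

        from collections import deque

        class HybridInputBuffer:
            """Toy model of an NoC input buffer split into SRAM and STT-MRAM slots."""

            def __init__(self, sram_slots=4, mram_slots=12, migrate_threshold=3):
                self.sram = deque()                  # fast writes, few slots
                self.mram = deque()                  # slow writes, many slots
                self.sram_slots = sram_slots
                self.mram_slots = mram_slots
                self.migrate_threshold = migrate_threshold
                self.migrations = 0                  # proxy for migration dynamic power

            def enqueue(self, flit):
                """Incoming flits always land in SRAM, hiding the multi-cycle STT-MRAM write."""
                if len(self.sram) >= self.sram_slots:
                    return False                     # back-pressure: front of the buffer is full
                self.sram.append(flit)
                return True

            def cycle(self):
                """Lazy migration: move the oldest flit only when SRAM is nearly full."""
                if (len(self.sram) > self.migrate_threshold
                        and len(self.mram) < self.mram_slots):
                    self.mram.append(self.sram.popleft())
                    self.migrations += 1             # each migration costs dynamic power

            def dequeue(self):
                """Reads drain the oldest flits first (STT-MRAM holds older flits than SRAM)."""
                if self.mram:
                    return self.mram.popleft()
                if self.sram:
                    return self.sram.popleft()
                return None

        buf = HybridInputBuffer()
        for flit in range(10):
            buf.enqueue(flit)
            buf.cycle()
        print(buf.migrations)   # lower than if every flit were eagerly moved to STT-MRAM

    In this toy model, raising migrate_threshold trades SRAM headroom for fewer migrations, which is the intuition behind the lazy scheme's dynamic-power savings.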

    Reliable and Energy Efficient MLC STT-RAM Buffer for CNN Accelerators

    We propose a lightweight scheme in which the formation of a data block is changed so that it tolerates soft errors significantly better than the baseline. The key insight behind our work is that CNN weights are normalized between -1 and 1 after each convolutional layer, which leaves one bit unused in the half-precision floating-point representation. By taking advantage of this unused bit, we create a backup of the most significant bit to protect it against soft errors. In addition, considering that in MLC STT-RAMs the cost of memory operations (read and write) and the reliability of a cell are content-dependent (some patterns require larger current and longer time while also being more susceptible to soft errors), we rearrange the data block to minimize the number of costly bit patterns. Combining these two techniques provides the same level of accuracy as an error-free baseline while improving read and write energy by 9% and 6%, respectively.
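
    The bit-level trick can be illustrated with a few lines of NumPy. With weights in [-1, 1], the biased FP16 exponent never exceeds 01111, so the exponent's top bit (bit 14) is always zero and can hold a copy of another bit. The sketch below is an assumption-laden illustration rather than the paper's exact encoder: it treats the sign bit (bit 15) as the protected most significant bit, and on read it simply restores that bit from the backup copy; the paper's actual recovery policy may differ, and the names protect/recover are ours.

        import numpy as np

        def protect(w):
            """Store a copy of the sign bit (bit 15) in the unused exponent MSB (bit 14)."""
            u = np.array(w, dtype=np.float16).view(np.uint16)
            backup = (u >> 15) & 1
            return (u & 0xBFFF) | (backup << 14)

        def recover(u):
            """Restore bit 15 from its backup in bit 14, then clear the backup bit.
            Illustrative policy only: it always trusts the backup copy."""
            backup = (u >> 14) & 1
            u = (u & 0x7FFF) | (backup << 15)
            return (u & 0xBFFF).view(np.float16)

        w = np.float16(0.625)
        stored = protect(w)
        corrupted = stored ^ (1 << 15)     # simulate a soft error flipping the protected bit
        print(recover(corrupted))          # prints 0.625: the backup undoes the flip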

    Characterizing Deep-Learning I/O Workloads in TensorFlow

    The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to the accelerator for computation. In addition, checkpoint and restart operations are carried out so that DL computing frameworks can restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow read bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on overall performance. Using a burst buffer to checkpoint to fast, small-capacity storage and to copy the checkpoints asynchronously to slower, large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to the slower storage on our benchmark environment.
    Comment: Accepted for publication at pdsw-DISCS 201
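
    As a concrete illustration of the input-pipeline behaviour measured above, the sketch below builds a tf.data pipeline that reads TFRecord files with several parallel threads and prefetches batches so that CPU-side pre-processing overlaps with computation on the accelerator. It is a minimal sketch, not the paper's micro-benchmark or mini-application: the file pattern, feature schema, image size, and thread/prefetch counts are placeholders.

        import tensorflow as tf

        NUM_THREADS = 8        # parallel readers and parsers (placeholder value)
        BATCH_SIZE = 256

        def parse_example(raw):
            """Decode one serialized example; this feature schema is a placeholder."""
            features = tf.io.parse_single_example(raw, {
                "image": tf.io.FixedLenFeature([], tf.string),
                "label": tf.io.FixedLenFeature([], tf.int64),
            })
            image = tf.io.decode_jpeg(features["image"], channels=3)
            return tf.image.resize(image, [227, 227]), features["label"]

        files = tf.data.Dataset.list_files("train-*.tfrecord")   # placeholder file pattern
        dataset = (tf.data.TFRecordDataset(files, num_parallel_reads=NUM_THREADS)
                   .map(parse_example, num_parallel_calls=NUM_THREADS)
                   .batch(BATCH_SIZE)
                   .prefetch(1))   # keep a batch ready: input pipeline overlaps accelerator compute

        for images, labels in dataset.take(2):   # smoke test over a couple of batches
            print(images.shape, labels.shape)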

    Full TCP/IP for 8-Bit architectures

    We describe two small and portable TCP/IP implementations fulfilling the subset of RFC 1122 requirements needed for full host-to-host interoperability. Our TCP/IP implementations do not sacrifice any of TCP's mechanisms, such as urgent data or congestion control. They support IP fragment reassembly, and the number of simultaneous connections is limited only by the available RAM. Despite being small and simple, our implementations do not require their peers to have complex, full-size stacks, but can communicate with peers running a similarly light-weight stack. The code size is on the order of 10 kilobytes, and RAM usage can be configured to be as low as a few hundred bytes.