16 research outputs found
Towards a set of metrics to guide the generation of fake computer file systems
Fake file systems are used in the field of cyber deception to bait intruders and fool forensic investigators. File system researchers also frequently generate their own synthetic document repositories, due to data privacy and copyright concerns associated with experimenting on real-world corpora. For both these fields, realism is critical. Unfortunately, after creating a set of files and folders, there are no current testing standards that can be applied to validate their authenticity, or conversely, reliably automate their detection. This paper reviews the previous 30 years of file system surveys on real-world corpora, to identify a set of discrete measures for generating synthetic file systems. Statistical distributions, such as size, age and lifetime of files, common file types, compression and duplication ratios, directory distribution and depth (and its relationship with numbers of files and sub-directories) were identified and their respective merits discussed. Additionally, this paper highlights notable absences in these surveys that would be worth addressing, such as analysing, en masse, the text content distribution and file naming habits, and comparing file access times against traditional working hours.
Big data analytics in computational biology and bioinformatics
Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference.
The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a statistical image against which an entire genome can be efficiently scanned for matching patterns. The covariance model approach is then further extended, in combination with a structural clustering algorithm and a random forests classifier, to perform genome-wide search for similarities in ncRNA tertiary structures.
The dissertation then presents methods for gene network inference. Vast bodies of genomic data containing gene and protein expression patterns are now available for analysis. One challenge is to apply efficient methodologies to uncover more knowledge about cellular functions, since very little is known about how genes regulate cellular activities. A gene regulatory network (GRN) can be represented by a directed graph in which each node is a gene and each edge or link is a regulatory effect that one gene has on another gene. By evaluating gene expression patterns, researchers perform in silico data analyses in systems biology, in particular GRN inference, a form of “reverse engineering” that predicts how a system works by looking at the system output alone.
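The directed-graph representation described above can be illustrated with a minimal sketch. The class name, the gene names, and the two effect labels ("activates", "represses") are hypothetical choices made for this example; the abstract only says that each edge is a regulatory effect of one gene on another.

```python
from collections import defaultdict


class GeneRegulatoryNetwork:
    """Directed graph: nodes are genes, edges are regulatory effects.

    Illustrative only; the effect labels are an assumption of this sketch.
    """

    ALLOWED_EFFECTS = ("activates", "represses")

    def __init__(self):
        # regulator -> {target: effect}
        self._targets = defaultdict(dict)

    def add_regulation(self, regulator, target, effect):
        """Add a directed edge regulator -> target labelled with its effect."""
        if effect not in self.ALLOWED_EFFECTS:
            raise ValueError(f"unknown effect: {effect!r}")
        self._targets[regulator][target] = effect

    def targets_of(self, regulator):
        """Genes that the given gene regulates (outgoing edges)."""
        return dict(self._targets.get(regulator, {}))

    def regulators_of(self, target):
        """Genes that regulate the given gene (incoming edges)."""
        return {
            regulator: edges[target]
            for regulator, edges in self._targets.items()
            if target in edges
        }


# Hypothetical three-gene network.
grn = GeneRegulatoryNetwork()
grn.add_regulation("geneA", "geneB", "activates")
grn.add_regulation("geneA", "geneC", "represses")
grn.add_regulation("geneB", "geneC", "activates")
```

Because edges are directed, `targets_of("geneA")` and `regulators_of("geneA")` answer different questions; GRN inference amounts to estimating this edge set from expression data alone.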
Many algorithmic and statistical approaches have been developed to computationally reverse engineer biological systems. However, there are no known bioinformatics tools capable of performing perfect GRN inference. Here, extensive experiments are conducted to evaluate and compare recent bioinformatics tools for inferring GRNs from time-series gene expression data. Standard performance metrics for these tools based on both simulated and real data sets are generally low, suggesting that further efforts are needed to develop more reliable GRN inference tools. It is also observed that using multiple tools together can help identify true regulatory interactions between genes, a finding consistent with those reported in the literature. Finally, the dissertation discusses and presents a framework for parallelizing GRN inference methods using Apache Hadoop in a cloud environment.
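One simple way to use multiple tools together is to keep only the edges that several tools agree on. The sketch below assumes majority-style voting over predicted edges; the abstract does not say how the dissertation combines tool outputs, and the function name, tool names, and gene names are hypothetical.

```python
from collections import Counter


def consensus_edges(predictions, min_votes=2):
    """Return edges predicted by at least `min_votes` tools.

    `predictions` maps a tool name to an iterable of (regulator, target)
    pairs. Voting is an assumption of this sketch, not the dissertation's
    stated method.
    """
    votes = Counter()
    for edges in predictions.values():
        # set() ensures a tool votes at most once per edge.
        votes.update(set(edges))
    return {edge for edge, count in votes.items() if count >= min_votes}


# Hypothetical outputs from three inference tools.
predictions = {
    "tool_a": [("g1", "g2"), ("g2", "g3")],
    "tool_b": [("g1", "g2")],
    "tool_c": [("g1", "g2"), ("g3", "g1"), ("g2", "g3")],
}
agreed = consensus_edges(predictions, min_votes=2)
```

Raising `min_votes` trades recall for precision: fewer edges survive, but each has more independent support.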
Exodisk--maximizing application control over storage management
Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (leaves 67-72). By Robert Grimm.
Improving the Performance of Wide Area Networks
Research into the performance of wide area data networks is described in this thesis. A model of wide area network packet delays is developed and used to direct the research into methods of improving performance.
Wide area networks are slow and expensive compared to the computer systems that rely on them for communication. Typically data networks are packet switched in order to make efficient use of resources. This can lead to contention, and the mechanisms for resolving contention can bring about further delays when demand for resources is high. In this thesis, network users are viewed as interacting decision makers with conflicting interests, and Game Theory is used to analyse the effects users have on each other’s performance. It is asserted in this thesis that wide area network performance is an ethical issue as well as a technical one.
Compression is examined as a technique for reducing network traffic load. While load reductions can reduce the time packets spend waiting in buffer queues, experimental results show that the compression process itself can present a bottleneck if CPU resources are limited.
The other inhibiting factor with regard to wide area network performance is the time it takes for a signal to propagate through a transmission medium. Propagation delays are bounded by the speed of light and become significant as the distance between computer systems increases. Mirrors and caches are methods of bringing data closer to the user, thereby reducing propagation delays and capping traffic loads on long-haul communication facilities. The performance benefits of replicating data within a wide area network environment are studied in this thesis.
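The speed-of-light bound on propagation delay can be made concrete with a short calculation. This is a sketch rather than the thesis's delay model: the function name and the `velocity_factor` parameter (the fraction of the vacuum speed of light at which a signal travels in a given medium) are assumptions introduced for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def propagation_delay_s(distance_m, velocity_factor=1.0):
    """One-way propagation delay in seconds over `distance_m` metres.

    `velocity_factor` is an assumed parameter: 1.0 gives the physical
    lower bound; real media are slower, so use a value below 1.0.
    """
    if distance_m < 0:
        raise ValueError("distance_m must be non-negative")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * velocity_factor)


# Lower bound for a 10,000 km link: roughly 33 ms one way.
bound_ms = propagation_delay_s(10_000_000) * 1000
```

No amount of added bandwidth reduces this figure; only shortening the distance does, which is what mirrors and caches achieve.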
Virtual Online Worlds: Towards a Collaborative Space for Architects
Although research on online collaboration and the use of virtual online worlds (VOWs) among architects and architecture students has been trickling forth over the last eight years (2006-2010), little discussion is dedicated to how the use of VOWs has improved collaboration, communication and quality of design for those who have used them. Researching VOWs and their use in architecture was a difficult task, since much of what needed to be found was scattered among the fields of education, construction engineering, computer science and even online blogs dedicated to architecture in video games. An analysis of those findings has contributed to the development of a pilot project conducted in a VOW called Blue Mars. The project was set up in order to discover how VOWs improve the communication skills of their users and to analyze what happens when architecture students are allowed to virtually experience their designs as avatars. This study is part of a growing body of research on the exploration of virtual online worlds in the practice of architecture, both in the classroom and out in the field.
The Design of a High-Integrity Disk Management Subsystem
This dissertation describes and experimentally evaluates the design of the Logical Disk, a disk management subsystem that guarantees the integrity of data stored on disk even after system failures, while still providing performance competitive with other storage systems. Current storage systems that use the hard disk as a storage medium, such as file systems, often do not provide sufficient protection against loss of data after a system failure. The designers of such systems fear that the effort necessary for data protection would cost too much performance. The Logical Disk uses several techniques to guarantee data integrity, including support for executing multiple commands as one atomic action and avoiding 'in-place updates' at all times. The techniques used to provide competitive performance include combining many small write commands into one large, sequential, and thus efficient, write to disk, and clustering the data on disk continuously and automatically.
Promotor: Tanenbaum, A.S. Copromotor: Jonge, W. de.
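The idea of combining many small writes into one large sequential write, while never updating data in place, can be sketched as follows. This is an illustrative toy, not the Logical Disk's actual design: the class name, the in-memory `bytearray` standing in for the disk, and the `flush_threshold` parameter are all assumptions, and atomic multi-command actions and crash recovery are left out entirely.

```python
class WriteCoalescer:
    """Buffers small block writes and appends them to a log in one go.

    Hypothetical sketch: `log` stands in for the disk and is append-only,
    so an overwritten block gets a new location instead of an in-place
    update.
    """

    def __init__(self, flush_threshold=4):
        self.log = bytearray()   # append-only stand-in for the disk
        self.block_map = {}      # logical block number -> (offset, length)
        self._pending = []       # buffered (block_no, data) writes
        self.flush_threshold = flush_threshold
        self.physical_writes = 0  # number of sequential appends issued

    def write(self, block_no, data):
        """Buffer a small write; flush once enough have accumulated."""
        self._pending.append((block_no, bytes(data)))
        if len(self._pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Emit all buffered writes as a single sequential append."""
        if not self._pending:
            return
        base = len(self.log)
        segment = bytearray()
        new_locations = {}
        for block_no, data in self._pending:
            # A later write to the same block overrides the earlier entry.
            new_locations[block_no] = (base + len(segment), len(data))
            segment += data
        self.log += segment
        self.physical_writes += 1
        self.block_map.update(new_locations)
        self._pending.clear()

    def read(self, block_no):
        """Return the latest data for a block, buffered or on 'disk'."""
        for pending_no, data in reversed(self._pending):
            if pending_no == block_no:
                return data
        offset, length = self.block_map[block_no]
        return bytes(self.log[offset:offset + length])
```

With `flush_threshold=3`, three logical writes cost one physical append. The stale copy of an overwritten block stays in the log, so a real system would also need some form of space reclamation.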
Pyxis: a distributed file system
Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico. This text presents PYXIS, a portable distributed file system with a high degree of internal parallelism, designed to be flexible with respect to the environment over which its components are distributed, allowing it to run on multicomputers or on computer networks. The project was developed in the Graduate Program in Computer Science at the Universidade Federal de Santa Catarina (CPGCC/UFSC) and is expected to become part of a joint project of the federal universities of Santa Catarina (UFSC), Rio Grande do Sul (UFRGS) and Santa Maria (UFSM), which aims to develop a multicomputer and a parallel programming environment on top of it.