    Resurrection: Rethinking Magnetic Tapes For Cost Efficient Data Preservation

    With the advent of Big Data technologies (the capacity to store and efficiently process large data sets), doors have opened to business intelligence opportunities that were previously unknown. Each phase in the processing of this data requires specialized infrastructure. One such phase, the preservation and archiving of data, has proven its usefulness time and again. Data archives are processed with novel data mining methods to elicit vital information gathered over long periods and to audit the growth of a business or organization efficiently. Data preservation is also an important part of business processes, helping to avoid the loss of important information due to system failures, human error, and natural calamities. This thesis investigates the need for, discusses possibilities of, and presents a novel, highly cost-effective, unified, long-term storage solution for data. Some common processes in large-scale data warehousing systems are analyzed for overlooked, inordinate shortcomings, and a commercially feasible solution is conceived for them. The gap between the general needs of efficient long-term storage and common current functionality is analyzed. An attempt to bridge this gap is made through a hybrid, hierarchical-media-based, performance-enhancing middleware and a monolithic-namespace filesystem in a new storage architecture, Tape Cloud. Our studies interpret the effects of using heterogeneous storage media in terms of operational behavior, average latency of data transactions, and power consumption. The results show the advantages of the new storage system by demonstrating the differences in operating costs, personnel costs, and total cost of ownership from varied perspectives in a business model.
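    The cost argument behind a disk-plus-tape hierarchy like Tape Cloud can be made concrete with a toy blended-cost model. The sketch below is purely illustrative: the tier names, per-GB prices, and latency figures are assumed values, not numbers from the thesis.

```python
# Illustrative blended-cost model for a tiered archive.
# All cost/latency figures are hypothetical, chosen only to show the trade-off.
TIERS = {
    # $/GB-month storage cost, typical first-byte access latency in seconds
    "ssd":  {"cost_gb_month": 0.10,  "latency_s": 0.0002},
    "hdd":  {"cost_gb_month": 0.02,  "latency_s": 0.010},
    "tape": {"cost_gb_month": 0.002, "latency_s": 60.0},
}

def blended_cost(placement_gb):
    """Monthly storage cost for a placement map {tier: GB stored}."""
    return sum(TIERS[t]["cost_gb_month"] * gb for t, gb in placement_gb.items())

# 100 TB archive: all on tape vs. keeping a 5% hot set on disk.
all_tape = blended_cost({"tape": 100_000})
hybrid = blended_cost({"hdd": 5_000, "tape": 95_000})
```

The hybrid placement costs more per month than pure tape, but the disk tier absorbs hot reads that would otherwise pay tape's mount-and-seek latency; a middleware layer that migrates data between tiers is what lets the archive keep the low blended cost while hiding most of that latency.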

    Metaserver locality and scalability in a distributed NFS

    The p2 model is a statistical model for the analysis of binary relational data with covariates, as occur in social network studies. It can be characterized as a multinomial regression model with crossed random effects that reflect actor heterogeneity and dependence between the ties from and to the same actor in the network. Three Markov chain Monte Carlo (MCMC) estimation methods for the p2 model are presented to improve on the iterative generalized least squares (IGLS) estimation developed earlier; two of them use random-walk proposals. The third method, an independence chain sampler, and one of the random-walk algorithms use normal approximations of the binary network data to generate proposals in the MCMC algorithms. A large-scale simulation study compares MCMC estimates with IGLS estimates for networks with 20 and 40 actors. The IGLS estimates were found to have a smaller variance but to be severely biased, while the MCMC estimates have a larger variance and a small bias. For networks with 20 actors, mean squared errors are generally comparable or smaller for the IGLS estimates; for networks with 40 actors, mean squared errors are smallest for the MCMC estimates. Coverage rates of confidence intervals are good for the MCMC estimates but not for the IGLS estimates.
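    The random-walk proposal mechanism the abstract refers to can be sketched in a few lines. This is a minimal generic random-walk Metropolis sampler on a toy one-dimensional target, not the authors' p2 estimator: the target density, step size, and iteration count below are assumptions chosen for illustration.

```python
import math
import random

def random_walk_metropolis(log_target, x0, n_iter, step=0.5, seed=0):
    """Random-walk Metropolis: propose x' = x + Normal(0, step^2),
    accept with probability min(1, target(x') / target(x))."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        log_alpha = log_target(proposal) - log_target(x)
        if math.log(rng.random()) < log_alpha:
            x = proposal  # accept; otherwise keep the current state
        samples.append(x)
    return samples

# Toy target: standard normal log-density, up to an additive constant.
log_std_normal = lambda x: -0.5 * x * x

draws = random_walk_metropolis(log_std_normal, x0=0.0, n_iter=5000, step=1.0)
```

An independence chain sampler differs only in the proposal line: instead of perturbing the current state, it draws each proposal from a fixed approximating distribution (here, the normal approximation of the binary network data), with the acceptance ratio adjusted for the proposal density.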