    Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff

    The relative ease of collaborative data science and analysis has led to a proliferation of many thousands or millions of versions of the same datasets in many scientific and commercial domains, acquired or constructed at various stages of data analysis across many users, and often over long periods of time. Managing, storing, and recreating these dataset versions is a non-trivial task. The fundamental challenge here is the storage-recreation trade-off: the more storage we use, the faster it is to recreate or retrieve versions, while the less storage we use, the slower it is to recreate or retrieve versions. Despite the fundamental nature of this problem, there has been surprisingly little work on it. In this paper, we study this trade-off in a principled manner: we formulate six problems under various settings, trading off these quantities in various ways, demonstrate that most of the problems are intractable, and propose a suite of inexpensive heuristics, drawing on techniques from the delay-constrained scheduling and spanning tree literature, to solve these problems. We have built a prototype version management system that aims to serve as a foundation for our DATAHUB system for facilitating collaborative data science. We demonstrate, via extensive experiments, that our proposed heuristics provide efficient solutions in practical dataset versioning scenarios.
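To make the storage-recreation trade-off concrete, the Python sketch below models versions as nodes of a graph with a dummy root (node 0): an edge from the root means "store this version in full", and an edge between two versions means "store one as a delta of the other", each edge carrying a storage cost and a recreation cost. All costs are hypothetical, deltas are assumed symmetric, and the function names are illustrative, not DATAHUB's API. A minimum spanning tree on storage cost minimizes total storage, while a shortest-path tree on recreation cost minimizes every version's recreation cost; the paper's own heuristics, which this sketch does not reproduce, target solutions between such extremes.

```python
import heapq

# Hypothetical version graph. Node 0 is a dummy root: edge (0, v) means
# "materialize v in full"; edge (u, v) means "store one as a delta of the other".
# Each edge maps to (storage_cost, recreation_cost); deltas are assumed symmetric.
EDGES = {
    (0, 1): (100, 100), (0, 2): (110, 110), (0, 3): (120, 120),
    (1, 2): (10, 15), (2, 3): (12, 20), (1, 3): (30, 40),
}


def build_adj(edges):
    """Undirected adjacency list: node -> [(neighbor, (storage, recreation))]."""
    adj = {}
    for (u, v), costs in edges.items():
        adj.setdefault(u, []).append((v, costs))
        adj.setdefault(v, []).append((u, costs))
    return adj


def min_storage_tree(edges, root=0):
    """Prim's algorithm on storage cost; returns a child -> parent map."""
    adj = build_adj(edges)
    parent, seen = {}, {root}
    heap = [(costs[0], root, v) for v, costs in adj[root]]
    heapq.heapify(heap)
    while heap:
        _, u, v = heapq.heappop(heap)
        if v in seen:
            continue
        seen.add(v)
        parent[v] = u
        for w, costs in adj[v]:
            if w not in seen:
                heapq.heappush(heap, (costs[0], v, w))
    return parent


def min_recreation_tree(edges, root=0):
    """Dijkstra on recreation cost; returns a child -> parent map."""
    adj = build_adj(edges)
    dist, parent = {root: 0}, {}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, costs in adj[u]:
            nd = d + costs[1]
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return parent


def evaluate(parent, edges):
    """Return (total storage, maximum recreation cost) of a storage plan."""
    def edge(u, v):
        return edges[(u, v)] if (u, v) in edges else edges[(v, u)]

    def recreate(v):
        if v not in parent:
            return 0  # the root costs nothing to "recreate"
        return recreate(parent[v]) + edge(parent[v], v)[1]

    storage = sum(edge(p, v)[0] for v, p in parent.items())
    return storage, max(recreate(v) for v in parent)


for name, plan in [("min-storage", min_storage_tree(EDGES)),
                   ("min-recreation", min_recreation_tree(EDGES))]:
    print(name, evaluate(plan, EDGES))
```

With these made-up costs, the min-storage plan stores versions 2 and 3 as a delta chain (total storage 122, worst-case recreation 135), while the min-recreation plan materializes every version (total storage 330, worst-case recreation 120).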

    Delta-based Storage and Querying for Versioned Datasets

    Data-driven methods and products are becoming increasingly common in a variety of communities, leading to a huge diversity of datasets being continuously generated, modified, and analyzed. An increasingly important consideration for the underlying data management systems is that all of these datasets, and their versions over time, need to be stored and queried for a variety of reasons, including, but not limited to, reproducibility, collaboration, provenance, auditing, introspective analysis, and backups. However, most solutions today resort to highly ad hoc, manual version management and sharing techniques, which lead to friction when managing collaborative data science workflows while also introducing inefficiencies. In this dissertation, we introduce a framework for dataset version management and address the systems-building, operator design, and optimization challenges involved in building a dataset version control system. We describe these challenges and their solutions in the context of our system, called DEX, which we have developed to support increasingly complex version management tasks. We show how to use delta encoding, a key component in managing redundancy, to provide efficient storage and retrieval for thousands of dataset versions, and we develop a formalism to understand the various trade-offs in a principled manner. We study the storage-recreation trade-off in detail and provide a suite of inexpensive heuristics to obtain high-quality solutions under different settings. To provide a rich interface for specifying version management tasks, we design a new query language, called VQUEL, with the ability to query dataset versions and provenance in a unified manner. We study how assumptions on the delta format can help in the design of a logical algebra, which we then use to execute increasingly complex queries efficiently. 
A key characteristic of our query execution methods is that the computational cost depends primarily on the size and number of deltas in the expression (typically small), not on the input dataset versions (which can be very large). Finally, we demonstrate the effectiveness of our techniques through an extensive evaluation of DEX on a mix of real-world and synthetic datasets.
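The delta encoding described above can be illustrated with a toy record-level format. The Python sketch below assumes each version is a dictionary from a primary key to a row tuple, and that a delta lists upserted rows and deleted keys; this format and the function names are hypothetical, not DEX's actual delta representation. The `compose` function shows why operating on deltas can be cheap: two consecutive deltas are combined by touching only the keys they mention, never the full versions.

```python
from dataclasses import dataclass, field

Version = dict  # primary key -> row tuple


@dataclass
class Delta:
    """Hypothetical record-level delta: rows to upsert and keys to delete."""
    upsert: dict = field(default_factory=dict)
    delete: frozenset = frozenset()


def compute_delta(old: Version, new: Version) -> Delta:
    """Encode `new` relative to `old` (upserted and deleted keys never overlap)."""
    upsert = {k: row for k, row in new.items() if old.get(k) != row}
    delete = frozenset(old.keys() - new.keys())
    return Delta(upsert, delete)


def apply_delta(base: Version, delta: Delta) -> Version:
    """Recreate a version by applying a delta to its base version."""
    result = {k: row for k, row in base.items() if k not in delta.delete}
    result.update(delta.upsert)
    return result


def compose(first: Delta, second: Delta) -> Delta:
    """Delta equivalent to applying `first` then `second`; no version is read."""
    upsert = {k: row for k, row in first.upsert.items() if k not in second.delete}
    upsert.update(second.upsert)
    delete = (first.delete - set(second.upsert)) | second.delete
    return Delta(upsert, frozenset(delete))


v1 = {1: ("alice", 30), 2: ("bob", 25)}
v2 = {1: ("alice", 31), 3: ("carol", 40)}
v3 = {1: ("alice", 31), 2: ("bob", 26), 3: ("carol", 40)}

d12, d23 = compute_delta(v1, v2), compute_delta(v2, v3)
print(apply_delta(v1, compose(d12, d23)) == v3)
```

Here `v3` is recreated from `v1` with a single composed delta, and the work done in `compose` scales with the number of keys in the two deltas rather than with the size of the versions.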

    EFFECT OF BASTI KARMA IN GRIDHRASI-A CASE STUDY

    Gridhrasi is one of the Nanatmajavyadhis of Vatadosha. The term Gridhrasi refers to the typical gait, which resembles that of a Gridhra, i.e. a vulture. The main symptoms are Ruka (pain), Toda (pricking sensation), and Stambha (stiffness) in the waist, hip, back of the thigh, knee, calf, and foot. Gridhrasi can be correlated with sciatica in modern medicine. Improper sitting posture, continuous overexertion, and jerking movements can produce structural abnormalities in the spine that may cause sciatica. A 48-year-old female patient presented to the OPD with pain radiating from the lumbar region to the left lower limb and difficulty in walking for one year, and was diagnosed with Gridhrasi. As Gridhrasi is a Vatajavyadhi, Basti is the best treatment for it. Hence, the line of treatment chosen for this patient was Sarvangaabhyanga with Sahachartail, Sarvangabashpaswed with Dashamoolkwath, and Basti in the form of Erandmooladiniruhabasti and Sahachar tail Anuvasanbasti, followed by Panchatikta ksheer basti with Guggultikta ghrut, along with the oral medications Sahacharadikashay Ghana vati, Prasarnyadikashay Ghana vati, Vishatindukvati, and Guggultiktakashay. This treatment provided marked improvement in the signs and symptoms of Gridhrasi. Before treatment, Ruka was 4, Aruchi was 1, Toda was 3, Stambha was 4, Gaurav was 2, Spandana was 2, SLRT was 4 on the left side and 1 on the right side, and walking distance was 3; after treatment, these changed to 2, 0, 1, 0, 1, 1, 0, left side 1, right side 0, and 1, respectively.

    DataHub: Collaborative Data Science & Dataset Version Management at Scale

    Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference, and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale. Comment: 7 pages
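The git-style operations listed above (branch, difference, merge) can be sketched for keyed datasets. The Python sketch below assumes a version is a dictionary from primary key to row, that a branch is simply a new version derived from a parent, and that merges are three-way against a common ancestor; the functions are hypothetical illustrations, not DataHub's interface.

```python
_MISSING = object()  # sentinel: key absent from a version


def diff(a: dict, b: dict) -> set:
    """Keys added, removed, or changed between versions a and b."""
    return {k for k in a.keys() | b.keys()
            if a.get(k, _MISSING) != b.get(k, _MISSING)}


def merge(base: dict, ours: dict, theirs: dict):
    """Three-way merge of two branches against their common ancestor `base`.

    Returns (merged_version, sorted_conflicting_keys); conflicting keys keep
    their base value (or stay absent) so a user can resolve them later.
    """
    merged = dict(base)
    conflicts = []
    for k in diff(base, ours) | diff(base, theirs):
        b = base.get(k, _MISSING)
        o = ours.get(k, _MISSING)
        t = theirs.get(k, _MISSING)
        if o == t or t == b:
            result = o  # same change on both sides, or only ours changed
        elif o == b:
            result = t  # only theirs changed
        else:
            conflicts.append(k)  # both sides changed the row differently
            continue
        if result is _MISSING:
            merged.pop(k, None)  # row deleted on the winning side
        else:
            merged[k] = result
    return merged, sorted(conflicts)


base = {1: ("alice", 30), 2: ("bob", 25)}
ours = {1: ("alice", 31), 2: ("bob", 25)}                      # branch A edits row 1
theirs = {1: ("alice", 30), 2: ("bob", 25), 3: ("carol", 40)}  # branch B adds row 3
print(diff(base, ours))
print(merge(base, ours, theirs))
```

Here branch A edits row 1 and branch B adds row 3, so both changes merge cleanly; had both branches edited row 1 differently, key 1 would be reported as a conflict and left at its base value.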

    Effective data versioning for collaborative data analytics

    With the massive proliferation of datasets in a variety of sectors, data science teams in these sectors spend vast amounts of time collaboratively constructing, curating, and analyzing these datasets. Versions of datasets are routinely generated during this data science process via various data processing operations, such as data transformation and cleaning, feature engineering, and normalization. However, no existing system enables us to effectively store, track, and query these versioned datasets, which leads to massive redundancy in versioned data storage and makes true collaboration and sharing impossible. In this thesis, we develop solutions for versioned data management for collaborative data analytics. In the first part of this thesis, we extend a relational database to support versioning of structured data. Specifically, we build a system, OrpheusDB, on top of a relational database, with a carefully designed data representation and an intelligent partitioning algorithm for fast version control operations. OrpheusDB inherits many of the benefits of relational databases, while also compactly storing, keeping track of, and recreating versions on demand. However, OrpheusDB implicitly makes a few assumptions, namely: (a) the SQL assumption: a SQL-like language is the best fit for querying data and versioning information; (b) the structural assumption: the data is in a relational format with a regular structure; and (c) the from-scratch assumption: users adopt OrpheusDB from the very beginning of their project and register each data version, along with full metadata, in the system. In the second part of this thesis, we remove these assumptions one at a time. First, we remove the SQL assumption and propose a generalized query language for querying data along with versioning and provenance information. Second, we remove the structural assumption and develop solutions for compact storage and fast retrieval of arbitrary data representations. 
Finally, we remove the “from-scratch” assumption by developing techniques to infer lineage relationships among versions residing in an existing data repository.
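For illustration, a compact versioned representation of the kind mentioned above can be approximated by a split layout in which each distinct record is stored once under a record id (rid) and each version stores only the list of rids it contains. The Python sketch below is a simplified, in-memory model under that assumption; deduplicating records by content, and the class and method names, are hypothetical choices rather than OrpheusDB's implementation, which runs on top of a relational database.

```python
class VersionedTable:
    """Toy split layout: records stored once, each version as a list of rids."""

    def __init__(self):
        self.data = {}      # rid -> record (each distinct record stored once)
        self.versions = {}  # vid -> list of rids, the version's "rlist"
        self._rid_of = {}   # record -> rid (content-based dedup, an assumption)

    def commit(self, records):
        """Register a new version; only previously unseen records use new storage."""
        rlist = []
        for rec in records:
            rid = self._rid_of.get(rec)
            if rid is None:
                rid = len(self.data)
                self.data[rid] = rec
                self._rid_of[rec] = rid
            rlist.append(rid)
        vid = len(self.versions) + 1
        self.versions[vid] = rlist
        return vid

    def checkout(self, vid):
        """Recreate a version by looking up its record ids."""
        return [self.data[rid] for rid in self.versions[vid]]


table = VersionedTable()
v1 = table.commit([("alice", 30), ("bob", 25)])
v2 = table.commit([("alice", 31), ("bob", 25)])  # one record changed
print(table.versions)  # {1: [0, 1], 2: [2, 1]}
print(table.checkout(v1))
```

The two versions share the unchanged `("bob", 25)` record, so three records are stored for four rows across both versions, and checkout is a lookup by rid rather than a replay of changes.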

    Augmented Reality for Information Kiosk

    Nowadays, people widely use the internet to purchase homes, cars, furniture, etc. To obtain information before purchasing a product, users rely on advertisements, pamphlets, and other sources, or obtain the information from a salesperson. However, to retrieve such product information on a computer or other device, users must perform many repeated mouse and keyboard actions, which wastes time and is inconvenient. Reducing these actions would shorten the time needed to gather information about a particular product. Users are also unable to determine a product's inner dimensions from images. These dimensions can be estimated using 3D motion tracking of human movements and Augmented Reality. Based on 3D motion tracking of human movements and an Augmented Reality application, we introduce a kind of interaction that has not been seen before. The main aim of the proposed system is to demonstrate that better interaction features, both in showrooms and in online shopping, could improve sales by presenting the item being purchased more fully. With the help of our project, customers will be able to view their choices on screen as they wish and thereby make better decisions. In this paper, we propose a hand gesture detection and recognition method to detect hand movements; the recognized hand gestures are then sent to the system as control commands that enable the user to retrieve and access data from the Information Kiosk for a better purchase decision. Keywords: 3D motion tracking, Augmented Reality, Hand Gestures, Information Kiosk.
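The step in which recognized hand gestures become kiosk control commands can be sketched as a simple displacement-threshold classifier. The Python sketch below assumes a hand tracker already supplies a short sequence of (x, y, z) hand positions in metres, with z decreasing as the hand moves toward the screen; the threshold, gesture names, and command names are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple

# (x, y, z) hand position in metres from some 3D tracker; z is assumed to
# decrease as the hand moves toward the screen. All values are hypothetical.
Point = Tuple[float, float, float]

# Hypothetical mapping from recognized gestures to kiosk control commands.
COMMANDS = {
    "swipe_left": "previous_item",
    "swipe_right": "next_item",
    "push": "select_item",
}


def classify_gesture(track: Sequence[Point], threshold: float = 0.15) -> Optional[str]:
    """Classify a tracked hand movement by its net displacement."""
    if len(track) < 2:
        return None
    dx = track[-1][0] - track[0][0]  # horizontal displacement
    dz = track[-1][2] - track[0][2]  # depth displacement
    if abs(dx) >= abs(dz) and abs(dx) >= threshold:
        return "swipe_right" if dx > 0 else "swipe_left"
    if -dz >= threshold:
        return "push"
    return None  # movement too small or not a known gesture


def to_command(track: Sequence[Point]) -> Optional[str]:
    """Translate a tracked movement into a kiosk command, if any."""
    gesture = classify_gesture(track)
    return COMMANDS.get(gesture) if gesture else None


print(to_command([(0.0, 0.0, 0.5), (0.1, 0.0, 0.5), (0.25, 0.0, 0.5)]))
```

A real system would classify richer trajectories (and hand poses) rather than net displacement alone; the threshold here simply filters out small, unintended movements.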

    Improved Bounds in Stochastic Matching and Optimization

    We consider two fundamental problems in stochastic optimization: approximation algorithms for stochastic matching, and sampling bounds in the black-box model. For the former, we improve the current-best bound of 3.709 due to Adamczyk et al. (2015) to 3.224; we also present improvements on Bansal et al. (2012) for hypergraph matching and for relaxed versions of the problem. In the context of stochastic optimization, we improve upon the sampling bounds of Charikar et al. (2005).
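In the black-box model, the scenario distribution is available only through samples, and the standard approach is sample average approximation (SAA): replace the expected cost with an average over sampled scenarios and solve the resulting deterministic problem. The Python sketch below applies SAA to a toy two-stage problem chosen purely for illustration (it is not a problem from the paper): buy x units now at cost c each, then, once demand d is revealed, buy any shortfall at a higher cost r. Sampling bounds such as those of Charikar et al. (2005) concern how many samples are needed for the SAA solution to be near-optimal; this sketch does not compute such bounds.

```python
from typing import Sequence, Tuple


def saa_first_stage(samples: Sequence[float], c: float, r: float) -> Tuple[float, float]:
    """Sample average approximation for a toy two-stage purchasing problem.

    Stage 1: buy x >= 0 units at cost c each. Stage 2: after demand d is
    revealed, buy the shortfall max(d - x, 0) at cost r per unit.
    The unknown expectation is replaced by the average over `samples`.
    """
    def estimated_cost(x: float) -> float:
        recourse = sum(r * max(d - x, 0) for d in samples) / len(samples)
        return c * x + recourse

    # The estimated cost is convex and piecewise linear in x, with breakpoints
    # at the sampled demands, so (for non-negative demands) some optimum lies
    # at x = 0 or at one of the samples.
    best = min([0, *samples], key=estimated_cost)
    return best, estimated_cost(best)


print(saa_first_stage([2, 4, 6, 8], c=1, r=3))
```

With sampled demands [2, 4, 6, 8], c = 1, and r = 3, the sketch returns x = 6 with an estimated expected cost of 7.5: buying more up front is worthwhile because recourse is three times as expensive, but covering the rare largest demand is not.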

    Comparison of locking compression plating vs retrograde intramedullary nailing in distal femur extra-articular fractures

    Background: The purpose of the study was to compare the outcomes of distal femur extra-articular fractures treated with a locking plate and with a retrograde intramedullary nail (IMN). Methods: 86 patients with distal femur extra-articular fractures were included in the study; 44 patients were operated on with intramedullary nailing and 42 with a locking plate. The results of the 2 groups were compared with regard to clinical and radiological outcome, intraoperative time, and blood loss. Postoperative status was evaluated using the visual analogue scale, Neer score, knee range of motion, and radiological union on plain radiographs. Patients were followed up at 4-weekly intervals from 8 to 28 weeks and then at 1 year. Results: Mean operative time and blood loss were lower in the intramedullary nailing group, whereas intraoperative blood loss was lower in the plating group. 6 patients in the plating group developed surgical site infection. Mean time to radiological union was significantly shorter in the intramedullary nailing group. 7 patients in the plating group had problems with union (5 non-union, 2 delayed union), whereas 1 patient in the IMN group had non-union. 93% of intramedullary nailing cases were able to bear full weight at 12 weeks, compared with 66% of cases in the plate group. Knee pain at 6 months was greater in the intramedullary nailing group. Conclusions: In our study, IMN proved to be a better modality for distal femur fracture fixation in terms of operative time, union rates, infection rates, and overall patient outcome, provided it is done with proper principles and techniques of intramedullary fixation.

    Application of High-Intensity Ultrasound to Improve Food Processing Efficiency: A Review

    The use of non-thermal processing technologies has grown in response to an ever-increasing demand for high-quality, convenient meals with natural taste and flavour that are free of chemical additives and preservatives. Food processing plays a crucial role in addressing food security issues by reducing loss and controlling spoilage. Among the several non-thermal processing methods, ultrasound technology has proven very beneficial. Ultrasound processing, whether used alone or in combination with other methods, improves food quality significantly. Cutting, freezing, drying, homogenization, foaming and defoaming, filtration, emulsification, and extraction are just a few of the applications of ultrasound in the food industry. Ultrasound can be used to destroy germs and inactivate enzymes without affecting the quality of the food. As a result, ultrasound is being hailed as a game-changing processing technique for reducing organoleptic and nutritional waste. This review aims to investigate the underlying principles of ultrasound generation and to improve understanding of its applications in food processing, in order to make ultrasound a safe, viable, and innovative food processing technology, and to examine the technology's benefits and drawbacks. The breadth of ultrasound's application in the industry is also examined. This should also help researchers and the food sector develop more efficient strategies for frequency-controlled power ultrasound in food processing applications.