332 research outputs found

    Democratizing Self-Service Data Preparation through Example Guided Program Synthesis

    The majority of real-world data we can access today have one thing in common: they are not immediately usable in their original state. Trapped in a swamp of data usability issues such as non-standard data formats and heterogeneous data sources, most data analysts and machine learning practitioners must burden themselves with "data janitor" work, writing ad hoc Python, Perl, or SQL scripts, which is tedious and inefficient. Data scientists and analysts are estimated to spend 80% of their time preparing data, a significant amount of human effort that could be redirected toward better goals. In this dissertation, we ease this burden by harnessing knowledge from the end user, such as examples and other useful hints. We develop program synthesis techniques guided by heuristics and machine learning that make data preparation less painful and more efficient for data users, particularly those with little to no programming experience. Data transformation, also called data wrangling or data munging, is an important data preparation task that seeks to convert data from one format to a different (often more structured) format. Our system Foofah shows that allowing end users to describe their desired transformation by providing small input-output transformation examples can significantly reduce overall user effort; the underlying program synthesizer can often find a meaningful data transformation program within a reasonably short amount of time. Our second system, CLX, demonstrates that sometimes the user need not even provide complete input-output examples, but only label desirable examples where they exist in the original dataset; the system can still suggest reasonable, explainable transformation operations to fix non-standard format issues in a dataset full of heterogeneous data with varied formats.
PRISM, our third system, targets the data preparation task of data integration, i.e., combining multiple relations to formulate a desired schema. PRISM allows the user to describe the target schema using not only high-resolution (precise) constraints, i.e., complete example data records in the target schema, but also (imprecise) constraints of varied resolutions, such as incomplete example records with missing values, value ranges, or multiple candidate values per element (cell), requiring less familiarity with the database contents from the end user.
Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163059/1/markjin_1.pd
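The core idea of example-guided synthesis described above can be illustrated with a toy sketch: search over sequences of transformation operators until one is consistent with a user-supplied input-output example. The operator set and brute-force search here are hypothetical stand-ins; Foofah's actual DSL and heuristic search are far richer.

```python
from itertools import product

# A toy DSL of row-transformation operators (hypothetical; not the
# thesis's actual operator set).
OPS = {
    "split_comma": lambda s: s.split(","),
    "strip_each": lambda cells: [c.strip() for c in cells],
    "drop_empty": lambda cells: [c for c in cells if c],
}

def synthesize(example_in, example_out, max_len=3):
    """Brute-force search for the shortest operator sequence consistent
    with a single input-output example."""
    for length in range(1, max_len + 1):
        for seq in product(OPS, repeat=length):
            value = example_in
            try:
                for name in seq:
                    value = OPS[name](value)
            except Exception:
                continue  # type mismatch: this sequence is ill-formed
            if value == example_out:
                return seq  # first consistent program found
    return None

prog = synthesize("a, b, ,c", ["a", "b", "c"])
print(prog)  # → ('split_comma', 'strip_each', 'drop_empty')
```

Even this naive enumeration conveys why small examples suffice: a short, consistent program over a well-chosen operator set is usually the intended one, which is the intuition that heuristic and learned guidance then make scalable.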

    Device-based decision-making for adaptation of three-dimensional content

    The goal of this research was the creation of an adaptation mechanism for the delivery of three-dimensional content. The adaptation of content for various network and terminal capabilities, as well as for different user preferences, is a key feature that needs to be investigated. Current state-of-the-art research on adaptation shows promising results for specific tasks and limited types of content, but is still not well suited for massive heterogeneous environments. In this research, we present a method for transmitting adapted three-dimensional content to multiple target devices. This paper presents theoretical and practical methods for adapting three-dimensional content, including shapes and animation. We also discuss practical details of the integration of our methods into the MPEG-21 and MPEG-4 architectures.

    Granite: A scientific database model and implementation

    The principal goal of this research was to develop a formal comprehensive model for representing highly complex scientific data. An effective model should provide a conceptually uniform way to represent data and it should serve as a framework for the implementation of an efficient and easy-to-use software environment that implements the model. The dissertation work presented here describes such a model and its contributions to the field of scientific databases. In particular, the Granite model encompasses a wide variety of datatypes used across many disciplines of science and engineering today. It is unique in that it defines dataset geometry and topology as separate conceptual components of a scientific dataset. We provide a novel classification of geometries and topologies that has important practical implications for a scientific database implementation. The Granite model also offers integrated support for multiresolution and adaptive resolution data. Many of these ideas have been addressed by others, but no one has tried to bring them all together in a single comprehensive model. The datasource portion of the Granite model offers several further contributions. In addition to providing a convenient conceptual view of rectilinear data, it also supports multisource data. Data can be taken from various sources and combined into a unified view. The rod storage model is an abstraction for file storage that has proven an effective platform upon which to develop efficient access to storage. Our spatial prefetching technique is built upon the rod storage model, and demonstrates very significant improvement in access to scientific datasets, and also allows machines to access data that is far too large to fit in main memory. These improvements bring the extremely large datasets now being generated in many scientific fields into the realm of tractability for the ordinary researcher. 
We validated the feasibility and viability of the model by implementing a significant portion of it in the Granite system. Extensive performance evaluations of the implementation indicate that the features of the model can be provided in a user-friendly manner with an efficiency that is competitive with more ad hoc systems and more specialized, application-specific solutions.
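The spatial prefetching idea mentioned above can be sketched in a few lines: when a block of an out-of-core dataset is read, speculatively load its spatial neighbours into a bounded cache. This is only an illustration of the general technique under a simple LRU policy; the names and the blocked 2-D layout are assumptions, not Granite's rod-storage design.

```python
from collections import OrderedDict

class PrefetchingStore:
    """Toy spatial-prefetching cache over a blocked 2-D dataset
    (a sketch of the idea, not Granite's actual implementation)."""

    def __init__(self, blocks, capacity=8):
        self.blocks = blocks          # {(bx, by): data}, simulating disk
        self.cache = OrderedDict()    # LRU cache of resident blocks
        self.capacity = capacity
        self.disk_reads = 0

    def _load(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # refresh LRU position
            return
        if key in self.blocks:
            self.disk_reads += 1
            self.cache[key] = self.blocks[key]
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used

    def read(self, bx, by):
        hit = (bx, by) in self.cache
        self._load((bx, by))
        # Spatial prefetch: pull in the four face-adjacent neighbours,
        # assuming access patterns have spatial locality.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            self._load((bx + dx, by + dy))
        return self.cache[(bx, by)], hit

blocks = {(x, y): f"block{x},{y}" for x in range(4) for y in range(4)}
store = PrefetchingStore(blocks)
store.read(1, 1)           # cold read: loads (1,1) plus its neighbours
_, hit = store.read(2, 1)  # neighbour was already prefetched
print(hit)  # → True
```

A spatially local traversal over such a store turns most block accesses into cache hits, which is the effect that makes datasets far larger than main memory tractable.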

    Model-Based Time Series Management at Scale


    Mapping Myocardial Elasticity with Intracardiac Acoustic Radiation Force Impulse Methods

    Implemented on an intracardiac echocardiography transducer, acoustic radiation force methods may provide a useful means of characterizing the heart's elastic properties. Elasticity imaging may be of benefit for diagnosis and characterization of infarction and heart failure, as well as for guidance of ablation therapy for the treatment of arrhythmias. This thesis tests the hypothesis that, with appropriately designed imaging sequences, intracardiac acoustic radiation force impulse (ARFI) imaging and shear wave elasticity imaging (SWEI) are viable tools for quantification of myocardial elasticity, both temporally and spatially. Multiple-track-location SWEI (MTL-SWEI) is used to show that, in healthy in vivo porcine ventricles, shear wave speeds follow the elasticity changes with contraction and relaxation of the myocardium, varying between 0.9 and 2.2 m/s in diastole and 2.6 and 5.1 m/s in systole. Infarcted tissue is less contractile following infarction, though not uniformly stiffer. Single-track-location SWEI (STL-SWEI) is shown to suppress speckle noise and enable improved resolution of structures smaller than 2 mm in diameter compared to ARFI and MTL-SWEI. Contrast-to-noise ratio (CNR) and lateral edge resolution are shown to vary with the selection of time step for ARFI and arrival-time regression filter size for STL-SWEI and MTL-SWEI.
In 1.5 mm targets, STL-SWEI achieves alternately the tightest resolution (0.3 mm at CNR = 3.5 for a 0.17 mm filter) and the highest CNR (8.5 with edge width = 0.7 mm for a 0.66 mm filter) of the modalities, followed by ARFI and then MTL-SWEI. In larger, 6 mm targets, the CNR-resolution tradeoff curves for ARFI and STL-SWEI overlap for ARFI time steps up to 0.5 ms and kernels ≤ 1 mm for STL-SWEI. STL-SWEI can operate either with a 25 dB improvement over MTL-SWEI in CNR at the same resolution, or with edge widths 5× as narrow at equivalent CNR values, depending on the selection of regression filter size. Ex vivo ablations are used to demonstrate that ARFI, STL-SWEI, and MTL-SWEI each resolve ablation lesions between 0.5 and 1 cm in diameter and gaps between lesions smaller than 5 mm in 3-D scans. Differences in contrast, noise, and resolution between the modalities are discussed. All three modalities are also shown to resolve "x"-shaped ablations up to 22 mm in depth with good visual fidelity and correspondence to surface photographs, with STL-SWEI providing the highest-quality images. Series of each type of image, registered using 3-D data from an electroanatomical mapping system, are used to build volumes that show ablations in in vivo canine atria. In vivo images are shown to be subject to increased noise due to tissue and transducer motion, and the challenges facing the proposed system are discussed. Ultimately, intracardiac acoustic radiation force methods are demonstrated to be promising tools for characterizing dynamic myocardial elasticity and imaging radiofrequency ablation lesions.
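The link between the reported shear wave speeds and tissue stiffness follows from the standard linear-elastic, incompressible model, where the shear modulus is μ = ρc² and Young's modulus is E ≈ 3μ. A minimal sketch of the conversion, assuming a typical literature value for myocardial density (not a value stated in the abstract):

```python
# Convert measured shear wave speed c (m/s) to elastic moduli, under the
# usual linear-elastic, incompressible-tissue assumptions: mu = rho * c^2,
# E ~= 3 * mu. The density below is an assumed literature value.
RHO = 1060.0  # assumed myocardial density, kg/m^3

def shear_modulus_kpa(c_m_per_s):
    """Shear modulus in kPa from shear wave speed in m/s (mu = rho*c^2)."""
    return RHO * c_m_per_s ** 2 / 1000.0

def youngs_modulus_kpa(c_m_per_s):
    """Young's modulus in kPa, assuming incompressibility (E = 3*mu)."""
    return 3.0 * shear_modulus_kpa(c_m_per_s)

# The reported diastolic (0.9-2.2 m/s) and systolic (2.6-5.1 m/s) speed
# ranges correspond roughly to these shear-modulus ranges:
for label, c in [("diastole low", 0.9), ("diastole high", 2.2),
                 ("systole low", 2.6), ("systole high", 5.1)]:
    print(f"{label}: {shear_modulus_kpa(c):.1f} kPa")
```

Because stiffness scales with the square of wave speed, the roughly 2-3x speed increase from diastole to systole corresponds to a several-fold increase in modulus, which is why wave speed tracks the contraction cycle so clearly.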

    Time Series Management Systems: A 2022 Survey
