474 research outputs found

    Democratizing Self-Service Data Preparation through Example Guided Program Synthesis,

    The majority of real-world data we can access today have one thing in common: they are not immediately usable in their original state. Trapped in a swamp of data usability issues like non-standard data formats and heterogeneous data sources, most data analysts and machine learning practitioners have to burden themselves with "data janitor" work, writing ad-hoc Python, Perl or SQL scripts, which is tedious and inefficient. It is estimated that data scientists or analysts typically spend 80% of their time in preparing data, a significant amount of human effort that can be redirected to better goals. In this dissertation, we accomplish this task by harnessing knowledge such as examples and other useful hints from the end user. We develop program synthesis techniques guided by heuristics and machine learning, which effectively make data preparation less painful and more efficient to perform by data users, particularly those with little to no programming experience. Data transformation, also called data wrangling or data munging, is an important task in data preparation, seeking to convert data from one format to a different (often more structured) format. Our system Foofah shows that allowing end users to describe their desired transformation, through providing small input-output transformation examples, can significantly reduce the overall user effort. The underlying program synthesizer can often succeed in finding meaningful data transformation programs within a reasonably short amount of time. Our second system, CLX, demonstrates that sometimes the user does not even need to provide complete input-output examples, but only label ones that are desirable if they exist in the original dataset. The system is still capable of suggesting reasonable and explainable transformation operations to fix the non-standard data format issue in a dataset full of heterogeneous data with varied formats. PRISM, our third system, targets a data preparation task of data integration, i.e., combining multiple relations to formulate a desired schema. PRISM allows the user to describe the target schema using not only high-resolution (precise) constraints of complete example data records in the target schema, but also (imprecise) constraints of varied resolutions, such as incomplete data record examples with missing values, value ranges, or multiple possible values in each element (cell), so as to require less familiarity of the database contents from the end user.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163059/1/markjin_1.pd
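To make the idea of example-guided synthesis in the abstract above concrete, here is a minimal Python sketch. It searches for a sequence of table operators that maps one input example to one output example. The operator set, every function name, and the breadth-first strategy are assumptions chosen for illustration; they are not Foofah's actual operators or search algorithm.

```python
from collections import deque

# Hypothetical toy operators on a table (a list of rows of strings).
# Each returns a new table, or None when the operator does not apply.
def drop_empty_rows(table):
    return [row for row in table if any(cell != "" for cell in row)]

def split_colon(table):
    out = []
    for row in table:
        parts = row[0].split(":", 1) if row else []
        if len(parts) != 2:
            return None  # first cell has no "key: value" shape
        out.append([parts[0].strip(), parts[1].strip()] + row[1:])
    return out

def transpose(table):
    return [list(col) for col in zip(*table)]

OPS = {
    "drop_empty_rows": drop_empty_rows,
    "split_colon": split_colon,
    "transpose": transpose,
}

def freeze(table):
    """Hashable snapshot of a table, used to avoid revisiting states."""
    return tuple(tuple(row) for row in table)

def synthesize(example_in, example_out, max_depth=3):
    """Breadth-first search for a shortest operator sequence that turns
    example_in into example_out; returns a list of names or None."""
    queue = deque([(example_in, [])])
    seen = {freeze(example_in)}
    while queue:
        table, program = queue.popleft()
        if table == example_out:
            return program
        if len(program) == max_depth:
            continue
        for name, op in OPS.items():
            nxt = op(table)
            if nxt is None or freeze(nxt) in seen:
                continue
            seen.add(freeze(nxt))
            queue.append((nxt, program + [name]))
    return None

def run_program(program, table):
    """Apply a synthesized operator sequence to new data."""
    for name in program:
        table = OPS[name](table)
        if table is None:
            raise ValueError(f"operator {name} does not apply")
    return table

raw = [["Name: Alice", "30"], ["", ""], ["Name: Bob", "25"]]
wanted = [["Name", "Alice", "30"], ["Name", "Bob", "25"]]
program = synthesize(raw, wanted)
print(program)  # ['drop_empty_rows', 'split_colon']
```

The user supplies only the small `raw` and `wanted` tables; the returned operator list can then be replayed on the full dataset with `run_program`. A practical synthesizer would need a much richer operator set and a guided search rather than plain breadth-first enumeration.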

    Interactive Programming by Example

    As of today, programming has never been so accessible. Yet, it remains a challenge for end-users: students, non-technical employees, experts in their domains outside of computer science, and so on. With its forecast potential for solving problems by only observing inputs and outputs, programming-by-example was supposed to alleviate complex tasks requiring programming for end-users. The initial ideas of macro-based editors paved the way to subsequent practical solutions, such as spreadsheet transformations from examples. Finding the right program is the core of programming-by-example systems. However, users find it difficult to trust such generated programs. In this thesis, we contribute to proving that some forms of interaction alleviate, by having users provide examples, the problem of finding correct and reliable programs. We first report on two experiments that enable us to conjecture what kind of interaction brings benefits to programming-by-example. First, we present a new kind of game engine, Pong Designer. In this game engine, by using their finger, users program rules on the fly, by modifying the game state. We analyze its potential, and its eventual downsides that have probably prevented its wide adoption. Second, we present StriSynth, an interactive command-line tool that uses programming-by-example to transform strings and collections. The resulting programs can also rename or otherwise manage files. We obtained a result confirming that many users preferred StriSynth over usual programming languages, but would appreciate having both. We then report on two new exciting experiments with verified results, using two forms of interaction truly benefiting programming-by-example. Third, on top of a programming-by-example-based engine for extracting structured data out of text files, in this thesis we study two interaction models implemented in a tool named FlashProg: a view of the program with notification about ambiguities, and the asking of clarification questions. In this thesis, we prove that these two interaction models enable users to perform tasks with fewer errors and to be more confident in the results. Last, for learning recursive tree-to-string functions (e.g., pretty-printers), in this thesis we prove that questioning breaks down the learning complexity from a cubic to a linear number of questions, in practice making programming-by-example even more accessible than regular programming. The implementation, named Prosy, could be easily added to integrated development environments.

    User Interaction Models for Disambiguation in Programming by Example

    Programming by Examples (PBE) has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create small scripts for automating repetitive tasks. However, examples, though often easy to provide, are an ambiguous specification of the user's intent. Because of that, a key impediment to the adoption of PBE systems is the lack of user confidence in the correctness of the program that was synthesized by the system. We present two novel user interaction models that communicate actionable information to the user to help resolve ambiguity in the examples. One of these models allows the user to effectively navigate between the huge set of programs that are consistent with the examples provided by the user. The other model uses active learning to ask directed example-based questions to the user on the test input data over which the user intends to run the synthesized program. Our user studies show that each of these models significantly reduces the number of errors in the performed task without any difference in completion time. Moreover, both models are perceived as useful, and the proactive active-learning based model has a slightly higher preference regarding the users' confidence in the result.
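The question-asking model described in the abstract above can be sketched as follows: keep every candidate program that agrees with the user's examples, then ask about the test input on which the surviving candidates disagree most. This is a minimal Python sketch; the three candidate programs, the function names, and the "most distinct outputs" selection rule are assumptions for illustration, not the interaction models evaluated in the paper.

```python
from itertools import takewhile

# Hypothetical candidate programs that a synthesizer might propose
# for the single example "Alice Smith" -> "Alice".
CANDIDATES = {
    "first_word": lambda s: s.split()[0],
    "first_five_chars": lambda s: s[:5],
    "leading_letters": lambda s: "".join(takewhile(str.isalpha, s)),
}

def consistent(candidates, examples):
    """Keep only the candidates that reproduce every (input, output) example."""
    return {
        name: f
        for name, f in candidates.items()
        if all(f(x) == y for x, y in examples)
    }

def pick_question(candidates, test_inputs):
    """Return the test input on which the candidates produce the most
    distinct outputs, or None if they agree on every test input."""
    best, best_count = None, 1
    for x in test_inputs:
        distinct = len({f(x) for f in candidates.values()})
        if distinct > best_count:
            best, best_count = x, distinct
    return best

examples = [("Alice Smith", "Alice")]
test_inputs = ["Carol Jones", "Bob Lee", "Jean-Luc Picard"]

remaining = consistent(CANDIDATES, examples)
question = pick_question(remaining, test_inputs)
print(question)  # Jean-Luc Picard

# Suppose the user answers that the expected output is "Jean-Luc".
examples.append((question, "Jean-Luc"))
remaining = consistent(CANDIDATES, examples)
print(sorted(remaining))  # ['first_word']
```

All three candidates agree on "Carol Jones", so asking about it would tell the system nothing; the input that splits them three ways removes the most ambiguity with one answer.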

    Electronic, structural, and optical properties of Y2WO6, a host material for inorganic phosphors

    Optimization by first principles DFT-based electronic structure methods of the crystal structures for the five polymorphs of Y2WO6 reported in the literature yields results in good agreement with those determined experimentally by X-ray diffraction. The monoclinic P2/c phase appears to be the most stable one at ambient conditions, although high temperature orthorhombic phases with larger molar volumes could be favoured upon replacement of Y3+ cations by larger Ln3+ ones, and hence, provide plausible structures for Y2WO6:Ln3+ phosphors at ambient conditions. For all polymorphs the top of the valence band is dominated by O2p orbitals with a relatively narrow WO6-centred conduction band appearing just below a broad Y4d-centred band. Insertion energies for Eu3+ replacing Y3+ are estimated to be in the range of 3–4 eV per cation, with the smaller values corresponding to substitutions into the larger octacoordinated Y3+ sites.

    INTEGRATED MODELING OF RELIABILITY AND PERFORMANCE OF 4H-SILICON CARBIDE POWER MOSFETS USING ATOMISTIC AND DEVICE SIMULATIONS

    The 4H-Silicon Carbide (4H-SiC) power MOSFET is a promising technology for future high-temperature and high-power electronics. However, poor device reliability and performance, which stem from the inferior quality of the 4H-SiC/SiO2 interface, have hindered its development. This dissertation investigates the role of interfacial and near-interfacial atomic defects as the root cause of these key concerns. Additionally, it explores device processing strategies for mitigating reliability-limiting defects. In order to understand the atomic nature of material defects, and their manifestations in electrical measurements, this work employs an integrated modeling approach together with experiments. Here, the electronic and structural properties of defects are analyzed using first-principles hybrid Density Functional Theory (DFT). The insights from first-principles calculations are integrated with conventional physics-based modeling techniques like Drift-Diffusion and Rate equation simulations to model various device characteristics. Subsequently, the atomic-level models are validated by comparison with experiments. From a device reliability perspective, this dissertation models the time-dependent worsening of threshold voltage (Vth) instability in 4H-SiC MOSFETs operated under High-Temperature and Gate-Bias (HTGB) conditions. It proposes a DFT-based oxygen-vacancy hole trap activation model, where certain originally ‘electrically inactive’ oxygen vacancies are structurally transformed under HTGB stress to form electrically ‘active’ switching oxide hole traps. The transients of this atomistic process were simulated with inputs from DFT. The calculated time-evolution of the buildup of positively charged vacancies correlated well with the experimentally measured time-dependence of HTGB-induced Vth instability. Moreover, this work designates a near-interfacial single carbon interstitial defect in SiO2 as an additional switching oxide hole trap that could cause room-temperature Vth instability. This work employs DFT-based molecular dynamics to develop device processing strategies that could mitigate reliability-limiting defects in 4H-SiC MOSFETs. It identifies Fluorine treatment to be effective in neutralizing oxygen vacancy and carbon-related hole traps, unlike molecular hydrogen. Similarly, Nitric Oxide passivation is found to eliminate carbon-related defects. From a device performance perspective, this dissertation proposes a methodology to identify and quantify channel mobility-limiting interfacial defects by integrating Drift-Diffusion simulations of the 4H-SiC power MOSFET with DFT. It identifies the density of interface trap spectrum to be composed of three atomically distinct defects, one of which is potentially a carbon di-interstitial defect.