Democratizing Self-Service Data Preparation through Example Guided Program Synthesis
The majority of real-world data we can access today have one thing in common: they are not immediately usable in their original state. Trapped in a swamp of data usability issues like non-standard data formats and heterogeneous data sources, most data analysts and machine learning practitioners have to burden themselves with "data janitor" work, writing ad-hoc Python, Perl, or SQL scripts, which is tedious and inefficient. It is estimated that data scientists or analysts typically spend 80% of their time preparing data, a significant amount of human effort that could be redirected to better goals. In this dissertation, we reduce this effort by harnessing knowledge such as examples and other useful hints from the end user. We develop program synthesis techniques guided by heuristics and machine learning, which make data preparation less painful and more efficient for data users, particularly those with little to no programming experience.
Data transformation, also called data wrangling or data munging, is an important task in data preparation, seeking to convert data from one format to a different (often more structured) format. Our system Foofah shows that allowing end users to describe their desired transformation by providing small input-output transformation examples can significantly reduce the overall user effort. The underlying program synthesizer can often find meaningful data transformation programs within a reasonably short amount of time. Our second system, CLX, demonstrates that sometimes the user does not even need to provide complete input-output examples, but need only label the desirable ones if they exist in the original dataset. The system is still capable of suggesting reasonable and explainable transformation operations to fix non-standard data formats in a dataset full of heterogeneous data with varied formats.
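As a rough illustration of example-guided synthesis in general (not Foofah's or CLX's actual algorithm), the sketch below enumerates short pipelines of string operations and returns the first one consistent with the user's input-output examples. The operator library `OPS`, the function `synthesize`, and the sample data are all hypothetical names chosen for this sketch.

```python
from itertools import product

# Hypothetical operator library; a real synthesizer would use a much richer DSL.
OPS = {
    "strip": str.strip,
    "lower": str.lower,
    "upper": str.upper,
    "title": str.title,
    "drop_dashes": lambda s: s.replace("-", ""),
    "first_word": lambda s: s.split()[0] if s.split() else s,
}


def synthesize(examples, max_len=3):
    """Return (operator names, callable) for the shortest pipeline that
    maps every example input to its output, or None if none is found."""
    for length in range(1, max_len + 1):
        for names in product(OPS, repeat=length):
            # Bind `names` as a default argument so each closure keeps its own pipeline.
            def run(s, names=names):
                for name in names:
                    s = OPS[name](s)
                return s

            if all(run(inp) == out for inp, out in examples):
                return names, run
    return None


# Two input-output examples stand in for the user's specification.
examples = [("  alice SMITH ", "Alice"), ("BOB jones", "Bob")]
names, program = synthesize(examples)
print(names, program("carol DAVIS"))
```

Exhaustive enumeration is only workable for tiny operator sets; the heuristics and learned guidance mentioned in the abstract exist precisely because realistic search spaces are far larger.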
PRISM, our third system, targets a data preparation task of data integration, i.e., combining multiple relations to formulate a desired schema. PRISM allows the user to describe the target schema using not only high-resolution (precise) constraints of complete example data records in the target schema, but also (imprecise) constraints of varied resolutions, such as incomplete data record examples with missing values, value ranges, or multiple possible values in each element (cell), so as to require less familiarity with the database contents from the end user.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/163059/1/markjin_1.pd
Interactive Programming by Example
Programming has never been as accessible as it is today. Yet, it remains a challenge for end-users: students, non-technical employees, experts in their domains outside of computer science, and so on. With its forecast potential for solving problems by only observing inputs and outputs, programming-by-example was supposed to alleviate complex tasks requiring programming for end-users. The initial ideas of macro-based editors paved the way to subsequent practical solutions, such as spreadsheet transformations from examples. Finding the right program is the core of programming-by-example systems. However, users find it difficult to trust such generated programs. In this thesis, we contribute to proving that some forms of interaction, in which users provide examples, alleviate the problem of finding correct and reliable programs. We first report on two experiments that enable us to conjecture what kind of interaction brings benefits to programming-by-example. First, we present a new kind of game engine, Pong Designer. In this game engine, users program rules on the fly with their finger by modifying the game state. We analyze its potential, and the possible downsides that have probably prevented its wide adoption. Second, we present StriSynth, an interactive command-line tool that uses programming-by-example to transform strings and collections. The resulting programs can also rename or otherwise manage files. Our results confirm that many users preferred StriSynth over conventional programming languages, but would appreciate having both. We then report on two new experiments with verified results, using two forms of interaction that truly benefit programming-by-example.
Third, on top of a programming-by-example-based engine for extracting structured data out of text files, we study two interaction models implemented in a tool named FlashProg: a view of the program with notification about ambiguities, and the asking of clarification questions. We prove that these two interaction models enable users to perform tasks with fewer errors and to be more confident in the results. Last, for learning recursive tree-to-string functions (e.g., pretty-printers), we prove that questioning reduces the learning complexity from a cubic to a linear number of questions, in practice making programming-by-example even more accessible than regular programming. The implementation, named Prosy, could be easily added to integrated development environments
User Interaction Models for Disambiguation in Programming by Example
Programming by Examples (PBE) has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create small scripts for automating repetitive tasks. However, examples, though often easy to provide, are an ambiguous specification of the user's intent. Because of that, a key impediment to the adoption of PBE systems is the lack of user confidence in the correctness of the program synthesized by the system. We present two novel user interaction models that communicate actionable information to the user to help resolve ambiguity in the examples. One of these models allows the user to effectively navigate the huge set of programs that are consistent with the examples provided by the user. The other model uses active learning to ask directed example-based questions to the user on the test input data over which the user intends to run the synthesized program. Our user studies show that each of these models significantly reduces the number of errors in the performed task without any difference in completion time. Moreover, both models are perceived as useful, and the proactive active-learning based model has a slightly higher preference regarding the users' confidence in the result
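The active-learning idea can be sketched as follows, under simplifying assumptions: given several candidate programs that all agree with the user's examples, ask about the test input on which the candidates disagree most, then discard the candidates that contradict the answer. The helper names `pick_question` and `prune` and the candidate programs are hypothetical and do not come from the system described above.

```python
def pick_question(candidates, inputs):
    """Return the input on which the candidates produce the most distinct
    outputs, or None if they agree on every input."""
    best_input, best_count = None, 1
    for text in inputs:
        distinct = len({prog(text) for prog in candidates.values()})
        if distinct > best_count:
            best_input, best_count = text, distinct
    return best_input


def prune(candidates, text, answer):
    """Keep only the candidates whose output on `text` matches the user's answer."""
    return {name: prog for name, prog in candidates.items() if prog(text) == answer}


# Both candidates are consistent with the single example "Alice Smith" -> "Alice".
candidates = {
    "first_token": lambda s: s.split()[0],
    "first_five_chars": lambda s: s[:5],
}
test_inputs = ["Alice Smith", "Bob Jones", "Carol Davis"]

question = pick_question(candidates, test_inputs)  # the input worth asking about
remaining = prune(candidates, question, "Bob")     # assume the user answers "Bob"
print(question, list(remaining))
```

Asking only about inputs where candidates disagree keeps the number of questions small, since inputs on which every candidate agrees carry no information for disambiguation.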
Molecular Vibrations and Shape-Selectivity: A Computational Model of Biofuel Precursors in Zeolites
We have used Density Functional Theory (DFT) to model acyclic and cyclic olefins in acidic zeolites. We have studied the impact of host-guest interactions between adsorbed molecules and zeolite frameworks through the lens of molecular vibrations and shape-selectivity. This work considered three zeolite frameworks with varying pore structures and environments: large pore zeolite HMOR and medium pore zeolites HZSM-5 and HZSM-22. A key finding is that for acyclic olefins in acidic zeolites there exist two regimes of host-guest interaction: a strong interaction leading to protonation and a weak interaction between charged guest and zeolite framework. We found that these interactions manifest in the IR spectra: protonation leads to significant changes in band position for allylic vibrations, vam(C=C─C+), whereas the weaker Coulombic interaction leaves these band positions substantially unchanged. These results indicate that to model acyclic olefins in acidic zeolites one need only consider the protonated state in the gas phase.
We worked in close collaboration with zeolite experimentalists E. Hernandez and F. Jentoft at the University of Massachusetts Amherst Chemical Engineering department to investigate the presence of shape-selectivity during the formation of alkylcyclopentenyl cations from acyclic precursors in acidic zeolites. We incorporated DFT models and configurational sampling to establish band positions associated with allylic stretching vas(C=C─C+) in cyclopentenyl cations. We found that the band position of this stretch was sensitive to the substitution pattern on the allylic system of the ring, such that a methyl substitution instead of a hydrogen at the center carbon (C-2) resulted in a ~ 20 cm-1 red-shift in the IR band. Our collaborative efforts also found that the formation of these alkylcyclopentenyl cations in zeolites is shape-selective; the C-2 methyl-substituted alkylcyclopentenyl cation forms in larger pore HMOR whereas in medium pore zeolites, HZSM-5 and HZSM-22, the C-2 hydrogen-substituted alkylcyclopentenyl appears to be the main product. We performed DFT-based thermodynamics calculations and found that the relative stability of the methyl-substituted alkylcyclopentenyl cation remained unchanged in all three zeolites. This suggests that the formation of these alkylcyclopentenyl cations is not under thermodynamic control.
We used DFT calculations to build a microkinetic model of the isomerization of alkylcyclopentenyl cations in HZSM-5 and HZSM-22 zeolites, using both finite-temperature dynamics and zero-Kelvin path methods to compute barriers. We found that the isomerization leading to experimentally relevant alkylcyclopentenyl cations with C-2 methyl (T-type) or hydrogen (K-type) substitutions occurs through a multi-pathway reaction network. We found that the pathways were similar in the two zeolites, but the populations at equilibrium differed such that one T-type product formed in HZSM-5 and two formed in HZSM-22 with evidence of kinetic control of product formation
Enhancing Usability and Explainability of Data Systems
The recent growth of data science expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to democratization of data systems. Furthermore, proper understanding of data and data-driven systems is necessary for the users to trust the function of the systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, the users deserve proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems' inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, aid all sorts of users, including the expert ones, in data and system understanding, and provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing usability of data systems for nonexperts, (2) enabling data understanding that can assist users in a variety of tasks such as achieving trust in data-driven machine learning, gaining data understanding, and data cleaning, and (3) explaining causes of unexpected outcomes involving data and data systems.
For enhancing usability, we focus on example-driven user intent discovery. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions. We also develop an explanation framework to explain causes of such untrustworthy predictions. Additionally, this new data-profiling primitive enables interactive data cleaning. Finally, we develop two explanation frameworks, tailored to provide explanations in debugging data system components, including the data itself. The explanation frameworks focus on explaining the root cause of a concurrent application's intermittent failure and exposing issues in the data that cause a data-driven system to malfunction
Electronic, structural, and optical properties of Y2WO6, a host material for inorganic phosphors
Optimization by first-principles DFT-based electronic structure methods of the crystal structures for the five polymorphs of Y2WO6 reported in the literature yields results in good agreement with those determined experimentally by X-ray diffraction. The monoclinic P2/c phase appears to be the most stable one at ambient conditions, although high temperature orthorhombic phases with larger molar volumes could be favoured upon replacement of Y3+ cations by larger Ln3+ ones, and hence, provide plausible structures for Y2WO6:Ln3+ phosphors at ambient conditions. For all polymorphs the top of the valence band is dominated by O 2p orbitals with a relatively narrow WO6-centred conduction band appearing just below a broad Y 4d-centred band. Insertion energies for Eu3+ replacing Y3+ are estimated to be in the range of 3–4 eV per cation, with the smaller values corresponding to substitutions into the larger octacoordinated Y3+ sites
INTEGRATED MODELING OF RELIABILITY AND PERFORMANCE OF 4H-SILICON CARBIDE POWER MOSFETS USING ATOMISTIC AND DEVICE SIMULATIONS
The 4H-Silicon Carbide (4H-SiC) power MOSFET is a promising technology for future high-temperature and high-power electronics. However, poor device reliability and performance, which stem from the inferior quality of the 4H-SiC/SiO2 interface, have hindered its development. This dissertation investigates the role of interfacial and near-interfacial atomic defects as the root cause of these key concerns. Additionally, it explores device processing strategies for mitigating reliability-limiting defects.
In order to understand the atomic nature of material defects, and their manifestations in electrical measurements, this work employs an integrated modeling approach together with experiments. Here, the electronic and structural properties of defects are analyzed using first-principles hybrid Density Functional Theory (DFT). The insights from first-principles calculations are integrated with conventional physics-based modeling techniques like Drift-Diffusion and Rate equation simulations to model various device characteristics. Subsequently, the atomic-level models are validated by comparison with experiments.
From a device reliability perspective, this dissertation models the time-dependent worsening of threshold voltage (Vth) instability in 4H-SiC MOSFETs operated under High-Temperature and Gate-Bias (HTGB) conditions. It proposes a DFT-based oxygen-vacancy hole trap activation model, where certain originally ‘electrically inactive’ oxygen vacancies are structurally transformed under HTGB stress to form electrically ‘active’ switching oxide hole traps. The transients of this atomistic process were simulated with inputs from DFT. The calculated time-evolution of the buildup of positively charged vacancies correlated well with the experimentally measured time-dependence of HTGB-induced Vth instability. Moreover, this work identifies the near-interfacial single carbon interstitial defect in SiO2 as an additional switching oxide hole trap that could cause room-temperature Vth instability.
This work employs DFT-based molecular dynamics to develop device processing strategies that could mitigate reliability-limiting defects in 4H-SiC MOSFETs. It identifies fluorine treatment, unlike molecular hydrogen treatment, as effective in neutralizing oxygen vacancy and carbon-related hole traps. Similarly, nitric oxide passivation is found to eliminate carbon-related defects.
From a device performance perspective, this dissertation proposes a methodology to identify and quantify channel mobility-limiting interfacial defects by integrating Drift-Diffusion simulations of the 4H-SiC power MOSFET with DFT. It finds the interface trap density spectrum to be composed of three atomically distinct defects, one of which is potentially a carbon di-interstitial defect