Assessing and Remedying Coverage for a Given Dataset
Data analysis impacts virtually every aspect of our society today. Often,
this analysis is performed on an existing dataset, possibly collected through a
process that the data scientists had limited control over. The existing data
analyzed may not include the complete universe, but it is expected to cover the
diversity of items in the universe. Lack of adequate coverage in the dataset
can result in undesirable outcomes such as biased decisions and algorithmic
racism, as well as creating vulnerabilities such as opening up room for
adversarial attacks.
In this paper, we assess the coverage of a given dataset over multiple
categorical attributes. We first provide efficient techniques for traversing
the combinatorial explosion of value combinations to identify any regions of
attribute space not adequately covered by the data. Then, we determine the
least amount of additional data that must be obtained to resolve this lack of
adequate coverage. We confirm the value of our proposal through both
theoretical analyses and comprehensive experiments on real data. Comment: in ICDE 201
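As a rough illustration of what "coverage" means here, the following minimal sketch counts how often each full combination of categorical values occurs and reports those that fall below a threshold, together with how many extra rows each would need. It is a brute-force enumeration, not the efficient traversal the abstract describes; the function name `coverage_shortfall`, the `threshold` parameter, and the toy data are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def coverage_shortfall(rows, domains, threshold):
    """Map each under-covered value combination to the number of rows it lacks.

    rows      -- iterable of tuples, one categorical value per attribute
    domains   -- list of value tuples, one per attribute (hypothetical input)
    threshold -- minimum number of rows a combination needs to count as covered
    """
    counts = Counter(tuple(row) for row in rows)
    shortfall = {}
    # Brute force: visit every combination in the cross product of the domains.
    for combo in product(*domains):
        if counts[combo] < threshold:
            shortfall[combo] = threshold - counts[combo]
    return shortfall


# Toy dataset with two attributes (illustrative values only).
rows = [
    ("F", "young"), ("F", "young"), ("F", "old"),
    ("M", "young"), ("M", "young"),
]
domains = [("F", "M"), ("young", "old")]
gaps = coverage_shortfall(rows, domains, threshold=2)
```

The cross product grows exponentially with the number of attributes, which is the combinatorial explosion the abstract refers to; this sketch is only practical for a handful of attributes.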
Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. Two specific ways this challenge manifests itself are
keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack the knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system, Templar, that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, yielding
up to a 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL
query log information. Comment: Accepted to IEEE International Conference on Data Engineering (ICDE) 201
Circles of Privacy
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/163476/2/jcpy1188_am.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/163476/1/jcpy1188.pd
The influence of atmosphere on the performance of pure-phase WZ and ZB InAs nanowire transistors
We compare the characteristics of phase-pure MOCVD grown ZB and WZ InAs
nanowire transistors in several atmospheres: air, dry pure N2 and O2, and
N2 bubbled through liquid H2O and alcohols to identify whether
phase-related structural/surface differences affect their response. Both WZ and
ZB give poor gate characteristics in the dry state. Adsorption of polar species
reduces off-current by 2-3 orders of magnitude, increases on-off ratio and
significantly reduces sub-threshold slope. The key difference is the greater
sensitivity of WZ to low adsorbate level. We attribute this to facet structure
and its influence on the separation between conduction electrons and surface
adsorption sites. We highlight the important role adsorbed species play in
nanowire device characterisation. WZ is commonly thought superior to ZB in InAs
nanowire transistors. We show this is an artefact of the moderate humidity
found in ambient laboratory conditions: WZ and ZB perform equally poorly in the
dry gas limit yet equally well in the wet gas limit. We also highlight the
vital role density-lowering disorder has in improving gate characteristics, be
it stacking faults in mixed-phase WZ or surface adsorbates in pure-phase
nanowires. Comment: Accepted for publication in Nanotechnolog
RRR: Rank-Regret Representative
Selecting the best items in a dataset is a common task in data exploration.
However, the concept of "best" lies in the eyes of the beholder: different
users may consider different attributes more important, and hence arrive at
different rankings. Nevertheless, one can remove "dominated" items and create a
"representative" subset of the data set, comprising the "best items" in it. A
Pareto-optimal representative is guaranteed to contain the best item of each
possible ranking, but it can be almost as big as the full data. A smaller
representative can be found if we relax the requirement to include the best item for every
possible user, and instead just limit the users' "regret". Existing work
defines regret as the loss in score by limiting consideration to the
representative instead of the full data set, for any chosen ranking function.
However, the score is often not a meaningful number and users may not
understand its absolute value. Sometimes small ranges in score can include
large fractions of the data set. In contrast, users do understand the notion of
rank ordering. Therefore, alternatively, we consider the position of the items
in the ranked list for defining the regret and propose the rank-regret
representative as the minimal subset of the data containing at least one of
the top-k of any possible ranking function. This problem is NP-complete. We
use the geometric interpretation of items to bound their ranks on ranges of
functions and to utilize combinatorial geometry notions for developing
effective and efficient approximation algorithms for the problem. Experiments
on real datasets demonstrate that we can efficiently find small subsets with
small rank-regrets
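To make the rank-regret idea concrete, here is a minimal sketch for two-dimensional items. It assumes linear ranking functions with non-negative weights, samples a finite number of them by angle, and greedily picks items until every sampled function has at least one of its top-k items in the chosen subset. This is a plain greedy hitting-set heuristic over sampled functions, not the geometric approximation algorithms the abstract mentions; `top_k_sets`, `greedy_representative`, `num_functions`, and the toy items are all assumptions made for this example.

```python
import math


def top_k_sets(items, k, num_functions=90):
    """For each sampled linear function, return the index set of its top-k items."""
    sets = []
    for i in range(num_functions):
        # Sample weight vectors evenly on the quarter circle (assumed function family).
        theta = (math.pi / 2) * i / (num_functions - 1)
        w = (math.cos(theta), math.sin(theta))
        ranked = sorted(
            range(len(items)),
            key=lambda j: -(w[0] * items[j][0] + w[1] * items[j][1]),
        )
        sets.append(set(ranked[:k]))
    return sets


def greedy_representative(items, k, num_functions=90):
    """Greedily choose items until every sampled top-k set contains a chosen item."""
    remaining = top_k_sets(items, k, num_functions)
    chosen = []
    while remaining:
        # Pick the item that appears in the most still-unhit top-k sets.
        best = max(range(len(items)), key=lambda j: sum(j in s for s in remaining))
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


# Toy items (illustrative values only): two extremes, one balanced, one weak.
items = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7), (0.2, 0.2)]
chosen = greedy_representative(items, k=2)
```

Because only sampled functions are checked, the result carries no guarantee for functions that fall between the samples.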
Electrical isolation of GaN by MeV ion irradiation
The evolution of sheet resistance of n-type GaN epilayers exposed to irradiation with MeV H, Li, C, and O ions is studied in situ. Results show that the threshold dose necessary for complete isolation linearly depends on the original free electron concentration and reciprocally depends on the number of atomic displacements produced by ion irradiation. Furthermore, such isolation is stable to rapid thermal annealing at temperatures up to 900 °C. In addition to providing a better understanding of the physical mechanisms responsible for electrical isolation, these results can be used for choosing implant conditions necessary for an effective electrical isolation of GaN-based devices. This work was partly supported by Conselho Nacional de Pesquisas (CNPq, Brazil) under Contract No. 200541/99-4
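One way to read the dose dependence stated in the abstract above is as a simple proportionality. The symbols here are assumptions introduced for illustration and do not appear in the abstract: $D_{\mathrm{th}}$ is the threshold dose, $n_0$ the original free electron concentration, and $N_{\mathrm{d}}$ the number of atomic displacements produced per ion. The abstract gives no constant of proportionality, and none is implied here.

```latex
D_{\mathrm{th}} \propto \frac{n_0}{N_{\mathrm{d}}}
```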
Database Management for Life Science Research: Summary Report of the Workshop on Data Management for Molecular and Cell Biology at the National Library of Medicine, Bethesda, Maryland, February 2–3, 2003
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/63107/1/153623103322006797.pd