Integrating Cultural Knowledge into Artificially Intelligent Systems: Human Experiments and Computational Implementations
With the advancement of Artificial Intelligence, it seems as if every aspect of our lives is impacted by AI in one way or another. As AI is used for everything from driving vehicles to criminal justice, it becomes crucial that it overcome any biases that might hinder its fair application. We are constantly trying to make AI more like humans, but most AI systems so far fail to address one of the main aspects of humanity: our culture and the differences between cultures. We cannot truly consider AI to have understood human reasoning without understanding culture. It is therefore important for cultural information to be embedded into AI systems, and for those systems to understand the differences across cultures.
The main way I have chosen to do this is through two cultural markers: motifs and rituals, because both are inherently part of any culture. Motifs are elements that are repeated often, are grounded in well-known stories, and tend to be very specific to individual cultures. Rituals are part of every culture in some way; while some are constant across all cultures, others are very specific to individual ones. This makes the two markers well suited for comparison and contrast.
The first two parts of this dissertation describe two cognitive psychology studies I conducted. The first examines how people understand motifs: is it true that in-culture people identify motifs better than out-culture people? My study shows this to indeed be the case. The second study tests whether motifs are recognizable in texts, regardless of whether people understand their meaning. Our results confirm our hypothesis that motifs are recognizable.
The third part of my work discusses the survey and data collection effort around rituals. I collected data about rituals from people of various national groups and observed the differences in their responses. The main outcomes were twofold: first, evidence that cultural differences across groups are quantifiable, prevalent, and observable with proper effort; and second, a substantial, culturally sensitive curated dataset with a wide variety of uses across AI systems.
The fourth part of the dissertation focuses on a system I built, called the motif association miner, which provides information about motifs present in input text, such as their associations, sources, and connotations. This output can serve as input to future systems, giving them a better understanding of motifs and demonstrating an approach for bringing culture-specific motif meanings into wider use.
As the final contribution, this thesis details my efforts to use the curated ritual data to improve an existing Question Answering system, showing that this method helps systems perform better in situations that vary by culture. The data and approach, which will be made publicly available, will enable others in the field to use this information to combat bias in their own systems.
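The idea of enriching a QA system with curated ritual data can be sketched as follows. This is a minimal, hypothetical illustration: the dataset schema (culture, ritual, fact) and the function name are my own placeholders, not the dissertation's actual API or data.

```python
# Hypothetical sketch: prepend culture-specific ritual facts to a question
# before handing it to a downstream QA model. The data and schema below are
# illustrative placeholders, not the dissertation's actual dataset.

RITUAL_DATA = {
    "us": {"wedding": "rings are exchanged; guests throw rice or confetti"},
    "in": {"wedding": "the couple circles a sacred fire (saptapadi)"},
}

def augment_question(question: str, culture: str) -> str:
    """Prepend any ritual facts for the given culture that the question mentions."""
    facts = [
        f"[{culture}] {ritual}: {desc}"
        for ritual, desc in RITUAL_DATA.get(culture, {}).items()
        if ritual in question.lower()
    ]
    context = " ".join(facts)
    return f"{context}\n{question}" if context else question

print(augment_question("What happens at a wedding?", "in"))
```

Questions that mention no known ritual pass through unchanged, so the augmentation is a strict addition of context rather than a rewrite.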
Compiler-directed Dynamic Linking for Mobile Programs
In this paper, we present a compiler-directed technique for safe
dynamic linking for mobile programs. Our technique guarantees that
linking failures can occur only when a program arrives at a new
execution site and that this failure can be delivered to the program
as an error code or an exception. We use interprocedural analysis to
identify the set of names that must be linked at the different sites
the program executes on. We use a combination of runtime and
compile-time techniques to identify the calling context and to link
only the names needed in that context. Our technique is able to handle
recursive programs as well as separately compiled code that may itself
be able to move. We discuss language constructs for controlling the
behavior of dynamic linking and the implication of some of these
constructs for application structure.
(Also cross-referenced as UMIACS-TR-96-81)
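The core of the interprocedural analysis described above is computing, for a calling context, the set of external names reachable from it. A minimal sketch under assumed inputs (the call graph and the per-procedure name sets are illustrative, and handle the recursion the paper mentions):

```python
# Sketch: given a call graph and the external names each procedure
# references, compute the names that must be linkable in a calling context
# (everything reachable from the current procedure). Graph and names are
# illustrative; recursion (eval -> eval) is handled by the visited set.

CALL_GRAPH = {"main": ["parse", "eval"], "eval": ["eval", "log"],
              "parse": ["log"], "log": []}
REFS = {"main": {"getenv"}, "parse": {"fopen"}, "eval": {"malloc"}, "log": {"write"}}

def names_to_link(entry: str) -> set:
    """Union of external names referenced by all procedures reachable from entry."""
    seen, stack, needed = set(), [entry], set()
    while stack:
        f = stack.pop()
        if f in seen:
            continue
        seen.add(f)
        needed |= REFS[f]
        stack.extend(CALL_GRAPH[f])
    return needed

print(sorted(names_to_link("eval")))
```

Linking only `names_to_link(current_context)` at each new execution site is what lets a failure surface as a single error code when the program arrives, rather than as a fault mid-execution.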
A Study of Internet Round-Trip Delay
We present the results of a study of Internet round-trip delay. The
links chosen include links to frequently accessed commercial hosts as
well as well-known academic and foreign hosts. Each link was studied
for a 48-hour period. We attempt to answer the following questions:
(1) how rapidly and in what manner does the delay change -- in this
study, we focus on medium-grain (seconds/minutes) and coarse-grain
time-scales (tens of minutes/hours); (2) what does the frequency
distribution of delay look like and how rapidly does it change; (3)
what is a good metric to characterize the delay for the purpose of
adaptation. Our conclusions are: (a) there is large temporal and
spatial variation in round-trip time (RTT); (b) RTT distribution is
usually unimodal and asymmetric and has a long tail on the right hand
side; (c) RTT observations in most time periods are tightly clustered
around the mode; (d) the mode is a good characteristic value for RTT
distributions; (e) RTT distributions change slowly; (f) persistent
changes in RTT occur slowly, while sharp changes are undone shortly afterward;
(g) jitter in RTT observations is small and (h) inherent RTT occurs
frequently.
(Also cross-referenced as UMIACS-TR-96-97)
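Conclusion (d), that the mode is a good characteristic value for an RTT distribution, suggests a simple estimator: histogram the samples and take the densest bin. A sketch, with an illustrative 5 ms bin width and made-up sample data:

```python
# Estimate the mode of an RTT distribution by binning samples and taking
# the center of the most populated bin. Bin width and data are illustrative.
from collections import Counter

def rtt_mode(samples_ms, bin_ms=5):
    """Return the center of the most populated histogram bin."""
    bins = Counter(int(s // bin_ms) for s in samples_ms)
    densest = max(bins, key=bins.get)
    return densest * bin_ms + bin_ms / 2

# Unimodal, right-tailed sample: most observations cluster near 40 ms,
# with a long tail (90, 150) that would distort a mean but not the mode.
samples = [38, 39, 40, 41, 42, 43, 44, 90, 150, 41, 42, 40]
print(rtt_mode(samples))
```

Unlike the mean, this estimate is insensitive to the long right tail the study observed, which is why the mode suits adaptation decisions.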
An Interprocedural Framework for Placement of Asynchronous I/O Operations
Overlapping memory accesses with computations is a standard
technique for improving performance on modern architectures, which have
deep memory hierarchies. In this paper, we present a compiler technique
for overlapping accesses to secondary memory (disks) with computation. We
have developed an Interprocedural Balanced Code Placement (IBCP)
framework, which performs analysis on arbitrary recursive procedures and
arbitrary control flow and replaces synchronous I/O operations with a
balanced pair of asynchronous operations. We demonstrate how this
analysis is useful for applications which perform frequent and large
accesses to secondary memory, including applications which snapshot or
checkpoint their computations or out-of-core applications.
(Also cross-referenced as UMIACS-TR-95-114)
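The balanced-pair transformation replaces one synchronous I/O call with an early issue and a later wait, so the transfer overlaps independent computation. A thread-based sketch of the same pattern (the workload functions are illustrative stand-ins, not the IBCP implementation):

```python
# Sketch of the issue/wait pattern IBCP inserts: start the "read" early,
# do independent work, then wait at a balanced completion point before the
# data is first used. Workloads here are illustrative placeholders.
from concurrent.futures import ThreadPoolExecutor

def load_block(n):
    """Stand-in for a large secondary-memory (disk) read."""
    return list(range(n))

def compute():
    """Work that does not depend on the block being fetched."""
    return sum(i * i for i in range(1000))

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(load_block, 10)   # issue: asynchronous start
    partial = compute()                    # overlapped computation
    block = future.result()                # wait: balanced completion point

print(partial, len(block))
```

The compiler's job, which this sketch elides, is proving that the code hoisted between issue and wait never touches the in-flight data.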
Study of Scalable Declustering Algorithms for Parallel Grid Files
Efficient storage and retrieval of large multidimensional datasets is
an important concern for large-scale scientific computations such as
long-running time-dependent simulations which periodically generate
snapshots of the state.
The main challenge for efficiently handling such datasets
is to minimize response time for multidimensional range queries.
The grid file is one of the well known access methods for
multidimensional and spatial data.
We investigate effective and scalable declustering techniques
for grid files with the primary goal of minimizing response time
and the secondary goal of maximizing the fairness of data distribution.
The main contributions of this paper are (1) analytic and experimental
evaluation of existing index-based declustering techniques and their
extensions for grid files, and (2) development of a proximity-based
declustering algorithm called {\em minimax} which is experimentally
shown to scale and to consistently achieve better response time
compared to available algorithms while maintaining perfect disk distribution.
(Also cross-referenced as UMIACS-TR-96-4)
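The response-time objective above can be made concrete: for a range query, response time is proportional to the number of blocks the busiest disk must serve. A sketch using a simple index-based assignment, disk = (i + j) mod D (minimax itself uses a proximity-based assignment, which this does not reproduce):

```python
# Declustering metric: the response time of a 2-D range query is driven by
# the maximum number of blocks any one disk serves. The (i + j) mod D
# assignment below is a simple index-based scheme for illustration.

D = 4  # number of disks

def disk_of(i, j):
    return (i + j) % D

def response_time(query):
    """Blocks served by the busiest disk for an inclusive 2-D range query."""
    (i0, i1), (j0, j1) = query
    load = [0] * D
    for i in range(i0, i1 + 1):
        for j in range(j0, j1 + 1):
            load[disk_of(i, j)] += 1
    return max(load)

# A 4x4 query over 16 blocks: a perfect declustering gives 16 / 4 = 4.
print(response_time(((0, 3), (0, 3))))
```

Minimizing this maximum over likely queries, while keeping the per-disk totals balanced, is exactly the primary/secondary goal pair stated in the abstract.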
Deferred Data-Flow Analysis : Algorithms, Proofs and Applications
Loss of precision due to the conservative nature of compile-time
dataflow analysis is a general problem and impacts a wide variety
of optimizations. We propose a limited form of runtime dataflow
analysis, called deferred dataflow analysis (DDFA), which
attempts to sharpen dataflow results by using control-flow
information that is available at runtime. The overheads of
runtime analysis are minimized by performing the bulk of the
analysis at compile-time and deferring only a summarized version
of the dataflow problem to runtime. Caching and reusing of
dataflow results reduces these overheads further.
DDFA is an interprocedural framework and can handle arbitrary
control structures including multi-way forks, recursion,
separately compiled functions and higher-order functions. It is
primarily targeted towards optimization of heavy-weight
operations such as communication calls, where one can expect
significant benefits from sharper dataflow analysis. We outline
how DDFA can be used to optimize different kinds of heavy-weight
operations such as bulk-prefetching on distributed systems and
dynamic linking in mobile programs. We prove that DDFA is safe
and that it yields better dataflow information than strictly
compile-time dataflow analysis. (Also cross-referenced as UMIACS-TR-98-46)
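The split between compile time and run time described above can be sketched with transfer-function summaries. In this toy version (region names and the single dataflow fact are illustrative), each branch is summarized as a (gen, kill) pair at compile time; the conservative compile-time result meets over both branches, while the deferred runtime result applies only the branch actually taken:

```python
# Sketch of the DDFA idea: compile time produces per-region (gen, kill)
# summaries; runtime, knowing the branch direction, composes only the taken
# path's summary, giving sharper facts than the meet over all paths.

SUMMARIES = {            # region -> (gen, kill) over dataflow facts
    "then": ({"x_prefetched"}, set()),
    "else": (set(), {"x_prefetched"}),
}

def apply_summary(facts, region):
    gen, kill = SUMMARIES[region]
    return (facts - kill) | gen

def compile_time(facts):
    """Conservative result: meet (intersection) over both branches."""
    return apply_summary(facts, "then") & apply_summary(facts, "else")

def runtime(facts, taken):
    """Deferred analysis: apply only the branch actually taken."""
    return apply_summary(facts, taken)

facts = {"x_prefetched"}
print(compile_time(facts), runtime(facts, "then"))
```

Here compile-time analysis must discard the prefetch fact (the else branch kills it), while the runtime result keeps it when the then branch is taken, which is the kind of sharpening that pays off for heavy-weight operations like communication calls.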
A Customizable Simulator for Workstation Networks
We present a customizable simulator called netsim for
high-performance point-to-point workstation networks that is accurate
enough to be used for application-level performance analysis yet is
easy enough to customize for multiple architectures and software
configurations. Customization is accomplished without using any
proprietary information, using only publicly available hardware
specifications and information that can be readily determined using a
suite of test programs. We customized netsim for two platforms: a
16-node IBM SP-2 with a multistage network and a 10-node DEC Alpha
Farm with an ATM switch. We show that netsim successfully models these
two architectures with a 2-6% error on the SP-2 and a 10% error on the
Alpha Farm for most test cases. It achieves this accuracy at the cost
of a 7-36 fold simulation slowdown with respect to the SP-2 and a 3-8
fold slowdown with respect to the Alpha Farm. In addition, we show
that the cross-traffic congestion for today's high-speed
point-to-point networks has little, if any, effect on
application-level performance and that modeling end-point congestion
is sufficient for a reasonably accurate simulation.
(Also cross-referenced as UMIACS-TR-96-68)
T2: A Customizable Parallel Database For Multi-dimensional Data
As computational power and storage capacity increase, processing and
analyzing large volumes of multi-dimensional datasets play an increasingly
important part in many domains of scientific research.
Several database research groups and vendors have developed
object-relational
database systems to provide some support for managing and/or visualizing
multi-dimensional datasets.
These systems, however, provide little or
no support for analyzing or processing these datasets -- the
assumption is that this is too application-specific to warrant common
support. As a result, applications that process these datasets are
usually decoupled from data storage and management, resulting in
inefficiency due to copying and loss of locality. Furthermore, every
application developer has to implement complex support for managing
and scheduling the processing.
Our study of a large set of scientific applications over the past three
years
indicates that the processing for such datasets
is often highly stylized and shares several important characteristics.
Usually, both the input dataset as
well as the result being computed have underlying multi-dimensional
grids. The basic processing step usually consists of transforming
individual input items, mapping the transformed items to the output
grid and computing output items by aggregating, in some way, all the
transformed input items mapped to the corresponding grid point.
In this paper,
we present the design of T2, a customizable parallel database
that integrates storage, retrieval and processing of multi-dimensional
datasets. T2 provides support for common operations including
index generation, data retrieval, memory management, scheduling of
processing across a parallel machine and user interaction. It
achieves its primary advantage from the ability to seamlessly
integrate data retrieval and processing for a wide variety of
applications and from the ability to maintain and jointly process
multiple datasets with different underlying grids.
(Also cross-referenced as UMIACS-TR-98-04)
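The stylized processing step the abstract identifies (transform each input item, map it to an output grid point, aggregate everything mapped there) can be sketched generically. The example data and the 1-D grid mapping are illustrative, not T2's API:

```python
# Sketch of the transform / map / aggregate loop T2 targets, for a 1-D
# output grid with sum aggregation. Data and mapping are illustrative.
from collections import defaultdict

def process(items, transform, to_grid, aggregate=sum):
    buckets = defaultdict(list)
    for item in items:
        buckets[to_grid(item)].append(transform(item))   # transform + map
    return {point: aggregate(vals) for point, vals in buckets.items()}  # aggregate

# Example: bucket (position, value) readings into 10-unit grid cells and sum.
readings = [(3, 1.0), (7, 2.0), (12, 0.5)]
out = process(readings, transform=lambda r: r[1], to_grid=lambda r: r[0] // 10)
print(out)
```

Because the loop is the same across applications and only `transform`, `to_grid`, and `aggregate` vary, a system like T2 can own storage, scheduling, and memory management while users customize just those three pieces.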