11,448 research outputs found
Array operators using multiple dispatch: a design methodology for array implementations in dynamic languages
Arrays are such a rich and fundamental data type that they tend to be built
into a language, either in the compiler or in a large low-level library.
Defining this functionality at the user level instead provides greater
flexibility for application domains not envisioned by the language designer.
Only a few languages, such as C++ and Haskell, provide the necessary power to
define -dimensional arrays, but these systems rely on compile-time
abstraction, sacrificing some flexibility. In contrast, dynamic languages make
it straightforward for the user to define any behavior they might want, but at
the possible expense of performance.
As part of the Julia language project, we have developed an approach that
yields a novel trade-off between flexibility and compile-time analysis. The
core abstraction we use is multiple dispatch. We have come to believe that
while multiple dispatch has not been especially popular in most kinds of
programming, technical computing is its killer application. By expressing key
functions such as array indexing using multi-method signatures, a surprising
range of behaviors can be obtained, in a way that is both relatively easy to
write and amenable to compiler analysis. The compact factoring of concerns
provided by these methods makes it easier for user-defined types to behave
consistently with types in the standard library.Comment: 6 pages, 2 figures, workshop paper for the ARRAY '14 workshop, June
11, 2014, Edinburgh, United Kingdo
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the wellknown
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
Oblivion: Mitigating Privacy Leaks by Controlling the Discoverability of Online Information
Search engines are the prevalently used tools to collect information about
individuals on the Internet. Search results typically comprise a variety of
sources that contain personal information -- either intentionally released by
the person herself, or unintentionally leaked or published by third parties,
often with detrimental effects on the individual's privacy. To grant
individuals the ability to regain control over their disseminated personal
information, the European Court of Justice recently ruled that EU citizens have
a right to be forgotten in the sense that indexing systems, must offer them
technical means to request removal of links from search results that point to
sources violating their data protection rights. As of now, these technical
means consist of a web form that requires a user to manually identify all
relevant links upfront and to insert them into the web form, followed by a
manual evaluation by employees of the indexing system to assess if the request
is eligible and lawful.
We propose a universal framework Oblivion to support the automation of the
right to be forgotten in a scalable, provable and privacy-preserving manner.
First, Oblivion enables a user to automatically find and tag her disseminated
personal information using natural language processing and image recognition
techniques and file a request in a privacy-preserving manner. Second, Oblivion
provides indexing systems with an automated and provable eligibility mechanism,
asserting that the author of a request is indeed affected by an online
resource. The automated ligibility proof ensures censorship-resistance so that
only legitimately affected individuals can request the removal of corresponding
links from search results. We have conducted comprehensive evaluations, showing
that Oblivion is capable of handling 278 removal requests per second, and is
hence suitable for large-scale deployment
dotCall64: An Efficient Interface to Compiled C/C++ and Fortran Code Supporting Long Vectors
The R functions .C() and .Fortran() can be used to call compiled C/C++ and
Fortran code from R. This so-called foreign function interface is convenient,
since it does not require any interactions with the C API of R. However, it
does not support long vectors (i.e., vectors of more than 2^31 elements). To
overcome this limitation, the R package dotCall64 provides .C64(), which can be
used to call compiled C/C++ and Fortran functions. It transparently supports
long vectors and does the necessary castings to pass numeric R vectors to
64-bit integer arguments of the compiled code. Moreover, .C64() features a
mechanism to avoid unnecessary copies of function arguments, making it
efficient in terms of speed and memory usage.Comment: 17 page
- …