25 research outputs found
Painolliset äärellistilaiset menetelmät oikaisulukuun
This dissertation is a large-scale study of spell-checking and correction using finite-state technology. Finite-state spell-checking is a key method for handling morphologically complex languages in a computationally efficient manner. This dissertation discusses the technological and practical considerations that are required for finite-state spell-checkers to be at the same level as state-of-the-art non-finite-state spell-checkers.
Three aspects of spell-checking are considered in the thesis: modelling of correctly written words and word-forms with finite-state language models, applying statistical information to finite-state language models with a specific focus on morphologically complex languages, and modelling misspellings and typing errors using finite-state automata-based error models.
The usability of finite-state spell-checkers as a viable alternative to traditional non-finite-state solutions is demonstrated in a large-scale evaluation of spell-checking speed and the quality using languages with morphologically different natures. The selected languages display a full range of typological complexity, from isolating English to polysynthetic Greenlandic with agglutinative Finnish and the Saami languages somewhere in between.Tässä väitöskirjassa tutkin äärellistilaisten menetelmien käyttöä oikaisuluvussa. Äärellistilaiset menetelmät mahdollistavat sananmuodostukseltaan monimutkaisempien kielten, kuten suomen tai grönlannin, sanaston sujuvan käsittelyn oikaisulukusovelluksissa. Käsittelen tutkielmassani tieteellisiä ja käytännöllisiä toteutuksia, jotka ovat tarpeen, jotta tällaisia sananmuodostukseltaan monimutkallisempia kieliä voisi käsitellä oikaisuluvussa yhtä tehokkaasti kuin yksinkertaisempia kieliä, kuten englantia tai muita indo-eurooppalaisia kieliä nyt käsitellään.
Tutkielmassa esitellään kolme keskeistä tutkimusongelmaa, jotka koskevat oikaisuluvun toteuttamista sanarakenteeltaan monimutkaisemmille kielille: miten mallintaa oikeinkirjoitetut sanamuodot äärellistilaisin mallein, miten soveltaa tilastollista mallinnusta monimutkaisiin sanarakenteisiin kuten yhdyssanoihin, ja miten mallintaa kirjoitusvirheitä äärellistilaisin mentelmin.
Tutkielman tuloksena esitän äärellistilaisia oikaisulukumenetelmiä soveltuvana vaihtoehtona nykyisille oikaisulukimille, tämän todisteena esitän mittaustuloksia, jotka näyttävät, että käyttämäni menetelmät toimivat niin rakenteellisesti yksinkertaisille kielille kuten englannille yhtä hyvin kuin nykyiset menetelmät että rakenteellisesti monimutkaisemmille kielille kuten suomelle, saamelle ja jopa grönlannille riittävän hyvin tullakseen käytetyksi tyypillisissä oikaisulukimissa
Generic programming in Scala
Generic programming is a programming methodology that aims at producing
reusable code, defined independently of the data types on which it is operating. To
achieve this goal, that particular code must rely on a set of requirements known
as concepts. The code is only functional for the set of data types that fulfill the
conditions established by its concepts. Generic programming can facilitate code reuse
and reduce testing.
Generic programming has been embraced mostly in the C++ community; major
parts of the C++ standard library have been developed following the paradigm. This
thesis is based on a study (by Garcia et al.) on generic programming applied to other
languages (C#, Eiffel, Haskell, Java and ML). That study demonstrated that those
languages are lacking in their support for generic programming, causing diffculties
to the programmer.
In this context, we investigate the new object-oriented language Scala. This
particular language appealed to our interest because it implements "member types"
which we conjecture to fix some of the problems of the languages surveyed in the
original study. Our research shows that Scala's member types are an expressive
language feature and solve some but not all of the problems identified in the original
study (by Garcia et al.).
Scala's members types did not resolve the problem of adding associated types to
the parameter list of generic methods. This issue led to repeated constraints, implicit instantiation failure and code verbosity increase. However, Scala's member types
enabled constraint propagation and type aliasing, two significantly useful generic
programming mechanisms
Minkowski Sum Construction and other Applications of Arrangements of Geodesic Arcs on the Sphere
We present two exact implementations of efficient output-sensitive algorithms
that compute Minkowski sums of two convex polyhedra in 3D. We do not assume
general position. Namely, we handle degenerate input, and produce exact
results. We provide a tight bound on the exact maximum complexity of Minkowski
sums of polytopes in 3D in terms of the number of facets of the summand
polytopes. The algorithms employ variants of a data structure that represents
arrangements embedded on two-dimensional parametric surfaces in 3D, and they
make use of many operations applied to arrangements in these representations.
We have developed software components that support the arrangement
data-structure variants and the operations applied to them. These software
components are generic, as they can be instantiated with any number type.
However, our algorithms require only (exact) rational arithmetic. These
software components together with exact rational-arithmetic enable a robust,
efficient, and elegant implementation of the Minkowski-sum constructions and
the related applications. These software components are provided through a
package of the Computational Geometry Algorithm Library (CGAL) called
Arrangement_on_surface_2. We also present exact implementations of other
applications that exploit arrangements of arcs of great circles embedded on the
sphere. We use them as basic blocks in an exact implementation of an efficient
algorithm that partitions an assembly of polyhedra in 3D with two hands using
infinite translations. This application distinctly shows the importance of
exact computation, as imprecise computation might result with dismissal of
valid partitioning-motions.Comment: A Ph.D. thesis carried out at the Tel-Aviv university. 134 pages
long. The advisor was Prof. Dan Halperi
Doctor of Philosophy
dissertationNetwork emulation has become an indispensable tool for the conduct of research in networking and distributed systems. It offers more realism than simulation and more control and repeatability than experimentation on a live network. However, emulation testbeds face a number of challenges, most prominently realism and scale. Because emulation allows the creation of arbitrary networks exhibiting a wide range of conditions, there is no guarantee that emulated topologies reflect real networks; the burden of selecting parameters to create a realistic environment is on the experimenter. While there are a number of techniques for measuring the end-to-end properties of real networks, directly importing such properties into an emulation has been a challenge. Similarly, while there exist numerous models for creating realistic network topologies, the lack of addresses on these generated topologies has been a barrier to using them in emulators. Once an experimenter obtains a suitable topology, that topology must be mapped onto the physical resources of the testbed so that it can be instantiated. A number of restrictions make this an interesting problem: testbeds typically have heterogeneous hardware, scarce resources which must be conserved, and bottlenecks that must not be overused. User requests for particular types of nodes or links must also be met. In light of these constraints, the network testbed mapping problem is NP-hard. Though the complexity of the problem increases rapidly with the size of the experimenter's topology and the size of the physical network, the runtime of the mapper must not; long mapping times can hinder the usability of the testbed. This dissertation makes three contributions towards improving realism and scale in emulation testbeds. First, it meets the need for realistic network conditions by creating Flexlab, a hybrid environment that couples an emulation testbed with a live-network testbed, inheriting strengths from each. Second, it attends to the need for realistic topologies by presenting a set of algorithms for automatically annotating generated topologies with realistic IP addresses. Third, it presents a mapper, assign, that is capable of assigning experimenters' requested topologies to testbeds' physical resources in a manner that scales well enough to handle large environments
Static Computation and Reflection
Thesis (PhD) - Indiana University, Computer Sciences, 2008Most programming languages do not allow programs to inspect their
static type information or perform computations on it. C++, however,
lets programmers write template metaprograms, which enable programs to
encode static information, perform compile-time computations,
and make static decisions about run-time behavior. Many C++ libraries
and applications use template metaprogramming to build specialized
abstraction mechanisms, implement domain-specific safety checks, and
improve run-time performance.
Template metaprogramming is an emergent capability of the C++ type
system, and the C++ language specification is informal and imprecise.
As a result, template metaprogramming often involves heroic
programming feats and often leads to code that is difficult to read and
maintain. Furthermore, many template-based code generation and
optimization techniques rely on particular compiler implementations,
rather than language semantics, for performance gains.
Motivated by the capabilities and techniques of C++ template
metaprogramming, this thesis documents some common programming patterns,
including static computation, type analysis, generative programming, and the
encoding of domain-specific static checks. It also documents notable
shortcomings to current practice, including limited support for reflection,
semantic ambiguity, and other issues that arise from the pioneering nature of
template metaprogramming. Finally, this thesis presents the design of a
foundational programming language, motivated by the analysis of template
metaprogramming, that allows programs to statically inspect type information,
perform computations, and generate code. The language is specified as a core
calculus and its capabilities are presented in an idealized setting
Structural Graph-based Metamodel Matching
Data integration has been, and still is, a challenge for applications processing multiple heterogeneous data sources. Across the domains of schemas, ontologies, and metamodels, this imposes the need for mapping specifications, i.e. the task of discovering semantic correspondences between elements. Support for the development of such mappings has been researched, producing matching systems that automatically propose mapping suggestions.
However, especially in the context of metamodel matching the result quality of state of the art matching techniques leaves room for improvement. Although the traditional approach of pair-wise element comparison works on smaller data sets, its quadratic complexity leads to poor runtime and memory performance and eventually to the inability to match, when applied on real-world data.
The work presented in this thesis seeks to address these shortcomings. Thereby, we take advantage of the graph structure of metamodels. Consequently, we derive a planar graph edit distance as metamodel similarity metric and mining-based matching to make use of redundant information. We also propose a planar graph-based partitioning to cope with large-scale matching. These techniques are then evaluated using real-world mappings from SAP business integration scenarios and the MDA community. The results demonstrate improvement in quality and managed runtime and memory consumption for large-scale metamodel matching
High-performance direct solution of finite element problems on multi-core processors
A direct solution procedure is proposed and developed which exploits the parallelism that exists in current symmetric multiprocessing (SMP) multi-core processors. Several algorithms are proposed and developed to improve the performance of the direct solution of FE problems. A high-performance sparse direct solver is developed which allows experimentation with the newly developed and existing algorithms. The performance of the algorithms is investigated using a large set of FE problems. Furthermore, operation count estimations are developed to further assess various algorithms. An out-of-core version of the solver is developed to reduce the memory requirements for the solution. I/O is performed asynchronously without blocking the thread that makes the I/O request. Asynchronous I/O allows overlapping factorization and triangular solution computations with I/O. The performance of the developed solver is demonstrated on a large number of test problems. A problem with nearly 10 million degree of freedoms is solved on a low price desktop computer using the out-of-core version of the direct solver. Furthermore, the developed solver usually outperforms a commonly used shared memory solver.Ph.D.Committee Chair: Will, Kenneth; Committee Member: Emkin, Leroy; Committee Member: Kurc, Ozgur; Committee Member: Vuduc, Richard; Committee Member: White, Donal