238,891 research outputs found
Parallel String Sample Sort
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we propose string sample sort. The algorithm makes effective use of
the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Additionally, we parallelize variants of multikey
quicksort and radix sort that are also useful in certain situations.Comment: 34 pages, 7 figures and 12 table
Engineering Parallel String Sorting
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we first propose string sample sort. The algorithm makes effective use
of the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Then we focus on NUMA architectures, and develop
parallel multiway LCP-merge and -mergesort to reduce the number of random
memory accesses to remote nodes. Additionally, we parallelize variants of
multikey quicksort and radix sort that are also useful in certain situations.
Comprehensive experiments on five current multi-core platforms are then
reported and discussed. The experiments show that our implementations scale
very well on real-world inputs and modern machines.Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
Analyzing collaborative learning processes automatically
In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. This endeavor also holds the potential for enabling substantially improved on-line instruction both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context sensitive collaborative learning support on an as-needed basis. In this article, we report on an interdisciplinary research project, which has been investigating the effectiveness of applying text classification technology to a large CSCL corpus that has been analyzed by human coders using a theory-based multidimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important piece of the work towards making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions that the CSCL community is interested in
Globally Optimal Energy-Efficient Power Control and Receiver Design in Wireless Networks
The characterization of the global maximum of energy efficiency (EE) problems
in wireless networks is a challenging problem due to the non-convex nature of
investigated problems in interference channels. The aim of this work is to
develop a new and general framework to achieve globally optimal solutions.
First, the hidden monotonic structure of the most common EE maximization
problems is exploited jointly with fractional programming theory to obtain
globally optimal solutions with exponential complexity in the number of network
links. To overcome this issue, we also propose a framework to compute
suboptimal power control strategies characterized by affordable complexity.
This is achieved by merging fractional programming and sequential optimization.
The proposed monotonic framework is used to shed light on the ultimate
performance of wireless networks in terms of EE and also to benchmark the
performance of the lower-complexity framework based on sequential programming.
Numerical evidence is provided to show that the sequential fractional
programming framework achieves global optimality in several practical
communication scenarios.Comment: Accepted for publication in the IEEE Transactions on Signal
Processin
Parallel symbolic state-space exploration is difficult, but what is the alternative?
State-space exploration is an essential step in many modeling and analysis
problems. Its goal is to find the states reachable from the initial state of a
discrete-state model described. The state space can used to answer important
questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a
starting point for sophisticated investigations expressed in temporal logic.
Unfortunately, the state space is often so large that ordinary explicit data
structures and sequential algorithms cannot cope, prompting the exploration of
(1) parallel approaches using multiple processors, from simple workstation
networks to shared-memory supercomputers, to satisfy large memory and runtime
requirements and (2) symbolic approaches using decision diagrams to encode the
large structured sets and relations manipulated during state-space generation.
Both approaches have merits and limitations. Parallel explicit state-space
generation is challenging, but almost linear speedup can be achieved; however,
the analysis is ultimately limited by the memory and processors available.
Symbolic methods are a heuristic that can efficiently encode many, but not all,
functions over a structured and exponentially large domain; here the pitfalls
are subtler: their performance varies widely depending on the class of decision
diagram chosen, the state variable order, and obscure algorithmic parameters.
As symbolic approaches are often much more efficient than explicit ones for
many practical models, we argue for the need to parallelize symbolic
state-space generation algorithms, so that we can realize the advantage of both
approaches. This is a challenging endeavor, as the most efficient symbolic
algorithm, Saturation, is inherently sequential. We conclude by discussing
challenges, efforts, and promising directions toward this goal
Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains
Biomedical taxonomies, thesauri and ontologies in the form of the
International Classification of Diseases (ICD) as a taxonomy or the National
Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in
acquiring, representing and processing information about human health. With
increasing adoption and relevance, biomedical ontologies have also
significantly increased in size. For example, the 11th revision of the ICD,
which is currently under active development by the WHO contains nearly 50,000
classes representing a vast variety of different diseases and causes of death.
This evolution in terms of size was accompanied by an evolution in the way
ontologies are engineered. Because no single individual has the expertise to
develop such large-scale ontologies, ontology-engineering projects have evolved
from small-scale efforts involving just a few domain experts to large-scale
projects that require effective collaboration between dozens or even hundreds
of experts, practitioners and other stakeholders. Understanding how these
stakeholders collaborate will enable us to improve editing environments that
support such collaborations. We uncover how large ontology-engineering
projects, such as the ICD in its 11th revision, unfold by analyzing usage logs
of five different biomedical ontology-engineering projects of varying sizes and
scopes using Markov chains. We discover intriguing interaction patterns (e.g.,
which properties users subsequently change) that suggest that large
collaborative ontology-engineering projects are governed by a few general
principles that determine and drive development. From our analysis, we identify
commonalities and differences between different projects that have implications
for project managers, ontology editors, developers and contributors working on
collaborative ontology-engineering projects and tools in the biomedical domain.Comment: Published in the Journal of Biomedical Informatic
- …