Instance and Output Optimal Parallel Algorithms for Acyclic Joins
Massively parallel join algorithms have received much attention in recent
years, and most prior work has focused on worst-case optimal algorithms. However,
the worst-case optimality of these join algorithms relies on hard instances
having very large output sizes, which rarely appear in practice. A stronger
notion of optimality is {\em output-optimal}, which requires an algorithm to be
optimal within the class of all instances sharing the same input and output
size. An even stronger optimality is {\em instance-optimal}, i.e., the
algorithm is optimal on every single instance, but this may not always be
achievable.
In the traditional RAM model of computation, the classical Yannakakis
algorithm is instance-optimal on any acyclic join. But in the massively
parallel computation (MPC) model, the situation becomes much more complicated.
We first show that for the class of r-hierarchical joins, instance-optimality
can still be achieved in the MPC model. Then, we give a new MPC algorithm for
an arbitrary acyclic join with load $O\left(\frac{\mathrm{IN}}{p} + \frac{\sqrt{\mathrm{IN} \cdot \mathrm{OUT}}}{p}\right)$,
where $\mathrm{IN}$ and $\mathrm{OUT}$ are the input and output sizes of the join, and $p$
is the number of servers in the MPC model. This improves the MPC version of
the Yannakakis algorithm by an $O\left(\sqrt{\mathrm{OUT}/\mathrm{IN}}\right)$ factor.
Furthermore, we show that this is output-optimal when $\mathrm{OUT} = O(p \cdot \mathrm{IN})$,
for every acyclic but non-r-hierarchical join. Finally, we give the first
output-sensitive lower bound for the triangle join in the MPC model, showing
that it is inherently more difficult than acyclic joins.
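The size of the claimed improvement can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the MPC version of the Yannakakis algorithm has load IN/p + OUT/p (inferred from the stated improvement factor, not given in the abstract), drops all constant factors hidden by the O-notation, and uses hypothetical function names and input sizes.

```python
from math import sqrt

def yannakakis_mpc_load(n_in, n_out, p):
    # Assumed baseline load IN/p + OUT/p (constants ignored).
    return n_in / p + n_out / p

def new_acyclic_load(n_in, n_out, p):
    # Load stated in the abstract: IN/p + sqrt(IN * OUT)/p (constants ignored).
    return n_in / p + sqrt(n_in * n_out) / p

# Hypothetical instance: output 100 times larger than input, 100 servers.
n_in, n_out, p = 10**6, 10**8, 100
baseline = yannakakis_mpc_load(n_in, n_out, p)
improved = new_acyclic_load(n_in, n_out, p)
ratio = baseline / improved
bound = sqrt(n_out / n_in)  # the sqrt(OUT/IN) improvement factor
```

On this instance the ratio of the two loads stays below sqrt(OUT/IN) and approaches it as OUT grows relative to IN, which matches the improvement factor quoted in the abstract.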
The association between resilience and survival among Chinese elderly
Based on the unique longitudinal data of the elderly aged 65+ with a sufficiently large sub-sample of the oldest-old aged 85+ from the Chinese Longitudinal Healthy Longevity Survey, we construct a resilience scale with 7 indicators for the Chinese elderly, based on the framework of the Connor-Davidson Resilience Scale. Cox proportional hazards regression model estimates show that, after controlling for socio-demographic characteristics and initial health status, the total resilience score and most factors of the resilience scale are significantly associated with reduced mortality risk among the young-old and oldest-old. Although the causal mechanisms remain to be investigated, effective measures to promote resilience are likely to have a positive effect on longevity of the elderly in China.
Keywords: China, healthy life expectancy, mortality risk, residence, survival
Join Algorithms: From External Memory to the BSP
Database systems have traditionally been disk-based, which has motivated extensive study of external memory (EM) algorithms. However, as RAMs continue to get larger and cheaper, modern distributed data systems are increasingly adopting a main-memory-based, shared-nothing architecture, exemplified by systems like Spark and Flink. These systems can be abstracted by the BSP model (with variants like the MPC model and the MapReduce model), and there has been a strong revived interest in designing BSP algorithms for handling large amounts of data.
With hard disks starting to fade from the picture, EM algorithms may now seem less relevant. However, we observe that many of the recently developed join algorithms under the BSP model closely resemble their counterparts in the EM model. In this talk, I will present some recent results on join algorithms in the EM and BSP models, examine their relationships, and discuss a general theoretical framework for converting EM algorithms to the BSP
The Effect of Firm-specific Factors on Firms' Decisions to Invest in Exploration and Exploitation
Prior theoretical and empirical research emphasizes the importance of allocating investment between exploratory and exploitative R&D (March, 1991; Mudambi & Swift, 2014). However, the firm-specific factors that determine exploratory and exploitative R&D investment have remained largely unexplored. We attempt to address this research gap by examining the effects of inter-organizational relationships (innovation collaboration and external information sourcing), R&D personnel educational level and internationalization statuses (exporting and geographic scope) on firm investment in exploratory and exploitative R&D.
Building on organizational learning theory, we argue that different firm-specific factors generate different effects on firm investment in exploratory and exploitative R&D because they stimulate different learning mechanisms. We empirically test the model using panel data on more than 4000 firms from the Technological Innovation Panel, a Community Innovation Survey-based dataset, for the period 2006-2011. Our findings show that the influence of a determinant on exploratory R&D investment may differ from its influence on exploitative R&D investment, and that the determinants of exploratory R&D investment may differ from the determinants of exploitative R&D investment. These findings stress the need for future research to be careful in extrapolating conclusions from an analysis of one specific type of R&D investment to studies of another type of R&D investment or of overall R&D investment. The study contributes to organizational learning theory by identifying direct factors and moderators that facilitate firm investment in organizational learning activities.
Clustering with diversity
We consider the {\em clustering with diversity} problem: given a set of
colored points in a metric space, partition them into clusters such that each
cluster has at least $\ell$ points, all of which have distinct colors.
We give a 2-approximation to this problem for any $\ell$ when the objective
is to minimize the maximum radius of any cluster. We show that the
approximation ratio is optimal unless $\mathbf{P} = \mathbf{NP}$, by providing a matching
lower bound. Several extensions to our algorithm have also been developed for
handling outliers. This problem is mainly motivated by applications in
privacy-preserving data publication.
Comment: Extended abstract accepted in ICALP 2010. Keywords: Approximation
algorithm, k-center, k-anonymity, l-diversity
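To make the problem statement concrete, the following minimal sketch checks the two constraints (a minimum cluster size, called `ell` here, and distinct colors within each cluster) and evaluates the max-radius objective for a given clustering. It is not the 2-approximation algorithm from the paper. The function names, the Euclidean metric, and the choice to measure a cluster's radius from its best member point are all assumptions made for illustration.

```python
from math import dist

def is_diverse_clustering(clusters, ell):
    """Each cluster is a list of (point, color) pairs."""
    for cluster in clusters:
        colors = [color for _, color in cluster]
        # Need at least ell points, and no color may repeat in a cluster.
        if len(cluster) < ell or len(set(colors)) != len(colors):
            return False
    return True

def max_radius(clusters):
    """Objective value: the largest cluster radius (assumed discrete centers)."""
    def radius(cluster):
        points = [point for point, _ in cluster]
        # Pick the member that minimizes the distance to its farthest member.
        return min(max(dist(c, q) for q in points) for c in points)
    return max(radius(cluster) for cluster in clusters)

# Hypothetical input: two clusters of two differently colored points each.
clusters = [
    [((0, 0), "red"), ((3, 4), "green")],
    [((10, 0), "red"), ((10, 2), "blue")],
]
feasible = is_diverse_clustering(clusters, ell=2)
objective = max_radius(clusters)
```

An approximation algorithm for the problem would search over partitions to minimize `max_radius` subject to `is_diverse_clustering` returning true.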
Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks
We show that randomization can lead to significant improvements for a few
fundamental problems in distributed tracking. Our basis is the {\em
count-tracking} problem, where there are $k$ players, each holding a counter
that gets incremented over time, and the goal is to track an
$\varepsilon$-approximation of their sum continuously at all times,
using minimum communication. While the deterministic communication complexity
of the problem is $\Theta(k/\varepsilon \cdot \log N)$, where $N$ is the final value
of the sum when the tracking finishes, we show that with randomization, the
communication cost can be reduced to $\Theta(\sqrt{k}/\varepsilon \cdot \log N)$. Our
algorithm is simple and uses only O(1) space at each player, while the lower
bound holds even assuming each player has infinite computing power. Then, we
extend our techniques to two related distributed tracking problems: {\em
frequency-tracking} and {\em rank-tracking}, and obtain similar improvements
over previous deterministic algorithms. Both problems are of central importance
in large data monitoring and analysis, and have been extensively studied in the
literature.
Comment: 19 pages, 1 figure