Variation in How Cognitive Control Modulates Sentence Processing
Prior research suggests that cognitive control can assist the comprehension of sentences that create conflict between interpretations, at least under some circumstances. However, the mixed pattern of results suggests that cognitive control may not always be necessary for accurate comprehension. We tested whether cognitive control recruitment for language processing is systematically variable, depending on the type of sentential ambiguity or conflict, individual differences in cognitive control, and task demands. Participants completed two sessions in a web-based experiment. The first session tested conflict modulation using interleaved Stroop and sentence comprehension trials. Critical sentences contained syntax-semantics or phrase-attachment conflict. In the second session, participants completed three cognitive control and three working memory tasks. Exploratory factor analysis was used to index individual differences in a cognitive control factor and a working memory factor. At the group level, there were no significant conflict modulation effects for either syntax-semantics or phrase-attachment conflict. At the individual differences level, the cognitive control factor correlated with offline comprehension accuracy but not online processing measures for both types of conflict. Together, the results suggest that the role of cognitive control in sentence processing may vary according to task demands. When overt decisions are required, individual differences in cognitive control may matter, such that better cognitive control results in better language comprehension performance. The results add to the mixed evidence on conflict modulation and raise questions about the situations under which cognitive control influences online processing.
Fast, accurate and flexible data locality analysis
This paper presents a tool based on a new approach for analyzing the locality exhibited by data memory references. The tool is very fast because it is based on a static locality analysis enhanced with very simple profiling information, which results in a negligible slowdown. This feature allows the tool to be used for highly time-consuming applications and to be included as a step in a typical iterative analysis-optimization process. The tool can provide a detailed evaluation of the reuse exhibited by a program, quantifying and qualifying the different types of misses either globally or broken down by program sections, data structures, memory instructions, etc. The accuracy of the tool is validated by comparing its results with those provided by a simulator.
Improvements in Hardware Transactional Memory for GPU Architectures
In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and programmers use it to speed up their applications thanks to its low latency. Prior work from the authors proposed a lightweight hardware TM (HTM) support based on the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key to providing an effective synchronization mechanism at the local memory level.
After a brief description of the main features of our HTM design for GPU local memory, in this work we gather together a number of proposals designed to improve the mechanisms with the highest impact on performance. First, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Second, the conflict detection mechanism is optimized depending on application characteristics, such as the read/write sets, the probability of conflict between transactions, and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is the task of the compiler and runtime to determine which ones are most important for a given application. This work includes a discussion of the analysis to be done in order to choose the best configuration.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
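The conflict detection idea above can be illustrated in software: two transactions conflict exactly when the write set of one intersects the read or write set of the other, so read-only transactions never conflict with each other. A minimal Python sketch (the function name and arguments are illustrative; the paper's mechanism is implemented in hardware over local-memory addresses):

```python
def conflicts(t1_reads, t1_writes, t2_reads, t2_writes):
    """Value-agnostic read/write-set conflict detection between two
    transactions: a conflict exists when one transaction writes an
    address that the other reads or writes."""
    r1, w1 = set(t1_reads), set(t1_writes)
    r2, w2 = set(t2_reads), set(t2_writes)
    # write-write and write-read overlaps in either direction
    return bool(w1 & (r2 | w2) or w2 & r1)
```

Under this rule, two read-only transactions (empty write sets) can always commit concurrently, which is why detecting read-only transactions, as mentioned above, pays off.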
An Efficient Multiway Mergesort for GPU Architectures
Sorting is a primitive operation that is a building block for countless
algorithms. As such, it is important to design sorting algorithms that approach
peak performance on a range of hardware architectures. Graphics Processing
Units (GPUs) are particularly attractive architectures as they provide massive
parallelism and computing power. However, the intricacies of their compute and
memory hierarchies make designing GPU-efficient algorithms challenging. In this
work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway
mergesort algorithm. MMS employs a new partitioning technique that exposes the
parallelism needed by modern GPU architectures. To the best of our knowledge,
MMS is the first sorting algorithm for the GPU that is asymptotically optimal
in terms of global memory accesses and that is completely free of shared memory
bank conflicts.
We realize an initial implementation of MMS, evaluate its performance on
three modern GPU architectures, and compare it to competitive implementations
available in state-of-the-art GPU libraries. Despite these implementations
being highly optimized, MMS compares favorably, achieving performance
improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art
algorithms are susceptible to bank conflicts. We find that for certain inputs
that cause these algorithms to incur large numbers of bank conflicts, MMS can
achieve up to a 37.6% speedup over its fastest competitor. Overall, even though
its current implementation is not fully optimized, due to its efficient use of
the memory hierarchy, MMS outperforms the fastest comparison-based sorting
implementations available to date.
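The multiway-merge idea behind MMS can be illustrated sequentially: split the input into k runs, sort each recursively, and merge all k runs in one pass. A minimal Python sketch (names and parameters are hypothetical; the paper's contribution is a partitioning scheme that lets many GPU thread blocks perform such merges in parallel, free of shared-memory bank conflicts):

```python
import heapq

def multiway_mergesort(xs, k=4, base=8):
    """Sort xs by splitting it into at most k runs, sorting each run
    recursively, and k-way merging the results.

    Sequential sketch only: on a GPU, the k-way merge itself would be
    partitioned so that independent thread blocks each produce a
    disjoint range of the output.
    """
    if len(xs) <= base:          # small inputs: fall back to a base sort
        return sorted(xs)
    step = (len(xs) + k - 1) // k  # ceil(len/k) elements per run
    runs = [multiway_mergesort(xs[i:i + step], k, base)
            for i in range(0, len(xs), step)]
    return list(heapq.merge(*runs))  # single k-way merge of sorted runs
```

Merging k runs at once (rather than pairwise) reduces the number of passes over the data from log2(n) to log_k(n), which is what makes the approach attractive when global memory accesses dominate the cost.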
Static locality analysis for cache management
Most memory references in numerical codes correspond to array references whose indices are affine functions of the surrounding loop indices. These array references follow a regular, predictable memory pattern that can be analysed at compile time. This analysis can provide valuable information, such as the locality exhibited by the program, which can be used to implement more intelligent caching strategies. In this paper we propose a static locality analysis oriented to the management of data caches. We show that previous proposals on locality analysis are not appropriate for programs with a high conflict miss ratio, and we extend them by introducing a compile-time interference analysis that significantly improves their performance. We first show how this analysis can be used to characterize the dynamic locality properties of numerical codes. This evaluation shows, for instance, that a large percentage of references exhibit some type of locality. This motivates the use of two cache organizations: a dual data cache, which has a module specialized to exploit temporal locality, and a selective cache. The performance provided by these two cache organizations is then evaluated. In both organizations, the static locality analysis is responsible for tagging each memory instruction according to the particular type(s) of locality that it exhibits.
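As a toy illustration of the kind of tag such an analysis can attach to a memory instruction, consider an affine reference like A[c*i + d] inside a loop over i: its per-iteration stride c is known at compile time, and the stride alone already suggests a locality class. A minimal Python sketch (the thresholds and names are illustrative; the paper's analysis additionally models interference between references):

```python
def classify_locality(stride_bytes, line_bytes=64):
    """Classify an affine array reference by its compile-time stride.

    Toy approximation of static locality analysis:
      - stride 0: the same address is touched every iteration (temporal)
      - stride smaller than a cache line: consecutive iterations share
        a line (spatial)
      - otherwise: every iteration touches a new line (no reuse)
    """
    if stride_bytes == 0:
        return "temporal"
    if abs(stride_bytes) < line_bytes:
        return "spatial"
    return "none"
```

A cache with locality-specialized modules could then route each tagged instruction accordingly, e.g. bypassing the cache entirely for references tagged "none".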
Music and Reconciliation in Colombia: Opportunities and Limitations of Songs Composed by Victims
Colombia is a war-torn society where an important number of conflict-related songs have been composed by victims at the grassroots level. In order to develop a better understanding of the scope of music as a tool for reconciliation, this paper examines some of these songs and analyzes the extent to which this music may or may not contribute to reconciliation in both the audience and the composers. To do so, semi-structured interviews were conducted with the composers, and a focus group exercise with ex-combatants was organized in order to analyze the impact of these songs on the listeners.
The results of the analysis indicate that these songs entail opportunities but also limitations regarding reconciliation. On the one hand, they have served as storytelling tools that contribute to the historical memory of the conflict in Colombia in a way that is accessible to all types of public. In addition, the process of composition by victims and the musical activity itself embody an outlet through which composers release feelings and redefine identities. Moreover, in an audience made up of ex-combatants there were some expressions of sympathy, understanding, and trust. However, the research shows contrary effects as well. The content of some songs may incite revenge, reinforce stereotypes and mistrust, and enlarge differences between the sides instead of reducing them. Overall, the results indicate that music may embody several opportunities but also limitations as a tool for reconciliation.
The Japanese International Cooperation Agency and the Tokyo University of Foreign Studies' "Advanced Training Programme for International Cooperation through Internship and Field Research" are acknowledged for their kind financial assistance, which funded my graduate studies and my field research.
The role of executive control in resolving grammatical number conflict in sentence comprehension
In sentences with a complex subject noun phrase, like "The key to the cabinets is lost", the grammatical number of the head noun (key) may be the same or different from the modifier noun phrase (cabinets). When the number is the same, comprehension is usually easier than when it is different. Grammatical number computation may occur while processing the modifier noun (integration phase) or while processing the verb (checking phase). We investigated at which phase number conflict and plausibility of the modifier noun as subject for the verb affect processing, and we imposed a gaze-contingent tone discrimination task in either phase to test whether number computation involves executive control. At both phases, gaze durations were longer when a concurrent tone task was present. Additionally, at the integration phase, gaze durations were longer under number conflict, and this effect was enhanced by the presence of a tone task, whereas no effects of plausibility of the modifier were observed. The finding that the effect of number match was larger under load shows that computation of the grammatical number of the complex noun phrase requires executive control in the integration phase, but not in the checking phase.
The Parallel Persistent Memory Model
We consider a parallel computational model that consists of $P$ processors,
each with a fast local ephemeral memory of limited size, and sharing a large
persistent memory. The model allows for each processor to fault with bounded
probability, and possibly restart. On faulting all processor state and local
ephemeral memory are lost, but the persistent memory remains. This model is
motivated by upcoming non-volatile memories that are as fast as existing random
access memory, are accessible at the granularity of cache lines, and have the
capability of surviving power outages. It is further motivated by the
observation that in large parallel systems, failure of processors and their
caches is not unusual.
Within the model we develop a framework for developing locality efficient
parallel algorithms that are resilient to failures. There are several
challenges, including the need to recover from failures, the desire to do this
in an asynchronous setting (i.e., not blocking other processors when one
fails), and the need for synchronization primitives that are robust to
failures. We describe approaches to solve these challenges based on breaking
computations into what we call capsules, which have certain properties, and
developing a work-stealing scheduler that functions properly within the context
of failures. The scheduler guarantees a time bound of
$O\!\left(\frac{W}{P_A} + \frac{DP}{P_A}\left\lceil \log_{1/f} W \right\rceil\right)$
in expectation, where $W$ and $D$ are the work and depth of the computation (in
the absence of failures), $P_A$ is the average number of processors available
during the computation, and $f$ is the probability that a capsule fails.
Within the model and using the proposed
methods, we develop efficient algorithms for parallel sorting and other
primitives.
Comment: This paper is the full version of a paper at SPAA 2018 with the same name.
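The capsule idea can be illustrated with a small simulation: a capsule's inputs live in persistent memory, so if a fault wipes ephemeral state before the capsule commits its result, the capsule simply re-runs from its start; if the result was already committed, the capsule is skipped on restart. A minimal Python sketch (all names are hypothetical, and faults are simulated with a seeded random draw rather than real failures):

```python
import random

def run_capsule(persistent, key, compute, fail_prob=0.3, rng=None):
    """Execute `compute` as a capsule over a dict standing in for
    persistent memory.

    Hypothetical sketch: the capsule is skipped if its result was
    already committed, and retried if a simulated fault destroys the
    ephemeral result before the commit.
    """
    rng = rng or random.Random(0)      # seeded for reproducibility
    if key in persistent:              # committed before a prior fault
        return persistent[key]
    while True:
        result = compute()             # work done in ephemeral memory
        if rng.random() >= fail_prob:  # no fault: commit persistently
            persistent[key] = result
            return result
        # fault: ephemeral result lost; restart the capsule from its inputs

pm = {}                                # stand-in for persistent memory
total = run_capsule(pm, "sum", lambda: sum(range(10)))
```

Re-running from the capsule's start is only safe when the capsule's writes are idempotent with respect to its inputs, which is the kind of property the framework above requires capsules to have.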
Brain Modularity Mediates the Relation between Task Complexity and Performance
Recent work in cognitive neuroscience has focused on analyzing the brain as a
network, rather than as a collection of independent regions. Prior studies
taking this approach have found that individual differences in the degree of
modularity of the brain network relate to performance on cognitive tasks.
However, inconsistent results concerning the direction of this relationship
have been obtained, with some tasks showing better performance as modularity
increases and other tasks showing worse performance. A recent theoretical model
(Chen & Deem, 2015) suggests that these inconsistencies may be explained on the
grounds that high-modularity networks favor performance on simple tasks whereas
low-modularity networks favor performance on more complex tasks. The current
study tests these predictions by relating modularity from resting-state fMRI to
performance on a set of simple and complex behavioral tasks. Complex and simple
tasks were defined on the basis of whether they did or did not draw on
executive attention. Consistent with predictions, we found a negative
correlation between individuals' modularity and their performance on a
composite measure combining scores from the complex tasks but a positive
correlation with performance on a composite measure combining scores from the
simple tasks. These results and theory presented here provide a framework for
linking measures of whole brain organization from network neuroscience to
cognitive processing.
Comment: 47 pages; 4 figures.