The high-performance, highly parallel, fully programmable modern Graphics Processing Unit's high memory bandwidth, computing power, excellent peak throughput, and energy efficiency brings acceleration to regular applications that have extensive data parallelism, regular memory access patterns, and modest synchronizations. However, for graph analytics, the inherent irregularity of graph data structures leads to irregularity in data access and control flow, making efficient graph analytics on GPUs a significant challenge. Despite some promising specialized GPU graph algorithm implementations, parallel graph analytics on the GPU in general still faces two major challenges. The first is the programmability gap between low-level implementations of specific graph primitives and a general graph processing system. Programming graph algorithms on GPUs is difficult even for the most skilled programmers. Specialized GPU graph algorithm implementations do not generalize well since they often couple a specific graph computation to a specific type of parallel graph operation. The second is the lack of a GPU-specific graph processing programming model. High-level GPU programming models for graph analytics often recapitulate CPU programming models and do not compare favorably in performance with specialized implementations due to different kinds of overhead introduced by maintaining a high-level framework. This dissertation seeks to resolve the conflict of programmability and performance for graph analytics on the GPU by designing a GPU-specific graph processing programming model and building a graph analytics system on the GPU that not only allows quick prototyping of new graph primitives but also delivers the performance of customized, complex GPU hardwired graph primitives. To achieve this goal, we present a novel data-centric abstraction for graph operations that allows programmers to develop graph primitives at a high level of abstraction while simultaneously delivering high performance by incorporating several profitable optimizations, which previously were only applied to different individual graph algorithm implementations on the GPU, into the core of our implementation, including kernel fusion, push-pull traversal, idempotent traversal, priority queues, and various workload mapping strategies. We design and implement a new graph analytics system, Gunrock, which contains a set of simple and flexible graph operation APIs that can express a wide range of graph primitives at a high level of abstraction. Using Gunrock, we implement a large set of graph primitives, which span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. All of our graph primitives achieve comparable performance to their hardwired counterparts and significantly outperform previous programmable GPU abstractions