Rendering photorealistic and dynamically moving human heads is crucial for
ensuring a pleasant and immersive experience in AR/VR and video conferencing
applications. However, existing methods often struggle to model challenging
facial regions (e.g., mouth interior, eyes, hair/beard), resulting in
unrealistic and blurry renderings. In this paper, we propose {\fullname}
({\name}), a method that couples a neural point representation with neural
volume rendering, discarding the predefined connectivity and hard
correspondences imposed by mesh-based approaches. Specifically, the neural
points are strategically constrained around the surface of the target
expression via a high-resolution UV displacement map, which increases
modeling capacity and enables more accurate control. We introduce three technical
innovations to improve rendering and training efficiency: a patch-wise,
depth-guided shading-point sampling strategy, a lightweight radiance decoding
process, and a Grid-Error-Patch (GEP) ray sampling strategy during training. By
design, our {\name} is better equipped to handle topologically changing regions
and thin structures while also ensuring accurate expression control when
animating avatars. Experiments on three subjects from the Multiface dataset
demonstrate the effectiveness of our designs, with our method outperforming
previous state-of-the-art approaches, especially on challenging facial regions.
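To make two of the mechanisms named above concrete, the sketch below illustrates (i) offsetting base points along their normals using values read from a UV displacement map, and (ii) placing shading points only within a thin band around a per-ray depth prior. This is a minimal illustration rather than the method's implementation: all function and variable names are hypothetical, the UV lookup is nearest-neighbor (not bilinear) for brevity, and since the abstract does not specify how the displacement map or the depth prior are obtained, both are treated here as given inputs.

\begin{verbatim}
# Minimal sketch (hypothetical names, not the authors' code).
import numpy as np

def displace_points(base_pts, normals, uv, disp_map):
    """Offset each base point along its normal by a scalar read
    (nearest-neighbor, for brevity) from a UV displacement map."""
    h, w = disp_map.shape
    u = np.clip((uv[:, 0] * (w - 1)).round().astype(int), 0, w - 1)
    v = np.clip((uv[:, 1] * (h - 1)).round().astype(int), 0, h - 1)
    return base_pts + disp_map[v, u][:, None] * normals

def depth_guided_samples(ray_o, ray_d, depth_prior, band=0.05, n=8):
    """Place shading points inside a thin band centered on a depth
    prior instead of sampling densely along the whole ray."""
    t = np.linspace(depth_prior - band, depth_prior + band, n)
    return ray_o[None, :] + t[:, None] * ray_d[None, :]

# Toy usage with random inputs.
pts = np.random.rand(100, 3)               # base point positions
nrm = np.tile([0.0, 0.0, 1.0], (100, 1))   # per-point normals
uvs = np.random.rand(100, 2)               # per-point UV coordinates
dmap = 0.01 * np.random.randn(256, 256)    # UV displacement map
moved = displace_points(pts, nrm, uvs, dmap)
shade = depth_guided_samples(np.zeros(3), np.array([0.0, 0.0, 1.0]), 1.2)
\end{verbatim}

In this reading, the displacement map would be expression-dependent, and confining shading points to a narrow band around the depth prior keeps the per-ray sample count small, which is consistent with the rendering-efficiency motivation stated above.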