In the context of smart environments, crafting remote monitoring systems that are efficient, cost-effective, user-friendly, and respectful of privacy is crucial for many scenarios. Recognizing and tracing individuals via markerless motion capture systems in multi-person settings poses challenges due to obstructions, varying light conditions, and intricate interactions among subjects. Conversely, methods based on data gathered by Inertial Measurement Units (IMUs) embedded in wearables grapple with other issues, including the precision of the sensors and their optimal placement on the body. We argue that more accurate results can be achieved by combining human pose estimation (HPE) techniques with information collected by wearables. Thus, this paper introduces a real-time platform to track and identify persons by fusing HPE and IMU data. It exploits a matching model that consists of two synergistic components: the first employs a geometric approach, correlating orientation, acceleration, and velocity readings from the input sources, while the second utilizes a Convolutional Neural Network (CNN) to yield a correlation coefficient for each HPE-IMU data pair. The proposed platform achieves promising results in tracking and identification, with an accuracy rate of 96.9%.
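To make the matching idea concrete, the sketch below pairs IMU streams with HPE skeletons by mixing a simple geometric similarity with a learned correlation score and then solving an assignment problem. It is a minimal illustration, not the paper's implementation: the function names (geometric_score, match_imus_to_poses), the cosine-similarity choice, the equal weighting alpha, and the Hungarian assignment step are all hypothetical assumptions.

```python
# Illustrative sketch only; names, weighting, and the assignment step are assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment


def geometric_score(pose_signal: np.ndarray, imu_signal: np.ndarray) -> float:
    """Cosine similarity between an HPE-derived signal (e.g. limb acceleration
    estimated from keypoint trajectories) and the corresponding IMU reading."""
    a, b = pose_signal.ravel(), imu_signal.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-8
    return float(np.dot(a, b) / denom)


def match_imus_to_poses(pose_signals, imu_signals, cnn_correlation, alpha=0.5):
    """Assign each IMU stream to a tracked skeleton over a short time window.

    pose_signals / imu_signals: lists of (T, 3) arrays.
    cnn_correlation: callable returning a coefficient in [0, 1] for a
                     (pose, imu) pair -- stands in for the CNN component.
    alpha: hypothetical weight mixing the geometric and learned scores.
    """
    score = np.zeros((len(pose_signals), len(imu_signals)))
    for i, p in enumerate(pose_signals):
        for j, s in enumerate(imu_signals):
            score[i, j] = alpha * geometric_score(p, s) + (1 - alpha) * cnn_correlation(p, s)
    # Hungarian assignment maximizing the combined matching score.
    rows, cols = linear_sum_assignment(-score)
    return list(zip(rows.tolist(), cols.tolist()))
```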