GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

With the growing adoption of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) has attracted attention from a wide range of fields. However, DRL computation on modern, powerful GPU platforms remains inefficient due to its heterogeneous workloads and interleaved execution paradigm.

To this end, we propose GMI-DRL, the first systematic design to scale multi-GPU DRL performance via GPU spatial multiplexing. We introduce a novel design of resource-adjustable GPU multiplexing instances (GMIs) to match the actual needs of DRL tasks, highly efficient inter-GMI communication support to meet the demands of various DRL communication patterns, and an adaptive GMI management strategy that simultaneously achieves high GPU utilization and computation throughput. Comprehensive experiments reveal that GMI-DRL outperforms the state-of-the-art NVIDIA Isaac Gym with NCCL/Horovod support in training throughput (by up to 2.34x) and GPU utilization (by up to 40.8%) on the DGX-A100 platform. Our work provides an initial experience report on using GPU spatial multiplexing to process heterogeneous workloads with a mixture of computation and communication.
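As a rough illustration only (not the GMI-DRL implementation), the sketch below shows one way GPU spatial multiplexing can host heterogeneous DRL roles: each role is confined to its own GPU partition, e.g., an NVIDIA MIG instance on an A100, by exporting that instance's UUID through `CUDA_VISIBLE_DEVICES` before launching the worker process, so simulation, inference, and training run concurrently on disjoint partitions instead of time-sharing the whole device. The role names, MIG UUIDs, and `drl_worker.py` script are hypothetical placeholders.

```python
# Minimal sketch: pin heterogeneous DRL roles to separate GPU
# spatial-multiplexing instances (assumes an A100 partitioned with MIG;
# replace the placeholder UUIDs with the output of `nvidia-smi -L`).
import os
import subprocess

# Hypothetical mapping of DRL roles to MIG instances (illustrative only).
ROLE_TO_MIG = {
    "simulator": "MIG-<uuid-of-simulation-instance>",  # environment simulation
    "actor":     "MIG-<uuid-of-actor-instance>",       # policy inference
    "learner":   "MIG-<uuid-of-learner-instance>",     # gradient-based training
}

def launch(role: str, script: str) -> subprocess.Popen:
    """Launch one DRL worker confined to its own GPU instance."""
    env = os.environ.copy()
    # CUDA in the child process sees only the selected MIG instance,
    # so workers occupy disjoint GPU partitions and run side by side.
    env["CUDA_VISIBLE_DEVICES"] = ROLE_TO_MIG[role]
    return subprocess.Popen(["python", script, "--role", role], env=env)

if __name__ == "__main__":
    procs = [launch(role, "drl_worker.py") for role in ROLE_TO_MIG]
    for p in procs:
        p.wait()
```

GMI-DRL goes beyond such static pinning by adjusting GMI resources and managing placement adaptively; the sketch only conveys the basic idea of spatially partitioning a GPU across DRL roles.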

Reference Work

  • GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing [paper]