YinYangRAN: Resource Multiplexing in GPU-Accelerated Virtualized RANs

Abstract

RAN virtualization is revolutionizing the telco industry, enabling 5G Distributed Units to run on general-purpose platforms equipped with Hardware Accelerators (HAs). Recently, GPUs have been proposed as HAs, hinging on their unique capability to execute 5G PHY operations efficiently while also processing Machine Learning (ML) workloads. While this dual capability makes GPUs attractive for cost-effective deployments, we experimentally demonstrate that multiplexing 5G and ML workloads on GPUs is in fact challenging, and that conventional GPU-sharing methods can severely disrupt 5G operations. We then introduce YinYangRAN, an innovative O-RAN-compliant solution that supervises GPU-based HAs to ensure reliability in the 5G processing pipeline while maximizing the throughput of concurrent ML services. YinYangRAN makes GPU resource allocation decisions via a computationally efficient approximate dynamic programming technique, which is informed by a neural network trained on real-world measurements. Using workloads collected in real RANs, we demonstrate that YinYangRAN can achieve over 50% higher 5G processing reliability than conventional GPU-sharing models, with minimal impact on co-located ML workloads. To our knowledge, this is the first work to identify and address the complex problem of HA management in emerging GPU-accelerated vRANs, and it represents a promising step towards multiplexing PHY and ML workloads in mobile networks.
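The abstract does not spell out the formulation, so the following is only a minimal sketch, under assumptions of our own, of how an approximate dynamic program guided by a learned reliability predictor could split a GPU between 5G PHY and ML work. Everything here is hypothetical: the GPU is modeled as `CAPACITY` equal partitions, the state is the current 5G allocation, changing it costs `SWITCH_COST`, and `predicted_miss_prob` is a hand-written stand-in for the trained neural network. The approximation used is a truncated-horizon (receding-horizon) DP over a load forecast, which is one common form of approximate DP and not necessarily the technique used in YinYangRAN.

```python
from __future__ import annotations

import math

CAPACITY = 8          # hypothetical: GPU split into 8 equal partitions
SWITCH_COST = 0.5     # hypothetical cost of repartitioning between slots
MISS_PENALTY = 20.0   # hypothetical weight on 5G deadline misses


def predicted_miss_prob(units_5g: int, load: float) -> float:
    """Hypothetical stand-in for a learned model: probability that 5G PHY
    processing misses its deadline with `units_5g` partitions under a
    normalized load. A trained neural network would replace this."""
    if units_5g == 0:
        return 1.0
    # Logistic curve: the miss probability is 0.5 when units_5g == 2 * load.
    return 1.0 / (1.0 + math.exp(4.0 * (units_5g - 2.0 * load)))


def stage_reward(prev: int, units_5g: int, load: float) -> float:
    """ML throughput from leftover partitions, minus a 5G miss penalty
    and a repartitioning cost when the allocation changes."""
    ml_throughput = CAPACITY - units_5g
    penalty = MISS_PENALTY * predicted_miss_prob(units_5g, load)
    switch = SWITCH_COST if units_5g != prev else 0.0
    return ml_throughput - penalty - switch


def plan(prev: int, load_forecast: list[float]) -> list[int]:
    """Backward-induction DP over a short load forecast.
    State = allocation in the previous slot; action = next allocation."""
    horizon = len(load_forecast)
    # value[t][s]: best cumulative reward from slot t on, given previous allocation s
    value = [[0.0] * (CAPACITY + 1) for _ in range(horizon + 1)]
    choice = [[0] * (CAPACITY + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(CAPACITY + 1):
            best, best_a = -math.inf, 0
            for a in range(CAPACITY + 1):
                r = stage_reward(s, a, load_forecast[t]) + value[t + 1][a]
                if r > best:
                    best, best_a = r, a
            value[t][s], choice[t][s] = best, best_a
    # Roll forward from the current allocation to recover the planned sequence.
    allocations, s = [], prev
    for t in range(horizon):
        s = choice[t][s]
        allocations.append(s)
    return allocations


if __name__ == "__main__":
    # Receding horizon: apply only the first decision, then re-plan next slot.
    print(plan(prev=4, load_forecast=[0.5, 2.0, 3.5, 1.0]))
```

In this toy model, a heavier forecast load pushes more partitions to 5G, while the switching cost discourages reshuffling the GPU every slot; a learned predictor matters because the miss-probability curve of real PHY workloads is not known in closed form.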

Publication
In International Conference on Computer Communications, IEEE.