Large-scale traffic control using autonomous vehicles and decentralized deep reinforcement learning

In this work, we introduce a scalable, decentralized deep reinforcement learning (RL) scheme for optimizing traffic consisting of both autonomous and human-driven vehicles. The control inputs to the system are the following distances and lane placements of the autonomous vehicles; the human-driven vehicles are uncontrolled. One point of novelty of the scheme is that it is trained on images of traffic, in which pixels are colored according to how much they are occupied by human-driven or autonomous vehicles. A second point of novelty is how the scheme achieves scalable decentralization: it trains multiple RL agents, each responsible for its own region of control but able to at least partially observe its neighbors' regions. Because an agent does not need to communicate with its neighbors, the scheme scales to arbitrarily large networks and can be applied in systems in which neighboring controllers are not RL-based. We perform a case study of two simulations on a two-lane oval highway in the Simulation of Urban MObility (SUMO) environment. In the first simulation, a single RL agent is applied so that its region of control coincides with an area of traffic congestion. In the second, multiple RL agents are applied over the entire network. The results of the first simulation show that the single RL agent is able to reduce traffic congestion; the results of the second show that the decentralized multi-agent scheme achieves even better results.
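The image-based observation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the two-channel coloring (one channel per vehicle class), and the discretization are all assumptions made for the example.

```python
import numpy as np

def occupancy_image(vehicles, road_length, n_lanes, px_along):
    """Render traffic as an image whose pixels are colored by occupancy.

    vehicles: list of (position_m, lane_idx, length_m, is_autonomous).
    Returns an array of shape (n_lanes, px_along, 2): channel 0 marks
    autonomous-vehicle occupancy, channel 1 human-driven occupancy
    (a hypothetical color coding for illustration).
    """
    img = np.zeros((n_lanes, px_along, 2))
    m_per_px = road_length / px_along
    for pos, lane, length, is_av in vehicles:
        # Pixels spanned by this vehicle along the road axis.
        start = int(pos / m_per_px)
        end = int((pos + length) / m_per_px)
        channel = 0 if is_av else 1
        img[lane, start:end + 1, channel] = 1.0
    return img

# Example: one AV and one human-driven vehicle on a 100 m, 2-lane road.
obs = occupancy_image(
    [(20.0, 0, 5.0, True), (60.0, 1, 5.0, False)],
    road_length=100.0, n_lanes=2, px_along=10)
```

A real system would likely use fractional pixel occupancy and feed these images to a convolutional policy network; the binary fill here keeps the sketch short.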
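The partitioning into control regions with partial observation of neighbors can also be sketched. This is an illustrative assumption of how such regions might be laid out on a ring road like the oval highway in the case study; the function and the `overlap` parameter are hypothetical, not taken from the paper.

```python
def agent_regions(n_agents, track_length, overlap):
    """Partition a ring road of length track_length among n_agents.

    Each agent controls an equal, disjoint segment but observes an
    extra `overlap` meters into each neighboring segment, so no
    agent-to-agent communication is needed. Intervals are (start, end)
    positions in meters, with observation bounds wrapped modulo the
    track length because the road is a closed loop.
    """
    seg = track_length / n_agents
    regions = []
    for i in range(n_agents):
        ctrl = (i * seg, (i + 1) * seg)
        obs = ((ctrl[0] - overlap) % track_length,
               (ctrl[1] + overlap) % track_length)
        regions.append({"control": ctrl, "observe": obs})
    return regions

# Example: four agents on a 400 m oval, each seeing 50 m past its region.
regions = agent_regions(n_agents=4, track_length=400.0, overlap=50.0)
```

Because each agent's policy depends only on its own (overlapping) observation window, agents can be added as the network grows, and any segment could instead be handled by a non-RL controller without changing the others.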