Deep Reinforcement Learning for Joint Bidding and Pricing of Load Serving Entity


In this paper, we address the problem of jointly determining the energy bid submitted to the wholesale electricity market (WEM) and the energy price charged in the retail electricity market (REM) for a load serving entity (LSE). The joint bidding and pricing problem is formulated as a Markov decision process (MDP) with continuous state and action spaces, in which the energy bid and the energy price are two actions that share a common objective. We apply the deep deterministic policy gradient (DDPG) algorithm to solve this MDP for the optimal bidding and pricing policies. However, the DDPG algorithm typically requires a large number of state transition samples, which are costly to obtain in this application. To address this, we apply neural networks to learn dynamical bid and price response functions from historical data, which model the WEM and the collective behavior of the end use customers (EUCs), respectively. These response functions explicitly capture the inter-temporal correlations of the WEM clearing results and the EUC responses, and can be used to generate state transition samples at no additional cost. More importantly, the response functions also inform the choice of states in the MDP formulation. Numerical simulations illustrate the effectiveness of the proposed methodology.
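
To make the role of the response functions concrete, the following minimal sketch shows how two such functions could be rolled forward to produce state transition samples for an off-policy learner such as DDPG. Everything specific in it is an assumption made for illustration: the names `bid_response`, `price_response`, and `generate_transitions` are hypothetical, the linear formulas stand in for the learned neural networks, and the state (previous cleared quantity, previous demand) and the reward (retail revenue minus a fixed wholesale cost) are placeholders rather than the formulation used in this paper.

```python
def bid_response(prev_cleared, bid):
    # Hypothetical stand-in for the learned WEM model: the cleared quantity
    # depends on the current bid and on the previous clearing result.
    return 0.5 * prev_cleared + 0.5 * bid


def price_response(prev_demand, price):
    # Hypothetical stand-in for the learned EUC model: demand decreases with
    # the retail price and depends on the previous demand.
    return max(0.0, 0.5 * prev_demand + 5.0 - 0.2 * price)


def generate_transitions(policy, state, n_steps, wholesale_cost=2.0):
    """Roll the response functions forward and collect (s, a, r, s') tuples."""
    transitions = []
    for _ in range(n_steps):
        bid, price = policy(state)
        prev_cleared, prev_demand = state
        cleared = bid_response(prev_cleared, bid)
        demand = price_response(prev_demand, price)
        # Assumed reward: retail revenue minus the cost of cleared energy.
        reward = price * demand - wholesale_cost * cleared
        next_state = (cleared, demand)
        transitions.append((state, (bid, price), reward, next_state))
        state = next_state
    return transitions


# Example: a fixed illustrative policy that bids the previous demand
# and charges a constant retail price.
samples = generate_transitions(lambda s: (s[1], 5.0), (10.0, 10.0), n_steps=3)
```

Because each sample is produced by evaluating the response functions rather than by interacting with the real markets, a replay buffer can be filled with as many transitions as training requires.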