TR2020-003

Deep Reinforcement Learning for Joint Bidding and Pricing of Load Serving Entity


    •  Xu, H., Sun, H., Nikovski, D.N., Kitamura, S., Mori, K., Hashimoto, H., "Deep Reinforcement Learning for Joint Bidding and Pricing of Load Serving Entity", IEEE Transactions on Smart Grid, DOI: 10.1109/TSG.2019.2903756, Vol. 10, No. 6, pp. 6366-6375, January 2020.
      @article{Xu2020jan,
        author = {Xu, Hanchen and Sun, Hongbo and Nikovski, Daniel N. and Kitamura, Shoichi and Mori, Kazuyuki and Hashimoto, Hiroyuki},
        title = {Deep Reinforcement Learning for Joint Bidding and Pricing of Load Serving Entity},
        journal = {IEEE Transactions on Smart Grid},
        year = 2020,
        volume = 10,
        number = 6,
        pages = {6366--6375},
        month = jan,
        doi = {10.1109/TSG.2019.2903756},
        issn = {1949-3061},
        url = {https://www.merl.com/publications/TR2020-003}
      }
  • Research Areas:

    Artificial Intelligence, Data Analytics, Electric Systems, Machine Learning, Optimization

In this paper, we address the problem of jointly determining the energy bid submitted to the wholesale electricity market (WEM) and the energy price charged in the retail electricity market (REM) for a load serving entity (LSE). The joint bidding and pricing problem is formulated as a Markov decision process (MDP) with continuous state and action spaces, in which the energy bid and the energy price are two actions that share a common objective. We apply the deep deterministic policy gradient (DDPG) algorithm to solve this MDP for the optimal bidding and pricing policies. However, the DDPG algorithm typically requires a large number of state transition samples, which are costly to obtain in this application. To address this, we apply neural networks to learn dynamical bid and price response functions from historical data to model the WEM and the collective behavior of the end use customers (EUCs), respectively. These response functions explicitly capture the inter-temporal correlations of the WEM clearing results and the EUC responses, and can be used to generate state transition samples at no cost. More importantly, the response functions also inform the choice of states in the MDP formulation. Numerical simulations illustrate the effectiveness of the proposed methodology.
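
The idea of using response functions as a free source of state transition samples can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two response functions are simple linear stand-ins for the learned neural networks, the state is taken to be the previous WEM clearing price and the previous EUC demand, the reward is a toy profit with a mismatch penalty, and uniformly random actions stand in for the exploratory actions of a DDPG actor.

```python
import random
from collections import deque


def bid_response(prev_wem_price, bid):
    """Hypothetical stand-in for the learned bid response (WEM clearing price)."""
    return 0.5 * prev_wem_price + 0.1 * bid + 20.0


def price_response(prev_demand, retail_price):
    """Hypothetical stand-in for the learned price response (EUC demand)."""
    return max(0.0, 0.6 * prev_demand + 40.0 - 0.2 * retail_price)


def step(state, action):
    """Simulate one transition using the response functions instead of the real markets."""
    prev_wem_price, prev_demand = state
    bid, retail_price = action
    wem_price = bid_response(prev_wem_price, bid)
    demand = price_response(prev_demand, retail_price)
    # Toy reward (assumption): retail revenue minus wholesale cost,
    # minus a penalty for the gap between purchased and consumed energy.
    reward = retail_price * demand - wem_price * bid - 10.0 * abs(demand - bid)
    return (wem_price, demand), reward


def generate_transitions(n, seed=0):
    """Fill a replay buffer with n simulated (s, a, r, s') samples."""
    rng = random.Random(seed)
    buffer = deque(maxlen=10_000)
    state = (30.0, 100.0)  # assumed initial price and demand
    for _ in range(n):
        # Random exploration stands in for the DDPG actor plus noise.
        action = (rng.uniform(50.0, 150.0), rng.uniform(20.0, 80.0))
        next_state, reward = step(state, action)
        buffer.append((state, action, reward, next_state))
        state = next_state
    return buffer
```

Because each next state depends on the previous price and demand, the simulated samples carry the inter-temporal dependence that the abstract attributes to the response functions. A DDPG learner would draw minibatches from such a buffer rather than querying the real markets.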