TR2026-095

Reinforced Neural Processes: Memory-Efficient Time-Series Forecasting with a World-Feedback-Trained Memory Policy


    •  Khan, N., Wichern, G., Laughman, C.R., "Reinforced Neural Processes: Memory-Efficient Time-Series Forecasting with a World-Feedback-Trained Memory Policy", ICML Workshop on Reinforcement Learning from World Feedback (RLxF), July 2026.
      @inproceedings{Khan2026jul,
        author = {Khan, Nibraas and Wichern, Gordon and Laughman, Christopher R.},
        title = {{Reinforced Neural Processes: Memory-Efficient Time-Series Forecasting with a World-Feedback-Trained Memory Policy}},
        booktitle = {ICML Workshop on Reinforcement Learning from World Feedback (RLxF)},
        year = 2026,
        month = jul,
        url = {https://www.merl.com/publications/TR2026-095}
      }
  • Research Areas: Artificial Intelligence, Machine Learning

Abstract:

Neural Processes (NPs) provide a lightweight framework for uncertainty-aware regression by conditioning predictions on a compact context set of observed input-output examples in settings such as meta-regression, Bayesian optimization, and spatiotemporal prediction. In continuous learning settings, however, context selection becomes an online memory problem: as new observations arrive, which examples should be retained? Since retaining every observation is intractable, bounded-memory implementations rely on fixed heuristics such as sliding windows, reservoir sampling, or surprise thresholds, each encoding a static memory prior. We introduce Reinforced Neural Processes (RNP), a backbone-agnostic memory framework that pairs a tiered context buffer with a gated two-branch encoder and learns an insertion/eviction policy from world feedback: the downstream predictive log-likelihood induced by each memory action relative to its counterfactual alternative. We instantiate RNP on attention (R-ANP) and convolutional (R-ConvCNP) backbones and evaluate on four streaming benchmarks (delay-differential systems, regime-switching streams, abrupt-MNIST, and a wearable energy-expenditure dataset) across varying memory budgets. The best RNP variant attains the highest likelihood on 27 of 32 streams