Constrained denumerable state non-stationary MDPs with expected total reward criterion

Xian Ping GUO

Acta Mathematicae Applicatae Sinica (English Series), 2000, (2): 205-212.