Variance reduction for policy gradient with action-dependent factorized baselines
March 20, 2018, 04:00
OpenAI Blog