Volume 30, Issue 4, pp. 1273–1308
ORIGINAL ARTICLE

Continuous-time mean–variance portfolio selection: A reinforcement learning framework

Haoran Wang

CAI Data Science and Machine Learning, The Vanguard Group, Inc., Malvern, Pennsylvania
Xun Yu Zhou (Corresponding Author)

Department of Industrial Engineering and Operations Research, and Data Science Institute, Columbia University, New York, New York

Correspondence

Xun Yu Zhou, Department of Industrial Engineering and Operations Research, and Data Science Institute, Columbia University, New York, NY 10027.

Email: xz2574@columbia.edu
First published: 23 June 2020
Citations: 56

Abstract

We approach the continuous-time mean–variance portfolio selection problem with reinforcement learning (RL). The problem is to achieve the best trade-off between exploration and exploitation, and it is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm and its variant outperform both traditional and deep neural network–based algorithms in our simulation and empirical studies.
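To give a concrete flavor of the exploratory policy described in the abstract, the sketch below samples risky-asset allocations from a Gaussian feedback policy whose variance shrinks over time. It is a minimal illustration only: the linear mean function, the exponential variance schedule, and all parameter names (target, theta, lam, decay, and the toy market parameters) are assumptions for illustration and are not the paper's closed-form expressions or algorithm.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): a Gaussian exploratory feedback
# policy with time-decaying variance, simulated along a toy wealth process.
# The mean function and variance schedule below are illustrative assumptions.

def sample_allocation(t, wealth, T, target=1.0, theta=0.5, lam=0.1, decay=2.0, rng=None):
    """Draw a risky-asset allocation at time t given current wealth (hypothetical form)."""
    rng = rng or np.random.default_rng()
    mean = -theta * (wealth - target)        # assumed linear feedback mean
    var = (lam / 2.0) * np.exp(-decay * t)   # assumed time-decaying exploration variance
    return rng.normal(mean, np.sqrt(var))

if __name__ == "__main__":
    # Simulate one exploratory wealth trajectory on a time grid (toy dynamics).
    T, steps, wealth = 1.0, 100, 1.0
    dt = T / steps
    mu, sigma, r = 0.05, 0.2, 0.01           # toy market parameters (assumptions)
    rng = np.random.default_rng(0)
    for k in range(steps):
        u = sample_allocation(k * dt, wealth, T, rng=rng)
        dW = rng.normal(0.0, np.sqrt(dt))    # Brownian increment
        wealth += r * wealth * dt + u * ((mu - r) * dt + sigma * dW)
    print(f"terminal wealth: {wealth:.4f}")
```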
