On the Linearity of Q-Functions
A problem of particular focus for the past few days for me has been on attempting to establish when we can make the claim of linearity in the set of Q-functions, based on an outcome model and the relevant distributions of covariates which go into it.
Briefly, if we define a DTR over \(K\) stages, which we wish to fit using Q-learning, we know that the parameter estimates for the Q-functions will be consistent, so long as all of the models are correctly specified. Since each Q-function takes the form \(Q_k = E[V_{k+1}|H_k]\), then it is natural to wish to use linear regression models for these functions as they each take the form of a conditional expectation.
However, the quantities \(V_{k}\) are given based on the maximization of the previous Q-function (where here, previous actually refers to the \(k+1\)-st function), over the treatment parameter, which is binary. As such, these value functions will contain an indicator function times by some linear functional, which of course has the potential to induce severe non-linearity into the Q-functions (particularly as they propagate over time).
This problem is more formally discussed in (P. J. Schulte, A. A. Tsiatis, E. B. Laber, and M. Davidian, “Q-and a-learning methods for estimating optimal dynamic treatment regimes,” Statistical Science, vol. 29, no. 4, pp. 640–661,2014.), where they demonstrate that even in a simple scenario, this non-linearity is a problem.
So… what? There are two obvious questions that arise from the revelation that Q-functions may, in fact, be non-linear. (1) Are there more flexible models that we can use in place of linear regression? (2) Can we characterize precisely when Q-functions will be linear?
For (1) the answer is “yes” and it is done somewhat frequently. This is, of course, a more fruitful avenue of research moving forward, and will certainly be where I turn my attention next. The second question, however, is quite interesting to me.
It is certainly possible (in fact, upon some moderate thought, not all that difficult) to define DTRs which do indeed have linear Q-functions; I am interested in seeing whether sufficient (and, perhaps more aspirationally, necessary) conditions are possible.