Deep Q-learning

Previously we approximated the state-value or action-value function by a linear function $\hat{v}_\pi(s,{\bf w})={\bf w}^T{\bf x}(s)$ or $\hat{q}_\pi(s,a,{\bf w})={\bf w}^T{\bf x}(s,a)$, parameterized by the weight vector ${\bf w}$ and based on a set of features collected in ${\bf x}$. However, these features need to be hand-picked or designed for the specific problem to be solved.
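
To make the dependence on hand-designed features concrete, here is a minimal sketch of the linear estimate $\hat{q}_\pi(s,a,{\bf w})={\bf w}^T{\bf x}(s,a)$. The toy problem (a corridor with positions 0 to 4 and two actions), the feature choice, and the weight values are all assumptions made for illustration only; they do not come from the text above.

```python
# Hypothetical toy problem: a 1-D corridor with positions 0..4 and two actions.
ACTIONS = ("left", "right")


def features(state, action):
    """Hand-designed feature vector x(s, a); this feature choice is illustrative."""
    pos = state / 4.0                 # position normalized to [0, 1]
    base = [1.0, pos, pos * pos]      # bias, position, squared position
    x = []
    for a in ACTIONS:
        # One block of state features per action; zeros for the other action.
        x.extend(base if a == action else [0.0] * len(base))
    return x


def q_hat(state, action, w):
    """Linear action-value estimate: q_hat(s, a, w) = w^T x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, features(state, action)))


# Arbitrary weights, chosen only to show the computation.
w = [0.0, 0.0, 0.0, 0.5, 1.0, 0.0]
value = q_hat(2, "right", w)
```

Every design decision in `features` (the normalization, the squared term, the per-action blocks) had to be made by hand for this particular problem, and a different problem would require a different `features` function.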

Network!