Using the target Q-network to stabilize an agent's learning