Algorithm and Architecture
The DQN algorithm uses a neural network as the underlying function approximator for the action-value function Q, which is then used to select actions with an epsilon-greedy policy. A second Q function, known as the target network, is used when computing the loss, so that the learning targets do not shift with every update of the online network.
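As a minimal sketch of epsilon-greedy action selection, assuming a PyTorch Q-network and illustrative names such as `qnetwork_local` (not taken from the original code):

```python
import random

import numpy as np
import torch


def select_action(state, qnetwork_local, eps, n_actions=4):
    """Epsilon-greedy action selection from the online ("local") Q-network.

    Illustrative sketch: `qnetwork_local` is assumed to be a PyTorch module
    mapping a state vector to one Q-value per action.
    """
    state_t = torch.from_numpy(state).float().unsqueeze(0)
    qnetwork_local.eval()
    with torch.no_grad():
        action_values = qnetwork_local(state_t)
    qnetwork_local.train()

    # Exploit with probability 1 - eps, otherwise explore uniformly at random.
    if random.random() > eps:
        return int(np.argmax(action_values.cpu().numpy()))
    return random.randrange(n_actions)
```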
DQN also uses a technique known as Replay Memory: experiences obtained from interacting with the environment are first stored in a buffer and later sampled at random for learning. This further reduces the correlation between consecutive samples and stabilizes the performance of the model.
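A replay memory can be implemented as a fixed-size buffer with uniform random sampling; the sketch below is illustrative, and the class and field names are assumptions:

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size memory of experience tuples with uniform random sampling."""

    def __init__(self, buffer_size=100_000, batch_size=64):
        self.memory = deque(maxlen=buffer_size)
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        # Store the raw transition; learning happens later, on random batches.
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform sampling breaks the temporal correlation between consecutive steps.
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```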
To achieve our results, we used the following hyperparameters:
- GAMMA = 0.99
- TAU = 1e-3
- LR = 5e-4
- UPDATE_EVERY = 8
- N_EPISODES=2000
- MAX_TIMESTEPS=1000
- EPSILON_START=1.0
- EPSILON_END=0.01
- EPSILON_DECAY=0.995
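To show how GAMMA and TAU enter the update, here is a hedged sketch of a single learning step with a soft target-network update; the batch layout, network names, and optimizer are assumptions rather than the exact original code:

```python
import torch
import torch.nn.functional as F

GAMMA = 0.99   # discount factor
TAU = 1e-3     # soft-update rate for the target network


def learn(experiences, qnetwork_local, qnetwork_target, optimizer):
    """One DQN learning step on a sampled batch (run every UPDATE_EVERY steps).

    `experiences` is assumed to be a tuple of tensors: states (B, 37),
    actions (B, 1) long, rewards (B, 1), next_states (B, 37), dones (B, 1).
    """
    states, actions, rewards, next_states, dones = experiences

    # TD target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
    q_targets_next = qnetwork_target(next_states).detach().max(1)[0].unsqueeze(1)
    q_targets = rewards + GAMMA * q_targets_next * (1 - dones)

    # Current estimates from the online network for the actions actually taken.
    q_expected = qnetwork_local(states).gather(1, actions)

    loss = F.mse_loss(q_expected, q_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Soft update: the target network slowly tracks the online network.
    for t_param, l_param in zip(qnetwork_target.parameters(), qnetwork_local.parameters()):
        t_param.data.copy_(TAU * l_param.data + (1.0 - TAU) * t_param.data)
```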
In our case, the neural network consists of three linear layers, with an input size equal to the state space (37) and a final output corresponding to the number of available actions (4). The hidden layers in between have 64 neurons each. Finally, we use the ReLU activation function.
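A sketch of the described architecture in PyTorch might look as follows (the two-hidden-layer reading of "64 neurons" and the class name are assumptions):

```python
import torch.nn as nn
import torch.nn.functional as F


class QNetwork(nn.Module):
    """Maps a 37-dimensional state to Q-values for the 4 available actions."""

    def __init__(self, state_size=37, action_size=4, hidden_size=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, no activation on the output layer
```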
Double DQN
We also experimented with an additional improvement on the original DQN algorithm known as Double DQN (DDQN). Because plain DQN uses the same network both to select and to evaluate the greedy action, it tends to overestimate action values; Double DQN should, in theory, prevent incidentally high value estimates that do not accurately reflect long-term returns and keep Q-values from exploding in the early stages of learning.
In our case, we simply edited our original model to select the greedy action for the next state using our local network and then evaluate that action using the target network. This resulted in a slight performance bump, as shown below.
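For illustration, the Double DQN target computation could look like the sketch below, where the local network picks the greedy action and the target network evaluates it (function and argument names are assumptions):

```python
import torch


def ddqn_targets(rewards, next_states, dones, qnetwork_local, qnetwork_target, gamma=0.99):
    """Double DQN targets: the local network selects, the target network evaluates."""
    with torch.no_grad():
        # Action selection with the online ("local") network...
        best_actions = qnetwork_local(next_states).argmax(dim=1, keepdim=True)
        # ...but value estimation with the target network.
        q_next = qnetwork_target(next_states).gather(1, best_actions)
    return rewards + gamma * q_next * (1 - dones)
```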
DQN
Environment solved in 606 episodes! Average Score: 13.03
| Episode # | Average Score |
|-----------|---------------|
| 100       | 0.46          |
| 200       | 3.20          |
| 300       | 4.88          |
| 400       | 4.63          |
| 500       | 7.50          |
| 600       | 10.58         |
| 700       | 12.84         |
| 706       | 13.03         |
DDQN
Environment solved in 596 episodes! Average Score: 13.02
| Episode # | Average Score |
|-----------|---------------|
| 100       | 0.32          |
| 200       | 1.31          |
| 300       | 5.75          |
| 400       | 7.11          |
| 500       | 8.96          |
| 600       | 11.09         |
| 696       | 13.02         |
Obstacles & Future improvements
The next step would involve extending the algorithm with techniques such as Dueling DQN and Prioritized Experience Replay, as well as trying our hand
at implementing the pixels-to-action version of the project.
We would also like to experiment with the neural network architecture by changing the number
of layers and neurons, and observe how this affects performance.
References
- Mnih et al., "Human-level control through deep reinforcement learning", Nature, 2015 (the original DQN paper).
- van Hasselt, Guez, and Silver, "Deep Reinforcement Learning with Double Q-learning", AAAI, 2016 (the Double DQN paper).