Implementing Deep Q-Learning using Tensorflow

  • Last Updated : 18 Jun, 2019

Prerequisites: Deep Q-Learning

This article demonstrates reinforcement learning on a larger environment than the ones used in the earlier articles. We will implement the Deep Q-Learning technique with TensorFlow, using Keras to define the network and the keras-rl library (imported as rl) for the agent.

Note: A graphics rendering library is required for the following demonstration. On Windows, PyOpenGL is suggested, while on Ubuntu, OpenGL is recommended.

Step 1: Importing the required libraries

import numpy as np
import gym
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

Step 2: Building the Environment

Note: We will use a preloaded environment from OpenAI’s gym module, which contains many environments for different purposes. The list of environments can be viewed on the gym website.

Here, the ‘MountainCar-v0’ environment will be used. In it, a car (the agent) is stuck between two mountains and has to drive up one of them. The car’s engine is not strong enough to climb the slope on its own, so the car has to build momentum by rocking back and forth before it can get uphill.
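To see why momentum matters, the car's motion can be simulated without gym. The sketch below is not part of the original article. Its constants and update rule are written from memory of gym's classic-control implementation, so treat them as assumptions, and the two policies (always_right and follow_momentum) are hypothetical helpers added for illustration.

```python
import math

# Constants recalled from gym's MountainCar-v0 (assumptions, not taken from this article).
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS = 0.5
FORCE, GRAVITY = 0.001, 0.0025
MAX_STEPS = 200  # episode time limit


def step(position, velocity, action):
    """Advance one time step; action is 0 (push left), 1 (no push) or 2 (push right)."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POS, min(MAX_POS, position))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    return position, velocity


def steps_to_goal(policy, position=-0.5, velocity=0.0):
    """Return the number of steps needed to reach the goal, or None on timeout."""
    for t in range(1, MAX_STEPS + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        if position >= GOAL_POS:
            return t
    return None


def always_right(position, velocity):
    """Hypothetical naive policy: keep pushing toward the goal."""
    return 2


def follow_momentum(position, velocity):
    """Hypothetical heuristic: push in the direction the car is already moving."""
    return 2 if velocity >= 0 else 0


print("always right:   ", steps_to_goal(always_right))
print("follow momentum:", steps_to_goal(follow_momentum))
```

Under these assumed dynamics, pushing right all the time never reaches the goal within the time limit, while swinging with the car's own motion does. The learning agent has to discover a strategy of this kind from rewards alone.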

# Building the environment
environment_name = 'MountainCar-v0'
env = gym.make(environment_name)
# Extracting the number of possible actions
num_actions = env.action_space.n

Step 3: Building the learning agent

The learning agent is a deep neural network that estimates one Q-value per action for a given observation. We build it with the Sequential class of the Keras module.

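agent = Sequential()
agent.add(Flatten(input_shape =(1, ) + env.observation_space.shape))
agent.add(Dense(16, activation ='relu'))  # hidden layer; the size 16 is an assumed, illustrative choice
agent.add(Dense(num_actions, activation ='linear'))  # output layer: one Q-value per action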
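As a rough picture of what a Q-network computes for this environment, the sketch below maps a two-number observation (position, velocity) to three Q-values in plain Python. It is only an illustration: the weights are random and untrained, the hidden size of 16 is an assumed value, and init_layer, dense and q_values are hypothetical helper names rather than Keras functions.

```python
import random

random.seed(0)
OBS_DIM, HIDDEN, NUM_ACTIONS = 2, 16, 3  # hidden size is an illustrative assumption


def init_layer(n_in, n_out):
    """Random weights and zero biases for one dense layer."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu=False):
    """Compute weights @ x + biases, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


hidden_layer = init_layer(OBS_DIM, HIDDEN)
output_layer = init_layer(HIDDEN, NUM_ACTIONS)


def q_values(observation):
    """Map (position, velocity) to one estimated Q-value per action."""
    return dense(dense(observation, hidden_layer, relu=True), output_layer)


q = q_values([-0.5, 0.0])
greedy_action = max(range(NUM_ACTIONS), key=lambda a: q[a])
print(q, greedy_action)
```

The greedy action is simply the index of the largest Q-value. Training adjusts the weights so that these values approximate the expected return of each action.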

Step 4: Finding the Optimal Strategy

# Building the model to find the optimal strategy
strategy = EpsGreedyQPolicy()
memory = SequentialMemory(limit = 10000, window_length = 1)
dqn = DQNAgent(model = agent, nb_actions = num_actions,
               memory = memory, nb_steps_warmup = 10,
               target_model_update = 1e-2, policy = strategy)
dqn.compile(Adam(lr = 1e-3), metrics =['mae'])
# Visualizing the training
dqn.fit(env, nb_steps = 5000, visualize = True, verbose = 2)
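The policy and memory objects above hide two simple ideas: epsilon-greedy action selection and a fixed-size replay buffer. The following is a minimal sketch of both in plain Python, not keras-rl's actual implementation. The names epsilon_greedy and ReplayMemory are hypothetical, and eps = 0.1 is assumed to be the library default.

```python
import random
from collections import deque


def epsilon_greedy(q_values, eps=0.1, rng=random):
    """With probability eps pick a random action, otherwise the highest-valued one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayMemory:
    """Fixed-size buffer: the oldest transitions are dropped once `limit` is reached."""

    def __init__(self, limit):
        self.buffer = deque(maxlen=limit)

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Draw a random minibatch of stored transitions."""
        return rng.sample(list(self.buffer), batch_size)


memory = ReplayMemory(limit=10000)
rng = random.Random(0)
for t in range(12000):
    # Placeholder Q-values and integer "states", used only to exercise the buffer.
    action = epsilon_greedy([0.0, 1.0, 0.5], eps=0.1, rng=rng)
    memory.append(t, action, -1.0, t + 1, False)

batch = memory.sample(32, rng=rng)
print(len(memory.buffer), len(batch))
```

Sampling minibatches at random from the buffer breaks the correlation between consecutive steps, which tends to make training of the network more stable.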

During training, the agent tries different ways of reaching the top and gains knowledge from each episode.
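What the agent learns from each stored step can be sketched in two small functions: the Q-learning target that the network is trained toward, and the soft update of the target network implied by target_model_update = 1e-2. This is a simplified illustration with hypothetical names (td_target, soft_update). The discount factor 0.99 is an assumption, since the article does not set it explicitly.

```python
GAMMA = 0.99  # discount factor (assumed; not set explicitly in this article)
TAU = 1e-2    # same value as target_model_update in Step 4


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Q-learning target: r if the episode ended, else r + gamma * max Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def soft_update(target_weights, online_weights, tau=TAU):
    """Move each target weight a fraction tau toward the online network's weight."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target_weights, online_weights)]


# Example with made-up Q-values for the next state.
print(td_target(-1.0, [-10.0, -8.0, -9.0], done=False))
print(td_target(-1.0, [-10.0, -8.0, -9.0], done=True))
print(soft_update([0.0, 1.0], [1.0, 1.0]))
```

Because the target network changes slowly, the targets stay nearly fixed between updates instead of chasing the network that is being trained.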

Step 5: Testing the Learning Agent

# Testing the learning agent
dqn.test(env, nb_episodes = 5, visualize = True)

The agent tries to apply its knowledge to reach the top.
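When reading the test output, it helps to know that MountainCar-v0 gives a reward of -1 per step and cuts an episode off after 200 steps, so an episode reward of -200 normally means the car never reached the top. The helper below is a hypothetical sketch for summarizing a list of episode rewards, however they were collected. The example rewards are made-up placeholders, not results from this article.

```python
TIME_LIMIT = 200  # maximum number of steps per episode in MountainCar-v0


def summarize(episode_rewards, time_limit=TIME_LIMIT):
    """Count episodes that finished before the time limit and average the rewards."""
    solved = [r for r in episode_rewards if r > -time_limit]
    return {
        "episodes": len(episode_rewards),
        "solved": len(solved),
        "mean_reward": sum(episode_rewards) / len(episode_rewards),
    }


# Made-up rewards for illustration only.
example_rewards = [-200.0, -200.0, -163.0, -200.0, -151.0]
print(summarize(example_rewards))
```

An episode that reaches the goal exactly on the last allowed step would also score -200, so this count is a slight underestimate in that edge case.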
