A brief tutorial on OpenAI’s Gym: Part 1


I have been playing with OpenAI’s gym for a while, and I’ll say it is one of the best libraries out there for implementing control problems for reinforcement learning (though I have not been using it for RL 🙂). In this short blog post I’ll introduce the basic commands and building blocks you need to make your own environment in OpenAI Gym. Let’s get started.

Installation

Well, it is as easy as it can get.

pip install -U gym

Or if you are using a conda environment:

conda install -c conda-forge gym 

The structure of gym

The gym code has four main classes: Environment (env), Spaces, Wrappers, and Vector environments. Most of you will only need the Environment class to implement an RL algorithm, so you can skip the rest and go straight to it.

Environments

The Environment class is the basic building block of gym: it simulates the environment you want to work with. Gym ships with a range of diverse simulation environments like Mountain Car, Lunar Lander, Pendulum, etc. With gym, it is also very easy to create your own custom environment, which we will go over pretty soon.

For this tutorial, we will start with the Lunar Lander environment, where the goal is to land a lunar module upright at a specified position. The control commands in this case are the throttles of the lander’s engines.

Sample from gym’s Lunar Lander.

Importing the environment

import gym 
env = gym.make("LunarLander-v2")

This will create an environment instance for the Lunar Lander, named env. If you want to import any other environment, just change the environment string passed to gym.make. If you are unsure of the string name for your environment, you can always refer to the registration.py file at https://github.com/openai/gym/blob/master/gym/envs/registration.py; lines 308-341 list the registered environment names. For example, you can create the CartPole environment with the following command.

env = gym.make("CartPole-v0")
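You can also list all registered environment IDs programmatically. Here is a minimal sketch, assuming an older gym version where the registry exposes an all() method (newer versions store the registry as a plain dict instead):

from gym import envs

# Print the ID of every registered environment
for spec in envs.registry.all():
    print(spec.id)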

Let’s play with the environment

For any environment, we are concerned with two sets of questions.

  1. What is the current state of the system?
  2. What control command is given, or should be given, to change the state of the system?

The state of the system is described by the observation_space, which defines the valid values/states an agent can observe in the environment. Similarly, action_space defines the valid actions the agent can take.
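For example, here is what these spaces look like for the Lunar Lander. This is a small sketch; the space shapes below are specific to LunarLander-v2 and will differ for other environments:

import gym

env = gym.make("LunarLander-v2")

print(env.observation_space)  # Box(8,): an 8-dimensional continuous state
print(env.action_space)       # Discrete(4): four discrete engine commands

# You can also draw a random valid action from a space
print(env.action_space.sample())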

Let us now go over some functions to play with the environment.

  1. reset

    This function will return the environment to its initial state.

     obs = env.reset()
     print("The current observation is {}".format(obs))
    

    This will reset the environment (we named the Lunar Lander instance env earlier) to its initial position and return the current observation.

  2. step

    This function takes an action as input and applies it to the environment. It returns four outputs: a. the observation of the system after the action, b. the reward for executing that action, c. done: whether the episode has ended, and d. info: extra information for debugging, as shown in the snippet below.
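    For instance, a minimal sketch of a single step using a random action (continuing with the env created above):

     action = env.action_space.sample()          # pick a random valid action
     obs, reward, done, info = env.step(action)  # apply it to the environment
     print("Reward: {}".format(reward))
     if done:               # the episode has ended
         obs = env.reset()  # start a fresh episode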

Before we implement a basic program, let us first go over the render command. Most of the time, you will want to see what the environment’s observation looks like visually. You can use the render command to capture the simulation at any point in time.

env.render(mode = "human")
env.close()

env.render will open a pop-up window and display the current state of the agent. Call env.close to stop the rendering, as it requires a lot of resources. If you want an image of the environment instead, change the mode to “rgb_array”, which returns the frame as an RGB array rather than opening a window.
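Here is a small sketch of the “rgb_array” mode, continuing with the env from above and assuming you have matplotlib installed (it is not part of gym):

import matplotlib.pyplot as plt

frame = env.render(mode="rgb_array")  # an (height, width, 3) RGB array
plt.imshow(frame)                     # display the captured frame
plt.show()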

Basic Implementation

import gym

env = gym.make("LunarLander-v2")  # Make the gym environment

obs = env.reset()  # Reset the environment and get the initial observation

num_steps = 1000  # Number of simulation steps to run

for step in range(num_steps):
    # For each step, take a random action from the action space
    # You can modify this to something else (e.g. a trained policy)
    action = env.action_space.sample()

    obs, reward, done, info = env.step(action)  # Apply the action

    env.render(mode="human")  # Render the env in a pop-up window

    if done:  # The episode ended, so reset before stepping again
        obs = env.reset()

env.close()  # Close the env once you are done
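As a small extension of the script above (a sketch, using only the reward and done values returned by env.step), you can track the total reward each episode collects:

import gym

env = gym.make("LunarLander-v2")
obs = env.reset()
episode_reward = 0.0

for step in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    episode_reward += reward  # accumulate reward over the episode
    if done:
        print("Episode finished with total reward {}".format(episode_reward))
        episode_reward = 0.0
        obs = env.reset()

env.close()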