New Robotics Environments In OpenAI Gym

March 1, 2018, 8:27 p.m. By: Kirti Bakshi

OpenAI Gym

Gym, which is written in Python, is a collection of environments (problems) designed for developing and testing reinforcement learning algorithms. It saves users from having to build complicated environments themselves.

What Is OpenAI Gym:

OpenAI Gym is a toolkit for reinforcement learning research that has recently gained popularity in the machine learning community.

OpenAI Gym focuses on the episodic setting of RL, in which the agent aims to maximize the expected total reward per episode and to reach an acceptable level of performance as quickly as possible. The toolkit also aims to integrate the Gym API with robotic hardware, so that reinforcement learning algorithms can be validated in real environments.
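The episodic setting can be illustrated with a minimal sketch. The `ToyReachEnv` class below is a hypothetical 1-D stand-in, not part of Gym; it only assumes the `reset()` / `step(action)` convention, with `step` returning an observation, a reward, a done flag and an info dictionary. The quantity an agent tries to maximize in expectation is the total reward returned by `run_episode`.

```python
import random


class ToyReachEnv:
    """Hypothetical 1-D environment that mimics the Gym reset/step convention."""

    def __init__(self, goal=5, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps

    def reset(self):
        # Start every episode at position 0 with a fresh step counter.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # action is -1 or +1; the episode ends at the goal or at the step limit.
        self.position += action
        self.steps += 1
        reached = self.position == self.goal
        reward = 0.0 if reached else -1.0
        done = reached or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return its total (undiscounted) reward."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        observation, reward, done, _info = env.step(policy(observation))
        total_reward += reward
    return total_reward


def always_right(observation):
    # A fixed policy that always moves towards the goal.
    return 1


def random_policy(observation):
    # A baseline that moves left or right with equal probability.
    return random.choice([-1, 1])


env = ToyReachEnv()
print("always_right:", run_episode(env, always_right))
print("random_policy:", run_episode(env, random_policy))
```

A learning algorithm would replace the fixed policies with one that is updated between episodes so that the expected value of this total increases.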

Ingredients for Robotics Research:

Eight simulated robotics environments have been released, along with a Baselines implementation of Hindsight Experience Replay (HER), all of which were developed for this research. These environments have been used to train models that work on physical robots. A set of requests for robotics research is also due to be released soon.

This release includes four environments that use the Fetch research platform and four that use the ShadowHand robot. The manipulation tasks in these environments are significantly more difficult than the MuJoCo continuous control environments currently available in Gym, all of which are now easily solvable with recently released algorithms such as PPO. Furthermore, the new environments use models of real robots and require the agent to solve realistic tasks.


The release adds eight robotics environments to Gym, all of which use the MuJoCo physics simulator. The environments are:


  • FetchReach-v0: Fetch has to move its end-effector to the desired goal position.

  • FetchSlide-v0: Fetch has to hit a puck across a long table so that it slides and comes to rest on the desired goal.

  • FetchPush-v0: Fetch has to move a box by pushing it until it reaches the desired goal position.

  • FetchPickAndPlace-v0: Fetch has to pick up a box from a table using its gripper and move it to a desired goal above the table.


  • HandReach-v0: ShadowHand has to reach with its thumb and a selected finger until they meet at a desired goal position above the palm.

  • HandManipulateBlock-v0: ShadowHand has to manipulate a block until it achieves a desired goal position and rotation.

  • HandManipulateEgg-v0: ShadowHand has to manipulate an egg until it achieves a desired goal position and rotation.

  • HandManipulatePen-v0: ShadowHand has to manipulate a pen until it achieves a desired goal position and rotation.


All of the new tasks have the concept of a “goal”. By default, all environments use a sparse reward of -1 if the desired goal has not yet been achieved and 0 if it has been achieved within some tolerance. This is in contrast to the shaped rewards that were used in the old set of Gym continuous control problems.
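The sparse reward rule can be written down in a few lines. This is a minimal sketch, not the environments' own code: the function name `sparse_reward`, the use of Euclidean distance, and the tolerance value of 0.05 are assumptions made for illustration.

```python
import math


def sparse_reward(achieved_goal, desired_goal, tolerance=0.05):
    """Return 0.0 if the goal is reached within `tolerance`, otherwise -1.0.

    The distance measure and the default tolerance are illustrative assumptions.
    """
    distance = math.sqrt(
        sum((a - g) ** 2 for a, g in zip(achieved_goal, desired_goal))
    )
    return 0.0 if distance <= tolerance else -1.0


# A hypothetical gripper position 1 cm from the goal counts as a success.
print(sparse_reward((0.0, 0.0, 0.01), (0.0, 0.0, 0.0)))
# A position 1 m away does not.
print(sparse_reward((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```

The reward carries no information about how close a failed attempt was, which is what makes these tasks hard for algorithms that rely on a graded signal.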

A variant with dense rewards is also included for each environment. However, sparse rewards are believed to be more realistic in robotics applications, so the sparse reward variant is the recommended one.
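For comparison, a dense reward gives a graded signal at every step. The article does not say how the dense variant is computed; the sketch below assumes one common choice, the negative Euclidean distance to the goal, and the name `dense_reward` is hypothetical.

```python
import math


def dense_reward(achieved_goal, desired_goal):
    """Negative Euclidean distance to the goal (an assumed shaping choice)."""
    distance = math.sqrt(
        sum((a - g) ** 2 for a, g in zip(achieved_goal, desired_goal))
    )
    return -distance


# The reward rises towards 0 as a hypothetical gripper approaches the goal.
goal = (0.0, 0.0, 0.0)
for position in [(0.3, 0.4, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.0)]:
    print(position, dense_reward(position, goal))
```

On a real robot this kind of signal requires an accurate distance measurement at every step, which is one reason the sparse success/failure signal is considered more realistic.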


Hindsight Experience Replay:

Alongside these new robotics environments, code has also been released for Hindsight Experience Replay (HER), a reinforcement learning algorithm that can learn from failure. The results show that HER can learn successful policies on most of the new robotics problems from only sparse rewards.
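The idea of learning from failure can be sketched as goal relabelling: a failed episode is stored a second time as if the state it actually reached had been the goal all along. The code below is an illustrative sketch, not the Baselines implementation. It assumes the simplest relabelling strategy (use the final achieved state as the substitute goal), a 1-D state, and hypothetical names such as `relabel_with_final_goal`.

```python
def sparse_reward(achieved_goal, goal):
    """0.0 on success, -1.0 otherwise (exact match, for a 1-D toy state)."""
    return 0.0 if achieved_goal == goal else -1.0


def relabel_with_final_goal(episode):
    """Return a copy of the episode whose goal is the state it actually reached.

    Each step is a dict with 'action', 'achieved_goal', 'goal' and 'reward';
    'achieved_goal' is the state reached after taking the action.
    """
    hindsight_goal = episode[-1]["achieved_goal"]
    return [
        {
            **step,
            "goal": hindsight_goal,
            "reward": sparse_reward(step["achieved_goal"], hindsight_goal),
        }
        for step in episode
    ]


# A failed episode: the goal was position 5, but the agent only got to 3.
failed_episode = [
    {"action": 1, "achieved_goal": 1, "goal": 5, "reward": -1.0},
    {"action": 1, "achieved_goal": 2, "goal": 5, "reward": -1.0},
    {"action": 1, "achieved_goal": 3, "goal": 5, "reward": -1.0},
]

relabeled = relabel_with_final_goal(failed_episode)
print([step["reward"] for step in relabeled])  # [-1.0, -1.0, 0.0]
```

Both the original and the relabelled transitions would then be placed in the replay buffer of an off-policy algorithm such as DDPG, so that even an unsuccessful episode contributes at least one transition with a non-negative reward.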

For more on HER and its results, see the link below.

OpenAI Gym


The results show that HER works extremely well in goal-based environments with sparse rewards. DDPG + HER is compared with vanilla DDPG on the new tasks, using both the sparse and the dense reward version of each environment.

For More Information: GitHub

Ingredients for Robotics Research

Video Source: OpenAI