Explore Panda Gym: A Multi-Goal Reinforcement Learning Environment


With the latest advances in artificial intelligence and new research emerging every day, intelligent and autonomous machines seem to be on the horizon. Machines today can understand verbal commands, recognize images, drive cars and play games, sometimes even better than the average human. One can only wonder how much longer it will be before they walk among us.

In the development of an artificially intelligent machine, reinforcement learning and the environment in which that learning takes place play a major role. The environment used for training is as important as the learning method used to solve the problem. The environment constitutes the fundamental element of a reinforcement learning problem, so it is important to understand the environment with which the RL agent must interact. This helps in finding the right design and training technique for the agent to perform well.


The environment is the world in which the agent lives. The agent interacts with the environment by performing actions, but it cannot change the rules or dynamics of the environment through those actions. Humans, for example, are agents in the earth's environment and are bound by its laws: we can act within the environment, but we cannot change the laws themselves. The environment also gives the agent a reward, a scalar value returned after each action that tells the agent whether that action was good or bad. Within reinforcement learning, several paradigms exist for achieving a winning strategy, that is, for getting the agent to perform the desired actions. In complex settings, computing the exact optimal strategy or value function becomes difficult, especially when agents must learn from interaction rather than from previous experience.

There are several types of learning environments. The main types of reinforcement learning environments are listed below; a short code sketch after the list shows how to check the action-space type in Gym.

  • Deterministic environment: an environment where the next state can always be determined from the current state and the agent's action.
  • Stochastic environment: an environment in which the next state cannot always be determined from the current state and the chosen action.
  • Single-agent environment: a single agent exists and interacts with the environment.
  • Multi-agent environment: several agents are present and interact with the environment.
  • Discrete environment: the environment's action space is discrete.
  • Continuous environment: the environment's action space is continuous.
  • Episodic environment: the agent's actions are confined to the current episode and do not depend on previous actions.
  • Sequential environment: the agent's actions are linked to the actions it performed previously.
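
To make the discrete versus continuous distinction concrete, here is a small sketch that inspects the action spaces of two classic Gym environments (the environment IDs are standard Gym ones and may vary between Gym versions):

import gym

#a discrete action space: CartPole offers two actions (push left / push right)
discrete_env = gym.make('CartPole-v1')
print(discrete_env.action_space)      # prints something like Discrete(2)

#a continuous action space: MountainCarContinuous takes a real-valued force
continuous_env = gym.make('MountainCarContinuous-v0')
print(continuous_env.action_space)    # prints a Box, i.e. a continuous range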

What is OpenAI Gym?

Gym is an open-source toolkit for developing and comparing reinforcement learning algorithms. It makes it easy to structure your environment with just a few lines of code, and it is compatible with any numerical computation library, such as TensorFlow or Theano. The Gym library is a collection of test problems and environments that one can use to practice and develop stronger reinforcement learning models. The environments share a common interface, which makes it possible to write general algorithms. In addition, it provides a wide variety of simulated environments, such as Atari games, board games, and 2D and 3D physics simulations, so that you can train multiple agents, compare them, or develop new machine learning algorithms for reinforcement learning problems. OpenAI is an artificial intelligence research company co-founded by Elon Musk. Its goal is to promote and develop friendly AI systems that will benefit humanity and work for its improvement, rather than exterminate it!
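
The shared interface mentioned above is just a reset/step loop, and the same few lines work for any registered environment. A minimal sketch, using CartPole as a stand-in and the classic four-value step API used throughout this article:

import gym

env = gym.make('CartPole-v1')
state = env.reset()
done = False

#the same loop works unchanged for any Gym environment
while not done:
    action = env.action_space.sample()    # a random action, for demonstration
    state, reward, done, info = env.step(action)

env.close()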

About Panda-Gym

Panda-Gym is an open-source library that provides reinforcement learning (RL) environments for the Franka Emika Panda robot, integrated with OpenAI Gym. The robot simulation environment consists of five tasks: reach, push, slide, pick-and-place, and stack. It follows a multi-goal RL framework, allowing the use of goal-conditioned RL algorithms. To promote open research, it also uses the open-source physics engine PyBullet. The implementation chosen for this package makes it easy to define new tasks or even create new robots.
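
The five tasks correspond to the five registered environment IDs used later in this article. As a quick sketch, each one can be instantiated and inspected like any other Gym environment:

import gym
import panda_gym    # importing panda_gym registers the Panda environments with Gym

#the five task environments (v1 names, as used in this article)
for env_id in ['PandaReach-v1', 'PandaPush-v1', 'PandaSlide-v1',
               'PandaPickAndPlace-v1', 'PandaStack-v1']:
    env = gym.make(env_id)
    print(env_id, env.action_space)
    env.close()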

About simulation and challenges

The environments are built around the Panda robotic arm from Franka Emika, which is already widely used in both simulation and real academic work. It has 7 degrees of freedom and a parallel-finger gripper for performing tasks. The robot is simulated with the PyBullet physics engine which, being open source, keeps the whole simulation stack open. In addition, the environments are integrated with OpenAI Gym, so any learning algorithm written against the Gym API can be used.
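
Since the tasks follow the multi-goal framework, observations are dictionaries that separate the state from the goals. The sketch below assumes the standard GoalEnv-style keys used by goal-conditioned algorithms; the exact shapes may differ between tasks and versions:

import gym
import panda_gym

env = gym.make('PandaReach-v1')
obs = env.reset()

#multi-goal observations split the state from the goals
print(obs['observation'])      # robot (and object) state
print(obs['achieved_goal'])    # current position of the entity to move
print(obs['desired_goal'])     # randomly generated target position

env.close()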

Each simulation task is the challenge of moving the gripper or an object to a target position. A task is considered completed when the distance between the entity to be moved and the target position is less than 5 cm. The five tasks are presented in increasing order of difficulty. In PandaReach-v1, a target position must be reached with the gripper. This target position is randomly generated in a volume of 30 cm × 30 cm × 30 cm. In PandaPush-v1, a cube placed on a table must be pushed to a target position on the table surface while the gripper is locked. Here, the target position and the initial position of the cube are randomly generated in a 30 cm × 30 cm square around the neutral position of the robot. The PandaSlide-v1 task consists of a flat cylinder that must be slid to a target position on the surface of a table while the gripper is locked. The target position is randomly generated in a 50 cm × 50 cm square located 40 cm in front of the robot's neutral position.
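
The 5 cm completion criterion amounts to a simple distance check. A minimal sketch of that idea (the function name and convention here are illustrative, not panda-gym's internal code):

import numpy as np

def is_success(achieved_goal, desired_goal, threshold=0.05):
    #a task counts as solved when the moved entity is within 5 cm of the target
    distance = np.linalg.norm(achieved_goal - desired_goal)
    return distance < threshold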

Since the target positions in PandaSlide-v1 are out of the robot's reach, it must give the object an impulse rather than simply push it. In the PandaPickAndPlace-v1 simulation, a cube must be brought to a target position generated in a volume of 30 cm × 30 cm × 20 cm above the table. To lift the cube, it must be picked up with the gripper's fingers. In PandaStack-v1, two cubes must be stacked at a target position on the table surface. The target position is generated in a 30 cm × 30 cm square. The stacking must be done in the correct order: the red cube must be under the green cube. All of these simulation challenges are still being researched, and none has yet been completely solved.


Getting started with the code

In this article, we will run two of the Panda Gym simulation tasks and walk through what it takes to develop and configure the environment. The following implementation is inspired by the creators of Panda Gym, whose official website can be found here.

Library installation

To begin, we will install the panda-gym library; you can run the following code to do so:

!pip install panda-gym
Importing dependencies

We now import the dependencies needed to configure the environment:

Pick and place

#importing dependencies
 
import gym
import panda_gym
Environment configuration and simulation
#assigning the simulation task to the environment
 
env = gym.make('PandaPickAndPlace-v1')
state = env.reset()
 
#episode termination flag
done = False
 
#stepping through one episode with random actions and collecting rendered frames
images = [env.render('rgb_array')]
while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    images.append(env.render('rgb_array'))
 
env.close()

The lines above assign the simulation task to the environment, reset it, and then step through one full episode with randomly sampled actions, capturing a rendered frame at each step.

Hyperparameters can be fine-tuned for the required performance; here we are just going to do a basic demo simulation.
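
For an actual learning experiment, the random sampling above would be replaced by a trained policy. As a hypothetical sketch, assuming the stable-baselines3 library (not used in this demo; the algorithm choice and hyperparameters are illustrative only):

import gym
import panda_gym
from stable_baselines3 import DDPG, HerReplayBuffer

env = gym.make('PandaPickAndPlace-v1')

#DDPG with hindsight experience replay is a common fit for sparse, goal-based tasks
model = DDPG('MultiInputPolicy', env, replay_buffer_class=HerReplayBuffer, verbose=1)
model.learn(total_timesteps=10000)

#the trained policy would then replace env.action_space.sample() in the loop above:
#action, _ = model.predict(state)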


Next, we’ll install the numpngw library, a Python package that provides the write_png function, which writes a NumPy array to a PNG file, and write_apng, which writes a sequence of arrays to an animated PNG (APNG) file.

#installing numpngw
!pip3 install numpngw
from numpngw import write_apng
 
write_apng('anim.png', images, delay = 100) # 100 ms between frames (40 ms would be real time)

Displaying the results:

#displaying the saved animation
 
from IPython.display import Image
 
Image(filename="anim.png")

As you can see, the gripper moves the block, and you can also see the block's initial and target positions. Although the simulation may not look very polished, it can be tuned further or run on a better computing system for better rendering performance.

We can do the same for the slide simulation task.

Slide

import gym
import panda_gym
 
#assigning the slide task to the environment
env = gym.make('PandaSlide-v1')
state = env.reset()
 
#stepping through one episode with random actions and collecting rendered frames
done = False
images = [env.render('rgb_array')]
while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    images.append(env.render('rgb_array'))
 
env.close()

!pip3 install numpngw
from numpngw import write_apng
 
write_apng('anim.png', images, delay = 70) # 70 ms between frames (40 ms would be real time)

from IPython.display import Image
 
Image(filename="anim.png")

Render time and other settings of the learning environment can be configured as needed. This tool is very satisfying for testing deep reinforcement learning algorithms. However, there are some limitations, such as gripper control: the gripper can only be controlled through high-level actions such as grasping and moving. Additional work is needed to enable the deployment of a policy learned in simulation on a real robot. In addition, the simulation is not completely realistic; the main concern is the shape of the gripper for grasping objects in the environment.

End Notes

Through this article, we have understood the essence of a learning environment in the field of reinforcement learning. We also explored the Panda Gym problem and performed a basic demo simulation of two tasks using the Panda robotic arm from Franka Emika. The above implementation is available as a Colab notebook accessible through the link here. Happy learning!
