The goal of this project is to train an open-source 3D printed quadruped robot by exploring Reinforcement Learning and OpenAI Gym. The aim is to let the robot learn domestic and generic tasks in simulation and then successfully transfer the knowledge (Control Policies) to the real robot without any further manual tuning.
This project is mostly inspired by the incredible work done by Boston Dynamics.
This repository contains the various OpenAI Gym Environments used to train Rex, the Rex URDF model, the learning agent, and some scripts to start the training sessions and visualise the learned Control Policies.
Create a Python 3.7 virtual environment, e.g. using Anaconda
conda create -n rex python=3.7 anaconda
conda activate rex
Install the public rex-gym package:
pip install rex_gym
Install from source
Alternatively, clone this repository and run the following from the root of the project:
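pip install .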
In this very first experiment, I let the system learn from scratch, giving the feedback component large output bounds: [-0.6, 0.6] radians. The leg model (see galloping_env.py) forces the leg and foot movements (in a positive or negative direction, depending on the leg), influencing both the learning score and the learning time. In this first version, the leg model holds the Shoulder motors in the start position (0 degrees).
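Conceptually, each motor command is the open-loop leg model signal plus the learned feedback, clipped to those bounds. A minimal sketch, assuming that composition (the names here are illustrative, not the actual rex_gym code):

```python
import numpy as np

FEEDBACK_BOUND = 0.6  # radians: the large output bounds used in this experiment

def motor_angles(leg_model_signal, policy_output):
    """Compose the hand-crafted gait signal with the learned feedback.

    leg_model_signal: open-loop joint angles from the leg model (radians)
    policy_output:    raw action produced by the agent
    """
    feedback = np.clip(policy_output, -FEEDBACK_BOUND, FEEDBACK_BOUND)
    return leg_model_signal + feedback
```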
As in the Minitaur example, I'm using Proximal Policy Optimization (PPO).
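For reference, the core of PPO is its clipped surrogate objective. A minimal NumPy sketch of that objective (for illustration only, not the training code used in this repository):

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate objective over a batch of samples.

    ratio:     pi_new(a|s) / pi_old(a|s) for each sample
    advantage: estimated advantage for each sample
    epsilon:   clipping range (0.2 is the value from the PPO paper)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Elementwise minimum, then average: this is the objective
    # PPO maximises at each policy update.
    return np.mean(np.minimum(unclipped, clipped))
```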
The resulting galloping gait shows the chassis tilted up and some unusual positions/movements (especially starting from the initial pose) during locomotion. The leg model needs improvement.
Galloping gait – bounded feedback
To improve the gait, in this second simulation I worked on the leg model:
I set bounds for both the Leg and Foot angles, keeping the Shoulder in its initial position.
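A minimal sketch of this bounding, with hypothetical ranges (the real values live in the environment code):

```python
import numpy as np

# Hypothetical per-joint ranges in radians; placeholders, not the shipped values.
LEG_RANGE = (-0.35, 0.35)
FOOT_RANGE = (-0.45, 0.45)

def bound_leg_model(leg_angles, foot_angles):
    """Clamp the leg model output so the gait stays inside a sane envelope."""
    return (np.clip(leg_angles, *LEG_RANGE),
            np.clip(foot_angles, *FOOT_RANGE))
```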
The resulting gait now looks much cleaner.
Galloping gait – balanced feedback
Another test was run using balanced feedback:
The Action Space dimension equals 4: the same angle is assigned to both front legs, and a different one to both rear legs. The very same is done for the foot angles.
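A sketch of what this mapping might look like, assuming a [front, front, rear, rear] motor ordering (illustrative names, not the actual environment code):

```python
import numpy as np

def expand_balanced_action(action):
    """Map a 4-dim balanced action to the 8 leg/foot motors.

    action[0]: leg angle, front legs    action[1]: leg angle, rear legs
    action[2]: foot angle, front feet   action[3]: foot angle, rear feet
    """
    front_leg, rear_leg, front_foot, rear_foot = action
    leg_angles = np.array([front_leg, front_leg, rear_leg, rear_leg])
    foot_angles = np.array([front_foot, front_foot, rear_foot, rear_foot])
    return leg_angles, foot_angles
```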
The simulation score improves massively (about 10x), as does the learning time, while the resulting gait is very similar to the bounded feedback model's. The TensorFlow score reached with this model after ~500k attempts matches what the other models reach only after ~4M attempts.
Basic Controls: Walk
Goal: learn how to walk straight ahead.
Starting from the Minitaur Alternating Legs environment, I used a sinusoidal signal as the leg_model, alternating Rex's legs during locomotion. The feedback component has small bounds, [-0.1, 0.1], as in the original script.
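A minimal sketch of such a sinusoidal alternating-leg signal; the amplitude, period, and diagonal pairing below are assumptions, not the shipped values:

```python
import math

AMPLITUDE = 0.3  # radians (assumed)
PERIOD = 0.5     # seconds per gait cycle (assumed)

def leg_model_signal(t):
    """Open-loop leg angles at time t: diagonal pairs move in antiphase."""
    phase = 2.0 * math.pi * t / PERIOD
    a = AMPLITUDE * math.sin(phase)
    # [front-left, front-right, rear-left, rear-right]
    return [a, -a, -a, a]
```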
Basic Controls: Turn left/right
Goal: reach a given orientation by turning on the spot.
In this environment the leg_model applies a 'steer-on-the-spot' gait, allowing Rex to move towards a specific orientation. The reward function takes the chassis position/orientation and compares it with a fixed target position/orientation. When the difference is less than 0.1 radians, the leg_model is set to the stand-up position. To make the learning more robust, Rex's starting orientation is randomly chosen at every Reset step.
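A minimal sketch of the orientation check, assuming yaw angles in radians (illustrative names, not the actual reward code):

```python
import math

TARGET_THRESHOLD = 0.1  # radians, as described above

def orientation_error(current_yaw, target_yaw):
    """Smallest absolute angle between the chassis yaw and the target."""
    diff = (target_yaw - current_yaw + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff)

def should_stand_up(current_yaw, target_yaw):
    # Once the error drops below the threshold, the leg_model switches
    # to the stand-up position.
    return orientation_error(current_yaw, target_yaw) < TARGET_THRESHOLD
```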
Basic Controls: Stand up
Goal: reach the base standing position starting from the rest position.
This environment introduces the rest_position, ideally the position assumed when Rex is in stand-by. The leg_model is the stand_low position, while the signal function applies a 'brake', forcing Rex to assume a halfway position before completing the movement.
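A minimal sketch of such a two-stage movement, with hypothetical poses and timings (not the actual signal function):

```python
def stand_up_signal(t, rest_pose, halfway_pose, stand_pose,
                    t_brake=1.0, t_total=2.0):
    """Illustrative two-stage stand-up: rest -> halfway ('brake') -> stand.

    Poses are per-motor angle sequences; the timings are assumptions.
    """
    def lerp(a, b, alpha):
        return [x + (y - x) * alpha for x, y in zip(a, b)]

    if t < t_brake:
        # First stage: move from rest towards the halfway 'brake' pose.
        return lerp(rest_pose, halfway_pose, t / t_brake)
    # Second stage: complete the movement up to the standing pose.
    alpha = min((t - t_brake) / (t_total - t_brake), 1.0)
    return lerp(halfway_pose, stand_pose, alpha)
```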