A Review of Nine Physics Engines for Reinforcement Learning Research

Abstract

We present a review of popular simulation engines and frameworks used in reinforcement learning (RL) research, aiming to guide researchers in selecting tools for creating simulated physical environments and training setups for RL. We evaluate nine frameworks (Brax, Chrono, Gazebo, MuJoCo, ODE, PhysX, PyBullet, Webots, and Unity) based on their popularity, feature range, quality, usability, and RL capabilities. We highlight the challenges in selecting and utilizing physics engines for RL research, including the need for detailed comparisons and an understanding of each framework’s capabilities. Key findings identify MuJoCo as the leading framework due to its performance and flexibility, despite usability challenges. Unity is noted for its ease of use but lacks scalability and simulation fidelity. The study calls for further development to improve simulation engines’ usability and performance and stresses the importance of transparency and reproducibility in RL research. This review contributes to the RL community by offering insights into the selection process for simulation engines, facilitating informed decision-making.

Index Terms: Reinforcement Learning, Physics, Engine, Review

Introduction

Some of the more well-known research examples in reinforcement learning (RL), like Hide and Seek or the Sumo environment by OpenAI [3], [4], involved embodied agents in simulated 3D environments [14], [19]. According to Legg and Hutter [34], an agent’s intelligence can be defined by its “ability to achieve goals in a wide range of environments”. Taking this definition into account, enabling embodied agent-environment interaction is crucial. While deep learning (DL) libraries like TensorFlow [1] or PyTorch [45] and environment frameworks such as OpenAI Gym [11] or PettingZoo [52] have lowered the entry barriers for RL research [60], RL research is still held back by the difficulty of choosing and handling physics engines for implementing these interactions. Many frameworks and simulation engines exist for RL experiments in physically simulated environments, yet RL researchers often do not describe the simulation pipeline they used or the reasoning behind their choice in detail (e.g. [3], [19], [44]). Thoroughly testing and comparing various simulation frameworks before choosing the right one is prohibitively time-intensive. Hence, researchers turn towards ready-made solutions but may struggle to find suitable resources and tools for creating RL environments, especially if they lack in-depth domain knowledge. Quantitative performance comparisons in the context of RL are rare in the existing literature and are often provided by the engine developers themselves [36], [53]. Accordingly, these evaluations can be biased regarding the advantages and disadvantages of the presented engine. Of the available comparisons, many are outdated [20], [24], that is, they do not accurately describe the current state and variety of frameworks. Thus, current trends and developments in the field (e.g. the increased need for environment complexity, multi-agent reinforcement learning, and GPU-based simulation) are not sufficiently reflected in the literature.
The remaining recent comparisons are largely focused on simulation capabilities for industrial robotics applications with RL [10], [12], [15], [33] or on the RL algorithms and specific environments [22], [29]. A general and systematic review of the underlying engines, particularly one that also considers capabilities for multi-agent reinforcement learning (MARL) research, is missing.

Our review focuses on the usability of different engines for basic RL simulations, to support researchers in choosing appropriate tools for designing better and increasingly challenging RL environments, algorithms, and training setups [26]. Therefore, we review the most popular frameworks in RL research with regard to their popularity, feature range, feature quality, and usability. While most framework documentations offer simple environments like the ones presented in Figure 1, we assess the engines’ capabilities to generate multi-agent-ready 3D environments that can give rise to high task complexity and task combinations. For this, we chose frameworks specialized for physics simulation (Brax, Chrono, Gazebo, MuJoCo, ODE, PhysX, PyBullet, and Webots) as well as the more broadly applicable game engine Unity. We provide insights into our selection process and briefly mention excluded candidates like Unreal Engine and Project Malmo in the Honourable Mentions section.

The main contributions of this paper are:

• A review of the engines’ popularity in terms of citations.

• The evaluation of the engines’ feature range, feature quality, and usability.

• A detailed assessment of the engines’ RL and MARL capabilities.

(a) The ant environment in MuJoCo [46]

(b) The crawler environment in Unity [55]

Fig. 1: Two similar environments realized in different engines

Methodology

Our methodology comprises two major parts. First, we evaluate the popularity of different engines by looking at the number of citations of the respective publications, as well as the increase in that number over time. Second, we compare the physics engines along various relevant dimensions.

A. Popularity Analysis

The popularity analysis aims to evaluate the popularity of physics engines for RL by performing an analysis of the citations of certain frameworks in scientific databases. We compared the popularity of the individual physics engines in the field of reinforcement learning research via the overall number of citations and the number of ML-related citations. We describe the specific steps carried out for the popularity analysis in the appendix.

B. Feature Analysis

The feature analysis aims to evaluate the feature range, quality, and usability of the physics engines for RL. We assess these criteria based on publications using these engines, the documentation provided by the engines’ developers, and previous papers reviewing performance and usability. We describe the specific steps carried out for the feature analysis in the appendix.

C. Comparison Criteria

We selected the following comparison criteria to reflect each framework’s feature range, feature quality, and usability for RL research.

1) Open source: Whether an engine is open-source, open-access, or closed-access. Open-source frameworks allow for greater accessibility, customization, and better integration with external tools and existing frameworks. We did not consider any paid features.

2) Documentation: The accessibility, extensiveness and visualization of the documentation, as well as the number and quality of provided examples.

3) Community resources: The extent of relevant forum entries, Q&As, and user-created models and environments.

4) 3D Model library: The availability and capabilities of general-purpose RL agent models such as ants and humanoids.

5) 3D Model creation: The ease of creating and customizing models for RL agents. Optimally, this is possible by manipulating objects within a well-interfaced editor. Model creation only by editing code in XML files and similar formats is rated unfavorably.

6) Environment library: The availability and usefulness of general-purpose 3D environments that can be used as training arenas for general-purpose RL.

7) Environment creation: The ease of creating and customizing environments for RL agents. Optimally, this is possible by manipulating objects within a well-interfaced editor. Environment creation only by editing code in XML files and similar formats is rated unfavorably.

8) Sensors: The range of available sensors, e.g. camera, touch sensors, or radar.

9) Gym Wrapper: The availability, usability, and feature range of gym wrappers for the particular engines.

10) Rigid body dynamics: The possibility and fidelity of simulating basic kinematic interactions.

11) Multi-joint dynamics: The possibility and fidelity of simulating complex multi-unit kinematic interactions in embodied agents.

12) File formats: Support for importing and exporting Unified Robot Description Format (URDF) and MuJoCo Modeling XML File (MJCF). Both are XML file formats used for representing robot models and virtual agents.

13) Visualization: The graphical fidelity of the presentation of simulation episodes and results, availability of in-built rendering solutions, as well as the functional aesthetics and usability of the visualization interface.

14) Performance: The optimization (or possibility of optimization) for training of RL agents by allowing for efficient parallel computing. This is a particularly important dimension, and we discuss the results in a dedicated Performance section.

TABLE I: Overall publications and ML publications citing each framework’s original paper since it was first released

Results

A. Popularity Analysis

Table I shows the number of overall publications and ML-related publications citing each physics engine as of the 6th of September 2023.

MuJoCo is the most popular physics engine in terms of citations, with over 3800 citations since its release in 2012. Gazebo follows with 2698 citations since its release in 2004. Webots and PyBullet have also been well cited, with over 988 and 1308 citations, respectively. Brax, the newest of the engines listed, has received 166 citations since its release in 2021, which is relatively low but expected given its recent release. Unity’s proportion of ML-related citations to overall citations might be skewed, since the associated publication [26] addresses Unity as a platform for learning agents and is not an all-purpose introduction to the Unity engine as a game development toolkit. In this broader sense, Unity is more widely known.

In terms of the number of ML papers that have used a particular physics engine, MuJoCo and PyBullet are the most popular, with over 3541 and over 1000 papers citing them, respectively. Unity and Gazebo have also been widely used in ML, with over 528 and 1948 citations, respectively. Brax, ODE, PhysX, and Webots have been used in a smaller number of ML papers, with between 143 and 548 citations each.

Figure 2 shows that the usage of Nvidia Isaac has grown substantially from 2021 to 2023, with mentions increasing from 20 to 286. MuJoCo maintained its popularity from 2020 to 2023, with mentions remaining high at 695 in 2020 and 880 in 2023. Chrono’s usage has remained small in comparison but stable over the years, with only minor fluctuations in mentions from 2016 to 2023. Gazebo’s usage fluctuated, but overall it retained a stable number of citations, ranging from 324 in 2019 to 374 in 2023. Webots had a consistent number of mentions from 2011 to 2023, with 72 mentions in 2023. ML Agents saw a significant increase in mentions from 2019 to 2022, peaking at 163 mentions, followed by a slight decrease to 145 mentions in 2023. Brax’s usage has steadily increased since its publication in 2021, reaching 107 mentions in 2023. These year-to-year differences in usage reflect the evolving preferences and trends within the robotics and RL research communities.
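The growth figures above can be made comparable across engines and time spans by converting them into an overall growth factor and a compound annual growth rate (CAGR). The following Python sketch only uses the mention counts quoted in this paragraph; CAGR is our choice of summary statistic for illustration and was not part of the original popularity analysis.

```python
def growth(start: float, end: float, years: int) -> tuple:
    """Return the overall growth factor and the compound annual growth rate (CAGR)."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    factor = end / start
    cagr = factor ** (1 / years) - 1  # constant yearly rate that yields the same factor
    return factor, cagr


# Mention counts quoted in the text (Figure 2).
isaac_factor, isaac_cagr = growth(20, 286, years=2)      # Nvidia Isaac, 2021 -> 2023
mujoco_factor, mujoco_cagr = growth(695, 880, years=3)   # MuJoCo, 2020 -> 2023

print(f"Isaac:  x{isaac_factor:.1f} overall, {isaac_cagr:.0%} per year")
print(f"MuJoCo: x{mujoco_factor:.2f} overall, {mujoco_cagr:.0%} per year")
```

For Nvidia Isaac, the 14.3-fold increase over two years corresponds to a CAGR of roughly 278% per year, whereas MuJoCo's rise from 695 to 880 mentions corresponds to roughly 8% per year.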

It is worth noting that the popularity of a physics engine can be influenced by factors such as ease of use, documentation, community support, and compatibility with other tools. Therefore, while the number of citations and ML papers is a useful measure of popularity, it may not necessarily reflect the best physics engine for a particular application.

Fig. 2: Yearly citations of the frameworks’ original publications

B. Feature Analysis

MuJoCo

MuJoCo (Multi-Joint dynamics with Contact; [53]) is an open-source physics simulation engine specialized for robotics, biomechanics, animation, and ML. It is owned and curated by Google DeepMind and has been adopted by leading RL researchers, for example in OpenAI’s multi-agent research [3], [41]. It simulates rigid body dynamics, including the bodies’ interaction with their environment through collision detection and contact resolution, and supports various joint types as well as actuation options. This makes MuJoCo especially suitable for RL focused on embodied movement. The simulation can be visualized interactively in a native graphical user interface (GUI) that renders the simulation including meshes and textures, but not during training. Advanced rendering options like complex lighting and shaders are limited, but they are less relevant for RL. Furthermore, MuJoCo allows users to selectively run parts of the computation pipeline for flexibility [53]. No direct gym environment integration is provided. MuJoCo can be accessed via DeepMind’s Control Suite (dm_control; [51]). However, this Python API is currently poorly documented and lacks transparency as well as usability. Accessing objects via dm_control can be difficult, and we encountered bugs and unexpected behaviors. Furthermore, dm_control restricts the creation and customization of XML models. MuJoCo’s documentation is decent but hard to navigate because of its large extent and lack of structure. The documentation itself is well presented, with highlighted code, images, videos, and GIFs. The Python bindings are taught in Google Colab notebooks. An overview and a demo notebook are provided, but many functionalities found in the GitHub repository are not explained. Along with the engine, Google DeepMind offers a collection of pre-defined, importable models equipped with joints and limbs that are useful for embodied RL². Models and environment content can be defined and customized in XML files.
The resulting physical model can be hard to pre-visualize from code alone. Hence, additional 3D modeling tools that can export MJCFs or URDFs, like Blender, may be required for custom model creation in more complex projects. MuJoCo natively runs on a single thread, but multi-threading can also be implemented³. Furthermore, multiple libraries like Envpool [58] enable users to increase sampling performance significantly through highly optimized vectorization techniques. However, such libraries are often only optimized for classic single-agent gym tasks and are not directly usable for MARL setups. Where such libraries can be used for custom multi-agent environments, they often require proficiency in programming languages like C or a deep understanding of the underlying framework. For a wide range of tasks, though, single-threaded performance is sufficient, and multi-threading is often not worth the additional core usage [53]. Despite its daunting entry barrier and gaps in its documentation, its range of features makes MuJoCo a powerful and flexible framework for RL.
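To illustrate what defining models in XML involves, the following sketch writes a small, hypothetical MJCF model (one free-floating torso with a single hinged leg and a motor) and inspects it with Python's standard xml.etree.ElementTree module. It deliberately does not depend on MuJoCo; with the official mujoco Python bindings installed, the same string could be compiled with mujoco.MjModel.from_xml_string. Listing bodies, joints, and actuators in this way is one cheap check when a model is hard to pre-visualize from code alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal MJCF model: a free-floating torso with one hinged leg.
# Real MuJoCo models (e.g. the ant) follow the same structure with more parts.
MJCF = """
<mujoco model="toy_leg">
  <worldbody>
    <geom name="floor" type="plane" size="5 5 0.1"/>
    <body name="torso" pos="0 0 0.5">
      <freejoint name="root"/>
      <geom type="sphere" size="0.1"/>
      <body name="leg" pos="0.1 0 0">
        <joint name="hip" type="hinge" axis="0 0 1" range="-30 30"/>
        <geom type="capsule" fromto="0 0 0 0.3 0 0" size="0.04"/>
      </body>
    </body>
  </worldbody>
  <actuator>
    <motor name="hip_motor" joint="hip" gear="50"/>
  </actuator>
</mujoco>
"""


def summarize(mjcf: str) -> dict:
    """List bodies, joints, and actuators so the model can be checked without rendering it."""
    root = ET.fromstring(mjcf)
    joints = [f.get("name") for f in root.iter("freejoint")]
    joints += [j.get("name") for j in root.iter("joint")]
    return {
        "bodies": [b.get("name") for b in root.iter("body")],  # document order
        "joints": joints,
        "actuators": [m.get("name") for m in root.find("actuator")],
    }


print(summarize(MJCF))
```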

TABLE II: Feature and Usability Comparison (full descriptions of the criteria can be found in Section II). Legend: feature or range of features is fully available and functional (+ +); feature is available but lacking in some regards (+); feature is available but substantially lacking, or only available via workarounds (−); feature is not available or difficult to integrate (− −)

PyBullet

PyBullet [17] is an open-source Python module for robotics simulation and ML that allows users to dynamically create and simulate physics-based environments for RL. PyBullet wraps the C-API of Bullet and offers simple integration with TensorFlow and PyTorch. PyBullet supports loading URDFs and MJCFs with dedicated functions [17], which can be used to import and implement ant, humanoid, half-cheetah, and similar models, as shown in the PyBullet Quickstart Guide [18], [16]. Notably, shapes and multi-body models can not only be defined in external XML formats, but also directly via PyBullet functions. Users can equip agents with sensors to capture information such as position, orientation, velocity, or contact forces. Complex 3D environments are not provided. PyBullet has functional visualization but is not specialized for graphics rendering; complex lighting, textures, and shaders are not supported [26]. PyBullet does not provide a prebuilt MARL environment. However, [43] developed an open-source, OpenAI Gym-like environment called gym-pybullet-drones for multiple quadcopters. Several researchers have used this framework to conduct MARL research [21], [47], [54], providing evidence that the underlying PyBullet engine is in principle capable of MARL. PyBullet’s documentation is hard to access, as the main site links to three different sources, which consist of Google Docs and poorly formatted PDF files without code highlighting in the GitHub repository. This leaves the needed information scattered and hard to connect. The main document, the PyBullet Quickstart Guide, provides fairly extensive information over 75 pages but lacks in-depth use cases. Example applications and showcases are found on GitHub, though not in Python. Nevertheless, PyBullet has a large community⁴.
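As a sketch of the URDF route that PyBullet supports, the following hypothetical two-link robot (a base and an arm joined by a revolute joint) is parsed with Python's standard library to list its kinematic tree. In PyBullet itself, such a file would be loaded with pybullet.loadURDF after connecting with pybullet.connect(pybullet.DIRECT), and MJCF files with pybullet.loadMJCF; these calls are not executed here so that the example runs without PyBullet installed.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-link URDF: a base and an arm connected by a revolute joint.
URDF = """
<robot name="toy_arm">
  <link name="base">
    <inertial><mass value="1.0"/><inertia ixx="0.01" ixy="0" ixz="0" iyy="0.01" iyz="0" izz="0.01"/></inertial>
    <collision><geometry><box size="0.2 0.2 0.1"/></geometry></collision>
  </link>
  <link name="arm">
    <inertial><mass value="0.5"/><inertia ixx="0.005" ixy="0" ixz="0" iyy="0.005" iyz="0" izz="0.005"/></inertial>
    <collision><geometry><cylinder radius="0.03" length="0.4"/></geometry></collision>
  </link>
  <joint name="shoulder" type="revolute">
    <parent link="base"/>
    <child link="arm"/>
    <axis xyz="0 1 0"/>
    <limit lower="-1.57" upper="1.57" effort="10" velocity="2"/>
  </joint>
</robot>
"""


def joint_table(urdf: str) -> list:
    """Return (name, type, parent link, child link) for every joint in a URDF string."""
    root = ET.fromstring(urdf)
    return [
        (j.get("name"), j.get("type"), j.find("parent").get("link"), j.find("child").get("link"))
        for j in root.findall("joint")
    ]


for row in joint_table(URDF):
    print(row)
```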

Unity

Unity [26] is a popular game development engine. In contrast to the other presented engines, Unity is not open-source but open-access, meaning that not all features are available in the free version; paid plans for professional and entrepreneurial use exist. Unity stands out from the other engines due to its intuitive interface, which combines all features in a single workspace. With Unity ML Agents, it offers a large open-source toolbox with 3D training arenas and model assets that come with RL algorithms, like Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), that work out of the box. Through its asset store, Unity offers a large array of official and community-built packages that can be especially useful for designing various environments. Many of these are free and continuously updated. Unity ML Agents also offers a Python API to integrate externally defined agents. Unity Physics, the engine’s package for deterministic rigid body dynamics simulation, can be complemented with plug-ins for the Havok⁵ and MuJoCo⁶ engines. A wide range of sensors is available. Unity has highly accessible and extensive documentation, with well-structured tables of contents and hyperlinks to related sections. Code examples are well highlighted and embedded in visually appealing tutorials that cover all aspects of the engine.

Many RL paradigms place agents in video-game-like scenarios, where they have to solve tasks similar to those set for human players [26]. Historically, some of the most notable milestones of AI research have been performances in games. These include digital versions of classical board games, like chess and Go [49], as well as established video games, such as StarCraft II [56] and Dota 2 [42]. Furthermore, the emergence of generalizable skills in agents that are applicable to a range of different video games and RL environments is one of the core objectives of much of RL research [34], [48]. This has been tried and tested successfully [14] with AI benchmarks based on Unity, such as the Obstacle Tower Challenge [27]. For these reasons, Unity should, in theory, be the natural choice of engine for implementing any video-game-like RL scenario. However, Unity’s specialization for game development poses several disadvantages. Its optimization for video games clashes with the RL training demand of maximizing frames, i.e. simulation steps per unit of time and computational resource [57]. Another hurdle is that Unity ML Agents is only convenient as long as the whole pipeline is assembled within Unity; integrating a Unity environment into existing Python code is significantly harder. A core problem is thus the limited possibility of developing on top of Unity, as opposed to within it. Unity makes multi-agent scenarios practical and easy to set up. However, efficiency becomes even more of a problem for MARL than for single-agent training. If a simulation becomes too complex and computationally expensive, Unity increases the time between simulation frames. This hurts simulation fidelity and constrains MARL approaches, as MARL typically involves a large number of interacting units, especially in embodied setups.
Workarounds to manually fix simulation fidelity and training efficiency problems exist (see [57]). Despite these disadvantages, Unity is used by leading researchers for complex and computationally demanding RL scenarios [19], [40]. However, neither of these works used the ML Agents toolkit; both relied on custom solutions based on [57]. Google DeepMind’s extensive resources have to be considered here, as this adaptation of Unity to specific RL needs might not be easily replicated.
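Why longer intervals between simulation frames cost fidelity can be illustrated with a generic numerical example. The sketch below integrates a unit spring (x'' = -x, starting at x = 1 at rest) with explicit Euler and compares the result after 10 seconds to the analytic solution cos(t) for two step sizes. This is not a model of Unity's physics pipeline, whose integrator is not reproduced here; it only shows that the same dynamics drift further from the true trajectory as the step grows.

```python
import math


def euler_spring(dt: float, t_end: float, k: float = 1.0) -> float:
    """Integrate x'' = -k x from x=1, v=0 with explicit Euler and return x(t_end)."""
    x, v = 1.0, 0.0
    steps = round(t_end / dt)
    for _ in range(steps):
        # Both updates use the state from the start of the step (explicit Euler).
        x, v = x + dt * v, v - dt * k * x
    return x


t_end = 10.0
exact = math.cos(t_end)  # analytic solution for k = 1
for dt in (0.01, 0.1):
    err = abs(euler_spring(dt, t_end) - exact)
    print(f"dt={dt:<5} error={err:.3f}")
```

With the tenfold larger step, the error grows by roughly an order of magnitude, because explicit Euler inflates the oscillation amplitude by a factor of sqrt(1 + dt²) at every step.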

Gazebo

Gazebo [32] is an open-source robot simulation software for simulating and testing robotic systems, developed by Open Robotics. It served as the official simulation platform for the DARPA Robotics Challenge [24]. Gazebo offers rigid body dynamics, various types of joints, and a wide range of sensors through multiple supported physics engines, including ODE, Bullet, Simbody, and DART, and allows users to easily switch between them. Gazebo provides a wide range of pre-built models and environments designed for simulation purposes. With its own editor system, users can create and modify simple models directly in the GUI. The Gazebo GUI renders the 3D simulation in real time. Gazebo allows users to define and customize robot models using URDF or SDF. However, customizing a large environment can take a lot of time [24]. Gazebo provides the Python package sdformat-mjcf⁷, which allows bidirectional conversion between SDF and MJCF. Gazebo’s documentation consists of a tutorial section with explanatory images, examples, and highlighted code, as well as some hidden, automatically generated documents. The fundamentals are not explained in these documents at all and only partially in the tutorials. No Python bindings are explained or available apart from PyGazebo⁸,⁹ and Ignition¹⁰, for which it is unclear to the user whether the provided information is official.

Gazebo does not provide an official gym wrapper. However, the Gazebo simulator offers a rich set of APIs and tools for simulation, physics-based modeling, and visualization, which can be used alongside the OpenAI Gym framework by creating a custom gym wrapper. The open-source project gym-gazebo2 [35] provides a gym wrapper specifically designed for integrating Gazebo simulations with RL algorithms. Gym-Ignition¹¹ is a framework that provides reproducible robotic environments for RL and robotics research [22]. Users can create environments in either Python or C++. This feature, combined with the multitude of supported engines, enables effective randomization and helps prevent overfitting. Gym-Ignition currently has limited support for photorealistic rendering [22]. Although Gazebo itself does not provide an environment for setting up MARL, MultiRoboLearn [12] provides a framework for applying MARL in Gazebo, specialized for robotics simulation. Base Gazebo exhibits considerable performance loss with multi-agent setups [10]. Gazebo’s problematic usability makes the implementation of 3D environments difficult. However, users can trade off simulation speed and computational cost for higher fidelity. Thus, it seems more appropriate for robotics RL, especially industrial applications [33].
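The custom gym wrapper approach mentioned above can be sketched as follows. FakeSimClient is a hypothetical placeholder for whatever connection to the running simulator is used (for Gazebo, for example, its transport or ROS interfaces, which are not reproduced here), and GoalReachingEnv only mirrors the return conventions of the Gymnasium API. With the gymnasium package installed, the wrapper would additionally subclass gymnasium.Env and declare observation_space and action_space.

```python
import random
from typing import Optional


class FakeSimClient:
    """Hypothetical stand-in for a simulator connection (not a Gazebo API).

    It models a 1-D point robot; a real client would send commands to the
    running simulator and read back the resulting state.
    """

    def __init__(self) -> None:
        self.position = 0.0

    def reset_world(self, start: float) -> None:
        self.position = start

    def apply_velocity(self, velocity: float, dt: float) -> float:
        self.position += velocity * dt
        return self.position


class GoalReachingEnv:
    """Gym-style wrapper: reset() -> (obs, info), step() -> (obs, reward, terminated, truncated, info)."""

    def __init__(self, client: FakeSimClient, goal: float = 1.0, max_steps: int = 200,
                 dt: float = 0.05, seed: Optional[int] = None) -> None:
        self.client = client
        self.goal = goal
        self.max_steps = max_steps
        self.dt = dt
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        self.client.reset_world(self.rng.uniform(-1.0, 0.0))  # randomized start position
        return self.client.position, {}

    def step(self, action: float):
        self.steps += 1
        action = max(-1.0, min(1.0, action))       # clip to the action bounds [-1, 1]
        position = self.client.apply_velocity(action, self.dt)
        distance = abs(self.goal - position)
        terminated = distance < 0.05               # goal reached
        truncated = self.steps >= self.max_steps   # time limit hit
        return position, -distance, terminated, truncated, {"distance": distance}


env = GoalReachingEnv(FakeSimClient(), seed=0)
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(1.0)  # always drive toward the goal
print(env.steps, round(info["distance"], 3))
```

Keeping the simulator client behind a narrow interface is the design choice that lets the same wrapper be pointed at a different backend, which is essentially what projects like gym-gazebo2 provide for Gazebo.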

PhysX/IsaacGym

Nvidia’s PhysX [39] is an SDK mainly used for visual effects, video game development, robotics, and medical simulation¹². With Nvidia IsaacGym [36] providing a gym environment, PhysX can also be used to train RL agents in its virtual environments. The examples provided with IsaacGym are implemented in PyTorch, but TensorFlow is equally feasible. While PhysX is open source, IsaacGym is not, which might hinder its customization [22]. Typical MuJoCo and RL Games¹³ models can be used. With IsaacSim in Nvidia’s Omniverse, an even more specialized toolkit for robotics simulation exists. IsaacGym provides a PPO implementation and supports MJCF and URDF. PhysX supports photorealistic rendering in an intuitive interface. Range, contact, force, and camera sensors are available via extensions¹⁴. PhysX and IsaacGym are excellently documented, with a digestible structure, visual examples, extensive documents, and explanatory text as well as video tutorials and GIFs.

IsaacGym’s distinguishing feature is that it leverages GPU acceleration to increase simulation speed compared to other engines’ CPU-based physics simulation. By directly connecting the simulation backend with PyTorch tensors, IsaacGym aims to avoid CPU bottlenecks. If CPU power is limited, this can be an immense advantage, as it potentially increases the number of RL environments that can run simultaneously on a single computer and decreases the need for costly computing clusters. Notably, ant, humanoid, and hand movement benchmarks showed decreased training times [36]. Nevertheless, GPU-based simulation can hinder RL research, as the GPU often has to be fully dedicated to running the deep learning algorithm, while the CPU is rarely fully occupied in MARL. Thus, PhysX might be more of a specialized tool for robotics RL and less suitable for basic RL research. At the same time, even though there are some examples of MARL setups in Nvidia Isaac [13], the implementation requires more in-depth programming knowledge than other comparable setups. However, as PhysX and IsaacGym are currently among the few high-usability, unified frameworks for RL and physics simulation [22], the benefits might outweigh this drawback in some scenarios.
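The idea behind GPU-based simulation such as IsaacGym's can be sketched without a GPU: the state of all environments is stored in flat arrays (a structure-of-arrays layout), one call advances every environment at once, and finished environments are reset individually so the batch size stays fixed. In the sketch below, the BatchedPointMasses class and its 1-D point-mass dynamics are hypothetical, and plain Python lists stand in for the PyTorch tensors that IsaacGym exposes; on a GPU, each list comprehension would become a single elementwise tensor operation.

```python
class BatchedPointMasses:
    """State of N independent 1-D unit point masses stored as flat arrays.

    Keeping per-environment state in this structure-of-arrays layout is what
    allows one kernel to update every environment at once on a GPU.
    """

    def __init__(self, num_envs: int, dt: float = 0.02) -> None:
        self.num_envs = num_envs
        self.dt = dt
        self.pos = [0.0] * num_envs
        self.vel = [0.0] * num_envs

    def step(self, forces: list) -> list:
        """Advance all environments by one semi-implicit Euler step."""
        if len(forces) != self.num_envs:
            raise ValueError("expected one force per environment")
        # Velocity first, then position with the new velocity (semi-implicit Euler).
        self.vel = [v + f * self.dt for v, f in zip(self.vel, forces)]
        self.pos = [p + v * self.dt for p, v in zip(self.pos, self.vel)]
        return self.pos

    def reset_where(self, done: list) -> None:
        """Reset only the finished environments, keeping the batch size fixed."""
        self.pos = [0.0 if d else p for p, d in zip(self.pos, done)]
        self.vel = [0.0 if d else v for v, d in zip(self.vel, done)]


sim = BatchedPointMasses(num_envs=1024)
for _ in range(50):
    obs = sim.step([1.0] * sim.num_envs)               # one call steps all 1024 environments
sim.reset_where([i % 2 == 0 for i in range(sim.num_envs)])  # e.g. every second episode ended
print(sim.pos[:4])
```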

ODE

ODE¹⁵ (Open Dynamics Engine) is an open-source C/C++ library designed for simulating rigid body dynamics. It supports advanced joint types and integrated collision detection with friction. It is commonly used for simulating vehicles and dynamic objects in 3D environments. The documentation is scattered across several different web pages and is hard to navigate. A single-page user manual and a dedicated tutorial section explain core functionalities, and automatically generated documents with a severely dated appearance are provided. Some rather short code examples without code highlighting are hidden within ODE’s GitHub repository. The FAQ on GitHub, however, is very thorough. Many features relevant to the evaluation criteria could not be located or were not documented. While ODE does not provide a Python API directly, PyODE¹⁶ offers a set of open-source Python bindings for the Open Dynamics Engine. ODE does not provide direct support for the URDF or MJCF formats. Additionally, ODE does not include built-in sensor functionalities. Neither the visualization of simulation results nor the interface in which it is embedded is high-resolution or up to modern UI standards. Overall, ODE is outdated and unwieldy on the usability side and makes for a strenuous implementation of state-of-the-art RL paradigms. Furthermore, it has little relevance in today’s RL research literature (see popularity comparison). Therefore, ODE seems applicable to current RL research setups only through the comparatively modern front-ends that integrate it, such as Gazebo and Webots.

Webots

Webots [37] is a widely used open-source robot simulation software developed by Cyberbotics, supporting C, C++, and Python. It simulates a wide range of robotic systems, relying on a customized version of the ODE 3D dynamics library. Webots offers highly specific sensors, from camera and touch sensors to radar and lidar. Its GUI offers real-time 3D visualization and a front-end for modifying simulation models. Webots allows a robot controller to export URDFs. However, generated URDFs are currently limited to a few elements, such as the definition of a box, cylinder, or sphere. Webots does not directly support MJCFs. It has its own native file format, PROTO, for defining the structure, appearance, and dynamics of robot models. The documentation of Webots is well structured, providing user and installation guides that are easy to access. It makes good use of images, videos, and highlighted code chunks. Both the reference manual and the user guide are quite extensive, and the dedicated tutorial section offers great explanations, code, and images. Since Webots is built on top of ODE, users will occasionally have to consult the poorly structured ODE documentation for certain parameter or function explanations. The Webots environment library is limited to a few specific examples, such as an apartment and a factory. The available Webots model library is specialized for complex robotics simulation¹⁷ rather than general-purpose RL. Simple models for embodied RL, like the typical ant, can be implemented in Webots, but have to be built from scratch or imported as third-party assets. Similarly, base Webots offers neither integration with TensorFlow or PyTorch nor multi-agent simulation capabilities, but Deepbots¹⁸ [31] closes these gaps.
Deepbots interfaces Webots with OpenAI Gym and adds functionalities necessary for controlling RL agents and gym environments, while hiding Webots features that are not relevant for RL. Thus, the RL algorithm backend, TensorFlow or PyTorch, is connected with the simulation side. However, Deepbots, as the name suggests, is specialized for robotics, and the complexity of the provided environments is achieved through complicated multi-joint robotics models rather than tightly packed 3D worlds. Several simple ready-to-use environments, such as CartPole, PitEscape, and FindBall¹⁹, can be used to benchmark RL algorithms in Webots [31]. However, no MARL environments are provided [12]. Deepbots has not yet caught on with the RL research community (see citations of [31]). Generally, Webots does not appear to lend itself to highly scalable training and therefore MARL [10], as it runs each simulation in its GUI and can only be parallelized by manually opening multiple instances of Webots [33]. Its high-fidelity simulation and user-friendly GUI, however, make it especially suitable for robotics RL setups that do not have high parallelization demands.

Brax

Brax, “a differentiable physics engine for large scale rigid body simulation” [23], is an open-source physics simulation engine written in JAX that is accessible via Google Colab²⁰. Brax simulates physical systems made up of rigid bodies, joints, and actuators and offers high flexibility for creating multi-agent environments with different physics properties, observation spaces, and action spaces [23]. It is specifically designed for RL and optimized to efficiently run parallel physics simulations alongside the RL algorithm on a single accelerator. Brax specifically aims to solve problems similar to those addressed by MuJoCo and offers similar models. Whereas most aforementioned simulation frameworks separate simulation (CPU) and RL algorithm (GPU/TPU), Brax brings both together on a single GPU or TPU chip in order to reduce latency. Brax is quite new and poorly documented. The documentation comprises only a short readme file and three example notebooks in Google Colab. No central webpage for information is available. The documentation does not present any models or example environments. Furthermore, community resources, like assets or helpful forum entries, are not to be found. Brax’s model library offers implementations of the basic MuJoCo models, such as the ant, humanoid, and half-cheetah²¹,²², but not much beyond. No complex training environments are provided. According to [10], Brax has problems with complex MARL, precisely because of its computationally expensive high-fidelity simulation. Scaling the number of agents aggravates this problem, and beyond only a small number of agents the simulation comes to a standstill. Its main selling point, GPU-based simulation, is also offered by PhysX/IsaacGym with a better feature range and usability. For these reasons, Brax in its current form seems neither to be a platform for general RL nor to fill a more specific niche.
Despite these criticisms, we recognize this innovative approach and the effort to make deep learning more accessible and less reliant on high-performance clusters.
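
The idea of stepping many simulations in lockstep right next to the learner can be illustrated without an accelerator. The sketch below, written in plain Python for portability, keeps the state of N independent point-mass environments in parallel arrays and advances all of them with one batched step function. In Brax, the corresponding step would conceptually be a JAX function that is vectorized and compiled (for example with `jax.vmap` and `jax.jit`) so that the whole batch stays on one GPU or TPU. The names and dynamics here are hypothetical and do not reflect Brax's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchState:
    """Structure-of-arrays state for a batch of hypothetical 1D point masses."""
    x: List[float]   # positions, one entry per environment
    v: List[float]   # velocities, one entry per environment


def batched_step(state: BatchState, forces: List[float], dt: float = 0.05) -> BatchState:
    """Advance every environment by one semi-implicit Euler step.

    On an accelerator this element-wise loop would become a single
    vectorized kernel; here it is written out for clarity.
    """
    if len(forces) != len(state.x):
        raise ValueError("need exactly one force per environment")
    new_v = [v + f * dt for v, f in zip(state.v, forces)]
    new_x = [x + v * dt for x, v in zip(state.x, new_v)]
    return BatchState(x=new_x, v=new_v)


def batched_policy(state: BatchState) -> List[float]:
    """Stand-in policy evaluated on the whole batch at once."""
    return [-2.0 * x - 1.5 * v for x, v in zip(state.x, state.v)]


num_envs = 4
state = BatchState(x=[1.0, -0.5, 0.25, 0.0], v=[0.0] * num_envs)
for _ in range(100):
    # Policy evaluation and simulation alternate without leaving the process,
    # mirroring the single-device design described above.
    state = batched_step(state, batched_policy(state))
```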

Chrono

Chrono [50] is an open-source modeling and physics simulation engine for robotics and vehicle dynamics. It offers a wide range of physical simulation capabilities, including collision detection, rigid body dynamics, and various force elements. PyChrono [8] wraps the C++ simulation library and allows users to build physical models and exchange data between the simulation and the ML framework. For RL setups, Chrono provides a custom PyTorch PPO implementation [8]. A Chrono-based simulation environment for end-to-end design and testing exists [9]; however, it is mostly focused on training autonomous vehicles and robots in off-road settings [8], [59]. Gym Chrono is a set of PyChrono-based OpenAI Gym environments and provides examples for training via TensorFlow and PyTorch. Chrono::Sensor provides a rich set of sensor modules that can simulate cameras, lidars, radars, gyroscopes, and more. Chrono does not natively support the URDF or MJCF formats, and its model library mainly offers vehicles and robots. However, in Gym Chrono, users can utilize ant models [7] for RL setups. Overall, Chrono provides only a limited environment library. Its GUI provides convenient control and monitoring of simulations, and Chrono integrates with various visualization libraries, such as Irrlicht, OpenGL, and Unity3D, to render the simulated systems at run time. The documentation of Chrono and PyChrono consists of a poorly structured, automatically generated document. The main document is fairly extensive but lacks explanations of fundamentals, while the dedicated tutorial section is code-only and does not explain anything on a conceptual level. Only sparse images and no explanatory videos are provided. We take Chrono's negligible relevance in the ML literature (see popularity comparison), its poor usability, and its focus on vehicle robotics [15] to indicate limited usefulness as an engine for RL research and MARL.
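
Chrono's RL examples build on PPO [8]. At the core of PPO is the clipped surrogate objective, the mean over a batch of min(r·A, clip(r, 1 - ε, 1 + ε)·A). Here r is the probability ratio between the new and the old policy for the taken action, and A is the advantage estimate. The plain-Python sketch below computes this objective for a small batch. It is a generic illustration of the standard definition, not Chrono's or PyChrono's PPO implementation, and ε = 0.2 is an assumed default.

```python
import math
from typing import Sequence


def ppo_clipped_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective of PPO (to be maximized)."""
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)              # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


# Unchanged policy: the ratio is 1, so the objective is the mean advantage.
print(ppo_clipped_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # 2.0
```

In a full implementation, the log-probabilities come from the policy network, and the optimizer minimizes the negated objective. The clipping removes the incentive to push the ratio far from 1 in a single update.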

Honorable mentions

Unreal Engine is a popular open-access game development engine. Recently, Unreal Engine introduced Learning Agents, a plugin geared towards game developers who want to write AI bots. The Learning Agents API can be accessed via Unreal Engine's general user interface and can be used with C++ and Python. Agents can be trained with an existing PPO algorithm, and support for SAC and Q-learning is provided. However, as of December 2023 the Learning Agents API had been available for less than seven months and has correspondingly not been widely cited in the relevant RL literature. For this reason, and because Epic Games, the developers of Unreal Engine, themselves state that Learning Agents is not a general-purpose ML framework, we do not compare it in detail to the other engines. Third-party tools for RL with Unreal Engine, e.g. Mindmaker, are available via the Unreal Engine Marketplace. Godot is an open-source game development engine that can be used for RL research via the framework Godot RL Agents [6]. However, Godot itself is not widely used and has even less relevance for RL [6]. More interesting for RL researchers is Generally Intelligent's Avalon [2], a 3D simulator based on Godot that lets users plug RL agents into ready-made environments with complex task interaction possibilities. Project Malmo [25] is a useful platform for exploration-related RL experimentation that is based on the Minecraft engine. As such, it is constrained by the limitations of the underlying video game [26]: it cannot provide complex, embodied physics simulation with high fidelity, and it cannot be used to build scenarios that are not feasible in Minecraft. Similar limitations apply to ViZDoom [28], which is based on the engine of the popular video game Doom, and to DeepMind Lab [5], which is based on Quake III. VMAS (Vectorized Multi-Agent Simulator; [10]) is a 2D physics engine written in PyTorch that is specifically designed with efficient MARL in mind.
However, the lack of 3D implementations severely limits the possible complexity of the training environment as well as of the agent-environment interaction.
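
Of the algorithms mentioned for Learning Agents, Q-learning is the simplest to state: after observing a transition (s, a, r, s'), the estimate Q(s, a) is moved toward the target r + γ·max over a' of Q(s', a'). The following generic tabular sketch uses assumed hyperparameters (α = 0.1, γ = 0.99). It is not tied to Unreal Engine or to any of the frameworks discussed here.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    done: bool,
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> float:
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    td_error = td_target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


q: QTable = defaultdict(float)          # unseen state-action pairs start at 0.0
actions = ["left", "right"]
err = q_learning_update(q, "s0", "right", 1.0, "s1", actions, done=False)
```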

Performance

[38] showed that MuJoCo is better than PyBullet and ODE at generalizing learning to other engines, i.e., agents that learned to solve a task in MuJoCo still perform well when the same task is implemented in a different engine. Agents trained in PyBullet did not transfer their learning at all. It might therefore be the case that, for example, agents trained in PyBullet merely learn to navigate PyBullet environments, whereas agents trained in MuJoCo learn to navigate any similarly simulated environment. MuJoCo's developers [20] compared the speed, simulation stability, and simulation accuracy of Bullet, MuJoCo, ODE, and PhysX by implementing the same scenario in each engine and measuring the time steps at which simulation errors occurred. They found MuJoCo to perform best of all engines, especially in scenarios that simulate bodies with many joints or connected elements. [33] implemented a similarly broad range of use cases in Gazebo, MuJoCo, PyBullet, and Webots and compared their real-time factor (RTF), i.e., the ratio of simulated time to elapsed real-world time. MuJoCo was reported to achieve a high RTF across scenarios, at the cost of some accuracy. PyBullet achieved a lower RTF but was highlighted for its superior usability. Gazebo, meanwhile, was found to be unwieldy and most suitable for simulations that are intended to be transferred to real systems. Webots showed high stability and RTF even in the most complex scenarios but was criticized for its lack of native parallelization support. As already established, Brax scales poorly in MARL setups [10].
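
The real-time factor used in [33] is simulated time divided by elapsed wall-clock time, so an RTF above 1 means the engine runs faster than real time. A minimal way to measure it for any engine is to time a fixed number of fixed-size steps, as in the sketch below. Here `step_fn` is a hypothetical placeholder for an engine's step call, and the measurement deliberately ignores rendering and learning overhead.

```python
import time
from typing import Callable


def real_time_factor(sim_seconds: float, wall_seconds: float) -> float:
    """RTF = simulated time / wall-clock time (> 1 means faster than real time)."""
    if wall_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return sim_seconds / wall_seconds


def measure_rtf(step_fn: Callable[[], None], dt: float, n_steps: int) -> float:
    """Run n_steps physics steps of size dt and report the achieved RTF."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()                      # one engine step advancing time by dt
    elapsed = time.perf_counter() - start
    return real_time_factor(n_steps * dt, elapsed)


# Example: 10 s of simulated time computed in 2.5 s of wall-clock time.
print(real_time_factor(10.0, 2.5))  # 4.0
```

Because the RTF depends on hardware, step size, and scenario, values are only comparable when all engines run the same scenario on the same machine.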

Limitations

To rigorously assess and compare the quantitative performance of the presented frameworks, one would have to implement the same scenarios for typical RL use cases in all engines. This is beyond the scope of this paper and, owing to the sheer effort required, has to the best of our knowledge not been attempted to a sufficient degree by any other publication. For statements on technical details of the engines, we relied on information from the engine publishers and developers, as well as from external researchers who used and evaluated them. Therefore, the performance evaluation is not exhaustive and does not compare all frameworks on an equal footing. Correspondingly, the evaluation might be skewed by the availability of data on the engines. On the other hand, sparse information is itself a legitimate shortcoming of an engine.
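
A like-for-like comparison of this kind could be organized around a small adapter layer: each engine is wrapped behind the same interface, and every adapter runs the identical scenario with the same seeds and step budget. The sketch below is a hypothetical design, not an existing tool. The `EngineAdapter` protocol and the dummy adapter are illustrative assumptions; a real adapter would call the respective engine's own API.

```python
import random
import time
from statistics import mean
from typing import Dict, List, Protocol


class EngineAdapter(Protocol):
    """Hypothetical common interface that each wrapped engine would implement."""
    name: str
    def reset(self, seed: int) -> None: ...
    def step(self) -> float: ...   # advance one step and return the reward


class DummyEngine:
    """Stand-in adapter used only to exercise the harness."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rng = random.Random(0)

    def reset(self, seed: int) -> None:
        self.rng.seed(seed)

    def step(self) -> float:
        return self.rng.random()


def run_benchmark(
    engines: List[EngineAdapter], seeds: List[int], steps: int
) -> Dict[str, Dict[str, float]]:
    """Run the same scenario (all seeds x steps) on every engine and collect metrics."""
    results: Dict[str, Dict[str, float]] = {}
    for engine in engines:
        returns: List[float] = []
        start = time.perf_counter()
        for seed in seeds:
            engine.reset(seed)
            returns.append(sum(engine.step() for _ in range(steps)))
        results[engine.name] = {
            "mean_return": mean(returns),
            "wall_seconds": time.perf_counter() - start,
        }
    return results


report = run_benchmark(
    [DummyEngine("engine_a"), DummyEngine("engine_b")], seeds=[0, 1, 2], steps=100
)
```

Fixing seeds and step budgets across adapters is what makes the resulting numbers comparable; the expensive part remains writing a faithful adapter and scenario for every engine.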

Conclusion

In this paper, we looked at nine frameworks for RL research and reviewed them regarding their popularity, feature range, feature quality, and usability. We contributed to the field by providing an overview of the engines that enables researchers to make informed decisions when choosing their framework for RL simulation, paying special attention to the engines' MARL capabilities. We conclude that, for successful RL research, it is first necessary to sharply define the intended scenario and to check whether a suitable implementation is already available. For example, there is no reason to deal with the usability inconveniences of MuJoCo if a MARL setup in 2D is sufficient. This holds especially true for RL training on video game scenarios, where the selection of benchmarks is plentiful. For anything more specific, the choice of physics engine naturally depends on the defined needs and available resources of the project.

MuJoCo is currently the dominant framework for RL research due to its good performance and flexibility, even though its documentation is sometimes lacking, which might make it harder for smaller teams to use than competing engines. Compared to the other engines, MuJoCo currently provides one of the best foundations for MARL due to its high simulation fidelity and high training efficiency. Nevertheless, creating complex training environments for MuJoCo can be comparatively strenuous. Notably, high-fidelity simulation might not be useful for all training setups, as it can massively increase the computational demands while adding little benefit to setups where accurate kinematics is not paramount. PyBullet offers features and usability similar to MuJoCo's but consistently rates worse in performance reviews [20], [33], [38]. It makes up for this with a wide range of dedicated functions for loading and defining objects and models. Once the user has disentangled the documentation, RL scenarios are straightforward to implement in PyBullet.

While designing an environment is easier in Unity than in any of the other frameworks, Unity is not optimized for parallel computing and large-scale training. Unity has various pre-implemented MARL scenarios and can support simple multi-agent interactions, but it has problems with scaling complexity and simulation fidelity. One should also consider that low simulation fidelity negatively impacts the reproducibility of results [57]. The strong suit of Unity and Unity ML Agents is the RL implementation of video-game scenarios. Beyond this purpose, Unity seems most suitable for proofs of concept or RL experiments that are not intended to scale training beyond a certain threshold. Right now, Brax fails to impress due to its limited resources and documentation and its poor multi-agent performance. However, Brax is quite new and might gain more useful features in the near future. PhysX/IsaacGym, on the other hand, excels in terms of usability and provides a unified framework for scenario creation, simulation, and RL. Both Brax and IsaacGym rely on GPU-driven simulation, which can be disadvantageous for large-scale RL research. Base ODE is outdated in terms of both feature range and usability and accordingly has limited impact on current RL research, while Chrono lacks important features such as URDF and MJCF support. We found that Gazebo and Webots are powerful tools for high-fidelity robotics simulation with decent usability. However, neither is geared towards MARL applications.

The custom creation of complex environments, and corresponding libraries with pre-built solutions, remain a gap in available simulation pipelines. Another research gap is the lack of technical comparisons of training performance for MARL in complex 3D environments, as well as of the implementation difficulty of typical scenarios in each engine. Further development and research are needed in these areas. Study-specific transparency and reproducibility remain a structural problem in RL research, with many leading institutes and research teams opting for closed access. Further guidance on environment creation and replication, as well as better usability of the relevant tools, is therefore strongly needed. Symptomatic of the field, the most performant engine (MuJoCo) has poor usability, while the most user-friendly engine (Unity) suffers from poor performance. For significant progress in the field, the best of both worlds has to be combined.

Author contribution

This research was conducted within the scope of the MicrocosmAI project, which made it possible. M.K. made the main writing contribution and organized the research and writing process. C.W. contributed to the methodology, crawling algorithm and popularity comparison. H.H. made contributions to the Chrono, Gazebo, ODE and Webots chapters. J.M., E.B. and the larger MicrocosmAI research project contributed expertise, feedback and a framework for supervision. All authors researched data and literature and contributed substantially to the conceptualization of the submitted version.

Authors: Michael Kaup, Cornelius Wolff, Hyerim Hwang, Julius Mayer, Elia Bruni
