The Neural Morphogenetic Evolution System (NMES) is introduced as a novel platform combining a modular TCP-based interface, an AI control framework in Golang, and a real-time 3D simulation environment built on the Godot game engine. This system is designed to facilitate neural co-design – the simultaneous evolution of an agent’s morphology and controller – by allowing external algorithms to dynamically spawn and manipulate modular components (cubes) in a physics-based world. NMES is structured in tiered simulation layers, from single-cube control to complex multi-cube assemblies, all orchestrated via JSON-over-TCP messages. This decoupled design supports distributed experimentation across up to 27,000 concurrent virtual “worlds,” enabling large-scale neural architecture search (NAS) and evolutionary strategies for both control policies and body plans. We compare NMES to existing AI simulator frameworks such as OpenAI Gym, Unity ML-Agents, and NVIDIA Isaac Gym, highlighting innovations in external agent control, high-performance visualization (leveraging Godot’s physics optimizations for hundreds of objects), and integration of morphology evolution capabilities. Preliminary results demonstrate the system’s ability to spawn and control hundreds of interconnected cubes in real time (with a 5× increase in object count after physics optimizations) and validate reinforcement learning (RL) agents in this open-ended sandbox. The paper discusses how NMES’s unique combination of features addresses current limitations in AI simulation platforms and opens new avenues for research in co-evolution of body and brain.
Designing artificial agents that learn and evolve both their “brains” (controllers) and “bodies” (morphologies) is a grand challenge in robotics and artificial intelligence. Traditional reinforcement learning environments and toolkits – such as OpenAI Gym [arXiv:1606.01540], Unity ML-Agents, and NVIDIA Isaac Gym [arXiv:2108.10470] – have driven progress by standardizing benchmarks and providing simulation platforms. OpenAI Gym, for instance, offers a wide variety of benchmark tasks through a common Python interface, enabling researchers to compare algorithms on standardized control problems. Unity’s ML-Agents Toolkit leverages a game engine to create visually rich environments and allows training of intelligent agents via deep reinforcement and imitation learning. NVIDIA’s Isaac Gym takes a different approach, running physics and training entirely on GPUs to achieve massive parallelism – on the order of tens of thousands of simultaneous environments on a single GPU – for unprecedented simulation speed in robotics tasks.
While these frameworks have advanced the state of reinforcement learning, they each have limitations in the context of morphological evolution and large-scale experimentation. OpenAI Gym primarily focuses on fixed environments and assumes a single agent–environment loop per process [arXiv:1606.01540], making it non-trivial to dynamically change an agent’s body or to manage many environment instances without custom wrappers. Unity ML-Agents allows multiple agents and complex 3D physics, but training is tightly integrated via a Python API and the Unity runtime, which can constrain scaling (typically to a handful of concurrent environments per machine) and makes it difficult to evolve an agent’s physical structure on the fly. Isaac Gym achieves extreme parallelism, but it is specialized for GPU-rich setups and assumes a mostly fixed simulation scene definition – it is not explicitly designed for online structural morphing or interactive composition of new bodies during training. Moreover, none of these platforms natively emphasizes the joint evolution of morphology and control; instead, they generally assume a predetermined agent embodiment while the learning algorithm optimizes the control policy. Yet research suggests that to create robots automatically, it is “necessary to optimize both the behavior and the body design of the robot” [arXiv:2104.03062], and co-evolving these can yield more adaptive and novel agents [arXiv:2212.11517]. This gap motivates the development of NMES.
Neural Morphogenetic Evolution System (NMES) is proposed as an answer to these limitations. NMES is an open-ended sandbox environment and toolchain that enables external control and evolution of agents within a simulation, with an emphasis on flexibility and scalability. The system’s tiered simulation layers allow researchers to start with simple single-part agents (Tier 1: a single cube), then move to multi-part agents controlled by one external brain (Tier 2: one agent controlling one composite structure), and finally to complex assemblies where an AI agent can coordinate multiple connected bodies (Tier 3: an agent orchestrating many cubes linked into structures like chains or bipeds). At all tiers, the agent’s brain resides outside the game engine and communicates via a network interface using JSON messages. This design decisively separates the concerns of learning algorithm and physical simulation, offering a level of modularity and language independence uncommon in other frameworks. For example, an NMES agent can be trained using a Python or Go program that sends JSON-over-TCP commands to the Godot-based simulation server, whereas in Gym or Isaac Gym the agent code runs in the same process as the simulator (via function calls or GPU buffers) and in ML-Agents the communication is limited to a specific library protocol within Python.
Another distinguishing feature of NMES is its focus on scalability and distributed experimentation. The architecture supports deploying thousands of simulation instances in parallel across a cluster (using containerization and lightweight Godot instances), which facilitates large-scale experiments such as evolutionary runs or hyperparameter searches. In our implementation, we conceptualize 27,000 virtual planets as parallel worlds – a scale comparable to the thousands of environments Isaac Gym achieves on a single GPU, but reached through horizontal scaling on standard hardware rather than specialized GPUs. This offers a practical alternative for researchers without access to high-end GPUs, as NMES worlds are lightweight enough to run even on modest hardware like single-board computers. Furthermore, NMES’s use of the Godot engine provides real-time 3D visualization with relatively low overhead. By carefully optimizing physics (e.g. freezing inactive objects and handling certain calculations client-side), the system can render and simulate on the order of 500 active rigid bodies in a scene (five times the count achievable before optimization) without saturating the server, while still syncing state efficiently to any connected visual clients. This addresses a niche not filled by other tools: high-fidelity visualization of large evolving structures. Unity can produce high fidelity but would struggle with hundreds of physics objects at interactive rates, and Isaac Gym eschews visuals for speed, whereas NMES strikes a balance by leveraging Godot’s efficient rendering and an optimized physics update scheme.
In summary, NMES aims to provide an integrated platform for neural architecture search (NAS), evolutionary strategies, and co-evolution of morphologies and controllers within a live 3D sandbox. The contributions of this work include: (1) a modular architecture combining an external AI control framework (in Go) with a game-engine-based simulator (Godot) via a network API, (2) a tiered approach to simulation complexity that enables dynamic construction of agent bodies, (3) a demonstration of large-scale distributed simulation (thousands of worlds) for training and evolution, and (4) built-in support for various learning paradigms (reinforcement, supervised, unsupervised learning) and evolutionary optimization under one roof. The remainder of this paper is organized as follows. Section 2 describes the system architecture, detailing how the simulation, networking, and AI components interact. Section 3 outlines the methods and implementation, including the AI framework, game simulation layer, and communication protocol. Section 4 presents results and the current implementation status, demonstrating the system’s capabilities. Section 5 discusses related work, comparing NMES to existing platforms. Section 6 elaborates on the innovations and differentiators of NMES, and Section 7 concludes with future work and potential directions.
At a high level, NMES follows a client–server architecture with a clear separation between the simulation environment and the AI control logic (Figure 1 illustrates the overall design [Figure placeholder]). The core components are:
Simulation Server (Godot): A CubeSyncServer node (written in C# for Godot) implements game logic and a TCP listener. The simulation can run headlessly for performance or with visualization enabled for monitoring and debugging. Godot’s node structure is used to organize entities: each cube is a RigidBody3D with properties like mass, friction, etc., and cubes can be connected by physics joints or by Godot’s Skeleton3D bones (to form composite structures). The server keeps track of all cubes and connections by unique IDs.

Messaging Interface: Every message is a JSON object with a "type" field indicating the command or event. For example, the AI can send { "type": "spawn_cube", "position": [x,y,z], "rotation": [r1,r2,r3], "is_base": true } to create a new cube at a given location, or { "type": "apply_physics", "cube_name": "cube_5", "force": [fx,fy,fz] } to apply a force to a specified cube. The server responds with confirmation or data; e.g., after spawning, it replies with a "cube_list" update containing all current cube IDs, and for joint creation it replies with a "joint_created" message including the new joint’s name. This interface is modular and extensible – new message types (e.g., to query sensor readings or to define reward signals) can be added without modifying the core logic of either side, thanks to JSON’s flexibility.

Tiered Control: In Tier 2, the AI assembles a composite structure by sending { "type": "create_joint", "cube1": A, "cube2": B, "joint_type": "hinge", ... } messages. The server, upon creating joints, internally may group such connected cubes as one “skeleton” for easier management, but this is transparent to the AI. Tier 3 involves a one-to-many relationship: a single AI agent establishes multiple TCP connections (or multiplexes one connection with identifiers) to control many cubes or structures independently, potentially coordinating them. This could represent, for example, a higher-level AI that controls several sub-agents, or one agent managing a swarm of unconnected cubes, or even building a more complex organism piece by piece. In Tier 3, because each connection can “own” a cube (the server’s client info ties an OwnedCubeName to each client), an AI with several connections can command different cubes in parallel, or attach and detach parts during runtime. The tiered design allows incremental development – researchers can verify correctness in the simplest setting before scaling up to very complex scenarios – and also facilitates curriculum learning approaches (an AI could be trained on Tier 1 tasks, then Tier 2, and so on, moving up complexity levels). Notably, popular frameworks lack this explicit layering; for instance, Unity ML-Agents typically requires pre-defining whether one or multiple agents exist in a scene and does not allow dynamically increasing the agent’s body parts mid-episode.

All these components work in concert to provide a flexible platform. A typical workflow might be: a researcher runs the Godot server (which opens a port and waits for connections). An AI training script (client) connects, sends the auth token, then issues a series of spawn commands to create a structure (Tier 2). Once the structure is built (confirmed by server messages), the AI enters a control loop: it requests observations (e.g., cube positions, obtained via get_cube_info messages), computes actions (torques/forces for each joint or cube), and sends action messages. The environment responds with new states or reward signals if those have been defined. This loop continues for an episode defined by the AI (since the environment itself is open-ended unless the AI imposes termination). Multiple such agents can run in parallel on different threads or machines, each possibly connected to a different simulated planet on the server cluster.
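The observation → decision → action loop just described can be sketched in Go. Everything below is an illustration rather than the framework’s actual API: the Env interface, the dummyEnv stand-in physics, and the controller gains are our own, and in a real NMES client GetCubeInfo and ApplyForce would be implemented as get_cube_info and apply_physics JSON messages over the TCP connection.

```go
package main

import "fmt"

// CubeInfo mirrors a subset of the fields a "cube_info" reply carries.
type CubeInfo struct {
	Position [3]float64
	Velocity [3]float64
}

// Env abstracts the networked environment; a real implementation would
// translate these calls into get_cube_info / apply_physics JSON messages.
type Env interface {
	GetCubeInfo(name string) CubeInfo
	ApplyForce(name string, force [3]float64)
}

// dummyEnv integrates a 1-D unit-mass cube so the loop runs without a server.
type dummyEnv struct{ x, v float64 }

func (e *dummyEnv) GetCubeInfo(string) CubeInfo {
	return CubeInfo{Position: [3]float64{e.x, 0, 0}, Velocity: [3]float64{e.v, 0, 0}}
}

func (e *dummyEnv) ApplyForce(_ string, f [3]float64) {
	const dt = 0.02 // one 50 Hz physics tick
	e.v += f[0] * dt
	e.x += e.v * dt
}

// controlLoop applies a simple proportional-derivative push toward x = 0.
func controlLoop(env Env, steps int) {
	for i := 0; i < steps; i++ {
		obs := env.GetCubeInfo("cube_1")
		force := [3]float64{-5*obs.Position[0] - 3*obs.Velocity[0], 0, 0}
		env.ApplyForce("cube_1", force)
	}
}

func main() {
	env := &dummyEnv{x: -1} // cube starts 1 m left of the target
	controlLoop(env, 200)
	fmt.Printf("final x = %.2f\n", env.x)
}
```

Because the loop only sees the Env interface, swapping the dummy for a networked implementation changes no control code – which is precisely the decoupling the architecture aims for.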
The AI control framework of NMES is implemented in Go, chosen for its performance and built-in concurrency, which are advantageous for running many simulations and agent controllers in parallel. The framework is designed to be paradigm-agnostic, supporting reinforcement learning, evolutionary algorithms, and even supervised learning, so that users can experiment with different methods seamlessly. Concretely, the Go framework provides modules for defining neural network policies, optimization algorithms, and learning loops, which can be invoked depending on the desired approach:
Technically, the Go AI framework manages multiple concurrent clients with goroutines. Each client is wrapped in a structure that ensures thread-safe sending of commands and receiving of messages. Utility functions abstract the low-level message passing – e.g., a call spawnCube(position) internally constructs the JSON and sends it, then waits for confirmation by reading the updated cube list. This allows algorithm implementations to be written in a high-level style (almost as if they were directly calling an environment’s API). The framework handles error cases (e.g., lost connections, timeouts) by attempting reconnections or resetting environments, which is vital for long-running evolutionary runs. Additionally, the framework can log every interaction, enabling replay or analysis of episodes, which is useful for research transparency and debugging.
In summary, the AI framework serves as the “mind” in NMES’s mind-body co-design philosophy. By providing built-in support for diverse learning algorithms and search strategies, it lowers the barrier to trying novel approaches. For instance, a user could switch from training a policy via reinforcement learning to evolving a population of policies, with minimal changes to their code, since the underlying connectivity to the simulation remains the same. This unified approach is a contrast to typical usage of other platforms (e.g., using one library for RL, a different custom script for evolution, etc.); NMES encompasses both under one umbrella.
The simulation layer is where the morphogenetic evolution physically manifests. Using the Godot engine, we created a simulation environment specifically tailored to modular robotics-like scenarios: the world is essentially an open 3D space (which we metaphorically call a “planet” when referring to distributed instances) where cube-shaped agents and structures can be introduced. The choice of cubes is inspired by voxel-based robotics and reconfigurable modular robots, providing simplicity and uniformity in how parts connect. However, the system is not limited to cubes – the server could be extended with other shapes or even soft-body physics – but cubes with joints are a convenient starting point and allow constructing a variety of forms.
Within Godot, the CubeSyncServer node (running in the main scene of the simulation) performs several key roles:

Entity management: Each cube is assigned a unique name (e.g., "cube_12"). When a spawn request comes in, the server instantiates a cube (a RigidBody3D instance) at the specified coordinates. If is_base is true, it may treat this cube as anchored or special (e.g., not attached to any existing structure). Each cube’s initial properties (mass, damping, etc.) can be set to default values, or specified by the client (the protocol can be extended to allow that). The server also keeps track of joints (as Joint objects or custom link representations) between cubes. Notably, when multiple cubes are connected, the server can combine them in a Godot Skeleton3D for convenience – essentially creating a skeletal structure with bones corresponding to connections. This allows the possibility of animating or controlling the whole structure’s joints in a coordinated way if needed.

Command handling: On "spawn_cube", the server calls the Godot function to create a new RigidBody and add it to the scene tree, then sends back a response (perhaps implicitly via the next state sync). On "create_joint", it locates the two specified cubes and creates a joint of the requested type at the given anchor position. Supported joint types in our implementation include fixed, hinge, and slide joints (and potentially skeletal bones for more complex link behaviors). There are also messages like "despawn_cube" (to remove a cube and any attached joints), "set_joint_param" (to adjust joint properties such as stiffness or target motor velocity, useful for controlling actuated joints), and "apply_physics", a general command used to apply forces, impulses, or torques to a body. The server handles these in the physics thread context to ensure thread safety with the physics engine (Godot allows thread-safe enqueuing of function calls via its concurrency model, and we use a queue of actions, _mainThreadActions, to apply changes in the main physics loop). After processing commands, the server updates its state records (e.g., adding the cube to its dictionary, noting the joint’s existence).

Client ownership: The server’s per-client record ties an OwnedCubeName to each connection. The server can enforce a rule like “one cube per client in Tier 1” if needed (though not strictly required; it could be managed on the client side too). In Tier 2, an agent client might spawn multiple cubes; the server does not restrict this, but it will not automatically assign multiple ownerships unless instructed. Instead, the client knows it is building one structure and manages the cubes accordingly (the server just sees multiple spawns from the same client). Tier 3 is more a client-side concept (multiple connections), so the server handles each connection independently. However, the server is aware of all connections and could, for instance, allow or disallow interactions between cubes from different clients depending on experiment design (e.g., one might isolate clients in separate worlds or allow them to interact in the same world for multi-agent interaction – the NMES architecture is flexible enough to permit either, though our current setup is one world per server instance for simplicity).

State synchronization: Cube transforms are rebroadcast to clients only when they change beyond configurable thresholds (the POSITION_THRESHOLD and ROTATION_THRESHOLD deltas). This throttles network usage and rendering updates for objects that might be vibrating slightly or essentially static.

Reward computation: The server can evaluate simple objective metrics on request, e.g., { "type": "compute_reward", "metric": "distance", "cube": "cube_5" } returning a value. A dynamic objectives system is in the works, which will allow the server to load an objective at runtime (perhaps via a small script or expression sent by the client), enabling on-the-fly changes to goal criteria. This level of customizability is consistent with NMES’s role as an open-ended sandbox rather than a fixed-task environment.

In conclusion, the Godot-based simulation layer provides the physical and visual grounding for NMES. It transforms the commands from an external “mind” into tangible effects in a virtual world and similarly feeds back the physical consequences (sensor data, positions) to the mind. By leveraging a game engine, we get user-friendly features like graphics, cross-platform deployment, and a scene editor (for designing different worlds or initial configurations), while through careful engineering we ensure the simulation remains fast and synchronized with possibly hundreds of external agents across many instances.
The communication protocol in NMES is deliberately kept simple and human-readable to maximize accessibility and debuggability. JSON over TCP/IP was chosen over binary or shared-memory communication for several reasons: JSON is language-neutral (virtually every programming language can parse it easily), it’s flexible (new fields can be added without breaking older clients as long as they’re ignored gracefully), and it’s human-readable (which aids in debugging and in demonstrating the system to newcomers). The use of TCP (as opposed to UDP) guarantees ordered, reliable delivery of commands – important for a control setting where lost or out-of-order messages could cause chaos in the environment.
Message Structure: Every message from client to server or server to client is a complete JSON object (not a stream of partial JSON). To allow multiple messages in a single TCP stream, we use a unique end-of-message delimiter string ("<???DONE???---" in the implementation) that the server looks for to split incoming data. Upon accepting a client socket, the server reads until it finds this delimiter, then parses the JSON. Similarly, for responses, it appends the delimiter after each JSON string. This avoids the complexity of length-prefixed or binary framing, and since the delimiter is chosen as a string unlikely to appear in normal JSON content, it reliably separates messages.
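A sketch of this framing in Go; frame and splitStream are illustrative names, while the delimiter constant matches the implementation’s terminator:

```go
package main

import (
	"fmt"
	"strings"
)

const delimiter = "<???DONE???---"

// frame terminates one JSON payload with the end-of-message delimiter.
func frame(jsonMsg string) string { return jsonMsg + delimiter }

// splitStream cuts a received byte stream into complete payloads and
// returns any leftover bytes of a partial message still awaiting its
// delimiter (which a reader would keep and prepend to the next read).
func splitStream(buf string) (msgs []string, rest string) {
	for {
		i := strings.Index(buf, delimiter)
		if i < 0 {
			return msgs, buf
		}
		msgs = append(msgs, buf[:i])
		buf = buf[i+len(delimiter):]
	}
}

func main() {
	stream := frame(`{"type":"auth","password":"XYZ"}`) +
		frame(`{"type":"spawn_cube","position":[0,1,0]}`) +
		`{"type":"get_cu` // partial message, delimiter not yet received
	msgs, rest := splitStream(stream)
	fmt.Println(len(msgs), rest) // 2 {"type":"get_cu
}
```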
Authentication: The first message a client must send is a simple auth JSON, e.g. { "type": "auth", "password": "XYZ" } (in our current implementation, we actually send the raw password as the first message for simplicity, and the server replies with either "auth_success" or an error). Only after successful authentication will the server accept further commands; otherwise it drops the connection. This prevents unauthorized control of the environment, which is important if the system is running as a service or on a network. In research clusters this is less of an issue, but if NMES is hosted (for example, as a cloud service where users connect to virtual worlds), this provides a basic security layer.
Command Set: The core commands supported have been partially outlined:
- spawn_cube – spawn a new cube. Parameters: position (3D vector), rotation (3D vector or quaternion), and a flag marking it as a base (base cubes might start fixed or represent the root of a structure). Server action: creates the cube, assigns it a name (like “cube_5”), and adds it to the simulation. Response: no immediate JSON response in our design; instead the client can call get_cube_list after a short delay to retrieve the updated list of cubes and find the new one.
- despawn_cube – remove a cube by name. Parameters: cube_name. Server: if the cube exists, remove it and any joints connected to it. Response: a generic acknowledgment.
- create_joint – connect two cubes. Parameters: cube1, cube2 (names), joint_type (string, e.g., hinge, fixed), and an anchor position (3D). Optionally, joint parameters like axis, limits, or motor power can be included. Server: creates the joint (which internally gets an ID or uses the engine’s own ID). Response: { "type": "joint_created", "joint_name": "<id>" }, or an error message if creation failed (e.g., cubes already connected or not found).
- set_joint_param – adjust a property of a joint. Parameters: joint_name, param_name, value. For example, to set a target angle or to enable a motor on a hinge, if the server supports it. Server: finds the joint and applies the change (through Godot’s API for joints or our stored data). Response: an acknowledgment, or none.
- apply_physics – a general command that can carry multiple physics instructions. In our code, we allow a dictionary of physics updates, e.g. { "type": "apply_physics", "force": [fx,fy,fz], "torque": [tx,ty,tz], "cube_name": "cube_3" }. If such a message is received, the server applies the force and torque to cube_3 (at its center of mass for a force, or as specified). We also envision using this for impulses or for setting kinematic velocities. This command is crucial for the continuous control of agents.
- get_cube_list – query current cubes. No additional parameters. Server: replies with { "type": "cube_list", "cubes": [list of all cube names] }.
- get_cube_info – get details of a specific cube. Parameters: cube_name. Server: replies with info, e.g., { "type": "cube_info", "cube_name": "cube_5", "position": [x,y,z], "rotation": [r1,r2,r3], "linear_velocity": […], "angular_velocity": […] }. This is used by the client to get observations.
- reset (planned) – a command to reset the simulation or a particular agent’s state. Currently, one can despawn all cubes or disconnect and reconnect to start fresh, but a cleaner reset that re-initializes a world to a default state (especially if there is a specific starting configuration) would be useful.

The protocol, being text-based, has some overhead compared to binary, but it is negligible at the scales we consider (commands on the order of hundreds of bytes, at most a few thousand bytes per physics frame in extreme cases with many objects). In practice, network bandwidth has not been a bottleneck; the limiting factor is usually the physics computation. Moreover, because experiments can be distributed, the network load is shared. A nice aspect of JSON/TCP is that one can use telnet or netcat to manually send commands for testing – we frequently did this during development to spawn cubes or verify behaviors without writing a client, which accelerated debugging.
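Putting the command set together, a minimal Tier 1 session might look like the following exchange (values illustrative; → is client-to-server, ← is server-to-client, and the end-of-message delimiter is omitted for readability):

```
→ { "type": "auth", "password": "XYZ" }
← "auth_success"
→ { "type": "spawn_cube", "position": [0, 1, 0], "rotation": [0, 0, 0], "is_base": true }
→ { "type": "get_cube_list" }
← { "type": "cube_list", "cubes": ["cube_1"] }
→ { "type": "apply_physics", "cube_name": "cube_1", "force": [5, 0, 0] }
→ { "type": "get_cube_info", "cube_name": "cube_1" }
← { "type": "cube_info", "cube_name": "cube_1", "position": [0.1, 1, 0], "rotation": [0, 0, 0] }
```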
External Control and Language Independence: The JSON-over-TCP approach means that NMES does not lock users into a single programming environment. In contrast, OpenAI Gym assumes a Python process and uses function calls, Unity ML-Agents assumes you use their Python API (or communicate via their specific protocol which is effectively gRPC under the hood), and Isaac Gym assumes a Python environment with PyTorch. NMES’s design lets one write a controller in pure C, JavaScript, etc., by just adhering to the JSON protocol. This opens the door for creative uses: for instance, a web-based dashboard could send JSON commands to orchestrate an experiment, or multiple algorithms written in different languages could control different agents in the same world (imagine a competition or diversity-driven co-evolution where different control strategies run concurrently in separate clients). The decoupling via TCP also means the simulation server could run on a different machine (even a different OS) from the clients – enabling, for example, heavy physics on a desktop while an AI client runs on an edge device or vice versa.
Synchronization and Timing: One challenge with an external control loop is maintaining synchronization so that the physics time-step and decision frequency align. In NMES, we allow asynchronous operation: the simulation runs at its own pace (it can be paused or stepped if the client explicitly wants turn-by-turn control, but by default it runs continuously). Clients can send actions at any time; if a new action comes in, it’s applied on the next physics tick. If a client requests data too frequently, it might get the same state repeatedly unless something changed. We found that a simple approach is to have the client sleep for the duration of one physics frame (or a small delta) after sending an action before querying state again, to allow the environment to evolve. For training purposes, one can also lock-step: send action, then block until a new state message is received indicating the physics advanced. We plan to add an option for the server to send a tick event every frame to connected clients (e.g., a simple { "type": "tick", "t": current_time, "frame": n } message) which can be used to trigger the client’s next action. This would emulate a synchronous step loop akin to Gym’s step() function, but across a network. Early experiments show that even without strict lock-step, agents like RL can learn as long as the perception-action loop is consistent (they might effectively run slightly off real-time, but the order of actions is preserved).
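The planned lock-step option can be sketched as follows. Tick events arrive here on a Go channel standing in for parsed { "type": "tick", ... } messages; runLockstep blocks on each tick before emitting the next action, emulating a synchronous step loop:

```go
package main

import "fmt"

// Tick stands in for a parsed {"type":"tick","frame":n} server message.
type Tick struct{ Frame int }

// runLockstep consumes n ticks, invoking act exactly once per physics frame.
func runLockstep(ticks <-chan Tick, n int, act func(frame int)) {
	for i := 0; i < n; i++ {
		t := <-ticks // block until the server reports the frame advanced
		act(t.Frame)
	}
}

func main() {
	ticks := make(chan Tick)
	go func() { // fake server emitting three frame ticks
		for f := 1; f <= 3; f++ {
			ticks <- Tick{Frame: f}
		}
	}()
	runLockstep(ticks, 3, func(frame int) {
		fmt.Println("acting on frame", frame)
	})
}
```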
Finally, the use of a standard protocol fosters reproducibility and community engagement. We could publish the JSON API as an open standard so that others can build compatible simulation servers or clients. In the future, a library of community-contributed “skills” or structures could emerge, where an external script can plug into NMES to, say, automatically build a particular creature (by sending a known sequence of JSON commands) and test it. This open interface is in line with how OpenAI Gym provided a standard API that many could adopt [arXiv:1606.01540], but here at the networking level. We also envision logging all messages for an experiment to a JSON file, which serves as a complete history that can be analyzed or even replayed by sending the same commands to a fresh server – aiding verification and debugging.
We have implemented the core components of NMES and conducted initial tests to validate its functionality and performance. Below, we summarize the current status and results of these tests, covering the tiered scenarios, performance benchmarks, and example use cases of the system.
Tier 1 – Single Cube Control: In the simplest setup, we spawned a single cube in the Godot environment and controlled it with a reinforcement learning agent to perform a basic task: make the cube roll to a target position on a flat plane. Using the Go AI framework, we set up a Q-learning agent that observes the cube’s velocity and relative position to the target, and applies forces in one of four directions. Over the course of training, the agent learned to reliably move the cube to the vicinity of the target. While this is a trivial task in classical terms, it served to confirm that the end-to-end loop (observation → external decision → action → environment update → new observation) works correctly with the JSON/TCP interface in real time. We measured the step frequency: the system was able to sustain about 60–100 decisions per second with this single agent, limited by the deliberate delay we inserted between actions to simulate a reasonable physics tick (the physics itself can run faster if uncapped). This is on par with typical Gym environments in terms of speed when run in real time. Importantly, the training happened in the external process and the simulation never crashed or desynchronized, demonstrating the robustness of the networked approach even with continuous control. The outcome of this Tier 1 experiment is a baseline: a cube can be treated like any standard environment (akin to CartPole or other simple control problems) but with our architecture.
Tier 2 – Dynamic Structure Composition: Next, we evaluated Tier 2 by having an AI agent construct and then control a multi-cube structure. In one test, the agent built a 3×3 grid of cubes (9 cubes in total) laid out flat and connected by joints to form a rigid platform. In another test, the agent built a 5-cube pole – essentially a vertical stack where each cube is jointed to the next, making a simple multi-link pendulum. The process for the grid was as follows: the client opened 9 parallel TCP connections (we tried both parallel and sequential spawning) to spawn 9 cubes at specified coordinates forming a grid pattern. Once all cubes were confirmed spawned, the agent then sent joint creation commands to connect adjacent cubes with hinge joints (with axes aligned such that the grid is mostly rigid but could flex a bit). The server created about 12 joints (connecting each cube to its horizontal and vertical neighbors). After construction, the agent switched to control mode: applying impulses to certain cubes to make the whole grid “hop” or deform. This test was partially a stress test – many commands in a short time – and partially a demonstration of constructing a compound agent. The result was successful: all cubes were spawned and connected properly, as evidenced by the server logs. The performance remained interactive; the Godot server handled the ~14 simultaneous connections (9 for cubes, plus additional connections used during joint creation) without dropping any. We observed that our strategy of distributing cube spawn commands across multiple client connections helped avoid a bottleneck – the server processed them in parallel. The 3×3 grid, once assembled, was stable on the ground – it acted like a single large object. The agent then practiced applying upward forces to all cubes at once to make it jump. Although not particularly useful as a behavior, it showed that the agent could coordinate multi-point actuation via the interface.
For the 5-cube pole, the agent attached cubes end-to-end with hinge joints, forming a vertical chain. This structure swayed under gravity. We then employed an evolutionary algorithm to tune the torques applied periodically at each joint so as to make the pole "stand up," or at least balance. After many generations, the evolved solution resembled a rhythmic swinging that kept the pole mostly upright – again confirming that NMES can facilitate evolutionary design of control for a custom structure that the agent itself created. These Tier 2 experiments highlight NMES's capability for agent-driven morphology assembly and control; such a test would be difficult to replicate in frameworks where the environment is static or predetermined.
Tier 3 – Multi-Agent and Swarm Scenario: To push the envelope, we experimented with a scenario in which one AI controls multiple independent entities. In one setup, a single client script opened five connections to the server and treated each as a separate "agent" controlling one cube (not connected to the others). This effectively creates a swarm of five cubes that the AI can coordinate. We tasked the swarm with dispersion: starting close together, the cubes had to move so as to maximize the distances among themselves. Each cube's controller (a simple policy) could apply forces, and higher-level logic in the client adjusted targets to push the cubes apart. The result showed that all five could be controlled concurrently – the server updated each cube's physics and there was no interference. Because all cubes shared one environment, they did collide and interact (which was desired in this case). We observed near-linear scaling of CPU usage with the number of independent agents in one environment, as expected. Another Tier 3 experiment simulated heterogeneous control: one agent builds a structure while another observes or perturbs it. We tested a case with two clients: Client A spawns a pair of cubes and connects them (making a small two-cube robot), then tries to make it move to the right; Client B controls another cube, not attached to A's structure, whose goal is to collide with A's structure and interfere. This created an adversarial scenario. NMES handled the two separate control loops without issue, demonstrating that multiple external processes can share access to one simulated world. However, coordinating them (for adversarial or cooperative multi-agent tasks) would require careful design of rewards and possibly a referee; the system itself provides the sandbox for these complex interactions.
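The higher-level dispersion logic amounts to pushing each cube away from the swarm's centroid. A sketch of that computation (the exact policy we used may differ in detail):

```go
package main

import "fmt"

type Vec2 struct{ X, Y float64 }

// dispersionForces returns, for each cube, a force proportional to its
// offset from the swarm centroid, pushing the cubes apart -- the
// higher-level logic the client used to drive the five cubes to disperse.
func dispersionForces(pos []Vec2, gain float64) []Vec2 {
	var cx, cy float64
	for _, p := range pos {
		cx += p.X
		cy += p.Y
	}
	cx /= float64(len(pos))
	cy /= float64(len(pos))
	out := make([]Vec2, len(pos))
	for i, p := range pos {
		out[i] = Vec2{gain * (p.X - cx), gain * (p.Y - cy)}
	}
	return out
}

func main() {
	f := dispersionForces([]Vec2{{0, 0}, {1, 0}, {0, 1}, {1, 1}, {0.5, 0.5}}, 2.0)
	fmt.Println(f[0]) // corner cube is pushed toward negative X and Y, away from the centroid
}
```

Each resulting vector would then be sent to the corresponding cube over its own connection as an apply-force command.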
Performance and Scaling Results: We benchmarked the system on a mid-range PC (quad-core CPU, GPU used only for rendering). With physics running at 60 Hz and moderate complexity (cubes and hinge joints), a single server instance comfortably handled 500 cubes when most were static (frozen), and about 200 actively moving cubes before the frame rate dropped below real time (60 FPS). This is a significant improvement over the initial unoptimized version, which began to struggle around 100 moving cubes; optimizations such as freezing and thresholded synchronization were key. In pure physics terms (headless mode), we could simulate even more cubes when visual output was disabled and slower-than-real-time stepping was allowed, but for practical use, around 500 is the upper active limit observed for one instance with visualization. Memory was not an issue: each cube and joint is lightweight (a few KB each), so even thousands of cubes could be allocated in a stress test (though the CPU could not update them all at 60 Hz).
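The freezing heuristic can be sketched as a simple speed threshold plus a freeze message; the threshold value and the message fields below are illustrative, not the exact values used in the server:

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// shouldFreeze implements the thresholded-sync idea: a cube whose speed
// stays below eps is frozen (removed from active physics and sync) until
// it is touched again, which is what let one instance hold ~500
// mostly-static cubes while only ~200 moved.
func shouldFreeze(vel [3]float64, eps float64) bool {
	speed := math.Sqrt(vel[0]*vel[0] + vel[1]*vel[1] + vel[2]*vel[2])
	return speed < eps
}

func main() {
	if shouldFreeze([3]float64{0.001, 0, 0.002}, 0.01) {
		// Hypothetical freeze command as it might appear on the wire.
		b, _ := json.Marshal(map[string]any{"type": "freeze", "cube_id": "cube_42"})
		fmt.Println(string(b))
	}
}
```

Unfreezing follows the inverse path: any impulse or collision involving a frozen cube re-activates it before the force is applied.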
For distributed scaling, we tested deployment using Docker. We created a Docker image of the Godot server (exported as a headless Linux binary) and launched multiple containers on a single machine to simulate a mini-swarm (lacking 27,000 physical devices, we emulated multiple instances on one host). We ran 50 concurrent Godot server containers on a single high-end desktop, each with a few cubes spawned, and each container's environment could be connected to independently. Total CPU usage scaled with the number of active objects across all containers. With only ~10 cubes per container, 50 containers (500 cubes total) ran in parallel without trouble, which bodes well for spreading 27,000 worlds across enough machines. We also tested a simple scheduling script that assigns each new container to a different CPU core or machine in round-robin fashion – how one might practically distribute 27k environments on, say, a cluster of 100 machines (270 per machine, which seems feasible given the 50 we ran on a single machine at moderate load). These tests confirm that horizontal scaling is achievable with NMES's architecture.
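The round-robin placement itself is straightforward to sketch; the base port and the exact mapping below are illustrative choices rather than the scheduler's actual configuration:

```go
package main

import "fmt"

// assign maps world i to a (machine, port) pair round-robin: worlds are
// dealt out across machines like cards, and each machine numbers its own
// worlds consecutively from basePort. With 27,000 worlds on 100 machines,
// each machine hosts 270 worlds on ports basePort..basePort+269.
func assign(world, machines, basePort int) (machine, port int) {
	machine = world % machines
	port = basePort + world/machines
	return
}

func main() {
	m, p := assign(12345, 100, 9000)
	fmt.Printf("world 12345 -> machine %d, port %d\n", m, p)
}
```

Because the mapping is a pure function of the world index, the central coordinator needs no lookup table: any client can compute where a given world lives.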
Comparison with Other Platforms (Preliminary): While we have not yet performed formal benchmarking against Gym, ML-Agents, or Isaac Gym, some qualitative comparisons can be drawn from our experience:
Stability: Over dozens of runs and many hours of continuous execution, the system remained stable. The TCP connections occasionally needed reconnection (we simulated network delays and saw the server handle them by timing out and allowing reconnection). The memory footprint of the Godot server was steady (no major leaks observed), and the Go clients ran for thousands of generations in an evolutionary loop without issue. This suggests the architecture is robust enough for long experiments. We did encounter a few minor bugs. One was an ID-collision issue in which very rapid spawn/despawn cycles could reuse cube names and confuse the client's bookkeeping; we resolved this by including a UUID in each cube name to ensure global uniqueness. Another was a synchronization bug in which a joint-creation response sometimes arrived before the client started listening (due to multi-threading); we fixed this with a small wait, and by ensuring the client reads any pending messages after sending commands.
In summary, the implementation progress so far validates the core concepts of NMES:
These results are promising, and they set the stage for more comprehensive evaluations. In future work, we plan to quantitatively benchmark training speed on a standard task across frameworks (e.g., training a biped walker in NMES versus Unity ML-Agents) to measure any overhead introduced by the network interface. We will also probe the limits of distribution further, perhaps deploying NMES on a cloud service and scaling to thousands of instances to verify the 27,000-worlds concept under real conditions.
Research and development of platforms for training and evolving AI agents have produced a variety of frameworks, each with different priorities. Here we situate NMES relative to three prominent systems: OpenAI Gym, Unity ML-Agents, and NVIDIA Isaac Gym, as well as discuss other relevant efforts in evolutionary robotics and simulator design.
OpenAI Gym (2016) – Gym is perhaps the best-known toolkit for reinforcement learning research ([1606.01540] 1 Introduction). It provides a standardized interface (env.reset(), env.step(action) -> (obs, reward, done)) to a collection of environments ranging from classic control problems and Atari games to physics-based tasks (via MuJoCo, PyBullet, etc.) ([1606.01540] 1 Introduction). Gym's impact stems from its simplicity and breadth: researchers can plug in their algorithms and test them on many tasks easily. However, Gym by itself is not a simulator but an API layer; each environment has its own underlying simulator or logic. For example, continuous control tasks use MuJoCo (a physics engine) or PyBullet, whereas discrete ones might be custom code. Importantly, Gym's API assumes a predefined action and observation space – it is not designed to let the agent change the environment's structure. To evolve morphology in a Gym context, one would have to create a family of Gym environments, one per morphology, or a single environment that takes morphology as part of the state (which is clunky). Gym also traditionally runs in-process, meaning the agent's code and environment code share the same memory (unless environments are explicitly launched in separate processes for parallelism). This contrasts with NMES's out-of-process, networked approach. Gym has been extended in various ways (e.g., Gym-Federated or distributed RL setups), but those are external add-ons. Gym does boast a leaderboard website where users could upload results on benchmark tasks ([1606.01540] 1 Introduction), fostering competition. NMES similarly envisions dynamic leaderboards, but crucially, NMES does not come with a fixed set of tasks – it is up to the user to define them. Thus, NMES can be seen as orthogonal to Gym: one could create a Gym interface on top of NMES for a specific task (wrapping the TCP calls into a step function), but NMES itself targets open-ended experimentation rather than standardized benchmarks.
Unity ML-Agents (2018) – Unity's Machine Learning Agents Toolkit integrates modern game development with AI training (Unity ML-Agents Toolkit — LCG - Machine Learning). It enables designers to create 3D environments in Unity (which has a rich physics engine and rendering) and then train agents using Python libraries. ML-Agents uses a communication protocol (gRPC) to connect a Unity build with Python: essentially, Unity serves observations and accepts actions from an external process, much like NMES's external control (though the implementation details differ). ML-Agents supports multiple agents and even multiple copies of an environment in one Unity scene for parallel training (Designing a Learning Environment - Unity ML-Agents Toolkit). It has been used in a range of scenarios from games to robotics, and provides built-in algorithms like PPO. Compared to NMES, Unity ML-Agents is more monolithic: one typically uses the provided Python API and trainers. It is possible to run evolutionary algorithms with ML-Agents, but doing so requires treating the Unity environment as a black box that is reset with new parameters. Unity does not natively allow real-time addition or removal of game objects as part of the agent's action space (a user could script an ad-hoc method to do so, but it lies outside normal use). The visual fidelity of Unity is high – useful for vision-based AI – but running, say, 1,000 Unity instances for evolutionary search would be impractical. In contrast, NMES opts for a lightweight engine (Godot) and headless mode to support many instances, trading some fidelity for scalability. Unity ML-Agents excels in scenarios where environment dynamics are fixed and one needs to train an agent in a complex but static world (e.g., a game level). NMES excels where the agent changes the world (constructing or modifying its body or environment) as part of learning, and where many trials are needed.
Another difference is platform openness: ML-Agents is tied to Unity (proprietary engine, though the toolkit is open source), while Godot is fully open source and NMES’s networking means one could switch out the simulator if desired. One similarity is that both Unity ML-Agents and NMES use an external training process – a design decision shown to be effective for complex environments (Designing a Learning Environment - Unity ML-Agents Toolkit) – validating our approach from an industry perspective.
NVIDIA Isaac Gym (2021) – Isaac Gym is a recent entrant focusing on high performance reinforcement learning for robotics ([2108.10470] Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning). It represents a different philosophy: instead of external control of a game engine, it runs both the physics simulation and the learning loop on the GPU, using massive parallelism and avoiding CPU bottlenecks ([2108.10470] Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning). The result is the capability to simulate “tens of thousands of simultaneous environments on a single GPU” ([2108.10470] Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning), and to collect on the order of hundreds of thousands of physics steps per second for training ([2108.10470] Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning) – achieving 2–3 orders of magnitude speedups over CPU-bound simulators ([2108.10470] Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning). This is incredibly powerful for training control policies on fixed morphologies (e.g., robotic arms, legged robots). However, Isaac Gym requires an NVIDIA GPU and is somewhat limited in flexibility: it’s geared toward a predefined set of environments (NVIDIA provides a suite of examples like humanoid, ant, etc., akin to MuJoCo environments). Custom environments can be made by defining assets and configuring physics, but changing the environment structure on the fly or accommodating arbitrary user-defined worlds is non-trivial. There’s also the issue of accessibility: not all researchers have the necessary GPUs, and debugging GPU code can be harder than a normal CPU simulation. NMES offers an alternative path: scaling by number of CPUs or machines rather than by single-GPU brute force. While we cannot reach millions of steps per second on one machine, we can run many experiments in parallel asynchronously. 
Also, NMES focuses on morphology and controller co-design, which Isaac Gym does not explicitly support (one would have to re-initialize the simulator for different morphologies, losing some of the speed benefits which come from having identical parallel environments). In summary, Isaac Gym is like a high-throughput factory for training policies on fixed robots, whereas NMES is like a flexible workshop for evolving both robots and brains, potentially at a larger distributed scale but lower per-instance throughput.
Other Relevant Work: Besides these, there are other tools and research to mention:
To concisely compare NMES with Gym, ML-Agents, and Isaac Gym on key features, Table 1 provides an overview:
Aspect | NMES (This Work) | OpenAI Gym | Unity ML-Agents | NVIDIA Isaac Gym |
---|---|---|---|---|
Primary Focus | Co-evolution of morphology and control in a sandbox; custom agent structures; distributed experiments. | Standardized RL benchmarks with fixed tasks and agents ([1606.01540] 1 Introduction). | Game engine environments for RL/imitation learning with visual realism (Unity ML-Agents Toolkit — LCG - Machine Learning). | High-throughput physics for RL on fixed robot models ([2108.10470] Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning). |
Simulation Engine | Godot Engine (open-source, 3D physics & rendering). | No built-in engine (uses third-party, e.g. MuJoCo, PyBullet per env). | Unity Engine (proprietary, high-fidelity 3D, physics, animations). | PhysX on GPU (headless or minimal rendering). |
Agent–Env Interface | JSON over TCP (language-agnostic, networked, decoupled). | Python API (function calls within same process). | gRPC Python API (external but specific to ML-Agents library) (Designing a Learning Environment - Unity ML-Agents Toolkit). | Python API tightly integrated with PyTorch (in-process GPU buffers). |
Dynamic Morphology | Yes – Agents can spawn/remove parts during episodes via API. | No – Agent morphology is fixed per env (changes require new env). | Limited – Environment can be coded to spawn objects, but not typical for agent’s own body. | No – Morphology fixed per simulation run (changes require re-init). |
Multi-Agent Support | Yes – inherently supports multiple clients and agents in one or many worlds. | Limited – single agent per env by design; multi-agent needs custom env or wrappers. | Yes – supports multiple agents in one Unity scene, shared or separate policies. | Limited – can simulate many identical agents in parallel, but not different behaviors in one env easily. |
Scalability (Parallelism) | Horizontal scaling to thousands of instances (27k worlds via cluster); ~500 objects per instance. | Multi-process parallelism possible but not built-in; typically tens of envs on one machine. | Dozens of parallel envs by running multiple Unity instances or multiple areas per scene (Designing a Learning Environment - Unity ML-Agents Toolkit); limited by CPU/GPU. | Thousands of envs in one process on one GPU ([2108.10470] Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning); multi-GPU scaling is manual. |
Visualization | Real-time 3D (Godot) optional; can be headless for speed; moderate fidelity, optimized for many objects. | Varies by env (some 2D, some 3D, often minimal visuals). | High fidelity 3D, same as game quality (good for vision tasks), at cost of performance. | Minimal – focus on physics; some support via Isaac Sim for visuals (heavy if used). |
Built-in Learning Algorithms | Framework includes RL (Q-learning, policy grad), evolution, NAS support; user can plug any algorithm. | None – user brings their own algorithms; Gym only provides environments ([1606.01540] 1 Introduction). | Includes trainers (PPO, etc.) and imitation learning out-of-the-box. | None explicitly – user uses external RL code (often provided examples with RL libraries). |
Cross-Platform | Yes – Godot runs on Windows, Linux, Mac, and can deploy to Android; clients can be any OS. | Yes – Gym (Python) on any OS supporting Python; environments vary. | Training is usually on desktop; Unity can deploy envs to multiple platforms but not commonly for training. | Linux (with NVIDIA GPU) primarily; some Windows support. |
Open-Ended Objectives | Yes – user defines goals, can change during run; no fixed reward unless user programs it. | No – each env has a predefined reward structure. | No – tasks are as defined by environment design; can change via curriculum but not typically during a single run. | No – each env/task is fixed in code. |
Community & Ecosystem | Nascent – aimed at researchers in evolution/co-design; not yet widespread. | Very large – Gym is a standard, many environments and wrappers (e.g., PettingZoo for multi-agent). | Growing – used in game AI research and some robotics; Unity community support. | Niche – mostly robotics RL researchers with suitable hardware. |
Table 1: Feature comparison of NMES vs. existing AI simulation platforms.
This comparison underscores how NMES is differentiated by its support for agent-driven environment modification, its use of a network interface for flexibility, and its scalability via distribution. Each platform has its strengths: Gym for standard tasks and ease of use, ML-Agents for visual realism and seamless Unity integration, Isaac Gym for raw speed. NMES carves out a unique niche focusing on open-ended, evolutionary experimentation in a modular world.
The NMES platform introduces several innovations that, collectively, mark a significant departure from the design of existing AI simulators. In this section, we discuss these key differentiators in depth, highlighting why they matter for the research community and how they open up new possibilities for AI development.
1. Tiered Simulation Layers for Developmental Complexity: One of the novel ideas in NMES is the explicit layering of simulation complexity (Tier 1: single part, Tier 2: assembled structure, Tier 3: multiple structures/agents). This design reflects a developmental paradigm – an agent can literally “grow up” from controlling a simple body to a complex one. The tiered approach is innovative in that it mirrors how curricula or shaping might be applied in learning: start simple, then increase difficulty. In NMES, this is baked into the environment’s capabilities. Existing platforms do not provide such an integrated notion. For example, if one wanted to do something similar in Gym, they might train on CartPole then on a harder version, but these are separate envs with no continuity. In NMES, the same environment can transition from Tier 1 to Tier 2 within one session (e.g., an agent could begin as a single cube, then “decide” to attach another cube and become a two-cube agent). This could be a powerful mechanism for meta-learning and lifelong learning research – an agent that adapts its form as it learns, and learns new skills as it gains new parts. From an evolutionary standpoint, the tiered design allows combining morphogenesis with learning: an agent could incrementally construct itself (like how an embryo develops) and at each stage learn to use the current morphology effectively. This is a distinctive scenario that NMES is practically alone in enabling. The innovation here lies in treating morphology not as a static property but as a dynamic, controllable aspect of the agent’s life cycle within an episode or across episodes.
2. External Agent Control via JSON-over-TCP: While the idea of an external agent process is present in ML-Agents (and in older systems like TORCS for autonomous driving, or even the way RoboCup soccer servers worked), NMES’s use of a simple JSON/TCP protocol stands out for its flexibility and generality. This design decision was made to ensure that anyone could interface with the simulator using the tools of their choice. In effect, NMES turns the simulation into a lightweight robotics server, akin to how a physical robot might be controlled by sending commands over a network socket. The novelty is applying this to potentially thousands of virtual robots in parallel. By using JSON, we avoid the need for compiled client libraries or complex interfaces; one can even control the simulation with a telnet client for debugging. The advantage of this approach is evident in multi-language support (we demonstrated a Go client, but we also tested a Python client controlling the same server, with minimal effort). This contrasts with OpenAI Gym’s Python-centric design or Isaac Gym’s requirement of tight coupling with PyTorch on GPU. Moreover, the networked approach naturally supports distributed computing – an AI algorithm can run on one machine and connect to a simulation on another, or one coordinator process can manage multiple simulators across machines. While one could distribute Gym by launching environments on different machines and communicating, Gym doesn’t provide that out-of-the-box; NMES does by design. The JSON/TCP interface also makes NMES future-proof: if new types of actuators or sensors are added, they can be incorporated as new JSON fields or message types without breaking older ones (due to JSON’s self-descriptive nature and the server being able to ignore unknown fields). 
In summary, the innovation here is turning what is usually an in-process API into a service-oriented architecture for simulations, which to our knowledge hasn’t been done at this scale in AI training contexts.
3. Godot-Based Visualization with High Object Count Optimization: Using a game engine for AI research is not new (Unity ML-Agents is a precedent, and others have used Unreal or custom engines), but Godot offers a unique combination of open-source accessibility and performance. NMES leverages Godot's strengths but also pioneers techniques to push a game engine to handle research-scale scenarios. The specific innovation is the freeze-physics optimization and the decoupling of client-side from server-side physics. Normally, adding hundreds of active rigid bodies in a game would cripple performance, but by freezing cubes that aren't actively manipulated, we achieved a fivefold increase in capacity. This is an example of tailoring a game engine to research needs – sacrificing exact physics for performance when appropriate. Additionally, we utilize Godot's ability to run headless or with a minimal rendering loop to act as a pure simulator when visualization isn't needed, something Unity can do but often with higher overhead. The result is a system that can simulate a large number of elements and still present them graphically when needed. High cube-count performance is critical for morphological evolution, because one might want to test large structures or many pieces. For instance, some evolved creatures might consist of dozens of parts; without our optimization, exploring those would be slow or impossible in a standard engine. Another differentiator is that Godot exports easily to various platforms – we can even run the simulation on a Raspberry Pi or an Android phone. This means NMES could be used in educational settings (imagine students evolving simple creatures on a mobile app in real time) or in IoT setups. Competing frameworks don't have this flexibility: Isaac Gym can't run on a phone, Unity can but typically not for the training part, and Gym needs a Python environment.
By demonstrating that Godot (an open engine) is viable for serious AI experimentation, we are opening the door for more collaboration between game development communities and AI researchers. Our optimizations and approach could be adopted by others looking to use Godot for AI (there is nascent interest in this, as seen in community projects integrating NEAT-style algorithms in Godot). Thus, NMES's innovation is partly technical (the optimization itself) and partly strategic (choosing Godot and proving its worth).
4. Massive Distributed Experimentation (27,000 Worlds): The sheer scale of environments that NMES aspires to handle is unprecedented in the context of general-purpose AI environments. While Isaac Gym demonstrates thousands of environments in parallel, those are all identical and on one piece of hardware ([2108.10470] Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning). NMES’s 27,000 worlds notion is about enabling diversity and parallelism at scale. This is particularly important for evolutionary strategies and NAS, which often require evaluating a large population of candidates. The innovation here is more architectural: combining containerization and cloud techniques with an AI training platform. We treat each world as a microservice (with its own port etc.), which means, for example, a large-scale experiment could be orchestrated using industry tools like Kubernetes or Docker Swarm. This is not how AI experiments are typically run today – often they’re bespoke HPC job scripts. By framing it in terms of services, one could dynamically allocate more worlds if needed, or free them when not in use, and even run different experiments concurrently on different subsets of worlds. This could democratize large experiments: if someone only needs 100 worlds, they run 100 containers; if a big lab needs 20k, they spin them up on a cluster. The number “27,000” came from theoretical limits (like certain default ports or just a symbolic large number), but we treat it as a design target. The distributed nature also adds reliability – if one world crashes, others are unaffected (whereas in a single monolithic simulator, a crash stops everything). 
We foresee this capability fostering open-ended evolutionary simulations that run indefinitely, generating and testing new agents continually (somewhat akin to the vision of open-endedness in algorithms like POET ([2104.03062] Co-optimising Robot Morphology and Controller in a Simulated Open-Ended Environment), but at a much larger scale). NMES could in principle run an open-ended simulation where agents roam 27k worlds, mutate over generations, and perhaps occasionally migrate or share knowledge via the central coordinator. This is speculative, but the platform is built to handle the computational logistics of that vision. Traditional RL frameworks do not even attempt something like this – they focus on one agent/task at a time or at most a multi-agent game. The distributed experiment innovation aligns with trends in cloud computing for AI, but applies it in a granular way to simulations.
5. Support for NAS and Neural–Morphology Co-design: From a research perspective, NMES's integrated support for co-evolution of morphologies and controllers is a standout feature. Co-optimizing body and brain has been studied for decades in evolutionary robotics, but it has not been a mainstream approach in deep learning for lack of convenient tools. Researchers often had to glue together a physics engine, an evolution library, and a neural network training library, none of which were designed to work together. NMES provides a single unified environment where all these pieces come together. The innovation lies in treating the neural network architecture, in a sense, as part of the environment's state: the AI framework can alter its own network structure during evolution while the morphology changes simultaneously. By supporting NAS, NMES acknowledges that the optimal controller for one morphology may not be a fixed-size neural network, and it gives the freedom to explore larger or recurrent networks if the task demands. This is forward-looking: as co-design research progresses, the complexity of both bodies and brains will increase, and we need platforms that do not impose artificial limits (such as a fixed observation vector size or fixed network input-output dimensions). NMES's use of JSON for communication even allows variable-length descriptions of observations and actions, so an agent that grows new sensors (new cubes with sensors) could, in principle, also increase the input dimensionality of its brain – something very hard to do in frameworks that assume a static observation space. We are essentially enabling a form of online neural architecture adaptation alongside physical adaptation, analogous to organisms that grow new neural circuits as they mature – a concept not widely explored in embodied AI due to technical constraints.
6. Open-Endedness and User-Defined Objectives: Traditional benchmarks and environments focus on specific goals (reach a flag, maximize reward X). NMES is deliberately open-ended: it doesn’t prescribe what the agent should do. This is an innovation in terms of philosophy – it treats the environment as a sandbox or an AI playground. The user (or a meta-agent) can specify objectives on the fly. One could even imagine a scenario where an algorithm tries multiple objectives in different worlds and sees what interesting behaviors emerge, aligning with ideas of open-ended evolution (as in the POET algorithm, where environments themselves evolve to challenge agents ([2104.03062] Co-optimising Robot Morphology and Controller in a Simulated Open-Ended Environment)). The presence of dynamic objectives and a planned feature for in-game leaderboards (where various agents’ performance on user-defined metrics can be compared) make NMES more akin to a research MMO (massively multi-agent online) world than a fixed simulator. This could foster a community-driven innovation: users might share custom “challenge scripts” for NMES, essentially turning some worlds into specific tasks and then seeing how evolved agents transfer or generalize between them. This emergent usage is something Gym’s designers perhaps sought via the scoreboard but within fixed tasks ([1606.01540] 1 Introduction); NMES extends it to potentially never-ending tasks that evolve over time.
In light of these differentiators, NMES can be seen as not just a tool, but a platform enabling new research questions. For example, one could study: “What are the evolutionary dynamics when 1000 agents, each with the ability to change their morphology, compete or cooperate in a shared environment?” This is an intriguing question that couldn’t be easily studied with existing tools, but NMES could handle it by running a multi-agent world scenario. Another question: “Can NAS co-evolve with morphology to produce efficient designs from scratch (a sort of simulated invention process)?” Again, NMES is poised to explore that. The innovations of NMES make it a unique bridge between reinforcement learning, evolutionary algorithms, and complex systems science.
Finally, it’s worth noting that NMES’s design also has practical differentiators: it’s open source (leveraging Godot), relatively easy to deploy, and versatile in application. Beyond academic research, it could be used as an educational platform for teaching AI and robotics (students could program agents in a sandbox world without worrying about installation complexity), or as a prototyping tool for robotics (e.g., quickly simulating a new robot design and trying some control ideas). The system demonstrates that with careful design, one can achieve both generality and performance – two attributes often at odds. By not being tied to a specific task, NMES generalizes; by optimizing the engine usage, it remains performant.
To conclude this discussion: NMES’s combination of tiered simulation, external control, optimized visualization, massive parallelism, and co-design support is what sets it apart. Each of these aspects contributes to enabling a new class of experiments in AI – especially at the intersection of learning and evolution – and addresses limitations of existing frameworks (summarized in Table 1). As AI research pushes towards creating more adaptive, embodied, and autonomous systems, we anticipate that platforms like NMES will play an increasingly important role, allowing researchers to “play God” in a sense: to create virtual universes where lifeforms (in the form of neural agents) can evolve in complexity and capability, guided by both Darwinian and gradient-based processes.
While the current implementation of NMES provides a strong foundation, there are many avenues for enhancement and extension. We outline several key areas of future work:
1. Enhanced User Interface and Accessibility: At present, interacting with NMES for research requires writing client code (in Go, Python, etc.) and manually visualizing via the Godot viewport. To broaden adoption, we plan to develop a graphical user interface (GUI) that will allow users to configure experiments, launch training runs, and monitor progress without needing to script everything. This could take the form of an in-game editor or a separate dashboard application. For instance, a user might use a GUI to assemble an initial creature by dragging and dropping cubes and joints (leveraging Godot’s editor), then select a learning algorithm from a menu (pre-defined in the Go framework), set some parameters, and hit “Train”. The system would then handle running the AI client and display results (learning curves, etc.) in real time. This “no-code” or “low-code” approach would make NMES accessible to enthusiasts or students who are more interested in experimenting with evolution and AI than coding the algorithms from scratch. Additionally, a GUI could provide visualizations of neural network architectures as they evolve, or comparative views of multiple agents in different worlds.
2. In-Game Training and Education Mode: Building on the idea of a GUI, we aim to add an interactive training mode inside the Godot application itself. Instead of training happening in an external console, we could integrate a simplified version of the Go AI framework into the Godot C# code (or run Go code via Godot’s GDNative) so that one could click a button in the game to start training an agent. This would effectively turn NMES into an educational game where the “gameplay” is setting up AI agents and watching them learn. Features like step-by-step execution, pause-and-inspect (to view values such as Q-tables or neural activations), and tutorial prompts could be added. This would differentiate NMES from other tools by making the research process itself interactive and gamified.
3. Leaderboards and Cloud Service: We envision deploying NMES as an online service where users around the world can submit their AI agents to compete or benchmark. A dynamic leaderboard system could track records for various user-defined challenges (for example: fastest bipedal walking agent constructed from cubes of size <= N). Since NMES doesn’t have fixed tasks, these challenges would be community-defined, and the leaderboard could be segmented by task or category. A cloud-hosted NMES server cluster could run these agents in standardized conditions to evaluate them. This is somewhat analogous to the OpenAI Gym scoreboard ([1606.01540] 1 Introduction), but more flexible. We might incorporate a system where users can download others’ agent “genomes” (the design + controller) and test them in their own local NMES instance, fostering sharing of evolved designs. If there’s enough interest, this could evolve into a sort of “creature sports league”, highlighting NMES’s potential for generating engaging, game-like scenarios from research.
4. Expansion of Physics and Environment Features: Currently, NMES deals mainly with cubes and basic joints on flat terrain. Future work will introduce more complex environment features: terrains (possibly procedurally generated landscapes across the 27k planets, to test agents on varied topographies), obstacles and objects to manipulate (for evaluating evolved agents on tasks like moving objects, climbing, etc.), and possibly fluid or soft-body physics if Godot or its plugins allow it (to simulate swimming agents or soft robots). We also plan to integrate sensors beyond position: e.g., touch sensors (detecting contacts on a cube’s faces), proprioceptive sensors (joint-angle feedback, which Godot can provide from joint nodes), and visual sensors (a camera feed from an agent’s viewpoint). This will enable a wider range of tasks and more realistic scenarios. With sensors, one can attempt more bio-inspired evolutionary experiments, such as evolving an eye structure on an agent together with a neural network to process its vision. Godot’s rendering could be harnessed for this by attaching a Camera to a cube and capturing images.
5. Improved Physics Realism and Performance: While our current physics approach suffices for basic experiments, more realism may be needed for certain research. One idea is to allow switching physics engines (Godot is working on integrating other physics backends). Another is to use simplified physics for speed (as we do by freezing), while also offering a high-precision mode when needed for validation. We will continue optimizing performance – for example, by using multi-threaded physics when Godot 4 (or beyond) fully supports it, or by offloading some calculations to the GPU via compute shaders (for collision broad-phase, etc.). If NVIDIA’s open-source PhysX or another engine can be plugged in and partially GPU-accelerated, that could give us a hybrid of our approach and Isaac Gym’s speed without losing flexibility.
6. Multi-Agent Interactions and Social Evolution: We have so far mostly considered one agent shaping itself. A compelling extension is to simulate ecosystems of agents that evolve together. Future work will add support for multiple species of agents in the same world, with mechanisms for interaction (communication channels, competition for resources, etc.). This would allow experiments in multi-agent reinforcement learning and co-evolution in shared environments. For example, predator-prey evolutionary arms races can be studied: one species evolves bodies and brains to chase, another to flee or fight, each driving the other’s evolution. NMES is well-suited for such open-ended two-population co-evolution, and we plan to implement templates or scenarios for it.
7. Integration with Real-World Robotics: Another future direction is bridging the virtual and the real. Since NMES uses standard networking and is fully decoupled, one could replace the Godot simulation with a real robot or a hardware-in-the-loop setup. As a step in that direction, we are considering a mode where the JSON commands are sent not just to Godot but also to a microcontroller or robotics API, effectively controlling physical modular robots with the same AI code. Conversely, the AI framework could train in simulation and then control a real robot via the same interface (Sim2Real transfer). We might integrate with frameworks like ROS by writing a ROS node that translates NMES JSON messages into ROS topics, allowing a ROS-controlled robot to serve as an “environment” for the NMES AI. This would test the generality of our approach and could find use in evolutionary robotics hardware experiments.
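A minimal sketch of such a bridge is shown below, under the assumption of a hypothetical `set_torque` command schema: the translator decodes one NMES-style JSON message and clamps the commanded effort to a safe range before handing it to a stand-in hardware target. All field and type names here are illustrative, not the actual wire format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NMESCommand mirrors the kind of actuation message the AI framework
// sends over TCP; the exact field names are assumptions.
type NMESCommand struct {
	Type    string  `json:"type"`
	JointID string  `json:"joint_id"`
	Torque  float64 `json:"torque"`
}

// MotorTarget is a stand-in for whatever a hardware API or ROS topic
// would accept on the physical side of the bridge.
type MotorTarget struct {
	Motor  string
	Effort float64
}

// translate converts one NMES JSON command into a hardware-side target,
// clamping effort so a simulated torque cannot overdrive a real motor.
func translate(raw []byte, maxEffort float64) (MotorTarget, error) {
	var cmd NMESCommand
	if err := json.Unmarshal(raw, &cmd); err != nil {
		return MotorTarget{}, err
	}
	effort := cmd.Torque
	if effort > maxEffort {
		effort = maxEffort
	} else if effort < -maxEffort {
		effort = -maxEffort
	}
	return MotorTarget{Motor: cmd.JointID, Effort: effort}, nil
}

func main() {
	raw := []byte(`{"type":"set_torque","joint_id":"hip_0","torque":7.5}`)
	target, err := translate(raw, 5.0)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %.1f\n", target.Motor, target.Effort)
}
```

The clamping step is the kind of safety shim that distinguishes hardware bridging from pure simulation: the AI code stays unchanged, and only the translator knows about physical limits.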
8. Open Source and Community Contribution: We intend to release NMES as an open-source project (with a permissive license) to encourage community involvement. Future work includes writing comprehensive documentation and tutorials, so that other researchers or hobbyists can use and extend the system. We suspect that users might contribute new modules – e.g., someone might implement a new type of joint or a soft-body module, or integrate a specific evolutionary algorithm. By fostering a community, NMES could become a hub for those interested in trying radical ideas in AI (given it’s easier to do so in a flexible sandbox than in more constrained platforms).
9. Formal Evaluation and Research Studies: On the research side, we plan to use NMES to conduct studies that will serve as proofs of concept of its value. For instance, we will systematically compare an evolutionary strategy for morphology+control in NMES against fixed-morphology approaches on some task (like locomotion) to show whether co-design finds better solutions (prior work suggests it can, but NMES allows deeper investigation). We also want to evaluate the scalability: e.g., run a massive experiment with thousands of agents evolving in parallel and document the outcomes (do we get greater diversity or faster evolution?). These studies will both yield scientific insights and help refine NMES (by exposing any bottlenecks or needed features when used at scale).
10. Monetization for Sustainability: Although more relevant to deployment than research, we have considered monetization strategies to sustain the development of NMES if it gains a user base. One idea is offering cosmetic upgrades or skins for the visualized agents (e.g., different colors or models for cubes) as in-app purchases in a potential educational version or game-like version of NMES. Another is a cloud service model (users pay for server time to run large experiments without setting up their own cluster). While academic papers don’t usually discuss monetization, we note this to illustrate that NMES can be more than a research prototype; it could evolve into a platform with real users and possibly generate resources to fund further research (e.g., via subscriptions or crowdfunding features for certain experiments).
11. Collaboration and Interoperability: Finally, future work will explore interoperability with other frameworks. For example, wrapping NMES as a Gym environment (for a given task) so that Gym-based algorithms can use it directly, or allowing ML-Agents to interface with NMES’s Godot environment (both engines support C#, so one might create a Unity front-end that sends NMES JSON commands, though this is speculative). By not isolating NMES from the rest of the ecosystem, we ensure that researchers can adopt it gradually (for instance, using NMES for morphology evolution while still using familiar tools for the control part). Collaborative projects, such as benchmarking certain tasks with both NMES and other simulators, are in our plans to quantify NMES’s advantages or identify where it needs improvement.
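To make the wrapping idea concrete, the Go sketch below mirrors the Gym reset/step contract behind an interface. The `stubWorld` type stands in for a real NMES client, which would serialize each action to JSON, send it over TCP, and parse the resulting state; here it just accumulates a scalar so the control flow is visible. Everything except the Gym-style method names is a hypothetical placeholder:

```go
package main

import "fmt"

// Env mirrors the Gym reset/step contract so Gym-style algorithms can
// drive an NMES world through a familiar interface.
type Env interface {
	Reset() (obs []float64)
	Step(action []float64) (obs []float64, reward float64, done bool)
}

// stubWorld is a placeholder for a real NMES client backed by the
// JSON-over-TCP protocol; it tracks a single coordinate for a cube.
type stubWorld struct{ x float64 }

func (w *stubWorld) Reset() []float64 {
	w.x = 0 // a real client would send a world-reset command here
	return []float64{w.x}
}

func (w *stubWorld) Step(action []float64) ([]float64, float64, bool) {
	w.x += action[0] // pretend the action displaces the cube
	// Reward distance covered; end the episode past a threshold.
	return []float64{w.x}, w.x, w.x >= 1.0
}

func main() {
	var env Env = &stubWorld{}
	obs := env.Reset()
	obs, reward, done := env.Step([]float64{0.6})
	fmt.Println(obs[0], reward, done)
	obs, reward, done = env.Step([]float64{0.6})
	fmt.Println(obs[0], reward, done)
}
```

An adapter satisfying this interface could be registered with Gym-based training code on the Python side via a thin RPC shim, letting NMES handle morphology while existing libraries handle the control policy.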
In conclusion, the future of NMES is geared towards making it a comprehensive, user-friendly, and powerful platform for exploring AI in ways not previously possible. By improving usability, extending technical capabilities, and engaging with the community, we aim to catalyze a new wave of experiments at the intersection of learning, evolution, and complex agent design. We believe NMES can become an invaluable tool for both scientific discovery and educational inspiration, and the outlined future work will guide its evolution from a promising prototype to a mature ecosystem.
(The references above combine academic papers, online documentation, and relevant project materials to support the comparisons and statements made in this paper. All web references were last accessed March 27, 2025.)