courses

Theses

articles

Questions

Thesis Supervision

If you’re interested in joining MicrocosmAI to write your thesis, you are welcome to propose your own idea or choose from topics we can suggest. Your thesis topic must be relevant to the MicrocosmAI project, and you should already have substantial experience in the subject area as well as proficiency in coding with Python. Our focus areas include Reinforcement Learning, Embodiment, Multi-Agent Systems, Language Emergence, and Physics Engines, with an emphasis on implementation and experimentation.

Before reaching out, please review our FAQ to understand both what you can expect from the supervision and what will be expected from you. Kindly note that our capacity for thesis supervision is limited, so we are unable to accept everyone. If you are interested, please contact us with a short motivational text introducing yourself, your experience, and what you are looking for in the collaboration.

3D-Aware Vision-Language-Action Models: Evaluating Spatial Reasoning and Language-Guided Manipulation with Depth Perception

– Elisa Hagensieker, in progress

Integrating 3D data into vision-language-action (VLA) and diffusion models presents an opportunity to enhance robotic manipulation, especially in unstructured and complex environments. While existing models such as Pi-Zero, RDT-1B, RoboAgent and ACT have shown strong capabilities in grounding language and vision for manipulation, they focus on 2D representations. This thesis explores how the integration of 3D spatial awareness into VLA and diffusion models can enhance scene understanding, spatial reasoning and object interaction in robotic manipulation tasks. Key aspects include:

(1) Language-grounded spatial manipulation – Can 3D data improve a robot’s ability to follow natural language instructions that involve spatial relationships (e.g., “stack the red block on top of the blue one”)?

Hypothesis: More precise grasping reduces failures in specific operations like stacking.

(2) Generalization in real-world environments – Does integrating 3D representation allow robots to generalize manipulation strategies across unseen environments and object configurations?

Hypothesis: Enhanced scene adaptability across dynamic and unstructured environments.

To validate these hypotheses, benchmarking experiments will be conducted across a wide set of real-world robotic applications on a single-arm robotic system. The performance is evaluated in terms of task success rate and adaptability to new scenes. The findings will offer insights into the impact of 3D spatial awareness on robotic performance, determining whether integrating point clouds can support more precise, adaptable and intelligent robotic systems.

Depth-Fused World Model: Adapting Dreamer’s Sensory Input for LIDAR-Enhanced Navigation in MuJoCo

– Julian Blanke, in progress

Recent advances in Large Language Models (LLMs) have demonstrated strong language understanding and the ability to generate coherent chains of thought, enabling them to plan for complex tasks by learning the concept of language itself. While progress in Reinforcement Learning (RL) has been notable, robotic control and navigation tasks remain challenging because they require precise modeling of environmental dynamics. One way to address this challenge mirrors how LLMs learn language: capturing the underlying concept, in this case the concept of the world.

World models, which learn a latent representation of the environment’s structure and underlying physics, offer a promising direction for capturing the concept of the world. The Dreamer-V1 framework illustrates this approach by constructing a latent model from image data and incorporating an “imaginary” phase that allows the agent to predict near-future states based on its actions, guided by the concept of the world it learned beforehand. Although this method effectively captures the essential dynamics in simulated environments, its reliance on image inputs can limit its applicability to scenarios where other types of sensory data are more appropriate.

To bridge this gap, this thesis investigates adapting the Dreamer-V1 algorithm to use LIDAR-based sensory data instead of images. In a simulated environment powered by the MuJoCo physics engine, a LIDAR system provides detailed depth and spatial information for navigation tasks. This approach aims to assess how substituting image data with LIDAR inputs affects the performance of the world model and its ability to support autonomous navigation. By evaluating the modified framework on navigation tasks, this work contributes to a clearer understanding of how LIDAR-enhanced world models can capture complex environmental dynamics and potentially improve robotic control in diverse settings.
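To illustrate what a LIDAR observation looks like compared with an image, the sketch below casts a fan of rays in a 2D plane and returns one normalized range per ray. It is only a toy model: the function name `lidar_scan`, the circular obstacles, and all parameters are assumptions made for illustration, and in the thesis the readings come from the simulator rather than from hand-written geometry.

```python
import math

def lidar_scan(origin, obstacles, n_rays=8, max_range=10.0):
    """Return n_rays normalized ranges in [0, 1] (1.0 means no hit).

    origin: (x, y) of the sensor, assumed to lie outside every obstacle.
    obstacles: iterable of (cx, cy, radius) circles (hypothetical scene).
    """
    ranges = []
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays
        dx, dy = math.cos(angle), math.sin(angle)
        nearest = max_range
        for cx, cy, radius in obstacles:
            # Solve |origin + t * d - center|^2 = radius^2 for the nearest t.
            mx, my = origin[0] - cx, origin[1] - cy
            b = mx * dx + my * dy
            c = mx * mx + my * my - radius * radius
            disc = b * b - c
            if disc < 0.0:
                continue  # the ray misses this circle
            t = -b - math.sqrt(disc)
            if 0.0 <= t < nearest:
                nearest = t
        ranges.append(nearest / max_range)
    return ranges

# One circular obstacle of radius 1 centered 5 units along the +x axis.
scan = lidar_scan(origin=(0.0, 0.0), obstacles=[(5.0, 0.0, 1.0)])
print(scan)
```

The result is a short fixed-length vector instead of a pixel grid, which is why the encoder of an image-based world model has to be adapted when the input is swapped.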

Analyzing Emergent Language in Open-Ended Situated Environments

– Cornelius Wolff, 2025

The emergence of language has long been a subject of inquiry in linguistics and cognitive science. Recent advancements in artificial intelligence and reinforcement learning have enabled researchers to explore language emergence through computational models. This thesis investigates how language emerges in multi-agent reinforcement learning (MARL) environments, focusing on open-ended, situated settings where communication is not always necessary for task completion. Unlike traditional reference games, which assume constant communication, our study introduces two novel situated reinforcement learning environments—Multi-Agent Pong and Collectors—where agents must learn when and how to communicate effectively.

We examine the role of environment complexity, reinforcement learning algorithms, and interpretability techniques in shaping emergent language. Agents are trained using Proximal Policy Optimization (PPO) and REINFORCE, allowing us to analyze the impact of different optimization strategies. To enhance our understanding of emergent communication, we apply explainable AI (XAI) techniques, including saliency maps, gradient-based methods (Vanilla Gradients, Integrated Gradients, SmoothGrad), perturbation analysis, and diagnostic classifiers. These techniques enable us to track the evolution of communication protocols and assess their functional relevance.

Our findings demonstrate that emergent language in open-ended environments exhibits properties that do not emerge in classical reference games. Notably, agents develop sparse language use and context-dependent communication. Furthermore, we show that interpretability methods provide valuable insights into the decision-making processes of agents, revealing how learned communication strategies correlate with task performance.

This work contributes to the intersection of language emergence, multi-agent reinforcement learning, and explainable AI, offering novel insights into how artificial agents develop communication in complex, dynamic settings. The results suggest that increasing environment complexity and leveraging interpretability tools are crucial for advancing research in emergent communication and artificial language systems.

Scientific Evaluation Of Physics Engines For Reinforcement Learning

– Sönke Lülf, 2024

In Reinforcement Learning (RL) applications within virtual environments, the primary computational bottleneck is often not the agent’s policy updates but rather the simulation of the environment itself. As such, selecting an appropriate physics engine and configuring it with optimal parameters is critical for enabling efficient training. While previous studies have benchmarked various physics engines, they have largely overlooked the impact of parallelization. This study addresses that gap by evaluating the parallelization capabilities of several physics engines commonly used in RL. All tested engines supported parallel physics simulation, which substantially reduced simulation time. Although reliable training could not be achieved using Unreal Engine, parallelizing simulation with 12 agents—compared to a single agent—yielded a 3.0× to 3.4× speedup in MuJoCo and a 3.5× to 4.4× speedup in Unity across different scenarios. Consistent with prior work, our results also confirm that MuJoCo offers significantly faster simulation performance than Unity when comparing across engines, independent of parallelization.
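The reported speedups can be put in perspective by dividing them by the number of parallel agents, which gives the fraction of ideal linear scaling achieved. The short calculation below uses only the figures quoted above; the function name is illustrative.

```python
def parallel_efficiency(speedup: float, n_parallel: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 would be perfect)."""
    if n_parallel <= 0:
        raise ValueError("n_parallel must be positive")
    return speedup / n_parallel

# Speedup ranges quoted in the abstract for 12 parallel agents vs. one agent.
reported = {"MuJoCo": (3.0, 3.4), "Unity": (3.5, 4.4)}

for engine, (low, high) in reported.items():
    eff_low = parallel_efficiency(low, 12)
    eff_high = parallel_efficiency(high, 12)
    print(f"{engine}: {eff_low:.0%} to {eff_high:.0%} of ideal linear scaling")
```

Both engines stay well below linear scaling at 12 agents, so the gains from parallelization are substantial but far from proportional to the number of agents.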

Assessing PAIRED, a Multi-Agent Reinforcement Learning Approach for Adversarial Environment Generation, in Frozen Lake

– Jens Huth, 2024

The effectiveness of reinforcement learning (RL) agents significantly hinges on the quality and diversity of their training environments. This thesis explores the Protagonist Antagonist Induced Regret Environment Design (PAIRED), a novel multi-agent RL approach focused on adversarial environment generation, evaluated within the Frozen Lake environment. By integrating insights from domain randomization and minimax adversarial strategies, PAIRED utilizes decision-theoretic principles to dynamically create structured and solvable environments. The study investigates whether PAIRED enhances agent adaptability and performance, particularly in sparse reward settings, compared to conventional methods. Findings indicate potential advantages in using PAIRED, as agents demonstrated increased complexity in learned behaviors and improved generalization to novel environments. However, challenges such as computational resource demands and the inherent difficulties posed by the Frozen Lake environment highlight areas for further research.
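The regret signal at the core of PAIRED can be sketched in a few lines. The version below follows the estimate commonly given for PAIRED (best antagonist return minus mean protagonist return on the same generated environment); the function name and the episode returns are made up for illustration and are not taken from the thesis.

```python
from statistics import mean

def estimate_regret(protagonist_returns, antagonist_returns):
    """Regret estimate for one generated environment:
    best antagonist return minus mean protagonist return."""
    return max(antagonist_returns) - mean(protagonist_returns)

# Hypothetical episode returns on one generated Frozen Lake layout
# (1.0 = reached the goal, 0.0 = fell into a hole or timed out).
protagonist_returns = [0.0, 0.0, 1.0, 0.0]
antagonist_returns = [1.0, 0.0, 1.0, 1.0]
regret = estimate_regret(protagonist_returns, antagonist_returns)

# The environment-generating adversary and the antagonist are rewarded
# with the regret; the protagonist is trained to reduce it.
rewards = {"adversary": regret, "antagonist": regret, "protagonist": -regret}
print(regret, rewards)

# An unsolvable layout gives the adversary nothing: nobody scores.
print(estimate_regret([0.0, 0.0], [0.0, 0.0]))
```

Because an unsolvable layout yields zero regret, the adversary has an incentive to generate environments that are solvable yet still hard for the protagonist.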

On Aligning Population Based Emergent Communication via Dynamic Connectivity

– Leon Schmid, 2024

Utilizing advances in Deep Learning, Emergent Communication studies the emergence of communication protocols in cooperating artificial agents. Population-based Emergent Communication has recently shown promising results, especially towards more human-like language features and alignment of emergent with natural language protocols. Scaling Emergent Communication experiments to large populations, however, faces a challenging (Dec-POMDP) optimization problem whose computational complexity grows, in the upper bound, exponentially with the population size. We propose to further improve population-based approaches in Emergent Communication by introducing the novel concept of continuously shaping the underlying population connectivity to favour the emergence of language conventions that are compatible with desired training efficiency and language effects. We denote this approach Compatible Conventions. This work provides two implementations of Compatible Conventions based on Teacher-Student Curriculum Learning and Commentary Learning, which we evaluate on a large-scale Emergent Communication task. We analyze learning efficiency, as well as language effects on semantic and syntactic drift. Our results show that our proposed algorithms fail to outperform the baseline in terms of learning efficiency, and show limited effects on language drift.

Hindsight Language Learning: Enhancing Multi-Agent Emergent Communication in Sparse Reward Environments

– Manar Ali, 2024

This thesis presents Hindsight Language Learning (HLL), a novel approach inspired by Hindsight Experience Replay (HER) to enhance multi-agent communication in sparse reward environments. The primary focus is on the Lewis reconstruction game, an environment that necessitates effective communication among agents to achieve a common goal. In HLL, various hindsight ratios and mechanisms are utilized to leverage unsuccessful communication attempts by relabeling them as successful under alternative conditions. This process enables agents to learn from both successes and failures, akin to human language acquisition. The experimental results confirm that HLL outperforms baseline methods across various settings. Lower hindsight ratios, which determine the proportion of unsuccessful communications treated as successful, were found to be particularly effective, enhancing accuracy and communication efficacy more than higher ratios and baseline methods. Specifically, Combined mechanisms that integrate relabeling for both the sender and receiver, as well as Turn-Taking strategies, have demonstrated the highest performance metrics. The Receiver Hindsight strategy was also notably effective in enhancing communication efficacy. While generalization was limited, likely due to the restricted input size, future research could explore larger input sizes to improve this aspect. This study provides valuable insights into optimizing multi-agent communication and lays a strong foundation for further exploration.
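The relabeling idea can be sketched in the style of Hindsight Experience Replay: a failed episode is rewritten, with probability equal to the hindsight ratio, as if the receiver's output had been the intended target. The `Episode` structure, the function name, and this particular relabeling rule are assumptions for illustration; the thesis's sender, receiver, combined, and turn-taking mechanisms may differ in detail.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Episode:
    target: tuple          # input shown to the sender
    message: tuple         # symbols the sender emitted
    reconstruction: tuple  # what the receiver produced

    @property
    def success(self) -> bool:
        return self.target == self.reconstruction

def relabel_with_hindsight(episodes, hindsight_ratio, rng):
    """Relabel failed episodes with probability hindsight_ratio by
    treating the receiver's reconstruction as the intended target."""
    relabeled = []
    for ep in episodes:
        if not ep.success and rng.random() < hindsight_ratio:
            ep = replace(ep, target=ep.reconstruction)
        relabeled.append(ep)
    return relabeled

# Hypothetical batch: one successful and two failed episodes.
batch = [
    Episode(target=(1, 0), message=(3,), reconstruction=(1, 0)),
    Episode(target=(0, 1), message=(2,), reconstruction=(1, 1)),
    Episode(target=(1, 1), message=(5,), reconstruction=(0, 0)),
]
rng = random.Random(0)
for ratio in (0.0, 0.25, 1.0):
    out = relabel_with_hindsight(batch, ratio, rng)
    print(ratio, sum(ep.success for ep in out), "successful episodes")
```

With a ratio of 0.0 the batch is unchanged and with 1.0 every failure becomes a training success; intermediate ratios mix genuine and relabeled episodes, which is the quantity the abstract reports as working best at lower values.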

Natural Language Instruction-Following in a Simulated 3D World using MicrocosmAI

– Kamran Vatankhah-Barazandeh, 2024

This thesis aimed to systematically search for a Natural Language Instruction Following experiment set in a 3D environment, and to implement it using the MicrocosmAI framework, demonstrating the framework’s use for this kind of experimentation. A relevant paper was identified by reviewing recent literature on the topic, and the described experiment was adapted to the framework. Development was split into three cycles, each followed by an evaluation that informed the next. While the environment and agent were implemented successfully, we could not achieve training results similar to those of the reference experiment. This may be remedied by future research with fewer constraints on time and computing power. Despite the experimental result, the objectives of gaining an overview of current research and demonstrating the framework’s functionality were achieved.

Race for Cooperation – Effects on Communication in Simulations with Situated Neural Network Agents

– Niklas Heidemann, 2023

This thesis investigates semi-competition in Language Emergence (LE), a discipline focused on understanding how Deep Reinforcement Learning (DRL) agents autonomously learn communication protocols in cooperative and competitive settings. Previous work in LE has predominantly focused on cooperative tasks, although many real-world settings are more complex. This work introduces the concept of Race for Cooperation (R4C) to harness competition as a catalyst rather than a hindrance. Given a task containing two agents learning to cooperate, R4C employs three agents instead. However, only two agents can cooperate, which means that in the case of R4C, one agent will not receive a reward. This creates additional pressure that is anticipated to foster LE. For this purpose, a novel environment has been developed. It allows for situated agents with symmetric communication across multiple turns and is extremely flexible for future research. However, neither SAC nor PPO, the state-of-the-art DRL algorithms implemented, was able to learn language protocols, even after extensive optimization and experimentation. As an alternative, the environment was adapted to simulate a referential game, wherein the simpler concept of one-sided R4C (oR4C) was tested. Although LE occurred in the modified environment, it did so too infrequently to draw conclusive results from experiments testing oR4C and R4C. Noteworthy minor findings include that agents trained with R4C exhibited higher Social Influence, and that an existing technique to encourage LE had detrimental effects. In conclusion, this study posits that vanilla implementations of SAC and PPO may not be well suited for developing LE in embodied environments such as the one developed in this thesis. Although a definitive assessment of R4C is still pending, it remains an intriguing question for future research.
