Laser Learning Environment: A new environment for coordination-critical multi-agent tasks
We introduce the Laser Learning Environment (LLE), a collaborative multi-agent reinforcement learning environment in which coordination is central. In LLE, agents depend on each other to make progress (interdependence), must jointly take specific sequences of actions to succeed (perfect coordination), and accomplishing those joint actions does not yield any intermediate reward (zero-incentive dynamics). The challenge of such problems lies in the difficulty of escaping state space bottlenecks caused by interdependence steps, since escaping those bottlenecks is not rewarded. We test multiple state-of-the-art value-based MARL algorithms on LLE and show that they consistently fail at the collaborative task because of their inability to escape state space bottlenecks, even though they successfully achieve perfect coordination. We show that Q-learning extensions such as prioritised experience replay and n-step returns hinder exploration in environments with zero-incentive dynamics, and find that intrinsic curiosity with random network distillation is not sufficient to escape those bottlenecks. We demonstrate the need for novel methods to solve this problem and the relevance of LLE as a cooperative MARL benchmark.