Mayalen Etcheverry, Bert Chan, Clément Moulin-Frier, Pierre-Yves Oudeyer
The idea of the Minecraft Open-Endedness Challenge is to try and think how would we artificially approach “open-endedness” in the Minecraft environment [Grbic et al., 2020], put in another words it asks the question: Can we build an artificial system that would be able to generate endless surprises if ran “forever” in Minecraft? While there is not a single path toward solving that grand challenge you’ve never heard of, this article presents what we believe to be some working ingredients for the endless generation of novel increasingly complex artifacts in Minecraft.
Our framework for an open-ended system includes two components: a complex system used to recursively grow and complexify artifacts over time, and a discovery algorithm that leverages the concept of meta-diversity search.
Since complex systems have shown to enable the emergence of considerable complexity from set of simple rules, we believe them to be great candidates to generate all sort of artifacts in Minecraft. Yet, the space of possible artifacts that can be generated by these systems is often unknown, challenging to characterize and explore. Therefore automating the long-term discovery of novel and increasingly complex artifacts in these systems is an exciting research field. To approach these challenges, we formulate the problem of meta-diversity search where an artificial “discovery assistant” incrementally learns a diverse set of representations to characterize behaviors and searches to discover diverse patterns within each of them. A successful discovery assistant should continuously seek for novel sources of diversities while being able to quickly specialize the search toward a new unknown type of diversity [Etcheverry et al. 2020].
To implement those ideas in the Minecraft environment, we simulate an artificial “chemistry” system based on Lenia continuous cellular automaton for generating artifacts, as well as an artificial “discovery assistant” (called Holmes) for the artifact-discovery process. Holmes incrementally learns a hierarchy of modular representations to characterize divergent sources of diversity and uses a goal-based intrinsically-motivated exploration as the diversity search strategy.
In this blogpost, we start with the problem formulation for building an open-ended algorithm capable of endlessly generating Minecraft creations. Then, we present what we decided to include as desirata properties in our open-ended framework and then proposes a possible implementation of those ideas in the Minecraft environment. Finally, we provide the database of discoveries made by our algorithm.
Minecraft game environnement $E$ is made of $1m^3$ building blocks that players can use to construct any structure they desire (also called artifacts) with almost endless possibilities to express their creativity. The blocks exist in a spatio-temporal world which is 3-dimensional in space and where time advances with each gametick. There is a finite number of possible blocks (M=255) which are described by their type $m$ (eg: water, wood, air, etc.) and optionally by a state $s$ which further describes the block appearance (eg: texture, orientation) or behavior (eg: age, on/off state, counters).
Let’s define the spaces of our problem:
Following that formalism, our open-ended framework is composed of two main components:
How to artificially construct the genome $\theta$, the genotype-to-phenotype mapping $G$ and to how to evolve sets of artifacts $\{(\theta,a)\} \in W$ complying with open-ended problems remains an open question to date. However, interdisciplinary research at the crossroad of complex system science, machine learning and biology seem to suggest a list of necessary “ingredients” that such a system should comply with to have the hope of exhibiting a high level of open-ended dynamics [Stanley et al., 2020]. The next section details what we include as desirata properties in our open-ended framework.
We formulate our artifact-generator $G$ as an artificial complex system which should comply with the following general properties:
Discovered the first oblique glider rake in @conwaylife (based on a partial discovered by @inspirehep23) by running ikpx2 for 35000 core-hours on the under-utilised CPUs of a @LeaderGPU machine.
— Adam P. Goucher (@apgox) May 10, 2021
This wouldn't have been possible without @ArminBiere's SAT solvers (cadical/kissat). pic.twitter.com/wiupEPrSF9
The idea of implementing computational systems with the above desirata properties, whose complexity could grow automatically akin to real complex systems is not new.
The simulation of artificial worlds that exhibit life-like behaviors goes back to von Neumann’s seminal work on self-reproducing cellular automata in the 40’s.
and has been one of the core thematic of the Artificial Life research discipline since then.
Interestingly, von Neumann’s “proof of principle” showed that very simple machines could constitute universal constructors with very large expressive power.
30 years later, John Conway implemented a very simple version of von Neumann’s automata, the well known Game of Life that, despite its apparent simplicity,
can generate a very wide range of life-like structures and dynamics.
Yet, even for very simple systems like Conway’s Game of Life where one fully knows the basic rules at the local level,
we are still far from fully grasping what structures can self-organize, how to represent and classify them, and how to predict their evolution.
As an illustration of those challenges, more than 50 years after the creation of Conway’s game of like and thanks to today supercomputer’s power,
the community is still discovering novel patterns with impressive complexity and whose existence remained unknown to date.
Therefore, a more challenging question is:
Can we automate the long-term discovery of increasingly complex and divergent structures in such systems?
In the same way that Conway’s Game of Life together with the community of people searching it might form an example of open-ended system, our open-ended framework couples the artifact-generator $G$ with an artifact-discovery algorithm $D$ able to continuously search for novel and divergent artifacts in $W$.
In the next section, we briefly present the 3 main families of methods that propose to automatically explore the input space $\Theta$ of complex systems, namely
“naive” strategies (independent of the outcomes of the system), optimization-driven strategies and diversity-driven strategies.
We review why these approaches (in their standard definition) are limited to comply with open-ended problems and rather tend to converge toward a sub-region of the possible solution space $W$.
Our artifact-discovery algorithm $D$ search objective is formulated as the meta-diversity objective, an extension of the standard diversity objective that we proposed recently in the context of automated discovery in complex systems [Etcheverry et al. 2020].
An “naive” strategy consists of covering at best the known search space $\Theta$, for instance with grid-search or uniform sampling strategy $D: \theta \sim U(\Theta)$. While easy to implement, this is often very inefficient in covering the space of possible phenotypes, especially for complex genotype-to-phenotype developmental mappings where big regions of the input space $\Theta$ are mapped to small regions in the outcome space $A$ (we talk of “attractor” states).
Pure objective-driven search is another widely-used discovery strategy $D$, where each discovery $a$ gets scored with a fitness function $F$ and where $D$ aims to find individuals $\{(\theta_i, a_i)\}$ with the highest fitness scores $F(a_i)$. Different objective-driven techniques have been implemented as discovery tool in complex systems such as evolutionary population-based search strategies [Nichele 2016], gradient-based search strategies (if the generator $G$ and the fitness $F$ are differentiable) [Mordvintsev et al. 2020], and reinforcement learning strategies (if the system’s dynamics $f$ integrates a differentiable policy which actions influence the system next state) [Pathak et al. 2019]. While these techniques can be very powerful optimizers toward desired artifacts [Sudhakaran et al. 2021], they tend to converge toward “peaks” regions on the artifact fitness landscape and stop here once reached, making them poor-fit for open-ended problems. Moreover, they generally suffer from deceptive rewards and are hardly generalizable to complex artifact spaces with limited predictability.
Another family of approaches proposes to maximize the novelty of the discovered phenotypes instead of the fitness, and leverages population-based evolutionary and developmental curiosity-driven exploration methods like Novelty Search or Intrinsically Motivated Goal Exploration Processes. Getting rid of the efficiency objective has shown to be effective at creating behavioral diversity in robotic systems [Forestier et al., 2017] and a promising discovery tool in complex systems [Reinke et al. 2020]. Moreover, diversity-search can be coupled with optimization-based strategies to generate a set of diverse high-performing phenotype solutions to a problem, forming what we call Quality-Diversity methods. Yet standard approaches assume that the intuitive notion of diversity can be captured within a single representation space, which is generally called $BC$ for behavioral characterization. This limits the scope of the final discoveries: in the same way that the fitness function $f$ is constraining the reached area in the observation space $A$, being diverse in a specific $BC$ does not mean being diverse in $A$. Actually, there is not such thing as a unique ground truth “interesting” diversity in complex phenotypic spaces, and operating in a monolithic BC space might lead to discoveries that are highly-diverse in that space but poorly-diverse according to other potentially-interesting behavioral criteria [Etcheverry et al. 2020].
To address these limits, we recently formulated the problem of meta-diversity search where an artificial “discovery assistant” incrementally learns divergent feature spaces to characterize the different niches of diversities (outer loop) and searches to discover diverse patterns within each of them (inner loop). The objective of this process is to enable continuous seeking of novel source of diversities while being able to quickly adapt the search toward a new unknown type of diversity. Similarly, quality-guidance can be integrated in the framework: with minimal external feedback, a successful discovery assistant should be able to efficiently specialize the exploration strategy toward a particular type of “interesting” diversity, corresponding to the initially unknown preferences of an end-user and expressed through simple sparse feedback.
We simulate an artificial “chemistry” system based on Lenia continuous cellular automaton [Chan 2019, Chan 2020] and an artificial “discovery assistant” (HOLMES) which integrates a goal-based intrinsically-motivated exploration (inner loop’s diversity search) with the incremental learning of a growing hierarchy of behavioral characterization spaces (outer loop’s divergent knowledge accumulator) [Etcheverry et al. 2020].
While abstracting away from natural chemistry, we follow this metaphor to illustrate our open-ended system and implementation choices for the challenge.
Cellular automata are good models to study how significantly different dynamics can arise from uniform simple laws at a smaller scale.
Lenia is a recently proposed cellular automaton that extends traditional CA with continuous state spaces and rules generalized to higher dimensions, multiple kernels and multiple channels.
Our variant of Lenia (LeniaChem) used for the Minecraft environment is implemented in 3D and combines continuous channels (one channel per “block” in minecraft) and discrete visible states. Any number kernels can be sampled and the kernel matrices are generated with CPPN networks [Stanley, 2007], such that the CA update rule is not limited in its expressive power.
The CA’s 3D grid $a^t$ represents the “petri-dish” where evolved patterns will develop in time.
The state space $\Sigma$ is represented with a multi-channel continuous array $A$ where each channel $A_j$ represent the “chemical potential” of the associated specie $j \in \{1..M\}$.
The hyper-parameters constraining the range of possible patterns are the grid-size $(S_X, S_Y, S_Z)$ (dimensions of the “petri-dish”) and
the number of channels $M$ (number of “chemical species” represented with blocks in Minecraft). The grid here is a torus, meaning that voxels on the top borders are neighbors to bottom voxels and same for left and right borders.
Following the “artificial chemistry” metaphor, we can imagine the parameters $\theta = (\psi, \phi)$ as representing the quantity and disposition of each starting material
as well as the operating conditions (temperature, pressure, pH, etc.) influencing the development of the final product.
As an example, Grizou et al. 2020 recently propose to explore the dynamics of an oil-droplet system with a “curious robot” which could manipulate the initial mixture composition of chemicals.
The resulting “artifact” here would be the yield of each compound in the final product, the rest being considered as empty “air” invisible blocks in Minecraft.
The “physics” formulas $f$ of our petri-dish follow Lenia rules [Chan 2020] and apply the following updates to the state at each time step:
Example of chemical reactions from the chemical bouillon youtube channel.
Example of artifact-growth in the LeniaChem system.
Back to the desirata properties of our artifact-generator, LeniaChem is a dynamical system that shows growth, local interactions and non-linearity. LeniaChem is not open in a strictly-speaking sense but we do not impose any sort of mass-conservation law. Similarly, we do not explicitly implement memory in the CA but any value can be stored in the continuous channel.
To solve the previously-introduced meta-diversity objective, several key challenges need to be addressed. A first challenge is to unsupervisedly learn a diverse set of representations to characterize behaviors and a second challenge is to find a diverse set of patterns in each of those BC spaces.
To address the first challenge, we use a dynamic neural-network architecture where a hierarchy of module embedding networks $\mathcal{R}=\{\mathcal{R}_i\}$ is progressively expanded by the discovering agent
to characterize the different discovered niches of pattern. The architecture has 4 main components:
(i) a base module embedding neural network
(ii) a saturation signal that triggers the instantiating of new nodes in the hierarchy
(iii) a boundary criteria that unsupervisedly clusters the incoming patterns into the different modules
(iv) a connection-scheme that allows feature-wise transfer from a parent module to its children
In HOLMES, the clustering of the different patterns and the use of learnable connections is central to explore divergent search spaces and to aggregate the knowledge between the distributed modules. While HOLMES architecture is very general, the engineer’s choice for the module/connections training strategy and for the clustering algorithm will impact how patterns are separated and distributed into different niches, which will in turn greatly influence their “selectivity” by the population-based IMGEP.
Here, we use the same implementation choices that in Etcheverry et al. 2020:
(i) we use VAEs as base modules $\mathcal{R}_i$: the VAEs are incrementally trained to encode their own niche of patterns into a latent characterization space $BC_i$
(ii) we say that the representational capacity of a module saturates when the reconstruction loss of its VAE reaches a plateau (with additional conditions to prevent premature splitting such as minimal node population)
(iii) each time a node gets saturated, we freeze its latent space and use K-Means clustering to fit a boundary and redirect the incoming patterns (the boundary is kept fixed for the rest of the exploration)
(iv) we instantiate learnable layers called “lateral connections” between the parent and child VAE feature-maps allowing the child VAE to reuse its parent knowledge while learning to characterize novel dissimilar features in its own BC space
A goal-based intrinsically-motivated exploration process (IMGEP) is used for the parameter sampling strategy. The IMGEP operates in the hierarchy of goal spaces ${BC_i}$ as defined by HOLMES embedding hierarchy ${\mathcal{R}_i}$.
The exploration process iterates through steps 1-to-4 as illustrated in the above figure:
(i) sample a target BC space and a target goal in it (the sampling strategy here is equivalent to Novelty Search)
(ii) sample a set of parameters for the next rollout to achieve that goal (here nearest neighbor + mutation)
(iii) let the system rollout and observe the outcome artifact
(iv) store the resulting (parameter, observation, encodings) triplets in an explicit memory (growing archive of all discoveries).
We show above the resulting discoveries made by IMGEP-HOLMES in the LeniaChem system. Experiments were conducted in 2D ($64^2$ and $32^2$) and in 3D ($16^3$). For 3D artifacts we only show the central slice (z=8) but you can see below a video of what it looks like when projected in Minecraft. We conducted growth experiments where patterns are initialized on a small portion of the petri dish (30%), as well a “degrowth” variant where the the petri-dish is filled at t0 by a CPPN-generated pattern. While growth/expansion is one of our desirata for open-ended systems, the CPPN-generated patterns are visually appealing and produce some sort of decaying art effect when coupled with the Lenia dynamical system in Minecraft, as shown in the second video.
In the context of such an “open-ended challenge” were the ambitions were also quite “open-ended”, we were more or less free to do anything we want.
It has been quite fun implementing our ideas in the Minecraft environment, but there are many other directions that we could explore for future work.
While we mainly used the Minecraft engine as a “renderer” in our current experiments, it will be interesting that our growth process occurs inside the world’s physics. Minecraft physics are quite poor (most blocks don’t even have gravity) but some blocks do have interesting properties and we could also imagine having external sources perturbing the growth process (eg: players throwing blocks inside the petri-dish).
Secondly, it will be interesting to integrate some sort of quality-guidance in the meta-diversity search, for instance by rewarding displacements of the structure’s center of mass to hope discovering flying machines in Minecraft [Grbic et al., 2020], but defining such score might be hard for more complex functionalities (how to reward building a CPU machine?) and might be very difficult to optimize due the sparse reward.
Another type of guidance that would be very interesting to integrate in the discovery process is human-guidance. We have shown in Etcheverry et al. 2020 that minimal feedback from an external user could help the meta-diversity search specialize toward “interesting” types of diversity. What constructions would come out from a community of players guiding the discoveries in distributed parts of the world with their subjective notion of what is “interesting”?
This work benefitted from access to the HPC resources of IDRIS under the allocation 2020-[A0091011996] made by GENCI, using the Jean Zay supercomputer.
The source code used for this challenge can be found at the following link: https://github.com/mayalenE/evocraftsearch/