Configuring PushGP Runs¶
The primary abstraction for starting PushGP runs with PyshGP is to instantiate a
PushEstimator and call its fit() method with a dataset of training cases. The
estimator can be configured to use different search algorithms, selection methods,
variation operators, and other hyperparameters. This guide demonstrates a variety
of ways a PushEstimator can be configured to change the way programs are synthesized
in PyshGP.
At a minimum, a GeneSpawner must be provided when creating a PushEstimator. The
spawner is used to generate random genomes during the initialization of an evolutionary
population and random genes during mutation operations. The genes produced by
the spawner are sampled from a set of inputs, literals, ephemeral random constant
generators, and the InstructionSet.
If only a GeneSpawner is provided, the hyperparameters of the PushEstimator
will be the defaults listed in the API. See pyshgp.gp package.
import random

from pyshgp.gp.estimators import PushEstimator
from pyshgp.gp.genome import GeneSpawner

# The spawner samples random genes from one input, the core instruction set,
# and an ephemeral random constant generator that produces small integers.
spawner = GeneSpawner(
    n_inputs=1,
    instruction_set="core",
    literals=[],
    erc_generators=[lambda: random.randint(0, 10)]
)

est = PushEstimator(spawner=spawner)
est.fit(X, y)
The PushEstimator can be further configured with the top-level hyperparameters
that apply directly to the estimator. Examples include population_size,
max_generations, and initial_genome_size. More information about these
hyperparameters can be found in the API. See pyshgp.gp package.
from pyshgp.gp.estimators import PushEstimator

est = PushEstimator(
    spawner=spawner,
    population_size=1000,
    max_generations=300,
    initial_genome_size=(40, 200),
    simplification_steps=1000
)
est.fit(X, y)
Evolutionary Components¶
PyshGP aims to be as extensible as possible. It is expected that users will want
to implement their own components (selection methods, variation operators, etc.) and
use them in coordination with the abstractions provided by PyshGP. To accomplish this,
the PushEstimator accepts instances of various abstract base classes. Users can
choose to use instances of the concrete sub-classes provided by PyshGP, or implement their own.
from typing import Sequence

from pyshgp.gp.estimators import PushEstimator
from pyshgp.gp.genome import Genome, GeneSpawner
from pyshgp.gp.selection import Lexicase
from pyshgp.gp.variation import VariationOperator


class ReverseMutation(VariationOperator):
    """A mutation that reverses the parent genome."""

    def __init__(self):
        super().__init__(1)

    def produce(self, parents: Sequence[Genome], spawner: GeneSpawner) -> Genome:
        return Genome.create(parents[0][::-1])


est = PushEstimator(
    spawner=spawner,
    selector=Lexicase(epsilon=True),  # This selector has its own configuration.
    variation_strategy=ReverseMutation(),
    population_size=300
)
This design is in direct conflict with the scikit-learn philosophy of designing estimators,
where hyperparameters are simple values and all of the configuration exists in the estimator.
In order to bring the PushEstimator back towards a simpler (and narrower) API, most of
the evolutionary components can be set with a string that corresponds to the name of a
reasonable “preset” value. For example, selector="lexicase" is the same as selector=Lexicase().
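For example, the sketch below (reusing the spawner defined above) configures an estimator entirely through preset names. The "umad" name is assumed to be a valid value for variation_strategy, matching the named operators listed under Variation Strategy below.
from pyshgp.gp.estimators import PushEstimator

# Preset names are resolved by the estimator into default-configured components.
est = PushEstimator(
    spawner=spawner,
    selector="lexicase",       # same as selector=Lexicase()
    variation_strategy="umad"  # assumed preset name; see the Variation Strategy section
)
est.fit(X, y)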
The following sections describe common ways of configuring the different components of evolution.
Parent Selection¶
Parent selection is controlled by an instance of a Selector type, which is used to
select one or more individuals from the evolutionary population. Different selectors apply
different “selection pressure”, which guides evolution differently.
The preset selectors that can be referenced by name are:
"roulette"
: Fitness proportionate selection, also known as roulette wheel selection."tournament"
: Tournament selection. Default tournament size is 7."lexicase"
: Lexicase selection. Defaultepsilon=False
."epsilon-lexicase"
: Epsilon lexicase selection."elite"
: Selects the bestn
individuals by total error.
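For example, a selector can be supplied either by preset name or as a configured instance. The following is a sketch that reuses the spawner defined earlier:
from pyshgp.gp.estimators import PushEstimator
from pyshgp.gp.selection import Lexicase

# By preset name: "tournament" resolves to tournament selection with the default size of 7.
est = PushEstimator(spawner=spawner, selector="tournament")

# By instance: pass a configured Selector object when non-default settings are needed.
est = PushEstimator(spawner=spawner, selector=Lexicase(epsilon=True))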
Variation Strategy¶
A variation operator is a transformation from parent genomes to a child genome. A variation pipeline is a variation operator composed of other variation operators that are applied in sequence. A variation strategy is a variation operator composed of other variation operators that are each associated with a probability of being applied.
The preset variation operators that can be referenced by name are:
"deletion"
: Deletes random genes."addition"
: Adds random genes at random points."alternation"
: Pulls genes from a parent and randomly switches which parent it is pulling from."genesis"
: Creates entirely new random genomes."cloning"
: Returns the parent’s genome unchanged."umad"
: Uniform mutation by addition and deletion."umad-shrink"
: Variant of UMAD that biases towards more deletion than addition."umad-grow"
: Variant of UMAD that biases towards more addition than deletion.
For a reference on UMAD, see “Program synthesis using uniform mutation by addition and deletion” (Helmuth, McPhee, and Spector, GECCO 2018).
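To illustrate the pipeline concept described above, the sketch below assembles a UMAD-like pipeline by hand. The VariationPipeline, AdditionMutation, and DeletionMutation names, their rate parameters, and the rates themselves are assumptions made for illustration; consult the pyshgp.gp.variation API for the exact classes and signatures.
from pyshgp.gp.estimators import PushEstimator
from pyshgp.gp.variation import VariationPipeline, AdditionMutation, DeletionMutation

# Assumed class and parameter names. A pipeline applies its operators in order:
# the child genome first gains random genes, then has genes randomly deleted.
umad_like = VariationPipeline([
    AdditionMutation(addition_rate=0.09),
    DeletionMutation(deletion_rate=0.0826)
])

est = PushEstimator(spawner=spawner, variation_strategy=umad_like)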
When configuring a PushEstimator, you can specify a variation strategy containing multiple
possible operators to apply with some probability. For example, the following configuration will
use Alternation 70% of the time and Genesis the other 30% of the time.
from pyshgp.gp.estimators import PushEstimator
from pyshgp.gp.variation import VariationStrategy, Alternation, Genesis

est = PushEstimator(
    spawner=spawner,
    variation_strategy=(
        VariationStrategy()
        .add(Alternation(alternation_rate=0.01, alignment_deviation=10), 0.7)
        .add(Genesis(size=(20, 100)), 0.3)
    )
)
Search Algorithms¶
Documentation TBD.