Evolutionary Computing | Kenniy Olorunnimbe

... one general law, leading to the advancement of all organic beings, namely, multiply, vary, let the strongest live and the weakest die.

- C. Darwin, The Origin of Species, 1859; Wordswortth 1998 Edition, p. 186

image courtesy genetic-programming.com

Evolutionary computing (or computation) draws inspiration directly from evolutionary biology to develop search and optimization techniques for solving complex problems. They are usually rooted in Darwinian theory of evolution. From Wikipedia, the Darwinian theory states that:

…all species of organisms arise and develop through the natural selection of small, inherited variations that increase the individual’s ability to compete, survive, and reproduce.

This suggests that as organisms evolve, they tend to be more fit to their environment, through processes of reproduction, genetic variations and natural selection. This theory gives rise to the natural computing approach called evolutionary algorithms (EAs). There are different types of EAs, but the most popular ones are genetic algorithms (GA), genetic programming (GP), evolution strategies (ES) and evolutionary programming (EP). This post will be talking primary primary about GA, and GP, and briefly about ES and EP. Before diving into these algorithms, we need to understand what problem-solving as a search task is. We will briefly discuss some search techniques, such as hill climbing and simulated annealing. These are single individual based search algorithms, as opposed to the EAs which are population-based, with a concept of competition. Understanding these single search algorithms provides the foundation needed for a better understanding of the EAs.

Problem-solving as a search technique

From the evolutionary perspective, a problem can be seen as a collection of information from which knowledge is to be extracted. This could be as simple as determining the value of $x$ that minimizes the function $f(x) = x^3 + x + 3$ (a minimization problem), or to assign classes to a set of students (a timetable problem). For each of these, the problem of solving them corresponds to taking a sequence of steps that leads to either a desired global performance or an improved individual performance. The process of looking for this performance gain is called a search. Search algorithms take a problem as input and return a solution. The problem needs to be first formulated though, and that involves 3 main concepts:

Choice of representation i.e. encoding possible candidate solutions as a search space. This is primarily defined by the initial solution state and a set of other possible states. These are also called the feasible solutions.
Specification of the objective i.e. accurate description of what is to be achieved, in form of a mathematical statement. Then the maximization problem indicated above, the objective can be written as $min\ f(x)$ .
Definition of the evaluation function i.e. a function that returns a specific value that indicates the quality of a candidate solution. It can also serve as a means of comparing between multiple individuals for one or more candidate solutions. Solutions to the evaluation function by the individual candidates is a response surface which is termed the fitness landscape.

A definition of the minimization problem in the example given above will look like:

For search space $S$ , with feasible solutions $F,\ F \subseteq S$ , find $x^*$ such that $eval(x^*) \le eval(x), \forall x \in F$ .

The evaluation function (represented as $eval(x)$ ) with the lowest value for $x$ is considered the better one. It might also not be straightforward to get a global solution or global optimum to a search problem. In contrast to a global optimum, a local optimum will be a solution that is only optimal, in comparison to its neighbourhood, and there could be a much better solution in the overall search space, as shown in the illustration below.

Local and global optima

In a maximization problem, the global optimum (maximum) will be the peak value of the fitness landscape, as against the local optima (maxima). Two popular search techniques are hill-climbing and simulated annealing.

Hill Climbing

Hill climbing is a local search method that uses am iterative improvement strategy. At initialization, the strategy is applied to a single (randomly selected) point ( $x$ ) in the search space. At each iteration, a new point is selected by performing a small perturbation (displacement) of the current point, by selecting a new point $x'$ in the neighbourhood of $x$ , i.e. $x' \in N(x)$ . This can be by simply adding a small random number to the current value of $x$ i.e. $x'=x+ \Delta x$ . If the new point provides a better value of the evaluation function, then it becomes the current point, else another displacement is done. Termination is achieved when no further improvement is achievable, a fixed number of iteration is done or a goal is attained. The algorithm can be seen below:

A simple hill-climbing algorithm

procedure [x] = hill-climbing(max_it, g) initialize x eval(x) t <- 1 while t < max_it & x != g & no_improvement do x' <- displace(x) eval(x') if eval(x') is better than eval(x) then x <- x' end if t <- t + 1 end while end procedure

As might have been already observed, this simple hill climbing has some weakness:

It can easily terminate at a local optimum, based on the initialization point;
There is no information about the distance between the solution found and the global optimum;
It is quite impossible to provide an upper bound for the computational time.

To help with some of these problems, there is an iterated hill-climbing that starts from multiple initialization positions, and mentions a memory of best solution so far. There is also a stochastic hill-climbing procedure that associate a probability value to the likelihood of a new point $x'$ to be selected, providing a possibility of escaping from a local optimum.

Simulated Annealing

Simulated annealing is inspired by the annealing process in physics. Annealing is the process of allowing a heated material (glass or metal) to cool down slowly, in order to toughen it up, and remove internal stress, thereby reducing brittleness. The origin idea of simulated annealing (SA) algorithm is from statistical thermodynamics, which suggests that:

large systems at a given temperature approach the equilibrium state spontaneously, characterized by a mean value of energy, depending on the temperature. By simulating the transition to equilibrium and decreasing the temperature, it is possible to find smaller and smaller values of the system’s mean energy.

Given $x$ as the current system configuration, and $Ex, T$ are the energy and temperature of $x$ respectively, the probability of $x$ is given by the Bolzmann-Gibbs distribution, $P(x) = k e^{{-\frac{E}{T}}}$ , where k is a system constant. In a system with multiple possible states (i.e. a search space), the mean energy on the system is given by:

E_m = \frac{\sum_iE_i e^{{-\frac{E_i}{T}}}} {\sum_i e^{{-\frac{E_i}{T}}}}

The computation might take a long time for a large number of configurations, but we can randomly sample the configurations (i.e. using Monte Carlo method) to simulate equilibrium.

In SA algorithms, the evaluation function is used to represent the energy, and the fitness function is used to represent the temperature. We are in essence searching by reducing the value of $T$ . Given this details, the SA algorithm works as follow:

Assuming a current configuration of $x$ , it is given a small random displacement, resulting in $x' \in N(x)$ . The energy $E(x')$ of the new position is computed and the difference, $\Delta E = E(x') - E(x)$ helps to decide if to accept the new configuration or not.

If $\Delta E<=0$ , the displacement is accept, and $x'$ becomes $x'$ for the next iteration. If $\Delta E> 0$ , the probability of accepting $x'$ is given by $P(\Delta E) = e^{\frac{\Delta E}{T}}$ . The algorithm can be seen below.

A simple simulated annealing algorithm

procedure [x] = simulated_annealing(g) initialize T initialize x eval(x) t <- 1 while not_stopping_criterion do x' <- displace(x) eval(x') if eval(x') <= eval(x) then x <- x' else if random[0,1) < exp[(eval(x)-eval(x))/T] then x <- x' end if T <- g(T,t) t <- t + 1 end while end procedure

By repeating the steps in the algorithm, we are simulating the thermal motion of atoms in thermal contacts with a heat bath at temperature T. Using equation $P(\Delta E)$ enables the system to evolve into a Boltzmann distribution.

Evolutionary Algorithms

Evolution can be regarded as a search process that is capable of locating solutions problems offered by environmental problems. Iterative (search and optimization) problems that are inspired by nature, are generally referred to as evolutionary algorithms (EAs). They can be applied to a variety of domains with a requirement for problem-solving. Evolutionary computation (EC) is the name that describes the field of research dedicated to EAs. The main types of EC are genetic algorithms (GA), genetic programming (GP) evolution strategies (ES), and evolutionary programming (EP). A standard EA is expected to have the following characteristics:

A population of individuals that reproduce with inheritance. Each individual represents an encoded point in the search space. They can reproduce (sexually or asexually), generating offspring that inherits some of their features/traits, causing some resemblance between offspring and parent.
Genetics variation. Offspring are genetically different from parents through mutation. Mutation allows the appearance of new traits in offspring, hence potentially new regions in the search space to explore.
Natural Selection. There is a measure of fitness/quality to each individual, which dictates their competition survival and reproduction in the environment. Individuals with higher fitness have a selective advantage.

A standard EA is a generative, iterative and probabilistic algorithm, maintaining a population $P$ of $N$ individuals, $P = {x_1,...x_N}$ . At each iteration, $t$ , each individual in the population corresponds to a potential solution to a search problem that needs to be solved, and they are evaluated to give their fitness (a measure of adaptability). At iteration $t+1$ , a new population is generated by selecting some (typically the fittest) of the individuals from the population and reproducing them, sexually or asexually. Genetic variable through mutation can affect some of the offspring. The end of this process constitutes a generation.

An example of a simple EA is presented below. An initialization parameter is typically used to generate the initial population. Parameters $pc\ \&\ pm$ is also used, corresponding to the crossover (recombination) and mutation (variation) probabilities. The stopping criteria is typically a maximum number of generations, or once the specified objective function is achieved.

A standard evolutionary algorithm

procedure [P] = standard_EA(pc, pm) initialize P f <- eval(P) P <- select(P,f) t <- 1 while not_stopping_criterion do P <- reproduce(P,f, pc) P <- variate(P,pm) P <- eval(P) P <- select(P,f) t <- t + 1 end while end procedure

Most EAs can be represented using this standard algorithm, with the difference lying in the representation, selection, reproduction and variation operators, as well as in the order that they are applied.

Genetics Algorithms

Genetic algorithms (GA) borrow vocabulary directly from natural genetics. The data structure representing the individuals in the population is often called chromosomes. In standard GAs, the individuals are represented as binary digits, ${0, 1}$ (bitstrings). Each unit of a chromosome is called a gene, and their location in the chromosome is called a locus. Different forms a gene can assume ( $0\ \&\ 1$ when bitstrings) are called alleles.

Chromosome

Image above shows a bitstring of length 10, representing a standard chromosome in GA. The meaning of a particular chromosome (its phenotype) is defined according to the search problem - they present a potential solution to the problem. Each one of the chromosomes is assigned a probability of reproduction, $p_i, i,...,N$ , such that its likelihood of been selected is directly proportional to its fitness, in comparison with other chromosomes in the population. The higher the fitness, the higher the likelihood of selection. Selection is typically performed via an algorithm called fitness proportionate selection, (aka Roulette Wheel selection).

Crossover

Mutation

The selected chromosomes generate offspring using genetic operators, such as crossover or bit-mutation (shown in images above). In the crossover operator, $r$ is randomly selected from $\{1...l-1\}$ , where $l$ is the length of the chromosome. These operators are also presented in the previously shown standard EA algorithm, as reproduce, and variate. Although there is still a chance of the fittest chromosomes been eliminated, i.e. not making it to the genetic operation, they still have a much higher chance of selection, because they occupy the largest proportion of the “roulette wheel”.

Applications of genetic algorithms include pattern recognition and numeric function optimization.

Genetics Programming

Genetic programming (GP) is a type of evolutionary algorithm that was developed as an extension of genetic algorithms. The data structure used in GP are representations of computer programs, and the fitness function involves executing the programs. In essence, GP is a search based on the evolution of the possible spaces of a computer program, such that when run, it will produce a suitable output that is related to solving a specified problem.

Similar to other EA, in GP, a problem is defined by its representation, and the specification of an objective function. The representation involves choosing an appropriate function set, F, and an appropriate terminal set, T. The functions set are those that are believed to have been previously useful and sufficient, while the terminal set is usually represented as variables or constants.

The primary motivation of GP was getting computers to solve problems without explicitly programmed to. In the original sense, a computer program follows the functional paradigm, i.e. it is the application of a sequence of functions to arguments. The computer programs in the specified language become the individuals in the population that can be represented as a parse tree such as S-expressions as used in Lisp programming. The individuals are executed to obtain their phenotype. Two things have to be kept in mind during the execution:

Syntactic closure: we have to examine all the values returned by the functions and terminals, such that they can all accept as arguments the values and data typed by a function from F, or a terminal from T.
Sufficiency: Define a search space that corresponds to finding F and T. The search space has to be large enough, and the F and T should be sufficient from the problem domain.

Types of functions that could be used are arithmetic functions (e.g. +, *, -, /), logic functions (e.g. AND, OR, NAND) and programming functions. The terminals can be variables constants or functions that do not receive any parameter.

GP Crossover

S-expressions - parent 1: (+ (* 2 y)(/ x 4)) ; parent 2: (-(* 2 4) 6) offspring 1: (+ 6 (/ x 4)) offspring 2: (- (* 2 4) (* 2 y))

An example of a GP crossover is shown above (arrow at crossover points). The offspring programs are made up of sup trees from the parents and may have shapes that are quite different from the parents. Similar to GA, two S-expressions are required as input for the crossover operation. However, recombination cannot be random, as it needs to be a valid syntax of the language in use. This means that different crossover points can be chosen for each of the parents, as shown in the above example. In this example F = {+, -, *, /}, and T = {1, 2, 3, 4, 5, 6, x, y}.

Although mutation is also possible, it is seldom used. It could be quite useful for the purpose of diversity and dynamic variability of the programs. The image below shows a mutation operation in GP.

GP Mutation

S-expressions - original: (+ 6 (/ x 4)) mutated: (+ 6 (+ x 4))

A possible application of GP is in the design of classifiers.

Other EAs: Evolution Strategies and Evolutionary Programming

Evolution strategies (ES) were designed to solve problems in fluid mechanics, and later generalized for parameter optimization problems. The data structure employed in ES corresponds to a real-valued vector. In the general case, an individual is represented as $\mathbf{v} = (\mathbf{x}, \sigma, \theta)$ , where $\mathbf{x}$ represents the attribute symbol, and $\sigma$ and $\theta$ are the set strategy parameters.

Evolutionary programming was introduced as an alternative form of artificial intelligence. In an environment that’s considered to be a finite sequence of symbols from a finite set of alphabets, an intelligent behaviour was believed to:

prediction a certain environment;
have an appropriate response to the prediction.

Given a well-defined cost function, the evolutionary task therefore is that of evolving an algorithm that is capable of operating the sequence of symbols in the environment, in order to produce an output that maximizes the algorithm performance in relation to the next input.

We will not not be exploring the selection, crossover and mutation of these algorithms in this post.

When should we consider using evolutionary algorithms?

Given a problem, some key points have to be considered before we consider using EAs:

If the search space is large, not smooth nor unimodal, or the fitness function is noisy;
For smooth or unimodal search space, gradient or hill-climbing algorithms performs much better than EAs;
If the search space is well known (i.e. TSP), heuristics can be introduced to augment any of the EAs to enable it to perform better.

The applications of EA are divided into 5 broad categories:

Planning (i.e. routing, scheduling, packing)
Design (i.e. signal processing)
Simulation, identification and control (i.e. general plant control)
Classification (i.e. machine learning, pattern recognition)
Other applied fields such as arts, engineering and language.