Genetic algorithms: a simple R example

Genetic algorithm is a search heuristic. GAs can generate a vast number of possible model solutions and use these to evolve towards an approximation of the best solution of the model. Hereby it mimics evolution in nature.

GA generates a population, the individuals in this population (often called chromosomes) have a given state. Once the population is generated, the state of these individuals is evaluated and graded on their value. The best individuals are then taken and crossed-over – in order to hopefully generate 'better' offspring – to form the new population. In some cases the best individuals in the population are preserved in order to guarantee 'good individuals' in the new generation (this is called elitism).

The GA site by Marek Obitko has a great tutorial for people with no previous knowledge on the subject.

To explain the example I will use my version of the Knapsack problem.

You are going to spend a month in the wilderness. You’re taking a backpack with you, however, the maximum weight it can carry is 20 kilograms. You have a number of survival items available, each with its own number of “survival points”. You’re objective is to maximize the number of survival points.

The following table shows the items you can choose from.










item survivalpoints weight
pocketknife 10.00 1.00
beans 20.00 5.00
potatoes 15.00 10.00
onions 2.00 1.00
sleeping bag 30.00 7.00
rope 10.00 5.00
compass 30.00 1.00

In R I have used the package genalg to set-up the model. Later on, ggplot2 will be used to visualize the evolution of the model.

Let's define the dataset and weight constraint;

library(genalg)
library(ggplot2)

dataset <- data.frame(item = c("pocketknife", "beans", "potatoes", 
    "onions", "sleeping bag", "rope", "compass"), survivalpoints = c(10, 20, 
    15, 2, 30, 10, 30), weight = c(1, 5, 10, 1, 7, 5, 1))
weightlimit <- 20

Before creating the model we have to set-up an evaluation function. The evaluation function will evaluate the different individuals (chromosomes) of the population on the value of their gene configuration.

An individual can for example have the following gene configuration: 1001100.

Each number in this binary string represents whether or not to take an item with you. A value of 1 refers to putting the specific item in the knapsack while a 0 refers to leave the item at home. Given the example gene configuration we would take the following items;

chromosome = c(1, 0, 0, 1, 1, 0, 0)
dataset[chromosome == 1, ]
##           item survivalpoints weight
## 1  pocketknife             10      1
## 4       onions              2      1
## 5 sleeping bag             30      7

We can check to what amount of surivival points this configuration sums up.

cat(chromosome %*% dataset$survivalpoints)
## 42

Above we gave a value to the gene configuration of a given chromosome. This is exactly what the evaluation function does.

The genalg algorithm tries to optimize towards the minimum value. Therefore, the value is calculated as above and multiplied with -1. A configuration which leads to exceeding the weight constraint returns a value of 0 (a higher value can also be given).

We define the evaluation function as follows.

evalFunc <- function(x) {
    current_solution_survivalpoints <- x %*% dataset$survivalpoints
    current_solution_weight <- x %*% dataset$weight

    if (current_solution_weight > weightlimit) 
        return(0) else return(-current_solution_survivalpoints)
}

Next, we choose the number of iterations, design and run the model.

iter = 100
GAmodel <- rbga.bin(size = 7, popSize = 200, iters = iter, mutationChance = 0.01, 
    elitism = T, evalFunc = evalFunc)
cat(summary.rbga(GAmodel))
## GA Settings
##   Type                  = binary chromosome
##   Population size       = 200
##   Number of Generations = 100
##   Elitism               = TRUE
##   Mutation Chance       = 0.01
## 
## Search Domain
##   Var 1 = [,]
##   Var 0 = [,]
## 
## GA Results
##   Best Solution : 1 1 0 1 1 1 1 

The best solution is found to be 1101111. This leads us to take the following items with us on our trip into the wild.

solution = c(1, 1, 0, 1, 1, 1, 1)
dataset[solution == 1, ]
##           item survivalpoints weight
## 1  pocketknife             10      1
## 2        beans             20      5
## 4       onions              2      1
## 5 sleeping bag             30      7
## 6         rope             10      5
## 7      compass             30      1

This in turn gives us the total number of survival points.

# solution vs available
cat(paste(solution %*% dataset$survivalpoints, "/", sum(dataset$survivalpoints)))
## 102 / 117

Let's visualize how the model evolves.

animate_plot <- function(x) {
    for (i in seq(1, iter)) {
        temp <- data.frame(Generation = c(seq(1, i), seq(1, i)), Variable = c(rep("mean", 
            i), rep("best", i)), Survivalpoints = c(-GAmodel$mean[1:i], -GAmodel$best[1:i]))

        pl <- ggplot(temp, aes(x = Generation, y = Survivalpoints, group = Variable, 
            colour = Variable)) + geom_line() + scale_x_continuous(limits = c(0, 
            iter)) + scale_y_continuous(limits = c(0, 110)) + geom_hline(y = max(temp$Survivalpoints), 
            lty = 2) + annotate("text", x = 1, y = max(temp$Survivalpoints) + 
            2, hjust = 0, size = 3, color = "black", label = paste("Best solution:", 
            max(temp$Survivalpoints))) + scale_colour_brewer(palette = "Set1") + 
            opts(title = "Evolution Knapsack optimization model")

        print(pl)
    }
}

# in order to save the animation
library(animation)
saveMovie(animate_plot(), interval = 0.1, outdir = getwd())

Animated GA graph

The x-axis denotes the different generations. The blue line shows the mean solution of the entire population of that generation, while the red line shows the best solution of that generation. As you can see, it takes the model only a few generations to hit the best solution, after that it is just a matter of time until the mean of the population of subsequent generations evolves towards the best solution.

For more information on genetic algorithms, check out:

August 1, 2012Permalink 22 Comments

22 thoughts on “Genetic algorithms: a simple R example

  1. Great example. Have you considered trying to find the maximum survival points for the least weight (aka maximizing your bang for your buck)? Do you think the best approach be to edit the eval function or create another independent cost function?

    • Is your suggestion to change the objective to minimize the weight? The model could easily be changed to achieve this. However, instead of a weight limit a constraint should be set on the minimum number of survival points to achieve.

      The model as it is now will try to locate the items with the “most bang for the buck” and select those with the highest ranking until the weight limit is fulfilled. As you can see, it leaves the product with the least number of surival points per kg, potatoes, behind.

      When building more complex models it would definitely be a good idea to create a seperate cost function.

  2. Interesting article but I don’t quite get it. Isn’t the weight of the items in the solution 25 kg? The limit was set to 20 kg.

    • Thanks for the remark Peter. I played around with different weight limits while writing the post. This was a remnant of when I had set the limit to 25 kg. I’ve corrected the post.

  3. Try this:
    library(lpSolve)
    f.obj <- c(10, 20,15, 2, 30, 10, 30)
    f.con <- matrix(c(1, 5, 10, 1, 7, 5, 1),1,7,byrow=TRUE)
    f.dir <- c("<=")
    f.rhs <- c(20)
    s1

  4. s1 <- lp ("max", f.obj, f.con, f.dir,f.rhs,binary.vec=1:7)
    lp ("max", f.obj, f.con, f.dir,f.rhs,binary.vec=1:7)$solution

    • You’re absolutely right, this example could just as easily be solved using an IP model. However, the example is just for demonstration purposes. In fact, an IP model would probably be better suited for this particular problem as its solution will be a global optimum, whereby with GA it is possible to get stuck in a local optimum. I’ve written a small post on using the lpSolveApi in R (http://goo.gl/xj7mM).

  5. FYI, I want to add that a new R package GA is available on CRAN to deal with genetic algorithms.
    Using such a package we may solve the above knapsack problem as follows:

    library(GA)

    knapsack <- function(x)
    {
    f <- sum(x * dataset$survivalpoints)
    penalty <- sum(dataset$weight) * abs(sum(x*dataset$weight) – weightlimit)
    f – penalty
    }

    GA <- ga("binary", fitness = knapsack, nBits = nrow(dataset),
    maxiter = 1000, run = 200, popSize = 20)
    summary(GA)
    plot(GA)

    dataset[GA@solution == 1, ]
    sum(dataset$weight*GA@solution)

    A paper about the GA package should appear on JSS.

    Luca Scrucca

    • Thanks for the tip Luca. Its good to see a new actively maintained GA package. When I’ll write a next GA related post, I’ll make sure to incorporate it.

  6. Pingback: Genetic algorithms: a simple R example « Another Word For It

  7. In order to better understand the logic behind this sort script I’ve tried to write my own that select the best (or the group of the best) predictor variables as an input of a lm() model.

    As I think in this case the return value of evalFunc() should be r.square or sum of squeare error, and the input should be a list of the candidate varialbes; But I couldn’t figure out how to bring this together.

    I think this is a tipical sort of problem, and maybe exist another library (sricpt) for this, but
    I would like to leart more about r language, so if You can show me a solution for this that makes my r
    knowlege better.

    (ps. sorry for my english)

    • Hi Istvan, sorry for the late reply!
      Basically your evalFunc() should return a good value if the selected variables are ‘good’. So if you base being good on r-square or the sum of squares, your evalFunc() could basically just return one of these values times ‘-1′ (because the algorithm in the genalg package optimizes towards the minimum). It would be worth checking out the new GA package on CRAN (on which more recent development has occurred).
      If you can’t figure it out, you’re free to send me a code snippet and I’ll have a look at it.

  8. Thank you very much for your interesting post.

    I want to know what if the gene must be integer in such a way that chromosome becomes a vector of integers.
    I understand the cost function should do its work but the GA algoritm will fail as genalg only allows float (rbga) or binary (rbga.bin).

    Is there any trick to manage the issue, apart of try to manage the integer as binary and to define gene as 32*num_integers_need and managing to move in and out inside the cost function ?

    Thankx in advance

    JB

  9. Pingback: glass spigot

  10. Thank you for this easy-to-understand example of using the genalg package in R. My question is: can you give me your thoughts on the advantages and disadvantages of the various R genetic algorithm packages, e.g. genalg, GA, rgp, rgenoud, etc.? Or point me to someone else’s overview of the packages? Thank you.

    • Hi Ed, I don’t really have a well-founded answer to that as I am not familiar with all the packages. I guess it depends on where you want to use your application for. If it’s going to be used in a production environment: how active is the package maintained, and is this by a single person or a group. If you want to use in in a more experimental environment, how easy is it to extend the code and add new features? (I guess this is applicable to a lot of packages…) I would suggest submitting a detailed question to the respective authors of the packages.

  11. Great demonstration! Thx.

    I am currently using rbga function. However, I am wondering if the candidates have to be integers, rather than floating values, what should I do?

    Thanks in advance. Looking forward to seeing your reply:)