I was able to introduce multi-core functionality using the 'pprocess' library (thanks for the pointer, Arun!). This effort required roughly four days of research and coding, as it was my very first go at parallel processing. In the end, it is relatively simple, though the difficulty depends upon the code into which it is introduced. I included a user-defined core quantity, and the ability to modify the number of cores engaged at runtime.
The key to multi-core functionality is getting your head wrapped around protected memory spaces. That is, each core engaged by pprocess (or any multi-core library) runs a fully functional copy (instance) of the section of your code which you spawn onto it.
Each of these instances can read from any global variable, no problem. However, they cannot write back to global variables: each instance holds its own private copy, so a write changes only that copy and is lost when the instance exits. (And if they did share memory, simultaneous writes to the same variable would be a race condition. Bad things would happen.)
It is imperative to keep in mind that what happens in each instance will not affect what happens in the other instances, on the other cores. Once spawned, they are all independent even though they work with the same variable names and are processed by copies of the same code.
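To see this isolation in action, here is a minimal sketch. It uses Python's standard multiprocessing library rather than pprocess (and assumes the Unix 'fork' start method), but the behaviour is the same: the child's write to the global never reaches the parent.

```python
import multiprocessing as mp

counter = 0  # a global in the parent process

def bump():
    global counter
    counter += 1  # changes only this child's private copy

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # 'fork' is Unix-only; keeps the demo simple
    child = mp.get_context("fork").Process(target=bump)
    child.start()
    child.join()
    print(counter)  # the parent's counter is still 0
```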
If the returned values are to affect something, that something must live outside of the multi-core environment and be updated post-collection. This may require some rearranging of your code.
For me, it meant that instead of incrementing fitness = fitness + 1 inside the pprocess pipeline, I returned a single fitness value from each instance on each core, and then summed them when I collected the results from pprocess.
Below I offer an example of the original for loop and the pprocess code which replaces it, for multi-core support. I have left the single-core for loop in place as a user-invoked bypass of pprocess for when the overhead of multi-core processing reduces performance (common on non-CPU-intensive runs or with a limited number of cores).
In my code, each GP tree must evaluate all rows in the given data. I therefore employ a for loop to iterate through each row of the previously loaded .csv file (not shown in this example). The Python method (function) fitness_eval() conducts the evaluations.
As fitness_eval itself calls another two Python methods (three methods in total), the key was to rewrite my code such that any global variables whose values are updated (written to) were made local instead, passed from method to method directly. In the end, this makes for better code, forcing me to rethink the way I had designed this and other sections.
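A hypothetical before/after of that refactor (the helper names are invented for illustration): the parallel-safe version receives everything as arguments and returns its result, writing to no global, so it is safe to spawn on any core.

```python
# Before (not parallel-safe): the helper writes to a shared global, e.g.
#   fitness = 0
#   def fitness_eval(row): global fitness; fitness += score_row(row)

# After (parallel-safe): values are passed in and returned directly.
def score_row(row_value):
    # hypothetical helper: computes a partial score for one row
    return row_value * 2

def fitness_eval(row_value):
    # calls its helper and returns the result; no global is written
    return score_row(row_value) + 1

print(fitness_eval(3))  # 7
```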
if cores == 1: # employ only one CPU core and bypass 'pprocess'
    for row in range(0, data_rows): # increment through all rows in the data
        fitness = fitness + fitness_eval(row) # evaluate tree fitness

else: # employ multiple CPU cores using 'pprocess'
    results = pp.Map(limit = cores)
    parallel_function = results.manage(pp.MakeParallel(fitness_eval))
    for row in range(0, data_rows): # increment through all rows in the data
        parallel_function(row) # evaluate tree fitness

    fitness = sum(results[:]) # 'pprocess' returns the fitness scores in a single dump
In this example, the pprocess method parallel_function() is a wrapper for my original method fitness_eval(): I pass the incrementing variable row through parallel_function(row), which in turn hands it off to multiple copies of fitness_eval(). Cool, huh?! :)
One benefit of pprocess, as compared to the Python multiprocessing library, is that pprocess can pass more than one variable through the called methods without your managing a pickle (the package used to serialize variables sent to and from methods across multi-core memory spaces). So, you can treat your methods just as you would on a single core, sending and returning variables using method(var_1, var_2, var_n) and a subsequent return var_1, var_2, var_n.
But keep in mind what I shared above: each of those variables can only return an isolated value per instance, as changes made on one core will not affect the instances on the other cores. Make sense?