Wednesday, July 28, 2010

Creeping elegance, or shameless hacking?

In my previous post, I did not feel great about handling processes in the following code for selection, but I did not see a good way around it.


from heapq import heapreplace

def selected(population, popSize, nSelect, best = None):
    if best == None:
        best = nSelect * [(None, None)]
    threshold = best[0][0]
    nSent = 0

    for process in processes:
        if nSent == popSize: break
        process.submit(population[nSent], threshold)
        nSent += 1

    for process, x, score in scored(popSize):
        if score > threshold:
            heapreplace(best, (score, x))
            threshold = best[0][0]
        if nSent < popSize:
            process.submit(population[nSent], threshold)
            nSent += 1

    return best


What I really want is for selected to know nothing about parallel processing, and for the generator of scored individuals to know nothing about selection. The problem is that threshold changes dynamically, and the generator needs a reference to it. As best I can tell, there are no scalar references in Python. Having taught LISP a gazillion times, I should have realized immediately that I could exploit the lexical scoping of Python, and pass to the generator a threshold-returning function defined within the scope of selected.


def selected(self, population, nSelect, best = None):
    if best is None:
        best = nSelect * [(None, None)]

    threshold = lambda: best[0][0]

    for x, score in evaluator.eval(population, threshold):
        if score > best[0][0]:
            heapreplace(best, (score, x))

    return best


Perhaps I should not be blogging about my first Python program. Then again, I’m not the worst programmer on the planet, and some folks may learn from my discussion of code improvement. This go around, I need to show you the generator.


def eval(self, population, threshold):
    popSize = len(population)
    nSent = 0

    for process in self.processes:
        if nSent == popSize: break
        process.submit(population[nSent], threshold())
        nSent += 1

    for unused in population:
        process = self.processes[self.isReady.recv()]
        yield process.result()
        if nSent < popSize:
            process.submit(population[nSent], threshold())
            nSent += 1


This is a method in class Evaluator, which I plan to release. No knowledge of parallel processing is required to use Evaluator objects. The __init__ method starts up the indexed collection of processes, each of which knows its own index. It also opens a Pipe through which processes send their indexes when they have computed the fitness of individuals submitted to them. The Evaluator object’s Connection to the pipe is named isReady.

The first for loop comes from the original version of selected. Iteration over population in the second for loop is just a convenient way of making sure that a result is generated for each individual. In the first line of the loop body, a ready process is identified by receiving its index through the isReady connection. Then the generator yields the result of a fitness evaluation. The flow of control stops flowing at this point, and resumes only when selected returns to the beginning of its for loop and requests the next result from the eval generator.

When execution of the generator resumes, the next unevaluated individual in the population, if any, is submitted to the ready process, along with the value of a call to the threshold function. The call gives the current value of best[0][0], the selection threshold.

By the way, the Pipe should be a Queue, because only the “producer” processes, and not the “consumer” process, send messages through it. But Queue is presently not functioning correctly under the operating system I use, Mac OS X.

No comments :

Post a Comment