Thursday, April 26, 2012

Bob Marks grossly misunderstands “no free lunch”

And so does Bill Dembski. But it is Marks who, in a “Darwin or Design?” interview, reveals plainly the fallacy at the core of his and Dembski's notion of “active information.” (He gets going at 7:50. To select a time, it's best to put the player in full-screen mode. I've corrected slips of the tongue in my transcript.)

[The “no free lunch” theorem of Wolpert and Macready] said that with a lack of any knowledge about anything, that one search was as good as any other search. [14:15]

And what Wolpert and Macready said was, my goodness, none of these [“search”] algorithms work as well as [better than] any other one, on the average, if you have no idea what you're doing. And so the question is… and what we've done here is, if indeed that is true, and an algorithm works, then that means information has been added to the search. And what we've been able to do is take this baseline, that all searches are the same, and we've been able to, in cases where searches work, measure the information that is placed into the algorithm in bits. And we have looked at some of the evolutionary algorithms, and we found out that, strikingly, they are not responsible for any creation of information. [14:40]

And according to “no free lunch” theorems, astonishingly, any search, without information about the problem that you're looking for, will operate at the same level as blind search. And that's... It's a mind-boggling result. [28:10]

Bob has read into the “no free lunch” (NFL) theorems what he believed in the first place, namely that if something works, it must have been designed to do so. Although he gets off to a good start by referring to the subjective state of the practitioner (“with a lack of knowledge,” “if you have no idea what you're doing”), he errs catastrophically by making a claim about the objective state of affairs (“one search is as good as any other search,” “all searches are the same”).

Does your lack of knowledge about a problem imply that all available solution methods (algorithms) work equally well in fact? If you think so, then you're on par with the Ravenous Bugblatter Beast of Traal, “such a mind-bogglingly stupid animal, it assumes that if you can't see it, it can't see you.” Your lack of knowledge implies only that you cannot formally justify a choice of algorithm. There not only may be, but in practice usually will be, huge differences in algorithm performance.

What boggles my mind is that Marks and Dembski did not learn this from Wolpert and Macready (1997), “No Free Lunch Theorems for Optimization.” In Section III-A, the authors observe that “it is certainly true that any class of problems faced by a practitioner will not have a flat prior.” This means that some problems are more likely than others, and the NFL theorems do not hold in fact. So what is the significance of the theorems?

First, if the practitioner has knowledge of problem characteristics but does not incorporate them into the optimization algorithm, then... the NFL theorems establish that there are no formal assurances that the algorithm chosen will be at all effective. Second, while most classes of problems will certainly have some structure which, if known, might be exploitable, the simple existence of that structure does not justify choice of a particular algorithm; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification. [emphasis mine]
So don't take my word for it that Bob has twisted himself into intellectual contortions with his apologetics. This comes from an article with almost 2600 citations. If memory serves, Marks and Dembski have cited it in all 7 of their publications.

Marks and Dembski believe, astonishingly, that the NFL theorems say that an algorithm outperforms “blind search” only if some entity has exploited problem-specific information in selecting it, when the correct interpretation is that the practitioner is justified in believing that an algorithm outperforms “blind search” only if he or she exploits problem-specific knowledge [justified true belief, not just information] in selecting it. This leads them to the fallacious conclusion that when a search $s$ outperforms blind search, they can measure the problem-specific information that an ostensible "search-forming process” added to $s$ to produce the gain in performance. They silently equate performance with information, and contrive to transform the gain in performance into an expression that looks like gain of Shannon information.

Their name-game depends crucially on making the outcome of a search dichotomous — absolute success (performance of 1) or absolute failure (performance of 0). Then the expected performance of a search is also its probability of success. There is a probability $p$ that blind search solves the problem, and a probability $p_s > p$ that search $s$ solves the problem, and the ratio $p_s / p$ is naturally interpreted as performance gain. But to exhibit the “added information” (information gain), Marks and Dembski do a gratuitous logarithmic transformation of the performance gain, $$I_+ = \log \frac{p_s}{p} = \log p_s - \log p = -\!\log p + \log p_s,$$ and call the result active information. (The last step is silly, of course. Evidently it makes things look more “Shannon information-ish.”) To emphasize, they convert performance into “information” by sticking to a special case in which expected performance is a probability.

Here's a simple (in)sanity check. Suppose that I have a “pet” algorithm that I run on all problems that come my way. Obviously, there's no sense in which I add problem-specific information. But Marks and Dembski cherry-pick the cases in which my algorithm outperforms blind search, and, because active information is by definition the degree to which an algorithm outperforms blind search, declare that something really did add information to the algorithm.

Now, a point I'll treat only briefly is that Marks and Dembski claim that the cases in which my pet algorithm greatly outperforms blind search are exceedingly rare. The fact is that they do not know the distribution of problems arising in the real world, and have no way of saying how rare or common extreme performance is for simple algorithms. In the case of computational search, we know for sure that the distribution of problems diverges fabulously from the uniform. Yet Marks and Dembski carry on about “Bernoulli's Principle of Insufficient Reason and Conservation of Information in Computer Search,” doing their damnedest to fob off subjective assignment of uniform probability as objective chance.

A bit of irony for dessert [35:50]:

Question: Are you getting any kind of response from the other side? Are they saying this is kind of interesting, or are they kind of putting stoppers in their ears? What's going on?

Answer: It's more of the stoppers in the ears thus far. We have a few responses on blogs, which are unpleasant, and typically personal attacks, so those are to be ignored. We're waiting for, actually, something substantive in response.


  1. Tom:

    I think you've misunderstood Wolpert and Macready's quote. The first sentence of the quote, if turned around, would say that there is a "formal assurance" that the practitioner can have if they have knowledge of a structure and use that knowledge in selecting an algorithm.

    How am I making a mistake?

    1. You haven't stated a contradiction. Your mistake is to think that you have.

      The performance of an algorithm on a problem is an objective fact. It depends only on the algorithm and the problem. The correctness of these statements is obvious. You must contradict them to claim that the performance of the algorithm on the problem depends on an extrinsic entity.

      The passage from Wolpert and Macready addresses what the practitioner can prove about the performance of an algorithm on a problem. Again, the practitioner is irrelevant to the objective fact of the algorithm's performance on the problem.

    2. Tom, we'll likely continue to disagree, but it seems to me that W&M are saying that "if" the practitioner chooses an algorithm based on their knowledge of the structure of what is being searched, and using their knowledge of how an algorithm operates, then they would have "formal assurance" that the algorithm would outperform a blind search. In the case of incredibly large searches, this "knowledge" is never present, and hence, no one can outperform a blind search.

      In the case of a blind search of stupendously large spaces, the practitioner would have to demonstrate that he/she had this knowledge. Isn't this what you're saying? IOW, the "practitioner is irrelevant to the objective fact of the algorithm" unless they can prove having knowledge of both the type of space being explored, and how best to explore it.