Wednesday, December 16, 2009

Work is not information

Suppose you are searching for a DNA sequence that translates to a protein with certain properties. To simplify matters, let's say you know magically that the sequence is 300 bases in length. You might roll a fair four-faced die 300 times to guess a sequence, and then conduct an experiment to see if the guess is correct. Clearly, there is no knowledge of the specific problem manifest in the die-rolling procedure.

Now let's say you do 1000 repetitions of the die-rolling procedure, and conduct 1000 experiments to check the randomly generated DNA sequences. The probability of "hitting the target" with a "query" of nature increases. But does this mean that you exploited more knowledge about the problem than you did previously? Of course not. It means only that you did more work.
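The effect of repetition is easy to check with a small calculation. This is a sketch of my own, at a toy sequence length, since the single-guess probability for 300 bases underflows double precision:

```python
# Toy sketch (mine, not from the paper): probability that at least one of
# Q uniform random guesses hits a single target sequence of length L over
# a 4-letter alphabet. No problem-specific knowledge enters the procedure.
# L defaults to 10 rather than 300, because 1/4**300 underflows a float.
def hit_probability(Q, L=10, alphabet=4):
    p = 1.0 / alphabet**L           # chance that one guess is the target
    return 1.0 - (1.0 - p)**Q       # chance at least one of Q guesses hits

# More repetitions raise the hit probability, purely from more work.
assert hit_probability(1000) > hit_probability(1)
```

The procedure encodes nothing about which sequence is the target; only the query count changes.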

In their recently published article, Conservation of Information in Search: Measuring the Cost of Success, Bill Dembski and Bob Marks sometimes measure work and call it active information. They claim,
Active information captures numerically the problem-specific information that assists a search in reaching a target.
But in Section III.A they demonstrate otherwise:
Multiple queries clearly contain more information than a single query. Active information is therefore introduced from repeated queries.
Now wait just a gol-dern minute, Billy Bob! The first sentence says that a procedure doing more work yields more information about the solution to the problem than a procedure doing less work. The second says that the procedure doing more work has more information about how to solve the specific problem than a procedure doing less work. The error here is not just equivocal use of the word information. The authors go on to calculate a gain in active information for repetition of an utterly uninformed procedure.
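To see how the bookkeeping works, here is a sketch in my own notation of the log-ratio calculation: with per-query success probability p, the success probability over Q independent uniform queries is q = 1 - (1 - p)^Q, and the claimed active information is log2(q/p).

```python
import math

# Sketch (my notation): the active-information "gain" that repetition is
# credited with. p is the per-query success probability, q the success
# probability over Q independent uniform queries.
def active_information(p, Q):
    q = 1.0 - (1.0 - p)**Q
    return math.log2(q / p)

# For Q = 2, q = 2p - p**2, so the "gain" is log2(2 - p): nearly a full
# bit, credited to nothing but running the uninformed procedure twice.
gain = active_information(1e-6, 2)
```

Nothing in the calculation distinguishes problem-specific knowledge from sheer repetition; doubling the work manufactures a bit of "information."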

Dembski and Marks commit the same errors in Section III.F.1, where they show that increasing the number of offspring in an evolutionary search increases the probability of obtaining an offspring more fit than the parent. To make this probability gain into active information, they redefine the search as just one generation of the evolutionary search they originally considered.

The fact that you can gain information by doing work does not imply that work is itself information.

[1/2/2010: I contradicted myself in saying "repetition of an utterly uninformed procedure" after predicating that "you know magically that the sequence is 300 bases in length." In a forthcoming post, I will explain that what Dembski and Marks call the endogenous information of a search problem seems endogenous only if one ignores magical circumscription of the solution space.]

Thursday, December 10, 2009

Blunder in the new Dembski-Marks paper

The recently published conference paper of Dembski and Marks, Bernoulli’s Principle of Insufficient Reason and Conservation of Information in Computer Search, contains an enormous, indisputable, and embarrassing error in the argument regarding the so-called "search for a search."

Search algorithms explore a finite space of solutions Ω. An algorithm basically selects a solution in Ω, examines the solution, and decides which solution to select next. It repeats this until it has made Q selections. It is crucial to the arguments of Dembski and Marks that search algorithms have the capacity for randomizing their decisions. To randomize a decision is to base it, at least in part, on inputs from a random source. (Think of flipping a fair coin repeatedly and feeding the sequence of outcomes into a computer program. The inputs would permit the program to randomize its decisions.)
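The loop just described can be sketched as follows. The names and the blind uniform selection are mine, chosen for brevity; a real search algorithm would let each examination inform the next selection.

```python
import random

# Minimal sketch (mine) of the search model above: make Q selections from
# a finite solution space omega, each selection randomized by a random
# source, and report whether the target was ever examined.
def random_search(omega, target, Q, seed=0):
    rng = random.Random(seed)           # the random source feeding decisions
    for _ in range(Q):
        candidate = rng.choice(omega)   # a randomized decision
        if candidate == target:
            return True                 # target examined
    return False                        # query budget Q exhausted

# A one-element space finds the target at once; a zero budget cannot.
assert random_search([42], 42, Q=1) is True
assert random_search(list(range(10)), 3, Q=0) is False
```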

Dembski and Marks believe that people search for search algorithms in a higher-order space Ω2. They write,
Let Ω2 be the finite space of all search algorithms on the search space Ω.
The word I've emphasized, finite, is wrong, wrong, and wrong. The set of all randomized search algorithms is infinite, not finite. Section III.A and the "proof" of Equation (7) in Appendix B are based on a false premise.

Consider making just one randomized decision between alternative solutions ω1 and ω2 in Ω, with the probability of choosing ω1 equal to p. The set of all possible values for p is infinite, and therefore the set of all randomized algorithms for making the decision is infinite.

Some technical details

For a fixed representation of randomized algorithms (e.g., probabilistic Turing machines reading i.i.d. uniform bits from an auxiliary tape), the set of probabilities p possible in a randomized dichotomous decision is countably infinite. There is no upper bound on the size and running time of the randomized decision algorithm.
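One way to see why each p yields a distinct algorithm is the standard construction (my code, not the paper's) that decides with bias p using only fair coin flips: read the flips as the binary expansion of a uniform number U in [0, 1), compare it bit by bit to the expansion of p, and let the first differing bit settle whether U < p. Distinct values of p give distinct decision procedures, and there is no bound on how many bits of p a procedure may encode.

```python
import random

# Sketch (standard construction, my code): choose "w1" with probability p
# using only fair coin flips. The first flip that differs from the
# corresponding bit of p's binary expansion decides the outcome.
def biased_choice(p, fair_bit, max_bits=64):
    x = p
    for _ in range(max_bits):
        x *= 2.0
        p_bit = 1 if x >= 1.0 else 0    # next bit of p's expansion
        if p_bit:
            x -= 1.0
        b = fair_bit()                  # one fair coin flip
        if b != p_bit:
            return "w1" if b < p_bit else "w2"
    return "w2"                         # 2**-64 chance of reaching here

rng = random.Random(0)
flips = lambda: rng.getrandbits(1)
freq = sum(biased_choice(0.25, flips) == "w1" for _ in range(10000)) / 10000
```

Each additional bit of p that a procedure encodes makes it a different, and typically larger, algorithm; hence countably many algorithms, but uncountably many probabilities left unreachable.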

The set of probabilities [0, 1] is uncountably infinite. Thus, while every randomized search algorithm implements a random search process, vanishingly few random search processes correspond to randomized search algorithms. Relatively few randomized search algorithms are fast enough and small enough to solve a problem in practice. We humans necessarily prefer small randomized search algorithms that make decisions rapidly.