Monday, August 25, 2014

The censorial anti-ID activist is… Bob O'Hara?

Those of you acquainted with Bob O’Hara have already busted a gut laughing. There is not, as best I can tell, a censorial bone in his body. He is the mildest of the pack nipping at the heels of ID creationists. As I recall, he even respects their wish not to be called creationists.

A year ago, ID-creationism advocate Casey Luskin alternately detailed and insinuated everything he ever wanted you to believe about Springer’s abandonment of the creationist-edited volume Biological Evolution: New Perspectives. And what he insinuated about Bob O’Hara was ugly.

[A]pparently [Matzke’s] post generated a lot of complaints to Springer from people who didn’t want the company to publish a book with articles sympathetic to ID. For example, one of Matzke’s Panda’s Thumb followers, statistician Bob O’Hara, reported that “I’ve been in contact with one of the editors at Springer, so they’re now certainly aware of the situation.” Within a day or two, Springer had removed its page for Biological Information: New Perspectives from its website.
Luskin cutely juxtaposes Bob’s comment with the removal of the book announcement, inviting you to read the worst into it. But this isn’t strong enough a rhetorical trick for the political magazine Human Events, which brings us “Powerful Conservative Voices.” So the powerful Robert J. Marks II resorted, apparently, to embellishment in his article of last week, Biological Information: New Perspectives from Intelligent Design:
Despite the intelligent design content, the German publishing company Springer invited the organizers to publish papers from the conference. But, even though no one had yet seen the book, publicity at an atheistic leaning neo-Darwinist blog prompted an anti-ID activist to contact Springer upper management and claim Springer’s publishing of the book would ruin Springer’s reputation in science. So Springer reneged on its contract with the Editors at the last minute.
I now have Bob’s permission not only to invoke his name, but also to reveal how he bullied a senior editor at Springer US on February 27, 2012. (He sent me a copy of his note, three days later.) After identifying the book, he wrote:
This has the potential to be a controversial text (as the editors are all active in pushing Intelligent Design), so I'm wondering why it's being published as an engineering text, rather than biology: it would seem to be a better fit there.
Gee. No threat. No doomsaying. No claim to know anything about what he had not yet seen. I see a flat statement of fact, followed by a gentle suggestion that the book was misclassified. Look back at what Marks wrote, and consider the warped mind that would concoct such propaganda.

Did Springer drop the title because of outside pressure? I doubt it highly. As I explained in my last post, the book deal was shady from the get-go. The most that Bob O’Hara did was to shine light on it. Predictably, the Pharisees focus on the alleged breach of contract instead of their own dubious ethics.

[Edit. I just learned that Marks has deleted an erroneous erratum from the online copy of an article, leaving no indication that one of the two highlighted theorems is severely botched. The kicker is that Marks received, long before submitting the paper to the journal that published it, a clear explanation of the error. His correspondent CC'd me! See “The theorem that never was: Diversionary ‘erratum’ from Dembski and Marks.”]

Saturday, August 23, 2014

You’re making things up again, Robert J. Marks II

One of the “20 Most Influential Christian Scholars,” the distinguished professor who approved a master’s thesis that plagiarized his own publications, your favorite whited sepulcher and mine, Robert J. Marks II, is making things up again. This time it’s a tale of censorship of the creationist-edited volume Biological Information: New Perspectives, told through a conservative political outlet, Human Events. Marks embellishes and contradicts what Casey Luskin, an intelligent-design advocate at the Discovery Institute, reported a year ago. He would have you believe:

Despite the intelligent design content, the German publishing company Springer invited the organizers to publish papers from the conference. But, even though no one had yet seen the book, publicity at an atheistic leaning neo-Darwinist blog prompted an anti-ID activist to contact Springer upper management and claim Springer’s publishing of the book would ruin Springer’s reputation in science. So Springer reneged on its contract with the Editors at the last minute.
Luskin flatly contradicts the first statement. And I’ve had, since March 2, 2012, a copy of the email that the “anti-ID activist” sent to Springer. It is utterly devoid of what Marks attributes to it [see for yourself]. Here is what I think is significant: the author sent the note to a Ph.D. scientist-editor at Springer US. But Springer DE was handling the book, according to Luskin. [The editor did not reply.] It may well be that New York pushed the panic button in Heidelberg. In any case, we know for sure:


You’re making things up again, Robert


Marks points out that Springer reneged on the contract, but somehow forgets to mention that he knew from the outset that the deal was shady. According to Luskin, it was Dembski who proposed to an editor of a Springer engineering series on intelligent systems that Biological Information: New Perspectives be included. Dembski would think that he could talk a fast line to justify it. He thinks that about everything he does. But Marks’ field is intelligent systems. And so is mine. He knew just as well as I did that it was wrong to dump the book into that series. A big threat to the publisher’s reputation, I think, was that institutions buying all volumes, expecting them to be about intelligent systems as advertized, would scream loudly when they got creationism instead.

I can’t resist amplifying Marks’ first sentence.

A diverse group of [secretly invited] scientists [many of whom were not scientists] gathered at [but not under the auspices of] Cornell University in 2011 to discuss their research [not peer-reviewed] into the nature and origins of biological information [loosely interpreted].
My initial response to the proceedings of the enclave is here. I wish that I’d given more emphasis to the fact that, despite all of Marks’ bragging about the attendees, the organizers didn’t, well, organize them to review the papers of their peers. The symposium was pretty much a group-hug. There were many presentations by John Sanford, who is busy setting up simulations to show that genomes only deteriorate — and rapidly, at that. He’s confident that our species won’t survive to the end of the century. It’s all science, of course. But he does hope to persuade you that the End Time is at hand.

Marks and Luskin carry on about their contract. But they don’t seem terribly anxious to admit that they consort with people who, if their madness did not align with established religions, would be locked up. At present, a guy I know well is heavily preoccupied with off-brand religion, and is in the protective custody of the state. He seems no crazier to me than Sanford. (Watch this if you think I’m exaggerating.) And the injustice of the difference in society’s treatment of him and it’s treatment of a YEC with an elaborate delusional system is weighing heavily on my mind.

Thursday, August 21, 2014

Calling all statisticians and probabilists

The following is a question I posted at the Mathematics Stack Exchange. Folks who see it here are likely to understand that my "Making Less of No Free Lunch" campaign is intended as much to hose down a creationist cloud of dust as to rain on the NFL parade. Please contribute a bit of your expertise to a worthy cause.


I hope to replace a proof of my own, in a paper explaining that the "no free lunch" theorems for optimization actually address sampling and statistics, with a reference to an existing result on sampling. The random selection process $$X_1, X_2, \dotsc, X_n$$ over the domain of the random objective function $F$ is statistically independent of the selected values, $$F(X_1), F(X_2), \dotsc, F(X_n),$$ despite the functional dependence $$X_i \equiv X(F(X_1), F(X_2), \dotsc, F(X_{i-1})),$$ $1 < i \leq n,$ where $X$ is statistically independent of $F.$ (See Section 0.1 here [or my last post] for more detail.)

To put this in terms of sampling, $F$ associates members of a statistical population with their unknown responses. The sampler $X$ processes data already in the sample to extend the selection. This typically biases the selection. But the selection process is statistically independent of the sequence of responses. That is the justification for referring to the latter as a sample.

This looks like textbook stuff to me: "Data processing biases the selection." But I am awash in irrelevant hits when I Google the terms you see here, and others like them.

Friday, August 15, 2014

Sampling Bias Is Not Information

[Edits 8/19/2014 and 8/20/2014: Although the proof of the so-called “no free lunch” theorem is logically very simple, it looked like a vomited hairball in the first version of this post. I’ve made it considerably more inviting. Note also that the online version of “Evaluation of Evolutionary and Genetic Optimizers: No Free Lunch” now has a preface.]

This corrects a post that I retracted, and promised to replace, more than a year ago. It took me only a few days to redo the math. But then I executed a fabulously ADDled drift into composition of a preface for my first paper on “no free lunch” in optimization, planning to use it also as a replacement post. Now I must admit that I am ensnarled in words, and that I don’t know when, or even if, I will break free. Consequently, I am dumping here some fragments that should make sense, though they say much less than I would like to get said.

For those of you who are more interested in ID creationism than in NFL, the highlight is that my references to “conservation of information,” elevated to “English’s Principle of Conservation of Information” by Dembski and Marks in “Conservation of Information in Search: Measuring the Cost of Success,” amount to nothing but failure to recognize that I was discussing statistical independence. Much of what I say in the subsection “Understanding the Misunderstanding of Information” applies not only to me, but also to Dembski and Marks. In fact, I recommend reading the relatively accessible “Bob Marks grossly misunderstands ‘no free lunch’” first.



Abstract

The recent “no free lunch” theorems of Wolpert and Macready indicate the need to reassess empirical methods for evaluation of evolutionary and genetic optimizers. Their main theorem states, loosely, that the average performance of all optimizers is identical if the distribution of functions is average. [An “optimizer” selects a sample of the values of the objective function. Its “performance” is a statistic of the sample.] The present work generalizes the result to an uncountable set of distributions. The focus is upon the conservation of information as an optimizer evaluates points [statistical independence of the selection process and the selected values]. It is shown that the information an optimizer gains about unobserved values is ultimately due to its prior information of value distributions. [The paper mistakes selection bias for prior information of the objective function.] Inasmuch as information about one distribution is misinformation about another, there is no generally superior function optimizer. Empirical studies are best regarded as attempts to infer the prior information optimizers have about distributions [match selection biases to constrained problems] — i.e., to determine which tools are good for which tasks.


0. Sampling Bias Is Not Information

This preface (Sect. 0) addresses expository errors in the original paper (English, 1996), but stands as a self-contained report. Specific corrections and amplifications appear in the remainder of the text (Sects. 1–6).


Figure 1: Objective function $ \newcommand{\disteq}{\stackrel{\scriptscriptstyle{\mathcal{D}}}{=}} \newcommand{\Fseq}[2]{% {F_1^{#1}\hspace{-1pt}({#2)}} } \newcommand{\selection}{{X_1^n}} \newcommand{\sample}{{\Fseq{n}{X}}} \newcommand{\Fn}[1]{{\Fseq{n}{#1}}} \newcommand{\wn}{{w_1^n}} \newcommand{\xn}{{x_1^n}} \newcommand{\yn}{{y_1^n}} \newcommand{\zn}{{z_1^n}} \newcommand{\boldx}{\mathbf{x}} \newcommand{\boldw}{\mathbf{w}} \newcommand{\X}{\mathcal{X}} \newcommand{\Xdiamond}{\mathcal{X}_\diamond} \newcommand{\Y}{\mathcal{Y}} \newcommand{\Ydiamond}{\mathcal{Y}_\diamond} F$ and quality measure $\psi$ comprise the optimization problem. An optimizer is a sampler $X,$ along with a reduction $r$ of the selection $\xn$ and the sample $\yn$ to outputs $\tilde{x}_1^m = x_{j_1}, \dotsc, x_{j_m}$ and $\tilde{y}_1^m = y_{j_1}, \dotsc, y_{j_m},$ with $j_1^m$ strictly increasing. NFL analyses make four assumptions: the selection is non-repeating, the reduction is identical for all optimizers under consideration, the output $\tilde{y}_1^m$ depends only on the sample, and quality depends only on $\tilde{y}_1^m.$ Accordingly, write $\tilde{y}_1^m = r(\yn)$ and $\phi_n = \psi(\tilde{y}_1^m).$ Then the quality $\psi(r(\yn))$ of the optimizer’s output is a statistic $\phi = \psi \circ r$ of the sample it generates.


Black-box optimization may be decomposed into generation of a sample of the values of the objective function, and use of the sample to produce a result. “No free lunch” (NFL) analyses assume explicitly that sampling is without repetition, and define quality of the optimization result to be a statistic of the sample (Wolpert and Macready, 1995, 1997). Fig. 1 unpacks the tacit assumptions, the most important of which is that the optimizers under consideration differ only in selection of samples. Fig. 2 shows how postprocessing of the sample, assumed to be identical for all optimizers, is embedded in the statistic. Although the statistic combines part of the optimizer with part of the optimization problem, the NFL literature refers to it as the performance measure, and to samplers as optimization (or search) algorithms.


Figure 2: Random sampler $X$ is statistically independent of random objective function $F,$ and the selection $X(\lambda) = x_1, \dotsc, X(y_1, \dotsc, y_{n-1}) = x_n$ is statistically independent of the sample $F(x_1) = y_1,$ $\dotsc, F(x_n) = y_n.$ The statistic $\phi = \psi \circ r$ is the composition of the quality measure $\psi$ on optimizer outputs, given in the optimization problem, and the reduction $r$ of samples to outputs, assumed to be identical for all optimizers. The selection and the sample are abbreviated $\selection = \xn$ and $\sample = \yn,$ respectively.


The better known of NFL theorems, including that of Sect. 3.2 (which appears also in Streeter, 2003, and Igel and Toussaint, 2005), are reinventions of basic results in probability (Häggström, 2007). They address sampling and statistics. This claim is justified by showing that the selection process \[ X_i \equiv X(F(X_1), \dotsc, F(X_{i-1})), \] $1 \leq i \leq n,$ is statistically independent of the selected values \[ \sample \equiv F(X_1), \dotsc, F(X_n), \] despite its data processing. Sect. 3.1 supplies proof for the special case of deterministic selection, with the domain and codomain of the random objective function restricted to finite sets. Here the result is generalized to random selection, with repetition allowed, and with the domain and codomain possibly infinite, though countable. Furthermore, the selection process is equipped to terminate. Although this would seem to complicate matters, the proof is greatly simplified.

The selection process cannot be regarded as having or gaining information of unselected values of the objective function when it is statistically independent of the selected values. Data processing is a source of bias in the selection. The paper fails, in exposition, to recognize statistical independence for what it is, and refers instead to “conservation of information.” (The formal results are correct.) It furthermore equates the propensity of an optimizer to perform better for some distributions of the random objective function and worse for others with prior information that somehow inheres in the optimizer. The notion that one entity simply has, rather than acquires, varying degrees of prior information and misinformation of others is incoherent.

The following subsection formalizes the system sketched in Fig. 2, proves that the selection is indeed statistically independent of the selected values, and goes on to demonstrate that a general NFL theorem, so called, follows almost immediately. This exercise leaves very little room for contention that the NFL theorems address something other than sampling. The preface concludes with a sub­sec­tion that briefly deconstructs the paper’s erroneous claims about information.


0.1 Formal Results on Sampling

Sampling processes do not necessarily terminate, and it is convenient to treat them all as infinite. The distinguished symbol $\diamond$ is interpreted as the sample terminator. Let countable sets $\Xdiamond$ = $\X \cup \{\diamond\}$ and $\Ydiamond$ = $\Y \cup \{\diamond\},$ where sets $\X$ and $\Y$ exclude $\diamond.$ The random objective function $F$ maps $\Xdiamond$ to $\Ydiamond,$ with $F(x) = \diamond$ if and only if $x = \diamond.$

Finite sequences of the forms $\alpha_1, \dotsc,$ $\alpha_n$ and $\gamma(\alpha_1), \dotsc,$ $\gamma(\alpha_n)$ are abbreviated $\alpha_1^n$ and $\gamma_1^n(\alpha),$ respectively. A sampler is a random function, statistically independent of $F,$ from the set of all finite sequences on $\Ydiamond$ to $\Xdiamond.$ A selection is a random vector $\selection$ with \[ X_i \equiv X(F(X_1), \dotsc, F(X_{i-1})) \] $1 \leq i \leq n,$ where $X$ is a sampler. The selection is non-repeating if $\selection \in \pi(\X)$ surely, where $\pi(\X)$ is the set of all non-repeating sequences on $\X.$

A statistic is a function with the set of all finite sequences on $\Ydiamond$ as its domain. Assume, without loss of generality, that the codomain of a statistic is a countable superset of its range, which is countable like the domain.

Probability measure is unproblematic when considering the selection $\selection$ and the corresponding sequence $\sample,$ because both take values in countable sets. The following lemma establishes that the selection is statistically independent of the selected values, i.e., that it is correct to refer to $\sample$ as a sample of the values $\{F(x) \mid x \in \Xdiamond\}$ of the objective function. The data processing in extensions of the selection, highlighted in the proof, is a potential source of selection bias, not information about the objective function.

Lemma. Selection $X_1, \dotsc, X_n$ is statistically independent of $F(X_1), \dotsc, F(X_n).$

Proof. Let $\xn$ and $\yn$ be nonempty sequences on $\Xdiamond$ and $\Ydiamond,$ respectively, such that $P( \Fseq{n}{x} = \yn) > 0.$ Then \begin{align*} P( \selection = \xn, \sample = \yn) &= P(\selection = \xn, \Fseq{n}{x} = \yn) \\ &= P(\selection = \xn \mid \Fseq{n}{x} = \yn) \cdot P(\Fseq{n}{x} = \yn), \end{align*} and \begin{align*} &P(\selection = \xn \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(\Fseq{0}{x}) = x_1, \dotsc, X(\Fseq{n-1}{x}) = x_n \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(y_1^0) = x_1, \dotsc, X(y_1^{n-1}) = x_n \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(y_1^0) = x_1, \dotsc, X(y_1^{n-1}) = x_n) \end{align*} because sampler $X$ is statistically independent of objective function $F.$ 

Corollary 1. Selection $X_1, \dotsc, X_n$ is statistically independent of $\phi(F(X_1), \dotsc, F(X_n))$ for all statistics $\phi.$

The following NFL theorem, so called, uses the symbol $\disteq$ to denote equality in probability distribution. It says, loosely, that the distribution of a statistic depends on the choice of sampler if and only if the distribution of the statistic depends on the choice of sample.

Theorem. Let $x_1, \dotsc, x_n$ be a nonempty, non-repeating sequence on $\X,$ and let $\phi$ be a statistic. Then \begin{equation} \phi(F(X_1), \dotsc, F(X_n)) \disteq \phi(F(x_1), \dotsc, F(x_n)) \label{eq:selections} \end{equation} for all non-repeating selections $X_1, \dotsc, X_n$ if and only if \begin{equation} \phi(F(w_1), \dotsc, F(w_n)) \disteq \phi(F(x_1), \dotsc, F(x_n)) \label{eq:sequences} \end{equation} for all non-repeating sequences $w_1, \dotsc, w_n$ on $\X.$

Proof.
($\Rightarrow$) Suppose that (1) holds for all non-repeating selections $\selection,$ and let $\wn$ be a non-repeating sequence on $\X.$ There exists sampler $X$ with constant selection $\selection = \wn,$ and thus (2) follows.
($\Leftarrow$) Suppose that (2) holds for all non-repeating sequences $\wn$ on $\X,$ and let $\selection$ be a non-repeating selection. A condition stronger than (1) follows. For each and every realization $\selection = \wn,$ non-repeating on $\X$ by definition, \begin{equation*} \phi(\sample) \disteq \phi(\Fseq{n}{w}) \disteq \phi(\Fseq{n}{x}). \end{equation*} The first equality holds by Corollary 1, and the second by assumption. 

Corollary 2. Let $x_1, \dotsc, x_n$ be a nonempty, non-repeating sequence on $\X.$ Then \begin{equation} F(X_1), \dotsc, F(X_n) \disteq F(x_1), \dotsc, F(x_n) \label{eq:distselections} \end{equation} for all non-repeating selections $X_1, \dotsc, X_n$ if and only if \begin{equation} F(w_1), \dotsc, F(w_n) \disteq F(x_1), \dotsc, F(x_n) \label{eq:distsequences} \end{equation} for all non-repeating sequences $w_1, \dotsc, w_n$ on $\X.$

Proof. Set the statistic $\phi$ in the theorem to the identity function. 

The corollary is essentially the NFL theorem of English (1996, Sect. 3.2). When (4) holds for all $n,$ the random values $\{F(x) \mid x \in \X\}$ of the objective function are said to be exchangeable (Häggström, 2007).


0.2 Understanding the Misunderstanding of Information

The root error is commitment to the belief that information is the cause of performance in black-box optimization (search). The NFL theorems arrived at a time when researchers commonly claimed that evolutionary optimizers gained information about the fitness landscape, and adapted themselves dynamically to improve performance. Wolpert and Macready (1995) observe that superior performance on a subset of functions is offset precisely by inferior performance on the complementary subset. In online discussion of their paper, Bill Spears referred to this as conservation of performance. My paper suggests that conservation of information accounts for conservation of performance.

The lemma of Sect. 3.1, “Conservation of Information,” expresses the absolute uninformedness of the sample selection process. The performance of a black-box optimizer has nothing whatsoever to do with its information of the objective function. But the paper recognizes only that information gain is impossible, and claims incoherently that prior information resides in the optimizer itself. Conservation of this chimeral information supposedly accounts for conservation of performance in optimization. Here are the salient points of illogic:

  1. Information causes performance.
  2. The optimizer gains no exploitable information by observation, so it must be prior information that causes performance.
  3. There is no input by which the optimizer might gain prior information, so it must be that prior information inheres in the optimizer.
  4. Prior information of one objective function is prior misinformation of another. Conservation of performance is due to conservation of information.

It should have been obvious that prior information is possessed only after it is acquired. The error is due in part to a mangled, literalistic half-reading of (Wolpert and Macready, 1995, p. 8, emphasis added):

The NFL theorem illustrates that even if we know something about [the objective function] … but don’t incorporate that knowledge into [the sampler] then we have no assurances that [the sampler] will be effective; we are simply relying on a fortuitous matching between [the objective function] and [the sampler].
The present work (Sect. 6) concludes that:
The tool literally carries information about the task. Furthermore, optimizers are literally tools — an algorithm implemented by a computing device is a physical entity. In empirical study of optimizers, the objective is to determine the task from the information in the tool.
This reveals confusion of one type of information with another. When a toolmaker imparts form to matter, the resulting tool is in-formed to suit a task. But such form is not prior information. Having been formed to perform is different from having registered a signal relevant to the task. An optimization practitioner may gain information of a problem by observation, and then form a sampler to serve as a proxy in solving it. Although the sampler is informed to act as the practitioner would, it is itself uninformed of the problem to which it is applied, and thus cannot justify its own actions. The inherent form that accounts for its performance is sampling bias.

Unjustified application of a biased sampler to an optimization problem is merely biased sampling by proxy. The NFL theorems do not speak to this fundamental point. They specify conditions in which all of the samplers under consideration are equivalent in overall performance, or are nearly so. Ascertaining that none of these theorems applies to a real-world circumstance does not justify a bias, but instead suggests that justification may be possible. There is never a “free lunch” for the justifier.


References

T. M. English. Evaluation of evolutionary and genetic optimizers: No free lunch. In L. J. Fogel, P. J. Angeline, and T. Bäck, editors, Evolutionary Programming V: Proceedings of the Fifth Annual Conference on Evolutionary Programming, pages 163–169. MIT Press, Cambridge, MA, 1996.

O. Häggström. Intelligent design and the NFL theorems. Biology & Philosophy, 22(2):217–230, 2007.

C. Igel and M. Toussaint. A no-free-lunch theorem for non-uniform distributions of target functions. Journal of Mathematical Modelling and Algorithms, 3(4):313–322, 2005.

M. J. Streeter. Two broad classes of functions for which a no free lunch result does not hold. In Genetic and Evolutionary Computation – GECCO 2003, pages 1418–1430. Springer, 2003.

D. H. Wolpert and W. G. Macready. No free lunch theorems for search. Technical report, SFI-TR-95-02-010, Santa Fe Institute, 1995.

D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

Tuesday, August 12, 2014

NFL&I

My last post showed that a claim of Winston Ewert, William Dembski, and Robert J. Marks II to groundbreaking research does not withstand a minute of Googling. In the comments section, I made some admissions about my own work on "no free lunch" theorems for optimization. I want to highlight them here.

If I had a face-to-face chat with Winston, I'd tell him that I really, really, really regret having gotten into the "no free lunch" thing. Rather than play to my strengths — I'm a crackerjack computer scientist — I wasted a lot of myself on obfuscated reinvention of a probabilist's wheel. What's dreadfully embarrassing is that it took me many years to realize how bad my work was. I had to acknowledge that Haggstrom [cited below] was correct in his criticism of NFL. And Winston should acknowledge that he does not have a theoretical bone is his body.
Quoting what I said about myself, Anonymous asks:
Would you be willing to elaborate on this? It might help others avoid the types of mistakes you want them to avoid, if you could explain specifically: 1) in what ways your work was poor, or a "reinvented wheel", 2) why you did not notice your work was bad, and 3) what work/areas should be avoided (for fear of reinvention). Thanks.
The following is my response, lightly edited.


The first two questions are entirely legitimate. The last borders on the fallacy of the complex question. I'm more than willing to elaborate. But I'm going to limit my response, because I'm struggling to get a couple other things written. [Obviously, I've drifted off-task.]

1) I have toiled and toiled over a preface [now available] for my first NFL paper (1996), discovering errors in my explanations of errors, and compounding my embarrassment. Dog willing, and the Creek don't rise, I will eventually post "Sampling Bias Is Not Information" here. You can arrange for Google Scholar to alert you when the title appears, if you don't want to follow my blog. For now, I'll say only that what I call "conservation of information" is nothing but statistical independence of the sampling ("search" or "optimization") process and the sample of random values of the objective function. I describe statistical independence in the introduction to the paper, but fail to identify it. That is indeed bad.

2) For one thing, I had given my confusion a fancy name. But I would rather focus here on my runaway arrogance. I independently proved the main NFL theorem in 1994. Questioning a student during his thesis defense, I got him to state clearly that "future work" was supposed to lead to a generally superior optimizer. After a centisecond of thought, I sketched the simple argument that Häggström gives in "Intelligent Design and the NFL Theorems" (2007). I considered going for a publication, but, as obvious as the result was to me, I had to believe that it was already in the optimization literature. Twenty years ago, a lit review in an unfamiliar field was a big undertaking. And I had more than enough work to do at the time.

When Wolpert and Macready disseminated "No Free Lunch Theorems for Search" through GA Digest (early 1995), stirring up a big controversy, I concluded that I was remarkably clever, rather than that the whole affair was silly. The accepted version of my first NFL paper included my simple proof, but a reviewer's suggestion led me to see exchangeability (Häggström provides the term) as the "NFL condition." Rather than save the result for another paper, I dumped it into Sect. 3.2 about the time that camera-ready copy was due. The upshot is that I obsoleted what is known as the NFL theorem, 16 months before it was published. (I sent a copy of my paper to Bill Macready, who responded, "Nice work, Tom.")

As the reputation of NFL grew, so did my arrogance. I forgot what I had suspected in the beginning, namely that something so simple was probably not a novel discovery. The more work I did with NFL, the more it became "my thing." In 2000 and 2001, I gave tutorials on the topic at conferences. I put NFL in the cover letters of my applications for academic positions. When Dembski came along with his book No Free Lunch (2002), I was able to respond with "authority." And the number of citations of Wolpert and Macready (presently 3823) continued to go up and up and up. "Fifty Million Frenchmen Can't Be Wrong," you know, about the importance of NFL.

3) Do not presume to contribute to a field that you have not bothered to study in depth.

Tuesday, August 5, 2014

ID creationist “scholarship” sinks to new depths

[Edit: I regret turning nasty here. See the comments section for something rather more kind I would say to Winston Ewert.]

I mentioned a couple posts ago that I was annoyed with the editors and reviewers of an article, apparently to be published in an IEEE journal, by Winston Ewert, William Dembski, and Robert J. Marks II. [Edit 9/18/2014: Marks has removed the paper from his website. The IEEE Transactions on Systems, Man, and Cybernetics presently offers "Algorithmic Specified Complexity in the Game of Life" as a preprint.] Although I focus on the authors here, it will be clear enough what the folks responsible for quality control, including those on Ewert’s dissertation committee at Baylor, should have done. The short version is that, Googling obvious keywords, it takes at most a minute to discover that prior work, of which the authors claim to have no knowledge, exists in abundance.

Before continuing, I want to emphasize that my response to ID creationism is anything but knee-jerk. Consider, if nothing else, that the way I dealt with a dubious theorem supplied by Ewert et al. was to attempt to prove it. I not only succeeded in doing so, but supplied the proof here. I have since proved a fundamental theorem that some might take as advancing ID theory, though I see it otherwise.

Back to the article. There is no way around concluding that the veterans, Dembski and Marks, are either incompetent or dishonest. (You can guess which way I lean from a subtitle I’ve considered for this blog: Truth Is the First Casualty of Culture War.) And Youngman Ewert is either lazy, dishonest, or intellectually hobbled. (I see many signs of hero worship and true-believerism.)

You need no clear idea of what information theory is to grasp the significance of Dembski allowing himself to be promoted as the “Isaac Newton of information theory.” Marks edited a volume on engineering applications of information theory. The two have repeatedly cited standard texts that I own, but have not studied thoroughly. I am not an information theorist. So how is it that I should be thunderstruck by a flatly incorrect statement in the second sentence of the abstract, and then be floored by an intensification of it in the second paragraph of the text?

You may think that you are unqualified to judge whether Ewert, Dembski, and Marks are right or wrong. But Google Scholar settles the matter unequivocally. That is what makes the falsehood so atrocious. The authors state in the introduction:

Both Shannon [1], [2] and Kolmogorov-Chaitin-Solomonoff (KCS)1 [2]-[9] measures of information are famous for not being able to measure meaning. [...] We propose an information theoretic method to measure meaning. To our knowledge, there exists no other general analytic model for numerically assigning functional meaning to an object.

1Sometimes referred to as only Kolmogorov complexity or Kolmogorov information.
My immediate thoughts, incomprehensible to most of you, but eminently sensible to many who’ve had a course in information theory, were Kolmogorov structure function and Kolmogorov sufficient statistic. I first learned about Kolmogorov complexity from a chapter in reference [2]: Cover and Thomas, Elements of Information Theory. See Section 14.12, “Kolmogorov Sufficient Statistic.” If Dembski and Marks do not understand the relevance, then they are incompetent. If they do, then they are dishonest.

Back to Google. And to Ewert. And to Baylor. The article clearly draws on Ewert’s dissertation, which had better include a chapter with a title like “Literature Review.” Actually, his dissertation proposal should have surveyed the relevant literature. A scholar should comprehend a body of knowledge before trying to extend it — right? Let’s make the relatively charitable assumptions that Ewert, though presuming to make a fundamental contribution to information theory, never took a course in information theory, and failed to grasp some aspects of the introduction to Kolmogorov complexity in Cover and Thomas. He nonetheless had obvious keywords to enter into Google Scholar. There are 41,500 results for the unclever search

meaningful information Kolmogorov.
(The term meaning is too general, and meaningful is the last word of the first sentence in the abstract.) Close to the top are two articles by Paul Vitányi, both entitled “Meaningful Information.” With the search narrowed to
"meaningful information" Kolmogorov,
there are about 1,310 results. This is what I did first. I immediately went to the more recent of Vitányi's articles, because he’s a prominent researcher in Kolmogorov complexity, and also the coauthor of the standard text on the topic (reference [24] in Ewert et al.). I have highlighted key phrases in the abstract for those of you who don’t care to grapple with it.
Abstract—The information in an individual finite object (like a binary string) is commonly measured by its Kolmogorov complexity. One can divide that information into two parts: the information accounting for the useful regularity present in the object and the information accounting for the remaining accidental information. There can be several ways (model classes) in which the regularity is expressed. Kolmogorov has proposed the model class of finite sets, generalized later to computable probability mass functions. The resulting theory, known as Algorithmic Statistics, analyzes the algorithmic [Kolmogorov] sufficient statistic when the statistic is restricted to the given model class. However, the most general way to proceed is perhaps to express the useful information as a total recursive function. The resulting measure has been called the “sophistication” of the object. We develop the theory of recursive functions statistic, the maximum and minimum value, the existence of absolutely nonstochastic objects (that have maximal sophistication—all the information in them is meaningful and there is no residual randomness), determine its relation with the more restricted model classes of finite sets, and computable probability distributions, in particular with respect to the algorithmic (Kolmogorov) minimal sufficient statistic, the relation to the halting problem and further algorithmic properties.

So what are you up to in this paper, Dr. Ewert? When you feed us the “to our knowledge” guano, knowingly creating the misimpression that you tried to acquire knowledge, you are essentially lying. You have exaggerated the significance of your own work by failing to report on the prior work of others. I doubt highly that you are capable of understanding their work. Were you so inept as to not Google, or so deceitful as to sweep a big mound of inconvenient truth under the rug? Assuming the former, you need to catch on to the fact that your “maverick genius” coauthors should have known what was out there.