Bounded Science: 2014

Sunday, September 28, 2014

Would E.T. notice an icon of ID creationism?

Robert J. Marks II in his article on IDC in the conservative political outlet Human Events:

Yet we all agree that a picture of Mount Rushmore with the busts of four US Presidents contains more information than a picture of Mount Fuji.

As Jeff Shallit indicates, no, we really don’t. He has formal measures of information in mind, as I usually do. But I’ve posted a lot of formal stuff lately, and I’m going to do something more intuitive. [What you see here is an abortive attempt at late-night writing from over a month ago. Now that Jeff has posted a note he sent to Marks, I'm going to let it go as is. The pictures are fun.]

Is there some special kind of information in an image of Mount Rushmore that would grab the attention of an extraterrestrial flying by? A bright patch is certainly noticeable, but I don’t think that qualifies as a special kind of information, or as much information of any kind. And as lichen grows on the sculpture, it darkens. (This video has before-and-after shots at 4:50.) If you want to know what really wows E.T., click on the image below.

Photo by Volkan Yuksel (cropped).

There may well be a “look here, look here” icon long after the faces have crumbled.

Am I playing a dirty trick? No, by showing you the big picture, I’m allowing you to see that the form of the sculpture does not stand out from the rest of the mountain. It could not have been otherwise. A sculptor subtracts from what is already present to arrive at the result. Even when the medium is marble, there are sometimes features that drive the composition (see the quotes of Michelangelo and Henry Moore in a past post). Gutzon Borglum could not simply imagine the form of the monument, and then pick a mountain arbitrarily. He had to study available mountain form-ations, and imagine what he could produce by removing modest amounts of material.

Am I trying to diminish the work of Borglum? Certainly not. For someone to envision a monument in the side of a mountain is amazing. My point is that much of the form-ation of the sculpture was already done. The in-form-ation by the sculptor was relatively fine detail, for the most part, and that is why the gross features do not stand out from the surrounding stone.

Of course, the ID creationists make E.T. get up close and personal. The point has been made a gazillion times that an extraterrestrial may be so unlike a person that faces mean nothing to it. What objectively stands out in a shot that is tighter, but not as tight as the IDCists want it to be, is the relatively flat surface surrounding the heads. The pile of rubble beneath the carving also draws attention to it. How ironic.

The IDCists always frame what they say contains some sort of special information, without accounting for how that happens. Put simply, why does E.T. zoom in on a relatively small part of Mount Rushmore, if it doesn’t stand out? To come at this another way, Marks expects us to compare the typical image of Mount Fuji, far in the distance, to the typical image of Mount Rushmore, which is a small part containing the sculpture. That is what prompted me to go looking for shots from different perspectives and different distances. [… “If you want any more, you can sing it yourself.”]

"Mountfujijapan" by Swollib

Special Added Bonus Feature: Creationist Persecution Fantasy

Thursday, September 25, 2014

Response to Denyse O’Leary at Salvo

I received a tip that my name had been “taken in vain” by Denyse O’Leary. Unfortunately, the context is one in which I am more doglike than godlike: “The Law of Conservation of Information.”

Dembski did not invent the underlying idea of conservation of information. Biologist Peter Medawar (1980s) and computer scientist Tom English (1996) advanced the view that information is not created from scratch but rather is redistributed from existing sources. Robert Marks II and his students at Baylor University in Texas have developed the idea in terms of search, and their approach has profound consequences for plausible ideas of how evolution occurs, especially when vast claims are made for WEASEL and other evolution computer programs. As we will see in Part II, they are smuggling in information in order to arrive at their target.

I’ve come to understand a fair amount of the psychology of creationists. But I remain mystified by their proclivity to hold forth on anything and everything that comes along. What I’ve learned from my errors is that I’m qualified to speak authoritatively on precious few matters. And even on those, I have to be exceedingly careful. Denyse has had her head handed to her various times at Uncommon Descent, when she’s ventured into the simplest of math. Is she unembarrassed, or undeterred by embarrassment? Similarly, when she apes the rhetoric of the likes of Demkski and Meyer, where does the unconscious lying to herself end, and the conscious lying to her readers begin?

Ms. O’Leary, my 1996 formulation of "search" was needlessly complicated. With simplification, search is clearly a process of sampling a set of alternatives (which Dembski and Marks refer to as the sample space). To my huge embarrassment, conservation of information turns out to be nothing but obfuscation of statistical independence — a concept that undergraduates encounter early in introductory courses on probability and statistics. There can be no conservation of information in random selection of a sample because there is no information whatsoever. It is absurd to speak of conserving what does not exist.

If samplers have no information about the samples they draw, then how do we account for the fact that sampler (search) A is more likely than sampler B to select a sample that includes at least one element of the target (to hit the target)? There is not the least mystery here. Samplers differ in their biases. That is the gist of why I was wrong to indicate in 1996 that information somehow resides in samplers, and why Dembski and Marks are wrong to do so today.

The following includes a technical correction of my own errors, but ends with exposition that should make sense to everyone who is able to follow you:
http://boundedtheoretics.blogspot.com/2014/08/sam...

The errors of Dembski and Marks apparently derive from a misunderstanding of the "no free lunch" theorem for search. The following links to an interview in which Marks attempts to explain the theorem in layperson's terms, and provides an accessible discussion of how he goes awry:
http://boundedtheoretics.blogspot.com/2012/04/bob...

P.S.—Note that much of the misunderstanding is attributable to misnaming. I know that Ms. O’Leary appreciates the powerful impact of language upon thought. If you refer to the process of sample selection as search, designate a particular subset of the sample space as the target, and say that the selection process hits the target when the sample includes an element of the subset, then you will have a very hard time thinking straight about sampling.

Wednesday, September 24, 2014

Encourage Winston Ewert to lift the five-year embargo on his dissertation

In the days before electronic dissemination of theses and dissertations, I heard of a trick to see if someone had paid attention to your work: insert a buck between the pages of the library copy, and check on it a year later. Obviously, communication among scholars has changed radically. But what remains the same is the hope that someone will actually delve into the full account of your scholarship — i.e., that the document amounts to more than an exercise that you had to complete in order to move on to other things.

Students rarely withhold their theses and dissertations from public view. It makes sense if you have developed a valuable trade secret, or if you reasonably believe that someone might steal your results. There has been no such sense in the one-year embargoes that students working with Professor Robert J. Marks II at Baylor have placed on their masters' theses. But I never groused. And I waited patiently to see Winston Ewert's dissertation, Algorithmic Specified Complexity (August 2013). However, it turns out that he has opted for a five-year embargo.

This is exactly the opposite of what Winston should do. Please contact him to explain that shutting out the light is a bad move for someone who has chosen the path of creationism. He provides an email address at the Evolutionary Informatics Lab website.

Monday, August 25, 2014

The censorial anti-ID activist is… Bob O'Hara?

Those of you acquainted with Bob O’Hara have already busted a gut laughing. There is not, as best I can tell, a censorial bone in his body. He is the mildest of the pack nipping at the heels of ID creationists. As I recall, he even respects their wish not to be called creationists.

A year ago, ID-creationism advocate Casey Luskin alternately detailed and insinuated everything he ever wanted you to believe about Springer’s abandonment of the creationist-edited volume Biological Evolution: New Perspectives. And what he insinuated about Bob O’Hara was ugly.

[A]pparently [Matzke’s] post generated a lot of complaints to Springer from people who didn’t want the company to publish a book with articles sympathetic to ID. For example, one of Matzke’s Panda’s Thumb followers, statistician Bob O’Hara, reported that “I’ve been in contact with one of the editors at Springer, so they’re now certainly aware of the situation.” Within a day or two, Springer had removed its page for Biological Information: New Perspectives from its website.

Luskin cutely juxtaposes Bob’s comment with the removal of the book announcement, inviting you to read the worst into it. But this isn’t strong enough a rhetorical trick for the political magazine Human Events, which brings us “Powerful Conservative Voices.” So the powerful Robert J. Marks II resorted, apparently, to embellishment in his article of last week, Biological Information: New Perspectives from Intelligent Design:

Despite the intelligent design content, the German publishing company Springer invited the organizers to publish papers from the conference. But, even though no one had yet seen the book, publicity at an atheistic leaning neo-Darwinist blog prompted an anti-ID activist to contact Springer upper management and claim Springer’s publishing of the book would ruin Springer’s reputation in science. So Springer reneged on its contract with the Editors at the last minute.

I now have Bob’s permission not only to invoke his name, but also to reveal how he bullied a senior editor at Springer US on February 27, 2012. (He sent me a copy of his note, three days later.) After identifying the book, he wrote:

This has the potential to be a controversial text (as the editors are all active in pushing Intelligent Design), so I'm wondering why it's being published as an engineering text, rather than biology: it would seem to be a better fit there.

Gee. No threat. No doomsaying. No claim to know anything about what he had not yet seen. I see a flat statement of fact, followed by a gentle suggestion that the book was misclassified. Look back at what Marks wrote, and consider the warped mind that would concoct such propaganda.

Did Springer drop the title because of outside pressure? I doubt it highly. As I explained in my last post, the book deal was shady from the get-go. The most that Bob O’Hara did was to shine light on it. Predictably, the Pharisees focus on the alleged breach of contract instead of their own dubious ethics.

[Edit. I just learned that Marks has deleted an erroneous erratum from the online copy of an article, leaving no indication that one of the two highlighted theorems is severely botched. The kicker is that Marks received, long before submitting the paper to the journal that published it, a clear explanation of the error. His correspondent CC'd me! See “The theorem that never was: Diversionary ‘erratum’ from Dembski and Marks.”]

Saturday, August 23, 2014

You’re making things up again, Robert J. Marks II

One of the “20 Most Influential Christian Scholars,” the distinguished professor who approved a master’s thesis that plagiarized his own publications, your favorite whited sepulcher and mine, Robert J. Marks II, is making things up again. This time it’s a tale of censorship of the creationist-edited volume Biological Information: New Perspectives, told through a conservative political outlet, Human Events. Marks embellishes and contradicts what Casey Luskin, an intelligent-design advocate at the Discovery Institute, reported a year ago. He would have you believe:

Despite the intelligent design content, the German publishing company Springer invited the organizers to publish papers from the conference. But, even though no one had yet seen the book, publicity at an atheistic leaning neo-Darwinist blog prompted an anti-ID activist to contact Springer upper management and claim Springer’s publishing of the book would ruin Springer’s reputation in science. So Springer reneged on its contract with the Editors at the last minute.

Luskin flatly contradicts the first statement. And I’ve had, since March 2, 2012, a copy of the email that the “anti-ID activist” sent to Springer. It is utterly devoid of what Marks attributes to it [see for yourself]. Here is what I think is significant: the author sent the note to a Ph.D. scientist-editor at Springer US. But Springer DE was handling the book, according to Luskin. [The editor did not reply.] It may well be that New York pushed the panic button in Heidelberg. In any case, we know for sure:

♫ You’re making things up again, Robert ♫

Marks points out that Springer reneged on the contract, but somehow forgets to mention that he knew from the outset that the deal was shady. According to Luskin, it was Dembski who proposed to an editor of a Springer engineering series on intelligent systems that Biological Information: New Perspectives be included. Dembski would think that he could talk a fast line to justify it. He thinks that about everything he does. But Marks’ field is intelligent systems. And so is mine. He knew just as well as I did that it was wrong to dump the book into that series. A big threat to the publisher’s reputation, I think, was that institutions buying all volumes, expecting them to be about intelligent systems as advertized, would scream loudly when they got creationism instead.

I can’t resist amplifying Marks’ first sentence.

A diverse group of [secretly invited] scientists [many of whom were not scientists] gathered at [but not under the auspices of] Cornell University in 2011 to discuss their research [not peer-reviewed] into the nature and origins of biological information [loosely interpreted].

My initial response to the proceedings of the enclave is here. I wish that I’d given more emphasis to the fact that, despite all of Marks’ bragging about the attendees, the organizers didn’t, well, organize them to review the papers of their peers. The symposium was pretty much a group-hug. There were many presentations by John Sanford, who is busy setting up simulations to show that genomes only deteriorate — and rapidly, at that. He’s confident that our species won’t survive to the end of the century. It’s all science, of course. But he does hope to persuade you that the End Time is at hand.

Marks and Luskin carry on about their contract. But they don’t seem terribly anxious to admit that they consort with people who, if their madness did not align with established religions, would be locked up. At present, a guy I know well is heavily preoccupied with off-brand religion, and is in the protective custody of the state. He seems no crazier to me than Sanford. (Watch this if you think I’m exaggerating.) And the injustice of the difference in society’s treatment of him and it’s treatment of a YEC with an elaborate delusional system is weighing heavily on my mind.

Thursday, August 21, 2014

Calling all statisticians and probabilists

The following is a question I posted at the Mathematics Stack Exchange. Folks who see it here are likely to understand that my "Making Less of No Free Lunch" campaign is intended as much to hose down a creationist cloud of dust as to rain on the NFL parade. Please contribute a bit of your expertise to a worthy cause.

I hope to replace a proof of my own, in a paper explaining that the "no free lunch" theorems for optimization actually address sampling and statistics, with a reference to an existing result on sampling. The random selection process $$X_1, X_2, \dotsc, X_n$$ over the domain of the random objective function $F$ is statistically independent of the selected values, $$F(X_1), F(X_2), \dotsc, F(X_n),$$ despite the functional dependence $$X_i \equiv X(F(X_1), F(X_2), \dotsc, F(X_{i-1})),$$ $1 < i \leq n,$ where $X$ is statistically independent of $F.$ (See Section 0.1 here [or my last post] for more detail.)

To put this in terms of sampling, $F$ associates members of a statistical population with their unknown responses. The sampler $X$ processes data already in the sample to extend the selection. This typically biases the selection. But the selection process is statistically independent of the sequence of responses. That is the justification for referring to the latter as a sample.

This looks like textbook stuff to me: "Data processing biases the selection." But I am awash in irrelevant hits when I Google the terms you see here, and others like them.

Friday, August 15, 2014

Sampling Bias Is Not Information

[Edits 8/19/2014 and 8/20/2014: Although the proof of the so-called “no free lunch” theorem is logically very simple, it looked like a vomited hairball in the first version of this post. I’ve made it considerably more inviting. Note also that the online version of “Evaluation of Evolutionary and Genetic Optimizers: No Free Lunch” now has a preface.]

This corrects a post that I retracted, and promised to replace, more than a year ago. It took me only a few days to redo the math. But then I executed a fabulously ADDled drift into composition of a preface for my first paper on “no free lunch” in optimization, planning to use it also as a replacement post. Now I must admit that I am ensnarled in words, and that I don’t know when, or even if, I will break free. Consequently, I am dumping here some fragments that should make sense, though they say much less than I would like to get said.

For those of you who are more interested in ID creationism than in NFL, the highlight is that my references to “conservation of information,” elevated to “English’s Principle of Conservation of Information” by Dembski and Marks in “Conservation of Information in Search: Measuring the Cost of Success,” amount to nothing but failure to recognize that I was discussing statistical independence. Much of what I say in the subsection “Understanding the Misunderstanding of Information” applies not only to me, but also to Dembski and Marks. In fact, I recommend reading the relatively accessible “Bob Marks grossly misunderstands ‘no free lunch’” first.

Abstract

The recent “no free lunch” theorems of Wolpert and Macready indicate the need to reassess empirical methods for evaluation of evolutionary and genetic optimizers. Their main theorem states, loosely, that the average performance of all optimizers is identical if the distribution of functions is average. [An “optimizer” selects a sample of the values of the objective function. Its “performance” is a statistic of the sample.] The present work generalizes the result to an uncountable set of distributions. The focus is upon the ~~conservation of information as an optimizer evaluates points~~ [statistical independence of the selection process and the selected values]. ~~It is shown that the information an optimizer gains about unobserved values is ultimately due to its prior information of value distributions.~~ [The paper mistakes selection bias for prior information of the objective function.] ~~Inasmuch as information about one distribution is misinformation about another, there is no generally superior function optimizer~~. Empirical studies are best regarded as attempts to ~~infer the prior information optimizers have about distributions~~ [match selection biases to constrained problems] — i.e., to determine which tools are good for which tasks.

0. Sampling Bias Is Not Information

This preface (Sect. 0) addresses expository errors in the original paper (English, 1996), but stands as a self-contained report. Specific corrections and amplifications appear in the remainder of the text (Sects. 1–6).

Figure 1: Objective function $ \newcommand{\disteq}{\stackrel{\scriptscriptstyle{\mathcal{D}}}{=}} \newcommand{\Fseq}[2]{% {F_1^{#1}\hspace{-1pt}({#2)}} } \newcommand{\selection}{{X_1^n}} \newcommand{\sample}{{\Fseq{n}{X}}} \newcommand{\Fn}[1]{{\Fseq{n}{#1}}} \newcommand{\wn}{{w_1^n}} \newcommand{\xn}{{x_1^n}} \newcommand{\yn}{{y_1^n}} \newcommand{\zn}{{z_1^n}} \newcommand{\boldx}{\mathbf{x}} \newcommand{\boldw}{\mathbf{w}} \newcommand{\X}{\mathcal{X}} \newcommand{\Xdiamond}{\mathcal{X}_\diamond} \newcommand{\Y}{\mathcal{Y}} \newcommand{\Ydiamond}{\mathcal{Y}_\diamond} F$ and quality measure $\psi$ comprise the optimization problem. An optimizer is a sampler $X,$ along with a reduction $r$ of the selection $\xn$ and the sample $\yn$ to outputs $\tilde{x}_1^m = x_{j_1}, \dotsc, x_{j_m}$ and $\tilde{y}_1^m = y_{j_1}, \dotsc, y_{j_m},$ with $j_1^m$ strictly increasing. NFL analyses make four assumptions: the selection is non-repeating, the reduction is identical for all optimizers under consideration, the output $\tilde{y}_1^m$ depends only on the sample, and quality depends only on $\tilde{y}_1^m.$ Accordingly, write $\tilde{y}_1^m = r(\yn)$ and $\phi_n = \psi(\tilde{y}_1^m).$ Then the quality $\psi(r(\yn))$ of the optimizer’s output is a statistic $\phi = \psi \circ r$ of the sample it generates.

Black-box optimization may be decomposed into generation of a sample of the values of the objective function, and use of the sample to produce a result. “No free lunch” (NFL) analyses assume explicitly that sampling is without repetition, and define quality of the optimization result to be a statistic of the sample (Wolpert and Macready, 1995, 1997). Fig. 1 unpacks the tacit assumptions, the most important of which is that the optimizers under consideration differ only in selection of samples. Fig. 2 shows how postprocessing of the sample, assumed to be identical for all optimizers, is embedded in the statistic. Although the statistic combines part of the optimizer with part of the optimization problem, the NFL literature refers to it as the performance measure, and to samplers as optimization (or search) algorithms.

Figure 2: Random sampler $X$ is statistically independent of random objective function $F,$ and the selection $X(\lambda) = x_1, \dotsc, X(y_1, \dotsc, y_{n-1}) = x_n$ is statistically independent of the sample $F(x_1) = y_1,$ $\dotsc, F(x_n) = y_n.$ The statistic $\phi = \psi \circ r$ is the composition of the quality measure $\psi$ on optimizer outputs, given in the optimization problem, and the reduction $r$ of samples to outputs, assumed to be identical for all optimizers. The selection and the sample are abbreviated $\selection = \xn$ and $\sample = \yn,$ respectively.

The better known of NFL theorems, including that of Sect. 3.2 (which appears also in Streeter, 2003, and Igel and Toussaint, 2005), are reinventions of basic results in probability (Häggström, 2007). They address sampling and statistics. This claim is justified by showing that the selection process \[ X_i \equiv X(F(X_1), \dotsc, F(X_{i-1})), \] $1 \leq i \leq n,$ is statistically independent of the selected values \[ \sample \equiv F(X_1), \dotsc, F(X_n), \] despite its data processing. Sect. 3.1 supplies proof for the special case of deterministic selection, with the domain and codomain of the random objective function restricted to finite sets. Here the result is generalized to random selection, with repetition allowed, and with the domain and codomain possibly infinite, though countable. Furthermore, the selection process is equipped to terminate. Although this would seem to complicate matters, the proof is greatly simplified.

The selection process cannot be regarded as having or gaining information of unselected values of the objective function when it is statistically independent of the selected values. Data processing is a source of bias in the selection. The paper fails, in exposition, to recognize statistical independence for what it is, and refers instead to “conservation of information.” (The formal results are correct.) It furthermore equates the propensity of an optimizer to perform better for some distributions of the random objective function and worse for others with prior information that somehow inheres in the optimizer. The notion that one entity simply has, rather than acquires, varying degrees of prior information and misinformation of others is incoherent.

The following subsection formalizes the system sketched in Fig. 2, proves that the selection is indeed statistically independent of the selected values, and goes on to demonstrate that a general NFL theorem, so called, follows almost immediately. This exercise leaves very little room for contention that the NFL theorems address something other than sampling. The preface concludes with a subsection that briefly deconstructs the paper’s erroneous claims about information.

0.1 Formal Results on Sampling

Sampling processes do not necessarily terminate, and it is convenient to treat them all as infinite. The distinguished symbol $\diamond$ is interpreted as the sample terminator. Let countable sets $\Xdiamond$ = $\X \cup \{\diamond\}$ and $\Ydiamond$ = $\Y \cup \{\diamond\},$ where sets $\X$ and $\Y$ exclude $\diamond.$ The random objective function $F$ maps $\Xdiamond$ to $\Ydiamond,$ with $F(x) = \diamond$ if and only if $x = \diamond.$

Finite sequences of the forms $\alpha_1, \dotsc,$ $\alpha_n$ and $\gamma(\alpha_1), \dotsc,$ $\gamma(\alpha_n)$ are abbreviated $\alpha_1^n$ and $\gamma_1^n(\alpha),$ respectively. A sampler is a random function, statistically independent of $F,$ from the set of all finite sequences on $\Ydiamond$ to $\Xdiamond.$ A selection is a random vector $\selection$ with \[ X_i \equiv X(F(X_1), \dotsc, F(X_{i-1})) \] $1 \leq i \leq n,$ where $X$ is a sampler. The selection is non-repeating if $\selection \in \pi(\X)$ surely, where $\pi(\X)$ is the set of all non-repeating sequences on $\X.$

A statistic is a function with the set of all finite sequences on $\Ydiamond$ as its domain. Assume, without loss of generality, that the codomain of a statistic is a countable superset of its range, which is countable like the domain.

Probability measure is unproblematic when considering the selection $\selection$ and the corresponding sequence $\sample,$ because both take values in countable sets. The following lemma establishes that the selection is statistically independent of the selected values, i.e., that it is correct to refer to $\sample$ as a sample of the values $\{F(x) \mid x \in \Xdiamond\}$ of the objective function. The data processing in extensions of the selection, highlighted in the proof, is a potential source of selection bias, not information about the objective function.

Lemma. Selection $X_1, \dotsc, X_n$ is statistically independent of $F(X_1), \dotsc, F(X_n).$

Proof. Let $\xn$ and $\yn$ be nonempty sequences on $\Xdiamond$ and $\Ydiamond,$ respectively, such that $P( \Fseq{n}{x} = \yn) > 0.$ Then \begin{align*} P( \selection = \xn, \sample = \yn) &= P(\selection = \xn, \Fseq{n}{x} = \yn) \\ &= P(\selection = \xn \mid \Fseq{n}{x} = \yn) \cdot P(\Fseq{n}{x} = \yn), \end{align*} and \begin{align*} &P(\selection = \xn \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(\Fseq{0}{x}) = x_1, \dotsc, X(\Fseq{n-1}{x}) = x_n \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(y_1^0) = x_1, \dotsc, X(y_1^{n-1}) = x_n \mid \Fseq{n}{x} = \yn) \\ &\quad= P(X(y_1^0) = x_1, \dotsc, X(y_1^{n-1}) = x_n) \end{align*} because sampler $X$ is statistically independent of objective function $F.$ ☐

Corollary 1. Selection $X_1, \dotsc, X_n$ is statistically independent of $\phi(F(X_1), \dotsc, F(X_n))$ for all statistics $\phi.$

The following NFL theorem, so called, uses the symbol $\disteq$ to denote equality in probability distribution. It says, loosely, that the distribution of a statistic depends on the choice of sampler if and only if the distribution of the statistic depends on the choice of sample.

Theorem. Let $x_1, \dotsc, x_n$ be a nonempty, non-repeating sequence on $\X,$ and let $\phi$ be a statistic. Then \begin{equation} \phi(F(X_1), \dotsc, F(X_n)) \disteq \phi(F(x_1), \dotsc, F(x_n)) \label{eq:selections} \end{equation} for all non-repeating selections $X_1, \dotsc, X_n$ if and only if \begin{equation} \phi(F(w_1), \dotsc, F(w_n)) \disteq \phi(F(x_1), \dotsc, F(x_n)) \label{eq:sequences} \end{equation} for all non-repeating sequences $w_1, \dotsc, w_n$ on $\X.$

Proof.
($\Rightarrow$) Suppose that (1) holds for all non-repeating selections $\selection,$ and let $\wn$ be a non-repeating sequence on $\X.$ There exists sampler $X$ with constant selection $\selection = \wn,$ and thus (2) follows.
($\Leftarrow$) Suppose that (2) holds for all non-repeating sequences $\wn$ on $\X,$ and let $\selection$ be a non-repeating selection. A condition stronger than (1) follows. For each and every realization $\selection = \wn,$ non-repeating on $\X$ by definition, \begin{equation*} \phi(\sample) \disteq \phi(\Fseq{n}{w}) \disteq \phi(\Fseq{n}{x}). \end{equation*} The first equality holds by Corollary 1, and the second by assumption. ☐

Corollary 2. Let $x_1, \dotsc, x_n$ be a nonempty, non-repeating sequence on $\X.$ Then \begin{equation} F(X_1), \dotsc, F(X_n) \disteq F(x_1), \dotsc, F(x_n) \label{eq:distselections} \end{equation} for all non-repeating selections $X_1, \dotsc, X_n$ if and only if \begin{equation} F(w_1), \dotsc, F(w_n) \disteq F(x_1), \dotsc, F(x_n) \label{eq:distsequences} \end{equation} for all non-repeating sequences $w_1, \dotsc, w_n$ on $\X.$

Proof. Set the statistic $\phi$ in the theorem to the identity function. ☐

The corollary is essentially the NFL theorem of English (1996, Sect. 3.2). When (4) holds for all $n,$ the random values $\{F(x) \mid x \in \X\}$ of the objective function are said to be exchangeable (Häggström, 2007).

0.2 Understanding the Misunderstanding of Information

The root error is commitment to the belief that information is the cause of performance in black-box optimization (search). The NFL theorems arrived at a time when researchers commonly claimed that evolutionary optimizers gained information about the fitness landscape, and adapted themselves dynamically to improve performance. Wolpert and Macready (1995) observe that superior performance on a subset of functions is offset precisely by inferior performance on the complementary subset. In online discussion of their paper, Bill Spears referred to this as conservation of performance. My paper suggests that conservation of information accounts for conservation of performance.

The lemma of Sect. 3.1, “Conservation of Information,” expresses the absolute uninformedness of the sample selection process. The performance of a black-box optimizer has nothing whatsoever to do with its information of the objective function. But the paper recognizes only that information gain is impossible, and claims incoherently that prior information resides in the optimizer itself. Conservation of this chimeral information supposedly accounts for conservation of performance in optimization. Here are the salient points of illogic:

Information causes performance.
The optimizer gains no exploitable information by observation, so it must be prior information that causes performance.
There is no input by which the optimizer might gain prior information, so it must be that prior information inheres in the optimizer.
Prior information of one objective function is prior misinformation of another. Conservation of performance is due to conservation of information.

It should have been obvious that prior information is possessed only after it is acquired. The error is due in part to a mangled, literalistic half-reading of (Wolpert and Macready, 1995, p. 8, emphasis added):

The NFL theorem illustrates that even if we know something about [the objective function] … but don’t incorporate that knowledge into [the sampler] then we have no assurances that [the sampler] will be effective; we are simply relying on a fortuitous matching between [the objective function] and [the sampler].

The present work (Sect. 6) concludes that:

The tool literally carries information about the task. Furthermore, optimizers are literally tools — an algorithm implemented by a computing device is a physical entity. In empirical study of optimizers, the objective is to determine the task from the information in the tool.

This reveals confusion of one type of information with another. When a toolmaker imparts form to matter, the resulting tool is in-formed to suit a task. But such form is not prior information. Having been formed to perform is different from having registered a signal relevant to the task. An optimization practitioner may gain information of a problem by observation, and then form a sampler to serve as a proxy in solving it. Although the sampler is informed to act as the practitioner would, it is itself uninformed of the problem to which it is applied, and thus cannot justify its own actions. The inherent form that accounts for its performance is sampling bias.

Unjustified application of a biased sampler to an optimization problem is merely biased sampling by proxy. The NFL theorems do not speak to this fundamental point. They specify conditions in which all of the samplers under consideration are equivalent in overall performance, or are nearly so. Ascertaining that none of these theorems applies to a real-world circumstance does not justify a bias, but instead suggests that justification may be possible. There is never a “free lunch” for the justifier.

References

T. M. English. Evaluation of evolutionary and genetic optimizers: No free lunch. In L. J. Fogel, P. J. Angeline, and T. Bäck, editors, Evolutionary Programming V: Proceedings of the Fifth Annual Conference on Evolutionary Programming, pages 163–169. MIT Press, Cambridge, MA, 1996.

O. Häggström. Intelligent design and the NFL theorems. Biology & Philosophy, 22(2):217–230, 2007.

C. Igel and M. Toussaint. A no-free-lunch theorem for non-uniform distributions of target functions. Journal of Mathematical Modelling and Algorithms, 3(4):313–322, 2005.

M. J. Streeter. Two broad classes of functions for which a no free lunch result does not hold. In Genetic and Evolutionary Computation – GECCO 2003, pages 1418–1430. Springer, 2003.

D. H. Wolpert and W. G. Macready. No free lunch theorems for search. Technical report, SFI-TR-95-02-010, Santa Fe Institute, 1995.

D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

Tuesday, August 12, 2014

NFL&I

My last post showed that a claim of Winston Ewert, William Dembski, and Robert J. Marks II to groundbreaking research does not withstand a minute of Googling. In the comments section, I made some admissions about my own work on "no free lunch" theorems for optimization. I want to highlight them here.

If I had a face-to-face chat with Winston, I'd tell him that I really, really, really regret having gotten into the "no free lunch" thing. Rather than play to my strengths — I'm a crackerjack computer scientist — I wasted a lot of myself on obfuscated reinvention of a probabilist's wheel. What's dreadfully embarrassing is that it took me many years to realize how bad my work was. I had to acknowledge that Haggstrom [cited below] was correct in his criticism of NFL. And Winston should acknowledge that he does not have a theoretical bone is his body.

Quoting what I said about myself, Anonymous asks:

Would you be willing to elaborate on this? It might help others avoid the types of mistakes you want them to avoid, if you could explain specifically: 1) in what ways your work was poor, or a "reinvented wheel", 2) why you did not notice your work was bad, and 3) what work/areas should be avoided (for fear of reinvention). Thanks.

The following is my response, lightly edited.

The first two questions are entirely legitimate. The last borders on the fallacy of the complex question. I'm more than willing to elaborate. But I'm going to limit my response, because I'm struggling to get a couple other things written. [Obviously, I've drifted off-task.]

1) I have toiled and toiled over a preface [now available] for my first NFL paper (1996), discovering errors in my explanations of errors, and compounding my embarrassment. Dog willing, and the Creek don't rise, I will eventually post "Sampling Bias Is Not Information" here. You can arrange for Google Scholar to alert you when the title appears, if you don't want to follow my blog. For now, I'll say only that what I call "conservation of information" is nothing but statistical independence of the sampling ("search" or "optimization") process and the sample of random values of the objective function. I describe statistical independence in the introduction to the paper, but fail to identify it. That is indeed bad.

2) For one thing, I had given my confusion a fancy name. But I would rather focus here on my runaway arrogance. I independently proved the main NFL theorem in 1994. Questioning a student during his thesis defense, I got him to state clearly that "future work" was supposed to lead to a generally superior optimizer. After a centisecond of thought, I sketched the simple argument that Häggström gives in "Intelligent Design and the NFL Theorems" (2007). I considered going for a publication, but, as obvious as the result was to me, I had to believe that it was already in the optimization literature. Twenty years ago, a lit review in an unfamiliar field was a big undertaking. And I had more than enough work to do at the time.

When Wolpert and Macready disseminated "No Free Lunch Theorems for Search" through GA Digest (early 1995), stirring up a big controversy, I concluded that I was remarkably clever, rather than that the whole affair was silly. The accepted version of my first NFL paper included my simple proof, but a reviewer's suggestion led me to see exchangeability (Häggström provides the term) as the "NFL condition." Rather than save the result for another paper, I dumped it into Sect. 3.2 about the time that camera-ready copy was due. The upshot is that I obsoleted what is known as the NFL theorem, 16 months before it was published. (I sent a copy of my paper to Bill Macready, who responded, "Nice work, Tom.")

As the reputation of NFL grew, so did my arrogance. I forgot what I had suspected in the beginning, namely that something so simple was probably not a novel discovery. The more work I did with NFL, the more it became "my thing." In 2000 and 2001, I gave tutorials on the topic at conferences. I put NFL in the cover letters of my applications for academic positions. When Dembski came along with his book No Free Lunch (2002), I was able to respond with "authority." And the number of citations of Wolpert and Macready (presently 3823) continued to go up and up and up. "Fifty Million Frenchmen Can't Be Wrong," you know, about the importance of NFL.

3) Do not presume to contribute to a field that you have not bothered to study in depth.

Tuesday, August 5, 2014

ID creationist “scholarship” sinks to new depths

[Edit: I regret turning nasty here. See the comments section for something rather more kind I would say to Winston Ewert.]

I mentioned a couple posts ago that I was annoyed with the editors and reviewers of an article, apparently to be published in an IEEE journal, by Winston Ewert, William Dembski, and Robert J. Marks II. [Edit 9/18/2014: Marks has removed the paper from his website. The IEEE Transactions on Systems, Man, and Cybernetics presently offers "Algorithmic Specified Complexity in the Game of Life" as a preprint.] Although I focus on the authors here, it will be clear enough what the folks responsible for quality control, including those on Ewert’s dissertation committee at Baylor, should have done. The short version is that, Googling obvious keywords, it takes at most a minute to discover that prior work, of which the authors claim to have no knowledge, exists in abundance.

Before continuing, I want to emphasize that my response to ID creationism is anything but knee-jerk. Consider, if nothing else, that the way I dealt with a dubious theorem supplied by Ewert et al. was to attempt to prove it. I not only succeeded in doing so, but supplied the proof here. I have since proved a fundamental theorem that some might take as advancing ID theory, though I see it otherwise.

Back to the article. There is no way around concluding that the veterans, Dembski and Marks, are either incompetent or dishonest. (You can guess which way I lean from a subtitle I’ve considered for this blog: Truth Is the First Casualty of Culture War.) And Youngman Ewert is either lazy, dishonest, or intellectually hobbled. (I see many signs of hero worship and true-believerism.)

You need no clear idea of what information theory is to grasp the significance of Dembski allowing himself to be promoted as the “Isaac Newton of information theory.” Marks edited a volume on engineering applications of information theory. The two have repeatedly cited standard texts that I own, but have not studied thoroughly. I am not an information theorist. So how is it that I should be thunderstruck by a flatly incorrect statement in the second sentence of the abstract, and then be floored by an intensification of it in the second paragraph of the text?

You may think that you are unqualified to judge whether Ewert, Dembski, and Marks are right or wrong. But Google Scholar settles the matter unequivocally. That is what makes the falsehood so atrocious. The authors state in the introduction:

Both Shannon [1], [2] and Kolmogorov-Chaitin-Solomonoff (KCS)¹ [2]-[9] measures of information are famous for not being able to measure meaning. [...] We propose an information theoretic method to measure meaning. To our knowledge, there exists no other general analytic model for numerically assigning functional meaning to an object.

¹Sometimes referred to as only Kolmogorov complexity or Kolmogorov information.

My immediate thoughts, incomprehensible to most of you, but eminently sensible to many who’ve had a course in information theory, were Kolmogorov structure function and Kolmogorov sufficient statistic. I first learned about Kolmogorov complexity from a chapter in reference [2]: Cover and Thomas, Elements of Information Theory. See Section 14.12, “Kolmogorov Sufficient Statistic.” If Dembski and Marks do not understand the relevance, then they are incompetent. If they do, then they are dishonest.

Back to Google. And to Ewert. And to Baylor. The article clearly draws on Ewert’s dissertation, which had better include a chapter with a title like “Literature Review.” Actually, his dissertation proposal should have surveyed the relevant literature. A scholar should comprehend a body of knowledge before trying to extend it — right? Let’s make the relatively charitable assumptions that Ewert, though presuming to make a fundamental contribution to information theory, never took a course in information theory, and failed to grasp some aspects of the introduction to Kolmogorov complexity in Cover and Thomas. He nonetheless had obvious keywords to enter into Google Scholar. There are 41,500 results for the unclever search

meaningful information Kolmogorov.

(The term meaning is too general, and meaningful is the last word of the first sentence in the abstract.) Close to the top are two articles by Paul Vitányi, both entitled “Meaningful Information.” With the search narrowed to

"meaningful information" Kolmogorov,

there are about 1,310 results. This is what I did first. I immediately went to the more recent of Vitányi's articles, because he’s a prominent researcher in Kolmogorov complexity, and also the coauthor of the standard text on the topic (reference [24] in Ewert et al.). I have highlighted key phrases in the abstract for those of you who don’t care to grapple with it.

Abstract—The information in an individual finite object (like a binary string) is commonly measured by its Kolmogorov complexity. One can divide that information into two parts: the information accounting for the useful regularity present in the object and the information accounting for the remaining accidental information. There can be several ways (model classes) in which the regularity is expressed. Kolmogorov has proposed the model class of finite sets, generalized later to computable probability mass functions. The resulting theory, known as Algorithmic Statistics, analyzes the algorithmic [Kolmogorov] sufficient statistic when the statistic is restricted to the given model class. However, the most general way to proceed is perhaps to express the useful information as a total recursive function. The resulting measure has been called the “sophistication” of the object. We develop the theory of recursive functions statistic, the maximum and minimum value, the existence of absolutely nonstochastic objects (that have maximal sophistication—all the information in them is meaningful and there is no residual randomness), determine its relation with the more restricted model classes of finite sets, and computable probability distributions, in particular with respect to the algorithmic (Kolmogorov) minimal sufficient statistic, the relation to the halting problem and further algorithmic properties.

So what are you up to in this paper, Dr. Ewert? When you feed us the “to our knowledge” guano, knowingly creating the misimpression that you tried to acquire knowledge, you are essentially lying. You have exaggerated the significance of your own work by failing to report on the prior work of others. I doubt highly that you are capable of understanding their work. Were you so inept as to not Google, or so deceitful as to sweep a big mound of inconvenient truth under the rug? Assuming the former, you need to catch on to the fact that your “maverick genius” coauthors should have known what was out there.

Thursday, July 31, 2014

Proving a theorem of Ewert, Marks, and Dembski

[Edit: This post essentially does right what Ewert, Dembski, and Marks did wrong. I’ve since shown that the theorem follows from a very simple result in probability.]

In what appears to be a forthcoming journal article [published version], ID creationists Winston Ewert, William Dembski, and Robert J. Marks II state a formal result, and justify it merely by citing a three-page paper in the proceedings of a symposium. I was already annoyed with the reviewers and the editors for missing a huge defect, and decided to while away the sleepless hours by checking the putative proof in On the Improbability of Algorithmic Specified Complexity.

The formalism and the argument are atrocious. I eventually decided that it would be easier to reformulate what I thought the authors were trying to say, and to see if I could generate my own proof, than to penetrate the slop. It took me about 20 minutes, working directly in LaTeX. Then I decided to provide some explanation that is missing in the paper.

The theorem is correct. As Ewert, Marks, and Dembski put it, "The probability of obtaining an object exhibiting $\alpha$ bits of [algorithmic specified complexity] is less than or equal to $2^{-\alpha}$." It is something that they can establish a property like this. But algorithmic specified complexity is a sum of bits of Shannon self-information and bits of Kolmogorov complexity, which seem like apples and oranges to me.

I should mention that the result probably comes from Ewert's doctoral dissertation, Algorithmic Specified Complexity, still under wraps at Baylor. Evidently Dembski, a mathematician, did not edit the paper.

The following makes more sense if you read Sections I and II of the paper. Those of you with a bit of math under your belts will be amazed by the difference.

$\newcommand{\suppmu}{\mathrm{supp}(\mu)} \newcommand{\B}{\mathcal{B}} \newcommand{\X}{\mathcal{X}} \renewcommand{\S}{\mathcal{S}} \newcommand{\P}{\mathcal{P}} \newcommand{\Bprime}{\mathcal{B}^\prime}$The set of all strings (finite sequences) on $\B = \{0, 1\}$ is $\B^*.$ Assume that the binary encoding $e: \S \rightarrow \B^*$ of the set $\S$ of objects of interest is 1-to-1. This allows use of the set of codewords $e(\S)$ in place of $\S.$

The set of all programs $\P$ for universal computer $U$ is a prefix-free subset of $\B^*.$ That is, no program is a proper prefix of any other. The conditional Kolmogorov complexity $K(x|y)$ is the length $\ell(p)$ of the shortest program $p$ that outputs $x$ on input of $y,$ i.e., \[ K(x|y) = \hspace{-0.3em} \min_{p : U(p, y) = x} \hspace{-0.4em} \ell(p). \] The Kraft inequality \[ \sum_{x \in \X} 2^{-\ell(x)} \leq 1 \] holds for all prefix-free sets $\X \subset \B^*,$ including the prefix-free set of programs $\P.$ It follows that for all $\X \subseteq \B^*,$ \[ \sum_{x \in \X} 2^{-\!K(x|y)} \leq \sum_{p \in P} 2^{-\ell(p)} \leq 1, \] where $y$ is a string in $\B^*.$ In the first sum, all terms correspond to distinct programs, and each exponent $-\!K(x|y)$ is the negative length of a program that outputs $x$ on input of $y$.

Theorem 1. Let $\mu$ be a probability measure on encoded objects $e(\S).$ Also let \[ X = \{x \in \suppmu \mid -\!\log_2 \mu(x) - K(x | y) \geq \alpha\}, \] where $y$ is a string in $\B^*$ and $\alpha \geq 0.$ Then $\mu(X) \leq 2^{-\alpha}.$

Proof. Rewrite the property of string $x$ in $X$ to obtain a bound on $\mu(x):$ \begin{align*} -\!\log_2 \mu(x) - K(x | y) &\geq \alpha \\ \log_2 \mu(x) + K(x | y) &\leq -\alpha \\ \log_2 \mu(x) &\leq -\alpha - K(x | y) \\ \mu(x) &\leq 2^{-\alpha - K(x | y)}. \end{align*} Applying the bound, \begin{align*} \mu(X) &= \sum_{x \in X} \mu(x) \\ &\leq \sum_{x \in X} 2^{-\alpha -K(x | y)} \\ &= \sum_{x \in X} 2^{-\alpha} \cdot 2^{-K(x | y)} \\ &= {2^{-\alpha} \sum_{x \in X} 2^{-K(x | y)}} \\ &\leq 2^{-\alpha}. \end{align*} The last step follows by the Kraft inequality. Q.E.D.

Thursday, February 6, 2014

Bronowski, Michelangelo, Moore, and Einstein

I'm watching Jacob Bronowski's documentary series The Ascent of Man. You'll find in the following transcript a better account of how sculpture takes form than you'll get from any intelligent-design theorist. The notion that there's an independent design that the sculptor forces upon the stone is simply wrong. But to dwell on that would be to miss what Bronowski emphasizes, an interesting analogy of science to sculpture. On reflection, I saw the similarity of his remarks to some by Einstein, which I quote below. Hopefully someone out there will find the connection interesting.

BRONOWSKI: A popular cliche in philosophy says that science is pure analysis or reductionism, like taking the rainbow to pieces, and art is pure synthesis — putting the rainbow together. This is not so. All imagination begins by analysing nature. Michelangelo said that.

When that which is divine in us doth try
    To shape a face, both brain and hand unite
    To give, from a mere model frail and slight,
    Life to the stone by Art's free energy.

BRONOWSKI: The material asserts itself through the hand and thereby prefigures the shape of the work for the brain. The sculptor, as much as the mason, feels for the form within nature.

The best of artists hath no thought to show
    Which the rough stone in its superfluous shell
    Doth not include: to break the marble spell
    Is all the hand that serves the brain can do.

[TOM: See Sonnets XIV and XV here.]

BRONOWSKI: By the time Michelangelo carved the head of Brutus, other men quarried the marble for him. But Michelangelo had begun as a quarryman in Carrara and he still felt that the hammer in their hands, and in his, was groping in the stone for a shape that was already there. The quarrymen work in Carrara now for the modern sculptors who come here — Marino Marini, Lipschitz and Henry Moore. Their descriptions of their work are not as poetic as Michelangelo's, but they carry the same feeling.

HENRY MOORE: To begin with, as a young sculptor, I couldn't afford expensive stone. And I got my stone by going round the stone yards, and finding what they would call a random block. Then I had to think in the same way that Michelangelo might have done, so that one had to wait until an idea came that fitted the shape of the stone. And that was seeing the idea in that block.

BRONOWSKI: Of course, it can't be literally true that what the sculptor imagines and carves out is already there, hidden in the block. And yet the metaphor tells the truth about the relation of discovery that exists between man and nature. In one sense, everything that we discover is already there. A sculptured figure and the law of nature are both concealed in the raw material. And in another sense, what a man discovers is discovered by him. It would not take exactly the same form in the hands of someone else. Neither the sculptured figure nor the law of nature would come out in identical copies when produced by two different minds in two different ages. Discovery is a double relation of analysis and synthesis together. As an analysis it probes for what is there. But then, as a synthesis, it puts the parts together in a form in which the creative mind transcends the bare limits, the bare skeleton that nature provides.

BRONOWSKI: Sculpture is a sensuous art. The Eskimos make small sculptures that are not even meant to be seen, only handled. So it must seem strange that I choose as my model for science sculpture and architecture. And yet it's right. We have to understand that the world can only be grasped by action, not by contemplation. The hand is more important than the eye. We are not one of those contemplative civilisations of the Far East or the Middle Ages that believed that the world has only to be seen and thought about and who practised no science. We are active, and indeed we know in the evolution of man, that it is the hand that drives the subsequent evolution of the brain. We find tools made by man before he became man. Benjamin Franklin called man the "tool-making animal." And that's right. And the most exciting thing about that is that even in prehistory, man already made tools that have an edge finer than they need have.

[TOM: I've cleaned up this transcript.]

EINSTEIN: The reciprocal relationship of epistemology and science is of noteworthy kind. They are dependent upon each other. Epistemology without contact with science becomes an empty scheme. Science without epistemology is — insofar as it is thinkable at all — primitive and muddled. However, no sooner has the epistemologist, who is seeking a clear system, fought his way through to such a system, than he is inclined to interpret the thought-content of science in the sense of his system and to reject whatever does not fit into his system. The scientist, however, cannot afford to carry his striving for epistemological systematic that far. He accepts gratefully the epistemological conceptual analysis; but the external conditions, which are set for him by the facts of experience, do not permit him to let himself be too much restricted in the construction of his conceptual world by the adherence to an epistemological system. He therefore must appear to the systematic epistemologist as a type of unscrupulous opportunist: he appears as realist insofar as he seeks to describe a world independent of the acts of perception; as idealist insofar as he looks upon the concepts and theories as the free inventions of the human spirit (not logically derivable from what is empirically given); as positivist insofar as he considers his concepts and theories justified only to the extent to which they furnish a logical representation of relations among sensory experiences. He may even appear as Platonist or Pythagorean insofar as he considers the viewpoint of logical simplicity as an indispensable and effective tool of his research. [Wikiquote]

BOUNDED SCIENCE: There can be no scientific explanation of science.

Subscribe to: Posts ( Atom )