Sunday, July 10, 2011

Pattern Recognition

Scientists pride themselves on their ability to tease informative patterns out of masses of data. And with good reason -- that skill (or aptitude) is one of the traits that leads to insight, and thus to publications and professional success.

I don't believe that gazing at "spaghetti graph" reconstructions is the best way to evaluate whether or not the Tiljander data series were used correctly in Mann08 (for links to referred-to papers and posts, see here). That question is better answered by reading her paper (Tiljander03), getting a feel for what her data look like (graphs here), and thinking about the physical meaning of the varve characteristics that go into "XRD," "lightsum," "darksum," and "thickness."

By weaving these threads together, we can figure out the solution to this puzzle:

Can the Tiljander data series be meaningfully calibrated to the instrumental temperature record, 1850-1995?

The answer is No.

There might be a way to achieve such a calibration indirectly, which is the approach the authors of Kaufman09 took with XRD after belatedly coming to grips with this problem. But there is no feasible direct approach of the type used in Mann08 and Mann09.
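
It may help to pin down what "direct calibration" means operationally. Here is a minimal Python sketch with entirely invented numbers (this is my toy, not code or data from Mann08): fit the proxy against instrumental temperature over the overlap period by least squares. With a contamination-dominated proxy, the regression "succeeds" numerically, but the fitted relationship describes the contamination, not climate -- and every pre-instrumental value inherits that error.

```python
# Toy "direct" calibration: OLS fit of a proxy against instrumental
# temperature over the 1850-1995 overlap (invented numbers throughout).

def fit_direct_calibration(proxy, temperature):
    """OLS fit: proxy ~= slope * temperature + intercept."""
    n = len(proxy)
    mean_p = sum(proxy) / n
    mean_t = sum(temperature) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(proxy, temperature))
    var_t = sum((t - mean_t) ** 2 for t in temperature)
    slope = cov / var_t
    return slope, mean_p - slope * mean_t

temps = [0.01 * i for i in range(100)]        # gently warming instrumental record
contaminated = [0.5 * i for i in range(100)]  # proxy ramps up for non-climate reasons

slope, intercept = fit_direct_calibration(contaminated, temps)
# The fit is numerically perfect (slope = 50), yet carries no climate
# information: proxy and temperature merely share a trend in time.
```

The point of the toy: a regression cannot tell a climate signal from a non-climate ramp that happens to parallel the warming trend.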

This has proven to be a very contentious point. But there's no good reason it should be seen as such. Truly contentious questions have strong arguments on each side of the issue. The defenders of Mann08 don't even argue for "Yes," but rather for a stance akin to "I don't know, and it doesn't matter."

That's silly.

Knowing that the Tiljander data series were massively contaminated by non-climate signals in the 19th and 20th centuries, we can look for patterns in the reconstructions presented in Mann08 and Mann09.

Let's consider a few cartoons.

In this post, I will present the general case of a multiproxy reconstruction that runs from 500 through 1849. It could be anything -- temperature, precipitation, kangaroo population over time. What matters for this exercise is that we have three collections of data series that we are using as proxies for the information we are trying to reconstruct.

TR -- This is the large, established data set. We don't know these series are good proxies, but overall, we're pretty confident that they are. For temp recons, think "Tree-Rings" (dendro).

ND -- This is a new, smaller data set, of different types of series than comprise data set TR. We hope these series will be good proxies. For temp recons, think "Non-Dendro."

Data set ND has two parts: NDm and NDk.

NDm comprises most of the ND data set; in fact, it includes everything in the ND data set except NDk.

NDk is a single ND series. When first encountered, it seemed promising. But it was then established that NDk is contaminated. It can't contribute any real information to any reconstruction. For temp recons, think of the Korttajarvi (Tiljander) varve records.
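
To make the cartoons concrete, here is one way to mock up the three collections in Python. Every series is invented -- the counts, the shape of the "true history," and the noise levels are my assumptions, chosen only to mirror the cast of characters above.

```python
import random

# Entirely synthetic stand-ins for TR, NDm, and NDk.
random.seed(0)
YEARS = list(range(500, 1996))  # recon period 500-1849 plus overlap to 1995

def true_history(year):
    # Arbitrary "truth" to be reconstructed (notional units).
    return 0.3 if 900 <= year <= 1200 else 0.0

# TR: a large set of series that genuinely (if noisily) track the truth.
TR = [[true_history(y) + random.gauss(0, 0.5) for y in YEARS]
      for _ in range(20)]

# NDm: a smaller set with little or no common signal -- mostly noise.
NDm = [[random.gauss(0, 0.5) for _ in YEARS] for _ in range(7)]

# NDk: one contaminated series -- quiet in the past, with a large
# non-climatic ramp through the 19th and 20th centuries.
NDk = [0.0 if y < 1850 else 0.02 * (y - 1850) for y in YEARS]

ND = NDm + [NDk]  # the full "new" data set includes the bad series
```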

Now, let's look at some multiproxy reconstructions.

First, here's the recon built on dataset TR (shown in Black). We think this is accurate, to the extent signified by the error range that we calculate (not shown).

Here's the recon built on dataset ND (Blue; dataset TR shown in Gray for comparison). It's encouraging -- it looks like the "hindcast" produced by TR is in line with the one produced by ND.

Here's the recon built with the combined TR and ND data sets (Red). Again, this is encouraging with respect to the mental model we are working from.

Next, let us discard the part of the ND data set that we know is bad -- that is, series NDk. The remaining ND data set is "NDm." Here's the recon built from the TR data set and the NDm data set (TR+NDm shown in Brown). It looks a lot like the recon built from the TR data set alone.

Finally, let's look at one more recon. Here's one built from NDm series alone (Green).

There's something funny going on. This NDm recon doesn't look anything like the others!


So these are the patterns we are encountering:
  • TR recon -- The recon we started off thinking is a good representation of the past.
  • ND recon -- Similar to TR.
  • TR + ND recon -- Very similar to TR.
  • TR + NDm recon -- Very similar to TR.
  • NDm recon -- Different from TR.
The "secret sauce" that makes the ND reconstruction look like the TR reconstruction is the contaminated NDk data series. Taken together, the "legitimate" ND proxies generate a lousy recon (NDm).

However, add those same NDm data series to the TR data set, and you get a nice-looking recon (TR + NDm).

What we see is explained by these rules:
  • Use the TR data sets, and get the TR recon.
  • Use the NDm data set, and get a very different recon.
  • Combine the NDm data set with the TR data set, and get the TR recon.
  • Add the contaminated NDk series to NDm, and get the TR recon.
  • Add the contaminated NDk series to NDm and TR, and get the TR recon.
The NDm data series are a flop. But there seems to be something "magical" about both TR and NDk. Add either or both, and a TR-like recon appears!
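
One candidate mechanism can be sketched with a small CPS-style weighting exercise (my construction and my invented numbers -- not the actual Mann08 procedure): weight each proxy by its correlation with instrumental temperature over the calibration window, and watch a single contaminated series collect nearly all the weight.

```python
import random

# Correlation-weighting toy: noise proxies vs. one contaminated series.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

random.seed(1)
calib = range(146)                 # stand-in for 1850-1995
temps = [0.01 * t for t in calib]  # rising instrumental record

noise_proxies = [[random.gauss(0, 1) for _ in calib] for _ in range(7)]
contaminated = [0.05 * t + random.gauss(0, 0.2) for t in calib]

weights = [pearson(p, temps) for p in noise_proxies]
w_contam = pearson(contaminated, temps)
# The noise proxies' correlations scatter near zero; the contaminated
# series, whose ramp mimics the warming trend, takes nearly all the
# weight -- so it alone sets the shape of the composite.
```

If that weighting carries over into the pre-instrumental period, the contaminated series dictates the recon's shape there too -- which is one way a "magic" ingredient could work.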

What is the basis for this magic?

[UPDATE, Dec. 16, 2011 -- In the following comment on a recent Climate Audit thread, I describe this pattern from another perspective. Slightly modified from the original.]
AMac Posted Dec 15, 2011 at 10:09 AM

Comparisons of reconstructions built from tree-ring proxies with those built from non-dendro proxies can be quite difficult. This is partly because the patterns are non-intuitive, and partly because the figures in top-tier papers such as Mann08 and Mann09 are confusing, even to the point of being misleading.

Here is the key observation. The addition of the uncalibratable, upside-down Tiljander data series to a multiproxy reconstruction often has effects as follows:

* (A) If the No-Tiljander multiproxy recon already had the “hockey stick” shape, the With-Tiljander version will be largely unchanged.

* (B) If the No-Tiljander multiproxy recon did not have a “hockey stick” shape, the With-Tiljander version will be changed so that it has a “hockey stick” shape.

* (C) If the No-Tiljander multiproxy recon did not have a “hockey stick” shape and fails certain “validation” tests, the With-Tiljander version with a “hockey stick” shape will “pass” those tests.

Points (B) and (C) are best illustrated by a figure that clearly shows the relevant pair of reconstructions. Such figures are absent from Mann08 and Mann09.


  1. This NDm recon doesn't look anything like the others!



  2. nevermind, I see my mistake

  3. Right... it's a counterintuitive result.

    The recon built from the non-dendro proxies without Tiljander looks nothing like the dendro recon.

    Then, add in the Bad Data that comes from using the uncalibratable Tiljander proxies... and the effect is to "improve" the reconstruction -- it looks more like the one based on tree-ring proxies.

    Huh? Add contaminated, even upside-down, data, and get a better reconstruction?!?


    It's magic!

    (and everyone in climate science knows that NDk contains The Magical Climate Ingredient)

  5. I should say not only does it fail verification without NDm but it passes verification when NDm is included, even though everyone is in agreement that it (NDm) cannot be meaningfully calibrated to the temperature record upon which the verification is calculated ANYWAY (if I understand the verification calculations correctly). You'd think that aside from questioning their reconstruction, they'd question their verification technique. I'd love to see a proper statistical journal publish a paper on RE.
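
For readers unfamiliar with it, the RE ("reduction of error") statistic mentioned in the comment above can be sketched as follows. This is the standard definition with invented toy numbers, not code from any of the papers: compare the reconstruction's squared error over the verification window against the error of simply predicting the calibration-period mean; RE > 0 is customarily read as "passing."

```python
# RE statistic: skill relative to a constant (calibration-mean) prediction.

def reduction_of_error(observed, reconstructed, calibration_mean):
    sse_recon = sum((o - r) ** 2 for o, r in zip(observed, reconstructed))
    sse_mean = sum((o - calibration_mean) ** 2 for o in observed)
    return 1.0 - sse_recon / sse_mean

obs = [0.0, 0.1, 0.2, 0.3]      # verification-period observations
good = [0.05, 0.1, 0.15, 0.3]   # tracks obs closely -> RE near 1
bad = [0.5, -0.4, 0.6, -0.2]    # unrelated -> RE far below 0

re_good = reduction_of_error(obs, good, calibration_mean=0.15)
re_bad = reduction_of_error(obs, bad, calibration_mean=0.15)
```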

  6. Anonymous 2:59 PM --

    > I should say not only does it fail verification without NDm but it passes verification when NDm is included, even though everyone is in agreement that it (NDm) cannot be meaningfully calibrated to the temperature record upon which the verification is calculated

    By bringing up "verification," you raise another mystery.

    However, you have to be careful to define pronouns like "it." I'll try to rephrase what you wrote.

    "The ND data set generates 'passing' validation statistics. But, try the same exercise once you've taken out the NDk Bad Data series. Now, it -- 'it' being NDm -- fails validation!"

    These strange validation results again point to the "magic" in the NDk Bad Data series.

    WWSS?* "A curious pattern, indeed!" or "I don't know, and it doesn't matter"?

    - - - - -

    * What Would Sherlock Say?

  7. this is the same sort of analysis that is at issue in the original MM papers, where bristlecones plus a variety of other proxies yield bristlecones.

    Mann tried to frame the issue as the "right" number of PCs to use, but the fundamental issue is whether bristlecones (and more narrowly Graybill's bristlecone chronologies from the 1980s) are magic thermometers for the entire world.

    My interpretation at the time was that the phenomenon you describe follows from the central limit theorem. There is no "signal" in the majority of proxies - they are literally just red noise. Bristlecones plus red noise gave as good a reconstruction as bristlecones plus "proxies".

    In the reconstruction period, the red noise cancels out. In the calibration period, correlation weighting orients the noise to reinforce the stick - a phenomenon more or less independently observed by non-climate science statisticians looking at the problem (me, Jeff Id, Lubos, David Stockwell).

    CPS reconstructions work differently but end up with the same result. The properties of the series are known ex ante and series like bristlecones and Yamal are used over and over.
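
The red-noise mechanism described in the comment above can be sketched in a few lines of Python. This is my own toy with my own parameters, not the commenter's code: pure AR(1) "red noise" proxies are weighted by their correlation with the calibration target, which aligns the noise with the trend inside the calibration window, while outside it the weighted noise tends to cancel.

```python
import random

# Red-noise toy: correlation weighting manufactures calibration-period skill.

def ar1(n, phi=0.9, sigma=0.3):
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + random.gauss(0, sigma)
        out.append(x)
    return out

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

random.seed(42)
n_recon, n_calib = 1200, 150
temps = [0.01 * t for t in range(n_calib)]  # rising calibration target

proxies = [ar1(n_recon + n_calib) for _ in range(50)]
weights = [pearson(p[n_recon:], temps) for p in proxies]

total = sum(abs(w) for w in weights)
composite = [sum(w * p[t] for w, p in zip(weights, proxies)) / total
             for t in range(n_recon + n_calib)]

calib_corr = pearson(composite[n_recon:], temps)
# calib_corr comes out positive by construction: correlation weighting
# guarantees the composite "finds" the calibration trend even though
# every input series is noise.
```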

  8. Watch out for that "just red noise". McI likes to hide signal in there. Play games with super parameterized noise, that he reveals obliquely (and hides from when called on it). And he likes him some pick 100 too...

  9. And usually he will say "just the bcps". But he leaves out Gaspe and Yamal (on very different regions). So, yeah...if you cut out the three most important aspects of the signal, the reconstruction gets worse...duh. But it's not even single culling.