Sunday, August 22, 2010

A comment on M&W10 submitted to RealClimate

I submitted a comment to the post "Doing it yourselves" (20 August 2010), as the author makes some interesting remarks on the intersection of McShane and Wyner (2010) and the Tiljander proxies. My comment entered the moderation queue last night after position #41, and wasn't among the ten comments that have been released in three batches this morning. Perhaps it has failed moderation, or perhaps it's being delayed. If it does make a belated appearance (accompanied by inline commentary?), I'll note that in an update.

[ UPDATE 22 Aug. 2010 3:20 PM EDT -- In the past hour, my comment passed moderation, and was slotted into position #42 (the comment count is currently at 60). Gavin Schmidt's inline commentary is reproduced at the tail of this post. -- AMac ]

"M&W10" is the recently-released preprint of "A Statistical Analysis of Multiple Temperature Proxies: Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable?," by BB McShane and AJ Wyner. Because it questions the methodology of Mann08, it's already beloved by many critics of the AGW Consensus. However, this is a very lengthy and complex statistical treatment, performed by two non-paleoclimatologists without the benefit of insights from insiders. It has, predictably, been slammed by the most-capable pro-AGW Consensus science-bloggers. It seems to me that many of these critiques have merit: the paper would benefit from a rewrite (and a shortening/focusing), if this is possible at this late stage in the submission process.

That said, what I take as M&W10's major point seems very timely: a focus on the tightness of the uncertainty estimates that accompany paleoclimate reconstructions like those published by Prof. Mann's group (M&W10 takes Mann08 as its jumping-off point). My sense is that the confidence expressed by the narrowness of these papers' uncertainty bars is very misplaced. Of course these are opinions. I don't claim the statistical chops to comment in detail on M&W10's methods.

The figure in "Doing it yourselves" that touches on the Lake Korttajarvi data series is reproduced (fair use) below. Here is the paragraph that accompanies it --
It’s also easy to test a few sensitivities. People seem inordinately fond of obsessing over the Tiljander proxies (a set of four lake sediment records from Finland that have indications of non-climatic disturbances in recent centuries – two of which are used in M&W). So what happens if you leave them out?

That figure seemed bereft of context. Which of M&W10's sections is it referring to? Using M&W10's methods, the RealClimate authors are excluding the Tiljander proxies from... what, exactly?

So I submitted the following comment. Rereading it, I see that I edited away the promised "question" during one of my rewrites. It would have been, "Could you show the output of your M&W10-style 'Lasso' reconstructions for Non-Dendro proxies and for Non-Dendro/Non-Tiljander proxies? Those would be comparable to the Non-Dendro and Non-Dendro/Non-Tiljander traces in the current version of Mann08's Fig S8a."

AMac says:
Your comment is awaiting moderation.
21 August 2010 at 9:59 PM
A question, and two points of clarification.

First, thanks for the graphics that show various "Lasso" outputs with the Tiljander proxies omitted. This post's "No Tiljander Proxies" figure with the six reconstructions (1000 AD - 2000 AD) is an extension of M&W10's Figure 14. As such, it's a little hard to interpret without reference to that legend.

According to Fig. 14:

* The dashed Green line is the M&W10 backcast for Northern Hemisphere land temperature that employs the first 10 principal components on a global basis.

* The dashed Blue line is the backcast for NH land temperature that employs the first 5 PCs (global), as well as the first 5 local (5x5 grid) PCs.

* The dashed Red line is the backcast for NH land temperatures that employs only the first PC (global).

All three dashed lines appear to be based on the use of the entire data set used in Mann et al. (PNAS, 2008): both Tree-Ring proxies and Non-Dendro proxies.

This is an important consideration, as there seems to be general agreement that Mann08's Dendro-Including reconstructions are not grossly affected by the exclusion of the Tiljander proxies. This point was made in Mann08's Fig. S8a (all versions).

However, there has been much contention over the extent to which NH land reconstructions restricted to non-dendro proxies are affected by the inclusion or exclusion of the Tiljander varve data. See, for example, the 6/16/10 Collide-a-scape thread The Main Hindrance to Dialogue (and Detente).

Mann08 achieved prominence because of its novel findings: the claim of consistency among reconstructions based on Dendro proxies, and those based on Non-Dendro proxies. Mann08 also claimed that the use of Non-Dendro proxies could extend validated reconstructions far back in time. This is stated in Mann08's abstract (see also press release).

Thus, the matters of interest would be addressed if the dashed and solid traces in this post's "No Tiljander Proxies" figure were based only on the relevant data set: the Non-Dendro proxies.

(The failure of Non-Dendro reconstructions to validate in early years in the absence of Tiljander was raised by Gavin in Comment #414 of "The Montford Delusion" thread (also see #s 483, 525, 529, and 531). While progress in that area would be welcome, it is probably difficult to accomplish with M&W10's Lasso method.)

The first point of clarification is that the Tiljander proxies cannot be meaningfully calibrated to the instrumental temperature record, due to increasing influence of non-climate factors post-1720 (Discussion). This makes them unsuitable for use by the methods of Mann08 -- and thus by the methods of M&W10 -- which require the calibration of each proxy to the 1850-1995 temperature record.

The second point of clarification concerns this phrasing in the post:
People seem inordinately fond of obsessing over the Tiljander proxies (a set of four lake sediment records from Finland that have indications of non-climatic disturbances in recent centuries – two of which are used in M&W).
The "four lake sediment records" used in Mann08 are "Darksum," "Lightsum," "XRD," and "Thickness." The authors of Tiljander et al (Boreas, 2003) did not ascribe meaning to "Thickness," because they derived "Darksum" by subtracting "Lightsum" from "Thickness." Thus, "Thickness" contains no information that is not already included in "Lightsum" and "Darksum."

In other words, there are effectively only three Tiljander proxies (Figure).
- - - - - - - - - -
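The second point of clarification in the comment above is a statement about linear dependence, and it can be illustrated with a minimal sketch. The numbers below are invented for illustration and are NOT the Lake Korttajarvi measurements; the helper `matrix_rank` is hypothetical, not part of anyone's published code. The only thing taken from the comment is the relation "Darksum" = "Thickness" minus "Lightsum". Under that assumption, adding "Thickness" as a fourth column does not raise the rank of the data matrix.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination, using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Invented yearly values, for illustration only (not real varve data).
lightsum = [3, 5, 4, 6, 2]
thickness = [10, 9, 12, 11, 8]
xrd = [1, 4, 2, 7, 3]

# The relation described in the comment: Darksum = Thickness - Lightsum.
darksum = [t - l for t, l in zip(thickness, lightsum)]

# Rows are years, columns are series.
three_series = list(zip(lightsum, darksum, xrd))
four_series = list(zip(lightsum, darksum, xrd, thickness))

rank_three = matrix_rank(three_series)
rank_four = matrix_rank(four_series)
print(rank_three, rank_four)
```

With these toy values both ranks come out the same, which is the sense in which a fourth series built from two of the others adds a column but no independent information.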

M&W10's Figure 14 is reproduced (fair use) below:

FIG 14. Backcasts to 1000 AD from the various models considered in this section are plotted in grey. CRU Northern Hemisphere annual mean land temperature is given by the thin black line with a smoothed version given by the thick black line. Three forecasts are featured: regression on one proxy principal component (red), regression on ten proxy principal components (green), and the two stage model featuring five local temperature principal components and five proxy principal components (blue).
- - - - - - - - - -

[ UPDATE 22 Aug. 2010 3:20 PM -- My comment passed moderation and was slotted into position #42, with its original timestamp of 21 August 2010 at 9:59 PM. Gavin Schmidt's inline commentary follows, with [Notes] for my follow-ups inserted. -- AMac ]

[Response: We aren't going to go over your issues with Mann et al (2008) yet again [Note 1] - though it's worth pointing out that validation for the no-dendro/no-Tilj is quite sensitive to the required significance, for EIV NH Land+Ocean it goes back to 1500 for 95%, but 1300 for 94% and 1100 AD for 90% (see here). But you missed the point of the post above entirely. The point is not that M&W have the best method and it's sensitivities need to be examined, but rather that it is very easy to edit the code and do what ever you like to understand their results better i.e. "doing it yourself".[Note 2] If you want a no-dendro/no-Tiljander reconstruction using their methodology, then go ahead and make it (it will take just a few minutes - I know, I timed it - but to help you along, you need to change the selection criteria in R_fig14 to be sel <- (allproxy1209info[,"StartYear"] <= 1000) & (allproxy1209info[,2] != 7500) & (allproxy1209info[,2] != 9000) (no_dendro) and change the line proxy <- proxy[,-c(87:88)] to proxy <- proxy[,-c(32:35)] (no_tilj)). [Note 3] Note that R_fig14 does not give any info about validation, so you are on your own there. The bottom line is that it still doesn't make much difference (except the 1PC OLS case, which doesn't seem very sensible either in concept or results anyway). [Note 4] - gavin]
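For readers who, like me, don't read R, the logic of the two edits described in the response above can be mimicked in a few lines of Python. This is only a sketch of the filtering idea, not a port of R_fig14. The proxy names, start years, and the codes 4000 and 6000 are invented, and the helpers `select_no_dendro` and `drop_positions` are hypothetical. The one thing taken from the response is that codes 7500 and 9000 are the ones excluded for the "(no_dendro)" case, along with the requirement of a start year of 1000 or earlier.

```python
# Invented metadata rows: (name, start_year, type_code). Not the real proxy table.
proxy_info = [
    ("tree_ring_a", 900, 9000),
    ("mxd_b", 950, 7500),
    ("lake_sediment_c", 800, 4000),
    ("speleothem_d", 1200, 6000),
    ("lake_sediment_e", 500, 4000),
]

# Codes excluded in the "(no_dendro)" selection quoted in the response.
DENDRO_CODES = {7500, 9000}


def select_no_dendro(info, latest_start=1000):
    """Keep proxies that start by `latest_start` and do not carry a dendro code."""
    return [
        name
        for name, start_year, code in info
        if start_year <= latest_start and code not in DENDRO_CODES
    ]


def drop_positions(items, one_based_positions):
    """Drop entries by 1-based position, mirroring R's negative column indexing."""
    return [
        item
        for position, item in enumerate(items, start=1)
        if position not in one_based_positions
    ]


selected = select_no_dendro(proxy_info)
# Pretend the second remaining series is the one to be left out.
kept = drop_positions(selected, {2})
print(selected)
print(kept)
```

The first step corresponds to the changed `sel` line and the second to the changed `proxy[,-c(...)]` line; the actual column positions depend on the real proxy table, which this toy list does not reproduce.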

[Note 1] There is no "yet again" to discuss. The problems with Tiljander have never been addressed. Not by Dr. Schmidt, not by Prof. Mann, not by any of Mann08's other authors. As stated in my comment, the main issue is that the Tiljander proxies cannot be calibrated to the instrumental record, and thus are wholly unsuited to Mann08's methods.

[Note 2] It is silly to propose that I have "missed the point of the post entirely." It is silly to suggest that I've claimed that M&W10 "have the best method and it's sensitivities need to be examined."

[Note 3] Like most pro-AGW-Consensus advocates, most lukewarmers, and most skeptics, I am not conversant in either "R" or MatLab. Thus, I cannot immediately profit from Dr. Schmidt's sincere and well-meaning advice.

[Note 4] Dr. Schmidt links to the No-Dendro/No-Tiljander variant of M&W10's Figure 14 that he generated:

That is an informative figure! It suggests that exclusion of the Tiljander proxies does not greatly alter the Non-Dendro reconstructions obtained by M&W10's "Lasso" method, if the first 10 global PCs are used (Green), or if the first 5 global PCs and the first 5 gridded PCs are used (Blue). On the other hand, the use of only the first global PC shows something interesting. Adding in the uncalibratable Tiljander proxies completely changes the character of the first principal component. Without it, PC1 of the Non-Dendro proxies causes the anomaly trace (solid Red line) to follow the approximate general path followed by the anomaly traces governed by the 10 global PCs and by the 5 global PCs plus 5 gridded PCs (Green and Blue lines). Add in Tiljander (dashed Red line), and the PC1-governed trace flatlines around -0.3 C for the duration of the reconstruction.

Perhaps those better-versed than me in principal component analysis will be able to make more sense of this figure.

[ UPDATE 23 Aug. 2010 -- Wording two paragraphs up altered. In reviewing Gavin's new No-Dendro/No-Tilj reconstruction figure, I am discussing the shapes of the anomaly traces governed by the principal component(s). These are not the PCs themselves -- AMac ]


  1. Good post. Detailed and well-written.

    I get your point that the major issue is the use of Tiljander to create the "non-dendro" chronology (which was a major reason why M08 was PNAS notable).

    RC is huddling to give the right rebuttal or just slowing the discussion down. It's unfortunate that they do that. That they can't just let opponents put stuff up without the need for Voice of God rebuttals.

    Also annoying that McIntyre exhibits the same behavior to some extent (e.g. MMH thread closing).

  2. What does the no-dendro versus no Tiljander/no dendro in M08 methods look like? We've talked a lot about significance, but how do the shapes change?

  3. > What does the no-dendro versus no Tiljander/no dendro in M08 methods look like [wrt shape]?

    Short answer, I don't know. Gavin/Mike say that Mann08 2x-revised S8a shows that Tiljander "doesn't matter." Steve McI says that his emulation of Mann08's MatLab code in R shows that Tiljander matters, a lot.

    Gavin's latest traces of No-Dendro/No-Tilj by Lasso support his idea that it doesn't matter (Green, Blue). But then Red implies a major change to the way the first PC looks, with/without Tiljander. The Consensus AGW team's record on speaking straightforwardly is not good when it comes to Tiljander, so I wonder.

    I have a falsifiable hypothesis to address the subject, but absent the ability to understand and run R, I can't run the test. I'll learn eventually.

  4. AMac,

    The M&W method was not benchmarked against synthetic proxies. This is a huge problem since it is at odds with common practice in the field.

    AFAICT the claims, both in the paper and in the abstract were pretty conservative, as Gavin has shown and you have reproduced even with the (questionable) M&W methodology. You also have to remember that the red curve has nothing to do with the Mann 08 methodology (and as Gavin points out, it doesn't make much sense). Follow the bouncing ball. Remember that Mann only claimed a no dendro significance to 1500.

  5. That should be: the claims in M08 were conservative.

  6. Rattus, thanks for stopping by.

    > the claims in M08 were conservative.

    That depends on what you mean by "conservative."

    If it's a generic compliment -- "you look sharp in that suit!" -- well, sure.

    If it connotes "in general accord with the field's consensus expectations," I'd agree.

    But if you mean "not a particularly noteworthy advance," the very fact that it was published in PNAS is prima facie evidence against the assertion. Along with the authors, the editors, and reviewers of this high-impact peer-reviewed journal clearly thought otherwise. This is borne out by a read of the abstract and of the Penn State press release that accompanied the paper.

    In my own view: it is quite radical to behave as though uncalibratable data series can be incorporated into work that is absolutely reliant on the calibration of data series to the instrumental temperature record.

    The tactics of the authors and their allies in handling this criticism have also been unconventional, by the standards of "normal" or "modern" scientific practice.

  7. Take away the extensions further back in time and the non-dendro reconstruction, and then the paper's result (the recon itself) was too similar to previous work and not really a notable advance. It would not be a PNAS paper. Might still be RIGHT. But not new. I was honestly worried that McI was exaggerating the issue of the no-dendro recon, so I looked critically at the paper and it honestly was a highlighted result (in abstract, etc.)

    What Mike did was add more proxies (cavesickles and lake mud) in M08 to get more fundamental results. But he did not adequately explain the issues with those proxies (if he had, it would have been less of a PNAS paper).

    Annoyingly, he also changed methodology at the same time he added more proxies, so we can't really deconfound which change drives what. BTW, this is a crit I have of the Mcs as well. And of course the PNAS length restrictions and just generalist subject are not appropriate for real plus and minus digging into different methods and listing their details and all that.

  8. TCO,

    Mann has several papers prior to Mea 2008 which looked at the performance of the RegEM methods vs. the older ones, so I am not sure that your criticism of "changing methodologies" is warranted. You have to look at the history, which will not show up in a single paper.

  9. Rat:

    1. I'm aware of several of the types of papers you refer to. I find them lacking in the sort of method establishment needed (they have the form of that and are referred back to in the manner which one would do. But when I look at them and how they are used later, am not really satisfied.) I could be wrong of course, but that's my honest take. And definitely the existence and pattern of these references is not an "aha" to me. As I have considered them and have a perspective on their quality.

    2. My honest (I am VERY capable of finding fault with my side) and with perspective (I've published (in one) and used journal literature in many fields...and just sort of have a view of how science is done and what is "quality") view:

    Mike touts other papers as method defining. But he does not really do the heavy lifting, even of someone like Tapio Schneider. I've gone back to look at some of them. Believe me...I'm all about reading the refs and the refs to the refs. When I investigate an interesting paper in physical sciences, I almost always pull the 40 refs to the paper. And some of the refs to refs. And at least scan them!

    Mike touts his method-explaining papers in the way that you would expect. But when you really dig back to look at them, you don't see the solid multi-factor type mathematical analysis of algorithms. Really it is more like building a supporting argument in a softer science...than what I'm used to, say, in physics or chemistry or math or even hard core market research. More of trying to stitch things together and make it all support each other than to really develop a new method or instrument (for instance a new AFM) and then use it for problems (for instance evaluating surface nature of a catalyst). This is a personal perspective, but I urge you to at least consider that I might have enough self-awareness to make the judgement not through disliking the fellow. And a pretty good ability to parse and even "feel" quality of science work across many fields.

  10. A Mann 08 coauthor was a coauthor with Kaufman on the paper that issued a correction for upside-down Tiljander.