This post is unfinished. I’m posting it now in order to refer to it in a comment on another blog, and intend to finish it in the near future.
April 14: finishing delayed while recoding data extraction to speed it up and enable further exploration of the effects of large GHCN adjusted value ranges.
If the title of this post seems familiar, you may be recalling a Climate Audit post from 2010, “NASA GISS – Adjusting the Adjustments”, which suggested that:
It is entirely possible that the change in GISS US since August 2007 is primarily due to the replacement of USHCN v1 methodology (TOBS and that sort of thing that we discussed in the past) with Menne’s changepoint methodology used in USHCN v2
This present post examines the volatility of the adjustments due to that changepoint methodology in the Global Historical Climatology Network-Monthly (GHCN-M v3) and the U.S. Historical Climatology Network (USHCN v2), and in NASA Gistemp, which uses the adjusted GHCN-M data as input.
The Climate Audit post considered changes resulting from a major change in methodology. Here I examine the changes resulting from the “less major” (I hesitate to use the word “minor” in the absence of disclosure of the updated code) changes in methodology from subversion to subversion, and the day-to-day changes in adjusted station records using the same subversion – hence the change in title to “Self-adjusting the Adjustments”. And while the focus of the Climate Audit post was primarily on the US, my focus is primarily global, with only peripheral discussion of USHCN.
Before examining adjustment volatility I am adding two short notes here as an aid to readers who may not be familiar with the details of temperature record adjustment:
A note on temperature record versions
GHCN-M (v3) is currently at version 3.3.0 with version 4 in beta. Gistemp uses GHCN-M data as its principal input, but does not explicitly use version numbers, so I will refer here to Gistemp v3, meaning the current Gistemp using GHCN-M v3 adjusted data as input, and Gistemp v2, meaning the previous version of Gistemp in use up to December 2011, which used GHCN-M v2 unadjusted data as input (other than for US USHCN stations, where USHCN adjusted data was substituted for the unadjusted GHCN-M data for these stations).
This change of Gistemp input from GHCN-M v2 unadjusted data to GHCN-M v3 adjusted data is unfortunately frequently misunderstood, giving rise to headlines such as “Massive GISS Tampering”. The headline writer uses the NASA GISS website as a download source, finds that the current GHCN-M v3 adjusted data there no longer matches the GHCN-M v2 unadjusted data archived at the site, and, failing to notice that a comparison of adjusted with unadjusted data is likely to show differences, jumps to the conclusion that GISS has tampered with or even destroyed the unadjusted or raw data. The saying that a little knowledge is a dangerous thing comes to mind. The data which should be compared with the GHCN-M v2 unadjusted data archived at the GISS site is of course the GHCN-M v3 unadjusted data, but that can only be found at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/, not at the GISS website, since Gistemp no longer uses unadjusted GHCN-M data.
And so, just which temperature values get adjusted?
For GHCN stations, other than those which are also USHCN stations, the most recent monthly temperature observation is retained unadjusted and adjustments are restricted to prior observations. USHCN stations however may also have the most recent monthly temperature observation adjusted. (more to be added here)
To examine GHCN adjustments, therefore, there is no point in examining the most recent data. We need to look at past months to see how the data has been adjusted. When station records are combined to derive gridcell, zonal, and finally global results, anomalies rather than absolute temperatures are used. An anomaly is calculated by subtracting from the temperature for a given month the mean of the temperatures for that calendar month over a thirty year base period. Gistemp used the period 1951-1980; GHCN used the period 1961-1990.
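As a minimal sketch of that anomaly calculation (not the Gistemp or GHCN code; the function name and the dictionary layout are my own assumptions for illustration), the base period mean for a calendar month is simply subtracted from the value for the month of interest:

```python
from statistics import mean


def monthly_anomaly(temps, year, month, base_start, base_end):
    """Anomaly in °C for one station-month.

    temps maps (year, month) -> temperature in °C (hypothetical layout);
    missing months are simply absent from the dictionary.
    """
    # Collect the same calendar month from every available base period year.
    base = [temps[(y, month)]
            for y in range(base_start, base_end + 1)
            if (y, month) in temps]
    if not base:
        raise ValueError("no base period values for this calendar month")
    return temps[(year, month)] - mean(base)
```

For a Gistemp-style anomaly the call would be `monthly_anomaly(temps, 2015, 1, 1951, 1980)`, and for a GHCN-style anomaly `monthly_anomaly(temps, 2015, 1, 1961, 1990)`. The production code also applies rules about how many base period years must be present, which this sketch ignores.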
I have chosen January 1978 as the month which I examine here, unless there is a reason, such as a missing January 1978 value, to examine a different month. Generally, unless there is a changepoint within the anomaly base period, other months within that period will be similarly adjusted. So if the January 1978 value varies it is likely that the anomaly base period mean will also vary, and so the anomaly value for a recent month will also vary, even though the adjusted temperatures for recent months are kept close to the unadjusted temperatures. If the adjusted past temperatures vary substantially from one day to the next, the recent anomalies will behave similarly.
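The argument can be written out in one line. As a simplifying assumption, suppose that between two dataset versions every adjusted value for the calendar month in the base period shifts by the same amount δ, while the adjusted recent value T_m stays fixed (the symbols are mine, not GHCN's or Gistemp's, and N is the number of base period years):

```latex
a_m = T_m - \frac{1}{N}\sum_{y \in \mathrm{base}} T_y
\quad\Longrightarrow\quad
a_m' = T_m - \frac{1}{N}\sum_{y \in \mathrm{base}} \left(T_y + \delta\right) = a_m - \delta
```

So, purely as an illustration, an overnight rise of 0.5°C in the adjusted base period values would lower the recent anomaly by 0.5°C, although the recent temperature itself has not changed.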
With hindsight, the choice of January 1978 may not have been the best choice. I have visited a number of high altitude alpine automated stations, which, if they existed in 1978 and were not automated, would have been inaccessible in winter and so would have had missing values for winter months. As it happens none of those which I have visited are included in the GHCN network, but it is possible that some of the stations elsewhere with missing January values are missing these values for this reason. There are not very many stations with values for other months in 1978 but not for January, so I have continued with that original choice.
And now returning to the main topic:
I accept the need for adjusted temperature records. I tend to dismiss as not worth examining further any temperature record which rests on a claim that adjustment is unnecessary but fails to justify that claim, in particular with reference to changes in station location, instrumentation or time of observation, or to changing urban or land use effects.
However I would also expect successful adjustment procedures to produce adjustments which follow an appropriately adjusted version of Simon Cameron’s definition:
“An honest politician is one who, when he is bought, will stay bought.”
In other words, I would expect successful adjustments to be stable, with substantial step changes in adjusted values arising only in response to identifiable changes such as station location or instrumentation changes. Changes occurring over time, such as an increasing UHI effect with urban growth, should be reflected by adjusted values changing gradually over time.
I have archived more than 800 GHCN datasets, and these are the basis of the images below. The dates on the horizontal axis are the dates of the datasets from which the January 1978 adjusted and unadjusted temperatures are taken.
The variation of GHCN-M adjusted values over five years is just above the median range for such variation for all GHCN-M stations. The drop of approximately 0.5°C in the unadjusted GHCN-M v4 beta may seem odd, but will be considered in the next post as it is not related to adjustment. The changepoint methodology seems to have correctly identified a 1994 change in station location:
Valentia Observatory shows a slightly greater range of adjusted values, a pattern of variation which seems to change from subversion to subversion, and greatest short term variability for the current subversion, v3.3.0. The April 2012 change from manual to AWS does not seem to have caused a change in adjusted temperature.
But the first station which I examined in this way shows more problematic adjustments:
The range of adjusted January 1978 GHCN temperatures is 1.3°C (and this is not the most extreme adjusted station case – this range is exceeded by about 15% of stations). Gistemp values range even more widely, and the adjusted GHCN temperature can change substantially even overnight, as highlighted with a green box for January/February 2015. It seems highly unlikely that these changes in adjusted value for January 1978 can be explained by frequent changes in station location or instrumentation in the last five years. Marseille/Marignane is also a station with a record stretching back to 1838, and few missing values since 1880. It also seems unlikely that the arrival of data for one additional month in such a long record should cause such substantial jumps in the past adjusted value.
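For readers who want to reproduce a range like this from their own archive of daily downloads, here is a rough sketch. It is not the extraction code used for the images. It assumes the fixed-width GHCN-M v3 data record layout as I understand it from the v3 README (11-character station ID, 4-character year, 4-character element, then twelve 5-character values in hundredths of a degree Celsius, each followed by three flag characters, with -9999 for missing), and the function names are hypothetical:

```python
MISSING = -9999  # missing-value sentinel assumed for GHCN-M v3 data files


def parse_ghcnm_v3_line(line):
    """Split one fixed-width GHCN-M v3 data record into its fields."""
    station_id = line[0:11]
    year = int(line[11:15])
    element = line[15:19]
    values = []
    for m in range(12):
        start = 19 + 8 * m  # 5-character value followed by 3 flag characters
        raw = int(line[start:start + 5])
        # Values are stored in hundredths of a degree Celsius.
        values.append(None if raw == MISSING else raw / 100.0)
    return station_id, year, element, values


def value_for_month(lines, station_id, year, month, element="TAVG"):
    """Return one station-month value in °C from an iterable of lines, or None."""
    for line in lines:
        sid, yr, elem, values = parse_ghcnm_v3_line(line)
        if sid == station_id and yr == year and elem == element:
            return values[month - 1]
    return None


def adjusted_range(value_by_snapshot):
    """Spread (max minus min) of one station-month across archived snapshots."""
    present = [v for v in value_by_snapshot.values() if v is not None]
    return max(present) - min(present) if present else None
```

The idea is to call `value_for_month` once per archived adjusted data file, store the results in a dictionary keyed by dataset date, and pass that dictionary to `adjusted_range`. Parsing every line in full is slow for several hundred datasets; a faster version would check the station ID and year before converting any values.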
Does this volatility in individual station adjustments make its way through to the global record? It would be nice to be able to run the GHCN-M gridding code with the GHCN adjusted data to see whether the day to day volatility in station adjustments is reflected in day to day variability in the global record. However, contrary to the assurances of assorted trolls that “the code can be downloaded” – demonstrating only that said trolls have not bothered to check what code is available at the link they provide, if indeed they have even bothered to provide a link – the code made available is outdated, and does not include the GHCN gridding code. (The gridding code for the supplementary information for Hausfather et al. 2013 is included, but this does not appear to be used by GHCN-M.) The code folder, 52i (the current output is 52j), is dated 2012-10-25, which would correspond to v3.2.0, but the code files themselves are dated January/February 2010, predating the replacement of GHCN-M v2 by GHCN-M v3.0.0, and so are possibly pre-v3.0.0. As the code has since changed through v3.1.0, v3.2.0, v3.2.1 and v3.2.2 before reaching the current v3.3.0, closer examination of outdated code would seem time-wasting. Unlike GISS, NOAA/NCDC/NCEI do not seem to have realised the potential benefit of making current code available for outside examination.
In the absence of GHCN-M gridding code I have tested the January 9th to 10th 2015 change by using the Gistemp Step3 gridding and global averaging code with the GHCN-M adjusted data as input, without the additional SCAR data and without the additional Gistemp urban station adjustment. Changed global monthly mean values of up to 0.02°C could be found in the 2014 record, although as these were both increases and decreases the overall 2014 mean was little affected. This is not of course simply due to Marseille/Marignane. On January 9th 55% of GHCN stations currently reporting had values for December 2014, while on January 10th 83% had reported. Unfortunately testing in this way using Gistemp code is time-consuming, and I have only tested these two dates. Overnight changes at other dates could be more significant, and might not cancel in the annual mean, but systematic testing would require code which could be scripted.
As the image above becomes crowded when GHCN-M v4 beta is added from October 2015, an image showing data only from the introduction of GHCN-M v3.3.0 in June 2015 follows:
USHCN station adjustments might be expected to be less volatile, as station history metadata can be used if available, although the use of estimated monthly means before the end of the current month might introduce additional variability. Here is Brewton 3 SSE, the first station in the USHCN inventory, and the only one for which station history metadata is included in the outdated code available for download.
Not all USHCN stations are as volatile, however. Monmouth, for example, displays fairly consistent adjustments from v3.2.2 on, with slightly more volatile prior adjustments cooling the past somewhat less.
More such station adjustment images can be found in recent posts on this blog.
Gistemp input change from unadjusted to adjusted GHCN-M data
Gistemp only adjusts stations considered urban (more about this in the next post). But rural stations, while by definition not affected by urban heat island effects, may nevertheless need adjustment due, for example, to location or instrumentation changes, or to time of observation changes.
Gistemp v2 was inconsistent in that it tried to adjust USHCN station records for non-urban effects by replacing the unadjusted GHCN-M v2 data for these stations with adjusted USHCN data. No adjustment of rural non-USHCN stations in the US, or rural stations outside the US, was attempted.
Gistemp v3 is more consistent, using GHCN-M v3 adjusted data rather than unadjusted data, so accepting whatever adjustments GHCN has already made to rural station records, and adding its own adjustment for urban stations to that already made by GHCN. Whether adding another automated empirical adjustment on top of a previous automated empirical adjustment is wise is another question, particularly where that previous automated empirical adjustment can be seen to produce erratic and volatile results.
(enough for now, more to be added later)