GHCN monthly data (v2 or v3) may not always be quite what it seems. Consider Nitchequon (Canada: 40371826000), looking here at the five individual time series in GHCN v2:
Some missing years – but where’s the problem?
If we look at the same station in GHCN v3, where Gistemp uses the GHCN v3 *.qca.dat (adjusted) file as input:
the early years have suffered a drastic drop in temperature, by some 5°C. (The first Gistemp plot above, using GHCN v2 data, did not show the recent years, as v2 had more missing values. The monthly values in 2009/2010 match those in GHCN v3, but there were insufficient months available to calculate annual means.) Slashing past temperatures by 5°C up to 1985 achieves an impressive 1°C/decade warming over the recent past, but rather stretches credibility. But still, that is not the real problem here.
UPDATE: I should have drawn attention to the Gistemp update:
April 12, 2010: Reports have been coming in from Nitchequon, a ghost town in Quebec, after a long gap from 1986-2006; however those data don’t seem to be consistent with the older records. Hence they are disregarded until further notice.
at which point use of the incorrect data ceased. However, when Gistemp moved to GHCN v3 data in place of v2 data, use of the incorrect data (as well as that from a number of other stations whose data had been deemed invalid when using GHCN v2) appears to have resumed. Nitchequon is no longer listed for removal in Ts.strange.v3.txt.
FURTHER UPDATE: here is the effect of including data from Pangnirtung from 1996 onwards as part of the Nitchequon station record.
The blue dot in the figure shows the location of Nitchequon, the red dot the location of Pangnirtung, the source of the false “Nitchequon” data. [2011, 2012_02 data] indicates that these are the gridded temperature anomalies for 2011 (the last complete year) based on the data used for the February 2012 GISS update. The reason for the choice of this update is that GISS no longer provides the fuller archive of intermediate and final output files which were provided for a period in the past. Currently the adjusted station data from this particular update is available from GISS servers, and it is this update against which I have validated my own GISTEMP implementation.
This figure shows the gridded data with the invalid Nitchequon data still in use (as in the current NASA GISTEMP analysis) minus the gridded data with those invalid data removed. This is the effect on the regional gridded temperature anomalies of including eleven years of invalid data at a single station, Nitchequon.
The area closer to Nitchequon is cooled by up to 0.037°C, while an area further east, just offshore from Newfoundland and Labrador, is warmed by up to 0.008°C. In absolute terms these are not very large changes, but it should be remembered that this is the effect of only eleven years of invalid data at only one station.
So, what is happening here? Well, WMO StationID 71826 corresponds to “PANGNIRTUNG, NU”, not Nitchequon, and these two locations are approximately 1454 km apart. Examination of the GHCN v3 *.qcu.dat (unadjusted) station data:
and comparison with the station data from Environment Canada, shows that the early years, up to 1985, do indeed come from Nitchequon:
but the later years, from 1996 onwards, would seem to come from Pangnirtung:
It seems that a shortage of assigned WMO StationID numbers in Canada has resulted in recycling of some of these numbers from old stations to current stations. There are four other Canadian stations in the GHCN v3 inventory which have StationIDs which have been reassigned to locations between 425 km and 2624 km distant from the original assigned locations, but fortunately none of these have any recent data, so Nitchequon/Pangnirtung may be the only Canadian station to undergo this rather unique form of homogenisation.
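The distances quoted here are great-circle distances between two sets of coordinates. A minimal sketch of how such a distance can be checked follows, using the haversine formula on a spherical Earth. The coordinates below are approximate values that I have assumed for illustration, not the exact inventory entries, so the result should land close to, but not exactly on, the 1454 km quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates (latitude, longitude), assumed for illustration only.
NITCHEQUON = (53.20, -70.90)
PANGNIRTUNG = (66.15, -65.72)

distance = great_circle_km(*NITCHEQUON, *PANGNIRTUNG)
print(f"Nitchequon to Pangnirtung: roughly {distance:.0f} km")
```

The same function could be run over every station in an inventory, comparing the inventory location with the current WMO location and listing any pair that differs by more than some chosen threshold.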
There are a small number of stations in other countries where the current WMO location differs from the GHCN inventory location by hundreds of km, including some where the station name remains unchanged, but the distance involved suggests an error in one location rather than a simple station move. Finding recent station data from national services for these stations is rather easier than finding the corresponding historic data needed to check the GHCN data. I will update this post with details of any other similar cases as and when I obtain the necessary data. (There may also be cases where the distance between “old” and “new” locations is measured in tens of km rather than hundreds or thousands, but is still greater than a simple station move to a “nearby” location. In such cases the temperatures for the two locations are likely to be more similar, and it will probably not be worthwhile checking for them.) These errors, even if there are more to be found, are unlikely to have much effect on gridded results, but, like the numerous metadata errors, are another indication of carelessness in data acquisition and archiving.
There are also some indications of data acquisition problems. Looking at recent data for Irish stations in GHCN v3 *.qcu.dat (unadjusted) files, I noticed that in May the April monthly mean for Valentia (62103953000) was first recorded as 4.2°C, then as 0.3°C, and finally as the (correct) value of 8.4°C. The data source flags for these values indicate that the first two (flag:P) derive from CLIMAT data, while the third (flag:K) was “received by the UK Met Office”. Fortunately the two wildly wrong values are flagged with quality flag “O”, indicating that these values are more than five standard deviation units from the mean for that month. Had these wild values been a mere 4.99 standard deviation units away from the monthly mean they would presumably have been considered valid in further processing. What weight would they have been assigned relative to valid values? Could such wild values play a role in causing the erratic behaviour of adjusted (*.qca.dat file) values observed for some stations in various GHCN v3 versions?
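For anyone wanting to inspect these flags directly, here is a minimal sketch of reading one record from a GHCN v3 monthly .dat file. It assumes the fixed-width layout as I understand it from the GHCN-M v3 documentation: an 11-character station ID, a 4-character year, a 4-character element name, then twelve 8-character cells, each holding a 5-character value in hundredths of a degree C (-9999 for missing) followed by the data measurement, quality control and data source flags. The sample record is synthetic: the year is a placeholder and only the April cell is filled in.

```python
from typing import List, NamedTuple, Optional

MISSING = -9999          # missing-value marker (assumed layout, see lead-in)
RECORD_LENGTH = 19 + 12 * 8


class MonthValue(NamedTuple):
    temp_c: Optional[float]  # degrees C, or None when missing
    dm_flag: str             # data measurement flag
    qc_flag: str             # quality control flag, e.g. "O"
    ds_flag: str             # data source flag, e.g. "P" or "K"


class Record(NamedTuple):
    station_id: str
    year: int
    element: str
    months: List[MonthValue]


def parse_record(line: str) -> Record:
    """Split one fixed-width monthly record into its fields."""
    line = line.rstrip("\r\n")
    if len(line) < RECORD_LENGTH:
        raise ValueError(f"record too short: {len(line)} characters")
    months = []
    for m in range(12):
        start = 19 + 8 * m
        raw = int(line[start:start + 5])
        temp = None if raw == MISSING else raw / 100.0
        months.append(MonthValue(temp, line[start + 5],
                                 line[start + 6], line[start + 7]))
    return Record(line[0:11], int(line[11:15]), line[15:19], months)


# Synthetic record: placeholder year, April = 4.20 C with QC flag O, source P.
cells = ["-9999   "] * 12
cells[3] = "  420 OP"
sample = "62103953000" + "2012" + "TAVG" + "".join(cells)

record = parse_record(sample)
april = record.months[3]
print(record.station_id, record.year, april.temp_c, april.qc_flag, april.ds_flag)
```

Tracking how a value such as the Valentia April mean changes from one daily GHCN release to the next then amounts to parsing the same record from successive files and comparing the value and its flags.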
During the period when GHCN files were showing an incorrect value derived from CLIMAT data I checked the CLIMAT data at OGIMET, and found that it was correctly decoded there, but that the raw CLIMAT report did not fully conform to the WMO specification for CLIMAT reports, WMO-No.306 (Manual on codes volume I.1, Part A – Alphanumeric), and I have notified the Irish Met Service of this departure from specification. My suspicion is that OGIMET have coded CLIMAT report interpretation defensively, and allowed for possible (and likely?) departures from the specification, while NOAA may have coded relying on conformance to specification, and may be picking up garbage values from memory.
Unfortunately, the WMO has published “Practical Help for Compiling CLIMAT Reports”, which carries the disclaimer “This document is not an official publication of WMO and has not been subjected to its standard editorial procedures. The views expressed herein do not necessarily have the endorsement of the Organization”, and which at one point appears to suggest that the departure from specification which I found is acceptable.
Elsewhere in the “Practical Help” you may find: “Manual on Codes (WMO-No. 306) which defines rules that must be followed by the symbols within a code to be considered correctly and unequivocally by the computing device”. The “Manual on Codes” itself is unambiguous on the correct coding at this point. But the damage is done by having a much shorter “Practical Help” which suggests that an alternative coding is acceptable. Whether or not “This document is not an official publication of WMO”, this much shorter and easier to read document will be used for guidance in coding CLIMAT reports. Defensive programming is required here.
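To illustrate what I mean by defensive programming, here is a minimal sketch of decoding a single temperature group. The layout is a simplified assumption made for the example (an indicator digit 3, a sign digit where 0 is positive and 1 is negative, then three digits giving tenths of a degree C, with any further characters ignored); it is not a full decoding of WMO-No. 306, nor the particular departure I found, and it is certainly not the code used by OGIMET or NOAA. The function name and plausibility limits are likewise hypothetical. The point is only that anything unexpected yields "no value" rather than a number.

```python
from typing import Optional


def decode_mean_temp_group(group: str,
                           lo: float = -90.0,
                           hi: float = 60.0) -> Optional[float]:
    """Decode a simplified '3sTTT...' group, returning None when in doubt.

    Assumed layout (illustration only): '3', sign digit (0 = positive,
    1 = negative), three digits of temperature in tenths of a degree C.
    """
    if not isinstance(group, str):
        return None
    group = group.strip()
    if len(group) < 5 or group[0] != "3":
        return None                      # wrong group or truncated report
    sign, digits = group[1], group[2:5]
    if sign not in ("0", "1"):
        return None                      # e.g. '/', a blank, or stray text
    if not (digits.isascii() and digits.isdigit()):
        return None                      # e.g. '///' used for missing data
    temp = int(digits) / 10.0
    if sign == "1":
        temp = -temp
    if not lo <= temp <= hi:
        return None                      # decoded, but physically implausible
    return temp


for text in ["30084", "31042", "3/084", "30///", "3008", "40084"]:
    print(repr(text), "->", decode_mean_temp_group(text))
```

A decoder written this way reports a missing value when a report departs from the expected form, which downstream processing can handle, instead of passing on whatever happened to be in the buffer.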
I was also amused to find that the data source flags for three Irish stations between 1981 and 1990 indicated “J”: “Colonial Era Archive Data”. Someone should have told us that those regions had been recolonised!