A homogeneity problem?
The dropout (yes, I know the history, but dropout is the term in general use) of GHCN stations in the 1990s has been the subject of much discussion. As a contribution to this discussion, here are the results of a comparison between the standard Gistemp output (based on the 2010_05 data) and the output from a run restricted to stations which have reported at some point in the past five years (i.e. from January 2005 onwards).
This comparison shows that the effect of station dropout on anomaly trends in recent years is negligible. Inclusion of these “dropped out” stations in the Gistemp analysis does however seem to have the effect of cooling past values, and this does seem to impose a warming bias if comparisons are made or trends calculated for the period 1880-2009, 1880-present or relative to “the beginning of the 20th century (1880-1920 mean)”. Such comparisons and trend calculations can be found without looking any further than the draft Hansen et al. paper. For this reason I am unable to agree with the simple conclusion at the Clear Climate Code Blog that “The 1990s station dropout does not have a warming effect”. Justification for inclusion of such stations in the analysis may be needed unless discussion is restricted to trends in recent years only.
(June 29: updated global mean anomaly plots to add line at zero anomaly, and added plots of difference between “After 2004” stations and all stations)
(July 01: updating now to include plots for stations no longer included)
(July 05: minor updates thought appropriate as I have not yet carried out more formal tests)
I have a number of other posts at various stages of completion, in particular my comments on the draft Hansen et al. paper. These have been delayed more than I expected by other matters, by the long delayed return of my new laptop (or rather the hard disk from my new laptop installed in another new laptop), and by the added complications of synchronising work in progress instead of simply transferring files from old to new. This post, although appearing first, could be considered as an appendix to my comments, yet to appear, on that draft paper.
My choice of stations reporting after 2004 differs from the 1992 choice made at the Clear Climate Code Blog, but in fact very similar results are to be expected: relatively few stations “drop out” between 1992 and 2005. My choice of 2004/2005 was based in part on a view that the most recent five years would be a reasonable interpretation of “currently reporting”, and in part probably also influenced by whatever I had most recently read on this topic (unfortunately I am unsure what that was, so I cannot confirm my suspicion that this may have played a role).
All these results should be read with my disclaimer in mind – until accurate metadata becomes available the rural/urban classification of stations is subject to some doubt, and the validity of UHI/negative UHI adjustments must consequently also be treated with some suspicion. See my soon to be published comments on the draft Hansen et al. paper for further comments on this topic.
My results for the standard Gistemp output may be compared with the official Gistemp results at http://data.giss.nasa.gov/gistemp/time_series.html. I have matched the colour key for the time series of zonal means below as closely as possible to the official Gistemp plots for ease of comparison. To obtain the correct official Gistemp plots, select:
- Sources: (Station data only – 1200km smoothing)
- Time interval: (end 2009)
A slight visual difference between my plots and the official plots is due to my use of the R filled.contour function, which merges adjacent colour zones by bevelling corners, whereas these corners remain sharp in the official plots. The data plotted, output from STEP3, matches the official output with high precision (I have retained the standard Gistemp integer calculations and rounding here to achieve this match). The averaging for each latitude zone however is not based directly on standard Gistemp code, which together with the smoothing applied by the R filled.contour function may also slightly increase the visual difference. Apart from a small difference with respect to cells with missing data, which I am investigating, the plot below is for practical purposes identical to the official plot.
The next plot presents the results for the run which I have inelegantly described as “Station Data (after 2004) only”, being a run restricted to use only stations which have at least some data from 2005 or later.
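The station selection for this run can be illustrated with a minimal sketch. This is not the code actually used with Gistemp; the record layout (station id, year, twelve monthly values with `None` for missing) and the function names are assumptions made purely for illustration.

```python
# Hypothetical record layout: (station_id, year, [12 monthly values or None]).
# None stands in for a missing monthly value in this sketch.

def last_year_with_data(records):
    """Map each station id to the latest year in which it reports any month."""
    last = {}
    for station_id, year, months in records:
        if any(v is not None for v in months):
            last[station_id] = max(year, last.get(station_id, year))
    return last


def split_stations(records, first_recent_year=2005):
    """Split station ids into those with data from first_recent_year onwards
    ("After 2004 only") and those without ("Before 2005 only")."""
    last = last_year_with_data(records)
    recent = {s for s, y in last.items() if y >= first_recent_year}
    dropped = set(last) - recent
    return recent, dropped


if __name__ == "__main__":
    example = [
        ("A", 1950, [1.0] * 12),
        ("A", 2006, [None] * 11 + [2.0]),  # a single month in 2006 is enough
        ("B", 1990, [1.0] * 12),
        ("B", 2005, [None] * 12),          # a row with no valid months does not count
    ]
    recent, dropped = split_stations(example)
    print(sorted(recent), sorted(dropped))
```

Note that a station qualifies on the strength of a single valid month from 2005 onwards, which matches the "at least some data" criterion described above.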
The difference between these two runs is small after the 1951-1980 anomaly base period, but greater further back in the past. In general, when stations without current data are omitted, smaller negative anomalies are found at high northern latitudes and at southern latitudes other than close to the equator, while greater negative anomalies are found close to the equator:
A similar plot for stations with no data after 2004 to complete the picture:
When all latitudes are averaged to obtain global means, both runs match closely in recent years, but further in the past the overall effect of omitting the stations without current data is to shift the (negative) anomaly values back towards zero. The official Gistemp plot uses a 12-month running mean. I have added a plot using December to November means below, as the comparison between the two time series can be seen somewhat more clearly in that case.
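For readers unfamiliar with the two kinds of averaging mentioned here, the following is a minimal sketch of each. It assumes a flat list of monthly global anomalies beginning in January of some start year, with `None` for missing months; the function names are hypothetical and this is not taken from the Gistemp code.

```python
def running_mean_12(monthly):
    """12-month running mean: one value per window of 12 consecutive months.
    A window containing a missing month yields None."""
    out = []
    for end in range(12, len(monthly) + 1):
        window = monthly[end - 12:end]
        if any(v is None for v in window):
            out.append(None)
        else:
            out.append(sum(window) / 12.0)
    return out


def dec_nov_means(monthly, start_year):
    """December-to-November means, keyed by the year in which the window ends.
    monthly[0] is January of start_year, so the first complete window
    (December of start_year to November of start_year + 1) begins at index 11."""
    means = {}
    for i in range(1, len(monthly) // 12 + 1):
        start = i * 12 - 1  # index of December in the preceding year
        window = monthly[start:start + 12]
        if len(window) == 12 and all(v is not None for v in window):
            means[start_year + i] = sum(window) / 12.0
    return means
```

The December to November mean gives one independent value per year, whereas successive values of the running mean share eleven months, which is one reason why differences between two series can be easier to see in the former.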
For the December-November plot I have also included the stations with no data after 2004:
Difference (After 2004 only – All stations), with loess fit:
Difference (After 2004 only – Before 2005 only), with loess fit:
Comparison of “After 2004 only” and “Before 2005 only”, expanded to show earlier years:
Comparison of “All stations”, “After 2004 only” and “Before 2005 only”, expanded to show earlier years:
Comparison of “All stations”, “After 2004 only” and “Before 2005 only”, expanded to show later years:
The increased variability observed since 1990 for the stations which do not have data in recent years may seem anomalous at first sight, especially when compared with the almost identical behaviour of “All stations” and “After 2004 only”. However, as the station count plot below shows, very few such stations have data between 1990 and 2004, compared with the number of stations which have data in these years and also after 2004. This also accounts for the late increase in variability in the difference (After 2004 only – Before 2005 only) plot.
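A station count of this kind can be obtained along the following lines. This is a sketch only, assuming the same hypothetical record layout of (station id, year, twelve monthly values with `None` for missing) and a set of station ids already identified as having data after 2004; it is not the code used to produce the plot.

```python
from collections import Counter


def station_counts_by_year(records, recent_ids):
    """Count, for each year, the stations reporting at least one month,
    separately for stations in recent_ids and for all other stations."""
    recent_counts, dropped_counts = Counter(), Counter()
    seen = set()  # guard against counting a station twice in the same year
    for station_id, year, months in records:
        if all(v is None for v in months) or (station_id, year) in seen:
            continue
        seen.add((station_id, year))
        if station_id in recent_ids:
            recent_counts[year] += 1
        else:
            dropped_counts[year] += 1
    return recent_counts, dropped_counts


if __name__ == "__main__":
    example = [
        ("A", 1995, [0.1] * 12),
        ("A", 2006, [0.2] * 12),
        ("B", 1995, [0.3] * 12),
        ("C", 1985, [0.4] * 12),
    ]
    recent, dropped = station_counts_by_year(example, recent_ids={"A"})
    for year in sorted(set(recent) | set(dropped)):
        print(year, recent[year], dropped[year])
```

With only a handful of stations contributing to a mean in a given year, that mean will naturally be noisier than one based on thousands of stations.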
Why does this difference in anomaly trends in the early years for the two station subsets arise? Probably as a result of different station characteristics in the two subsets.