The Effect of Station “Dropout” on Gistemp

A homogeneity problem?

The dropout (yes, I know the history, but dropout is the term in general use) of GHCN stations in the 1990s has been the subject of much discussion. As a contribution to this discussion, here are the results of a comparison of the standard Gistemp output (based on the 2010_05 data) and the output from a run restricted to use only stations which have reported at some point in the past five years (i.e. January 2005 onwards).

Similar results have been published recently at Clear Climate Code and by Tamino at “Open Mind”. This post provides further detail of the effect at different latitudes.

This comparison shows that the effect of station dropout on anomaly trends in recent years is negligible. Including these “dropped out” stations in the Gistemp analysis does, however, seem to cool past values. This does seem to impose a warming bias if comparisons are made or trends calculated for the period 1880-2009, for 1880 to the present, or relative to “the beginning of the 20th century (1880-1920 mean)”. Such comparisons and trend calculations can be found without looking any further than the draft Hansen et al. paper. For this reason I am unable to agree with the simple conclusion at the Clear Climate Code Blog that “The 1990s station dropout does not have a warming effect”. Justification for including such stations in the analysis may be needed unless discussion is restricted to trends in recent years only.
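To make the arithmetic behind the warming-bias argument concrete, here is a toy sketch. It assumes annual anomalies held as a Python dict mapping year to anomaly in deg C. The numbers are hypothetical and are not Gistemp output, and the 2000-2009 “recent” window is an arbitrary choice for illustration. If the recent values are unchanged but the early values are pulled down, the apparent change relative to the 1880-1920 mean grows by the same amount.

```python
# Toy illustration only: the numbers below are hypothetical, not Gistemp output.

def mean_over(series, start, end):
    """Mean of annual anomalies (deg C) for years start..end inclusive, skipping gaps."""
    vals = [v for y, v in series.items() if start <= y <= end and v is not None]
    return sum(vals) / len(vals)

def change_since_baseline(series, recent=(2000, 2009), base=(1880, 1920)):
    """Recent-period mean minus the 1880-1920 baseline mean."""
    return mean_over(series, *recent) - mean_over(series, *base)

# Hypothetical "current stations" series: flat early values, warmer recent decade.
current = {y: -0.2 for y in range(1880, 1921)}
current.update({y: 0.5 for y in range(2000, 2010)})

# Identical recent values, but every early year 0.1 deg C cooler, as happens when
# stations with only early data pull the past down.
all_stations = {y: (v - 0.1 if y <= 1920 else v) for y, v in current.items()}

print(round(change_since_baseline(current), 2))       # 0.7
print(round(change_since_baseline(all_stations), 2))  # 0.8
```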

(June 29: updated global mean anomaly plots to add line at zero anomaly, and added plots of difference between “After 2004” stations and all stations)

(July 01: updating now to include plots for stations no longer included)

(July 05: minor updates thought appropriate as I have not yet carried out more formal tests)

I have a number of other posts at various stages of completion, in particular my comments on the draft Hansen et al. paper. Other matters have delayed these more than I expected. These include the long-delayed return of my new laptop (or rather the hard disk from my new laptop, installed in another new laptop) and the added complication of synchronising work in progress instead of simply transferring files from old to new. This post, although appearing first, could be considered an appendix to my comments, yet to appear, on that draft paper.

My choice of stations reporting after 2004 differs from the 1992 choice made at the Clear Climate Code Blog, but in fact very similar results are to be expected: relatively few stations “drop out” between 1992 and 2005. My choice of 2004/2005 was based in part on a view that the most recent five years would be a reasonable interpretation of “currently reporting”, and in part probably also influenced by whatever I had most recently read on this topic (unfortunately I am unsure what that was, so I cannot confirm my suspicion that this may have played a role).

All these results should be read with my disclaimer in mind: until accurate metadata becomes available, the rural/urban classification of stations is subject to some doubt, and the validity of UHI/negative UHI adjustments must consequently also be treated with some suspicion. See my soon-to-be-published comments on the draft Hansen et al. paper for more on this topic.

My results for the standard Gistemp output may be compared with the official Gistemp results. I have matched the colour key for the time series of zonal means below as closely as possible to the official Gistemp plots for ease of comparison. To obtain the correct official Gistemp plots, select:

  • Sources: (Station data only – 1200km smoothing)
  • Time interval: (end 2009)

A slight visual difference between my plots and the official plots is due to my use of the R filled.contour function, which merges adjacent colour zones by bevelling corners, whereas these corners remain sharp in the official plots. The data plotted, output from STEP3, matches the official output with high precision (I have retained the standard Gistemp integer calculations and rounding here to achieve this match). The averaging for each latitude zone, however, is not based directly on standard Gistemp code. Together with the smoothing applied by filled.contour, this may also slightly increase the visual difference. Apart from a small difference with respect to cells with missing data, which I am investigating, the plot below is for practical purposes identical to the official plot.

All stations

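The latitude-zone averaging mentioned above is not standard Gistemp code. A minimal sketch of one way to do it follows. It is hypothetical and assumes each grid cell is given as (centre latitude, anomaly or None, area weight). With equal-area cells, such as Gistemp's subbox grid, all weights can simply be 1.0. A zone with no non-missing cells returns None, which is what appears as a “hole” in a zonal-mean plot.

```python
# Hypothetical sketch of a latitude-zone mean; not standard Gistemp code.

def zone_mean(cells, lat_south, lat_north):
    """Area-weighted mean of non-missing cells whose centre lies in [lat_south, lat_north).

    cells: iterable of (latitude_deg, anomaly_or_None, area_weight).
    Returns None when no cell in the band has data.
    """
    total = 0.0
    weight = 0.0
    for lat, anomaly, area in cells:
        if lat_south <= lat < lat_north and anomaly is not None:
            total += anomaly * area
            weight += area
    return total / weight if weight > 0 else None

# Equal-area example cells (weights 1.0); values are illustrative only.
cells = [(-12.0, 0.5, 1.0), (-11.0, None, 1.0), (-10.0, 0.25, 1.0), (5.0, 0.8, 1.0)]
print(zone_mean(cells, -13.89, -9.21))  # 0.375
print(zone_mean(cells, -9.21, -4.59))   # None
```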

The next plot presents the results for the run which I have inelegantly described as “Station Data (after 2004) only”, being a run restricted to use only stations which have at least some data from 2005 or later.

Stations with data after 2004

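The restriction used for this run can be expressed as a simple filter on station records. The sketch below is an illustration only, not the program actually used for these runs. It assumes each record is a Python dict mapping year to a list of 12 monthly values (None where missing). A record counts as “after 2004” if it has at least one monthly value from January 2005 onwards, and as “before 2005” otherwise. The names `has_data_from` and `split_records` are hypothetical.

```python
# Hypothetical sketch of the subset split; not the program used for these runs.

CUTOFF = 2005  # "after 2004": at least some data from January 2005 onwards

def has_data_from(record, cutoff=CUTOFF):
    """True if any month in any year >= cutoff has a value."""
    return any(
        v is not None
        for year, months in record.items() if year >= cutoff
        for v in months
    )

def split_records(records, cutoff=CUTOFF):
    """Split {record_id: record} into (after_2004, before_2005) dicts."""
    after, before = {}, {}
    for rid, rec in records.items():
        (after if has_data_from(rec, cutoff) else before)[rid] = rec
    return after, before

records = {
    "A": {2003: [1.0] * 12, 2006: [None] * 11 + [0.5]},  # one value in Dec 2006
    "B": {1990: [0.2] * 12, 2005: [None] * 12},          # 2005 present but empty
}
after, before = split_records(records)
print(sorted(after), sorted(before))  # ['A'] ['B']
```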

The difference between these two runs shows small differences after the 1951-1980 anomaly base period, but greater differences further back in the past. In general, when stations without current data are omitted, smaller negative anomalies are found at high northern latitudes and at southern latitudes other than close to the equator, while greater negative anomalies are found close to the equator:

Comparison: All stations minus stations with data after 2004 only


A similar plot for stations with no data after 2004 to complete the picture:

Stations with no data after 2004


When all latitudes are averaged to obtain global means, both runs match closely in recent years, but further in the past the overall effect of omitting the stations without current data is to shift the (negative) anomaly values back towards zero. The official Gistemp plot uses a 12-month running mean. I have added a plot using December-November means below, as the comparison between the two time series can be seen somewhat more clearly in that case.

Comparison: 12 month running mean


For the December-November plot I have also included the stations with no data after 2004:

Comparison: December-November means

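The two summaries plotted above differ only in how the monthly values are grouped. A minimal sketch of both follows. It assumes a gap-free list of monthly anomalies for the running mean and a dict mapping (year, month) to anomaly for the December-November mean. Two further points are assumptions made for this sketch: the running mean uses a trailing 12-month window, and a December-November mean is only reported when all 12 months are present.

```python
# Sketch of 12-month running means and December-November annual means.

def running_mean_12(values):
    """Trailing 12-month running means of a gap-free list of monthly values."""
    return [sum(values[i - 11:i + 1]) / 12 for i in range(11, len(values))]

def dec_nov_mean(monthly, year):
    """Mean of December (year - 1) through November (year).

    monthly maps (year, month) -> anomaly, month 1..12.
    Returns None if any of the 12 months is missing (an assumption here).
    """
    keys = [(year - 1, 12)] + [(year, m) for m in range(1, 12)]
    vals = [monthly.get(k) for k in keys]
    if any(v is None for v in vals):
        return None
    return sum(vals) / 12

print(running_mean_12(list(range(24)))[:2])  # [5.5, 6.5]
monthly = {(2000, 12): 1.0}
monthly.update({(2001, m): 1.0 for m in range(1, 12)})
print(dec_nov_mean(monthly, 2001), dec_nov_mean(monthly, 2002))  # 1.0 None
```

A running mean produces a value for every month and so keeps more detail. A December-November mean gives one value per year, which is why the difference between two series can be easier to see.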

Difference (After 2004 only – All stations), with loess fit:

Difference (After 2004 only - All stations), with loess fit


Difference (After 2004 only – Before 2005 only), with loess fit:

Difference (After 2004 only - Before 2005 only), with loess fit

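The smooth curves in the two difference plots are loess fits. The sketch below is a simplified local-linear LOWESS in Python, intended only to show the idea: each point is replaced by a weighted straight-line fit to its nearest neighbours, with tricube weights. R's loess() defaults to span = 0.75 and degree = 2. This sketch uses the same span but degree 1 and no robustness iterations, so it would not reproduce the plotted curves exactly. The function name `lowess` is hypothetical.

```python
# Simplified local-linear LOWESS (tricube weights, no robustness iterations).

def lowess(xs, ys, frac=0.75):
    """Return smoothed values of ys at each x in xs."""
    n = len(xs)
    k = min(n, max(2, int(round(frac * n))))  # neighbours used in each local fit
    fitted = []
    for x0 in xs:
        # The k-th smallest distance from x0 sets the local window half-width.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        w = []
        for x in xs:
            u = abs(x - x0) / h
            w.append((1 - u ** 3) ** 3 if u < 1 else 0.0)  # tricube weight
        # Weighted least squares line; evaluate it at x0.
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        b = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + b * (x0 - mx))
    return fitted
```

A quick check is that exactly linear input comes back unchanged, because a local straight-line fit reproduces a straight line.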

Comparison of “After 2004 only” and “Before 2005 only”, expanded to show earlier years:

Comparison of "After 2004 only" and "Before 2005 only", expanded to show earlier years


Comparison of “All stations”, “After 2004 only” and “Before 2005 only”, expanded to show earlier years:

Comparison of "All stations", "After 2004 only" and "Before 2005 only", expanded to show earlier years


Comparison of “All stations”, “After 2004 only” and “Before 2005 only”, expanded to show later years:

Comparison of "All stations", "After 2004 only" and "Before 2005 only", expanded to show later years


The increased variability observed since 1990 for the stations which have no data in recent years may seem anomalous at first sight, especially when compared with the almost identical behaviour of “All stations” and “After 2004 only”. However, as the station count plot below shows, very few such stations have data between 1990 and 2004, compared with the number of stations that have data in those years and also after 2004. This also accounts for the late increase in variability in the (After 2004 only – Before 2005 only) difference plot.

Station counts

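The counts in a plot like this can be produced by tallying, for each year, the records that have at least one monthly value in that year. The sketch below is hypothetical. It assumes each record is a dict mapping year to a list of 12 monthly values (None where missing), and it counts station records rather than stations. Applied to each subset separately, it shows how thin the “before 2005” subset becomes after 1990.

```python
# Hypothetical sketch: station records with any data, per year.
from collections import Counter

def records_per_year(records):
    """Count records (not stations) with at least one monthly value in each year."""
    counts = Counter()
    for rec in records.values():
        for year, months in rec.items():
            if any(v is not None for v in months):
                counts[year] += 1
    return counts

records = {
    "A": {2004: [0.1] * 12, 2005: [None] * 12},
    "B": {2004: [None] * 11 + [0.3], 2005: [0.2] + [None] * 11},
}
counts = records_per_year(records)
print(counts[2004], counts[2005], counts[2006])  # 2 1 0
```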

Why does this difference in early-year anomaly trends between the two station subsets arise? Probably because the stations in the two subsets have different characteristics.


10 Responses to The Effect of Station “Dropout” on Gistemp

  1. Nick Barnes says:

    Thanks for this work. I’d like to look into this, but won’t have time for a couple of weeks. Have you got code and data somewhere we can see?

    Can you compute and report trend values? We compute linear trends and R^2, for both whole-series and last-30-years, and I find this helpful when thinking about the effect of an algorithmic change.

    Your final two charts look very similar to ours. Looking at your very last chart only, it seems that the “post-2004” stations were warmer in the late 19th century than the complete set of stations. In other words, that “post-2004” set of stations has experienced less warming than the complete set. Have I understood this correctly? If so, that is the same as our finding (modulo the 1992/2004 difference, etc).

    • oneillp says:

      Code and data availability – in principle yes, but if there is anything more than an occasional request which can be satisfied via e-mail, setting up a new FTP server will probably not be possible before August because of other work and holiday plans. (April was a bad month – as well as having to return a new laptop for replacement of everything other than the hard disk, the 13 year old NT4 desktop which hosted my personal web and FTP servers also died suddenly, having served those 13 years faithfully up to the day it failed, with a reboot only every three or four months.)

      Rather than describing my Gistemp port, data and associated utilities here, I’ll add a new post dealing with code and data.

      I do compute and report trend values for individual stations, by default taking trends from 1975 (a variable parameter), but I tend to be rather wary of these. The opportunities for cherry-picking end points for trend computation are legion. As all my graphic output comes from program-generated R scripts, adding and plotting trend lines is only a matter of adding a couple of lines to these scripts later when desired. Now that you mention it, I might generate these lines as part of the initial script, but comment them out for normal use.

      My final two plots are indeed very similar to yours, the different cut-off years having little effect as the station count is relatively stable between these dates. And you have understood correctly: the “post-2004” set of stations has experienced less warming than the complete set, making comparisons from 1880 or 1900 or thereabouts to the present, based on the complete set, problematic. It might be useful to add a plot comparing the “post-2004” set of stations with the remaining stations, so I will try to add this to the post when I get the chance.
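      The whole-series and last-30-years trends requested above can be produced with an ordinary least-squares fit plus R². The sketch below is a minimal illustration and is not the code used for this post. It assumes annual anomalies in a dict mapping year to anomaly (None for missing). The 1975 default start mirrors the variable parameter mentioned in the reply, and `trend_from` is a hypothetical name. Because the start year is a free parameter, the cherry-picking caveat above applies directly.

```python
# Sketch: OLS trend (deg C per decade) and R^2 from a chosen start year.

def trend_from(series, start=1975):
    """series maps year -> anomaly (None for missing).

    Returns (slope_per_decade, r_squared), or None if fewer than 3 points.
    """
    pts = [(y, v) for y, v in sorted(series.items()) if y >= start and v is not None]
    n = len(pts)
    if n < 3:
        return None
    mx = sum(y for y, _ in pts) / n
    my = sum(v for _, v in pts) / n
    sxx = sum((y - mx) ** 2 for y, _ in pts)
    sxy = sum((y - mx) * (v - my) for y, v in pts)
    syy = sum((v - my) ** 2 for _, v in pts)
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy) if syy > 0 else float("nan")  # undefined for a flat series
    return slope * 10, r2

# Hypothetical series rising 0.02 deg C per year from 1975.
series = {y: 0.02 * (y - 1975) for y in range(1975, 2010)}
print(trend_from(series))                              # about (0.2, 1.0)
print(trend_from(series, start=max(series) - 29)[0])   # last 30 years, about 0.2
```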

  2. oneillp says:

    In the post I mentioned a small difference with respect to cells with missing data, which I was investigating. The official Gistemp plot does not show the couple of missing data “holes” which are visible in my plot, and which are shown enlarged below. I have now printed out and examined the gridcell values in SBBX1880.Ts.GHCN.CL.PA.1200 from 1880-1890 for these sub-equatorial latitudes, and verified that these three “holes” are genuine. There is insufficient data in the relevant years in the following latitude bands to estimate annual means in the usual way (requiring at least three seasonal means, each requiring at least two monthly values), and in some cases no monthly data whatever.

    • 1881: -13.89° to -9.21°
    • 1884/85: -9.21° to -4.59°
    • 1888/89: -6.89° to -4.59°

    I have also verified by comparison that there is a perfect match as regards missing monthly data values between the file on the GISS FTP server and the file generated by my program, so these “holes” should also be visible in the official plot, but are not.
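    The completeness rule quoted above can be sketched as follows: a seasonal mean needs at least two of its three monthly values, and an annual mean needs at least three of the four seasonal means. This is a minimal illustration, not Gistemp code. The season grouping (DJF using the previous December, then MAM, JJA and SON) and the input layout (a dict mapping (year, month) to anomaly) are assumptions made for the sketch.

```python
# Sketch of the annual-mean completeness rule; season grouping is assumed.

SEASONS = [((-1, 12), (0, 1), (0, 2)),   # DJF, using December of the previous year
           ((0, 3), (0, 4), (0, 5)),     # MAM
           ((0, 6), (0, 7), (0, 8)),     # JJA
           ((0, 9), (0, 10), (0, 11))]   # SON

def seasonal_means(monthly, year):
    """Four seasonal means (None where fewer than 2 monthly values exist)."""
    means = []
    for season in SEASONS:
        vals = [monthly.get((year + dy, m)) for dy, m in season]
        vals = [v for v in vals if v is not None]
        means.append(sum(vals) / len(vals) if len(vals) >= 2 else None)
    return means

def annual_mean(monthly, year):
    """Mean of the available seasonal means, or None if fewer than 3 exist."""
    seasons = [s for s in seasonal_means(monthly, year) if s is not None]
    return sum(seasons) / len(seasons) if len(seasons) >= 3 else None

monthly = {(1884, m): 0.5 for m in range(1, 13)}
monthly[(1883, 12)] = 0.5
del monthly[(1884, 6)], monthly[(1884, 7)]   # JJA now has only 1 value
print(annual_mean(monthly, 1884))            # 0.5 (3 seasons remain)
del monthly[(1884, 3)], monthly[(1884, 4)]   # MAM also fails
print(annual_mean(monthly, 1884))            # None (only 2 seasons)
```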

  3. Pingback: Code and Data | Peter O'Neill's Blog

  4. oneillp says:

    July 01: I have updated this post to include the time series of anomalies for stations which have no data after 2004 in the graphics, and added a plot of station record counts for the two subsets. (Station records rather than stations, as multiple records remain uncombined after Step1 for a number of stations, and in a few cases one of these records extends past 2004 while the remaining record or records end before 2005).

    The two subset input files, as well as the C# project used to split the Step1 output file Ts.txt, can be downloaded at

  5. Verity Jones says:

    You’ve been really thorough, and I am actually shocked by the third figure. The effect on older temperatures of 0.1-0.3 deg C is bad enough, but this is an ‘average’ and damps the greater effects at other latitudes. I look forward to more.

    Have a great holiday (I hope the weather is kind).

    • oneillp says:

      Not quite so thorough I’m afraid – I forgot to add the station record counts for each year to the supplemental data, something which I have just added now.

      Thanks for drawing attention to the third figure and the effect at different latitudes – this is also something I should have mentioned in greater detail, as the consistency at different latitudes is part of the reason that I do not think statistical uncertainty will be sufficient explanation for the difference between the two subsets.

      • Verity Jones says:

        I’ve been working up to a blog post on this myself. The ‘drop-out’ is only really significant in specific countries at specific times. I now wonder if I can tie these to the effects you report at specific latitudes. I’ll have another look at it and perhaps send you an email if it seems significant.

      • oneillp says:

        I can also easily make available the corresponding time series for both hemispheres (monthly and annual), and annual for latitude bands 24N-90N, 24S-24N, 90S-24S, 64N-90N, 44N-64N, 24N-44N, EQU-24N, 24S-EQU, 44S-24S, 64S-44S and 90S-64S. These are part of Gistemp output, so I have them already saved. Other bands would be possible, but not until next month.

  6. Ron Broberg says:

    Heh, just saw this.

    1) code? I’d like to take a crack at running it. email is ronbroberg over at that thingee.

    2) ftp? If you like, I can post your code at my ftp service. Your choice.
