Cooling The Past In Siberia – some supplementary information

Paul Homewood has a post, Cooling The Past In Siberia, where I commented. This post adds some further information to my comments, with images which I could not post inline at Paul’s blog, and more detailed information about the adjustment process involved for Krasnojarsk. For a detailed worked example of the Gistemp adjustment process, see my posts: Post 1 of 6 in GISTEMP example and following posts through post 4, and the additional example, Dublin Airport adjustment.

These comments relate in particular to two stations, Krasnojarsk and Bratsk, discussed in Paul’s post. Bratsk, as a consequence of incorrect location metadata used by GISS, is treated by GISS as a rural station which is then available for use in adjusting Krasnojarsk. Another station, Kirensk, although the location used by GISS is close to that reported by the WMO, is changed to urban when the WMO location is used, and has been included below as well. Yet another station, Rubcovsk, regarded as urban by GISS, is reclassified as rural when the WMO location, 8.5 km away, is used instead, and so becomes available for use when adjusting Krasnojarsk. Both Kirensk and Rubcovsk are however over 900 km distant from Krasnojarsk, and are given much less weight than Bratsk when used in adjustment. I am including them here for completeness rather than because they have much impact.

This post is now complete (delayed rather longer than I would have liked by the arrival of visitors). There are however aspects of this post, and related topics, to which I may still return in future posts.

Station locations in Google Earth

As I commented at Paul’s blog:

GISS uses the GHCN latitude and longitude values, many of which are inaccurate. This may not be a serious problem for GHCN – but it is for Gistemp, where classifying a station as urban or rural by examination of night time illumination is unreliable, to say the least, if you examine a location distant from the true location. Bratsk is only classified as rural by looking at a far distant rural location. The GHCN coordinates for Bratsk are for a location 110 km distant from the coordinates given by the WMO, and even the WMO coordinates are likely to be a bit inaccurate here, as shown on the right in the following image:

Bratsk. WMO and GHCN coordinates

Reto Ruedy is aware that there are errors, including serious errors, in the GHCN coordinates (“I’m not surprised at all that there are serious mistakes in this inventory file. It has been traditionally treated with less than the proper care”), but told me that “Unfortunately, we don’t have the manpower to check out all entries of that file”. While Hansen et al. (2010) states “Station location in the meteorological data records is provided with a resolution of 0.01 degrees of latitude and longitude, corresponding to a distance of about 1 km. This resolution is useful for investigating urban effects on regional atmospheric temperature”, this claimed resolution is often unreal.

Crosschecking against the WMO data for GHCN stations which are also WMO stations currently shows 34 differences of 100 km or more, 8 between 50 and 100 km, 130 between 10 and 50 km, and 164 between 5 and 10 km, all distances which may be sufficient to give an incorrect urban/rural classification. I’ll add a further comment shortly on the number of stations actually wrongly classified.
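
A crosscheck of this kind can be sketched roughly as follows. This is only an illustration: the function names, the mean Earth radius and the sample coordinate pairs are assumptions made for the example, not values taken from the GHCN or WMO files, and the bin boundaries simply follow the ranges quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed; not taken from Gistemp)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_bin(d_km):
    """Assign a location difference to one of the ranges used in the text."""
    if d_km >= 100:
        return "100 km or more"
    if d_km >= 50:
        return "50 to 100 km"
    if d_km >= 10:
        return "10 to 50 km"
    if d_km >= 5:
        return "5 to 10 km"
    return "under 5 km"


def tally(pairs):
    """Count (ghcn_latlon, wmo_latlon) pairs per distance bin."""
    counts = {}
    for ghcn, wmo in pairs:
        label = distance_bin(haversine_km(*ghcn, *wmo))
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical coordinate pairs, for illustration only (not real station data).
sample = [
    ((56.0, 100.0), (57.0, 100.0)),   # one degree of latitude apart
    ((50.0, 80.0), (50.05, 80.0)),    # a few km apart
]
print(tally(sample))
```

With real inventories, the pairs would be built by matching GHCN and WMO entries on station number before calling `tally`.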

The Rubcovsk location is shifted 8.5 km when the WMO rather than GHCN location is used.

Rubcovsk location in Google Earth

The NOAA metadata exists in two versions. I have used the version which is distributed with the ghcnm data files, which contains the same location error for Bratsk as the metadata used by GISS. I suspect that this version and the GISS metadata file have matching errors for other stations as well. Paul points out that there is another version of the NOAA metadata where the location of Bratsk is corrected. I shudder when I encounter conflicting versions simultaneously online.

Analysis with GISS metadata and nightlights

Krasnojarsk. GISS analysis, data to March 2014

Krasnojarsk. Adjustment in GISS analysis

Bratsk. GISS analysis, data to March 2014

Kirensk. GISS analysis. Data to March 2014

The adjustment process is logged in the Gistemp file PApars.GHCN.CL.1000.20.log, and the relevant section is shown below. (My Gistemp implementation displays this information in more user friendly fashion than the original, adding station names. Russian Federation (Asian) has been shortened to RF (Asian) below to reduce line spillage)
urb stnID:222295700000 # rur: 23 ranges: 1891 2013 1000.
longest rur range: 1880-2004 125 [wgt: 0.337 663.5 km] 222298380000 *[BARNAUL] RF (ASIAN)
add stn 2 range: 1892-2013 118 [wgt: 0.052 948.8 km] 222302300000 [KIRENSK] RF (ASIAN)
data added: 118 overlap: 110 years
add stn 3 range: 1902-2013 108 [wgt: 0.431 568.7 km] 222303090000 *[BRATSK] RF (ASIAN)
data added: 108 overlap: 108 years
add stn 4 range: 1935-2013 79 [wgt: 0.357 643.5 km] 222238840000 [BOR] RF (ASIAN)
data added: 79 overlap: 78 years
add stn 5 range: 1936-2013 78 [wgt: 0.119 881.2 km] 211362080000 *[LENINOGORSK] KAZAKHSTAN
data added: 78 overlap: 78 years
add stn 6 range: 1937-2013 77 [wgt: 0.047 954.4 km] 215442720000 *[ULIASTAI] MONGOLIA
data added: 77 overlap: 77 years
add stn 7 range: 1937-2013 77 [wgt: 0.107 893.9 km] 215442180000 *[HOVD] MONGOLIA
data added: 77 overlap: 77 years
add stn 8 range: 1941-2013 73 [wgt: 0.133 867.8 km] 215442310000 *[MUREN] MONGOLIA
data added: 73 overlap: 73 years
add stn 9 range: 1938-2013 65 [wgt: 0.023 977.8 km] 205510760000 [ALTAY] CHINA
data added: 65 overlap: 65 years
add stn 10 range: 1933-2013 65 [wgt: 0.267 733.6 km] 222249080000 [VANAVARA] RF (ASIAN)
data added: 65 overlap: 65 years
add stn 11 range: 1935-2013 64 [wgt: 0.333 666.9 km] 222238910000 [BAJKIT] RF (ASIAN)
data added: 64 overlap: 64 years
add stn 12 range: 1938-2013 60 [wgt: 0.207 793.6 km] 222305210000 *[ZIGALOVO] RF (ASIAN)
data added: 60 overlap: 60 years
add stn 13 range: 1961-2013 53 [wgt: 0.010 990.5 km] 215442320000 [HUTAG] MONGOLIA
data added: 53 overlap: 53 years
add stn 14 range: 1959-2012 53 [wgt: 0.187 813.9 km] 215442140000 *[UIGI] MONGOLIA
data added: 53 overlap: 53 years
add stn 15 range: 1964-2013 50 [wgt: 0.107 893.4 km] 215442250000 [TOSONTSENGEL] MONGOLIA
data added: 50 overlap: 50 years
add stn 16 range: 1963-2013 47 [wgt: 0.059 941.9 km] 215442300000 [TARIALAN] MONGOLIA
data added: 47 overlap: 47 years
add stn 17 range: 1962-2013 46 [wgt: 0.291 709.1 km] 215442130000 [BARUUNTURUUN] MONGOLIA
data added: 46 overlap: 46 years
add stn 18 range: 1963-2013 46 [wgt: 0.207 793.8 km] 215442070000 [HATGAL] MONGOLIA
data added: 46 overlap: 46 years
add stn 19 range: 1963-2013 45 [wgt: 0.218 782.4 km] 215442150000 [OMNO-GOBI] MONGOLIA
data added: 45 overlap: 45 years
add stn 20 range: 1943-1981 38 [wgt: 0.309 691.4 km] 215442120010 *[BAYAN-OL] MONGOLIA
data added: 38 overlap: 38 years
add stn 21 range: 1975-2013 35 [wgt: 0.290 710.1 km] 215442030000 [RINCHINLHUMBE] MONGOLIA
data added: 35 overlap: 35 years
add stn 22 range: 1958-1990 32 [wgt: 0.020 981.1 km] 205510530000 [KABA HE] CHINA
data added: 32 overlap: 32 years
add stn 23 range: 1962-1983 22 [wgt: 0.259 741.3 km] 215442130010 [BAYAN UUL, DZAVHAN] MONGOLIA
data added: 22 overlap: 22 years
possible range increase 47 106 112
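
The weights shown in the log are consistent with a simple linear taper, falling from 1 at zero distance to 0 at the search radius (1000 km here, 500 km in some of the other logs in this post). The sketch below checks that reading against a few logged values. The function name is an assumption, and the formula is inferred from the logged figures rather than copied from the Gistemp source; the small differences at larger distances suggest the actual distance calculation differs slightly.

```python
def linear_taper_weight(distance_km, radius_km):
    """Weight falling linearly from 1 at the station to 0 at the search radius."""
    if distance_km >= radius_km:
        return 0.0
    return 1.0 - distance_km / radius_km


# (distance km, radius km, weight as logged), copied from the adjustment logs in this post
LOGGED = [
    (663.5, 1000.0, 0.337),  # BARNAUL, adjusting Krasnojarsk
    (568.7, 1000.0, 0.431),  # BRATSK, adjusting Krasnojarsk
    (948.8, 1000.0, 0.052),  # KIRENSK, adjusting Krasnojarsk
    (262.7, 500.0, 0.475),   # BARNAUL, adjusting Rubcovsk
    (214.9, 500.0, 0.570),   # LENINOGORSK, adjusting Rubcovsk
    (440.4, 500.0, 0.119),   # IRTYSSK, adjusting Rubcovsk
]

for d, r, w in LOGGED:
    est = linear_taper_weight(d, r)
    print(f"{d:7.1f} km  logged {w:.3f}  linear taper {est:.3f}  diff {est - w:+.4f}")
```

On this reading Bratsk, at 568.7 km, carries roughly eight times the weight of Kirensk at 948.8 km.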

Bratsk:

Bratsk in GISS analysis.

Kirensk:

Kirensk. GISS analysis. Data to March 2014

Rubcovsk:

Rubcovsk in GISS analysis

Rubcovsk adjustment in GISS analysis

urb stnID:222360340000 # rur:   4 ranges: 1936 1989     500.
longest rur range: 1880-2004   125  [wgt: 0.475  262.7 km] 222298380000 *[BARNAUL] RUSSIAN FEDERATION (ASIAN)
add stn    2 range: 1894-2013   107 [wgt: 0.375  312.3 km] 211365350000  [KOKPEKTY] KAZAKHSTAN
data added:  107  overlap: 98  years
add stn    3 range: 1936-2013    78 [wgt: 0.570  214.9 km] 211362080000 *[LENINOGORSK] KAZAKHSTAN
data added:  78  overlap: 78  years
add stn    4 range: 1938-2013    76 [wgt: 0.119  440.4 km] 222298070000  [IRTYSSK] RUSSIAN FEDERATION (ASIAN)
data added:  76  overlap: 76  years
possible range increase 27 54 54

Analysis with corrected metadata and nightlights

In this case the WMO locations for the various stations have been used instead of those in the GHCN/GISS metadata, and revised nighttime illumination determined.

Krasnojarsk. Corrected analysis

Krasnojarsk. Adjustment in corrected analysis

urb stnID:222295700000 # rur:  22 ranges: 1891 2013    1000.
longest rur range: 1880-2004   125  [wgt: 0.337  663.5 km] 222298380000 *[BARNAUL] RUSSIAN FEDERATION (ASIAN)
add stn    2 range: 1935-2013    79 [wgt: 0.357  643.5 km] 222238840000  [BOR] RF (ASIAN)
data added:  79  overlap: 70  years
add stn    3 range: 1936-2013    78 [wgt: 0.119  881.2 km] 211362080000 *[LENINOGORSK] KAZAKHSTAN
data added:  78  overlap: 78  years
add stn    4 range: 1937-2013    77 [wgt: 0.047  954.4 km] 215442720000 *[ULIASTAI] MONGOLIA
data added:  77  overlap: 77  years
add stn    5 range: 1937-2013    77 [wgt: 0.107  893.9 km] 215442180000 *[HOVD] MONGOLIA
data added:  77  overlap: 77  years
add stn    6 range: 1941-2013    73 [wgt: 0.133  867.8 km] 215442310000 *[MUREN] MONGOLIA
data added:  73  overlap: 73  years
add stn    7 range: 1938-2013    65 [wgt: 0.023  977.8 km] 205510760000  [ALTAY] CHINA
data added:  65  overlap: 65  years
add stn    8 range: 1933-2013    65 [wgt: 0.267  733.6 km] 222249080000  [VANAVARA] RUSSIAN FEDERATION (ASIAN)
data added:  65  overlap: 65  years
add stn    9 range: 1935-2013    64 [wgt: 0.333  666.9 km] 222238910000  [BAJKIT] RUSSIAN FEDERATION (ASIAN)
data added:  64  overlap: 64  years
add stn   10 range: 1938-2013    60 [wgt: 0.207  793.6 km] 222305210000 *[ZIGALOVO] RUSSIAN FEDERATION (ASIAN)
data added:  60  overlap: 60  years
add stn   11 range: 1936-1989    54 [wgt: 0.097  904.2 km] 222360340000 *[RUBCOVSK] RUSSIAN FEDERATION (ASIAN)
data added:  54  overlap: 54  years
add stn   12 range: 1961-2013    53 [wgt: 0.010  990.5 km] 215442320000  [HUTAG] MONGOLIA
data added:  53  overlap: 53  years
add stn   13 range: 1959-2012    53 [wgt: 0.187  813.9 km] 215442140000 *[UIGI] MONGOLIA
data added:  53  overlap: 53  years
add stn   14 range: 1964-2013    50 [wgt: 0.110  890.6 km] 215442250000  [TOSONTSENGEL] MONGOLIA
data added:  50  overlap: 50  years
add stn   15 range: 1963-2013    47 [wgt: 0.059  941.9 km] 215442300000  [TARIALAN] MONGOLIA
data added:  47  overlap: 47  years
add stn   16 range: 1962-2013    46 [wgt: 0.291  709.1 km] 215442130000  [BARUUNTURUUN] MONGOLIA
data added:  46  overlap: 46  years
add stn   17 range: 1963-2013    46 [wgt: 0.207  793.8 km] 215442070000  [HATGAL] MONGOLIA
data added:  46  overlap: 46  years
add stn   18 range: 1963-2013    45 [wgt: 0.218  782.4 km] 215442150000  [OMNO-GOBI] MONGOLIA
data added:  45  overlap: 45  years
add stn   19 range: 1943-1981    38 [wgt: 0.309  691.4 km] 215442120010 *[MONGOLIAN STATION,BAYAN-OL] MONGOLIA
data added:  38  overlap: 38  years
add stn   20 range: 1975-2013    35 [wgt: 0.290  710.1 km] 215442030000  [RINCHINLHUMBE] MONGOLIA
data added:  35  overlap: 35  years
add stn   21 range: 1958-1990    32 [wgt: 0.020  981.1 km] 205510530000  [KABA HE] CHINA
data added:  32  overlap: 32  years
add stn   22 range: 1962-1983    22 [wgt: 0.259  741.3 km] 215442130010  [BAYAN UUL, DZAVHAN] MONGOLIA
data added:  22  overlap: 22  years
possible range increase 37 77 79
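
The change in the set of rural adjusting stations between the two Krasnojarsk runs can be extracted from the logs themselves. The following is a minimal sketch: the regular expression and function name are assumptions based on the log layout shown in this post, and only a few lines from each log are included to keep the example short.

```python
import re

# Matches the "[wgt: W  D km] STATIONID *[NAME]" part of a log line (layout as shown in this post).
LINE_RE = re.compile(r"\[wgt:\s*([\d.]+)\s+([\d.]+) km\]\s+(\d{12})\s+\*?\[([^\]]+)\]")


def rural_stations(log_text):
    """Return {station_id: (name, weight, distance_km)} for the rural stations in a log."""
    stations = {}
    for m in LINE_RE.finditer(log_text):
        weight, dist, stn_id, name = m.groups()
        stations[stn_id] = (name, float(weight), float(dist))
    return stations


# Excerpts only; the full logs list 23 and 22 rural stations respectively.
GISS_EXCERPT = """\
longest rur range: 1880-2004 125 [wgt: 0.337 663.5 km] 222298380000 *[BARNAUL] RF (ASIAN)
add stn 2 range: 1892-2013 118 [wgt: 0.052 948.8 km] 222302300000 [KIRENSK] RF (ASIAN)
add stn 3 range: 1902-2013 108 [wgt: 0.431 568.7 km] 222303090000 *[BRATSK] RF (ASIAN)
add stn 4 range: 1935-2013 79 [wgt: 0.357 643.5 km] 222238840000 [BOR] RF (ASIAN)
"""

CORRECTED_EXCERPT = """\
longest rur range: 1880-2004   125  [wgt: 0.337  663.5 km] 222298380000 *[BARNAUL] RUSSIAN FEDERATION (ASIAN)
add stn    2 range: 1935-2013    79 [wgt: 0.357  643.5 km] 222238840000  [BOR] RF (ASIAN)
add stn   11 range: 1936-1989    54 [wgt: 0.097  904.2 km] 222360340000 *[RUBCOVSK] RUSSIAN FEDERATION (ASIAN)
"""

giss = rural_stations(GISS_EXCERPT)
corrected = rural_stations(CORRECTED_EXCERPT)

# Stations are compared by ID, since the name text can differ between runs.
dropped = sorted(giss[s][0] for s in set(giss) - set(corrected))
added = sorted(corrected[s][0] for s in set(corrected) - set(giss))
print("dropped:", dropped)
print("added:", added)
```

Run over the complete logs, the same comparison accounts for the change from 23 to 22 rural stations: two stations leave the set and one joins it.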

Bratsk, now classified as urban rather than rural, is itself adjusted and no longer adjusts Krasnojarsk:

Bratsk. Corrected analysis

Bratsk. Adjustment in corrected analysis

urb stnID:222303090000 # rur:   3 ranges: 1902 2013     500.
longest rur range: 1933-2013    65  [wgt: 0.108  446.0 km] 222249080000  [VANAVARA] RUSSIAN FEDERATION (ASIAN)
add stn    2 range: 1938-2013    60 [wgt: 0.457  271.3 km] 222305210000 *[ZIGALOVO] RUSSIAN FEDERATION (ASIAN)
data added:  60  overlap: 59  years
add stn    3 range: 1952-2013    47 [wgt: 0.025  487.6 km] 222304330000  [NIZNEANGARSK] RUSSIAN FEDERATION (ASIAN)
data added:  47  overlap: 47  years
possible range increase 7 46 62

Similarly, Kirensk, now classified as urban rather than rural, is itself adjusted and no longer adjusts Krasnojarsk:

Kirensk. Corrected analysis

Kirensk. Adjustment in corrected analysis

urb stnID:222302300000 # rur:   8 ranges: 1892 2013     500.
longest rur range: 1900-2013    91  [wgt: 0.047  476.7 km] 222306360000  [BARGUZIN] RUSSIAN FEDERATION (ASIAN)
add stn    2 range: 1930-2013    76 [wgt: 0.356  321.8 km] 222300540000  [VITIM] RUSSIAN FEDERATION (ASIAN)
data added:  76  overlap: 68  years
add stn    3 range: 1936-2013    76 [wgt: 0.221  389.5 km] 222248170000  [ERBOGACEN] RUSSIAN FEDERATION (ASIAN)
data added:  76  overlap: 68  years
add stn    4 range: 1933-2013    65 [wgt: 0.135  432.8 km] 222249080000  [VANAVARA] RUSSIAN FEDERATION (ASIAN)
data added:  65  overlap: 65  years
add stn    5 range: 1938-2013    60 [wgt: 0.243  378.7 km] 222305210000 *[ZIGALOVO] RUSSIAN FEDERATION (ASIAN)
data added:  60  overlap: 60  years
add stn    6 range: 1951-2004    52 [wgt: 0.015  492.8 km] 222306350000  [UST'-BARGUZIN] RUSSIAN FEDERATION (ASIAN)
data added:  52  overlap: 52  years
add stn    7 range: 1938-1989    52 [wgt: 0.057  471.5 km] 222305550010  [TROICKIJ PRIISK] RUSSIAN FEDERATION (ASIAN)
data added:  52  overlap: 52  years
add stn    8 range: 1952-2013    47 [wgt: 0.519  240.5 km] 222304330000  [NIZNEANGARSK] RUSSIAN FEDERATION (ASIAN)
data added:  47  overlap: 47  years
possible range increase 26 71 81

And finally Rubcovsk has become a rural station, available to adjust Krasnojarsk

Rubcovsk. Corrected analysis

Analysis with GISS metadata before nightlights

Anticipating the discussion, the GISS analysis for data up to October 2009, using the GHCN R/S/U (rural/periurban/urban) coding prior to the introduction of worldwide classification by nighttime illumination, shows some major differences. I have now (see below) extracted and plotted the station records from the GISS files I have archived, but here first is the adjustment log. This is the original GISS version: the detail shown in the logs above, such as distances, weights, country codes and station names, was added by my more user friendly Gistemp implementation. Here I have added just the station names and countries.

All the stations marked ‘*’ before the station names in the adjustment logs above have now disappeared – these were coded as periurban or urban by NOAA/GHCN, and so not available as adjusting stations for Gistemp in 2009. Two further stations, Bogucany and Aleksandrovsk, now classified as urban by nighttime illumination, were classified as rural and used for adjustment.

urb stnID:295700006 # rur: 1 ranges: 1891 2009 500.
longest rur range: 1931-2009 79 292820004
trying full radius 1000.000000
urb stnID:295700006 # rur: 15 ranges: 1891 2009 1000.
longest rur range: 1892-2009 115 302300008  [KIRENSK] RF (ASIAN)
add stn 2 range: 1931-2009 79 292820004     [BOGUCANY] RF (ASIAN)
data added: 79 overlap: 77 years
add stn 3 range: 1935-2009 75 238840007     [BOR] RF (ASIAN)
data added: 75 overlap: 75 years
add stn 4 range: 1932-2009 61 239550000     [ALEKSANDROVSK] RF (ASIAN)
data added: 61 overlap: 61 years
add stn 5 range: 1938-2009 60 510760002     [ALTAY] CHINA
data added: 60 overlap: 60 years
add stn 6 range: 1933-2009 59 249080000     [VANAVARA] RF (ASIAN)
data added: 59 overlap: 59 years
add stn 7 range: 1935-2009 58 238910000     [BAJKIT] RF (ASIAN)
data added: 58 overlap: 58 years
add stn 8 range: 1958-1990 33 510530000     [KABA HE] CHINA
data added: 33 overlap: 33 years
add stn 9 range: 1961-1983 23 442320000     [HUTAG] MONGOLIA
data added: 23 overlap: 23 years
add stn 10 range: 1961-1983 23 442130000    [BARUUNTURUUN] MONGOLIA
data added: 23 overlap: 23 years
add stn 11 range: 1962-1983 22 442130010    [BAYAN UUL, DZAVHAN] MONGOLIA
data added: 22 overlap: 22 years
add stn 12 range: 1963-1983 21 442300000    [TARIALAN] MONGOLIA
data added: 21 overlap: 21 years
add stn 13 range: 1963-1983 21 442150000    [OMNO-GOBI] MONGOLIA
data added: 21 overlap: 21 years
add stn 14 range: 1963-1983 21 442070000    [HATGAL] MONGOLIA
data added: 21 overlap: 21 years
add stn 15 range: 1964-1983 20 442250000    [TOSONTSENGEL] MONGOLIA
data added: 20 overlap: 20 years
possible range increase 38 77 78

Krasnojarsk. GISS output from 2009, data to October. No nightlights used

Krasnojarsk adjustment. GISS output from 2009, data to October. No nightlights used

As my Gistemp implementation also allows stations to be listed as excluded from the adjustment process, I have also run a current analysis, first excluding the 9 stations Barnaul, Leninogorsk, Uliastai, Hovd, Muren, Zigalovo, Rubcovsk, Uigi and Bayan-ol which were classified as urban by GHCN:

Krasnojarsk. Corrected analysis, but removing the stations classified as urban by GHCN

Krasnojarsk adjustment. Corrected analysis, but removing the stations classified as urban by GHCN

and then also adding back the three stations Kirensk, Bogucany and Aleksandrovsk which were classified as rural by GHCN but excluded as urban once the nightlights criterion was applied.

Krasnojarsk. Corrected analysis, but removing the stations classified as urban by GHCN, adding the stations classified as rural by GHCN

Krasnojarsk adjustment. Corrected analysis, but removing the stations classified as urban by GHCN, adding the stations classified as rural by GHCN

(I have omitted the adjustment logs for these last two analyses. If anyone feels that they are needed, please post a comment and I will add them).

Discussion

The most striking aspect of this Krasnojarsk adjustment study is the variation in the trend imposed on the urban record by the different choices made for adjusting rural stations. This is not unique to Krasnojarsk, but the number of stations being reclassified by the different choices is greater here than for other stations which I have examined. This already suggests to me that it might be worthwhile examining all stations adjusted to see how many stations move in and out of the set of rural adjusting stations for the different choices involved, and to see if these changes are in any way geographically clustered (I expect that US stations, for which the metadata is more reliable, will show few changes).

Urban/rural classification | Trend, full record (°C/century) | Trend, final 30 years (°C/century)
Raw (GHCN adjusted: ghcnm.tavg.v3.2.2.20140411.qca.dat) | 0.45 | 3.68
Raw (2009: GHCN v2) | 0.69 | 4.57
GHCN classification (2009) | 0.48 | 5.35
GISS nightlights classification | 1.32 | 4.13
My nightlights classification | 0.79 | 5.26
(removing GHCN non-rural) | 0.37 | 5.50
(removing GHCN non-rural, adding 3) | 0.57 | 5.18
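
For readers who want to compute comparable figures from a station record, a trend in °C/century over the full record and over the final 30 years can be sketched as below. The fitting method behind the figures in this post is not stated, so ordinary least squares on annual means is an assumption here, as are the function names; the series used is synthetic and is not Krasnojarsk data.

```python
def trend_per_century(years, temps):
    """Ordinary least squares slope of temps against years, scaled to degrees C per century."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    return 100.0 * sxy / sxx


def full_and_recent_trends(years, temps, window=30):
    """Trend over the whole record and over the final `window` years."""
    return (
        trend_per_century(years, temps),
        trend_per_century(years[-window:], temps[-window:]),
    )


# Synthetic annual means: flat until 1984, then rising at 0.05 degrees C per year.
years = list(range(1891, 2014))
temps = [0.0 if y <= 1984 else 0.05 * (y - 1984) for y in years]

full, recent = full_and_recent_trends(years, temps)
print(f"full record: {full:.2f} C/century, final 30 years: {recent:.2f} C/century")
```

A real record would also need a rule for years with missing data, which this sketch does not address.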

Changes in the identification of urban stations and in the mix of rural adjusting stations arise in two ways:

  • Due to the extension worldwide by GISS of urban/rural classification by examination of nighttime luminance in 2010
  • Due to the replacement of suspect latitude/longitude data by corrected values.

These will now be discussed separately.

Extension worldwide of urban/rural classification by examination of nighttime luminance

Hansen et al. (2010) extends the use of satellite-observed nightlights worldwide to identify measurement stations as rural, peri-urban or urban, but fails to discuss the impact of this change on the number of stations classified as rural or urban. The magnitude of this change can be seen in Table 1:

Stations | STEP2 (input) | Rural | Urban | STEP2 (used) | Rural | Urban
Prior analysis | 7281 | 3061 | 4220 | 6308 | 2508 | 3800
Nightlights | 7281 | 3767 | 3514 | 6308 | 3125 | 3183
Change | | +706 | -706 | | +617 | -617

Table 1. GISS analysis station classification. Eighty-three stations with records ending before 1880, or marked as strange, are dropped before STEP2, the UHI adjustment step.

The distinction between urban and peri-urban stations is ignored, both being regarded here as urban, as the UHI adjustment described in this paper treats both identically. The data used for analysis here is that for the August 2010 GISS analysis of global surface temperature change, up to and including July 2010.

It seems strange that Hansen et al 2010 makes no reference to the extent of the urban/rural reclassification resulting from its worldwide extension of the use of nighttime luminance, and is content to claim that

Station location in the meteorological data records is provided with a resolution of 0.01 degrees of latitude and longitude, corresponding to a distance of about 1 km.

although Reto Ruedy at least was aware prior to publication that this was not strictly true, belatedly confirming, on August 23rd 2010, coincidentally the same date that Hansen et al 2010 was accepted for publication by Reviews of Geophysics, that

I’m not surprised at all that there are serious mistakes in this inventory file. It has been traditionally treated with less than the proper care; e.g. it took years after I notified them until they fixed the error of systematically dropping the 1000s in all altitudes.

It should be evident that an adjustment procedure for urban stations based on nearby rural stations requires an ability to correctly identify urban and rural stations. Each of the reclassified stations noted above must be incorrectly classified in one or the other of the two classifications, and we do not know which. Other stations which were not reclassified may potentially be incorrectly classified under both classifications. A large number of stations were reclassified, and there appears to be no evidence in Hansen et al 2010 to show that this classification was improved by the worldwide extension of the use of nighttime luminance. Certainly some at least of the stations reclassified were incorrectly reclassified as a result of incorrect latitude and longitude values (see next section). It is possible indeed that the worldwide extension of the use of nighttime luminance could have introduced more classification errors than it corrected. The original classification on the basis of population could possibly be more accurate. We do not know, and this is a question which should have been addressed by the authors, but was not.

This is also a question which I had raised with Reto Ruedy in December 2009, before the implementation of the change (I had spotted a trial run on the GISS FTP site, a transparency which I regret to say has since been reversed), and before the appearance of the first draft of Hansen et al 2010 with an invitation for comments. It may be worth showing that initial December 2009 e-mail on this topic here to show that this was no mere incidental comment, but explicitly drew attention to the possible results of using the GHCN metadata for station classification:

I have some comments on efforts to use the Global Lights for urban/rural classification, and the quality of the data you are trying to use for this. I have looked at the data for a few Irish locations, and a few European locations I would be familiar with (I use location rather than station here as the latitude/longitude values are so coarse that it makes little sense to talk of stations if these values are being used to locate lighting for these locations). For example the two Cherbourg stations:

61507039001 CHERBOURG/CHANTEREYNE FRAN      49.70   -1.60   12    8S   31HIxxCO 1x-9COASTAL EDGES   B    7
61507039002 CHERBOURG-MAUPERTUS             49.70   -1.50  139   12S   31HIxxCO 2A 5WATER           A    0

Are both well out to sea based on these values, rather than at the port and the airport.

So here are some of the classification issues I have noted for a small subset of locations, and I am rather sure that such issues will arise again and again elsewhere.

Shannon Airport ceases to be classified as rural when lights are used, which is as it should be, considering that this is an airport which has had considerable industry located nearby for many years. Lights classification makes sense here.

62103962000 SHANNON AIRPO                   52.70   -8.92   20   14R   -9FLxxCO 1A-9WARM CROPS      B   12

Fort William (Scotland) changes from rural to urban, while Ben Nevis remains rural. At first sight this might seem to make sense, until you notice that the GHCN data for these stations only covers the period from 1884-1903, at which time Fort William was fairly certainly still rural. Lights classification fails. (These two stations are of interest as they are within 7 km of each other, but vertically separated by more than 1300 m, and provided early reliable comparison of conditions at altitude with those at sea level. But I wonder why they came to be included in the GHCN data, particularly in view of the otherwise surprisingly poor Scottish representation.)

65103038000 FORT WILLIAM                    56.83   -5.10   20  229R   -9MVxxCO 1x-9WARM GRASS/SHRUBC   13
65103038001 BEN NEVIS           UK          56.80   -5.10 1343  229R   -9MVxxCO 1x-9WARM GRASS/SHRUBB    0

I’ve travelled to France a few times on the ferry from Rosslare to Cherbourg. Using lights, Rosslare changes from rural to urban, while both Cherbourg stations change from urban to rural. Compare the surroundings on the maps pasted below.

62103957000 ROSSLARE                        52.25   -6.33   25    2R   -9FLxxCO 1x-9WATER           B   11

Bilbao changing from urban to rural is a surprise

64308025000 BILBAO              SPAIN       43.30   -2.80   16  378U  450HIxxCO 6A 2WARM DECIDUOUS  C   10

As is York

65103355001 YORK                UK          53.90   -1.10 -999   22U  102FLxxno-9x-9HEATHS, MOORS   A    7

It looks to me as if changing to global lights is likely to introduce as many new misclassified stations as it corrects, and that the only way this classification can really be improved is by actually examining each location, painful as that exercise may be.

I’ve added an obvious extra line to PApars.GHCN.CL.1000.20.log which you might find useful when investigating this classification issue:

all 3515 0.826953342816498 0.125331076856742
urb warm 1939 5.09911191335741 0.75478098233507
urb cool 1557 4.48326075786769 0.701657234759009

and also suggested summarising the urban cooling effect in their output as well as the urban warming effect already summarised (bold and red in the original e-mail – I already suspected that the classification of rural/urban stations might also affect the balance between urban warming and urban cooling effects). Hansen et al 2010 suggests/notes that:

The urban influence on long‐term global temperature change is generally found to be small. It is possible that the overall small urban effect is, in part, a consequence of partial cancellation of urban warming and urban cooling effects.

It seems premature to draw the conclusion that “The urban influence on long‐term global temperature change is generally found to be small” before a reliable classification of urban/rural has been established. The overall small urban effect may indeed, in part, be a consequence of partial cancellation of urban warming and urban cooling effects, and as shown by my added line above these two effects are quite similar, and do indeed partly cancel each other. Is this still the case if latitude/longitude errors are corrected? Lacking a complete set of corrected latitude/longitude values I cannot offer a definitive answer to this question. I have so far collected “corrected” values for about half the station inventory (“corrected” in the sense that the sources of these corrections may themselves contain some errors, but mostly come from the meteorological services operating the stations in question, rather than some third party). In the context of GHCN data this of course may introduce a new error: the most recent location known to the owning meteorological service may not be the location corresponding to the most recent GHCN data where, for whatever reason, GHCN no longer collects current data. There may have been a subsequent station relocation. There may also be other issues in relation to GHCN data collection, which I will describe soon in another post in the specific context of Irish stations. As it is rather unlikely that there has been any conspiracy to distort the Irish record, I would suspect that similar problems can be found if other countries are examined. It just happens that being Irish I have the corresponding data from the Irish meteorological service to hand for comparison purposes, and have from time to time made such comparisons. I suggest that you make a similar comparison for your own country, and this may well show similar problems.

When the change to worldwide use of nighttime luminance went online I followed up with a further list of substantial errors which could be found with just a few minutes’ work, and, in April 2010, in response to James Hansen’s invitation for comments on the draft Hansen et al 2010, I sent similar comments directly to James Hansen, including the comment:

As it is highly unlikely that I found all the gross location errors in v2.inv in just a few minutes, or that the relatively few airports I viewed in Google Earth just happened to include all those with incorrect coordinates, these and all other such errors in v2.inv need to be corrected before nightlights can provide the objective approach you suggest.

receiving the response “Peter, thanks much — will look into this soon.  Jim”.

As Hansen et al 2010 still appeared with the misleading claim that “Station location in the meteorological data records is provided with a resolution of 0.01 degrees” (misleading in that, while it is true that the resolution provided is indeed 0.01 degrees, this is an unreal resolution, divorced from the actual accuracy of the data), I find it difficult to reach any other conclusion than that this is an issue which GISS would prefer not to address. Accordingly, while I have continued to accumulate further corrected data, I have refrained from disturbing GISS with unwelcome corrections.

However strange it may seem that GISS failed to address this issue, it should seem even stranger that no peer reviewer thought fit to demand that this omission be addressed. I am however less than impressed by peer review at AGU journals, with the honourable exception of Water Resources Research, at least during the period when I was a regular reader, or indeed by the AGU itself. I doubt for example that Michael Mann’s two smoothing papers, [2004 and 2008, another blog post to justify this negative comment following as you may suspect] would have passed peer review in any statistical journal. I have yet to see a satisfactory account of the provenance of the forged “Heartland” document circulated by Peter Gleick, then chair of the AGU Ethics committee, textual analysis of which led to his outing as the identity thief involved, and so am less than impressed by his subsequent rehabilitation. I am of course open to consideration of any serious explanation of the provenance of that document, but please refrain from linking to any material from PR outfits with dubious financial backing.

One of the positive advantages of writing a blog post is that wandering into anecdote is permitted. I can thank the shortcomings of an article in a geophysical journal for my first job offer after graduation (unfortunately, 47 years later, I cannot recall which – if calculation of geophysical resistivity curves by a Mexican and US author pair, circa 1965, rings a bell, please post below. It may be totally irrelevant here, and now, but it would be nice to be reminded of the exact reference). As a trainee/stagiaire/Praktikant (choose one – it was a non-English speaking environment) I was given the supplementary information for the article to work with. This was unfortunately typeset rather than offset printed, which at that time would have been the sensible choice, and may well have been produced by the authors’ university rather than by the journal. In any case, the tabulated resistivity curves were riddled with typesetting errors. Even at that time my training was first to examine that data for fitness for purpose rather than simply thinking that fitness for purpose was a matter for someone else. My objection to using the data as published was endorsed further up the company hierarchy, and I was assigned the task of first eliminating the many typesetting errors, replacing them by interpolated values (time constraints and the computer resources then available ruled out replication of the original calculations). That attitude, and another similar detection of faulty data before further use, played, I was told, a significant part in the decision to offer me a permanent post shortly afterwards, a month after I started work as a summer trainee. I regret to say that a willingness to use data without questioning the fitness for purpose of that data does not seem to be confined to this specific use of GHCN metadata by GISS – it has also appeared elsewhere in the literature I have examined. It should also be noted that fitness for purpose of course depends on purpose.
The metadata collected by GHCN for the purposes of GHCN may be fully fit for those purposes. It is when it is used elsewhere for purposes which were not initially envisaged by the collecting agency that problems may arise. The location errors in the GHCN metadata may not matter for the use of that data by GHCN (or may matter – this is not a question I have examined), but it certainly matters when GISS uses that same data for urban/rural classification. As GHCN had not collected that data for that purpose, the onus is on GISS to ensure that the data is fit for this purpose, and to correct it where necessary. The excuse that “Unfortunately, we don’t have the manpower to check out all entries of that file” does not impress.

Replacement of suspect latitude/longitude data by corrected values

In the last section I considered the effect of the extension of the urban/rural reclassification resulting from the worldwide extension of the use of nighttime luminance, and touched on the question of the validity of the latitude/longitude metadata provided by GHCN for this purpose. While GISS uses the excuse that “Unfortunately, we don’t have the manpower to check out all entries of that file” to avoid any checking of the entries in that file, it should not be beyond even the “limited” manpower of GISS to make a start on that process. Roughly half the stations in the GHCN inventory are also WMO stations. The WMO is collecting higher precision location metadata for these stations. See WMO programme. Not all stations have revised coordinates yet, and there have been errors which I have spotted, generally an offset of one or more degrees. Those which I have notified to the WMO have been promptly corrected (US agencies “please note”), but coordinates where the error is in minutes or seconds are obviously also possible, but harder to detect. On one occasion the WMO published updates on their website with a number of coordinates such as “12 02 88E” and “56 29 60N”, suggesting an automatic process with inadequate validity checking (to spare the blushes of the national meteorological agencies involved I have chosen latitude and longitude values here from different countries). Again, the erroneous values were immediately corrected by the WMO when I brought these errors to their attention. The fact that such erroneous coordinates entered the system at all points to weaknesses at some point(s) in the data collection process.
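The kind of validity check whose absence those “12 02 88E” and “56 29 60N” values suggest is not difficult to write. What follows is only a minimal sketch in Python, not a description of any process actually used by the WMO: the “degrees minutes seconds hemisphere” layout is assumed from the two examples quoted above, and the names parse_dms and is_valid_dms are my own inventions for illustration.

```python
import re

# Hypothetical pattern for strings such as "56 29 59N": degrees, minutes,
# seconds and a hemisphere letter. The layout is assumed from the two
# examples quoted in the text, not taken from any WMO specification.
_DMS = re.compile(r"^\s*(\d{1,3})\s+(\d{1,2})\s+(\d{1,2})\s*([NSEW])\s*$")


def parse_dms(text):
    """Return signed decimal degrees, or raise ValueError if impossible."""
    match = _DMS.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, seconds = (int(g) for g in match.group(1, 2, 3))
    hemisphere = match.group(4)
    # Minutes and seconds must each lie in the range 0..59.
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes/seconds out of range: {text!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # Latitude cannot exceed 90 degrees, longitude cannot exceed 180.
    limit = 90 if hemisphere in "NS" else 180
    if value > limit:
        raise ValueError(f"degrees out of range: {text!r}")
    return -value if hemisphere in "SW" else value


def is_valid_dms(text):
    """True if the coordinate string passes the range checks above."""
    try:
        parse_dms(text)
    except ValueError:
        return False
    return True
```

Both of the values quoted above fail this check, one on seconds and one on the “60” which should have rolled over into the minutes. A check of this kind cannot of course detect a value which is well formed but simply wrong.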

The second obvious tool available for location checking is visual checking in Google Earth or other similar mapping program. Land based stations located out at sea, or airport stations located distant from the relevant airport, are obvious candidates for correction. This approach is obviously more manpower intensive.
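Before any visual checking, a first automated screen is possible wherever two sets of coordinates exist for the same station: compute the great-circle distance between them and list the stations where it exceeds some threshold. The sketch below uses the standard haversine formula on a spherical Earth. The function names, the 5 km default threshold and the (id, GHCN position, WMO position) record layout are illustrative assumptions, not the formats of v2.inv or of the WMO files.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; adequate for a screening check


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) fractionally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def flag_discrepancies(stations, threshold_km=5.0):
    """Return (station_id, distance_km) for pairs further apart than threshold.

    stations: iterable of (station_id, (lat, lon), (lat, lon)) tuples, where
    the two positions come from the two inventories being compared
    (hypothetical layout, chosen for this illustration only).
    """
    flagged = []
    for station_id, first, second in stations:
        distance = haversine_km(first[0], first[1], second[0], second[1])
        if distance > threshold_km:
            flagged.append((station_id, distance))
    return flagged
```

Agreement between two inventories does not prove that either is right, so stations which pass this screen may still need the visual check.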

Earlier posts on this blog have documented such location errors, and I do not propose to cover the same ground here. If my “corrected” coordinates, for somewhere between a quarter (if only stations for which higher precision coordinates have already been submitted to the WMO are considered) and a half (if all WMO coordinates are used to replace GHCN coordinates even if only lower precision coordinates are available) of the stations in the GHCN inventory are correct, a revised Gistemp analysis would suggest that the revised UHI contribution may potentially double the contribution claimed by GISS. Even if some of these revised coordinates are erroneous, this would still indicate that the sensitivity of this urban contribution to station location changes needs to be considered.

As almost all the station locations corrected are also WMO stations it is possible, if unlikely, that the quality of location metadata for the non-WMO stations is superior to that for the WMO stations, or that corrections to the metadata for these stations may affect the urban contribution differently, possibly even cancelling the increased contribution noted for corrected WMO stations. For now, until sufficient corrected location metadata becomes available, it is only possible to suggest that the urban contribution may be underestimated, and that there is an obvious need for accurate metadata.

Hansen et al 2010 may report that “The effect of urban adjustment on global temperature change is only of the order of 0.01°C for either night light or population adjustment”, but this may also result from adjustments which do not reflect the true urban/rural classification of the stations used.

Writing this post has suggested further aspects for investigation, which may lead to future posts, either specifically related to these Siberian stations, or more widely based.

A final point which I will shoehorn into this post is that the night light satellite data obtained from Marc L. Imhoff and used by GISS seems to be a version which I understand is deprecated. I have used the F16 radiance calibrated image instead. If radiance contours for the night light image used by GISS are examined some very improbable artifacts will be found, leading to dubious classification of some stations.

 


9 Responses to Cooling The Past In Siberia – some supplementary information

  1. Pingback: For Crickham- Critique of GISS Urbanization Adjustments - US Message Board - Political Discussion Forum

  2. Pingback: Cooling The Past In Siberia – Update | NOT A LOT OF PEOPLE KNOW THAT

  3. Robin says:

    Peter, I too look at climate data series using numerical and graphical methods, and am thus very interested in your work. All the (obviously very important) work on correcting published series is way outside my degree of both patience and competence, but I do know about examining data series, and I would be very interested in looking at the finalised (i.e. corrected) data that you have used in your current assessment.

    Something that always puzzles me is the way in which regression data are reported. Typically a (slope) coefficient and R-Squared seem to be thought of as being informative enough. I take a different view. The software that I use is stuff I’ve written, and in fact sold around a thousand copies to people/organisations ranging from private individuals to universities such as Oxford and Cambridge, undergraduates and PhD students, government establishments like the signals people at Malvern, and many copies to the NHS – who often were more than somewhat slow in paying! Thus it is pretty solid stuff, which I use virtually daily. Having developed it originally when I was still working in research it had to get correct answers, and ones that gave a comprehensive overview of, for example, regression lines. Thus I always calculate and report the standard errors of coefficients together with their t values and confidence intervals. When I plot the results of a simple regression (single predictor variables or powers of them) I always display the confidence intervals for the regression line and for a further datum from the same series. These intervals are of course pairs of hyperbolae for a single variable regression. I always wonder why other people don’t do this. For me, r-squared (or preferably r-squared adjusted for degrees of freedom) is not very informative because it gives me little feel about how adequate the regression is in practical terms.

    I also wonder about the ethics of trying to model climate data using standard linear regression over data sets that cover extensive time periods when simple plotting often shows that the numbers are far from a straight line, and may indeed oscillate as many climate indexes do. I would want to believe in the practical value of the calculated coefficients only if their realm was associated with clearly linear portions of the data plot. Working on this basis for around 20 years I have formed a strong opinion that climate may proceed via remarkably steady periods (in effect no or very little change) punctuated by short periods of very rapid change, often so rapid as to be deemed a step change. I could show you plenty of examples if I could send some graphics, but only know how to do that by email, not via a blog :-((

    If you are interested in this and would be prepared to send me some of your finalised data sets – the ones you have displayed – I could look at them and then explain further the full methods that I use. It would be a bit lengthy to set it all out here.

    Hope to hear from you ere long.

    Best wishes, Robin Edwards, Bromsgrove

    • I agree r-squared is not particularly informative. I included it in my station graphs at one stage along with the trend over the entire record, to allow comparison with another set of graphs (whose I cannot recall at this stage), and added the trend over the final n years of the record, with n defaulting to 30 years, in recognition of the fact that simple linear regression over the entire record may be inadequate. Rather than handcrafting individual graphs my Gistemp implementation allows me to generate a set of graphs for a chosen station or gridcell, not just the temperature and adjustment graphs shown in this post, but also temperature anomalies, and the records for the adjusting stations used for example. So unless there is a reason to be selective, such as something more important being obscured by something less important such as those r-squared values, I generally just output everything generated.
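For anyone wanting to reproduce the two trend figures described in this reply, a minimal sketch follows. It is written in Python purely for illustration and is not the R script used for the station graphs in this post. The function names are hypothetical, missing years are assumed to be marked with None, and “the final n years” is taken to mean the n calendar years ending with the last year of the record.

```python
def ols_slope(xs, ys):
    """Ordinary least squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def full_and_recent_trends(years, temps, n=30):
    """Return (full-record trend, final-n-years trend), both per decade.

    temps may contain None for missing years; those years are skipped.
    """
    pairs = [(y, t) for y, t in zip(years, temps) if t is not None]
    last_year = max(y for y, _ in pairs)
    recent = [(y, t) for y, t in pairs if y > last_year - n]
    full_slope = ols_slope([y for y, _ in pairs], [t for _, t in pairs])
    recent_slope = ols_slope([y for y, _ in recent], [t for _, t in recent])
    return 10.0 * full_slope, 10.0 * recent_slope
```

Where the two figures differ markedly, that difference is itself a warning that a single straight line over the entire record is a poor summary.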

      And yes, in other work I would also report standard errors, etc. I could of course have also generated an envelope for the regression, but the aim here has been to compare the actual raw and adjusted station records, not to model the data using linear regression. The regression lines are merely added to illustrate the often quoted rates of temperature change.

      As you say, the numbers are often far from a straight line. For my generated gridcell graphs for example I have chosen to include a loess smooth rather than a linear regression:
      Gridcell within which Dublin is located
      The other gridcells covering Ireland are similar. You may find it informative to compare this with a recent claim by our darling of the local media, self-styled “specialist environmental writer and commentator”, and unfortunately taken seriously by the same local media, member of the An Taisce (Irish National Trust) Climate Change committee, John Gibbons:

      In the 20 years since the Earth Summit, Ireland’s average temperature has increased by 0.75C, exactly in line with a projected 4C calamity this century.

      Mr Gibbons is simply repeating here his earlier exercise in reading miscomprehension performed on the report in the Irish Times (May 31st 2012, page 3) of the announcement by the Irish Met Service of the 1981-2010 Normals. The issue is pay-walled, but the relevant text reads:

      The long-term average temperatures of the period 1981-2010 show a 0.5 degree increase in comparison to 1961-1990, which is the baseline for measuring climate averages.
      However, when the overlapping years between 1981-1990 are discounted, the difference between the periods 1961-1980 and 1991-2010 is 0.75 degrees.

      The arithmetic is correct, although the report might not be regarded as a model of clarity, and indeed starts by making the same error. However, a “specialist environmental writer and commentator” might be expected to read further than the first sentence and to notice that the interval referred to is not the 20 years since 1992.
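The arithmetic can be checked in a few lines. Write a, b and c for the mean temperatures of 1961-1980, 1981-1990 and 1991-2010. Assuming each 30 year normal is a simple average of annual values, the two normals are (2a + b)/3 and (b + 2c)/3, the shared decade b cancels in their difference, and what remains is 2(c - a)/3. A difference of 0.5 between the normals therefore implies c - a = 0.75. The sketch below (Python, with function names and sample temperatures made up purely for illustration) does the same calculation numerically.

```python
def normals_difference(a, b, c):
    """Difference between the 1981-2010 and 1961-1990 normals.

    a, b, c: mean temperatures for 1961-1980, 1981-1990 and 1991-2010,
    each normal being assumed to be a plain average of annual values.
    """
    normal_1961_1990 = (20 * a + 10 * b) / 30
    normal_1981_2010 = (10 * b + 20 * c) / 30
    return normal_1981_2010 - normal_1961_1990


def implied_nonoverlap_difference(normals_diff):
    """Difference c - a implied by a given difference between the normals."""
    # The shared decade cancels, leaving normals_diff = (20 / 30) * (c - a).
    return normals_diff * 30 / 20
```

Note that 1961-1980 and 1991-2010 are centred on 1970.5 and 2000.5, so the 0.75 degrees is a difference between two 20 year averages centred 30 years apart, not a change over the most recent 20 years.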

      I’ll send you the R scripts generated for the station graphics used in this post if these would be of use to you, I just need to rerun them. They will generate more than just the graphs displayed, and a little more information on those displayed which I suppressed for clarity – GISS uses either a simple straight line or a broken straight line with a “knee” in the adjustment process, and I have started to look at the possibility of replacing these with a loess smooth of the same difference between the urban and combined rural adjusting records.

      • Robin says:

        Thanks, Peter, for your very interesting reply, and for your offer to send the R script. Unfortunately, this would not be very useful to me, I have to confess, since I don’t use R. However, data sets are very useful indeed, since I can handle them readily, and my take on data often is a bit different from what is usually seen! I would be very interested in the numerical data that you have used for your illustration. I don’t have a digitiser, so can’t work out the values very easily from the diagram.
        I understand the value of smoothing in climate data, and have indeed implemented some smoothing techniques in my software – at the request of some clients – but I seldom if ever use them myself. I tend to think of them as a means of showing to readers (especially the media) who are not prepared or able to handle data themselves, the general inferences that may help to summarise the data. From your graph I can see immediately that my interpretation would differ a lot from yours, and I think that you would find it intriguing.

        Your idea of using smoothed adjustments when “compensating” for supposed changes in the way that the original data were obtained is interesting. I have not got my head around to accepting that adjustments of any kind are valid, apart from something intended to cope with UHI. I have not really understood fully how they “allow” for this. Do they subtract something from current urban sites’ data or add something to older data? I have never seen it set out in a simple fashion. “Allowing for urban warming” doesn’t tell me anything tangible. Anyway it seems valid enough in principle to do something about what must be a real artificial effect, but the complex nature of using the nearest several “rural” locations as a kind of monitor is open to questioning, especially as they seem to use an algorithm rather than using individual corrections, and this I feel is open to question. How was the algorithm set up, and is it “sensible” in all cases?

        I’ve not tried to insert a diagram in this reply – and don’t even know if it can be done. If it is possible I’ll be able to show you some things that underlie my view of “climate change”. Hope to hear from you again in due course.

        Best! Robin

        • Do they subtract something from current urban sites’ data or add something to older data.

          The Gistemp UHI adjustment strategy is removal of any trend in the urban station data while retaining the month-to-month variation, and substituting the distance weighted average trend of nearby rural stations, assumed unaffected by UHI.
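A much simplified sketch of that idea, in Python, may make it more concrete. It is emphatically not the Gistemp code. It works on annual anomalies with no missing values, fits a single straight line to the urban minus combined rural difference (with no provision for the broken line with a “knee”), assumes a weight which falls linearly from 1 at the station to 0 at a chosen radius, and applies the adjustment so that it is zero in the final year. The 1000 km default radius, the choice of reference year and all of the function names are assumptions made for the illustration.

```python
def linear_taper_weight(distance_km, radius_km=1000.0):
    """Weight falling linearly from 1 at zero distance to 0 at radius_km."""
    return max(0.0, 1.0 - distance_km / radius_km)


def combine_rural(rural_series, distances_km, radius_km=1000.0):
    """Distance weighted average of equal-length rural anomaly series."""
    weights = [linear_taper_weight(d, radius_km) for d in distances_km]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rural station within the chosen radius")
    length = len(rural_series[0])
    return [sum(w * s[i] for w, s in zip(weights, rural_series)) / total
            for i in range(length)]


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def adjust_urban(years, urban, rural_combined):
    """Remove the linear trend of (urban - rural) from the urban series.

    The adjustment is anchored at zero in the final year (an assumption),
    so the year-to-year variation of the urban record is retained while
    its long-term trend is replaced by that of the combined rural record.
    """
    difference = [u - r for u, r in zip(urban, rural_combined)]
    slope = ols_slope(years, difference)
    last_year = years[-1]
    return [u - slope * (y - last_year) for y, u in zip(years, urban)]
```

With the adjustment anchored in the final year, an urban record warming faster than its rural neighbours has its earlier values raised rather than its recent values lowered. Where the fitted slope has the opposite sign the earlier values are lowered instead.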

  4. That discussion will be coming soon – have been gathering further information about the stations which are rural on nightlights, whether using GISS or corrected coordinates, but were urban according to the GHCN metadata used earlier.

  5. Pingback: GHCN data collection issues (from an Irish perspective) | Peter O'Neill's Blog

  6. robinedwards36 says:

    Hello again Peter,
    I’ve been looking (rather casually I fear due to assorted pressures) at your plots of station data from Siberia, together with fitted lines. I too have worked with several data sets from Siberia, and long ago reached the conclusion (possibly an interim one) that fitting a simple linear model to data over the period of, say 1950 to 2000, is not a good method to adopt. My reason for saying this is that many, in fact almost all, of the sites I’ve looked at have what is to me a prominent feature. This is a sharp discontinuity at around 1987 to 1988, or slightly later with some eastern sites. For me, then, the slope of a regression fit spanning this date(s) is an artefact. The real model is one which acknowledges the discontinuity and produces two segments having different means and often non-significant slopes, in strong contrast to the full span regression.

    What I’d recommend you try is to divide (arbitrarily at this stage) the data into pre and post late 1987 (approx), and fit two regression lines. Alternatively one can run a dummy variable regression, perhaps with an interaction term for a possible difference in slope between the two segments.
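The comparison suggested here can be sketched in a few lines of Python. Fitting separate lines before and after the break gives the same coefficient estimates as a dummy variable regression with both a step term and a slope interaction term, so only simple regression is needed. The function names and the way the step is reported (the gap between the two fitted lines evaluated at the break) are illustrative choices, and the break date is supplied by the user rather than estimated.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_segment_fit(times, values, break_time):
    """Fit separate lines before and after break_time, plus the full span.

    Each segment needs at least two distinct times. The step is the gap
    between the two fitted lines evaluated at break_time.
    """
    before = [(t, v) for t, v in zip(times, values) if t < break_time]
    after = [(t, v) for t, v in zip(times, values) if t >= break_time]
    a0, b0 = fit_line([t for t, _ in before], [v for _, v in before])
    a1, b1 = fit_line([t for t, _ in after], [v for _, v in after])
    _, full_slope = fit_line(list(times), list(values))
    return {
        "slope_before": b0,
        "slope_after": b1,
        "step_at_break": (a1 + b1 * break_time) - (a0 + b0 * break_time),
        "slope_full": full_slope,
    }
```

For a series which is flat, jumps, and is flat again, both segment slopes come out at zero while the full span slope is clearly positive. Whether the real station data behave like that is a matter for the standard errors of the fitted coefficients, which this sketch does not compute.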

    I have become convinced that Europe – particularly the mid and north areas – underwent a step change of between 0.6 and 1.0 C, very often close to November 1987. Just look at Meto Suisse’ data for places like Davos, Lugano, Berne, and many other places at various altitudes, and it becomes very obvious. All have exactly the same form of cusum plot for what I call Monthly Differences, with a pronounced discontinuity at that time. It can’t just be chance! The UK Met Office station data sets all have very closely coupled cusums, demonstrating effectively identical behaviour, even down to very small irregularities. It is really fascinating.

    Hope you can find time to comment!

    Best wishes, Robin
