GHCN Metadata (v2.inv) and Gistemp

This is a consolidated and expanded version of comments posted at Ron Broberg’s blog, The Whiteboard,  http://rhinohide.wordpress.com/2010/03/10/ghcn-metadata-horseshoes-and-hand-grenades/, with some images updated to outline the area likely to have been examined to determine night light radiance.

I have revisited Hansen et al. 2001 (Hansen, J.E., R. Ruedy, Mki. Sato, M. Imhoff, W. Lawrence, D. Easterling, T. Peterson, and T. Karl, 2001: A closer look at United States and global surface temperature change. J. Geophys. Res., 106, 23947-23963) to find the spatial resolution of the radiance data, about 2.7 km. I have modified the R script first posted at The Whiteboard to add two boxes centred at each station, the width and height of the inner box set at 2.7 km, this box also showing a “crosshairs”, while the width and height of the outer box are set at twice those of the inner box, or 5.4 km. The  outer box shows the outer limits of 2.7 km squares which still include the given station location.

The problem [accuracy of the latitude/longitude coordinates in the metadata] is, as they say, “even worse than we thought”. One of the consumers of GHCN metadata is of course Gistemp, and the implications of imprecise latitude/longitude for Gistemp are now considerably greater, following the change last month to use of satellite-observed night light radiance to classify stations as rural or urban throughout the world, rather than just in the contiguous United States as was the case previously. As about a fifth of all GHCN stations changed classification as a result, this is certainly not a minor change. But how can you judge night light radiance of stations which are not where you believe them to be?

I notified Goddard/Gistemp two weeks ago of problems even greater than those outlined at the Whiteboard, but have not as yet received any reply (surprisingly, as I have previously received responses within one working day when passing on code errors or other comments). In view of this I also emailed Russell Vose at NOAA directly late last week, rather than assuming he would be contacted by Goddard. He has replied, appreciating feedback, is passing on the information to those working on a new version of the temperature dataset, and indicates that “Hopefully some of these can be fixed quickly, but others may take a little longer”.

Update (June 4): I should have added earlier that I received a reply from James Hansen on April 14th, having e-mailed him directly on March 30th, repeating my earlier comments this time as a response to his invitation for comments on the March 19th draft of his new paper. His response was that he/they “will look into this soon”.

I have now compared the revised June 1st draft, and note with regret that while there has apparently been time to update some of the figures and discussion to include April data, and among other revisions to add suggestions that “much of the media is owned by or strongly influenced by special economic interests”, that “The difficulty is compounded by continual attacks on the credibility of scientists” and that “the best hope may be repeated clear description of the science”, there unfortunately does not appear to have been time to correct the misplaced degree of confidence in the data he is using as demonstrated by the continuing inclusion of “Station location in the meteorological data records is provided with a resolution of 0.01 degrees of latitude and longitude, corresponding to a distance of about 1 km”, which is clearly incorrect.

In fact, I find no discussion whatever of the impact of the change to nightlights on the number of stations classified as rural or urban, nor of the uncertainty inherent in this classification when, as we can see below, the location of the nightlight examined may in some cases be up to 300 km from the actual location of the station (and perhaps even further – I have only examined a modest subset of stations, and identified the gross errors detailed below, but similar gross errors outside Europe may well also exist; I have not however looked for similar gross errors outside Europe, as I would be less likely to spot them). In addition to such gross errors, examining some of the stations located at airports, I have found a substantial number of the corresponding nightlight locations are located sufficiently far from the runways that the nightlight value may not reflect that of the stations in question. Yet the February, March and April Gistemp analyses have all appeared since I brought these data quality issues to the attention of the Gistemp team, without even a token correction of the gross location errors.

I might respectfully suggest that the time to bemoan “continual attacks on the credibility of scientists” is after, not before, issues such as these have been addressed.

I had previously notified Goddard/Gistemp in late December that I had concerns regarding the quality of the metadata (and for that matter also notified some concerns regarding v2.mean as well earlier in December), citing specifically the relocation out at sea of the two Cherbourg stations referred to below as an example, as well as discussing the urban/rural changes for a few locations “close to home”, and suggesting that

“It looks to me as if changing to global lights is likely to introduce as many new misclassified stations as it corrects, and that the only way this classification can really be improved is by actually examining each location, painful as that exercise may be.”

although at that time I had not examined v2.inv more closely, and was not yet aware of the magnitude of some of the location errors, nor of the number of imprecise coordinates. That notification was prompted by the trial run by Goddard/Gistemp using night light radiance. As soon as I spotted that Gistemp had moved to global classification (the 2010_01 update in the archive) using night light radiance, I looked more closely at the v2.inv metadata, as described below, and immediately notified Goddard/Gistemp of what I found. I had intended to wait until the next Gistemp update appears to allow them time to consider how to handle this, even without the courtesy of a reply, but posted at The Whiteboard once this is was out in the open. The next Gistemp update (2010_02 in the archive) has now appeared, still using night light radiance globally, and without any further comment. Before considering some further examples of metadata quality problems below, here is how Goddard announced the change:

January 16, 2010:The urban adjustment, previously based on satellite-observed nightlight radiance in the contiguous United States and population in the rest of the world (Hansen et al., 2001), is now based on nightlight radiances everywhere, as described in an upcoming publication. The effect on the global temperature trend is small: Based on the 1900-2009 period, that change reduces it by about 0.005 °C per century.       (http://data.giss.nasa.gov/gistemp/updates/)

The result of this change on station classification of the 7364 stations in v2.inv (Gistemp drops some of these before the gridding step) is that 456 716 rural stations become urban, while 1172 716 urban “non-rural stations become rural, a nett increase of 716 rural stations.

Correction to the correction: The hard disk from my failed new laptop has returned, now fitted in an otherwise complete replacement. Reviewing the data I have corrected here now that the file I used to prepare the data is available again I find that I had misunderstood my own data and “corrected” values which did not in fact need correction, but did need a cleared description. The new values added are also correct, but the table below needs to be edited and described better. I will do this in the next day or two, and edit this note again when this has been done.

Correction: I should have spotted earlier that the increase of 456 above is the increase in the number of periurban stations, not of rural stations, and the decrease of 1172 stations is the decrease in “fully” urban stations. The nett decrease in “non-rural” stations is thus 716, matching the increase in the number of rural stations, as of course it should if the total number of stations remains unchanged.

The unbalanced changes should have alerted me that there was an error. I suspect however that failure to spot this may have resulted from unwise late night editing. I am also reformatting the table below, which I hope will now become clearer, as well as adding a breakdown of the number of stations actually used in Gistemp STEP 2 (based on the April 2010 data – the actual numbers vary slightly from month to month) .

Before After Change
Rural 3097 3097 3813 3813 +716
PeriUrban 2206 2662
Urban 2061 4267 889 3551 -716
Total Stations 7364 7364 0
Rural 2508 2508 3124 3124 +616
Urban 3796 3796 3180 3180 -616
STEP2 Stations 6304 6304 0


1628 1432 1628 stations were reclassified. (I also suggested to Goddard/Gistemp that a change on this scale should have been announced on the main Gistemp page, not buried in the updates page).

Do we assume that all these stations were incorrectly classified before, and are correctly classified now? How many stations are now misclassified as a result of using the radiance value of an area some distance away?  Perhaps the minimal change claimed may reflect the reclassification of a substantial number of stations by a process more akin to randomisation than to classification on the basis of an appropriate radiance value.

Consider the following Google Earth image:

Albacete - searching for the true location

Albacete - searching for the true location

The yellow pushpin on the right shows the location of the previously urban Spanish station at Albacete, as found in v2.inv – certainly a location which could be expected to be dark at night. The two pushpins on the left show stages in finding the true location, an interesting story.

The v2.inv (Gistemp version with radiance) for Albacete is:

64308373001 ALBACETE            SPAIN       39.00    1.80   43    0U    83FLxxno-9x-9WATER           A    0

and at first I thought that the problem was simply an east/west error, and that the longitude for Albacete should be -1.80, not +1.80, and this did relocate the station closer to Albacete, as the right hand pushpin in this Google Earth image shows:

albacete1

Albacete - the immediate area

but something still seemed wrong. Then I looked at another Spanish station, Soria:

64308171001 SORIA               SPAIN       41.70    1.20 1068  424R    -9HIFOno-9x-9MED. GRAZING    A   14

Soria - the Greenwich-Madrid shift

Soria - the Greenwich-Madrid shift

which showed a similar eastward shift, but one which could not be explained by a change of sign. Then I recalled that the maps I had used many years ago walking in the Picos de Europa in Spain had shown longitude relative to Madrid : 3.6879 degrees west of Greenwich, and all became clear. The v2.inv longitudes for these two stations were relative to Madrid, not Greenwich, and must have been located on mapping other than ONC!

These however were the only two of the Spanish stations referenced relative to Madrid, the remaining stations stations seem to have been referenced relative to Greenwich. But there were still a couple of trivial (though perhaps not so trivial politically) errors among these “Spanish” stations.

64308160000 TENKODOGO                       11.77    0.38 -999  288U  449HIxxno-9A10WARM GRASS/SHRUBA    0

and

64308330001 SINTRA/GRANJA                   38.80   -9.30  133  196U  1100HIxxCO 5A10COASTAL EDGES   C   32

are located in Burkina Faso and Portugal respectively. Portugal already has a genuine historic border dispute with Spain (Olivença: http://en.wikipedia.org/wiki/Olivenza), without adding an imaginary one.

I located three major errors in about ten minutes of starting to look for errors, relocations so distant that no knowledge of the true coordinates was needed to find them. I had already noticed some time ago that the two stations at Cherbourg and Cherbourg airport had been relocated out into La Manche/English Channel, albeit not as extremely as some of the stations discussed here, and it was in fact this which prompted me to check other stations now, and suggested an easy way to check at least some stations – I simply imported the coordinates of all European stations into Microsoft AutoRoute, and looked at the relatively few pushpins which appeared to be out at sea, and checked these to see if there was a “supporting island”. Albacete, Limassol and Isola Gorgona were obviously wrong. There were also some others closer to shore, and in areas such as Turkey where the map detail was insufficient.

61017600001 LIMASSOL            CYPRUS      34.70   32.00    8    0U    82HIxxCO 1x-9WATER           A    0
62316197001 ISOLA GORGONA                   42.40    9.90  254    0R    -9HIxxCO 1x-9WATER           A    0

Limassol - out at sea

Limassol - out at sea

Isola Gorgona - shifted south

Isola Gorgona - shifted south

And of course these may be repeated elsewhere – I could only examine European waters in this way.

Considering Albacete led me to find Soria, and then to look at the stations located at airports, a group of stations where the general location of the station could be inferred from the airport boundaries. I found that a considerable number of the coordinates in v2.inv were located at a considerable distance from the airport boundary, where I have regarded “a considerable distance” as “by multiples of the length of the main runway”, and considerably more than the precision implied by coordinates given as degrees to two decimal places. I generated KML files setting up tours of these stations in Google Earth, with pushpins and identifying information at each station location.

Among examples of doubtful coordinates from Algeria, the first country in the file, are:

Tebessa

Tebessa: changed from urban to rural

Oran

Oran: changed from urban to rural

Tlemcen

Tlemcen: changed from urban to rural

Mecheria

Mecheria: remains rural

Sidi Bel Abbes

Sidi Bel Abbes: remains urban, but this is not the airport

El Oued

El Oued: changes from urban to rural!

Hassi-Messoud

Hassi-Messoud: remains rural

"Hassi-Messoud" close-up?

"Hassi-Messoud" close-up?

The real Hassi Messoud?

The real Hassi Messoud?

All 50 Algerian stations in v2.inv:

10160355000 SKIKDA                          36.93    6.95    7   18U  107HIxxCO 1x-9WARM DECIDUOUS  C   49
10160360000 ANNABA                          36.83    7.82    4   33U  256FLxxCO 1A 7WARM CROPS      C   12
10160390000 DAR-EL-BEIDA                    36.72    3.25   25   34U 1365FLxxCO10A 6WARM CROPS      C   25
10160395001 FT. NATIONAL                    36.52    4.18  942  805R   -9MVDEno-9x-9WARM CROPS      A    0
10160400001 CAP CARBON                      36.80    5.10  230   28R   -9HIxxCO 1x-9WATER           A   13
10160402000 BEJAIA                          36.72    5.07    2  121U   90HIxxCO 1A 3WATER           B   17
10160403000 GUELMA                          36.47    7.47  227  287S   47HIxxno-9x-9WARM CROPS      C   26
10160419000 CONSTANTINE                     36.28    6.62  694  563U  335MVxxno-9A 7WARM FOR./FIELD B    0
10160425000 CHLEF                           36.22    1.33  143  242U  106HIxxno-9A 3WARM CROPS      C    8
10160425001 ORLEANSVILLE                    36.17    1.50  112  219R   -9HIDEno-9x-9WARM CROPS      A    7
10160430000 MILIANA                         36.30    2.23  715 1167R   -9MVDEno-9x-9WARM DECIDUOUS  C   11
10160444000 BORDJ BOU ARR                   36.07    4.77  928 1051U   57MVxxno-9x-9WARM FOR./FIELD C   29
10160445000 SETIF                           36.18    5.42 1081 1060U  144MVxxno-9x-9WARM FOR./FIELD C   31
10160457000 MOSTAGANEM VI                   35.88    0.12  137  157U  102HIxxCO 4A 2WARM CROPS      B   10
10160461000 ORAN                            35.70   -0.65   60   78U  492HIxxCO 3x-9WARM CROPS      C   55
10160468000 BATNA                           35.55    6.18 1052 1015U   85MVxxno-9x-9WARM DECIDUOUS  C   37
10160475000 TEBESSA                         35.48    8.13  813  818U   67MVxxno-9A 2WARM FOR./FIELD B    0
10160490000 ORAN/ES SENIA       ALGERIA     35.60   -0.60   90   98U  492HIxxCO10A 6WARM CROPS      B    9
10160518000 BENI-SAF                        35.30   -1.35   68  103R   -9HIDECO 1x-9WARM CROPS      B   11
10160522000 MAGHNIA                         34.82   -1.78  426  409R   -9MVDEno-9x-9WARM DECIDUOUS  A    0
10160525000 BISKRA                          34.80    5.73   87   91U   91HIxxno-9A 3MED. GRAZING    B    0
10160531000 TLEMCEN (ZENA                   35.02   -1.47  247  223U  109MVxxCO25A20WARM CROPS      B    8
10160531001 TLEMCEN             ALGERIA     34.80   -1.30 -999 1037U  109MVxxno-9x-9WARM CROPS      A    0
10160535000 DJELFA                          34.68    3.25 1144 1071U   51HIxxno-9x-9MED. GRAZING    C   32
10160536000 SAIDA                           34.87    0.15  750  802U   62MVxxno-9A 4MED. GRAZING    C   20
10160540000 EL KHEITER                      34.15    0.07 1000 1113R   -9FLDEno-9x-9WARM DECIDUOUS  B    0
10160545000 LAGHOUAT                        33.77    2.93  765  763U   59HIxxno-9A 3HIGHLAND SHRUB  B    0
10160545001 AYATA               ALGERIA     33.50    3.70   41  771R   -9HIDEno-9x-9WARM GRASS/SHRUBA    0
10160549000 MECHERIA            ALGERIA     33.60   -0.30  176 1501R   -9MVDEno-9A-9MED. GRAZING    A    0
10160549001 SIDI-BEL-ABBAS                  35.18   -0.63  486  527U  116HIxxno-9A 2WARM CROPS      C   23
10160550000 ELBAYADH                        33.67    1.00 1341 1446S   39MVxxno-9x-9HIGHLAND SHRUB  B   16
10160550001 GERYVILLE           ALGERIA     33.80    1.00 1306 1315R   -9MVDEno-9x-9HIGHLAND SHRUB  A    0
10160555000 TOUGGOURT                       33.12    6.13   85  106U   76FLxxno-9x-9HOT DESERT      B   18
10160559000 EL OUED                         33.50    6.12   63  107U   72FLxxno-9A15HOT DESERT      A    0
10160560000 AIN SEFRA                       32.77   -0.60 1058 1811R   -9MVDEno-9x-9WARM GRASS/SHRUBA   26
10160566000 GHARDAIA                        32.38    3.82  450  454U   71HIxxno-9A12HOT DESERT      A    7
10160571000 BECHAR                          31.62   -2.23  773  781U   73HIxxno-9x-9HIGHLAND SHRUB  C   31
10160580000 OUARGLA                         31.92    5.40  141  156U   77FLxxno-9A 6SAND DESERT     B   14
10160581000 HASSI-MESSOUD                   31.70    2.90  398  630R   -9FLDEno-9A-9HOT DESERT      A    0
10160590000 EL GOLEA                        30.57    2.87  397  410S   27FLxxno-9x-9SAND DESERT     B   12
10160602000 BENI ABBES                      30.13   -2.17  499  520R   -9HIDEno-9A-9SAND DESERT     C   10
10160607000 TIMIMOUN                        29.25    0.28  312  314S   22FLxxno-9x-9HOT DESERT      B    0
10160611000 IN AMENAS                       28.05    9.63  562  566R   -9HIDEno-9A-9HOT DESERT      A    0
10160620000 ADRAR                           27.88   -0.28  263  264S   20FLxxno-9x-9COOL FOR./FIELD C   30
10160630000 IN SALAH                        27.20    2.47  293  288S   18FLxxno-9x-9HOT DESERT      C   16
10160630001 OUALLEN ALGERIA                 24.60    1.30  347  316R   -9HIDEno-9A-9HOT DESERT      A    0
10160630002 AOULEF ALGERIA                  27.00    1.10  290  266R   -9FLDEno-9A-9HOT DESERT      C   10
10160656000 TINDOUF                         27.67   -8.13  431  395R   -9FLDEno-9x-9HOT DESERT      C   21
10160670000 DJANET                          24.55    9.47 1054 1067R   -9MVDEno-9x-9SAND DESERT     B   13
10160680000 TAMANRASSET                     22.78    5.52 1378 1440R   -9MVDEno-9x-9HOT DESERT      C   26

and there are many more countries in the metadata file. For anyone interested in examining these stations themselves, here is my R script to generate a KML tour.

KML is the scripting language for Google Earth. The generated KML file (which is plain text, an XML file which you can open and read) contains two sections. The first section is a set of latitude/longitude pairs to be “flown to” in sequence (the tour), the second section contains the placemarker data, lat/long, viewing altitude and orientation, etc., to generate pushpins and grids at the stopping points. By default, opening and running the file in Google Earth will “fly” between points, waiting 4 seconds at each point. The “flying time” is 10 seconds, and speeding this up too much requires preloading of mapping data, resource heavy, rather than loading on the fly.

Long lines may wrap below, but I hope these are clear enough.  As listed below, the current GISS v2.inv (i.e. with radiance values added) is downloaded and saved locally, then all the airport stations are extracted to form a Google Earth tour. The default time between features and wait at features set in Google Earth are used. Pausing the tour during the wait at a feature allows zooming in or out to examine surroundings, after which the tour may be resumed. The altitude value used, 12000, was chosen to allow runways to be easily visible if within the view window, but zooming out may be necessary to find runways where the v2.inv coordinate errors are larger.

Each station has information prepended as it appears at the pushpin, as:

AUC(36.83,7.82)12_ANNABA
|||  |          |
|||  |          radiance
|||  lat,lon
||GHCN brightness code: A/B/C, + 1/2/3 index for Canada, Mexico, US
|urban/rural code: R/S/U
airport code: x/A

In addition, a flag, * or ?, is added where the station has changed from urban to rural or rural to urban (peri-urban being considered as urban for this purpose).

A variable usePreviousLightsForUS is set TRUE to classify “US” stations before the change according to the previous Gistemp rules (155 Rural -> Urban changes, 437 Urban -> Rural changes for airport stations). Setting this to FALSE allows the previous classification to be determined for all stations according to the R/S/U codes for comparison (and would show 282 Rural -> Urban changes, 374 Urban -> Rural changes for airport stations).

File names, a tour name, and the usePreviousLightsForUS value are all set at the beginning of the script. Three further options can also be set as TRUE or FALSE: airportsOnly, UtoRonly (only stations changing from urban to rural) and RtoUonly (only stations changing from rural to urban). boxWidth sets the width (and height) of the inner box, in km. An additional file, with the same file name as the KML output file but with a .txt extension, will also be generated listing the v2.inv entries for the stations in the tour generated.

(2010/04/12: Syntax colouring added, based on Perl as no specific R colouring is available)

# Comment out next three lines after file download, save local copy
url <- "ftp://data.giss.nasa.gov/pub/gistemp/GISS_Obs_analysis/GISTEMP_sources/STEP1/input_files/v2.inv"
fred <- readLines(url)
write(fred, "d://v2.inv")
# Replace path and/or file name above and below as appropriate
# ---------------------------------------------------------------------
url <- "d://v2.inv"
outfile <- "d://GoogleTourA.kml"
tourName <- "Airports in file order"
usePreviousLightsForUS <- TRUE
airportsOnly <- TRUE
UtoRonly <- FALSE
RtoUonly <- FALSE
boxWidth <- 2.7
# ---------------------------------------------------------------------
txtfile <- paste(substr(outfile, 1, nchar(outfile) - 3), "txt", sep="")
fred<-readLines(url)
#10160355000 SKIKDA                          36.93    6.95    7   18U  107HIxxCO 1x-9WARM DECIDUOUS  C   49
#10160360000 ANNABA                          36.83    7.82    4   33U  256FLxxCO 1A 7WARM CROPS      C   12
write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>", outfile)
write("<kml xmlns=\"http://www.opengis.net/kml/2.2\"", outfile, append=TRUE)
write("  xmlns:gx=\"http://www.google.com/kml/ext/2.2\">", outfile, append=TRUE)
write("<gx:Tour>", outfile, append=TRUE)
write(paste("<name>", tourName, "</name>", sep=""), outfile, append=TRUE)
write("<gx:Playlist>", outfile, append=TRUE)
write("", txtfile)
nRU <- 0
nUR <- 0
nUNC <- 0
nU0 <- 0
nS0 <- 0
nR0 <- 0
nU <- 0
nS <- 0
nR <- 0
lats <- c()
lons <- c()
nmes <- c()
boxY <- boxWidth / 111.0
boxC <- cos(boxWidth / 6371.0)
for (i in 1:length(fred)) {
  if (airportsOnly) {
    if (substr(fred[i],82,82) != "A") { next } # select airports only
  }
  mark <- ""
  nUNC <- nUNC + 1
  idx <- as.integer(substr(fred[i],104,106))
  if (substr(fred[i],68,68) == "R") { # rural
    if (usePreviousLightsForUS) {
      if (substr(fred[i],102,102) == " ") {
        nR0 <- nR0 + 1
        if (idx > 10)
        {
          mark <- "?"; nRU <- nRU + 1
          if (idx > 35) { nU <- nU + 1 } else { nS <- nS + 1 }
        } else {
          nR <- nR + 1
          if (RtoUonly) { next }
        }
        if (UtoRonly) { next }
      } else {
        if (substr(fred[i],102,102) == "1") {
          nR0 <- nR0 + 1
          if (idx > 10) {
            mark <- "?"; nRU <- nRU + 1
            if (idx > 35) { nU <- nU + 1 } else { nS <- nS + 1 }
          } else {
            nR <- nR + 1
            if (RtoUonly) { next }
          }
          if (UtoRonly) { next }
        } else {
          if (substr(fred[i],102,102) == "3") { nU0 <- nU0 + 1 } else { nS0 <- nS0 + 1 }
          if (idx < 11) {
            mark <- "*"; nUR <- nUR + 1; nR <- nR + 1
          } else {
            if (idx > 35) { nU <- nU + 1 } else { nS <- nS + 1 }
            if (UtoRonly) { next }
          }
          if (RtoUonly) { next }
        }
      }
    } else { # don't usePreviousLightsForUS
      nR0 <- nR0 + 1
      if (idx > 10) {
        mark <- "?"; nRU <- nRU + 1
        if (idx > 35) { nU <- nU + 1 } else { nS <- nS + 1 }
      } else {
        nR <- nR + 1
        if (RtoUonly) { next }
      }
      if (UtoRonly) { next }
    }
  } else { # urban or periurban
    if (usePreviousLightsForUS) {
      if (substr(fred[i],102,102) == " ") {
        if (substr(fred[i],68,68) == "U") { nU0 <- nU0 + 1 } else { nS0 <- nS0 + 1 }
        if (idx < 11) {
          mark <- "*"; nUR <- nUR + 1; nR <- nR + 1
        } else {
          if (idx > 35) { nU <- nU + 1 } else { nS <- nS + 1 }
          if (UtoRonly) { next }
        }
        if (RtoUonly) { next }
      } else {
        if (substr(fred[i],102,102) == "1") {
          nR0 <- nR0 + 1
          if (idx > 10) {
            mark <- "?"; nRU <- nRU + 1
            if (idx > 35) { nU <- nU + 1 } else { nS <- nS + 1 }
          } else {
            nR <- nR + 1;
            if (RtoUonly) { next }
          }
          if (UtoRonly) { next }
        } else {
          if (substr(fred[i],68,68) == "U") { nU0 <- nU0 + 1 } else { nS0 <- nS0 + 1 }
          if (idx < 11) {
            mark <- "*"; nUR <- nUR + 1; nR <- nR + 1
          } else {
            if (idx > 35) { nU <- nU + 1 } else { nS <- nS + 1 }
            if (UtoRonly) { next }
          }
          if (RtoUonly) { next }
        }
      }
    } else { # don't usePreviousLightsForUS
      if (substr(fred[i],68,68) == "U") { nU0 <- nU0 + 1 } else { nS0 <- nS0 + 1 }
      if (idx < 11) {
        mark <- "*"; nUR <- nUR + 1; nR <- nR + 1
      } else {
        if (idx > 35) { nU <- nU + 1 } else { nS <- nS + 1 }
        if (UtoRonly) { next }
      }
      if (RtoUonly) { next }
    }
  }
  lat <- gsub(" ","",substr(fred[i],44,49))
  lon <- gsub(" ","",substr(fred[i],51,57))
  o <- paste(substr(fred[i],82,82), substr(fred[i],68,68),
    gsub(" ","",substr(fred[i],101,102)), "(",
    lat, ",", lon, ")",
    gsub(" ","",substr(fred[i],104,106)), mark,
    "_",substr(fred[i],13,42), sep="")
  o <- sub("[ ]*$", "", o)
  o <- sub("[ ]{2,}", " ", o)
  lats <- c(lats, lat)
  lons <- c(lons, lon)
  nmes <- c(nmes, o)
  write(fred[i], txtfile, append=TRUE)
  write("<gx:FlyTo><LookAt>", outfile, append=TRUE)
  write(paste("<longitude>", lon, "</longitude><latitude>", lat, "</latitude>",
    sep=""), outfile, append=TRUE)
  write("<altitude>0</altitude><heading>0</heading><tilt>0</tilt><roll>0</roll><altitudeMode>absolute</altitudeMode></LookAt></gx:FlyTo>",
    outfile, append=TRUE)
}
write(paste("</gx:Playlist></gx:Tour><Folder><name>", tourName, "</name>", sep=""), outfile, append=TRUE)
write("<Style id=\"pushpin\"><IconStyle><Icon><href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href></Icon></IconStyle></Style>",
  outfile, append=TRUE)
write("<Style id=\"redline\"><LineStyle><color>ff0000ff</color><width>3</width></LineStyle></Style>", outfile, append=TRUE)
for (i in 1:length(lats)) {
  boxX <- as.numeric(lats[i]) * pi / 180.0
  boxX <- (boxC - sin(boxX) * sin(boxX)) / (cos(boxX) * cos(boxX))
  boxX <- acos(boxX) * 180.0 / pi
  write(paste("<Placemark id=\"pin", i, "\"><name>", nmes[i], "</name><styleUrl>pushpin</styleUrl>", sep=""), outfile, append=TRUE)
  write(paste("<Camera>", sep=""), outfile, append=TRUE)
  write(paste("<longitude>", lons[i], "</longitude><latitude>", lats[i], "</latitude>", sep=""), outfile, append=TRUE)
  write("<altitude>12000</altitude><heading>0</heading><tilt>0</tilt><roll>0</roll><altitudeMode>absolute</altitudeMode></Camera>",
    outfile, append=TRUE)
  write("<styleUrl>#redline</styleUrl><MultiGeometry>", outfile, append=TRUE)
  write("<LineString><tessellate>1</tessellate><coordinates>", outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) + 0.5 * boxX, ",", as.numeric(lats[i]) + 0.5 * boxY, ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) - 0.5 * boxX, ",", as.numeric(lats[i]) + 0.5 * boxY, ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) - 0.5 * boxX, ",", as.numeric(lats[i]) - 0.5 * boxY, ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) + 0.5 * boxX, ",", as.numeric(lats[i]) - 0.5 * boxY, ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) + 0.5 * boxX, ",", as.numeric(lats[i]) + 0.5 * boxY, ",10", sep=""), outfile, append=TRUE)
  write("</coordinates></LineString>", outfile, append=TRUE)

  write("<LineString><tessellate>1</tessellate><coordinates>", outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) + boxX, ",", as.numeric(lats[i]) + boxY, ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) - boxX, ",", as.numeric(lats[i]) + boxY, ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) - boxX, ",", as.numeric(lats[i]) - boxY, ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) + boxX, ",", as.numeric(lats[i]) - boxY, ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) + boxX, ",", as.numeric(lats[i]) + boxY, ",10", sep=""), outfile, append=TRUE)
  write("</coordinates></LineString>", outfile, append=TRUE)

  write("<LineString><tessellate>1</tessellate><coordinates>", outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) + 0.5 * boxX, ",", as.numeric(lats[i]), ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]) - 0.5 * boxX, ",", as.numeric(lats[i]), ",10", sep=""), outfile, append=TRUE)
  write("</coordinates></LineString><LineString><tessellate>1</tessellate><coordinates>", outfile, append=TRUE)
  write(paste(as.numeric(lons[i]), ",", as.numeric(lats[i]) + 0.5 * boxY, ",10", sep=""), outfile, append=TRUE)
  write(paste(as.numeric(lons[i]), ",", as.numeric(lats[i]) - 0.5 * boxY, ",10", sep=""), outfile, append=TRUE)
  write("</coordinates></LineString>", outfile, append=TRUE)
  write(paste("<Point><coordinates>", lons[i], ",", lats[i], ",0</coordinates></Point>", sep=""), outfile, append=TRUE)
  write("</MultiGeometry></Placemark>", outfile, append=TRUE)
}
write("</Folder></kml>", outfile, append=TRUE)
cat(nRU, " Rural -> Urban changes", "\n");cat(nUR, " Urban -> Rural changes", "\n")
cat(nUNC - nRU - nUR, " unchanged", "\n")
cat(length(lats), "output", "\n")
cat("Rural:      (before) ", nR0, "  (after) ", nR, "\n")
cat("PeriUrban:  (before) ", nS0, "  (after) ", nS, "\n")
cat("Urban:      (before) ", nU0, "  (after) ", nU, "\n")

Note: As a new blog, I have left the comments setting as the default of moderated first comments from new commenters. I may remove this restriction if spam is not a problem. My time zone is GMT, which may explain delays in approval at certain times.I have also noticed that the “radiance 2.7 km” boxes are sometimes not completely drawn in Google Earth when other processes are making significant demands on processing time. Starting the tour again when the other processes have completed seems to be the easiest solution.

Before After
Rural 3097 3097 3813 3813
PeriUrban 2206 2662
Urban 2061 4267 889 3551
7364 7364
About these ads
This entry was posted in Uncategorized and tagged , , , . Bookmark the permalink.

21 Responses to GHCN Metadata (v2.inv) and Gistemp

  1. Great work,

    I was following your comments on the WhiteBoard.

    I’ll add the irradiance data to my GHCN metadata file. and give you a link to it to check the work

  2. Anthony Watts says:

    Oh what a mess.

    You can actually see the GISS nightlight data in Google Earth, and thus judge how far off this will be in relation to the station metadata by doing the following in Google Earth:

    1. In the Layers dialog at left, click on “Gallery” to expand it

    2. Under Gallery, click on “NASA” to expand it.

    3. Click Earth City Lights to enable the image overlay

    4. Load up your GHCN coordinates, have fun

    Armed with this new tool, a guest post at WUWT is open to you should you want it. – Anthony

    • oneillp says:

      Thanks – I had not seen the Earth City Lights data. However, this seems to be of rather better spatial resolution than that used for v2.inv.

      Hansen et al., 2001 indicates

      As an alternative approach to identifying stations subject to human influence, we test in this paper the use
      of satellite observations of nighttime light emissions. Specifically, we use observations from a United States
      Defense Meteorological Satellite taken with a highly sensitive photomultiplier tube [Imhoff et al., 1997].
      Observations employed are generally those taken under a new moon to minimize reflected moonlight. A composite
      of many images is used to eliminate ephemeral light sources such as lightning and fires. The observations were
      acquired in 1995, so they do a good job of describing current urban development. The same data have been used to
      quantify the effect of urban development on primary productivity [Imhoff et al., 2000]. The spatial resolution of the
      data used here is about 2.7 km.

      and Imhoff et al., 1997 refers to

      a pixel (2.7X2.7 km) appearing “lighted” …

      and Imhoff et al., 2000 refers to

      the DMSP/OLS operates at extreme sensitivity collecting image
      data at a moderate spatial resolution (2.7 km pixel)…

      I’ll see if I can modify my script to give an option to tour arriving at each station “at night”, then hiding the night light layer to reveal the surface detail.

  3. The other interesting issues occur with Stations that are coastal and by Lakes.

    Question: can you post a file that is simply the GHCN indentfier, brightness index

    So the first element in the .inv and the last element?

    • oneillp says:

      Is the following format suitable?

      10160355000 49
      10160360000 12
      10160390000 25
      10160395001 ...

      i.e. in FORTRAN terms I3, I8, I4

      If not, suggest a preferred format, and I’ll post an R script.

      And further to my reply below to your other comment: as well as “CO” there is a “LA” code for stations next to a large (greater than 25 sq km) lake. CO takes precedence over LA when both apply, and CO may be followed by the distance in km to the coast (or -9 if not given)

      • Actually the best format is this:

        GHCN ident, metadata

        But I think a space or tab inbetween should work just fine.

        If you can just post a URL to the end result that is just fine.

        Zeke has a place were he posts all his files.

        I’m aware of the LA data feild and the rest. You can go click here and see what I’m up to.

        http://surfacestations.dabbledb.com/publish/surfacestations

        • oneillp says:

          Comma separated values in other words. As I’m not too familiar with WordPress as yet, it seems that to post the end result as a file I would need to rename and upload it as a “doc” file. So I have taken the simpler route of simply posting the end result as a new post.

  4. vjones says:

    This is truely shocking. It brings a whole new meaning to “moving the goal posts”.

  5. Ron Broberg says:

    Congrats on your blog post!
    Looks like you attracted some attention.

  6. Is there any simple way to figure out which stations are on islands and output a list of the ghcnids for those on islands?
    other than plowing throw the whole list?

    RomanM may want this

    • oneillp says:

      Not so simple: you can replace

      if (airportsOnly) {
          if (substr(fred[i],82,82) != "A") { next } # select airports only
      }
      

      in my script by

      if (airportsOnly) {
          if (substr(fred[i],78,79) != "CO") { next } # select CO (within 30 km of coast) only
          if (substr(fred[i],85,89) != "WATER") { next } # select WATER for nearest 0.5x0.5 degree grid only
      }
      

      (and change the output file name [outfile] and tour name [tourName]) and instead of selecting only airports you will select only stations within 30 km of the coast (2264 stations) and then further restrict this to stations next to water (948 stations). You can review the selection in Google Earth, and delete non-island stations from the matching txt file. This may or may not do what you want, but it is the best solution I can come up with.

      • oneillp says:

        “Not so simple” was intended to refer to the solution for the original request, not to Ron Broberg’s suggestion, which I had not spotted when I posted my own suggestion!

    • Ron Broberg says:

      According to the code commentator – columns 78,79 should designate an island by IS.
      They don’t. So you are left with the “CO” coastal indicator.

      grep “^.\{77\}CO” v2.temperature.inv

      • The island thing threw me.

        Especially for a bigger island where the station is on the island but not 30km from the coast of the island.

        and then for stations near the coast but not on an island.

        I was thinking of having jeffID get me a list of stations NOT within the land mask..

        Anyway So metadata holes that should be worth filling

        1. Island ( small/large?)
        2. Ice near coast during the year?
        3. Vegatative index. i still like this one for some reason.
        4. nightlights value by year ( for the years people have)
        5. 1990 density.
        6. pop density at 20Km

        Ok later

  7. E.M.Smith says:

    Stellar. Just stellar.

    One small point: Have you considered looking at WHEN a station becomes urban? Of those 7364 GHCN stations, only about 1200 report presently. Are we using 2010 light data to classify 1980 (or perhaps even 1950′s) data? …

    One of the major “issues” I see in the GHCN Metadata is that it is only “now” and not “over time”. So, for example, the Tempelhof Airport in Germany. It starts life in the 1700s, eventually becomes a major airport (the Berlin Airlift and home to Lufthansa) then is retired. It’s not been an active airport for a few years (shut in 2008 per the wiki) yet last time I looked it still has an “Airstation” flag in GHCN. Yet, when that flag is updated to reflect the conversion to a shopping mall, the entire history of being an Airport (and a major one) will be lost in GHCN.

    So it goes from a Nights Templar marching field (thus the name) to major air port to shopping mall. And you get one flag for one point in time (nominally the “present”) to classify ALL that data.

    The same “problem” is endemic to the “urban” data as well. WHEN did a city become a city? We don’t know. And it matters. The UHI for Rome started long before that for Stockton California (that grew dramatically in the 1970′s-90s station 42572492000 or the nearby Lodi at 42572492001 that also grows late in life.)

    Basically, the fundamental metadata data structure is broken as it is only “now”, yet the data are “all history”. Meta data needs to be ‘by year’ just as the temperatures are by year. Otherwise your “corrections” based on the metadata are guaranteed to be wrong at some part of the history.

    • oneillp says:

      The “lights” are 1996/97 data rather than 2010 data, but you underestimate the degree of historical revisionism – I commented on this issue to the Gistemp team last December, when I first noticed their night lights test. I picked a fairly extreme example near to home, one of the stations which was previously used to adjust Dublin Airport:

      Fort William (Scotland) changes from rural to urban, while Ben Nevis remains rural. At first sight this might seem to make sense, until you notice that the GNCN data for these stations only covers the period from 1884-1903, at which time Fort William was fairly certainly still rural. Lights classification fails (These two stations are of interest as they are within 7 km of each other, but vertically separated by more than 1300 m, and provided early reliable comparison of conditions at altitude with those at sea level. But I wonder why they came to be included in the GHCN data, particularly in view of the otherwise surprisingly poor Scottish representation)

      65103038000 FORT WILLIAM                    56.83   -5.10   20  229R   -9MVxxCO 1x-9WARM GRASS/SHRUBC   13
      65103038001 BEN NEVIS           UK          56.80   -5.10 1343  229R   -9MVxxCO 1x-9WARM GRASS/SHRUBB    0

      I should add in fairness that the loss of Fort William as an “adjuster” in this case has no significant effect on the urban stations which it previously adjusted, nor on the gridded data. But this would not hold true for all stations reclassified.

  8. oneillp says:

    I forgot to update this post when I received a reply from James Hansen. I have now added this update to the post (see text in red in the post).

    I have also reformatted the table of rural/urban station numbers before and after the change to nightlights, added these counts for the stations actually used in STEP2 in addition to the counts already tabulated for all stations in v2.inv, and corrected some stupid errors in the discussion of the numbers in this table. The numbers in the table were correct, but I misread my own table while discussing it! (again, look for text in red in the post).

    The revised version (June 1st) of the new Hansen et al paper has appeared. I have made some comments on it in the update to this post, but it probably merits a new post to combine these and some further comments in one post. Not tonight however.

  9. Pingback: Nightlights and Shifting Sands | Digging in the Clay

  10. Pingback: GIStemp: Plus ça change (plus c’est la même chose). | Digging in the Clay

  11. Pingback: Nightlights: First Pinciples « Steven Mosher's Blog

  12. Pingback: Giss Nightlights Replicated « Steven Mosher's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s