GHCN data collection issues (from an Irish perspective)

As I noted in a previous post I have noticed some GHCN data collection issues in the specific context of Irish stations. These include missing values, incorrect values, and past values going missing, being found (or changed), going missing again, being found again, …

For simplicity throughout this post I refer to GHCN rather than to NOAA or NCDC.

Such problems could arise at the originating stage, in the transmission of the relevant data from the Irish meteorological service. I have however, in one of the cases described below (Valentia Observatory), followed the data abroad to both GHCN and http://www.ogimet.com, and found that on the same dates the CLIMAT messages at Ogimet decoded correctly, giving the value recorded by the Irish meteorological service, whereas the corresponding values shown by GHCN changed from day to day, and were incorrect. This would seem to indicate that valid data was being sent out, and that the problem arose elsewhere. This ‘elsewhere’ may not necessarily be at GHCN, as data is also provided through third parties. The end result however remains missing or incorrect data, wherever this may have arisen or gone missing.

As it is rather unlikely that there has been any conspiracy to distort the Irish record, I would suspect that similar problems can be found if other countries are examined. It just happens that, being Irish, I have the corresponding data from the Irish meteorological service to hand for comparison purposes, and have from time to time made such comparisons. I suggest that you make a similar comparison for your own country, and this may well show similar problems.

For a number of years I have collected the GHCN monthly temperature data, raw and adjusted, several times each month, as well as the Antarctic data used as input to NASA Gistemp, and the limited Gistemp output now provided each month (unfortunately the almost complete set of output files previously made available each month on the GISS FTP server has disappeared). For the last three months I have automated this process, downloading each night the files which have changed.
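A nightly archive of this kind can be kept compact by storing a new copy only when the content has actually changed. The following is a minimal sketch of that idea, not the script I actually use: the function name `save_if_changed`, the `.sha256` side file and the date-stamp naming are all assumptions made for illustration, and fetching the bytes from the server is left out.

```python
import hashlib
from pathlib import Path


def save_if_changed(content: bytes, archive_dir: Path, name: str, stamp: str) -> bool:
    """Store a dated copy of `content` only if it differs from the last stored copy.

    `name` is the base file name and `stamp` a date string such as "20140710";
    both naming conventions are hypothetical.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    digest_file = archive_dir / f"{name}.sha256"
    digest = hashlib.sha256(content).hexdigest()

    # Unchanged since the previous run: nothing to store.
    if digest_file.exists() and digest_file.read_text() == digest:
        return False

    # New or changed content: keep a dated snapshot and remember its hash.
    (archive_dir / f"{name}.{stamp}").write_bytes(content)
    digest_file.write_text(digest)
    return True
```

Keeping every changed snapshot, rather than overwriting, is what makes it possible later to say on which date a value changed.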

Dublin Airport

Data for the last four years can be seen below: the unadjusted and adjusted GHCN monthly data and the most recent Gistemp adjusted data, as downloaded at midnight BST (most recent year first, to match the Met Éireann data), preceded by the data for Dublin Airport as shown today on the Met Éireann (Irish meteorological service) website. Small differences of 0.1°C or 0.2°C are ignored; these presumably are cases where a value was transmitted, recorded and retained by GHCN before the value was finalised by Met Éireann. It is the larger differences that are highlighted, in red for the Met Éireann and raw GHCN data, and in green for the adjusted GHCN data (used as input by Gistemp) and the adjusted Gistemp output data. (I have not highlighted the missing Gistemp June 2014 value: the latest Gistemp output available is the run from June 2014, and so only includes values up to May.)

dublinairportdata20140711

Six values in the past 30 months are shown as 1090, all differing from the corresponding Met Éireann data. The source flag “C” which follows five of these values indicates that the source is “Monthly Climatic Data of the World (MCDW) QC completed but value is not yet published”. The source flag “P” which follows the May 2014 value indicates that the source is “CLIMAT (Data transmitted over the GTS, not yet fully processed for the MCDW)”. The quality control flag “W” which follows the May 2014 and August/September 2012 values indicates that “monthly value is duplicated from the previous month, based on regional and spatial criteria and is only applied from the year 2000 to the present”.

Summarising, three monthly values have been incorrectly recorded as 10.90°C, and a further three monthly values have been contaminated by duplicating these incorrect values.
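The comparison rule used here (ignore differences of up to 0.2°C, flag anything larger) can be sketched in a few lines. This is only an illustration: the function name is made up, the two April values are the ones quoted in this post (taking the originally reported values as the reference), and the third entry is a hypothetical pair included only to show the tolerance at work.

```python
def flag_discrepancies(reference, ghcn, tolerance=0.2):
    """Return {month: (reference value, GHCN value)} where the two disagree.

    Differences up to `tolerance` degrees C are ignored; a month missing
    from `ghcn` (None) is always reported.
    """
    flagged = {}
    for month, ref in reference.items():
        value = ghcn.get(month)
        # Round before comparing so that float noise does not turn a
        # 0.2 degree difference into a spurious discrepancy.
        if value is None or round(abs(value - ref), 2) > tolerance:
            flagged[month] = (ref, value)
    return flagged


reference = {"2014-04": 8.9, "2013-04": 6.9, "hypothetical": 9.0}
ghcn = {"2014-04": 10.9, "2013-04": 10.9, "hypothetical": 9.1}
print(flag_discrepancies(reference, ghcn))
```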

If we look back at preceding GHCN data files however we find that the April 2014 value was correctly recorded as 8.90°C from May 3rd to May 12th (ghcnm.tavg.v3.2.2.20140503.qcu.dat to ghcnm.tavg.v3.2.2.20140512.qcu.dat), flagged “P” as CLIMAT data, taking on the erroneous 10.90°C value from May 13th onwards, but retaining that “P” CLIMAT flag up to July 9th. Similarly the May 2014 value was correctly recorded as 11.30°C from June 3rd to June 25th, flagged “P” as CLIMAT data, taking on the erroneous 10.90°C value from June 26th to July 1st, with the “W” (duplicated) and “P” (CLIMAT) flags, and then becoming -9999, a missing value, from July 2nd on.

As I only saved files a few times each month before April 2014 I cannot document the history of the erroneous values in 2013 and 2012 with daily resolution. The April 2013 value appeared correctly as 6.9°C on May 10th and May 19th 2013, flagged “P”, as -9999 on June 3rd, as 10.90°C, flagged “P”, on June 16th and June 22nd, back to the correct value 6.9°C, flagged “K” (received by the UK Met Office), on June 28th, July 9th, July 17th and July 30th, and returned as 10.90°C, flagged “C”, on August 8th, remaining 10.90°C, flagged “C”, after that. The July, August and September 2012 values appear correctly recorded, with flag “K”, until July 9th 2014, but become the erroneous 10.90°C on July 10th 2014, with flags “C” and “WC”:

 621039690002012TAVG  620  C  660  C  800  C  670  C  990  C 1270  C 1090  C 1090 WC 1090 WC  840  C  650  C  530  C

I will watch these values in coming days and update as appropriate.

Update: July 11th repeats the erroneous July 10th values. What is it about 10.90°C that makes it such a popular value?
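For anyone wishing to check their own country's stations, the 2012 record quoted above can be decoded with a short fixed-width parser. This is a sketch based on my reading of the GHCN-M v3 layout (11-character station ID, 4-digit year, 4-character element, then twelve 8-character fields, each a 5-character value in hundredths of a degree followed by data-measurement, quality-control and source flags); the function and field names are my own.

```python
MISSING = -9999

# The 2012 Dublin Airport record as quoted above (leading space removed).
DUBLIN_2012 = (
    "621039690002012TAVG"
    "  620  C  660  C  800  C  670  C  990  C 1270  C"
    " 1090  C 1090 WC 1090 WC  840  C  650  C  530  C"
)


def parse_ghcnm_record(line):
    """Split one GHCN-M v3 monthly record into metadata and monthly fields."""
    line = line.strip()
    record = {
        "station": line[0:11],
        "year": int(line[11:15]),
        "element": line[15:19],
        "months": [],
    }
    for month in range(12):
        start = 19 + 8 * month
        if len(line) < start + 5:
            break  # record was truncated; stop at the last complete value
        field = line[start:start + 8].ljust(8)  # restore stripped blank flags
        raw = int(field[0:5])
        record["months"].append({
            "value_c": None if raw == MISSING else raw / 100,
            "dm": field[5],  # data measurement flag
            "qc": field[6],  # quality control flag
            "ds": field[7],  # data source flag
        })
    return record


rec = parse_ghcnm_record(DUBLIN_2012)
for number, month in enumerate(rec["months"], start=1):
    print(number, month["value_c"], repr(month["qc"]), repr(month["ds"]))
```

Run on the record above, months 7 to 9 all decode to 10.9, with the “W” quality control flag on August and September.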

Here is a slide from a presentation back in 2009 showing the errors and missing values in the GHCN v2 data at that time, as displayed at the Gistemp website, where it appeared as input data. The same errors and missing values have been carried forward into GHCN v3.

dublinairportdata2009

In April 2010 I looked again at the Dublin Airport data for later months in 2010, but this time I also looked at the CLIMAT reports at Ogimet:

The Gistemp data:

Dublin Airport Gistemp data

The Met Éireann data:

Dublin Airport MET Eireann data

and the OGIMET data: (click on this link to see Sections 3 and 4 and the undecoded reports as well)

Dublin Airport OGIMET data

Valentia Observatory

Here we want to examine the April 2012 data values for Valentia Observatory:

ghcnm.tavg.v3.1.0.20120515.qcu.dat: 621039530002012TAVG  900  K  890  K 1020  K  420 OP-9999   -9999   …
ghcnm.tavg.v3.1.0.20120521.qcu.dat: 621039530002012TAVG  900  K  890  K 1020  K   30 OP-9999   -9999   …
ghcnm.tavg.v3.1.0.20120528.qcu.dat: 621039530002012TAVG  900  K  890  K 1020  K  840  K-9999   -9999   …

8.40°C is the correct value. The values 4.20°C and 0.30°C recorded earlier in May 2012 are nonsense values.
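Tracking one month's value through successive daily files, as done by hand above, can be sketched as follows. The records are the three lines quoted above, cut off after the May field because the remainder is not reproduced in this post; the helper names are my own, and the field positions assume the GHCN-M v3 fixed-width layout (19 characters of ID, year and element, then 8 characters per month).

```python
# (file name, start of the 2012 Valentia record) for three download dates.
SNAPSHOTS = [
    ("ghcnm.tavg.v3.1.0.20120515.qcu.dat",
     "621039530002012TAVG  900  K  890  K 1020  K  420 OP-9999   "),
    ("ghcnm.tavg.v3.1.0.20120521.qcu.dat",
     "621039530002012TAVG  900  K  890  K 1020  K   30 OP-9999   "),
    ("ghcnm.tavg.v3.1.0.20120528.qcu.dat",
     "621039530002012TAVG  900  K  890  K 1020  K  840  K-9999   "),
]


def month_field(record, month):
    """Return (value in hundredths of a degree C, 3-character flag string)."""
    start = 19 + 8 * (month - 1)
    field = record[start:start + 8]
    return int(field[:5]), field[5:8]


def value_history(snapshots, month):
    """List (file date, value, flags) each time the month's field changes."""
    history = []
    for filename, record in snapshots:
        value, flags = month_field(record, month)
        if not history or history[-1][1:] != (value, flags):
            file_date = filename.split(".")[-3]  # e.g. "20120515"
            history.append((file_date, value, flags))
    return history


for file_date, value, flags in value_history(SNAPSHOTS, month=4):
    print(file_date, value / 100, repr(flags))
```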

The codes “OP” following these values (and “K” following the Jan-Mar values, and the April value at May 28th) are interpreted:

O = monthly value that is >= 5 bi-weight standard deviations from the bi-weight mean.  Bi-weight statistics are calculated from a series of all non-missing values in the station’s record for that particular month.

P = CLIMAT (Data transmitted over the GTS, not yet fully processed for the MCDW)

K = received by the UK Met Office
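To give a feel for what the “O” test involves, here is a sketch of a bi-weight mean and standard deviation check. It uses the usual textbook form of the bi-weight estimators with a tuning constant of 7.5; whether GHCN uses exactly this constant and this form is an assumption on my part. The sample history is a set of made-up April means for illustration only, not the Valentia record.

```python
from math import sqrt
from statistics import median


def biweight_mean_std(values, c=7.5):
    """Bi-weight (robust) mean and standard deviation of a series."""
    m = median(values)
    mad = median(abs(x - m) for x in values)
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    mean_num = mean_den = var_num = var_den = 0.0
    for x in values:
        u = (x - m) / (c * mad)
        if abs(u) >= 1:
            continue  # points far from the median get zero weight
        w = 1 - u * u
        mean_num += (x - m) * w ** 2
        mean_den += w ** 2
        var_num += (x - m) ** 2 * w ** 4
        var_den += w * (1 - 5 * u * u)
    mean = m + mean_num / mean_den
    std = sqrt(len(values) * var_num) / abs(var_den)
    return mean, std


def is_outlier(candidate, history, threshold=5.0):
    """True if candidate is >= threshold bi-weight std devs from the mean."""
    mean, std = biweight_mean_std(history)
    return abs(candidate - mean) / std >= threshold


# Hypothetical April monthly means in degrees C (illustrative values only).
history = [8.0, 8.5, 9.0, 9.5, 10.0, 8.2, 9.3, 8.8, 9.1, 8.7]
print(is_outlier(0.3, history), is_outlier(8.4, history))
```

Here the statistics are computed from the history alone and the candidate is tested against them; the flag description above suggests GHCN computes them from all non-missing values for that calendar month.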

The CLIMAT report from Ogimet:

ValentiaCLIMAT

shows the correct value, 8.3°C, in group 3 of section 1: 30083011. CLIMAT reports downloaded from Ogimet on the dates with erroneous April 2012 values showed this correct value rather than these erroneous values. The same applies if I download the 2014 CLIMAT reports today from Ogimet for Dublin Airport:

DublinCLIMAT

None of the errors or missing values noted earlier in the GHCN data appear here (nor do they in the 2012 or 2013 CLIMAT reports, which I have not shown here for that reason). As noted at the beginning, this suggests that data loss and corruption are occurring after the data has left these shores.
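Decoding the temperature group by hand is straightforward, which makes the erroneous GHCN values all the more puzzling. As I read the CLIMAT code form, group 3 of section 1 is laid out as 3 sn TTT ststst: a sign digit (0 for positive or zero, 1 for negative), the monthly mean temperature in tenths of a degree, and the standard deviation of the daily means in tenths of a degree. A minimal sketch, with a function name of my own choosing:

```python
def decode_group3(group):
    """Decode CLIMAT section 1 group 3 into (mean temp, std dev) in degrees C.

    Parts encoded with solidi (/) are returned as None.
    """
    if len(group) != 8 or group[0] != "3":
        raise ValueError(f"not a group 3: {group!r}")
    sign, ttt, sss = group[1], group[2:5], group[5:8]

    mean = None
    if "/" not in sign + ttt:
        mean = int(ttt) / 10
        if sign == "1":
            mean = -mean

    std = None if "/" in sss else int(sss) / 10
    return mean, std


print(decode_group3("30083011"))
```

For the Valentia group 30083011 this gives a mean of 8.3°C with a standard deviation of 1.1°C.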

GHCN station history search

A minor point. While the GHCN (monthly) station inventory files correctly identify 62103969000 as WMO station 03969, Dublin Airport, the NOAA/NCDC Historical Observing Metadata Repository search page returns Dublin Phoenix Park instead for the WMO ID 03969.

DublinStationHistory

The daily data returned by this page seems likely to be that for Dublin Phoenix Park rather than Dublin Airport. I do not have daily data for the Phoenix Park readily available, but I do have some years' daily data for Dublin Airport to hand, and comparison of this data for a sample month with that returned from the page shows differences, and day-to-day changes in maximum and minimum temperatures, too great to be explained by the NOAA disclaimer that “These data are quality controlled and may not be identical to the original observations”. The GHCN (monthly) data for Dublin Airport goes back to 1831, at which time there certainly was no Dublin Airport. Synoptic observations started at Dublin Airport in 1939. My understanding is that the long data series is a combination of earlier data from Phoenix Park and later data from Dublin Airport. The two locations are about 9 km apart, and at similar altitude. The WMO ID is Dublin Airport rather than Phoenix Park.

A note on the format of CLIMAT reports

I notice that group 7 (sunshine) of section 1 for the April 2012 Valentia Observatory CLIMAT report does not conform to 71.1.1 of WMO-No. 306 (MANUAL ON CODES VOLUME I.1, PART A – ALPHANUMERIC CODES). This specification indicates that group 7 should be omitted here, not encoded as “7//////”. [When one or several parameters of a group are not available, the missing parameter(s) shall be coded with a set of solidi (/). If all parameters of a group are not available, the group shall be omitted from the report].

This departure from the WMO specification may (or may not!) be a factor in the erroneous decoding noted above. It is a departure from the WMO specification which might be anticipated, and OGIMET in fact does decode it correctly. NCDC would probably be well advised to accept this variation as well if it does not already do so, even though it does not conform to the specification, to avoid the risk of importing garbage as data, which may arise if the importing program expects strict conformance with the specification and does not anticipate possible variations.

In this case the data was flagged as implausible, being more than five standard deviation units below the mean of previous April values for Valentia, but it could well have been a plausible value instead if simply picked up as random data from computer memory due to an unforeseen case not handled in computer code, and so could have been used as a valid value in further processing.

Regardless of whether other agencies are able to decode the variation from the WMO specification, following the specification to the letter by the person or program preparing the CLIMAT report would of course also be advisable to avoid the risk of incorrect decoding by others.
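Accepting this variation costs a decoder very little. The sketch below indexes the groups of section 1 by their leading indicator digit and treats a group consisting entirely of solidi exactly as if it had been omitted. It is an illustration of the tolerant approach only; I do not know how the NCDC or Ogimet decoders are actually written, and the function name is my own.

```python
def index_groups(groups):
    """Map each group's indicator digit to its payload.

    A payload made up only of solidi ("/") is stored as None, i.e. it is
    handled the same way as a group that was left out of the report.
    """
    decoded = {}
    for group in groups:
        indicator, payload = group[0], group[1:]
        if not indicator.isdigit():
            raise ValueError(f"unexpected group: {group!r}")
        decoded[indicator] = None if set(payload) <= {"/"} else payload
    return decoded


section1 = ["30083011", "7//////"]
print(index_groups(section1))
```

Because groups are looked up by indicator rather than by position, an unexpected “7//////” cannot shift the decoder onto the wrong digits.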

 


1 Response to GHCN data collection issues (from an Irish perspective)

  1. robinedwards36 says:

    This has prompted me to look at my GHCN data, downloaded in 2007 I think, which clearly misses recent events (or non-events). However, the historical record back to 1831 is interesting. What I find is that there are two very pronounced discontinuities in the data. There is a downward step at November 1876 of about 1.0 C and an upward step at February 1931 of 0.4 C. If you plot the cumulative sum of the monthly anomalies these discontinuities are starkly obvious. What I wonder is whether there is any metadata covering the times of these hypothesised discontinuities. Don’t know where to look, unfortunately. Apart from the gross discontinuities there seem to be modest upward trends judging from the cusum plot, though I’ve not looked at the appropriate regression yet.
