The initial GISTEMP steps are described in gistemp.txt in the new GISTEMP (v3) source:
Step 0 : Merging of sources (do_comb_step0.sh)
---------------------------
In GHCN v3, reports from several sources were merged into a single history. Discontinuities created by that merging procedure were eliminated by the provider in the adjusted data set. The data are reformatted to match the format of GHCN v2 to avoid changes in the subsequent steps.

SCAR contains single source reports but in different formats/units and with different or no identification numbers. We extended the WMO number if it existed or created a new ID if it did not (2 cases). SCAR stations are treated as new sources.

Adding SCAR data to GHCN:
The tables were reformatted and the data rescaled to fit the GHCN format; the new stations were added to the inventory file. The site temperature.html has not been updated for several years; we found and corrected a few typos in that file. (Any SCAR data marked "preliminary" are skipped)

Filling in missing data for Hohenpeissenberg:
This is a version of a GHCN report with missing data filled in, so it is used to fill the gaps of the corresponding GHCN series.

Result: v3.mean_comb

Step 1 : Elimination of dubious records (do_comb_step1.sh)
----------------------------------------------------------
Data and station information are combined in a data base. Some unphysical looking segments of data records were eliminated after manual inspection of unusual looking annual mean graphs and comparing them to the corresponding graphs of all neighboring stations. The data are converted back to a text version.

Result: Ts.txt
The corresponding description of these steps in gistemp.txt for GISTEMP (v2) is shown below. Comparing the two descriptions shows that Step 1 has been simplified: the part of the process that combined the various sources at a single location into one record has been removed, so Step 1 now only eliminates dubious records. Step 0 instead uses the GHCN v3 adjusted data, in which NOAA has already combined reports from several sources into a single history and removed the resulting discontinuities. Step 0 has also been simplified by dropping the replacement of USHCN-unmodified data in the GHCN data with USHCN-corrected data.
Step 0 : Merging of sources (do_comb_step0.sh)
---------------------------
GHCN contains reports from several sources, so there often are multiple records for the same location. Occasionally, a single record was divided up by NOAA into several pieces, e.g. if suspicious discontinuities were discovered.

USHCN and SCAR contain single source reports but in different formats/units and with different or no identification numbers. For USHCN, the table "ushcn2.tbl" gives a translation key, for SCAR we extended the WMO number if it existed or created a new ID if it did not (2 cases). SCAR stations are treated as new sources.

Adding SCAR data to GHCN:
The tables were reformatted and the data rescaled to fit the GHCN format; the new stations were added to the inventory file. The site temperature.html has not been updated for several years; we found and corrected a few typos in that file. (Any SCAR data marked "preliminary" are skipped)

Replacing USHCN-unmodified by USHCN-corrected data:
The reports were converted from F to C and reformatted; data marked as being filled in using interpolation methods were removed. USHCN-IDs were replaced by the corresponding GHCN-ID. The latest common 10 years for each station were used to compare corrected and uncorrected data. The offset so obtained was subtracted from the corrected USHCN reports to match any new incoming GHCN reports for that station (GHCN reports are updated monthly; in the past, USHCN data used to lag by 1-5 years).

Filling in missing data for Hohenpeissenberg:
This is a version of a GHCN report with missing data filled in, so it is used to fill the gaps of the corresponding GHCN series.

Result: v2.mean_comb

Step 1 : Simplifications, elimination of dubious records, 2 adjustments (do_comb_step1.sh)
--------------------------------------------------------------------
The various sources at a single location are combined into one record, if possible, using a version of the reference station method. The adjustments are determined in this case using series of estimated annual means.

Non-overlapping records are viewed as a single record, unless this would result introducing a discontinuity; in the documented case of St.Helena the discontinuity is eliminated by adding 1C to the early part.

After noticing an unusual warming trend in Hawaii, closer investigation showed its origin to be in the Lihue record; it had a discontinuity around 1950 not present in any neighboring station. Based on those data, we added 0.8C to the part before the discontinuity.

Some unphysical looking segments were eliminated after manual inspection of unusual looking annual mean graphs and comparing them to the corresponding graphs of all neighboring stations.

Result: Ts.txt
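The USHCN replacement described in the v2 Step 0 text (now dropped) can be illustrated with a minimal sketch. This is not the GISTEMP code: the function names are hypothetical, and for simplicity it assumes one annual mean per year held in a dictionary, whereas the real processing works on monthly reports.

```python
def ushcn_offset(corrected, uncorrected, n_years=10):
    """Mean of (corrected - uncorrected) over the latest n_years common years.

    Both arguments are {year: annual mean in deg C}; this annual
    representation is an assumption made for illustration only.
    """
    common = sorted(set(corrected) & set(uncorrected))[-n_years:]
    if not common:
        raise ValueError("no overlapping years")
    return sum(corrected[y] - uncorrected[y] for y in common) / len(common)


def align_corrected(corrected, uncorrected, n_years=10):
    """Subtract the offset so the corrected series matches recent GHCN values."""
    offset = ushcn_offset(corrected, uncorrected, n_years)
    return {year: value - offset for year, value in corrected.items()}
```

With this alignment the most recent corrected values agree, on average, with the uncorrected GHCN values, so newly arriving GHCN reports can be appended without a step.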
My impressions of the merits of this change are so far based on a very limited sample. Apart from including some Reykjavik plots in GISTEMP and GHCN v3 – two stations illustrated, I have only looked in more detail at Dublin Airport, the other station plotted in that post. There I also noted a substantial change in the adjusted GISTEMP output temperature series between data downloaded from GISS on March 6th and data downloaded on March 12th, discussed further in GISTEMP and GHCN v3 – Dublin Airport downloads in March 2012. I have now examined the GHCN input for the Irish and UK stations used by GISTEMP to adjust Dublin Airport, and found that two of these stations, 62103970000 Claremorris (data from 1950 to 1990) and 65103072001 Braemar (data from pre-1880 to 1969), show a substantial change for some years: 1950 to 1958 changed by 0.46 degrees in the case of Claremorris, and 1856 to 1886 changed by 0.43 degrees in the case of Braemar. Only one of the remaining Irish and UK stations, which were not involved in the adjustment of Dublin Airport, showed a change approaching this magnitude, of 0.3 degrees for some years.
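A comparison of this kind between two downloads can be sketched as follows. This is a minimal illustration, not the procedure I actually used: the function names are hypothetical, and the fixed-width layout (11-character station ID, 4-digit year, 4-character element, then twelve 5-character values in hundredths of a degree C, each followed by three flag characters, with -9999 for missing) is my reading of the GHCN-M v3 README and should be checked against it.

```python
MISSING = -9999  # assumed missing-value marker in GHCN-M v3 data files


def parse_ghcnm_v3(lines, station_id):
    """Return {year: [12 monthly values in deg C, or None]} for one station."""
    records = {}
    for line in lines:
        if not line.startswith(station_id):
            continue
        year = int(line[11:15])
        months = []
        for m in range(12):
            start = 19 + 8 * m  # 5-char value followed by 3 flag characters
            raw = int(line[start:start + 5])
            months.append(None if raw == MISSING else raw / 100.0)
        records[year] = months
    return records


def annual_mean_changes(old, new, threshold=0.05):
    """Years whose mean monthly difference (new - old) exceeds threshold."""
    changes = {}
    for year in sorted(set(old) & set(new)):
        diffs = [n - o for o, n in zip(old[year], new[year])
                 if o is not None and n is not None]
        if diffs:
            mean_diff = sum(diffs) / len(diffs)
            if abs(mean_diff) > threshold:
                changes[year] = round(mean_diff, 2)
    return changes
```

Reading each file with `open(path).readlines()` and passing the result for a station ID such as 62103970000 would list the years in which the two versions disagree.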
Whether these two stations alone are sufficient to account for the change in GISTEMP output for Dublin Airport is a question I do not intend to address until I have switched my own GISTEMP implementation to GHCN v3, something I will probably not have time to do until late April. However, these substantial changes in the GHCN v3 adjusted data (qca file) for historic data at two stations which have had no new data for more than twenty years, particularly in the absence of similar changes at nearby stations, lead me to suspect a coding error in GHCN. Unfortunately, while I found two relevant GHCN v3 adjusted data (qca) files to download at the GISS FTP site, only one of these had the corresponding unadjusted (qcu) GHCN v3 file available for download, so I have not been able to compare these GHCN input files. I have, however, been able to spot something in one qca file for Claremorris (at 1958!) which, if reproduced in the corresponding qcu file and not handled correctly, may be the cause of these differences. I will bring this to the attention of the GHCN group at NOAA.

Update: On closer examination, before e-mailing NOAA, I found an explanation for the apparent difference between the two qca files for Claremorris. Although the file differencing program showed it between 1958 and 1959, at the very point where the differences returned to zero, it was in fact an artifact of the way the program displayed the differences. I had suspected an extra non-printing character at this point in one of the two files, but a hex dump of this section of both files showed no such character. Among the qca and qcu files I had downloaded and retained, the changed values for this station occurred only in ghcnm.tavg.v184.108.40.20620217.qca.dat (for which I do not have a corresponding qcu file).
It looks as if I need to continue downloading pairs of corresponding qca and qcu files to watch for any repetition of this jump in the early values to try to find an explanation (unless someone has ghcnm.tavg.v220.127.116.1120217.qcu.dat available to send me!).
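For future downloads, the check for stray non-printing characters that I did with a hex dump could be automated along these lines. This is only a sketch with a hypothetical function name; it assumes the data files are plain ASCII with LF or CR/LF line endings, so anything else (including tabs) is reported.

```python
def find_nonprinting(data: bytes):
    """Return (offset, byte value) pairs for bytes that are neither
    printable ASCII (0x20-0x7E) nor a line ending (LF, CR)."""
    line_endings = {0x0A, 0x0D}
    return [(i, b) for i, b in enumerate(data)
            if not (0x20 <= b <= 0x7E) and b not in line_endings]
```

Opening a file in binary mode and passing `f.read()` to this function returns an empty list when the file contains nothing unexpected.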
I might add here that Nick Stokes, in Moyhu: Reykjavik and GHCN adjustments, is not quite accurate in the following text:
No, history was not rewritten. What the folks there don’t seem to want to acknowledge is that GHCN circulates two files, described here. The file everyone there wants to focus on is the adjusted file (QCA). This, as explained, has been homogenized. This is a preparatory step for its use in compiling a global index. It tries to put all stations on the same basis, and also adjust them, if necessary, to be representative of the region. It is not an attempt to modify the historical record.
That record is contained on the other data file distributed – the unadjusted QCU file. This contains records as they were reported initially. It is generally free of any climatological adjustments.
The unadjusted QCU file is not in fact the historical record. See the GHCN v3 README file:
2.2.2 STATIONS WITH MULTIPLE TIME SERIES

The GHCNM v2 contained several thousand stations that had multiple time series of monthly mean temperature data. The 12th digit of each data record, indicated the time series number, and thus there was a potential maximum of 10 time series (e.g. 0 through 9). These same stations in v3 have undergone a merge process, to reduce the station time series to one single series, based upon these original and at most 10 time series.

A simple algorithm was applied to perform the merge. The algorithm consisted of first finding the length (based upon number of non missing observations) for each of the time series and then combining all of the series into one based upon a priority scheme that would "write" data to the series for the longest series last.

Therefore, if station A, had 3 time series of TAVG data, as follows:

1900 to 1978 (79 years of data) [series 1]
1950 to 1985 (36 years of data) [series 2]
1990 to 2007 (18 years of data) [series 3]

The final series would consist of:

1900 to 1978 [series 1]
1979 to 1985 [series 2]
1990 to 2007 [series 3]

The original series number in GHCNM v2, is retained in the GHCNM v3 data source flag.

One caveat to this merge process, is that in the final GHCNM v3 processing there is still a master level construction process performed daily, where the entire dataset is construction according to a source order overwrite hiearchy (section 2.3), and it is possible that higher order data sources may be interspersed within the 3 series listed above.
so the QCU file, with only a single series for each station, provides the historical record after this combination of multiple time series.
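The merge rule quoted from the README can be sketched in a few lines. This is my own illustration of the "longest series written last" priority scheme, not NOAA's code: the function name is hypothetical, and it works on one value per year rather than on monthly data.

```python
def merge_series(series_by_number):
    """Merge duplicate series so that the longest one wins any overlap.

    series_by_number: {series number: {year: value or None}}
    Returns {year: (value, series number the value came from)}.
    """
    def n_obs(item):
        # Length is the count of non-missing observations, per the README.
        return sum(v is not None for v in item[1].values())

    merged = {}
    # Write the shortest series first, so the longest is written last
    # and overwrites the others wherever they overlap.
    for number, series in sorted(series_by_number.items(), key=n_obs):
        for year, value in series.items():
            if value is not None:
                merged[year] = (value, number)
    return merged
```

Applied to the README's station A, with series 1 covering 1900 to 1978, series 2 covering 1950 to 1985 and series 3 covering 1990 to 2007, the merged record takes 1900 to 1978 from series 1, 1979 to 1985 from series 2 and 1990 to 2007 from series 3, leaving 1986 to 1989 empty.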
Intuition, not always a reliable guide, suggests to me that this multiple time series combination and the Menne and Williams approach to homogenization are preferable to the previous GISTEMP approach of combining the various sources at a single location into one record, if possible, using a version of the reference station method. GHCN v3, for example, picks up the Dublin Airport station relocation in May 1994. I also raised some questions with GISS concerning the use of the reference station method in this context as far back as May 2009, although this is not an aspect of GISTEMP I have investigated in detail since. For “historic interest” I’ll add a post based on that May 2009 e-mail to GISS shortly (I have a draft post on the topic from May 2010, never published, but I need to refresh my mind on the details before completing it).