Thursday, 22 March 2018

Precipitation in HadISD

We have included precipitation accumulations in HadISD since its launch (netCDF field “precip1_depth”), as this information is used as part of the quality control suite to check for high humidity periods in the dewpoint depression check.  We wanted to make the HadISD fully traceable, so that users could check our quality control decisions for themselves, should they wish to.  These precipitation accumulations are not quality controlled and so we have urged users to take care when using these data in their analyses.  

In the ISD data format there are four possible entries for the precipitation data.  These are indicated by character codes “AA1” to “AA4”.  Each of these has period, depth, condition and quality entries.  To assist in the quality control of the dewpoint temperature fields, we extracted the first of these four precipitation fields.  The netCDF names we assigned to these variables are “precip1_depth” and “precip1_period”, as these were from the first ISD precipitation field.

Recently, Kimberly Channell of the Great Lakes Integrated Sciences and Assessments at the University of Michigan highlighted a confusion with the description of the precip1_depth field in the netCDF files.  The metadata for versions up to and including v2.0.2.2017p states “Depth of Precipitation Reported over time period”. This, combined with the hourly time stamps, could easily result in an assumption that the precip1_depth field only contains hourly accumulations.  Furthermore, our naming of the netCDF variable inadvertently supports this interpretation, as the “1” in “precip1_depth” could be read as indicating hourly accumulation values.  Unfortunately, neither of these is the case.


The accumulation period for the precip1_depth is given by the precip1_period.  Even if there are timestamps every hour, the accumulation period may be a mix of time periods (from hourly to daily). We now appreciate that the metadata for these two variables could have been clearer, and that our chosen naming could be confusing, especially without knowledge of the ISD naming conventions.  We apologise to users if these issues have caused problems with their analyses.  To properly use the precipitation information, the depth information should be combined with the period.
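As a minimal sketch of what combining the two fields can look like, the snippet below keeps only those depths whose accumulation period matches the one of interest. The values are made up for illustration, the helper name select_accumulations is hypothetical, and we assume here that the period is expressed in hours and that missing values have already been converted to None; real netCDF files will need their own missing-data handling.

```python
def select_accumulations(depths, periods, wanted_period, missing=None):
    """Keep depths whose accumulation period equals wanted_period (assumed hours)."""
    if len(depths) != len(periods):
        raise ValueError("depth and period series must have the same length")
    return [
        depth if period == wanted_period and depth is not None else missing
        for depth, period in zip(depths, periods)
    ]


# Made-up values standing in for precip1_depth (mm) and precip1_period (hours).
precip1_depth = [0.5, 2.0, 0.0, 12.5, 1.0, None]
precip1_period = [1, 6, 1, 24, 1, 1]

# Depths that are true hourly accumulations, and those that are 6-hourly.
hourly = select_accumulations(precip1_depth, precip1_period, wanted_period=1)
six_hourly = select_accumulations(precip1_depth, precip1_period, wanted_period=6)
```

Keeping the output the same length as the input means the selected values stay aligned with the original timestamps.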

Within one of the ISD precipitation fields, it is possible to have a number of accumulation periods, rather than just a single one across the entire length of the station record.  The ISD is itself made up of a number of underlying databases, drawing observations from across a variety of observation networks (e.g. SYNOP, METAR, GTS).  Each of these may have a different accumulation period, and also conventions as to the time of observation of e.g. 24 hour accumulations (and to which day these are assigned).  These have been combined together to form threaded records for single station locations where possible during both the ISD and HadISD development.

It may be that a station report type (e.g. GTS) was the primary source in the early period (e.g. filling AA1), but that for a later period, a different source with a different standard accumulation period has a higher priority in a merging process, and so supersedes this.  Therefore, at observation times where both sources have data, this could move the hourly accumulations down into the later precipitation fields (AA2-4), resulting in the interleaving of the different accumulation periods in the first entry.  The example station in Figure 1 exhibits behaviour consistent with this.  It is possible that further hourly accumulation values are present in AA2-4 of the ISD file for this station, but as we have not extracted those, they are not available to users of the HadISD at this time.
 
 
Figure 1 (top) precip1_depth, and (bottom) precip1_period for 724380-93819 (Indianapolis Airport).  This station has been merged from two ISD stations (99999-93819 and 724380-93819).  As the precipitation information is not quality controlled, likely erroneous observations like the ~160mm in the late 1970s are still present in the data files (using HadISD v2.0.2.2017p).

Therefore, for any given station in the HadISD, it is very likely that the period over which the precipitation depth has been accumulated is not constant over the entire record.  There may still be valid precip1_depth measurements present at each hourly timestamp, but these may be a combination of hourly and also 3, 6, 12, and 24 hourly measurements.  We advise users wishing to take advantage of the precipitation information to make plots like those in Figure 1 to check for themselves what data have been included.
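Alongside plots, a simple tabulation can show how the mix of accumulation periods changes through a record. The sketch below counts how often each period occurs in each year. The years and periods are invented for illustration, periods_by_year is a hypothetical helper, and we assume the year of each observation has already been extracted from the time axis.

```python
from collections import Counter, defaultdict


def periods_by_year(years, periods):
    """Count how many times each accumulation period occurs in each year."""
    counts = defaultdict(Counter)
    for year, period in zip(years, periods):
        if period is not None:  # skip timestamps with no period reported
            counts[year][period] += 1
    return {year: dict(c) for year, c in sorted(counts.items())}


# Made-up values: one entry per observation.
years = [1975, 1975, 1975, 1998, 1998, 1998]
precip1_period = [6, 24, 6, 1, 1, 6]

summary = periods_by_year(years, precip1_period)
for year, counts in summary.items():
    print(year, counts)
```

A year in which several periods appear side by side is a sign that the depths for that year should not be treated as a single homogeneous series.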

In light of this possible confusion from our netCDF variable names, we have taken, or will take, a number of actions:

1)    Added notes to the HadISD webpages to clarify our naming scheme and inform users about the need to use both the “precip1_depth” and “precip1_period” fields.  We have also improved the metadata for these two variables on the webpages.
2)    Improve the metadata of the “precip1_depth” and “precip1_period” fields in the netCDF files in the next update (v2.0.2.2017f).
3)    In the longer term, extract all four ISD precipitation fields where available, and attempt to disaggregate into 1, 3, 6, 12, and 24 hourly accumulation fields within the netCDF files.  However, it is unlikely that we will be doing any quality control on these data and so we will still advise caution when using them.


We note again that the precipitation information in HadISD is not quality controlled at the moment. 


Please do get in touch if you would like more information.

Tuesday, 23 January 2018

HadISD v2.0.2.2017p

We have just released version 2.0.2.2017p of HadISD on the Hadobs website.  The data now cover 1931/1/1 to 2017/12/31.

Downloading the data from the ISD finished on Monday 15th January and the quality control and other processes ran over the following days.

There are 8103 stations in this version of HadISD, a full 2000 more than in HadISD version 1.0.x.  However, there have been no changes to the quality control tests since v2.0.1.2016f.

As always, if you notice anything untoward in the dataset please do get in touch.  We intend to run a final version in a few months' time if there have been changes to the ISD data in 2017 or earlier years in the intervening time.

We hope to move to monthly updates during 2018, which will entail some minor changes to the QC code, but which should not impact the annual update methods.  We will post on here in due course when this project is nearing completion.

Tuesday, 22 August 2017

Digitisation and reporting resolution

A couple of years ago James Goldie (UNSW) contacted me about an issue he found in HadISD relating to the reporting resolution of temperature and humidity information for stations in Australia.

In the HadISD, the data vary between single-degree, half-degree and 1/10th degree resolution.  However, variations between these can cause some interesting striations in derived quantities.

James has written up his work, with some cool animated plots, on his blog.

Wednesday, 7 June 2017

High windspeed values

Thanks to Phil Jones (UEA) and colleagues for pointing out this issue.

A number of stations have wind speed values of 88 m/s, which stands out as a repeating value (see Figure 1).

Fig 1. Station 151080-99999 (Ceahlau Toaca, 46.983N, 25.950E, 1898.0m) showing the wind speeds and inhomogeneities (vertical lines).  The cluster of high values between 1991 and 2001 is clear (v2.0.1.2016f).

These may be the result of a mistyped missing data code in the original data.  It is also clear that this station may have rounding or conversion problems - we have not had the chance to investigate in detail so far.

The maximum wind speed used for the record check is 113.3m/s (derived from a maximum gust speed - https://wmo.asu.edu/content/world-maximum-surface-wind-gust), so this would not exclude these values.  The wind speeds are not passed through the distributional or frequent value checks as the shape of the distribution is not Gaussian and, to this point, these tests have been written assuming this shape.  Nor is the spike check applied.  Therefore, unfortunately, our QC suite is not (yet) clever enough to identify these erroneous values.
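For users who want to screen their own stations in the meantime, the sketch below shows why the record check lets 88 m/s through and how a repeated high value could be spotted by simple counting. This is a user-side diagnostic only and is not part of the HadISD QC suite; the cut-offs (high=50.0 and min_count=3), the helper names and the sample speeds are arbitrary assumptions chosen for illustration.

```python
from collections import Counter

MAX_WIND_SPEED = 113.3  # m/s, the record-check limit quoted above


def exceeds_record(speed, limit=MAX_WIND_SPEED):
    """True if a wind speed would be caught by a simple record check."""
    return speed > limit


def frequent_high_values(speeds, high=50.0, min_count=3):
    """Return high wind speeds that repeat at least min_count times.

    The 'high' and 'min_count' cut-offs are illustrative assumptions.
    """
    counts = Counter(s for s in speeds if s is not None and s >= high)
    return {value: n for value, n in counts.items() if n >= min_count}


# Made-up wind speeds (m/s) containing a repeated suspicious value.
speeds = [3.1, 88.0, 4.6, 88.0, 2.0, 88.0, 61.0, 5.1]

caught_by_record_check = exceeds_record(88.0)
suspicious = frequent_high_values(speeds)
```

Any value returned by such a count still needs inspecting by eye, as a genuinely windy site may legitimately repeat high values at coarse reporting resolution.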

At the current time we do not have a solution to these issues - we would rather make folks aware than try and implement a "quick fix" which causes issues elsewhere.  We will look into this during the course of this year and hope to roll out improvements to the wind QC in the next update.

The stations which have been noted as affected by repeated high values are:
151080-99999
156150-99999
156270-99999
228370-99999






Other stations have been noted to have one or a few high values.


Please do not hesitate to get in touch if you do spot any issues or would like more information on these.

Thursday, 9 February 2017

HadISD v2.0.1.2016p

We have just released HadISD version 2.0.1.2016p.  All plots and files should be on the website.  Since the release of v2.0.0.2015p in September, there have been no updates to years in the past.  The ISD raw data were downloaded on 19th January 2017 and processed over the following days.

The station selection was re-run, and so the station list has been updated, with 7877 stations now present in this version.  There have also been some minor changes to the quality control tests (affecting wind measurements) outlined below.  A file indicating which stations are new to HadISD and which are no longer included compared to v2.0.0 is available.

As a result of requests from users, in this version we have passed the wind speed observations through the spike check, and also the wind direction observations through the repeated values (streak) check.

The threshold values used to activate flagging in the spike check are calculated from the properties of the data themselves, using the distribution of differences between one observation and the next.
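To illustrate the general idea of a data-derived spike threshold, the sketch below sets the threshold to a multiple of the interquartile range of the absolute differences between consecutive observations, and flags a point when it jumps by more than the threshold and then immediately returns. The multiplier of 6, the use of the interquartile range and the "jump and return" rule are assumptions made for this example, and the function names are hypothetical; the exact procedure used in HadISD is set out in the dataset papers and code.

```python
import statistics


def spike_threshold(values, multiplier=6.0):
    """Threshold from the spread of first differences (illustrative choice)."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    return multiplier * (q3 - q1)


def flag_spikes(values, threshold):
    """Flag points that jump by more than threshold and then jump back."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        up = values[i] - values[i - 1]
        down = values[i + 1] - values[i]
        # A spike is a large step followed by a large step of opposite sign.
        if abs(up) > threshold and abs(down) > threshold and up * down < 0:
            flags[i] = True
    return flags


# Made-up wind speeds (m/s) with one obvious spike at index 8.
wind = [2.0, 2.5, 3.5, 3.5, 4.0, 5.0, 4.5, 4.5, 30.0, 4.0, 3.5,
        2.5, 2.5, 2.0, 1.0, 1.5, 2.5, 2.5, 3.0, 4.0, 4.5]

threshold = spike_threshold(wind)
flags = flag_spikes(wind, threshold)
spike_indices = [i for i, flagged in enumerate(flags) if flagged]
```

Using a quartile-based spread rather than the standard deviation keeps a handful of large spikes from inflating the threshold they are meant to be tested against.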

For the streak check, although the parameters are calculated using the distribution of repeated values, these are only used to flag values if they are less than the defaults used in HadISD versions 1.0.x.  We ensure that no calm periods are assessed when applying the streak check.  The default values depend on the resolution of the wind direction and are in the table below (see Table 4 in Dunn et al, 2012 for more information).

Wind Direction Streak Check

Resolution   Repeated     Repeated     Repeated   Repeated
(degrees)    Streak (h)   Streak (d)   Hours      Days
90           120          28           28         10
45           96           28           28         10
22           72           21           21         7
10           48           14           14         7
1            24           7            14         5
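The rule described above, that a calculated parameter is only used when it is less than the default, can be sketched as follows. Only the "Repeated Streak (h)" column of the table is included, and the function name streak_threshold and the example calculated values are hypothetical.

```python
# Default "Repeated Streak (h)" values from the table, keyed by the
# wind direction reporting resolution in degrees.
DEFAULT_STREAK_HOURS = {90: 120, 45: 96, 22: 72, 10: 48, 1: 24}


def streak_threshold(resolution, calculated):
    """Use the calculated threshold only if it is below the default."""
    default = DEFAULT_STREAK_HOURS[resolution]
    return min(calculated, default)


# Made-up calculated thresholds for a station reporting at 10 degrees.
tighter = streak_threshold(10, calculated=30)  # below the default of 48
capped = streak_threshold(10, calculated=60)   # above the default of 48
```

Here tighter keeps the calculated value of 30, while capped falls back to the default of 48, so the data-derived parameters can only make the check stricter than in HadISD versions 1.0.x.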



If you find something strange, do let us know using the contact details on the HadISD website.  Please note the stations which are known to have issues are documented on this blog and on the website.

The quality control code used in this version will be uploaded to the github repository in the coming days.

Friday, 16 September 2016

HadISD version 2.0.0.2015p

We submitted the HadISD version 2 paper to Climate of the Past in September 2015, with the dataset tentatively versioned as v2.0.0.2014p.  It went through the review process with a number of helpful comments being given by the two referees.  

Some of these comments concerned the way the stations were selected, and suggested adding more stringent tests to this part of our code.  We adjusted our code in light of these comments, but as we were then about to perform our standard update on HadISD version 1 (to 1.0.4.2015p), we waited until we were able to include data from 2015 in an updated version 2 as well before re-running our station selection code.

Another of the comments was that the paper would be best suited to a different journal, and a sister journal of Climate of the Past - Geoscientific Instrumentation, Methods and Data Systems (GI) - was suggested.

We re-ran the updated station selection code, all the quality control procedures, homogeneity assessment and comparison to HadISD.1.0.4.2015p to create HadISD.2.0.0.2015p.  We submitted our updated manuscript to GI on 17th March 2016.  

After a long time in discussions with the editorial team to explain why the manuscript was very similar to our Climate of the Past Discussion paper, the paper was published in GID on 12th September 2016.  The editorial team decided that, as this manuscript had already gone through one round of open peer review in an EGU journal, the discussion phase could be closed before the usual period had completed, on 13th September.  We are now in the process of submitting our final files to GI.

We have released HadISD.2.0.0.2015p on the HadOBS server at www.metoffice.gov.uk/hadobs/hadisd.  We expected further comments from the review process at GI, and had intended to update the dataset in light of these, creating v2.0.0.2015f.  However, this will now not happen, and the first release of HadISD2 will remain at v2.0.0.2015p.  We intend to perform annual updates on HadISD2 on similar timescales as for HadISD, and so expect the next update to be in January 2017 to create v2.0.1.2016p.

Please read the papers on HadISD version 2 before using this dataset and also get in touch if you have any concerns with the data so we can check and address them if necessary.

The major change in going from version 1 to version 2 is the increase in the timespan covered by the dataset, back to 1931.  We have also refreshed the station selection criteria, and there are 7677 unique station IDs present in this version; but this will change on each new release.  A number of the QC tests have had minor tweaks, and we have also improved the level of QC applied to wind speed and direction observations.

Update - 29/09/2016

The paper describing HadISD.2.0.0.2015p has been published in Geoscientific Instrumentation, Methods and Data Systems.   As stated above, the data are available on the HadOBS server.

Monday, 1 August 2016

HadISD available at CEDA/BADC with updated licence

We have been working with the British Atmospheric Data Centre (BADC) over the past year or so to make some of the datasets currently available on the HadOBS website also available through the BADC Centre for Environmental Data Analysis (CEDA) data archive as part of the CLIPC project.

HadISD v1.0.3.2014f and v1.0.4.2015p are now available through that portal.  Not all the facets available in the netCDF files hosted by HadOBS are included in these data, but the climate variables and quality information are present.

We intend that CEDA becomes an alternate route for users to access our data, and that users will also be able to use some of their download tools (e.g. via OpenDAP).

As part of this process we have updated the licence under which HadISD is released.  This also ensures that the data comply with the licence issued by NOAA/NCEI on the ISD dataset.  The new licence is as follows:

"HadISD is distributed under the Non Commercial Government Licence. The data are available for non-commercial use with attribution to the data providers. Please cite Dunn et al (2012). This product may contain data which are governed by WMO Policy following WMO Resolution 40 Annex 1 alongside additional data that may have restrictions placed on their commercial use by the data owners. Any redistribution of this product should be accompanied by a similar statement of usage policy."

Any users who have concerns about what HadISD can be used for are encouraged to get in touch to discuss their project.