Thursday, 9 February 2017

HadISD v2.0.1.2016p

We have just released HadISD version 2.0.1.2016p. All plots and files should be on the website. Since the release of v2.0.0.2015p in September, there have been no updates to the data for earlier years. The ISD raw data were downloaded on 19th January 2017 and processed over the following days.

The station selection was re-run, so the station list has been updated and 7877 stations are now present in this version. There have also been some minor changes to the quality control tests (affecting wind measurements), outlined below. A file indicating which stations are new to HadISD and which are no longer included compared to v2.0.0 is available.

As a result of requests from users, in this version we have passed the wind speed observations through the spike check, and also the wind direction observations through the repeated values (streak) check.

The threshold values used to activate flagging in the spike check are calculated from the properties of the data themselves, using the distribution of differences between one observation and the next.
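As a rough illustration of a data-derived threshold, the sketch below assumes the threshold is a fixed multiple of the interquartile range (IQR) of the differences between consecutive valid observations, and flags only single-point spikes (a large jump in followed by a large jump out in the opposite direction). The IQR scaling, the multiplier of 6, the floor and the function names are all hypothetical; see Dunn et al, 2012 for the actual test.

```python
import numpy as np

def spike_threshold(values, multiplier=6.0, floor=1.0):
    """Data-derived threshold: a multiple of the IQR of consecutive differences.

    The IQR scaling, multiplier and floor are illustrative assumptions,
    not published HadISD parameters."""
    v = np.asarray(values, dtype=float)
    diffs = np.diff(v[~np.isnan(v)])  # one valid observation to the next
    q25, q75 = np.percentile(diffs, [25, 75])
    return max(multiplier * (q75 - q25), floor)

def flag_spikes(values, threshold):
    """Flag single-point spikes: jumps into and out of an observation that
    both exceed the threshold and have opposite signs."""
    v = np.asarray(values, dtype=float)
    flags = np.zeros(v.size, dtype=bool)
    jump_in = v[1:-1] - v[:-2]
    jump_out = v[2:] - v[1:-1]
    flags[1:-1] = ((np.abs(jump_in) > threshold)
                   & (np.abs(jump_out) > threshold)
                   & (np.sign(jump_in) != np.sign(jump_out)))
    return flags
```

Because the threshold scales with the spread of the station's own differences, a gusty, variable site tolerates larger jumps than a sheltered one.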

For the streak check, although the parameters are calculated using the distribution of repeated values, these are only used to flag values if they are less than the defaults used in HadISD versions 1.0.x. We ensure that calm periods are not assessed when applying the streak check. The default values depend on the resolution of the wind direction and are given in the table below (see Table 4 in Dunn et al, 2012 for more information).

Wind Direction Streak Check (default values)

Resolution (degrees)   Repeated Streak (h)   Repeated Streak (d)   Repeated Hours   Repeated Days
90                     120                   28                    28               10
45                     96                    28                    28               10
22                     72                    21                    21               7
10                     48                    14                    14               7
1                      24                    7                     14               5
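A minimal sketch of how the data-derived and default limits might combine for wind direction. It takes the "Repeated Streak (h)" column above and reads it as a maximum streak duration in hours, which is an assumption, as is treating a wind speed of zero as calm. The data-derived limit is passed in rather than computed, and all names are hypothetical.

```python
import numpy as np

# "Repeated Streak (h)" column of the table above, keyed by wind direction
# resolution in degrees. Reading it as a streak duration in hours is an
# assumption made for this sketch.
DEFAULT_STREAK_HOURS = {90: 120, 45: 96, 22: 72, 10: 48, 1: 24}

def effective_limit(data_derived_hours, resolution_deg):
    """Use the data-derived limit only when it is smaller than the default."""
    return min(data_derived_hours, DEFAULT_STREAK_HOURS[resolution_deg])

def flag_direction_streaks(times_h, directions, speeds, limit_hours):
    """Flag runs of identical wind direction lasting at least limit_hours.

    Calm observations (speed <= 0, an assumed definition of calm) are never
    flagged and break any run, so calm periods are not assessed."""
    t = np.asarray(times_h, dtype=float)
    d = np.asarray(directions, dtype=float)
    s = np.asarray(speeds, dtype=float)
    flags = np.zeros(d.size, dtype=bool)
    start = 0
    while start < d.size:
        if not s[start] > 0 or np.isnan(d[start]):
            start += 1  # skip calm or missing observations
            continue
        end = start
        # extend the run while the next observation is non-calm with the same direction
        while end + 1 < d.size and s[end + 1] > 0 and d[end + 1] == d[start]:
            end += 1
        if t[end] - t[start] >= limit_hours:
            flags[start:end + 1] = True
        start = end + 1
    return flags
```

For example, with 10-degree resolution a data-derived limit of 30 hours would be used, while one of 200 hours would fall back to the 48-hour default.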

If you find anything strange, do let us know using the contact details on the HadISD website. Please note that stations known to have issues are documented on this blog and on the website.

The quality control code used in this version will be uploaded to the GitHub repository in the coming days.

Friday, 16 September 2016

HadISD version 2.0.0.2015p

We submitted the HadISD version 2 paper to Climate of the Past in September 2015, with the dataset tentatively versioned as v2.0.0.2014p. It went through the review process, and the two referees made a number of helpful comments.

Some of these comments concerned the way the stations were selected, and suggested adding more stringent tests to this part of our code. We adjusted our code in light of these comments, but as we were then about to perform our standard update of HadISD version 1 (to 1.0.4.2015p), we waited until we could include data from 2015 in an updated version 2 as well before re-running our station selection code.

Another comment was that the paper would be better suited to a different journal, and a sister journal of Climate of the Past, Geoscientific Instrumentation, Methods and Data Systems (GI), was suggested.

We re-ran the updated station selection code, all the quality control procedures, homogeneity assessment and comparison to HadISD.1.0.4.2015p to create HadISD.2.0.0.2015p.  We submitted our updated manuscript to GI on 17th March 2016.  

After lengthy discussions with the editorial team explaining why the manuscript was very similar to our Climate of the Past Discussions paper, the paper was published in GID on 12th September 2016. As this manuscript had already gone through one round of open peer review in an EGU journal, the editorial team decided that the discussion phase could be closed on 13th September, before the usual period had elapsed. We are now in the process of submitting our final files to GI.

We have released HadISD.2.0.0.2015p on the HadOBS server at www.metoffice.gov.uk/hadobs/hadisd. We expected further comments from the review process at GI, and had intended to update the dataset in light of these, creating v2.0.0.2015f. However, this will now not happen, and the first release of HadISD2 will remain at v2.0.0.2015p. We intend to perform annual updates of HadISD2 on similar timescales to those for HadISD version 1, and so expect the next update to be in January 2017, creating v2.0.1.2016p.

Please read the papers on HadISD version 2 before using this dataset and also get in touch if you have any concerns with the data so we can check and address them if necessary.

The major change when going from version 1 to version 2 is the extension of the timespan covered by the dataset back to 1931. We have also refreshed the station selection criteria, and there are 7677 unique station IDs present in this version, though this number will change with each new release. A number of the QC tests have had minor tweaks, and we have also improved the level of QC applied to wind speed and direction observations.

Update - 29/09/2016

The paper describing HadISD.2.0.0.2015p has been published in Geoscientific Instrumentation, Methods and Data Systems.   As stated above, the data are available on the HadOBS server.

Monday, 1 August 2016

HadISD available at CEDA/BADC with updated licence

We have been working with the British Atmospheric Data Centre (BADC) over the past year or so to make some of the datasets currently available on the HadOBS website also available through the Centre for Environmental Data Analysis (CEDA) data archive, as part of the CLIPC project.

HadISD v1.0.3.2014f and v1.0.4.2015p are now available through that portal.  Not all the facets available in the netCDF files hosted by HadOBS are included in these data, but the climate variables and quality information are present.

We intend CEDA to become an alternative route for users to access our data, which will also allow them to use some of CEDA's download tools (e.g. via OPeNDAP).

As part of this process we have updated the licence under which HadISD is released.  This also ensures that the data comply with the licence issued by NOAA/NCEI on the ISD dataset.  The new licence is as follows:

"HadISD is distributed under the Non Commercial Government Licence. The data are available for non-commercial use with attribution to the data providers. Please cite Dunn et al (2012). This product may contain data which are governed by WMO Policy following WMO Resolution 40 Annex 1 alongside additional data that may have restrictions placed on their commercial use by the data owners. Any redistribution of this product should be accompanied by a similar statement of usage policy."

Any users who have concerns about what HadISD can be used for are encouraged to get in touch to discuss their project.  

Friday, 22 July 2016

HadISD 1.0.4.2015f

We have just released HadISD version 1.0.4.2015f.  All plots and files should be on the website. This version  supersedes the preliminary version from earlier this year (v1.0.4.2015p).  There have been further updates to the ISD source data for the years 2007-10, 2012-13 and 2015 since the preliminary dataset was created in January, but no changes in other years. 

The raw data were downloaded on 17th June 2016 and processed over the last month. We have made no substantial changes to the code that converts the data to netCDF files or to the Quality Control suite, so the only change to the version number is from preliminary to final.

This version still contains 6103 stations, with 4050 passing the final filtering checks, down slightly from the 4060 in v1.0.3.2014f (see the HadISD paper Section 6).  The patterns of flagging are very similar to v1.0.3.2014f.  However if you find something strange, do let us know using the contact details on the HadISD website.  Please note the stations which are known to have issues are documented on this blog and on the website.

Fig. 1 The fraction of all temperature records flagged for each station
Fig. 2 The fraction of all dewpoint temperature records flagged for each station
Fig. 3 The fraction of all sea-level pressure records flagged for each station

The Homogeneity information for this version is also available on the website using the same procedure (PHA) as outlined in Dunn et al, 2014.

As always, if you see anything untoward in the data or are having problems using it, please do not hesitate to get in touch.

Licencing

The licence under which HadISD is released has been changed to a Non Commercial Government Licence. Full details are on the HadISD website. Please ensure you read and follow the terms of this licence before using the data.

Monday, 25 January 2016

HadISD.1.0.4.2015p

We have just released HadISD version 1.0.4.2015p. All plots and files should be on the website. This update extends the coverage of the dataset to the end of 2015 (31 December at 2300, inclusive). It remains a preliminary dataset, as there could still be further updates to the ISD dataset in the next few months. We hope to do a processing run for the final version some time around Easter (to create 1.0.4.2015f).

The raw data were downloaded on 7th January 2016 and processed over the subsequent days. There have been changes to all of the raw files in 2012-2014 as part of the normal ISD update process. We have made no substantial changes to the code that converts the data to netCDF files or to the Quality Control suite. Hence the version number has only incremented by 0.0.1 and the year. Any updates to these systems will be included in the future HadISD.2.0.0.
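The version strings in these posts follow a consistent pattern: three version numbers, then the final year of data (v1.0.4.2015p covers data to the end of 2015), then "p" for preliminary or "f" for final. A hypothetical helper that splits such a string into its parts might look like this; it is not part of the HadISD code.

```python
import re
from typing import NamedTuple

class HadISDVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    year: int    # final year of data covered by the release
    final: bool  # True for 'f' (final), False for 'p' (preliminary)

# Accepts forms seen on this blog, e.g. "HadISD.1.0.4.2015p", "v2.0.1.2016p", "1.0.4.2015f"
_VERSION = re.compile(r"(?:HadISD\.|v)?(\d+)\.(\d+)\.(\d+)\.(\d{4})([pf])")

def parse_version(text: str) -> HadISDVersion:
    m = _VERSION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a HadISD version string: {text!r}")
    major, minor, patch, year = (int(g) for g in m.groups()[:4])
    return HadISDVersion(major, minor, patch, year, m.group(5) == "f")
```

Going from 1.0.3.2014f to 1.0.4.2015p, for instance, bumps the patch number and the year and resets the status to preliminary.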

This version still contains 6103 stations, with 4049 passing the final filtering checks, down slightly from the 4060 in v1.0.3.2014f (see the HadISD paper Section 6).  The patterns of flagging are very similar to v1.0.3.2014f.  However if you find something strange, do let us know using the contact details on the HadISD website.  Please note the stations which are known to have issues are documented on this blog and on the website.

Fig. 1 The fraction of all temperature records flagged for each station
Fig. 2 The fraction of all dewpoint temperature records flagged for each station
Fig. 3 The fraction of all sea-level pressure records flagged for each station

The Homogeneity information for this version is also available on the website using the same procedure (PHA) as outlined in Dunn et al, 2014.

As always, if you see anything untoward in the data or are having problems using it, please do not hesitate to get in touch.

Wednesday, 30 September 2015

HadISD version 2

The paper describing HadISD version 2.0.0 has just appeared in the Discussions section of Climate of the Past:

http://www.clim-past-discuss.net/11/4569/2015/cpd-11-4569-2015.html

There now follows an 8-week review process. Two anonymous referees will be asked to make comments, which will appear online, and anyone can also make attributed comments (i.e. under their own name), which will also appear. After that we will have the chance to respond (our responses will also be published), and the final paper will appear thereafter. Once all that is done, we can release the dataset and the quality control code. Hopefully this will all be complete before the end of the year, so that I can also run an update in January to v2.0.1.2015p.

For those of you who follow this blog, a number of the sections in the paper will be familiar. The gist of the paper is the expansion of HadISD's time coverage from 1973 back to 1931. At the same time, we have readdressed the way stations are selected and merged, and so v2.0.0.2014f has 8113 stations, around 2000 of which are composites.

As part of the creation of HadISD.2.0.0, we have also rewritten all the code in Python for ease of use, and in doing so we were able to check, and in some cases alter, some of the QC tests so that they work a bit better. We have also added new checks for wind speed and direction.

We believe that, as a result of these changes, HadISD.2.0.0 is a more useful dataset for the study of extreme events, and also for model validation, ingestion into reanalyses and many other applications.

Update: January 2016

After some useful review comments from the referees and a discussion with the editorial team, it was suggested that we re-submit this paper to Geoscientific Instrumentation, Methods and Data Systems, a partner journal of Climate of the Past. As we are currently updating HadISD.1.0.4, we will do this once all the annual dataset updates are complete, and at the same time update HadISD.2.0.0 to include data from 2015. We aim to resubmit in early spring.

 

Wednesday, 29 April 2015

Neighbour (buddy) Check for v2.0.0

In HadISD v1.0.x, the neighbour check selects stations within 500m height and 300km distance of the target station. The bearing is also used to assign stations to quadrants (90-degree bins), and the closest 10 are chosen, ensuring that each quadrant contains at least two stations. When fewer neighbours are available, the distribution of stations across the quadrants can be lopsided.
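The geometric part of this selection can be sketched as below, using the haversine great-circle distance and the initial bearing from the target. The station dictionaries, the function names and the choice of quadrants starting at north (0-90, 90-180, 180-270, 270-360 degrees) are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (km, haversine) and initial bearing (degrees clockwise from north)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    return dist, bearing

def candidate_neighbours(target, stations, max_km=300.0, max_dz_m=500.0):
    """Return (id, distance_km, quadrant) for stations within the height and
    distance limits, nearest first. Stations are dicts with hypothetical keys
    'id', 'lat', 'lon', 'elev' (metres)."""
    out = []
    for s in stations:
        if s["id"] == target["id"] or abs(s["elev"] - target["elev"]) > max_dz_m:
            continue
        dist, bearing = distance_and_bearing(target["lat"], target["lon"], s["lat"], s["lon"])
        if dist <= max_km:
            out.append((s["id"], dist, int(bearing // 90)))  # quadrant 0 = N to E, ..., 3 = W to N
    return sorted(out, key=lambda c: c[1])
```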

For HadISD v2.0.0, I wanted to improve the neighbour selection, because a neighbour that is close is not necessarily very useful when running the buddy checks. So the new neighbour selection also uses the correlation coefficient between the target and neighbour time series, as well as the data overlap (very important in early years). The current details of both are as follows.

Initially, stations are selected on the basis of distance, to ensure that the neighbours experience similar weather to the target. Then the correlation of the two time series is obtained. However, so that the correlation is not dominated by the annual or diurnal cycle, the time series are first processed to remove these. Firstly, daily means are calculated for all days which have more than 6 observations, and these are used to create the climate anomalies for each observation. To further remove the diurnal cycle, hourly means are calculated and used to create "anomalised climate anomalies" for each observation. These time series are used to calculate the correlation coefficients.
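A sketch of the two-stage anomaly calculation and correlation using pandas. "Climate anomalies" is interpreted here as departures from a calendar-day climatology built from the daily means (days with more than 6 observations only), and the diurnal step as subtracting the mean anomaly at each hour of the day; both readings are assumptions, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd

def double_anomalies(series: pd.Series) -> pd.Series:
    """Remove annual and diurnal cycles from an hourly series (DatetimeIndex, NaN = missing).

    Assumed interpretation: subtract a calendar-day climatology of daily means,
    then subtract the mean anomaly for each hour of the day."""
    s = series.dropna()
    # daily means, kept only for days with more than 6 observations
    daily = s.groupby(s.index.normalize()).agg(["mean", "count"])
    daily = daily[daily["count"] > 6]
    # climatological daily mean for each day of the year
    clim = daily["mean"].groupby(daily.index.dayofyear).mean()
    anom = s - clim.reindex(s.index.dayofyear).to_numpy()
    # remove the diurnal cycle: subtract the mean anomaly at each hour of the day
    hourly = anom.groupby(anom.index.hour).transform("mean")
    return anom - hourly

def neighbour_correlation(target: pd.Series, neighbour: pd.Series) -> float:
    # Series.corr aligns the two series on their timestamps and skips NaN pairs
    return double_anomalies(target).corr(double_anomalies(neighbour))
```

If two stations share the same weather but have annual and diurnal cycles in opposite phase, their raw series are strongly anti-correlated; once the cycles are removed, the shared weather dominates and the correlation becomes strongly positive.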

The reason for using the data overlap as another criterion is the lengthened data coverage of HadISD v2.0.0. Few stations will have coverage over the entire 1931-2014 period, so it is highly likely that neighbours selected on distance alone would have no concurrent data. I use the fraction of the target's observations that are also present in the neighbour as the overlap value.
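The overlap value could be computed as follows, assuming both stations have been placed on a common hourly time axis with NaN marking missing observations. The function name is hypothetical.

```python
import numpy as np

def overlap_fraction(target, neighbour):
    """Fraction of the target's valid observations that also have a valid
    neighbour observation at the same time step (common time axis assumed)."""
    t_ok = ~np.isnan(np.asarray(target, dtype=float))
    n_ok = ~np.isnan(np.asarray(neighbour, dtype=float))
    n_target = t_ok.sum()
    if n_target == 0:
        return 0.0  # no target data, so nothing can overlap
    return float((t_ok & n_ok).sum() / n_target)
```

Note that the measure is asymmetric: it is normalised by the target's record, so a long-running neighbour always scores highly against a short target, but not the other way round.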

The neighbours are then sorted by the linear combination of the correlation coefficient and the overlap fraction, and the top 10 are selected, again ensuring that there are at least two in each quadrant if possible.
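A sketch of the ranking and quadrant-aware selection. It assumes equal weights for the correlation and overlap terms (the weights are not given here) and a greedy approach: the best two stations in each quadrant are secured first, then the remaining slots are filled purely by score. The tuple layout and function name are hypothetical.

```python
def select_neighbours(candidates, n_select=10, min_per_quadrant=2,
                      w_corr=1.0, w_overlap=1.0):
    """candidates: list of (station_id, quadrant 0-3, correlation, overlap) tuples.
    Returns the ids of the selected neighbours, best combined score first."""
    def score(c):
        # linear combination of correlation and overlap (weights are assumptions)
        return w_corr * c[2] + w_overlap * c[3]

    ranked = sorted(candidates, key=score, reverse=True)
    chosen = []
    # first secure the best-scoring stations in each quadrant, where available
    for q in range(4):
        chosen.extend([c for c in ranked if c[1] == q][:min_per_quadrant])
    # then fill the remaining slots by score alone
    for c in ranked:
        if len(chosen) >= n_select:
            break
        if c not in chosen:
            chosen.append(c)
    return [c[0] for c in sorted(chosen, key=score, reverse=True)[:n_select]]
```

In this sketch, a poorly correlated station can still be selected if it is one of only a few in its quadrant, which mirrors the aim of keeping neighbours spread around the target.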

In a perfect world (or at least one with infinite computing resources), I would select all stations within the 500m height--300km distance criteria and calculate the correlations and overlaps for all.  However, this takes a while (it probably could be faster, but at some level, lots of file-read operations have to occur) and it is important that this dataset can be quality controlled within a reasonable time frame.  Therefore, at the moment, only the nearest 20 stations are assessed for their correlation and overlap with the target.  

The process appears to take around 5 minutes per station on a ~2GHz processor, so 28 days of processing for 8000 stations if run on just one processor. I'm hoping to use many more than just one to do my bidding!
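The runtime estimate works out as follows. The parallel figure assumes perfect scaling across P processors, which is an idealisation; file reads and uneven station lengths would make real gains smaller.

```latex
\begin{align*}
5\ \text{min} \times 8000 &= 40\,000\ \text{min} \approx 666.7\ \text{h} \approx 27.8\ \text{days},\\
T_P &\approx \frac{27.8}{P}\ \text{days}, \qquad T_{16} \approx 1.7\ \text{days}.
\end{align*}
```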