---------------------------------------------------------------
Free NWS data found here:
http://cdo.ncdc.noaa.gov/cgi-bin/climatenormals/climatenormals.pl?directive=prod_select2&prodtype=CLIM2002&subrnum=
http://cdo.ncdc.noaa.gov/climatenormals/clim20-02/NWS_SNOW_MNFALL_fmt.dat
http://cdo.ncdc.noaa.gov/climatenormals/clim20-02/NWS_SNOW_MNFALL_dly.dat
-----
After plotting the data several ways, I noticed an oddity in it
and sent the following email to NOAA:
-----------------------------------------------------------------------------------
From: Robert Allison
Sent: Friday, December 18, 2009 11:59 AM
To: ncdc.info@noaa.gov
Cc: Robert Allison
Subject: Your 30-year Snowfall data has a bad problem...
Importance: High
NCDC/NOAA, Please forward this to the folks in charge of the “snowfall” data…
Summary: It appears your 30-year daily average snowfall data for the individual weather stations has a
very bad corruption/bias towards the beginning/end of the month. See the 2 attached graphs, and
see details below…
-----
I was viewing your Snowfall data on the following page:
http://cdo.ncdc.noaa.gov/cgi-bin/climatenormals/climatenormals.pl?directive=prod_select2&prodtype=CLIM2002&subrnum=
And specifically these tables of data:
http://cdo.ncdc.noaa.gov/climatenormals/clim20-02/NWS_SNOW_MNFALL_fmt.dat
I noticed that for many of the weather stations, there were “mysteriously” high snowfall amounts at
the beginning/end of the month, when there was little/no snowfall for several days before/after –
this seemed very odd & unlikely.
I imported the raw data (http://cdo.ncdc.noaa.gov/climatenormals/clim20-02/NWS_SNOW_MNFALL_dly.dat)
into our SAS software, and did some various plots of the data, and
the plots vividly show that there is a serious spike/bias toward having values at the beginning/end of
the month (and occasionally on the 15th/middle of the month), where there are no (or much lower)
snow values for the days before/after. This is most evident for the locations with somewhat sparse
snow, therefore my plots show the data for locations where the maximum snowfall is less than
2/10ths of an inch.
(see 2 attached plots)
The following file (http://cdo.ncdc.noaa.gov/climatenormals/clim20-02/normalsnwssnow.pdf ) has a
“computational methodology” section which indicates that …
“Daily snowfall and snow depth values are not simple means of the observed daily values.
They are interpolated from the much less variable monthly normals by use of the natural
spline function (Greville, 1967). The procedure involved constructing a cumulative
series of monthly sums from the monthly normals. The cumulative series was for a 24-
month period (July, August, …, December, January, …, December, January, …, June), so
that the interpolating function could adequately fit the end points in the annual series.”
I suspect this technique is not a good one to use on this data, or the technique was incorrectly applied
to the data(?) Otherwise, perhaps the raw data is biased/corrupted(?)
Whatever the underlying cause of the corruption, the end result is that it makes it look like (for
example) there is snow on March 1st and April 1st, when there is no snow for the weeks prior & after
those dates. I would recommend re-running this data, and using a technique that does not bias the
values towards the beginning of the month like this (imho, a simple daily mean would be more useful
than these biased summaries).
------------------------------------------------------------------------------------
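A minimal sketch of the kind of check described in the email above, written in Python rather than the SAS used for the original plots. The function name isolated_snow_days, the quiet_days parameter, and the synthetic year of values are illustrative assumptions. The layout of NWS_SNOW_MNFALL_dly.dat is not reproduced here, so the input is simply a list of 365 daily values in calendar order.

```python
from datetime import date, timedelta


def isolated_snow_days(daily, quiet_days=3):
    """Return the dates whose value is positive while the `quiet_days`
    days on each side are all zero.

    `daily` is assumed to hold 365 values, 1 January to 31 December.
    Neighbours wrap around the end of the year.
    """
    n = len(daily)
    start = date(2001, 1, 1)  # any non-leap year; only month/day are used
    flagged = []
    for i, value in enumerate(daily):
        if value <= 0:
            continue
        neighbours = [
            daily[(i + k) % n]
            for k in range(-quiet_days, quiet_days + 1)
            if k != 0
        ]
        if all(v == 0 for v in neighbours):
            flagged.append(start + timedelta(days=i))
    return flagged


# Synthetic example (made-up values, not NOAA data): a single 0.1" value
# on 1 March and on 1 April, with no snow anywhere else in the year.
year = [0.0] * 365
year[(date(2001, 3, 1) - date(2001, 1, 1)).days] = 0.1
year[(date(2001, 4, 1) - date(2001, 1, 1)).days] = 0.1

flagged = isolated_snow_days(year)
print([(d.month, d.day) for d in flagged])
```

If most of the flagged dates for a station fall on the 1st, the 15th, or the last day of a month, that matches the pattern visible in the plots.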
I received the following reply:
From: Tom.Whitehurst [Tom.Whitehurst@noaa.gov]
Sent: Wednesday, January 20, 2010 1:47 PM
To: Robert Allison
Subject: Snow Climatology Concerns
The information below came from the product developer and confirms your
suspicions about daily snowfall problems for stations that have very
little snow. We will be posting a notice on our snowfall normals web
access page alerting customers to the problems in the data.
Thank you for bringing this to our attention.
Tom Whitehurst
The issue is driven by a subroutine that returns daily snowfall values for
months with totals that are less than the number of days in a month times 0.1"
(e.g., January monthly totals less than 3.1").
For such months, the previous and next month's monthly totals are evaluated.
If the previous month's total is greater than the next month's total, then the
daily values are distributed at the beginning of the month (and vice-versa).
If the previous and next month's values are the same, then the daily values
are distributed in the middle of the month.
Case 1: Huntsville, AL, February
January: 1.3" and March: 0.4" (previous greater than next, distribution at
beginning of month):
DAY  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 MONTH
------------------------------------------------------------------------------------------------------
FEB  1  1  1  1  1  1  1  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T               7
Case 2: Tuscaloosa, AL, November
October: 0" and December: 0.2" (previous less than next, distribution at end
of month):
DAY  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 MONTH
------------------------------------------------------------------------------------------------------
NOV  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  T  1         1
Case 3: Montgomery, AL, February
January: 0.2" and March: 0.2" (previous equal to next, distribution in
middle of month):
Mean Snowfall (tenths of an inch, T = Trace)
DAY  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 MONTH
------------------------------------------------------------------------------------------------------
FEB  T  0  0  0  0  0  0  0  0  0  0  0  0  T  1  T  0  0  0  0  0  0  0  0  0  0  0  T               1
Obviously, there are some real deficiencies to this approach, and it should be
identified as problematic to the public in the context of the arbitrary nature
of daily snowfall values generated from a spline fit to begin with.
-----
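The placement rule described in the reply can be sketched roughly as follows. This is not NOAA's subroutine, which the reply does not show. The function name place_low_snow_month, the one-tenth-per-day allocation, and the centering formula are assumptions inferred from the three cases above, and trace (T) values are not modeled at all. All totals are in tenths of an inch.

```python
def place_low_snow_month(total_tenths, days_in_month, prev_tenths, next_tenths):
    """Return daily values (tenths of an inch) for a low-snow month, or
    None when the month total is at or above the threshold.

    Assumption: the month total is handed out as one tenth per day.
    """
    # Threshold from the reply: total < (days in month) x 0.1 inch,
    # e.g. less than 3.1" (31 tenths) for January.
    if total_tenths >= days_in_month:
        return None

    daily = [0] * days_in_month
    if prev_tenths > next_tenths:
        start = 0                                         # beginning of month
    elif prev_tenths < next_tenths:
        start = days_in_month - total_tenths              # end of month
    else:
        start = (days_in_month - total_tenths + 1) // 2   # middle (assumed centering)

    for day_index in range(start, start + total_tenths):
        daily[day_index] = 1
    return daily


# The three cases from the reply (index 0 is day 1 of the month).
huntsville_feb = place_low_snow_month(7, 28, prev_tenths=13, next_tenths=4)
tuscaloosa_nov = place_low_snow_month(1, 30, prev_tenths=0, next_tenths=2)
montgomery_feb = place_low_snow_month(1, 28, prev_tenths=2, next_tenths=2)

print(huntsville_feb[:8])   # days 1-8
print(tuscaloosa_nov[-2:])  # days 29-30
print(montgomery_feb[13:16])  # days 14-16
```

Under these assumptions the whole tenths land on days 1-7 for Huntsville, day 30 for Tuscaloosa, and day 15 for Montgomery, which agrees with the tables in the reply. It also shows why isolated values pile up at month boundaries: the placement depends only on the neighboring monthly totals, not on when snow was actually observed.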