Historical Temperature and Precipitation Data for Philadelphia, PA
These data are taken from the Franklin Institute's weather
website. This site states that the Franklin Institute "has been an observing site for the National
Weather Service since 1993." However, their weather station is currently located on the roof of their
building in downtown Philadelphia – a very poor choice. There are overlapping data
files starting in 1993. This application uses the historical file
through 1999, to which are appended the
yearly files starting in 2000.
The data source(s) for the
historical data file are not given. The "official" Philadelphia weather site has
moved several times since the 1870's. It was moved to Philadelphia International Airport, another
poor choice for climate research purposes, in 1948. The lack of documentation about sources is
a problem for using these data in climate research – a problem certainly not unique to Philadelphia.
Some formatting
differences in the Franklin Institute files have been resolved in the file used here.
Inches are used for snow and rain and temperatures are in °F; these are the standard reporting
units for US weather stations even though conversions to metric units and °C are sometimes
made in datasets used for scientific purposes. Trace amounts of precipitation (rain
and snow) are indicated by a value of -1 rather than T or trace. Prior to 2002, the Franklin Institute files
expressed rain in hundredths of an inch (that is,
a value of 72 means 0.72") and snow values in tenths of an inch (10 means 1" of snow). In the file used here,
precipitation and snow (including accumulated snow starting in 2002) are always given in inches.
Missing values are always given as -999. All years contain 366 days, with -999
entered for February 29
in non-leap years. Snowfall reporting begins in October 1884. Snow on the ground reporting begins in 2002;
these data often seem inconsistent and they are not used in this application.
The format is:
month day year max_T min_T rain(") snow(") snow_depth(")
1 1 1872 -999 -999 -999 -999 -999
1 2 1872 -999 -999 -999 -999 -999
...
12 30 2012 35 32 0.22 0.4 0.4
12 31 2012 38 30 0 0 0
where the values are separated by spaces or tabs. The only data in 1872 are precipitation starting April 1.
Note that data from other sites could be accessed with this application (although the
earliest year with data will probably be different). The 8th column (snow depth) isn't used, so it doesn't
have to be included in the file. Without making changes to the code, the data file must be named
philtemp.txt (case-sensitive).
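A file in this format can be read with a few lines of code. The following is a minimal Python sketch, not the application's own PHP code. The names DayRecord, parse_line, and read_records are hypothetical. It assumes that any columns after the seventh can be ignored, that -999 should be turned into None, and that the trace flag -1 should be kept as it is so the caller can decide how to treat it.

```python
from typing import NamedTuple, Optional

MISSING = -999.0   # missing-value flag used in the file
TRACE = -1.0       # trace-precipitation flag used in the file


class DayRecord(NamedTuple):
    month: int
    day: int
    year: int
    max_t: Optional[float]   # °F, None when missing
    min_t: Optional[float]   # °F, None when missing
    rain: Optional[float]    # inches, None when missing, TRACE for a trace
    snow: Optional[float]    # inches, same conventions as rain


def _value(field: str) -> Optional[float]:
    """Convert one field to a float, mapping -999 to None."""
    number = float(field)
    return None if number == MISSING else number


def parse_line(line: str) -> Optional[DayRecord]:
    """Parse one data line; return None for header, '...' or blank lines."""
    fields = line.split()          # splits on any run of spaces or tabs
    if len(fields) < 7:
        return None
    try:
        month, day, year = (int(f) for f in fields[:3])
        max_t, min_t, rain, snow = (_value(f) for f in fields[3:7])
    except ValueError:
        return None                # non-numeric line, e.g. the column header
    return DayRecord(month, day, year, max_t, min_t, rain, snow)


def read_records(path: str = "philtemp.txt") -> list[DayRecord]:
    """Read every parseable line of the data file."""
    with open(path) as handle:
        return [rec for rec in map(parse_line, handle) if rec is not None]
```

Because every year contains 366 lines, a program that works with calendar dates will probably want to drop the February 29 placeholder lines in non-leap years after reading the file.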
The dataset as downloaded from the Franklin Institute website contained only a handful of days with missing data. Because
the source of the data is not known, it is also not known whether missing data have sometimes been replaced with estimated
values. But, for the very few remaining missing days, temperature values have been calculated as the average of the temperatures from
the day before and after. There were only two days on which it seemed likely that missing
precipitation values might have been
non-zero, and these values have simply been estimated based on the day before and after. No indication about when missing
data have been replaced with estimated
values is given in these data files.
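The averaging rule for missing temperatures can be sketched in Python as follows. This is only an illustration of the rule described above, not the procedure actually used to prepare the file. The function name fill_isolated_gaps is hypothetical, missing values are assumed to be represented as None, and the February 29 placeholders for non-leap years are assumed to have been removed first so that neighboring list entries really are consecutive days. The source does not say how gaps longer than one day were handled, so this sketch leaves them unfilled.

```python
from typing import Optional


def fill_isolated_gaps(values: list[Optional[float]]) -> list[Optional[float]]:
    """Replace an isolated None with the mean of the values on either side.

    Gaps at the ends of the series, or gaps longer than one day, are left
    as None because no pair of known neighboring values exists.
    """
    filled = list(values)
    for i in range(1, len(values) - 1):
        before, after = values[i - 1], values[i + 1]
        if values[i] is None and before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled
```

For example, a missing maximum temperature between days with 30°F and 40°F would be filled with 35°F.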
In summary, the Franklin Institute data do not include the quality control flags that are
required for data intended for scientific research purposes. However,
whatever problems
might be raised by estimating a very few missing temperature and precipitation values
seem more than adequately offset by the result of having a continuous record as needed for calculating
cumulative values such as total precipitation and heating/cooling degree days.
Heating and cooling degree days (HDD and CDD) are calculated in the following way:
HDD = MAX[Tbase – (Tmax + Tmin)/2, 0]
CDD = MAX[(Tmax + Tmin)/2 – Tbase, 0]
That is, when the daily average is above the base temperature, it is a cooling day (CDD). When the daily average is
below the base temperature, it is a heating day (HDD). There are no negative values for HDD or CDD. Using the average of the daily
maximum and minimum temperature is a commonly used approximation for the daily average temperature.
Typically, a base temperature of 65°F is used. A calculation
for growing degree days, related to cooling degree days, is often done for agricultural purposes – predicting
time to crop maturity and timing the application of pesticides, for example. A base temperature of
50°F is often used for agricultural regions in temperate climates. These cumulative values are excellent single-value
indicators of year-to-year changes in temperature – possibly more useful than annual average temperature.
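The HDD and CDD formulas above translate directly into code. Here is a minimal Python sketch; the function names degree_days and annual_totals are hypothetical, and the input is assumed to contain no missing temperatures.

```python
def degree_days(t_max: float, t_min: float, t_base: float = 65.0) -> tuple[float, float]:
    """Return (HDD, CDD) for one day, using (t_max + t_min)/2 as the daily mean."""
    t_mean = (t_max + t_min) / 2
    hdd = max(t_base - t_mean, 0.0)
    cdd = max(t_mean - t_base, 0.0)
    return hdd, cdd


def annual_totals(days, t_base: float = 65.0) -> tuple[float, float]:
    """Sum HDD and CDD over an iterable of (t_max, t_min) pairs."""
    total_hdd = total_cdd = 0.0
    for t_max, t_min in days:
        hdd, cdd = degree_days(t_max, t_min, t_base)
        total_hdd += hdd
        total_cdd += cdd
    return total_hdd, total_cdd
```

For December 30, 2012 in the sample above (maximum 35°F, minimum 32°F), the daily average is 33.5°F, so with a 65°F base the day contributes 31.5 heating degree days and no cooling degree days. Passing t_base=50.0 gives the growing degree day calculation instead.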
All things considered, these data from Philadelphia are interesting and useful
for examining trends because of
the length of the record,
but they should be used with great caution in climate research. There are, however,
several opportunities for inquiry projects. All of these projects will help develop skills in these areas:
- Mathematics
- Trend analysis, statistics, mathematical modeling
- Spreadsheets
- Data analysis, applying spreadsheet formulas, using appropriate graphics to display data
- Communications
- Writing clear experiment definitions and inquiry protocols, writing concisely and staying "on topic," supporting arguments based on evidence, developing oral presentation skills
- Computer programming
- Although many interesting analyses can be done within spreadsheets, others may need computer
programs to read and process the data
In no particular order, here are some suggested inquiry-based questions, with brief explanatory notes:
- Are there trends in any of the parameters available from these data, including the calculated values for heating
and cooling degree days?
- The basic goal in examining historical weather data is to determine
whether there are long-term trends that demonstrate changes in climate rather than fluctuations in weather.
The challenge in finding climate trends is that those trends will be small compared to year-to-year
variability. It is always possible to "do the math" for trendlines, but the results are subject to interpretation. Even
when a linear regression is assumed to be appropriate, the coefficient of determination r² may be
significantly less than 1, meaning the trendline explains only a small part of the year-to-year variation.
- If there appear to be trends in temperature or precipitation, is it possible to manipulate the interpretation of the data based on how the trends
are modeled mathematically or on what time intervals are used?
- Linear regressions applied to the entire historical record may over- or underestimate trends. Using an exponential model
rather than a linear one will result in larger changes in future predicted values. Changing the environment
around a site may initiate a temperature trend. Trends can be made to appear larger or
smaller by changing the time period examined.
- Is the length of the growing season changing?
- Using a base temperature of 50°F, the start of the growing season is marked by the appearance of cooling
(growing) degree days
above 0 and the end is marked by the appearance of heating degree days above 0. Sometimes, the beginning and end of the
growing season are limited by calendar dates, too, so as to discount the possibly spurious occurrence of positive CDD and HDD values
early and late in the year. For example, CDD calculations in northern temperate climates might be
started on March 1 and ended on October 31. It might be better to count the start and end of the growing season from when the CDD and
HDD values exceed some small number, perhaps 10, to avoid "weather noise" at the beginning and end of the season.
- Is the distribution of precipitation during the year changing?
- Perhaps the timing of precipitation during the year changes even if there are no year-to-year trends. How many days per year
have precipitation? (Do you want to count days with trace precipitation?) How many days per year have precipitation greater
than some specified amount? (Maybe the number of heavy rainfall events is changing.)
- Is there "digit bias" in reported temperature values?
- For records that pre-date automated reporting of temperature values,
it is entirely possible that there is some "digit bias"
in the reporting of values. For example, if the scale on a thermometer displays even digits (70, 72, 74, etc.) it may be
that reported values are rounded to the nearest even degree. In the absence of this reporting bias,
for large numbers of temperature reports the distribution
of the digits 0-9 in the
"ones" position of the temperature value should be uniform – that is, all digits are equally likely. It is not
difficult to write a computer program to read the data file and count the occurrence of the digits 0-9 in the "ones" position
for every reported temperature value. Use a chi-square test to analyze results.
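The digit-counting program suggested in the last question might look something like the following Python sketch. The function names are hypothetical. The sketch assumes that missing values (-999) have already been removed from the list of temperatures and that the list is not empty. The chi-square statistic is computed against equal expected counts for the ten digits, which gives 9 degrees of freedom; the critical value for 9 degrees of freedom at the 5% significance level is 16.919.

```python
from collections import Counter

CRITICAL_5PCT_DF9 = 16.919  # chi-square critical value, 9 degrees of freedom, 5% level


def ones_digit_counts(temperatures):
    """Count how often each digit 0-9 appears in the ones position."""
    counts = Counter(abs(int(t)) % 10 for t in temperatures)
    return [counts.get(digit, 0) for digit in range(10)]


def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts in every bin."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)
```

A statistic larger than CRITICAL_5PCT_DF9 suggests that the ones digits are not uniformly distributed. It may be interesting to run the test separately for maximum and minimum temperatures and for different periods in the record.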
Answering these inquiry questions for Philadelphia is an excellent way to start thinking about analyzing these kinds of data.
If you ask the same kinds of questions for many data sites, from the US Historical Climatology Network or the US Climate
Reference Network, then what starts here as inquiry can progress into authentic research about temperature and climate patterns
across the US.
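As a starting point for the trend questions in the list above, a linear trendline and its r² value can be calculated without any special libraries. The following Python sketch uses ordinary least squares. The function name linear_trend is hypothetical, and the input is assumed to be one value per year (annual total HDD, for example) with at least two different years.

```python
def linear_trend(years, values):
    """Ordinary least-squares fit of values against years.

    Returns (slope, intercept, r_squared).
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r_squared is defined as 0 here when the values do not vary at all
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared
```

Calling this function on the whole record and then on shorter periods within it is one way to see how much the apparent trend depends on the time interval chosen.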
The application for analyzing Philadelphia temperature and precipitation data is written in HTML and PHP.
It must be run from an online server which supports
PHP scripts or a local server on your computer
(WAMP for Windows,
MAMP for Mac).