### Calculating Standard Deviation for Summarized Data

Tuesday, March 29th, 2011This isn’t a geospatial issue per say, but I thought it would be useful to share. I have a spreadsheet where I’m tracking course evaluation responses for the GIS workshops I’m teaching. I have to report the total number of responses, the mean, and standard deviation for each question. The worksheet I designed tracks aggregate responses; the total number of people who responded to each question in each category, on a scale of 5 (strongly agree) to 1 (strongly disagree). For example:

The problem I had was that Excel’s standard deviation formula doesn’t work for summaries – you need to give the formula individual responses or raw scores for arguments. In other words:

So I was fixated on trying to find a formula, through the help and by searching the forums, where somehow I could calculate standard deviation using summaries or aggregates. It finally dawned on me (duh) that I could plug in the standard deviation formula myself and modify it.

To calculate the standard deviation for an entire population you compute the difference of each data point from the mean and square each result. Then you calculate the average of all these values and take the square root.

So for each question I subtract the mean score for that question from the score category for that question, square it, and then multiply the result by the number of people who answered in that category. So if 10 people strongly agreed with the question and strongly agreed is associated with a score of 5, I subtract the average score (4.71) from 5, square the result, and multiply it by 10 (since ten people responded that they strongly agreed).

((score value – mean score)^2)*respondents

I perform the same operation for each category. So if 4 people said they agreed with a question and agreed is a value of 4, subtract 4.71 from 4, square it, and multiply by 4. After I do this this for each category, I sum the values for each one and take the square root of the whole thing.

SQRT ((((score5 – mean score)^2)*respondents)+(((score4 – mean score)^2)*respondents))

SQRT ((((5-4.71)^2)*10)+(((4-4.71)^2)*4))

For my spreadsheet the formula is repeated for each of the 5 possible scores, references are used to pull in the mean and respondent values from other cells, and I round the entire result to 1 decimal place. The number of parens makes it a little confusing; I’ve inserted a color-coded image below so it’s a little clearer.

=ROUND((SQRT(((((5-H9)^2)*B9)+(((4-H9)^2)*C9)+(((3-H9)^2)*D9)+(((2-H9)^2)*E9)+(((1-H9)^2)*F9))/G9)),1)

Given all that can go wrong with one misplaced parens, I tested this by inputting some raw scores by hand and running the STDEVPA formula to verify that I get the same result.