Sunday, February 10, 2008

Micro Statistics Tutorial 06: Histograms - The Vytorin data so far

Drawing histograms is a useful skill. Choosing the right number of bars (equivalently, the width of each interval) is important: too few bars yield an uninformative plot, and too many make it difficult to read. Figure 1 shows the number of studies designed to prove a clinical benefit of the drug Vytorin (the ezetimibe/simvastatin combination; ezetimibe alone is sold as Zetia), together with their outcome.

Figure 1: The data so far

On a serious note: It is always important to visualize data before doing complicated statistical tests.
PLOT THE BLOODY DATA. That is the most important rule of statistics (and of honest reporting of findings).
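Following this rule needs no special software. Below is a minimal sketch, assuming plain Python with no plotting library and using made-up numbers rather than the Vytorin data. The function name `text_histogram` is purely illustrative. It splits the range of the data into `k` equal-width intervals, counts how many values fall into each one, and prints one row of `#` marks per interval:

```python
def text_histogram(values, k):
    """Print a crude text histogram of a non-empty list `values`
    using k equal-width intervals (k >= 1).

    Returns the list of counts per interval so the result can be checked.
    """
    lo, hi = min(values), max(values)
    # Interval width; if all values are equal, fall back to width 1.
    width = (hi - lo) / k if hi > lo else 1.0
    counts = [0] * k
    for v in values:
        # Index of the interval containing v; the maximum value
        # would land one past the end, so clamp it into the last bin.
        i = min(int((v - lo) / width), k - 1)
        counts[i] += 1
    for i, c in enumerate(counts):
        left = lo + i * width  # left edge of interval i
        print(f"{left:8.2f} | {'#' * c}")
    return counts


if __name__ == "__main__":
    # Illustrative data only.
    text_histogram([1, 2, 2, 3, 3, 3, 4, 9], 4)
```

For the sample list, the four intervals are [1, 3), [3, 5), [5, 7) and [7, 9], and the returned counts are [3, 4, 0, 1]. Re-running it on your own data with different values of `k` is a quick way to see how too few or too many bars hide the shape of the distribution.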

To draw a histogram for continuous data, you need to group the measurements into discrete intervals. There is some "best" number of intervals that maximizes the visual information, and several scientific papers have been written on the optimum number of intervals (K) for a given number of data points (N). Sturges' formula (1) states:

K = 1 + 3.322 log10(N)
  1. Sturges, H.A., “The Choice of a Class Interval”, Journal of the American Statistical Association, Vol. 21, pp. 65-66, March 1926.
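The formula is easy to compute. Below is a rough sketch in Python. The function names are illustrative, and rounding K up to a whole number of bars is an assumed convention, not part of the formula as stated above. Because 3.322 is approximately 1/log10(2), the rule is equivalent to K = 1 + log2(N). The sketch uses that form for the rounded value, because the truncated constant can push a result just over a whole number. For example, with N = 8 the expression 1 + 3.322 log10(8) evaluates to about 4.00006, which would round up to 5 instead of 4.

```python
import math


def sturges_k(n):
    """Raw (non-integer) value of Sturges' rule, K = 1 + 3.322 * log10(N)."""
    if n < 1:
        raise ValueError("need at least one data point")
    return 1 + 3.322 * math.log10(n)


def sturges_bins(n):
    """Whole number of intervals: the equivalent form 1 + log2(N),
    rounded up (rounding up is an assumed convention)."""
    if n < 1:
        raise ValueError("need at least one data point")
    return math.ceil(1 + math.log2(n))


if __name__ == "__main__":
    for n in (10, 50, 100, 1000):
        print(n, round(sturges_k(n), 2), sturges_bins(n))
```

For 100 data points the raw value is about 7.6, giving 8 bars, and for 1000 points it gives 11 bars. K grows only logarithmically with N, so even large data sets end up with a modest number of bars.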

