This software is intended to be useful in planning statistical studies. It is not intended to be used for analysis of data that have already been collected.
Each selection provides a graphical interface for studying the power of one or more tests. These interfaces include sliders (convertible to number-entry fields) for varying parameters, and a simple provision for graphing one variable against another.
Each dialog window also
offers a Help
menu. Please read the Help menus before
contacting me with
questions.
The "Balanced ANOVA" selection
provides another dialog with a list of several popular experimental
designs, plus
a provision for specifying your own model.
Note:
The dialogs open in separate
windows. If you're
running this on an Apple Macintosh, the applets' menus are added to the
screen menubar -- so, for example,
you'll
have
two "Help" menus there!
You may also download this software to run it on your own PC.
Note:
These require a web
browser capable of running Java applets (version 1.3 or higher). If you
do not see a selection list above, chances are that you either have
disabled Java, or you have an outdated implementation of Java. In
the latter case, you need to download and install the JRE plug-in from java.sun.com.
Due to a compatibility bug, many plug-ins size the applet window before allowing for an additional strip with a security warning; to compensate, drag the bottom of the window downward a bit.
Note:
[27 August 2008] Very minor changes were made in some applets.
All components were re-compiled to the same target JVM (version
1.3). With a little luck, this will solve some past problems.
If it worked just before Aug 27, 2008 but not now, please let me
know and give a description of your OS and Java version number.
If it doesn't work, here is an older version
that you can download and run locally: Right-click and do a "save as" (be sure to save it with a .jar extension) and run it by
double-clicking its icon.
Newer additions
An applet for comparing two variances was added in November, 2007.
An applet that provides online tables for common distributions was added in March, 2009.
Citing this software
If you use this software in preparing a research paper, grant proposal,
or other publication, I would appreciate your acknowledging it by
citing it in the references. Here is a suggested bibliography
entry in APA or "author (date)" style:
Lenth, R. V. (2006-9). Java Applets for Power and Sample Size [Computer software]. Retrieved month day, year, from http://www.stat.uiowa.edu/~rlenth/Power.
This form of the citation is appropriate whether you run it online
(give the date you ran it) or the stand-alone version (give the date
you downloaded it).
Download to run locally
The file piface.jar
may be
downloaded so that you can run these applications locally. [Note: Some mail software
(that thinks
it is smarter than you) renames this file piface.zip.
If this happens, simply rename it piface.jar;
do not
unzip the file.]
You
may also want the icon file piface.ico
if you put it on your desktop or a toolbar. You
will need to have the Java Runtime Environment (JRE) or the Java
Development Kit (JDK) installed on your system. You probably
already have it; but if not, these are available for free download for
several platforms from Sun.
If
you have JDK or JRE version 1.2 or later, then you can probably run the
application just by double-clicking on piface.jar.
Otherwise,
you may run it from the command line in a terminal or DOS window, using
a command like
java -jar piface.jar
This will bring up a selector list similar to the one in this web page. A particular dialog can also be run directly from the command line if you know its class name, which can be discovered by browsing piface.jar with a zip file utility such as WinZip.
For example, the two-sample t-test
dialog may be run using
java -cp piface.jar rvl.piface.apps.TwoTGUI
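If no zip utility is handy, the class names can also be listed with a short script, since a .jar file is an ordinary zip archive. The following is a minimal sketch, not part of the software itself. It assumes that the dialog classes live under rvl/piface/apps/, which is inferred only from the single example above, and the function name list_jar_classes is made up for illustration.

```python
import zipfile


def list_jar_classes(jar, prefix="rvl/piface/apps/"):
    """Return dotted class names for the .class entries under `prefix`.

    `jar` may be a file name or an open binary file object.
    The default prefix is an assumption based on rvl.piface.apps.TwoTGUI.
    """
    names = []
    with zipfile.ZipFile(jar) as zf:
        for entry in zf.namelist():
            # Keep top-level classes only; '$' marks inner/anonymous classes.
            if (entry.startswith(prefix)
                    and entry.endswith(".class")
                    and "$" not in entry):
                names.append(entry[:-len(".class")].replace("/", "."))
    return sorted(names)


# Typical use, once piface.jar has been downloaded:
#     for name in list_jar_classes("piface.jar"):
#         print(name)
```

Each name printed this way can be tried after "java -cp piface.jar". Not every class found is necessarily a runnable dialog.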
Questions?
This software is made available as-is, with no guarantees; use it at
your own risk. I welcome comments on bugs, additional
capabilities you'd like to see, etc. I am also willing to
provide
minimal support if you truly don't understand what inputs are
required. However, each applet has a help menu, and I do
request
that you carefully read that before you e-mail me with
questions.
If you need statistical advice on your research problem, you should
contact a statistical consultant; and if you want expert advice, you
should expect to pay for it. Most universities with
statistics
departments or statistics programs also offer a consulting
service. If you think your research is important, then it is
also
important to get good advice on the statistical design (i.e., before you start
collecting data)
and analysis.
If you have carefully
read the
above two paragraphs, and still find it
appropriate to contact me, my e-mail address is russell-lenth@uiowa.edu.
Advice
Here are two
very wrong things that people try to do with my software:
Retrospective power (a.k.a. observed power, post hoc power). You've got the data, did the analysis, and did not achieve "significance." So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn't powerful enough -- that's why the result isn't significant. Power calculations are useful for design, not analysis.
(Note: These comments refer to power computed based on
the
observed effect size and sample size. Considering a different
sample size is obviously prospective in nature. Considering a
different effect size might make sense, but probably what you really
need to do instead is an equivalence test; see Hoenig and Heisey, 2001.)
Specify T-shirt effect sizes ("small", "medium", and "large"). This is an elaborate way to arrive at the same sample size that has been used in past social science studies of large, medium, and small size (respectively).
The method uses a standardized effect size as the goal. Think
about it: for a "medium" effect size, you'll choose the same n regardless of the
accuracy or
reliability of your instrument, or the narrowness or diversity of your
subjects. Clearly, important considerations are being ignored
here. "Medium" is definitely not the message!
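The point about retrospective power can be checked numerically. The sketch below uses a two-sided z test as a simplified stand-in (an assumption made for illustration; the dialogs themselves work with t and F tests). It recovers the observed |z| from the p-value, treats it as the true effect, and computes the resulting "observed power". The function name observed_power is hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def observed_power(p_value, alpha=0.05):
    """Power of a two-sided z test evaluated at the observed effect.

    The observed |z| is recovered from the two-sided p-value and then
    treated as if it were the true standardized effect.
    """
    z_obs = _STD.inv_cdf(1 - p_value / 2)
    z_crit = _STD.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region.
    return (1 - _STD.cdf(z_crit - z_obs)) + _STD.cdf(-z_crit - z_obs)


for p in (0.05, 0.10, 0.20, 0.50):
    print(f"p = {p:.2f}  ->  observed power = {observed_power(p):.3f}")
```

Observed power here is a function of the p-value alone: it is about one half when p equals alpha and drops steadily as p grows. A nonsignificant result therefore always comes with low observed power, so the calculation adds nothing beyond the p-value itself (see Hoenig and Heisey, 2001).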
Here are three
very right things you can do:
Use power prospectively for planning future studies. Software such as that provided on this website is useful for determining an appropriate sample size, or for evaluating a planned study to see if it is likely to yield useful information.
Put
science
before statistics. It is easy to get caught up
in
statistical significance and such; but studies should be designed to
meet scientific goals, and you need to keep those in sight at all times
(in planning and
analysis). The appropriate inputs to power/sample-size
calculations are effect sizes that are deemed clinically important,
based on careful considerations of the underlying scientific (not
statistical) goals of the study. Statistical considerations
are
used to identify a plan that is effective in meeting scientific goals
-- not the other way around.
Do pilot
studies.
Investigators tend to try to answer all the world's questions with one
study. However, you usually cannot do a definitive study in
one
step. It is far better to work incrementally. A
pilot study
helps you establish procedures, understand and protect against things
that can go wrong, and obtain variance estimates needed in determining
sample size. A pilot study with 20-30 degrees of freedom for
error is generally quite adequate for obtaining reasonably reliable
sample-size estimates.
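As a rough illustration of how these inputs fit together, the sketch below computes a per-group sample size for comparing two means from a scientifically meaningful difference and a pilot estimate of the standard deviation. It uses the common normal approximation n = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2, not the exact noncentral-t calculation, so the dialogs may give slightly different answers. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison.

    delta: smallest difference in means that matters scientifically
    sigma: standard deviation, e.g. estimated from a pilot study
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * (sigma / delta) ** 2)


# Hypothetical example: a 5-unit difference matters.
print(n_per_group(delta=5, sigma=10))  # 63
print(n_per_group(delta=5, sigma=20))  # 252
```

Doubling the standard deviation roughly quadruples the required sample size. That dependence on measurement reliability and subject variability is exactly what a fixed "medium" standardized effect size hides.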
Many funding agencies require a power/sample-size section in grant proposals. Following the above guidelines should improve your chances of being funded. You will have established that you have thought through the scientific issues, that your procedures are sound, and that you have a defensible sample size based on realistic variance estimates and scientifically tenable effect-size goals.
To read more, please see the following references:
Lenth, R. V. (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55, 187-193.
Hoenig, J. M. and Heisey, D. M. (2001), "The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis," The American Statistician, 55, 19-24.
An earlier draft of the Lenth reference above is _here_,
and a shorter summary of some comments I made in a panel discussion at
the 2000 Joint Statistical Meetings in Indianapolis is _here_.
Additional brief comments, prepared as a handout for my
poster
presentation at the 2001 Joint Statistical Meetings in Atlanta, are _here_.
Accuracy
Formula accuracy
Most computations are "exact" in the sense that they are based on exact formulas for sample size, power, etc. The exception is the Satterthwaite approximations; see below.
Machine accuracy
Even with exact formulas, computed values are inexact, as are all
double-precision floating-point computations. Many
computations (especially
noncentral distributions) require summing one or more series, and there
is a serious tradeoff between speed and accuracy. The error
bound
set for cdfs is 1E-8 or smaller, and for quantiles the bound is
1E-6.
Actual errors can be much larger due to accumulated errors or other
reasons.
Quantiles, for example, are computed by numerically solving an equation
involving the cdf; thus, in extreme cases, a small error in the cdf can
create a large error in the quantile.
A warning (typically, "too many iterations") is generated when the software cannot confirm that an error bound has been achieved. However, in the case of quantile computations, no warning message is generated for extreme quantiles. If you want a power of .9999 at alpha=.0001, you can expect the computed sample size to not be accurate to the nearest integer! If you specify reasonable criteria, the answers will be pretty reliable.
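The sensitivity of extreme quantiles can be illustrated with a small sketch. It is not the software's own algorithm: it uses the standard normal distribution as a stand-in for the noncentral distributions, plain bisection as the root finder, and hypothetical function names. To first order, an error e in the cdf produces an error of about e / pdf(q) in the quantile q, which blows up in the tails where the density is tiny.

```python
from statistics import NormalDist

STD = NormalDist()  # stand-in distribution for illustration


def quantile_by_bisection(p, lo=-10.0, hi=10.0, tol=1e-6, max_iter=200):
    """Solve cdf(x) = p for x by bisection on the standard normal cdf."""
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if STD.cdf(mid) < p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            return (lo + hi) / 2
    raise RuntimeError("too many iterations")


def quantile_error_from_cdf_error(p, cdf_error=1e-8):
    """First-order quantile error caused by a given error in the cdf."""
    q = STD.inv_cdf(p)
    return cdf_error / STD.pdf(q)


for p in (0.5, 0.95, 0.9999):
    print(p, quantile_by_bisection(p), quantile_error_from_cdf_error(p))
```

With the same cdf error, the first-order quantile error at p = .9999 is roughly a thousand times larger than at the median, which is why extreme power and alpha criteria give less trustworthy sample sizes.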
Satterthwaite approximations
Some of the dialogs (two-sample t, mixed ANOVA) implement Satterthwaite approximations when certain combinations of inputs require an error term to be constructed. These are of course not exact, even in their formulation. Moreover, the Satterthwaite degrees of freedom are used as-is in computing power from a noncentral t or noncentral F distribution, and this introduces further errors that could be large in some cases.
In the two-sample t setting, I'd expect the worst errors to occur when there is a huge imbalance in sample sizes and/or variances.
In
the dialogs for mixed ANOVA models (either F tests
or multiple
comparisons/contrasts), I expect these errors to get worse as more
variance components are involved, especially when one or more of them
is given negative weight.
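For the two-sample t case, the standard Satterthwaite (Welch) degrees of freedom can be sketched as follows. This is the textbook formula, shown only to make the behavior concrete; it is not taken from the software's source, and the function name and inputs are hypothetical.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Satterthwaite degrees of freedom for a two-sample t statistic.

    s1, s2: group standard deviations; n1, n2: group sample sizes (each >= 2).
    """
    v1 = s1 ** 2 / n1  # variance of the first sample mean
    v2 = s2 ** 2 / n2  # variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Balanced case: recovers the pooled value n1 + n2 - 2.
print(satterthwaite_df(s1=1, n1=10, s2=1, n2=10))

# Badly unbalanced case: the small, noisy group dominates.
print(satterthwaite_df(s1=10, n1=5, s2=1, n2=100))
```

In the balanced, equal-variance case the result equals n1 + n2 - 2. With a huge imbalance it collapses toward the degrees of freedom of the smaller, more variable group, which is the situation where the approximation is expected to be at its worst.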
It is not unlikely that one or more of these links is broken.
If that happens, please let me know (especially if you can correct
it!). I also welcome suggestions for other links that you think I
should add.
This page was last modified Wednesday, 25-Mar-2009 13:33:37 CDT. The views and opinions expressed in this page are strictly those of the page author. The contents of this page have not been approved by the Division of Mathematical Sciences, the College of Liberal Arts or The University of Iowa.