I am tasked with estimating an appropriate sample size for a sales call center experiment. Two groups, A and B, will be taking calls. Group A (2/3 of the calls) will follow the normal procedure for selling the product. Group B (1/3 of the calls) will sell using a different strategy. I need to estimate how many calls we will need to observe in order to measure a significant difference of 0%, 1%, 5%, or 10% in success rates between groups A and B. I have explored the pwr package and its pwr.2p2n.test() function, but am not quite sure how to apply it to my example. Total calls per month across both groups will be between 35k and 50k. My thought was to make calls per month and p1 - p2 variable inputs to pwr.2p2n.test() to get a range of power estimates, then choose the scenario that maximizes power. Is this a flawed method?
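The looping idea I describe might be sketched roughly as follows. All proportions and call volumes here are illustrative assumptions on my part, and this presumes the pwr package is installed:

```r
library(pwr)

# Sketch (not a definitive implementation): vary monthly call volume and
# the assumed difference p1 - p2, and record the power of a two-proportion
# test with the 2:1 split described above. p1 = 0.20 is a made-up baseline.
p1 <- 0.20
for (total in c(35000, 50000)) {
  for (diff in c(0.01, 0.05, 0.10)) {
    h <- ES.h(p1, p1 - diff)  # Cohen's effect size h for two proportions
    res <- pwr.2p2n.test(h = h,
                         n1 = round(2/3 * total),  # Group A
                         n2 = round(1/3 * total))  # Group B
    cat("total =", total, " diff =", diff,
        " power =", round(res$power, 4), "\n")
  }
}
```

At these volumes, the 5- and 10-point differences are detected with power essentially equal to 1, so the 1-point difference is the binding case.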
asked Jun 30, 2017 at 15:32

$\begingroup$ For clarification, you write: "I need to estimate how many calls we will need to observe in order to measure a significant difference of 0%, 1%, 5%, 10% in success rates for group A and group B." Do you mean a difference in percentage points or an actual percent difference? Also, if there is truly no difference (0%), then you will need a nearly infinite n to detect it, relying on statistical noise at that point to reach significance. $\endgroup$ Commented Jun 30, 2017 at 15:57

$\begingroup$ Mark, difference in percentage points. If p1 = .20 and p2 = .15, then p1 - p2 = .05. I would want to know the sample size needed to observe a .05 difference in p1 - p2. $\endgroup$ Commented Jun 30, 2017 at 16:52

$\begingroup$ It is going to depend on what the two proportions are. p1 = .20 and p2 = .15 is a five-point difference, but it won't be the same effect size as p1 = .80 and p2 = .75. $\endgroup$ Commented Jun 30, 2017 at 16:54

$\begingroup$ It sounds like my process would be to let p1, p2, and power vary, then see which n is associated with the maximum power. $\endgroup$ Commented Jun 30, 2017 at 18:02

Given my comments under your post above:
It sounds to me like you are analyzing a 2 x 2 contingency table: Group A vs. Group B crossed with success vs. failure. From such a table you can easily calculate an odds ratio (OR); see metafor::escalc() for good documentation on getting an OR from a 2 x 2 contingency table.
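As a quick illustration (with made-up counts, not data from your call center), the OR is just the cross-product ratio of the table:

```r
# Hypothetical 2 x 2 contingency table (all counts are made up):
#            success  failure
# Group A      150      350
# Group B       90      160
tab <- matrix(c(150, 350,
                 90, 160),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Group A", "Group B"),
                              c("success", "failure")))

# Odds of success in Group A divided by odds of success in Group B
odds_ratio <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
odds_ratio
```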
I have used epiR::epi.ccsize() to do power analyses for odds ratios before in working with epidemiologists. It is geared toward epidemiologists, but the statistics are the same, and the code is very simple.
Let's say we are expecting an odds ratio of 1.5, with a 30% success rate in the control group and a 2:1 ratio of participants in the control versus experimental group (i.e., what you describe in your post), and we want 95% power:
epi.ccsize(OR=1.50, p0=.30, n=NA, power=.95, r=2)
Which gives us a list:
$n.total
[1] 1578

$n.case
[1] 526

$n.control
[1] 1052
Translating from the epidemiologist-centric language: you need 526 experimental participants and 1052 controls to get 95% power in that situation.
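The link between the OR and the two success rates can be checked in base R; the numbers below come from the example above (p0 = .30, OR = 1.5):

```r
# Convert the control-group success rate and the target OR into the
# implied experimental-group success rate.
p0 <- 0.30
OR <- 1.50
odds0 <- p0 / (1 - p0)      # control-group odds
odds1 <- OR * odds0         # experimental-group odds
p1 <- odds1 / (1 + odds1)   # implied experimental success rate, ~.3913
p1
```

This is the same ~.3913 that appears as p2 in the power.prop.test() call below.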
It might also be tempting to try stats::power.prop.test(), but I'm not sure how to handle your 2:1 ratio with that function. For example, this response says you just need to make sure your smallest group hits the threshold given by power.prop.test(), but I find that estimate unnecessarily high:
power.prop.test(p1 = .30, p2 = .391304, power = .95)
# these values for p1 and p2 give an OR of 1.50

     Two-sample comparison of proportions power calculation 

              n = 702.1545
             p1 = 0.3
             p2 = 0.391304
      sig.level = 0.05
          power = 0.95
    alternative = two.sided

NOTE: n is number in *each* group
This overestimate jibes with a comment on the post I linked above, where user Underminer says:
"If you do a 95/5 split, then it'll just take longer to hit the minimum sample size for the variation that is getting the 5%." - while this is a conservative approach to at least satisfying the specified power of the test, you will in actuality be exceeding the specified power entered in power.prop.test if you have one "small" and on "large" group (e.g. n1 = 19746, n2 = 375174). A more exact method of meeting power requirements for unequal sample sizes would likely be desirable
Here's a relevant RPubs link using the pwr package that discusses unequal sample sizes. However, I find the epiR approach the most intuitive.