How many samples to estimate variance

This is sort of the "canonical" case to give you a feel for how to go about the calculation. Based on your plots, your data don't look particularly normal; in particular, there is what appears to be noticeable skewness.

But, this should give you a ballpark idea of what to expect. I would focus on the SD rather than the variance, since it's on a scale that is more easily interpreted. People do sometimes look at confidence intervals for SDs or variances, but the focus is generally on means.
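
As a rough illustration of the kind of ballpark calculation the normal case allows, here is a minimal Python sketch. It assumes normally distributed data, for which the sample variance has relative standard error sqrt(2/(n-1)) and the sample SD has relative standard error of roughly 1/sqrt(2(n-1)). The function names are invented for this illustration, and the large-sample approximation for the SD is an assumption, not necessarily the equation the answer originally showed.

```python
import math

def relative_se_of_variance(n):
    """Relative standard error of the sample variance for normal data."""
    return math.sqrt(2.0 / (n - 1))

def relative_se_of_sd(n):
    """Approximate relative standard error of the sample SD (delta method)."""
    return 1.0 / math.sqrt(2.0 * (n - 1))

def samples_for_sd_precision(target_relative_se):
    """Sample size at which the approximate relative SE of the SD reaches the target."""
    return math.ceil(1.0 + 1.0 / (2.0 * target_relative_se ** 2))

if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, round(relative_se_of_variance(n), 3), round(relative_se_of_sd(n), 3))
    # Sample size for a 10% relative standard error on the SD
    print(samples_for_sd_precision(0.10))
```

For non-normal data the precision of the variance estimate also depends on the kurtosis, so these numbers are best read as a ballpark only.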

I would just take 2.

Calculating required sample size, precision of variance estimate?

Example (Figure 1): density estimate of the parameter based on the samples.

I don't know where to go from Wikipedia for more context.

Wikipedia suggests it should also be mentioned in: Montgomery, D. This answer has been very useful, and it has been informative to quantify variance uncertainty; I have applied the equation about 10 times in the last day. I can't find it in Casella and Berger. A primary reference would be even better if you know of one. The Wikipedia page is notably unreferenced.

I'll keep an eye out for it, but at this point I don't have a reference at all. There was one particularly bad typo in the previous version. Sorry about that. That is what the checkmark is there for.

We found two simple formulas that estimate the mean using the values of the median (m), the low and high ends of the range (a and b, respectively), and the sample size (n). Using simulations, we show that the median can be used to estimate the mean when the sample size is larger than [...]. For smaller samples our new formula, devised in this paper, should be used.

We also estimated the variance of an unknown sample using the median, low and high end of the range, and the sample size.

We also include an illustrative example of the potential value of our method using reports from the Cochrane review on the role of erythropoietin in anemia due to malignancy.

To perform meta-analysis of continuous data, meta-analysts need the mean value and the variance (or standard deviation) in order to pool data. However, the published reports of clinical trials sometimes give only the median, the range and the size of the trial.

In this article we use simple and elementary inequalities in order to estimate the mean and the variance for such trials. In fact, the value of our approximation(s) lies in providing a method for estimating the mean and the variance in exactly those cases where there is no indication of the underlying distribution of the data.

However, it has not been shown that the median can indeed be used to replace mean values, nor when the range formulas are appropriate. In this article, we want to estimate the mean and the standard deviation of a sample of size n.

First we order the sample by size: [...]. Adding up and dividing by n, the middle column is exactly the sample mean: [...]. When the size of the sample is fairly large, the second fraction becomes negligible and the estimate can be written in a simplified form: [...]. We can use this simple expression even if we do not know the size of the sample. The length of the interval that contains the sample mean (the interval [LB, UB]) is approximately [...].

On the other hand, if the summary results for a clinical trial include the median and the size of the sample, we can presumably do better than the two range approximations above.
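
The bounding argument for the sample mean can be sketched in Python. The displayed equations did not survive in this copy, so the code below is a reconstruction under a stated assumption: n is odd, so the median m is itself a data point, with (n - 3)/2 values between a and m and (n - 3)/2 values between m and b. The function names are hypothetical, and the result may not match the paper's numbered formulas term for term.

```python
def mean_bounds(a, m, b, n):
    """Lower and upper bounds on the sample mean from min a, median m, max b, size n.

    Assumes n is odd, so the median is a data point: (n - 3) / 2 values
    lie in [a, m] and another (n - 3) / 2 lie in [m, b].
    """
    k = (n - 3) / 2.0
    lower = (a + m + b + k * (a + m)) / n   # every unknown value at its lower limit
    upper = (a + m + b + k * (m + b)) / n   # every unknown value at its upper limit
    return lower, upper

def estimate_mean(a, m, b, n):
    """Midpoint of the two bounds: (a + 2m + b)/4 + (a - 2m + b)/(4n)."""
    return (a + 2.0 * m + b) / 4.0 + (a - 2.0 * m + b) / (4.0 * n)

def estimate_mean_large_n(a, m, b):
    """Large-sample simplification that drops the 1/n correction term."""
    return (a + 2.0 * m + b) / 4.0

if __name__ == "__main__":
    print(mean_bounds(1, 4, 11, 9))
    print(estimate_mean(1, 4, 11, 9))
    print(estimate_mean_large_n(1, 4, 11))
```

Under this assumption the width of the bounding interval is (n - 3)(b - a)/(2n), which approaches (b - a)/2 for large n.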

The next section deals with that situation. We obtain the following inequalities: [...]. Therefore, after simplifying: [...]. Note that if we let n grow without bound, expression 12 becomes the well-known range formula. Formula 4 can also be obtained by dividing the range [a, b] into two parts: [a, m] and [m, b]. We then subdivide each of these two parts into subintervals using equally spaced partition points.

In other words, we are estimating each of the data points (except for a, m, and b) with uniformly spaced approximate points: [...]. After a little algebra, the sample variance can be estimated by [...]. If we let the number of estimation points increase without bound, i.e. [...].

In order to verify the accuracy of these estimates, we ran several simulations using the computer package Maple, where the data were variously distributed, and obtained the tables below. We drew samples from five different distributions: Normal, Log-normal, Beta, Exponential and Weibull.
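
The equally spaced approximation of the data points can be tried numerically. The sketch below builds such a pseudo-sample and takes its sample variance directly, instead of quoting a closed form (the paper's formula is missing from this copy). It assumes an odd n of at least 3; the exact placement of the points and all function names are assumptions made for illustration. The limiting value in `limiting_variance` comes from a direct calculation for this particular construction.

```python
import statistics

def pseudo_sample(a, m, b, n):
    """Replace the unknown data by equally spaced points on [a, m] and [m, b].

    Assumes n is odd and at least 3; a, m and b are kept as data points.
    """
    half = (n - 1) // 2   # number of steps on each side of the median
    left = [a + k * (m - a) / half for k in range(half)]        # a up to just below m
    right = [m + k * (b - m) / half for k in range(half + 1)]   # m up to b
    return left + right

def estimate_variance(a, m, b, n):
    """Sample variance (n - 1 denominator) of the equally spaced pseudo-sample."""
    return statistics.variance(pseudo_sample(a, m, b, n))

def limiting_variance(a, m, b):
    """Value approached as n grows: ((a - 2m + b)^2 / 4 + (b - a)^2) / 12."""
    return ((a - 2.0 * m + b) ** 2 / 4.0 + (b - a) ** 2) / 12.0

if __name__ == "__main__":
    print(estimate_variance(0, 2, 10, 15))
    print(estimate_variance(0, 2, 10, 2001))
    print(limiting_variance(0, 2, 10))
```

When m sits exactly midway between a and b, the limiting value reduces to (b - a)^2 / 12, the variance of a uniform distribution on [a, b].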

The size of the sample ranged from 8 to about [...]. In the first subsection we present the results of our estimation for a normal distribution, which is what meta-analysts would commonly assume. We also show the results of simulations where the data were selected from skewed distributions.

We drew random samples of sizes ranging from 8 to [...] from a Normal distribution with a population mean of 50 and standard deviation [...]. Then we graphed the average relative error vs. [...].

For sample sizes smaller than 29, formula 5 actually outperforms the median as a mean estimator. For larger sample sizes, however, the median is a more consistent estimator for a normally distributed sample. The variance estimators, however, show greater distinction. To compare the precision of these estimates on average, we collected the results of our simulation in Additional file 1.
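
A small simulation in the spirit of this comparison can be sketched with the Python standard library. Several details are assumptions, not taken from the paper: the standard deviation (10.0 here is only a placeholder, since the value is missing from this text), the number of trials, the choice to measure relative error against the sample mean, and the form of the range-based estimate, (a + 2m + b)/4 + (a - 2m + b)/(4n), which is a midpoint-of-bounds estimate that may or may not coincide with formula 5.

```python
import random
import statistics

def average_relative_errors(n, trials=2000, mu=50.0, sigma=10.0, seed=0):
    """Average relative error of two mean estimators over simulated normal samples.

    Returns (error of the median, error of the range-based estimate).
    sigma=10.0 is a placeholder; the SD used in the original simulation
    is not given in this text.
    """
    rng = random.Random(seed)
    err_median = 0.0
    err_formula = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        a, m, b = min(sample), statistics.median(sample), max(sample)
        # Assumed range-based estimate of the mean (midpoint of simple bounds)
        formula = (a + 2.0 * m + b) / 4.0 + (a - 2.0 * m + b) / (4.0 * n)
        err_median += abs(m - mean) / abs(mean)
        err_formula += abs(formula - mean) / abs(mean)
    return err_median / trials, err_formula / trials

if __name__ == "__main__":
    for n in (9, 15, 29, 101):
        print(n, average_relative_errors(n))
```

Running this for a range of sample sizes shows how the two estimators compare under these assumed settings; the crossover point will depend on the parameters chosen.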

We also decided to run a simulation where the algorithm selects a sample from a skewed distribution.

These parameters were chosen arbitrarily, and the simulation results did not differ when we used different parameters (naturally, a larger variance translates into a larger relative error for mean estimators for any distribution). Just as in the case of the Normal distribution, we ran our algorithm [...] times for each sample size ranging from 8 to [...]. For each of the estimation formulas we then calculated the average relative error.

We summarize the best formula for estimation in Table 1. Therefore, counterintuitively, even for the skewed distributions we tested, it seems that for larger sample sizes (usually more than 25), simply replacing the sample mean with the reported median gives the best estimate of the sample mean. This is an interesting result, and we are not aware that it has been previously demonstrated.

It gives assurance to meta-analysts that simple replacement of means with medians in meta-analysis is a viable option.

Formula 5, even though it takes more parameters into account (the range and the sample size), on average outperforms the median only for small sample sizes. However, a large number of trials used in meta-analyses do have a very small number of patients in each arm (as small as 10 to [...]). For these trials, formula 5 seems to give an alternative to just using the median.

Detailed results of each simulation with a skewed distribution are given in Additional file 2, Additional file 3, Additional file 4, and Additional file 5.

In this section we discuss the use of these estimating formulas on the effect size for meta-analysts. The pooled mean difference is then calculated by using a weighted sum of these differences, where the weight is the reciprocal of the combined variance for each study. To determine whether our estimates make a large difference when compared to the actual mean difference and variance, we drew two samples of the same size from the same distribution.
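
The inverse-variance weighting described above can be sketched as follows. This is a minimal fixed-effect illustration, not the STATA procedure used in the paper. The "combined variance" of a study is assumed here to be the variance of the difference of two independent means, sd_e^2/n_e + sd_c^2/n_c, and the function name and tuple layout are hypothetical.

```python
def pooled_mean_difference(studies):
    """Inverse-variance weighted mean difference (fixed-effect pooling).

    Each study is a tuple:
    (mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl).
    Returns the pooled difference and its variance.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for mean_e, sd_e, n_e, mean_c, sd_c, n_c in studies:
        diff = mean_e - mean_c
        # Assumed combined variance: variance of a difference of two independent means
        var = sd_e ** 2 / n_e + sd_c ** 2 / n_c
        weight = 1.0 / var
        weighted_sum += weight * diff
        total_weight += weight
    return weighted_sum / total_weight, 1.0 / total_weight

if __name__ == "__main__":
    studies = [
        (11.0, 2.0, 4, 10.0, 2.0, 4),   # difference 1, variance 2
        (13.0, 2.0, 8, 10.0, 2.0, 8),   # difference 3, variance 1
    ]
    print(pooled_mean_difference(studies))
```

The same function can be fed either the true sample means and SDs or the values estimated from the median and range, which is the comparison the text goes on to make.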

We applied our methods to the Log-Normal [4, 0. [...]. First we ran a test-case meta-analysis. After drawing fifteen samples of random sizes between 8 and [...] from our distribution, we used our estimation formulas to estimate the mean and the variance from the median and the range.

Then we performed meta-analysis using STATA, treating the samples as one subgroup and their estimates as another subgroup to determine the pooled means and heterogeneity.

Figure: Meta-analysis of random data.

In order to capture a more consistent measure of the effect of our estimation on the pooled mean difference, we repeated this process, varying the number of trials in the meta-analysis from 8 to [...]. In particular, we are interested in the difference between the real pooled weighted mean difference in the sample group and the pooled weighted mean difference from a meta-analysis using estimated means and variances.

The actual population mean from which we drew samples is [...]. The actual average pooled sample mean difference between two samples (one was the control, the other the experimental group) was 0.[...]. Using the medians and ranges, we estimated the means for each sample, and performed the meta-analysis using these estimates. The average pooled estimated mean difference was 0.[...].

Individually, the pooled means (both the real sample pooled means and the estimated pooled means) differed a little more. In Figure 2 the black diamonds represent the actual pooled mean difference using actual sample means. The red circles represent the same pooled mean differences using our estimation formulas (we connected the corresponding symbols for clarity). The horizontal axis represents the number of trials in the meta-analysis, from 8 to [...].

Figure 2: Actual pooled mean difference and estimated pooled mean difference.

As seen in Figure 2, the estimates of the mean were fairly accurate and useful. However, in some situations, using these estimates might still be better than the alternative: excluding the trials that reported the wrong summary data (median instead of mean). Using our estimation method, we can see the effect of such trials on pooled summary measures. In the next section we illustrate our method in an actual systematic review.

The results were expressed as the mean increase in hemoglobin in the Epo arm compared with the control.

However, a number of the papers reported the median increase instead of the mean increase and standard deviation. Due to the lack of available methods to use median values, the authors of this important review decided not to use these papers in their meta-analysis.

Recently, a Cochrane review was published attempting to provide a more up-to-date analysis of the effects of Epo in anemia related to malignancy [5]. The Cochrane reviewers did meta-analyze data to calculate an average weighted mean increase in hemoglobin as the result of Epo treatment. However, the Cochrane investigators could not include the totality of evidence in relation to this outcome, since a number of the trials reported data as medians instead of means.

Therefore, published meta-analyses related to the effect of Epo in anemia due to malignancy suffer from a phenomenon akin to outcome reporting bias [6], simply due to the fact that methods have not yet been developed to allow researchers to use medians.

Here we illustrate that it is actually possible to use and pool medians, and so improve the inclusiveness of meta-analyses. Their results show that on average Epo increases hemoglobin by 2.[...]. However, the Cochrane investigators could not pool data from other available studies in the literature with similar eligibility. However, they did not report the data for the standard deviation of these means.

Since the size of each arm is 15 patients, our formula 16 provides the best estimate of the standard deviation using the median and the range. We used Figure 1 on page [...] in Welch et al. Thatcher et al. do report in their paper ranges of hemoglobin for patients treated by Epo and control.

This trial was a three-arm study, in which two doses of Epo were compared against the control. For the purpose of this analysis, we separated the data from each of the Epo arms and compared them against one half of the control group just like the rest of the studies in the Cochrane review.


