SAMPLE SIZE AND CONFIDENCE WHEN APPLYING THE NSSDA

Ariza López, Francisco Javier (*); Atkinson Gordo, Alan David (*)

(*) Grupo de Investigación en Ingeniería Cartográfica. Dpto. de Ingeniería Cartográfica, Geodésica y Fotogrametría. Universidad de Jaén. Campus “Las Lagunillas” s/n. 23071. Jaén (Spain).
e-mail: [email protected] Tel: +34953212469
e-mail: [email protected] Tel: +34927257195

ABSTRACT

In this work a simulation process is used to study the variation and stability of the results of the National Standard for Spatial Data Accuracy (NSSDA) as a function of sample size. Empirical results show that the NSSDA underestimates the error level present in the population, and that the positional accuracy estimation has a variability of 11% when the recommended sample size of 20 points is used. Simulation results indicate that samples of about a hundred points are needed to reach an effective confidence level of 95%. The NSSDA is a methodology of shared risk between users and producers when accuracy is “as expected”, but in other cases this relation is altered, as the simulation results demonstrate.

INTRODUCTION

Since positional quality is essential in cartographic production, all mapping agencies have used statistical methods for its control; we call these methods tests. Among the different methods used, we can highlight the National Map Accuracy Standard (USBB, 1947), the Engineering Map Accuracy Standard (ASCI, 1983; ASP, 1985; Veregin, 1989; Giordano and Veregin, 1994), the ASPRS standard (Merchant, 1987; ASPRS, 1989), and the more recent National Standard for Spatial Data Accuracy (FGDC, 1998).

The National Standard for Spatial Data Accuracy (NSSDA), established by the Federal Geographic Data Committee in 1998, is a statistical methodology for evaluating the positional quality of a Geographic Data Base (GDB).
The NSSDA is mandatory for federal agencies of the USA that produce analogue and/or digital cartographic data, and it is ever more widely used all over the world. As in other positional quality control procedures, the coordinates of a set of points in the GDB are compared to the coordinates of the same points in a source of higher accuracy, mainly a field survey. In this way an RMSE is derived from the discrepancies between pairs of coordinates. The NSSDA does not study the presence of systematic errors, as it considers that “they might have been eliminated in the best way” (FGDC, 1998). Therefore, the NSSDA focuses only on the study of data dispersion.

The NSSDA gives its results in a more open way than previous tests because it leaves it to the user to judge whether or not the derived accuracy reaches expectations, that is, in practical terms, whether the product passes or fails the user's accuracy expectations. Acceptance or rejection is therefore the responsibility of the user. The test only tells us: "the product has been checked/compiled for N meters of horizontal/vertical accuracy at 95% of level of confidence". Table 1 summarizes the steps for applying the standard.

From a statistical point of view, one of the most controversial aspects of all the above-mentioned methodologies for positional control is the number and distribution of the control points. With regard to the number, which is our interest here, it should always be large enough for the hypothesis of normality to be fulfilled, this being determined by the laws of large numbers in statistics. For this reason recommendations always suggest at least 20 points (FGDC, 1998; MPLMIC, 1999). Nevertheless, this size seems to be very small, and some authors (Li, 1991) and institutions (Ordnance Survey GB) suggest larger sizes. Obviously, since gross errors should always be eliminated, a higher number should be used.
This number of points should be enough to ensure, with a given level of confidence, that a GDB with an unacceptable quality level will not be acquired. On the other hand, the number of points used for the control must be as low as possible in order to minimize the cost of the control (Ariza, 2002).

Under the assumption of no systematic errors, our research focuses on the variability of the estimated positional (horizontal only) accuracy of a GDB when the NSSDA is applied with different sample sizes. In order to enable the generalization of results, synthetic Normal N(µP = 0, σ²P = 1) distributed populations of data are used. The simulation follows the process described in “Positional quality control by means of the EMAS test and acceptance curves” (Ariza and Atkinson, 2005), also presented at this XXII International Cartographic Conference. The confidence of the results is analyzed with reference to the so-called user's and producer's risks.

Table 1.- Summary of the NSSDA when applied to the horizontal component

1. Select a sample of at least 20 check points ($n \geq 20$).
2. Compute the individual errors for each point $i$:
   $e_{x_i} = x_{t_i} - x_{m_i}$, $\quad e_{y_i} = y_{t_i} - y_{m_i}$
3. Compute the RMSE for each component:
   $RMSE_X = \sqrt{\sum_i e_{x_i}^2 / n}$, $\quad RMSE_Y = \sqrt{\sum_i e_{y_i}^2 / n}$
4. Compute the horizontal RMSE and accuracy using the appropriate expression:
   - If $RMSE_X = RMSE_Y$:
     $RMSE_R = \sqrt{RMSE_X^2 + RMSE_Y^2}$
     $ACCURACY_R = 1.7308 \, RMSE_R = 2.4477 \, RMSE_X = 2.4477 \, RMSE_Y$
   - If $RMSE_X \neq RMSE_Y$ and $0.6 < (RMSE_{min} / RMSE_{max}) < 1$:
     $ACCURACY_R \approx 2.4477 \cdot 0.5 \cdot (RMSE_X + RMSE_Y)$

Note: If error is normally distributed and independent in each of the x- and y-components, the factor 2.4477 is used to compute horizontal accuracy at the 95% confidence level (Greenwalt and Shultz, 1962).
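The steps of Table 1 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, reading the subscripts t and m as "reference" and "tested" coordinates is an assumption, and the case RMSEmin / RMSEmax <= 0.6, which Table 1 does not cover, is simply rejected.

```python
import math


def rmse(errors):
    """Root mean square of a sequence of coordinate errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def nssda_horizontal_accuracy(reference, tested):
    """Return (RMSE_X, RMSE_Y, ACCURACY_R) for paired (x, y) check points.

    reference: coordinates from the higher-accuracy source (assumed x_t, y_t)
    tested:    coordinates of the same points in the data set (assumed x_m, y_m)
    """
    if len(reference) != len(tested) or not reference:
        raise ValueError("need two non-empty point lists of equal length")

    # Step 2: individual errors for each point and component.
    ex = [xt - xm for (xt, _), (xm, _) in zip(reference, tested)]
    ey = [yt - ym for (_, yt), (_, ym) in zip(reference, tested)]

    # Step 3: RMSE for each component.
    rmse_x, rmse_y = rmse(ex), rmse(ey)

    # Step 4: horizontal accuracy at the 95% confidence level.
    if math.isclose(rmse_x, rmse_y, rel_tol=1e-9):
        rmse_r = math.sqrt(rmse_x ** 2 + rmse_y ** 2)
        return rmse_x, rmse_y, 1.7308 * rmse_r

    ratio = min(rmse_x, rmse_y) / max(rmse_x, rmse_y)
    if ratio > 0.6:
        return rmse_x, rmse_y, 2.4477 * 0.5 * (rmse_x + rmse_y)

    # Assumption: this sketch refuses the case that Table 1 leaves undefined.
    raise ValueError("RMSE_min / RMSE_max <= 0.6 is not covered by Table 1")
```

The sign convention of the individual errors does not affect the result, since only squared errors enter the RMSE.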
This presentation is organized in four sections. The first deals with the analysis of the accuracy variability depending on sample size but with fixed population variability; the second presents properties of this methodology (underestimation and risks) that have not been mentioned previously; and the third presents the analysis of the user's and producer's risks when the variability of populations is considered. Finally, conclusions are presented.

There is currently a proposal for the revision of the NSSDA based on various suggestions, such as those coming from the National Digital Elevation Program, oriented towards adding instructions on how to test and report vertical accuracy in areas where a normal distribution cannot always be attained; the proposal of Tilley (2002) to classify accuracy results derived from the NSSDA; or the claim by McCollum (2003) that the Greenwalt and Shultz (1962) estimator is inappropriately used in the NSSDA to determine a probability of 95%. Our results can therefore also contribute ideas for the redefinition of this standard.

VARIABILITY OF NSSDA ACCURACY ESTIMATIONS

In this paper simulation has been used as the base tool for analyzing the behaviour of the NSSDA methodology. The simulation process is similar to that applied to other positional control methodologies (for more detail see Ariza and Atkinson, 2005); it basically consists of three main steps:

- Simulation of populations. A hundred synthetic populations of well-known parameters (µP = 0, σ²P = 1, where "P" means population) are derived from a controlled statistical process of random value generation. Single population values are considered positional error values.
- Simulation of samples. A thousand samples of different sizes (n = 10, 20, 30, and so on) are extracted from each population. The NSSDA is applied to each sample as if it were a single positional control test.
- Statistic computations. Result values are aggregated, deriving mean errors and the variation of the error, the latter giving an idea of the stability and reliability of the process.

Results of the process are shown in Table 2. Because the simulation is performed using a Normal N(µP = 0, σ²P = 1) distributed population, the theoretical value to be detected by the NSSDA is ACCURACYR = 2.447 m, which corresponds to a circular error estimation with a probability of 95%. Because of the large number of simulations, the final results are very sound, with stability values decreasing from 0.7% for samples of minimum size to 0.2% for samples of maximum size.

As can be observed in Table 2, a mean value of 2.416 m ± 0.382 m is obtained for samples of 10 points. This deviation amounts to a variability of 15.8% with respect to the mean observed value, which is unacceptable. For the size recommended by the NSSDA (n = 20), the observed value in Table 2 is ACCURACYR = 2.432 m ± 0.270 m. This value is 0.6% less than the corresponding theoretical value to be detected, and in this case the variability is in the order of 11%, so that the accuracy actually has a confidence of 89%. The variation range decreases when the sample size increases, so that for samples of 700 points it is about 1%.

Taking a sample size of 95 points, the mean observed value in Table 2 is ACCURACYR = 2.444 m ± 0.121 m, and the variability is within ±5% of that value. So if we want to work with a confidence level of 95%, it is not advisable to use fewer than 95 to 100 points for the control sample. In this case the simulation variability (stability) is about 0.4%, which means a variation interval between ±4.6% and ±5.4%. Also, if we want to limit the variability to a maximum of 5% (= ±2.5%), a sample with at least n = 275 points will be needed. But it is also possible to obtain the maximum variation value for a given sample size.
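The simulation process described in this section can be approximated with the following sketch. It rests on stated assumptions and is not the authors' implementation: errors are drawn directly from N(0, σ²) instead of from stored synthetic populations, ACCURACYR is computed as 1.7308 · RMSER for every sample, and the function names, trial count and seed are hypothetical.

```python
import math
import random
import statistics


def simulated_accuracy_r(n, sigma, rng):
    """ACCURACY_R of one simulated control with n check points.

    Assumption: independent N(0, sigma^2) errors in x and y, and
    ACCURACY_R = 1.7308 * RMSE_R with RMSE_R^2 = RMSE_X^2 + RMSE_Y^2.
    """
    sum_sq = 0.0
    for _ in range(n):
        ex = rng.gauss(0.0, sigma)
        ey = rng.gauss(0.0, sigma)
        sum_sq += ex * ex + ey * ey
    rmse_r = math.sqrt(sum_sq / n)
    return 1.7308 * rmse_r


def simulate(n, trials=1000, sigma=1.0, seed=0):
    """Return (mean, deviation, variation in %) of ACCURACY_R over many samples."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    values = [simulated_accuracy_r(n, sigma, rng) for _ in range(trials)]
    mean = statistics.fmean(values)
    deviation = statistics.stdev(values)
    return mean, deviation, 100.0 * deviation / mean
```

Calling `simulate(n)` for several sample sizes gives quantities that play the role of columns (b), (c) and (d) of Table 2; because of the simplifications above, the figures should be expected to be similar to, not identical with, the published ones.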
The deviation value presented in column (c) of Table 2 should be multiplied by a K factor related to the desired confidence level, and added to the mean value of the same table. For instance, when n = 275 and the desired confidence level is 95%, K = 1.96, so from Table 2 we obtain:

ACCURACYR (Maximum) = ACCURACYR ± K · Deviation = 2.446 m ± 1.96 × 0.061 m = 2.446 m ± 0.120 m

Other studies based on population normality (Li, 1991) suggest sample sizes of the same order.

Table 2.- Mean ACCURACYR values and variability obtained by simulation of samples and populations

n (a)   NSSDA m (b)   Dev. ±m (c)   Variation ±% (d)   Stab. % (e)
10      2.416         0.382         15.8               0.7
15      2.426         0.312         12.9               0.6
20      2.432         0.270         11.1               0.5
25      2.434         0.241         9.9                0.5
30      2.437         0.219         9.0                0.5
35      2.438         0.203         8.3                0.4
40      2.439         0.189         7.8                0.4
45      2.440         0.178         7.3                0.4
50      2.441         0.168         6.9                0.4
55      2.441         0.160         6.6                0.4
60      2.442         0.153         6.3                0.4
65      2.442         0.147         6.0                0.4
70      2.442         0.141         5.8                0.4
75      2.443         0.136         5.5                0.4
80      2.443         0.131         5.4                0.4
85      2.443         0.127         5.2                0.4
90      2.443         0.124         5.1                0.4
95      2.444         0.121         5.0                0.4
100     2.444         0.115         4.7                0.4
125     2.444         0.099         4.1                0.4
150     2.446         0.088         3.6                0.3
175     2.446         0.083         3.4                0.4
200     2.445         0.075         3.1                0.4
225     2.446         0.070         2.9                0.4
250     2.445         0.065         2.6                0.4
275     2.446         0.061         2.5                0.3
300     2.446         0.058         2.4                0.3
325     2.445         0.055         2.3                0.3
350     2.446         0.052         2.1                0.3
375     2.446         0.048         2.0                0.3
400     2.446         0.046         1.9                0.3
425     2.446         0.043         1.8                0.3
450     2.446         0.042         1.7                0.3
475     2.446         0.040         1.6                0.3
500     2.446         0.039         1.6                0.3
525     2.447         0.037         1.5                0.3
550     2.446         0.034         1.4                0.2
575     2.446         0.033         1.3                0.3
600     2.446         0.031         1.3                0.2
625     2.446         0.030         1.2                0.2
650     2.446         0.028         1.1                0.2
675     2.446         0.026         1.0                0.2
700     2.446         0.025         1.0                0.2
800     2.447         0.024         1.0                0.2

Columns are: (a) size (number of points) of the 1000 random samples; (b) simulation mean observed value for horizontal accuracy (ACCURACYR) by applying the NSSDA with a 95% confidence level; (c) mean deviation of the simulation process with respect to the mean observed value; (d) previous deviation expressed as a percentage of the mean observed value of the horizontal accuracy; (e) stability of the process when using a hundred random populations, distributed as N(0,1).

The same tendency is expressed graphically in Figure 1. Here the X-axis refers to the size of the control sample, and the Y-axis to the mean observed, or estimated, population value obtained through the sample when using N(µP = 0, σ²P = 1) populations in the simulation process. The wider, red dashed line corresponds to the value to be theoretically detected by the NSSDA: ACCURACYR = 2.447 m. The series of points are the results of the simulation; they show a very clear tendency, approaching the theoretical value from below as the sample size increases. The two other dashed lines represent the decreasing tendency of the variability of the mean values.

[Figure 1: plot titled "Accuracy estimated by the NSSDA for 1 − α = 95%"; X-axis: (n) points per sample, 0 to 800; Y-axis: NSSDA (m), 2.410 to 2.450.]

Figure 1.- Mean ACCURACYR values (points) and variability (black dashed lines) obtained by simulation. The theoretical NSSDA value corresponding to a N(µP = 0, σ²P = 1) distributed population is shown in red.

UNDERESTIMATION, USER'S AND PRODUCER'S RISK

As shown in Figure 1, the mean estimated value for an NSSDA control is smaller than the corresponding value for the N(µP = 0, σ²P = 1) distributed population.
In other words, on average the NSSDA underestimates the error level of the population, or overestimates its accuracy. But individual results can fall above or below the mean tendency, so for a specific control there is a probability of a better or a worse estimation of the RMSER, and that implies a certain kind of risk, which we call producer's and user's risk. Figure 2 shows a graphical interpretation: the area between the upper dashed line and the horizontal red one, which corresponds to the expected population value (ACCURACYR = 2.447 m for a N(µP = 0, σ²P = 1) distributed population), represents the producer's risk. The area between the lower dashed line and the same horizontal red line represents the user's risk. The interpretation given here to these probabilities or kinds of risk is as follows:

- Producer's risk means the probability that the estimation of the population's RMSER value is greater than it actually is, so that the NSSDA concludes a worse accuracy than the true one. For a given sample size, for instance n1, this occurs proportionally to the ratio between segments A1B1 and A1D1 (producer's risk = A1B1/A1D1). Segment A1B1 is the width of the producer's risk area for n1, and A1D1 is the total variability of the mean estimation when the sample size is n1.
- User's risk means the probability that the estimation of the population's RMSER value is less than it actually is, so that the NSSDA concludes a better accuracy than the true one. For a given sample size, for instance n1, this occurs proportionally to the ratio between segments B1D1 and A1D1 (user's risk = B1D1/A1D1). Segment B1D1 is the width of the user's risk area for n1, and A1D1 is the total variability of the mean estimation when the sample size is n1.

[Figure 2: schematic of the NSSDA estimate and its variability band against sample size (n), with the population value (NSSDA Pobl.) as a horizontal line, the producer's risk (RP) and user's risk (RU) areas, and the segment end points A, B and D.]

Figure 2.- Probabilities of better and worse estimations of RMSER than the actual value (user's and producer's risk)

Numerically speaking, both probabilities are very similar and tend to become equal (Table 3). Curiously, the greatest differences occur for the smallest sample sizes, although they are limited to 4.2% for n = 10. This behaviour of the producer's and user's risks implies a certain equitability, because both parties share, more or less, the same risk.

Table 3.- Probabilities of better and worse estimations of RMSER than the actual value (user's and producer's risk)

n      RU      RP
10     0.521   0.479
15     0.517   0.483
20     0.514   0.486
25     0.514   0.486
30     0.512   0.488
35     0.511   0.489
40     0.511   0.489
45     0.510   0.490
50     0.509   0.491
55     0.510   0.490
60     0.508   0.492
65     0.509   0.491
70     0.509   0.491
75     0.508   0.492
80     0.508   0.492
90     0.508   0.492
95     0.506   0.494
100    0.507   0.493
110    0.507   0.493
125    0.508   0.492
150    0.503   0.497
175    0.503   0.497
200    0.507   0.493
225    0.504   0.496
250    0.508   0.492
275    0.506   0.494
300    0.504   0.496
400    0.506   0.494
500    0.507   0.493
600    0.508   0.492
700    0.510   0.490
800    0.500   0.500

VARIABILITY OF USER'S AND PRODUCER'S RISK

Until now we have worked under the assumption of a N(µP = 0, σ²P = 1) distributed population and analyzed what can occur when estimating from a sample of a given size. Now we analyze the behaviour of the NSSDA when a N(µP = 0, σ²P = 1) distributed population is expected but the actual population follows another normal distribution, N(µP = 0, σ²P = D²) with D ≠ 1, and we determine the user's and producer's risks for that condition. For this analysis we have used a simulation process similar to that mentioned above, but changing the variability when creating the random populations.
Thus, a set of normally distributed populations has been synthetically created following a N(µP = 0, σ²P = D²), where D = 0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15 and 1.20. For each synthetic population a thousand samples of different sizes (n = 10, 20, 30, and so on) were extracted. The NSSDA was applied to each sample as if it were a single positional control test.

The results of this process are presented in Figure 3. The horizontal axis corresponds to sample size and the vertical axes to ACCURACYR (left) and D (right). Figure 3 shows very similar tendency lines, but shifted vertically. The wider, dashed line corresponds to the previously studied situation, where the population follows a N(µP = 0, σ²P = 1), and so it is the same line as presented in Figure 1. Tendency curves above the wider, dashed line correspond to cases where D > 1, and curves below it to cases where D < 1.

[Figure 3: left Y-axis: NSSDA (m), 1.9 to 3.0; right Y-axis: Deviation (m), 0.8 to 1.2; X-axis: Sample Size, 0 to 140.]

Figure 3.- Evolution of ACCURACYR values for different population deviations versus sample size

The different values of D can be considered, in relation to D = 1, as exigency ratios (ER) implying a detected nominal accuracy value when applying the NSSDA, and vice versa. This idea is presented in Table 4. For example, ER = 0.80 means that we require ACCURACYR = 1.957 m but the estimation from the sample gives ACCURACYR = 2.446 m, so our exigency is 80% (= 1.957 / 2.446) of the actual accuracy value of the product. The opposite case occurs when ER > 1: for instance, ER = 1.20 means that we require ACCURACYR = 2.935 m but the estimation from the sample gives ACCURACYR = 2.446 m, so the required value is 120% (= 2.935 / 2.446) of the estimated one, that is, the product is more accurate than required.
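As a quick arithmetic check, the required ACCURACYR values quoted above (and listed in Table 4) appear to be the nominal value for ER = 1.00, 2.446 m, multiplied by the exigency ratio. The short sketch below reproduces that scaling; the constant and function names are hypothetical, and the scaling rule is inferred from the quoted numbers rather than stated in the text.

```python
# Nominal ACCURACY_R (in metres) listed for ER = 1.00.
NOMINAL_ACCURACY_R = 2.446

EXIGENCY_RATIOS = (0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15, 1.20)


def required_accuracy(er, nominal=NOMINAL_ACCURACY_R):
    """Required ACCURACY_R for an exigency ratio, rounded to millimetres.

    Assumption: the required value is simply er * nominal.
    """
    return round(er * nominal, 3)


# Mapping from each exigency ratio to its required ACCURACY_R.
required_by_ratio = {er: required_accuracy(er) for er in EXIGENCY_RATIOS}
```

For example, `required_accuracy(0.80)` gives 1.957 and `required_accuracy(1.20)` gives 2.935, the two values used in the worked example.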
Table 4.- Exigency ratios or D values and corresponding detected nominal values for the NSSDA (ACCURACYR)

Exigency Ratio (ER)   ACCURACYR (m)
0.80                  1.957
0.85                  2.079
0.90                  2.201
0.95                  2.324
1.00                  2.446
1.05                  2.568
1.10                  2.691
1.15                  2.813
1.20                  2.935

The variability of values around the mean described in the previous section plays a very important role when studying the behaviour of the NSSDA in the case where a N(µP = 0, σ²P = 1) distributed population is expected but the actual population is N(µP = 0, σ²P = D²) distributed. In order to use an acceptance index in the form of a probability we consider the following rules:

R1: If D < 1 (ER > 1), the accuracy of the population is better than expected, which means that it would be considered satisfactory, or accepted. So when performing the simulation, we count the number of cases where the observed value is ACCURACYR < 2.447 m. The number of such cases is expressed as an acceptance percentage of the total cases (the number of times we are able to say that accuracy is better than expected).

R2: If D > 1 (ER < 1), the accuracy of the population is worse than expected, which means that it would be considered not satisfactory, or not accepted. So when performing the simulation, we count the number of cases where the observed value is ACCURACYR < 2.447 m. The number of such cases is expressed as an acceptance percentage of the total cases (the number of times we are not able to say that accuracy is worse than expected).

Figure 4 shows the results obtained with the above-mentioned process, representing acceptance values as percentages. The wider, black curve corresponds to the case where D = 1 (labelled with 1.00). Its value is a little more than 50% because of the accuracy underestimation of the NSSDA. The red curves correspond to cases where D < 1 (labelled with 1.05 up to 1.20).
Here quality is better than expected, user acceptance increases, and the producer's risk decreases to a percentage equal to 100% minus the acceptance percentage (a good product can be rejected in this percentage of cases). The blue curves correspond to cases where D > 1 (labelled with 0.95 down to 0.80). Here quality is worse than expected, and user acceptance is a risk (a bad product can be accepted in that percentage of cases), which decreases as the sample size n increases.

[Figure 4: Y-axis: Acceptance (%), 0 to 100; X-axis: Sample Size, 0 to 140; curves labelled 0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15 and 1.20.]

Figure 4.- Evolution of acceptance levels for different population deviations versus sample size

CONCLUSIONS

By using a simulation-based methodology, the variations of the NSSDA ACCURACYR and the behaviour of the associated risks have been analyzed. The statistical analysis is based on normally distributed synthetic populations, which ensures control of the process, the generality of the results and their easy applicability to real cases. The main conclusions derived from the results are:

- The NSSDA has a slight tendency to underestimate the ACCURACYR value, that is, the error level of the population.
- For the minimum proposed sample size (20 points) the variability of results is in the order of 11%, which actually means a confidence level of 89%.
- In order to have a 95% confidence level in the estimation, with variability within a range of ±5%, the sample size must be in the order of 100 points.
- Because of its statistical formulation, the NSSDA accuracy estimation gives similar user's and producer's risks, which means a shared-risk behaviour.
- If the variability of the population is greater or smaller than expected, the user's and producer's risks change. We have derived a family of curves that can be used by users to determine the sample size that limits their risk, and also by producers to analyze the trade-offs between their product's quality and its acceptance, in order to decide on and establish the capacity of the production process.
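The acceptance percentages discussed in the previous section can be approximated with a short simulation. The sketch below is an illustration under stated assumptions, not the procedure actually used to draw Figure 4: errors are drawn directly from N(0, D²), ACCURACYR is computed as 1.7308 · RMSER, a sample counts as accepted when ACCURACYR < 2.447 m (rules R1 and R2), and all names, the trial count and the seed are hypothetical.

```python
import math
import random

THRESHOLD_M = 2.447  # theoretical ACCURACY_R for the expected N(0, 1) population


def acceptance_rate(n, d, trials=1000, seed=0):
    """Fraction of simulated controls with ACCURACY_R below the threshold.

    n: number of check points per control sample
    d: standard deviation D of the actual population (the expected one is 1)
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    accepted = 0
    for _ in range(trials):
        # Sum of squared x and y errors for one simulated control sample.
        sum_sq = sum(
            rng.gauss(0.0, d) ** 2 + rng.gauss(0.0, d) ** 2 for _ in range(n)
        )
        accuracy_r = 1.7308 * math.sqrt(sum_sq / n)
        if accuracy_r < THRESHOLD_M:
            accepted += 1
    return accepted / trials
```

Evaluating `acceptance_rate(n, d)` over a grid of sample sizes and D values yields one acceptance curve per D. For D = 1 the returned fraction plays the role of the user's risk RU, and one minus it the role of the producer's risk RP.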
ACKNOWLEDGEMENTS

This work has been partially funded by the National Ministry of Sciences and Technology under grant nº BIA200302234.

REFERENCES

ARIZA, F.J. (2002). Control de Calidad en la Producción Cartográfica. Ra-Ma.

ARIZA, F.J.; ATKINSON, A. (2005). Positional quality control by means of the EMAS test and acceptance curves. In proceedings of the XXII International Cartographic Conference, La Coruña, España.

ASCI (1983). Map uses, scales and accuracies for engineering and associated purposes. American Society of Civil Engineers, Committee on Cartographic Surveying, Surveying and Mapping Division, New York.

ASP (1985). Accuracy Specification for Large-Scale Line Maps. In PE&RS, vol. 51, nº 2.

ASPRS (1989). Accuracy standards for large scale maps. In PE&RS, vol. 56, nº 7.

ATKINSON, A. (2005). Control de calidad posicional en cartografía: análisis de los principales estándares y propuesta de mejora. Tesis doctoral, Universidad de Jaén, Jaén.

FGDC (1998). Geospatial Positioning Accuracy Standards, National Standard for Spatial Data Accuracy. FGDC-STD-007-1998, http://www.fgdc.gov/

GIORDANO, A.; VEREGIN, H. (1994). Il controllo di qualità nei sistemi informativi territoriali. Il Cardo Editore, Venezia.

GREENWALT, C.; SHULTZ, M. (1962). Principles of error theory and cartographic applications. ACIC Technical Report nº 96. ACIC, St Louis.

MCCOLLUM, J. (2003). Map Error and Root Mean Square. In proceedings of the Sixteenth Annual Geographic Information Sciences Conference of the Towson University and Towson University's Department of Geography and Environmental Planning (TGGIS 2003).

MERCHANT, D. (1987). A Spatial Accuracy Specification for Large Scale Topographic Maps. In PE&RS, vol. 53.

MPLMIC (1999). Positional Accuracy Handbook. Minnesota Planning Land Management Information Center; http://www.mnplan.state.mn.us/press/accurate.html.

TILLEY, G. (2002). A classification system for National Standards for Spatial Data Accuracy. In proceedings of the Fifteenth Annual Geographic Information Sciences Conference of the Towson University and Towson University's Department of Geography and Environmental Planning (TGGIS 2002).

USBB (1947). United States National Map Accuracy Standards. U.S. Bureau of the Budget.

VEREGIN, H. (1989). Taxonomy of Errors in Spatial Data Bases. Technical Paper 89-12, NCGIA, Santa Barbara.
