Methodology In Robust And Nonparametric Statistics

Nonparametric regression is a form of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data. Nonparametric regression requires larger sample sizes than regression based on parametric models, because the data must supply the model structure as well as the parameter estimates.
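As a minimal illustration (simulated data; the choice of ksmooth() and its bandwidth are arbitrary assumptions), base R can contrast a parametric straight-line fit with a nonparametric kernel smoother whose shape comes entirely from the data:

set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)                             # the true relationship is nonlinear
fit_param  <- lm(y ~ x)                                        # parametric: a straight line is assumed
fit_nonpar <- ksmooth(x, y, kernel = "normal", bandwidth = 1)  # nonparametric: no functional form assumed
plot(x, y)
abline(fit_param, col = "red")
lines(fit_nonpar, col = "blue")

The kernel smoother tracks the curvature that the straight-line model misses, at the cost of needing more data to pin the curve down.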
Robust statistics

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from parametric distributions. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work badly.

Introduction. In statistics, classical estimation methods rely heavily on assumptions which are often not met in practice. In particular, it is often assumed that the data errors are normally distributed, at least approximately, or that the central limit theorem can be relied on to produce normally distributed estimates. Unfortunately, when there are outliers in the data, classical estimators often have very poor performance, when judged using the breakdown point and the influence function, described below. Interest in nonparametric statistical analysis has also grown recently in the field of computational intelligence, where experimental data often lack the properties required for the proper application of classical parametric procedures. The practical effect of problems seen in the influence function can be studied empirically by examining the sampling distribution of proposed estimators under a mixture model, where one mixes in a small amount (1-5%) of contamination. For instance, one may use a mixture of 95% a normal distribution and 5% a normal distribution with the same mean but significantly higher standard deviation, representing outliers (a simulation sketch follows at the end of this section). Robust parametric statistics can proceed in two ways: by designing estimators so that a pre-selected behaviour of the influence function is achieved, or by replacing estimators that are optimal under the assumption of a normal distribution with estimators that are optimal for, or at least derived for, other distributions: for example, using the t-distribution with low degrees of freedom (high kurtosis; degrees of freedom between 4 and 6 have often been found to be useful in practice). L-estimators are a general class of simple statistics, often robust, while M-estimators are a general class of robust statistics, and are now the preferred solution, though they can be quite involved to calculate.
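To make the contamination model concrete, here is a minimal R sketch; the sample size, contamination rate and standard deviations are illustrative assumptions, not values from the text:

set.seed(42)
n <- 100
contaminated_sample <- function() {
  is_outlier <- runif(n) < 0.05                        # roughly 5% contamination
  rnorm(n, mean = 0, sd = ifelse(is_outlier, 10, 1))   # same mean, much larger sd for contaminated points
}
means   <- replicate(5000, mean(contaminated_sample()))
medians <- replicate(5000, median(contaminated_sample()))
c(sd_of_mean = sd(means), sd_of_median = sd(medians))  # the mean is far more variable under contamination

Under this mixture, the sampling distribution of the mean is visibly wider and heavier-tailed than that of the median, which is the pattern the influence-function machinery below formalizes.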
Definition. A robust estimator is resistant to errors produced by deviations from the assumptions under which it was derived. This means that if the assumptions are only approximately met, the robust estimator will still have a reasonable efficiency, and reasonably small bias, as well as being asymptotically unbiased, meaning having a bias tending towards 0 as the sample size tends towards infinity. One of the most important cases is distributional robustness: robustness to departures from the assumed underlying distribution, in particular to the long tails produced by outliers. Thus, in the context of robust statistics, distributionally robust and outlier-resistant are effectively synonymous.

Example: speed-of-light data. Gelman et al., in Bayesian Data Analysis, consider a data set of speed-of-light measurements made by Simon Newcomb. The data sets for that book can be found via the Classic data sets page, and the book's website contains more information on the data. Although the bulk of the data look to be more or less normally distributed, there are two obvious outliers. These outliers have a large effect on the mean, dragging it towards them, and away from the center of the bulk of the data. Thus, if the mean is intended as a measure of the location of the center of the data, it is, in a sense, biased when outliers are present. Also, the distribution of the mean is known to be asymptotically normal due to the central limit theorem. However, outliers can make the distribution of the mean non-normal even for fairly large data sets. Besides this non-normality, the mean is also inefficient in the presence of outliers, and less variable measures of location are available.

Estimation of location. A density plot of the speed-of-light data, together with a rug plot (panel (a)) and a normal Q-Q plot (panel (b)), makes the outliers clearly visible. Panels (c) and (d) of the plot show the bootstrap distribution of the mean (c) and of the 10% trimmed mean (d). The trimmed mean is a simple robust estimator of location that deletes a certain percentage of observations (10% here) from each end of the data, then computes the mean in the usual way. The analysis was performed in R, and 10,000 bootstrap samples were used for each of the raw and trimmed means. The distribution of the mean is clearly much wider than that of the 10% trimmed mean. Also note that whereas the distribution of the trimmed mean appears to be close to normal, the distribution of the raw mean is quite skewed to the left. So, in this sample of 66 observations, only 2 outliers cause the central limit theorem to be inapplicable. Robust statistical methods, of which the trimmed mean is a simple example, seek to outperform classical statistical methods in the presence of outliers, or, more generally, when underlying parametric assumptions are not quite correct. Whilst the trimmed mean performs well relative to the mean in this example, better robust estimates are available. In fact, the mean, median and trimmed mean are all special cases of M-estimators. Details appear in the sections below.

Estimation of scale. The outliers in the speed-of-light data affect not just the mean: the usual estimate of scale, the standard deviation, suffers as well. This can be seen by bootstrapping the standard deviation, the median absolute deviation (MAD) and the Qn estimator of scale. The plots are based on 10,000 bootstrap samples for each estimator, with some Gaussian noise added to the resampled data (smoothed bootstrap). Panel (a) shows the distribution of the standard deviation, (b) of the MAD and (c) of Qn. The distribution of the standard deviation is erratic and wide, a result of the outliers. The MAD is better behaved, and Qn is a little more efficient than the MAD. This simple example demonstrates that when outliers are present, the standard deviation cannot be recommended as an estimate of scale.

Manual screening for outliers. Traditionally, statisticians would manually screen data for outliers and remove them, usually checking the source of the data to see whether the outliers were erroneously recorded. Indeed, in the speed-of-light example above, it is easy to see and remove the two outliers prior to proceeding with any further analysis. However, in modern times, data sets often consist of large numbers of variables being measured on large numbers of experimental units; manual screening for outliers is therefore often impractical. Outliers can also interact in such a way that they mask each other. As a simple example, consider a small univariate data set containing one modest and one large outlier. The estimated standard deviation will be grossly inflated by the large outlier, with the result that the modest outlier looks relatively normal. As soon as the large outlier is removed, the estimated standard deviation shrinks, and the modest outlier now looks unusual. This problem of masking gets worse as the complexity of the data increases. For example, in regression problems, diagnostic plots are used to identify outliers, yet it is common that once a few outliers have been removed, others become visible. The problem is even worse in higher dimensions. Robust methods provide automatic ways of detecting, downweighting (or removing), and flagging outliers, largely removing the need for manual screening. Care must still be taken: initial data showing the ozone hole first appearing over Antarctica were rejected as outliers by non-human screening.

Breakdown point. The breakdown point of an estimator is the proportion of arbitrarily bad observations it can handle before giving an arbitrarily bad result. For example, given independent observations x_1, ..., x_n, we can estimate location by the sample mean x̄ = (x_1 + ... + x_n)/n. Such an estimator has a breakdown point of 0, because we can make x̄ arbitrarily large by changing any single x_i. Intuitively, a breakdown point cannot exceed 50%: if more than half of the observations are contaminated, it is not possible to distinguish the underlying distribution from the contaminating distribution. Therefore, the maximum breakdown point is 0.5. For example, the median has a breakdown point of 0.5, and the X% trimmed mean has a breakdown point of X%, for the chosen level of X.
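The breakdown-point idea is easy to demonstrate numerically. A minimal R sketch follows (simulated data; the corrupted value 1e6 is an arbitrary stand-in for a gross error):

set.seed(7)
x <- rnorm(20)
x_bad <- x
x_bad[1] <- 1e6                                    # corrupt a single observation out of 20
c(mean(x), mean(x_bad))                            # the mean is dragged away: breakdown point 0
c(median(x), median(x_bad))                        # the median barely moves: breakdown point 0.5
c(mean(x, trim = 0.10), mean(x_bad, trim = 0.10))  # the 10% trimmed mean also resists one outlier
c(sd(x_bad), mad(x_bad))                           # the sd explodes while the MAD stays sensible
# robustbase::Qn(x_bad) gives the Qn scale estimate, if the robustbase package is installed

One corrupted point out of twenty is enough to move the mean and the standard deviation by orders of magnitude, while the median, the trimmed mean and the MAD are essentially unchanged.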
Huber (1981) and Maronna, Martin & Yohai (2006) contain more details. The level and the power breakdown points of tests are investigated in He, Simpson & Portnoy (1990). Statistics with high breakdown points are sometimes called resistant statistics.

Returning to the speed-of-light data: removing the two lowest observations causes the mean to change from 26.2 to 27.75, a change of 1.55. The estimate of scale produced by the Qn method is 6.3. We can divide this by the square root of the sample size to get a robust standard error, and we find this quantity to be 0.78. Thus, the change in the mean resulting from removing two outliers is approximately twice the robust standard error. The 10% trimmed mean for the speed-of-light data is 27.43. Removing the two lowest observations and recomputing gives 27.67. Clearly, the trimmed mean is less affected by the outliers and has a higher breakdown point. Notice that if we replace the lowest observation, -44, by -1000, the mean becomes 11.73, whereas the 10% trimmed mean is still 27.43. In many areas of applied statistics, it is common for data to be log-transformed to make them near symmetrical. Very small values become large negative when log-transformed, and zeroes become negatively infinite. Therefore, this example is of practical interest.

Empirical influence function. The empirical influence function is a measure of the dependence of an estimator on the value of any one point in the sample. It is a model-free measure in the sense that it simply relies on calculating the estimator again with a different sample. (Tukey's biweight function, discussed below, is an example of what a good empirical influence function should look like.) The empirical influence function EIF_i at observation i is defined by

EIF_i(x) = n * ( T_n(x_1, ..., x_{i-1}, x, x_{i+1}, ..., x_n) - T_n(x_1, ..., x_n) ),

where T_n denotes the estimator computed from the sample. Alternatively, the EIF is defined as the (scaled by n+1 instead of n) effect on the estimator of adding the point x to the sample.

Influence function. The approach here is quite different from that of the previous paragraph. What we are now trying to do is to see what happens to an estimator when we change the distribution of the data slightly: it assumes a distribution, and measures sensitivity to change in this distribution. By contrast, the empirical influence function assumes a sample set, and measures sensitivity to change in the samples. Let A be a convex set of distributions, and suppose we want to estimate the parameter θ of a distribution F in A. Let the functional T: A → Γ be the asymptotic value of some estimator sequence (T_n). We will suppose that this functional is Fisher consistent, i.e. T(F_θ) = θ for all θ. This means that at the model F, the estimator sequence asymptotically measures the correct quantity. What happens when the data do not follow the model F exactly, but a slightly different, contaminated distribution? The influence function is then defined by

IF(x; T; F) := lim_{t → 0+} ( T( t·Δ_x + (1 - t)·F ) - T(F) ) / t,

where Δ_x is the probability measure putting all its mass at the point x. For a robust estimator, we want a bounded influence function, that is, one which does not go to infinity as x becomes arbitrarily large.

Desirable properties. Properties that give an influence function desirable performance include a finite rejection point, a small gross-error sensitivity and a small local-shift sensitivity.

M-estimators. Historically, several approaches to robust estimation were proposed, but M-estimators now appear to dominate the field as a result of their generality, high breakdown point, and their efficiency. See Huber (1981). M-estimators are a generalization of maximum likelihood estimators (MLEs). What we try to do with MLEs is to maximize the likelihood ∏_i f(x_i) or, equivalently, minimize -∑_i log f(x_i). In 1964, Huber proposed to generalize this to the minimization of ∑_i ρ(x_i), where ρ is some function. MLEs are therefore a special case of M-estimators (hence the name: maximum-likelihood-type estimators). Common choices of ρ include the squared error, the absolute error, the Winsorised (Huber) function and Tukey's biweight. For squared errors, ρ(x) increases at an accelerating rate, whilst for absolute errors, it increases at a constant rate. When Winsorizing is used, a mixture of these two effects is introduced: for small values of x, ρ increases at the squared rate, but once a chosen threshold is reached, the rate of increase becomes constant. This Winsorised estimator is also known as the Huber loss function. Tukey's biweight (also known as bisquare) function behaves in a similar way to the squared error function at first, but for larger errors, the function tapers off. A small R sketch of Huber's approach follows at the end of this section.

Properties of M-estimators. M-estimators do not necessarily relate to a probability density function. Therefore, off-the-shelf approaches to inference that arise from likelihood theory can not, in general, be used.
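To ground the M-estimation discussion, here is a small R sketch. Huber's ρ is written out directly (the standard quadratic-then-linear form, with threshold k as a tuning constant), and the location estimate is computed with huber() from the MASS package that ships with R; the data are simulated:

# Huber's rho: quadratic near zero (like squared error), linear beyond the threshold k
huber_rho <- function(x, k = 1.5) ifelse(abs(x) <= k, x^2 / 2, k * abs(x) - k^2 / 2)
curve(huber_rho(x), from = -4, to = 4)   # compare with curve(x^2/2, ...) and curve(abs(x), ...)

library(MASS)
set.seed(7)
y <- c(rnorm(50), 8, 9)                  # a normal bulk plus two outliers
huber(y, k = 1.5)                        # Huber location M-estimate: returns $mu (location) and $s (MAD scale)
mean(y)                                  # the ordinary mean is pulled towards the outliers

Because ρ grows only linearly in the tails, each additional unit of error in an outlier adds a bounded amount to the objective, which is exactly what keeps the influence function bounded.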