Nonparametric Statistics
Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Its methods are either distribution-free or assume a specified distribution whose parameters are left unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference.
Non-parametric models
Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term non-parametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.
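The point about a flexible number of parameters can be illustrated with a minimal sketch. It assumes a normal fit as the parametric model and the empirical cumulative distribution function (ECDF) as the non-parametric one; the helper names `fit_normal` and `fit_ecdf` are hypothetical and chosen only for this illustration.

```python
import bisect
import random
import statistics

def fit_normal(sample):
    """Parametric fit: always exactly two numbers (mean, standard deviation)."""
    return statistics.mean(sample), statistics.stdev(sample)

def fit_ecdf(sample):
    """Non-parametric fit: the sorted sample itself defines the estimate."""
    xs = sorted(sample)

    def ecdf(t):
        # Fraction of observations less than or equal to t.
        return bisect.bisect_right(xs, t) / len(xs)

    return xs, ecdf

random.seed(0)
for n in (10, 100):
    sample = [random.expovariate(1.0) for _ in range(n)]
    params = fit_normal(sample)
    xs, ecdf = fit_ecdf(sample)
    print(f"n={n}: normal fit stores {len(params)} numbers, ECDF stores {len(xs)}")
```

The normal fit stores two numbers regardless of sample size, while the ECDF stores one value per observation, so its complexity grows with the data.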
- A histogram is a simple nonparametric estimate of a probability distribution.
- Kernel density estimation (KDE) generally provides smoother estimates of the density than histograms.
  Kernel density estimation is a non-parametric way to estimate the probability density function of a random variable. It is a fundamental data smoothing problem in which inferences about the population are made from a finite data sample. In some fields, such as signal processing and econometrics, it is also termed the Parzen--Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form. One well-known application of kernel density estimation is estimating the class-conditional marginal densities of data when using a naive Bayes classifier, which can improve its prediction accuracy.
- Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
- Data envelopment analysis provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption.
- The k-nearest neighbors (KNN) algorithm classifies an unseen instance based on the K points in the training set that are nearest to it.
- A support vector machine (with a Gaussian kernel) is a nonparametric large-margin classifier.
- The method of moments with polynomial probability distributions.
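Two of the estimators in the list above, the histogram and the kernel density estimate, can be compared with a short sketch. It assumes a Gaussian kernel, so the estimate at a point x is the average of Gaussian bumps centred on the observations, divided by the bandwidth. The bandwidth of 0.4, the bin count, and the function names are illustrative assumptions, not recommended settings.

```python
import math
import random

def gaussian_kde(sample, x, bandwidth):
    """Gaussian kernel density estimate at a single point x."""
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
    return total / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

def histogram_density(sample, lo, hi, bins):
    """Histogram density estimate: one height per equal-width bin on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for xi in sample:
        if lo <= xi < hi:
            # Guard against floating-point rounding at the upper edge.
            k = min(int((xi - lo) / width), bins - 1)
            counts[k] += 1
    return [c / (len(sample) * width) for c in counts]

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]

# Evaluate the KDE on a fine grid and check that it integrates to about 1.
step = 0.02
grid = [-5.0 + step * i for i in range(501)]
kde_values = [gaussian_kde(sample, g, bandwidth=0.4) for g in grid]
kde_area = sum(kde_values) * step

# The histogram is piecewise constant: one density value per bin.
hist_values = histogram_density(sample, lo=-5.0, hi=5.0, bins=20)
print(round(kde_area, 3), len(hist_values))
```

The histogram yields a step function whose shape depends on the bin edges, whereas the kernel estimate is a smooth curve whose shape depends on the bandwidth. Choosing the bandwidth well is the main practical difficulty and is not addressed in this sketch.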
Methods
Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make few or no assumptions about the form of the probability distributions of the variables being assessed. The most frequently used tests include:
- Analysis of similarities
- Anderson--Darling test: tests whether a sample is drawn from a given distribution
- Statistical bootstrap methods: estimate the accuracy/sampling distribution of a statistic
- Cochran's Q: tests whether k treatments in randomized block designs with 0/1 outcomes have identical effects
- Cohen's kappa: measures inter-rater agreement for categorical items
- Friedman two-way analysis of variance by ranks: tests whether k treatments in randomized block designs have identical effects
- Kaplan--Meier estimator: estimates the survival function from lifetime data, accounting for censoring
- Kendall's tau: measures statistical dependence between two variables
- Kendall's W: a measure between 0 and 1 of inter-rater agreement
- Kolmogorov--Smirnov test: tests whether a sample is drawn from a given distribution, or whether two samples are drawn from the same distribution
- Kruskal--Wallis one-way analysis of variance by ranks: tests whether >2 independent samples are drawn from the same distribution
- Kuiper's test: tests whether a sample is drawn from a given distribution, sensitive to cyclic variations such as day of the week
- Logrank test: compares survival distributions of two right-skewed, censored samples
- Mann--Whitney U or Wilcoxon rank sum test: tests whether two samples are drawn from the same distribution, as compared to a given alternative hypothesis.
- McNemar's test: tests whether, in 2 × 2 contingency tables with a dichotomous trait and matched pairs of subjects, row and column marginal frequencies are equal
- Median test: tests whether two samples are drawn from distributions with equal medians
- Pitman's permutation test: a statistical significance test that yields exact p-values by examining all possible rearrangements of labels
- Rank products: detects differentially expressed genes in replicated microarray experiments
- Siegel--Tukey test: tests for differences in scale between two groups
- Sign test: tests whether matched pair samples are drawn from distributions with equal medians
- Spearman's rank correlation coefficient: measures statistical dependence between two variables using a monotonic function
- Squared ranks test: tests equality of variances in two or more samples
- Tukey--Duckworth test: tests equality of two distributions by using ranks
- Wald--Wolfowitz runs test: tests whether the elements of a sequence are mutually independent/random
- Wilcoxon signed-rank test: tests whether matched pair samples are drawn from populations with different mean ranks
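As an illustration of how one of the listed procedures works, the following is a minimal sketch of an exact two-sample permutation test. It assumes the absolute difference in sample means as the test statistic and a two-sided alternative; the function name `exact_permutation_test` is hypothetical. Enumerating every relabelling is only practical for small samples, because the number of rearrangements grows combinatorially.

```python
from itertools import combinations

def exact_permutation_test(x, y):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(x) + list(y)
    n_x, n = len(x), len(pooled)
    total = sum(pooled)

    def mean_diff(sum_x):
        # Difference between the mean of the "x" group and the rest.
        return sum_x / n_x - (total - sum_x) / (n - n_x)

    observed = abs(mean_diff(sum(x)))
    count = 0
    extreme = 0
    # Each choice of n_x positions is one relabelling of the pooled data.
    for idx in combinations(range(n), n_x):
        count += 1
        stat = abs(mean_diff(sum(pooled[i] for i in idx)))
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / count

# Only 2 of the 20 relabellings are as extreme as the observed split, so p = 0.1.
print(exact_permutation_test([1, 2, 3], [4, 5, 6]))
```

No distributional form is assumed for the data; the p-value relies only on the labels being exchangeable under the null hypothesis.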