Kernel Density Estimation

Kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable based on a finite sample. Let (x1, x2, …, xn) be a univariate independent and identically distributed sample drawn from some distribution with an unknown density ƒ. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is

\({\displaystyle {\widehat {f}}_{h}(x)={\frac {1}{n}}\sum _{i=1}^{n}K_{h}(x-x_{i})={\frac {1}{nh}}\sum _{i=1}^{n}K{\Big (}{\frac {x-x_{i}}{h}}{\Big )},}\) where n is the number of sample and h is the bandwidth.

For estimation, we need to set 2 parameteres :
1. Kernel function
2. Bandwidth

Kernel Function

Kernel function can be seen as a measure of simillarity between two data points. We will pass the value \(u = (X - X_i)/h\) and according to the kernel function decide, how close or far they are. This concept can easily be explanded in multi dimension but I have only implemnted it for 1D vector x below. Some common types of kernels are given below -

#Parabolic kernel
kepa <- function (x) .75 / sqrt(5) * (abs(x) < sqrt(5)) * (1 - x ^ 2 / 5)

#Uniform kernel function
kboxcar <- function (x) .5 * (abs(x) < 1)

# Power kernel 
kpower <- function (k) {
  c <- .5 * (k + 1) / k
  if (k == 0) return(kboxcar)
  function (x) c * (abs(x) < 1) * (1 - abs(x) ^ k)
}

#Gausian kernel
kgaussian <- dnorm

#estimating kdensity acc to selected kernel
kdensity <- function (x, K, h)
  function (t) mean(K((t - x) / h)) / h # f^_h(t)

#Helper function - 

plot_density <- function (x, h, K = dnorm, ns = 100) {
  n <- length(x)
  f <- kdensity(x, K, h)
  t <- seq(min(x) - sd(x), max(x) + sd(x), length = ns)
  y <- sapply(t, f)
  plot(t, y, ylim = c(0, max(y)), type = 'l',
       main = paste0('h = ', sprintf('%.2f', h)),
       lwd = 2, xlab = 'x', ylab = expression(hat(f)[h]))
  for (xi in x)
    lines(t, K((t - xi) / h) / (n * h), lty = 2)
  rug(x)
  invisible(f)
}

Bandwidth Selection

The bandwidth of the kernel is a free parameter which has strongest influence on the resulting estimate. The most common optimality criterion used to select this parameter is the expected L2 risk function, also termed the mean integrated squared error:

\[ {\displaystyle \operatorname {MISE} (h)=\operatorname {E} \!\left[\,\int ({\hat {f}}_{h}(x)-f(x))^{2}\,dx\right].} \] Below I have implemented the unbiased least squares estimator for bandwidth. This is based on the bootstrap principle. This uses LOOCV for estimator a good enough bandwidth. This selects h with least MISE.
We sometimes also use rule of thumb for bandwidth estimation like SilverMans Rule of thumb h = \(\left(\frac{4\hat{\sigma}^5}{3n}\right)^{\frac{1}{5}} \approx 1.06 \hat{\sigma} n^{-1/5},\)

ucv <- function (h, x, K = dnorm, w = 3, n = 100) {
  fhat2 <- function (t) (mean(K((t - x) / h)) / h) ^ 2
  fhat2vec <- function (xs) sapply(xs, fhat2)
  from <- min(x) - w * h; to <- max(x) + w * h
  J <- integrate(fhat2vec, from, to, subdivisions = n)$value # R(fhat)
  J - 2 * mean(sapply(seq_along(x),
                      function (i) mean(K((x[i] - x[-i]) / h)) / h))
}

Estimation

Using the kernel density function, and the bandwidth , we write the function for Kernel density Estimation

density_ci_student <- function (x, h, alpha = .05, K = dnorm, w = 3, n = 100) {
  xs <- seq(min(x) - sd(x), max(x) + sd(x), length = n) # interval of interest
  m <- diff(range(xs)) / (w * h) # independent of n, used to define q:
  q <- qnorm(.5 * (1 + (1 - alpha) ^ (1 / m)))
  fhat <- l <- u <- numeric(n)
  for (i in seq_along(xs)) {
    y <- K((xs[i] - x) / h) / h
    se <- sd(y) / sqrt(length(x))
    fhat[i] <- fx <- mean(y)
    l[i] <- max(fx - q * se, 0); u[i] <- fx + q * se
  }

  plot(xs, fhat, type = 'n', ylim = range(l, u), xlab = 'x')
  polygon(c(xs, rev(xs)), c(l, rev(u)), col = 'gray', border = F)
  lines(xs, fhat, lwd=2)
  rug(x)
  invisible(fhat)
}

#Example 
p <- .4; mu <- c(1, 5); sigma <- c(1, 2) # normal mixture
n <- 10
x <- rbinom(n, 1, 1 - p) + 1; x <- rnorm(n, mu[x], sigma[x])
hist(x)

#Calculating h for which MISE is minimum
hs <- 10 ^ seq(-1, 1, length = 100)
uh <- sapply(hs, ucv, x)
plot(hs, uh, type = 'l', xlab='h', ylab='UCV(h)')

h.star <- hs[which.min(uh)]


plot_density(x, bw.nrd(x)) # Silverman's rule of thumb

plot_density(x, bw.ucv(x)) # R's UCV
## Warning in bw.ucv(x): minimum occurred at one end of the range

plot_density(x, h.star)

Confidence Bands

This is a rough approximation. We associate confidence band in the paramter using the sample estimation formula - \[S.E.(f(t)) = S.E.(\widehat {f}(t))/n \] where n is the number of sample.

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.