Title: Comparing numerical performance of second generation wavelets and the nonparametric estimators in random design regression models

Mohammadhossein Aberoumand

June

Abstract

Since wavelet methods were introduced three decades ago, they have captured the attention of scientists in many fields. In statistics, these methods have been applied to problems such as nonparametric regression, time series analysis and image denoising. In this project we compare the relatively new second generation wavelet method with the first generation method and with some commonly used nonparametric methods. We test these methods on non-equally-spaced grids and record the mean squared errors (MSEs) and mean absolute errors (MAEs). Based on the results of this project, the second generation wavelet method performs as well as the common nonparametric methods and is substantially better than the first generation method in terms of error.

Chapter 1 Introduction

One of the central tasks in statistics is modelling the mean of a response variable, and linear statistical models are the classical tool for doing so. Because of their simplicity, this family of models has been at the centre of attention for a long time. A linear model relates a response variable \(Y\) to a set of independent variables \(X_1,X_2,\dots,X_k\); in the simplest case, with a single predictor, it reads \[Y=\beta_0+\beta_1X_1+\epsilon\] \[\epsilon\sim N(0,1)\] where \(\beta_0,\beta_1\) are unknown parameters that we wish to estimate. Linear regression is discussed in more detail in (Wackerly, Mendenhall, & Scheaffer, 2014). Although these models are based on linear relationships between variables, there are many cases where we wish to investigate a nonlinear relationship between the response and an independent variable. The general form of the corresponding model is \[\begin{equation} Y=f(X)+\epsilon,\quad \epsilon\sim N(0,1) \tag{1.1} \end{equation}\] In equation (1.1) the function \(f\) belongs to \(C^r\), meaning that its derivatives exist and are continuous up to order \(r\), and it is bounded. When the relationship is nonlinear, simple linear regression is not a good choice: being linear, it is not very flexible, and although it has relatively small variance, its bias can be considerably large. Therefore, in chapter 2, we briefly discuss some nonparametric methods that can be used to estimate \(f\), such as kernel smoothing, splines and wavelet methods. These methods are more flexible than simple linear regression and less biased, and by tuning their hyperparameters correctly we can avoid over-fitting. For instance, the choice of bandwidth for kernel smoothing and the number of knots for splines are crucial for obtaining a good fit to the data. In this project we wish to compare the first and second generation wavelet methods with the two other classes of estimators mentioned above.
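
The bias of a straight-line fit under a nonlinear \(f\) can be illustrated with a minimal sketch. The test function \(\sin(4\pi x)\), the sample size, the Gaussian kernel and the bandwidth of 0.05 are all assumptions chosen for illustration, not settings taken from this project; the kernel smoother is a plain Nadaraya-Watson estimator written by hand rather than a call to any particular library. The noise follows \(N(0,1)\) as in (1.1).

```python
import numpy as np


def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of f at the points x_eval."""
    # Scaled distances between every evaluation point and every training point.
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)
    # Locally weighted average of the responses.
    return (weights @ y_train) / weights.sum(axis=1)


rng = np.random.default_rng(0)
n = 400
x = np.sort(rng.uniform(0.0, 1.0, n))
f_true = np.sin(4.0 * np.pi * x)        # hypothetical nonlinear f
y = f_true + rng.normal(0.0, 1.0, n)    # epsilon ~ N(0, 1), as in (1.1)

# Simple linear regression fitted by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
linear_fit = intercept + slope * x

# Kernel smoother with a hand-picked (not tuned) bandwidth.
kernel_fit = nadaraya_watson(x, y, x, bandwidth=0.05)

# Error of each fit measured against the true function.
mse_linear = float(np.mean((linear_fit - f_true) ** 2))
mse_kernel = float(np.mean((kernel_fit - f_true) ** 2))
print(f"MSE linear: {mse_linear:.3f}, MSE kernel: {mse_kernel:.3f}")
```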
The first generation wavelet methods are known to be applicable to equally spaced grids. However, they are not equally effective on non-equally-spaced grids because of their reliance on the Fourier transform. Since data sets with non-equally-spaced grids are common in real-world applications, a newer method, the second generation wavelet method, has been developed in recent years to handle such grids. This algorithm builds on a framework called the lifting scheme, which does not rely on the Fourier transform. The second generation method has been tested on real data sets in (Hamilton, Nunes, Knight, & Fryzlewicz, 2018). However, no comparison has been made between the first and second generation wavelet methods on data from a non-equally-spaced grid. In this project we put these methods to the test in numerous scenarios.

In chapter three, we discuss how to simulate data on a non-equally-spaced grid. To create such a grid we use a random design model. In general, if in (1.1) only the responses \(Y_1,\dots,Y_N\) are treated as random and the sample \(X_1,\dots,X_N\) is treated as fixed, the model is called a fixed design. In contrast, if \((X_1,Y_1),\dots,(X_N,Y_N)\) are considered to be random pairs, it is called a random design (Hsu, Kakade, & Zhang, 2011). For demonstration purposes we present graphs of some basic functions and signals together with the curves estimated by the different methods.
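
The distinction between the two designs can be sketched in a few lines. This is only an illustration under stated assumptions: the test function \(\sin(2\pi x)\), the sample size of 128 and the uniform distribution of the random design points are hypothetical choices, not the settings used in chapter three.

```python
import numpy as np


def f(x):
    # Hypothetical test function; the text does not fix a specific f here.
    return np.sin(2.0 * np.pi * x)


rng = np.random.default_rng(1)
n = 128

# Fixed design: deterministic, equally spaced points; only Y is random.
x_fixed = np.arange(1, n + 1) / n
y_fixed = f(x_fixed) + rng.normal(0.0, 1.0, n)

# Random design: the X_i are random draws too, so the grid is not equally spaced.
# Sorting only orders the pairs for plotting; the noise is i.i.d. either way.
x_random = np.sort(rng.uniform(0.0, 1.0, n))
y_random = f(x_random) + rng.normal(0.0, 1.0, n)

# Spacing between neighbouring design points under each design.
gaps_fixed = np.diff(x_fixed)
gaps_random = np.diff(x_random)
print("spread of gaps, fixed design: ", gaps_fixed.std())
print("spread of gaps, random design:", gaps_random.std())
```

The gaps between neighbouring points are constant in the fixed design and vary in the random design, which is the feature that makes the grid non-equally spaced.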

In chapter four, we compare these estimators by mean squared error (MSE) and mean absolute error (MAE) and present the results of the simulations. Three types of grid are generated, from uniform, left-skewed and right-skewed distributions. This is important because the signals are not symmetric, so the amount of variation differs along the domain. In addition, we vary the amount of noise added each time by changing the signal-to-noise ratio.
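
A rough sketch of this simulation set-up is given below. Several details are assumptions made for illustration and are not taken from the project: the skewed grids are drawn from Beta(2, 5) and Beta(5, 2) distributions, the signal-to-noise ratio is defined as the standard deviation of the signal divided by the standard deviation of the noise, and the helper names `make_grid`, `add_noise`, `mse` and `mae` are hypothetical. The raw noisy observations stand in for an estimator's output.

```python
import numpy as np


def make_grid(kind, n, rng):
    """Draw n sorted design points on [0, 1] from the named distribution."""
    if kind == "uniform":
        x = rng.uniform(0.0, 1.0, n)
    elif kind == "right_skewed":
        x = rng.beta(2.0, 5.0, n)   # assumption: Beta(2, 5), points crowd near 0
    elif kind == "left_skewed":
        x = rng.beta(5.0, 2.0, n)   # assumption: Beta(5, 2), points crowd near 1
    else:
        raise ValueError(f"unknown grid kind: {kind}")
    return np.sort(x)


def add_noise(signal, snr, rng):
    """Add Gaussian noise with sd = sd(signal) / snr (assumed SNR definition)."""
    sigma = signal.std() / snr
    return signal + rng.normal(0.0, sigma, signal.shape)


def mse(estimate, truth):
    """Mean squared error between an estimate and the true signal."""
    return float(np.mean((estimate - truth) ** 2))


def mae(estimate, truth):
    """Mean absolute error between an estimate and the true signal."""
    return float(np.mean(np.abs(estimate - truth)))


rng = np.random.default_rng(2)
for kind in ("uniform", "left_skewed", "right_skewed"):
    x = make_grid(kind, 256, rng)
    truth = np.sin(2.0 * np.pi * x)          # hypothetical test signal
    y = add_noise(truth, snr=3.0, rng=rng)
    # Placeholder: the noisy observations play the role of an estimate here.
    print(kind, round(mse(y, truth), 4), round(mae(y, truth), 4))
```

In an actual comparison, `y` in the last line would be replaced by the fitted values of each estimator, and the errors would be averaged over repeated simulations.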