**Introduction**

In this tutorial, I am going to share how to build a simple linear regression model (one independent variable and one dependent variable) in Python from scratch. We are going to use linear regression to predict the number of bike-sharing users based on temperature. So, the independent variable of the model is temperature, and the dependent variable is the number of bike-sharing users.

**Prerequisites**

In this tutorial, I assume you are quite familiar with the linear regression model. If not, you can learn the basics of linear regression from Andrew Ng’s machine learning lectures (lectures 2.1 – 2.7).

**Data Set**

For the sake of simplicity, we are going to use a randomly generated data set. There are 100 samples in this data set with one independent variable (temperature) and one dependent variable (number of bike-sharing users). The following figure plots the number of bike-sharing users as a function of temperature. The data set is divided into a train set and a test set with an 80:20 ratio.

This is a very simplified data set. In reality, once the temperature exceeds about 27°C, the number of bike-sharing users may decrease, because few people want to cycle in extremely hot weather. In that case, polynomial regression would be more suitable.

**Plot the Data Set**

First of all, import the following libraries: `numpy` and `matplotlib`:

```python
import numpy as np
import matplotlib.pyplot as plt
```

Then, the following code defines the train set (`x_train` and `y_train`) and test set (`x_test` and `y_test`):

```python
# Train set
x_train = np.array([[16., 15., 16., 10., 16., 13., 9., 19., 10., 12.,
                     15., 10., 16., 26., 18., 16., 20., 15., 10., 25.,
                     17., 9., 21., 7., 8., 13., 18., 7., 12., 18.,
                     19., 19., 11., 12., 19., 21., 21., 19., 16., 10.,
                     13., 20., 7., 8., 15., 13., 11., 14., 16., 21.,
                     15., 12., 17., 24., 17., 21., 17., 14., 24., 12.,
                     15., 16., 17., 18., 24., 16., 15., 23., 23., 21.,
                     13., 9., 13., 11., 8., 15., 16., 12., 14., 21.]])
y_train = np.array([[40., 52., 48., 37., 48., 37., 37., 40., 38., 29.,
                     42., 38., 52., 57., 47., 46., 53., 32., 43., 63.,
                     44., 35., 43., 43., 37., 35., 45., 37., 40., 54.,
                     40., 51., 39., 44., 43., 49., 62., 60., 45., 34.,
                     49., 44., 35., 26., 45., 42., 43., 40., 49., 52.,
                     42., 35., 43., 51., 41., 51., 52., 44., 54., 38.,
                     48., 45., 37., 50., 62., 57., 41., 47., 49., 52.,
                     32., 33., 44., 43., 31., 43., 42., 42., 47., 61.]])

# Test set
x_test = np.array([[13., 5., 17., 15., 10., 23., 18., 12., 22., 14.,
                    9., 14., 14., 20., 11., 19., 10., 13., 11., 16.]])
y_test = np.array([[44., 23., 47., 38., 32., 48., 40., 45., 50., 36.,
                    37., 48., 46., 57., 42., 50., 44., 42., 45., 46.]])
```

Finally, you can plot the train set and test set using `matplotlib`:

```python
plt.scatter(x_train, y_train, label='Train set')
plt.scatter(x_test, y_test, label='Test set')
plt.xlim(0, 30)
plt.ylim(0, 100)
plt.title('Correlation between temperature and bike rentals')
plt.xlabel('Temperature $[^{\\circ}C]$')
plt.ylabel('Bike-sharing users')
plt.legend()
plt.show()
```

**Hypothesis Function**

The hypothesis function is a function that approximates the data set; we are going to use it to make predictions. The hypothesis function for this data set is defined as:

\(h_{\theta}(x)=\theta_{0}+\theta_{1}x\)

where \(\theta_{0}\) and \(\theta_{1}\) are the parameters that we will get from training with gradient descent or from the normal equation, \(x\) is the input (`x_train` or `x_test`), and \(h_{\theta}\) is the predicted output.

This linear regression model and its mathematical notation are based on Andrew Ng’s machine learning course. If you are not familiar with this mathematical model, I suggest learning the details from those lectures (lectures 2.1 – 2.7).

The following code defines \(\theta_{0}\) and \(\theta_{1}\) as global variables, then defines the hypothesis function:

```python
# Define theta 0 and theta 1 as global variables
theta_0 = 0
theta_1 = 0

# Define hypothesis function
def hypothesis(x):
    return theta_0 + theta_1*x
```

**Gradient Descent**

In order to get \(\theta_{0}\) and \(\theta_{1}\), we can use either gradient descent or the normal equation. In this section, we are going to use gradient descent; the normal equation is covered in a later section.

Gradient descent is an iterative method for finding \(\theta_{0}\) and \(\theta_{1}\), while the normal equation solves for them analytically. For a very large data set, gradient descent is preferred.

The gradient descent algorithm is defined as:

\(
\begin{aligned}
&\text{repeat until convergence } \{ \\
&\qquad temp0 := \theta_{0}-\alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_{\theta}(x^{(i)})-y^{(i)}\big) \\
&\qquad temp1 := \theta_{1}-\alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_{\theta}(x^{(i)})-y^{(i)}\big)\,x^{(i)} \\
&\qquad \theta_{0} := temp0 \\
&\qquad \theta_{1} := temp1 \\
&\}
\end{aligned}
\)

where \(\alpha\) is the learning rate and \(m\) is the number of training samples, which is \(80\) here.
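To make the update rule concrete, here is a single hand-checked iteration on a tiny made-up data set (three points lying exactly on the line \(y = 1 + 2x\); these values are illustrative only, not the tutorial's data):

```python
import numpy as np

# Tiny made-up data set: three samples on the line y = 1 + 2x
x = np.array([1., 2., 3.])
y = np.array([3., 5., 7.])

theta_0, theta_1 = 0.0, 0.0
alpha, m = 0.1, x.size

h = theta_0 + theta_1 * x                      # hypothesis (all zeros initially)
e = h - y                                      # errors: [-3, -5, -7]
temp0 = theta_0 - alpha * np.sum(e) / m        # 0 - 0.1 * (-15/3) = 0.5
temp1 = theta_1 - alpha * np.sum(e * x) / m    # 0 - 0.1 * (-34/3) ≈ 1.133
theta_0, theta_1 = temp0, temp1                # simultaneous update

print(theta_0, theta_1)
```

Note that both temporaries are computed from the *old* parameters before either is assigned; that is the "simultaneous update" the pseudocode above expresses with `temp0` and `temp1`.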

The following code defines a function for calculating the gradient descent:

```python
def gradient_descent(x, y, iterations, alpha):
    global theta_0
    global theta_1
    m = y.size  # number of training samples (80)
    for i in range(iterations):
        # calculate hypothesis
        h = hypothesis(x)
        # calculate error
        e = h - y
        # calculate dJ/d_theta_0
        d_J_d_theta_0 = np.sum(e) / m
        # calculate dJ/d_theta_1
        d_J_d_theta_1 = np.sum(e * x) / m
        # update theta 0 and theta 1
        theta_0 = theta_0 - alpha * d_J_d_theta_0
        theta_1 = theta_1 - alpha * d_J_d_theta_1
```

After that, we run gradient descent for 10,000 iterations with \(\alpha=0.003\). Finally, you can print the result.

```python
gradient_descent(x_train, y_train, 10000, 0.003)
print(theta_0)
print(theta_1)
```

You will get \(\theta_{0}\approx 25\) and \(\theta_{1}\approx 1.25\).
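The gradients computed in the loop above are the partial derivatives of the cost function \(J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\big(h_{\theta}(x^{(i)})-y^{(i)}\big)^{2}\), so a useful sanity check during training is to watch \(J\) decrease on every iteration. A minimal sketch of the cost function, demonstrated on tiny made-up data rather than the tutorial's arrays:

```python
import numpy as np

def cost(theta_0, theta_1, x, y):
    """Mean squared error cost J(theta) = 1/(2m) * sum((h - y)^2)."""
    m = y.size
    e = theta_0 + theta_1 * x - y
    return np.sum(e ** 2) / (2 * m)

# Tiny illustrative check (not the tutorial's data): for points that lie
# exactly on y = 1 + 2x, the true parameters give zero cost.
x = np.array([1., 2., 3.])
y = np.array([3., 5., 7.])
print(cost(1.0, 2.0, x, y))  # 0.0
print(cost(0.0, 0.0, x, y))  # positive, so (0, 0) fits worse
```

If the cost ever increases from one iteration to the next, the learning rate \(\alpha\) is probably too large.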

**Plot the Hypothesis Function**

The following code plots the hypothesis function \(h_{\theta}(x)=\theta_{0}+\theta_{1}x\) with the obtained value of \(\theta_{0}\) and \(\theta_{1}\):

```python
x_trained_model = np.arange(0, 31)
y_trained_model = hypothesis(x_trained_model)

plt.scatter(x_train, y_train, label='Train set')
plt.scatter(x_test, y_test, label='Test set')
plt.plot(x_trained_model, y_trained_model, '--r', label='$h_{\\theta}(x)$')
plt.xlim(0, 30)
plt.ylim(0, 100)
plt.title('Correlation between temperature and bike rentals')
plt.xlabel('Temperature $[^{\\circ}C]$')
plt.ylabel('Bike-sharing users')
plt.legend()
plt.show()
```

As you can see in the following figure, the hypothesis function is a best-fit straight line.

**Normal Equation**

The normal equation is another method that solves for \(\theta_{0}\) and \(\theta_{1}\) analytically; it is also called the closed-form solution. With the normal equation, we don’t need to choose a number of iterations or a learning rate.

The normal equation is defined as:

\(\hat{\theta}=(X^{T}X)^{-1}X^{T}Y\)

The following code implements the normal equation method:

```python
def normal_equation(x_train, y_train):
    # add a column of ones for the bias term theta_0
    X_b = np.c_[np.ones((80, 1)), x_train.reshape(-1, 1)]
    y = y_train.reshape(-1, 1)
    X_transpose = X_b.T
    # closed-form solution: theta = (X^T X)^-1 X^T y
    theta = np.linalg.inv(X_transpose.dot(X_b)).dot(X_transpose).dot(y)
    return theta

normal_equation(x_train, y_train)
```
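As a quick sanity check of the closed-form computation, you can run the same steps on a tiny made-up data set whose points lie exactly on a known line; the recovered parameters should match that line. This sketch is illustrative only and does not use the tutorial's arrays:

```python
import numpy as np

# Tiny made-up data set lying exactly on y = 1 + 2x
x = np.array([1., 2., 3., 4.])
y = 1.0 + 2.0 * x

# Build the design matrix with a bias column and apply the normal equation
X_b = np.c_[np.ones((x.size, 1)), x.reshape(-1, 1)]
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y.reshape(-1, 1))
print(theta.ravel())  # approximately [1. 2.]
```

Run on the tutorial's data, the returned vector should likewise agree with the gradient-descent result, \(\theta_{0}\approx 25\) and \(\theta_{1}\approx 1.25\).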

**Make Predictions**

After we train the model, we can make predictions by calling the hypothesis function. The following code shows how to make predictions on the test set.

```python
# make prediction
y_predict = hypothesis(x_test)
print(y_predict)
```

The following figure shows the predictions compared to the actual values.
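Such a comparison plot can be sketched as follows. The parameter values and test arrays below are hypothetical stand-ins so the sketch runs on its own; in the tutorial itself you would reuse `x_test`, `y_test`, and the trained `theta_0` and `theta_1`:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the tutorial's fitted parameters and test set
theta_0, theta_1 = 25.0, 1.25
x_test = np.array([5., 10., 15., 20.])
y_test = np.array([31., 38., 44., 50.])
y_predict = theta_0 + theta_1 * x_test

# Scatter actual vs. predicted values at the same temperatures
fig, ax = plt.subplots()
ax.scatter(x_test, y_test, label='Actual')
ax.scatter(x_test, y_predict, label='Predicted')
ax.set_xlabel('Temperature $[^{\\circ}C]$')
ax.set_ylabel('Bike-sharing users')
ax.set_title('Prediction vs. actual on the test set')
ax.legend()
fig.savefig('prediction_vs_actual.png')
```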

**Source Code**

You can get the source code from this repository.

**Summary**

In this tutorial, you have learned how to build a simple linear regression model from scratch in Python for predicting the number of bike-sharing users. There are two methods for solving for the parameters of the hypothesis function, namely gradient descent and the normal equation.