Simple Linear Regression Model for Predicting Number of Bike-Sharing Users

Introduction

In this tutorial, I will show how to build a simple linear regression model (one with a single independent variable and a single dependent variable) in Python from scratch. We will use the model to predict the number of bike-sharing users from temperature, so the independent variable is temperature and the dependent variable is the number of bike-sharing users.

Prerequisites

This tutorial assumes you are familiar with the linear regression model. If not, you can learn the basics of linear regression from Andrew Ng’s machine learning lectures (lectures 2.1 – 2.7).

Data Set

For the sake of simplicity, we are going to use a randomly generated data set. It contains 100 samples with one independent variable (temperature) and one dependent variable (number of bike-sharing users). The following figure plots the number of bike-sharing users as a function of temperature. The data set is divided into a train set and a test set with a ratio of 80:20.

This is a very simplified data set. In reality, when the temperature exceeds 27°C, the number of bike-sharing users may decrease, because few people want to cycle in extremely hot weather. For real data, polynomial regression may therefore be more suitable.

Plot the Data Set

First, import the numpy and matplotlib libraries:
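The original listing is not included here. A minimal sketch of the imports, assuming the conventional aliases `np` and `plt` (the aliases are an assumption, not something the text specifies):

```python
# Numerical arrays and linear algebra
import numpy as np

# Plotting
import matplotlib.pyplot as plt
```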

Then, the following code defines the train set (x_train and y_train) and test set (x_test and y_test):
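The tutorial's actual sample values are not reproduced here, so the sketch below generates a stand-in data set. The seed, the temperature range (0 to 30°C), the noise level, and the underlying line are all assumptions, chosen only so that the result is consistent with the parameter values reported later in the tutorial.

```python
import numpy as np

# Assumed synthetic data, not the tutorial's original values.
rng = np.random.default_rng(0)                       # fixed seed for repeatability
temperature = rng.uniform(0.0, 30.0, size=100)       # degrees Celsius
noise = rng.normal(0.0, 3.0, size=100)
users = 25.0 + 1.25 * temperature + noise            # number of bike-sharing users

# 80:20 split into train set and test set
x_train, x_test = temperature[:80], temperature[80:]
y_train, y_test = users[:80], users[80:]

print(x_train.shape, x_test.shape)
```

Because the samples are drawn independently, taking the first 80 as the train set is equivalent to a random split.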

Finally, you can plot the train set and test set using matplotlib:
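A possible version of the plotting step is sketched below. It regenerates the same assumed synthetic data so that it runs on its own; the colours and labels are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed synthetic data (same assumptions as the data set sketch).
rng = np.random.default_rng(0)
temperature = rng.uniform(0.0, 30.0, size=100)
users = 25.0 + 1.25 * temperature + rng.normal(0.0, 3.0, size=100)
x_train, x_test = temperature[:80], temperature[80:]
y_train, y_test = users[:80], users[80:]

# Scatter plot of both subsets on the same axes
fig, ax = plt.subplots()
ax.scatter(x_train, y_train, color="tab:blue", label="Train set")
ax.scatter(x_test, y_test, color="tab:orange", label="Test set")
ax.set_xlabel("Temperature (°C)")
ax.set_ylabel("Number of bike-sharing users")
ax.legend()
```

Calling `plt.show()` afterwards displays the figure in an interactive session.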

Hypothesis Function

The hypothesis function is a function that approximates the data set. We are going to use it to make predictions. The hypothesis function for this data set is defined as:

$$h_{\theta}(x)=\theta_{0}+\theta_{1}x$$

where $$\theta_{0}$$ and $$\theta_{1}$$ are the parameters that we will obtain from training with gradient descent or from the normal equation, $$x$$ is the input (x_train or x_test), and $$h_{\theta}(x)$$ is the predicted output.

This linear regression model and its mathematical notation are based on Andrew Ng’s machine learning course. If you are not familiar with this mathematical model, I suggest learning the details from this lecture (lectures 2.1 – 2.7).

The following code defines $$\theta_{0}$$ and $$\theta_{1}$$ as global variables (lines 2-3). Then, we define the hypothesis function (lines 6-7).
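The listing itself is missing, so here is a minimal sketch laid out so that the line references above still apply. The names `theta_0`, `theta_1`, and `hypothesis`, and the initial value of zero, are assumptions.

```python
# Parameters of the hypothesis function, set to zero before training
theta_0 = 0.0
theta_1 = 0.0


def hypothesis(x):
    return theta_0 + theta_1 * x
```

Before training, `hypothesis` returns 0 for every input; the parameters only become meaningful once gradient descent or the normal equation has set them.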

Gradient Descent

To obtain $$\theta_{0}$$ and $$\theta_{1}$$, we can use either gradient descent or the normal equation. In this section, we use gradient descent. The normal equation is covered in a later section.

Gradient descent is an iterative method for finding $$\theta_{0}$$ and $$\theta_{1}$$, while the normal equation solves for them analytically. For a very large data set, gradient descent is usually preferred, because the normal equation requires a matrix inversion whose cost grows roughly cubically with the number of features.

The gradient descent algorithm is defined as:

$$repeat \; until \; convergence \; \{ \\ \qquad temp0:=\theta_{0}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)}) \\ \qquad temp1:=\theta_{1}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)} \\ \qquad \theta_{0}:=temp0 \\ \qquad \theta_{1}:=temp1 \\ \}$$

where $$\alpha$$ is the learning rate, and $$m$$ is the number of samples in the train set, which is $$80$$.
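The tutorial does not write out the cost function behind these updates. Assuming it is the mean squared error cost used in the course the notation is taken from, the update rules can be traced back as follows:

$$J(\theta_{0},\theta_{1})=\frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^{2}$$

$$\frac{\partial J}{\partial \theta_{0}}=\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)}) \qquad \frac{\partial J}{\partial \theta_{1}}=\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}$$

Each iteration moves $$\theta_{j}$$ by $$-\alpha\frac{\partial J}{\partial \theta_{j}}$$, which gives exactly the $$temp0$$ and $$temp1$$ expressions above. Both are computed before either parameter is overwritten, so the update is simultaneous.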

The following code defines a function for calculating the gradient descent:
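A minimal sketch of such a function follows. It regenerates the assumed synthetic data with a fixed seed (the seed, temperature range, and noise level are choices made for this sketch, not the tutorial's values). Unlike the global-variable design described earlier, this version returns the parameters, which keeps the example self-contained. It also performs the training run with the iteration count and learning rate quoted in the text.

```python
import numpy as np

# Assumed synthetic data, not the tutorial's original values.
rng = np.random.default_rng(0)
temperature = rng.uniform(0.0, 30.0, size=100)
users = 25.0 + 1.25 * temperature + rng.normal(0.0, 3.0, size=100)
x_train, y_train = temperature[:80], users[:80]


def gradient_descent(x, y, alpha, iterations):
    """Batch gradient descent for h(x) = theta_0 + theta_1 * x."""
    theta_0, theta_1 = 0.0, 0.0                # assumed starting point
    m = len(x)                                 # number of training samples
    for _ in range(iterations):
        error = theta_0 + theta_1 * x - y      # h(x^(i)) - y^(i) for every i
        temp0 = theta_0 - alpha * np.sum(error) / m
        temp1 = theta_1 - alpha * np.sum(error * x) / m
        theta_0, theta_1 = temp0, temp1        # simultaneous update
    return theta_0, theta_1


theta_0, theta_1 = gradient_descent(x_train, y_train, alpha=0.003, iterations=10000)
print("theta_0 =", theta_0)
print("theta_1 =", theta_1)
```

With unscaled temperatures, a noticeably larger learning rate can make the updates diverge, so the step size and the input range need to be chosen together.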

After that, run gradient descent for 10000 iterations with $$\alpha=0.003$$. Finally, print the result.

You will get $$\theta_{0}\approx 25$$ and $$\theta_{1}\approx 1.25$$.

Plot the Hypothesis Function

The following code plots the hypothesis function $$h_{\theta}(x)=\theta_{0}+\theta_{1}x$$ with the obtained value of $$\theta_{0}$$ and $$\theta_{1}$$:
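One way this plot could look in code is sketched below. To stay self-contained it regenerates the assumed synthetic train set and uses the approximate parameter values reported above instead of retraining.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed synthetic data, not the tutorial's original values.
rng = np.random.default_rng(0)
temperature = rng.uniform(0.0, 30.0, size=100)
users = 25.0 + 1.25 * temperature + rng.normal(0.0, 3.0, size=100)
x_train, y_train = temperature[:80], users[:80]

# Approximate parameters reported in the text
theta_0, theta_1 = 25.0, 1.25

# Evaluate the hypothesis on an evenly spaced grid of temperatures
x_line = np.linspace(x_train.min(), x_train.max(), 100)
y_line = theta_0 + theta_1 * x_line

fig, ax = plt.subplots()
ax.scatter(x_train, y_train, color="tab:blue", label="Train set")
ax.plot(x_line, y_line, color="tab:red", label="Hypothesis")
ax.set_xlabel("Temperature (°C)")
ax.set_ylabel("Number of bike-sharing users")
ax.legend()
```

Calling `plt.show()` afterwards displays the figure in an interactive session.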

As you can see in the following figure, the hypothesis function is a best-fit straight line.

Normal Equation

The normal equation is another method, one that solves for $$\theta_{0}$$ and $$\theta_{1}$$ analytically. It is also called the closed-form solution. With the normal equation, we don’t need to choose the number of iterations or the learning rate.

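The normal equation is defined as:

$$\hat{\theta}=(X^{T}X)^{-1}X^{T}Y$$

where $$X$$ is the matrix of inputs with a leading column of ones (so that $$\theta_{0}$$ acts as the intercept), $$Y$$ is the vector of outputs, and $$\hat{\theta}$$ holds $$\theta_{0}$$ and $$\theta_{1}$$.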

The following code implements the normal equation method:
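A minimal sketch of the normal equation with NumPy, again on the assumed synthetic data. The variable names are illustrative, and the explicit inverse is used only to mirror the formula as written.

```python
import numpy as np

# Assumed synthetic data, not the tutorial's original values.
rng = np.random.default_rng(0)
temperature = rng.uniform(0.0, 30.0, size=100)
users = 25.0 + 1.25 * temperature + rng.normal(0.0, 3.0, size=100)
x_train, y_train = temperature[:80], users[:80]

# Design matrix: a column of ones (intercept) next to the temperatures
X = np.column_stack((np.ones_like(x_train), x_train))
Y = y_train

# theta_hat = (X^T X)^-1 X^T Y
theta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)
theta_0, theta_1 = theta_hat

print("theta_0 =", theta_0)
print("theta_1 =", theta_1)
```

For larger or ill-conditioned problems, `np.linalg.solve(X.T @ X, X.T @ Y)` or `np.linalg.lstsq(X, Y, rcond=None)` avoids forming the inverse explicitly and is generally the more numerically robust choice.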

Make Predictions

After we train the model, we can make predictions by calling the hypothesis function. The following code shows how to make predictions from the test set.
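The prediction step could be sketched as follows. The sketch uses the approximate parameter values reported earlier and the assumed synthetic test set, so the printed numbers will not match the tutorial's original figure.

```python
import numpy as np

# Assumed synthetic data, not the tutorial's original values.
rng = np.random.default_rng(0)
temperature = rng.uniform(0.0, 30.0, size=100)
users = 25.0 + 1.25 * temperature + rng.normal(0.0, 3.0, size=100)
x_test, y_test = temperature[80:], users[80:]

# Approximate parameters reported in the text
theta_0, theta_1 = 25.0, 1.25


def hypothesis(x):
    return theta_0 + theta_1 * x


# Predict the number of users for every temperature in the test set
y_pred = hypothesis(x_test)

# Show the first few predictions next to the actual values
for temp, actual, pred in zip(x_test[:5], y_test[:5], y_pred[:5]):
    print(f"{temp:5.1f} °C  actual = {actual:6.1f}  predicted = {pred:6.1f}")
```

Because `hypothesis` is written with plain arithmetic, it accepts either a single temperature or a whole NumPy array.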

The following figure shows the graph of the prediction compared to the actual value.

Source Code

You can get the source code from this repository.

Summary

In this tutorial, you have learned how to build a simple linear regression model from scratch in Python for predicting the number of bike-sharing users. Two methods can be used to solve for the parameters of the hypothesis function: gradient descent and the normal equation.