The Perceptron Algorithm, Part 3

So far we’ve learned the basics of the perceptron algorithm under the limiting assumption that its decision boundary had to pass through the origin. Consequently, we only had to find a single optimal weight vector \(w\) and not an additional intercept term \(\alpha\). Let’s remove that assumption here.

The way to incorporate the intercept term is clever: we add another fictitious dimension (sometimes called the bias term) to every point in our dataset, with the value 1. In other words, we append a column of ones to our dataset \(X\). That way, the corresponding weight found by our optimization algorithm will act as our intercept term \(\alpha\):

\[\begin{split} f(x) = w \cdot x + \alpha = \begin{bmatrix} w_1 & w_2 & \cdots & w_d & \alpha \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \\ 1 \end{bmatrix} \end{split}\]
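
To make the construction concrete, here is a minimal NumPy sketch (the array values and variable names are made up purely for illustration) of appending the column of ones and evaluating \(f(x)\) as a single dot product:

```python
import numpy as np

# Hypothetical toy data, just for illustration: n points in R^d (here d = 2).
X = np.array([[2.0, 1.0],
              [1.0, 3.0],
              [-1.0, -2.0]])

ones = np.ones((X.shape[0], 1))     # the fictitious coordinate, all 1s
X_aug = np.hstack([X, ones])        # each row now lives in R^(d+1)

# With an augmented weight vector [w_1, ..., w_d, alpha], the affine
# function w . x + alpha becomes a single dot product per point:
w_aug = np.array([0.5, -1.0, 2.0])  # the last entry plays the role of alpha
f = X_aug @ w_aug                   # equals X @ w_aug[:-1] + w_aug[-1]
```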

So now our data points live in \(\mathbb{R}^{d+1}\), but their final coordinate is always 1. This means that all of our data points lie on the common hyperplane \(x_{d+1} = 1\). Finding a separating hyperplane in \((d+1)\)-dimensional space that passes through the origin will therefore automatically include our intercept (or bias) term!
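
Assuming labels in \(\{-1, +1\}\) and the mistake-driven update from the earlier parts, a perceptron trained on the augmented data learns the intercept for free. The sketch below (a hypothetical helper, not the text's own code) runs the usual updates in \(\mathbb{R}^{d+1}\) and splits the learned vector back into \(w\) and \(\alpha\) at the end:

```python
import numpy as np

def perceptron_with_bias(X, y, epochs=100):
    """Mistake-driven perceptron on data augmented with a column of ones.

    X is an (n, d) array of points and y an (n,) array of +/-1 labels.
    Returns (w, alpha) recovered from the augmented weight vector.
    """
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # add the fictitious 1
    w_aug = np.zeros(X_aug.shape[1])                  # hyperplane through the origin in R^(d+1)

    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X_aug, y):
            if y_i * (w_aug @ x_i) <= 0:              # misclassified (or on the boundary)
                w_aug += y_i * x_i                    # standard perceptron update
                mistakes += 1
        if mistakes == 0:                             # separable data: no more mistakes
            break

    return w_aug[:-1], w_aug[-1]                      # split back into (w, alpha)
```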