Where Machine Learning Began
January 2026
Traditional programming:
Rules + Data → Output
Machine learning:
Data + Output → Rules
Instead of writing rules by hand...
...let the computer learn them from examples
Traditional: write rules like "if contains Nigerian prince, mark spam"
ML: show thousands of labeled emails, let algorithm figure out the patterns
1943: McCulloch and Pitts propose the first mathematical model of a neuron
A simple logic gate with binary outputs
Frank Rosenblatt, Cornell Aeronautical Laboratory, 1957
Publishes the Perceptron learning rule
— The New York Times, 1958
Neurons are nerve cells that process and transmit signals
Multiple signals arrive at the dendrites
They are integrated in the cell body
If the accumulated signal exceeds a threshold...
An output signal is sent via the axon
Inputs → Weighted Sum → Threshold → Output
Just arithmetic, nothing scary
Vector form: z = w · x + b
x = input features (the data)
w = weights (importance of each feature)
b = bias (shifts the decision boundary)
z = net input (the weighted sum)
The decision function is a unit step function: output 1 if z ≥ 0, otherwise 0
import numpy as np

def predict(x, weights, bias):
    z = np.dot(x, weights) + bias   # net input: weighted sum of inputs plus bias
    return 1 if z >= 0 else 0       # unit step: class 1 on or above the threshold
The dot product w · x measures how similar the input is to the weight vector
High similarity → positive z → class 1
Low similarity → negative z → class 0
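A quick numeric illustration of this, with values made up for the example: an input roughly aligned with the weight vector produces a large positive z, while an opposed input produces a negative one.

```python
import numpy as np

w = np.array([1.0, 2.0])
b = 0.0

aligned = np.array([2.0, 3.0])    # points roughly the same way as w
opposed = np.array([-2.0, -3.0])  # points the opposite way

print(np.dot(w, aligned) + b)   #  8.0 -> z >= 0 -> class 1
print(np.dot(w, opposed) + b)   # -8.0 -> z <  0 -> class 0
```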
What is the perceptron actually doing?
In 2D: the perceptron finds a line
In 3D: a plane
In nD: a hyperplane
Points where z = 0
w · x + b > 0 → class 1 (one side)
w · x + b < 0 → class 0 (other side)
Weights control the orientation (which direction it points)
Bias controls the position (shifts it up/down)
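A sketch with made-up weights: the 2D boundary is the set of points where w · x + b = 0, so it can be drawn by solving for the second coordinate (this assumes w2 ≠ 0).

```python
import numpy as np
import matplotlib.pyplot as plt

w = np.array([0.8, -0.5])   # example weights (orientation)
b = 0.2                     # example bias (position)

x1 = np.linspace(-3, 3, 100)
x2 = -(w[0] * x1 + b) / w[1]   # points where z = w1*x1 + w2*x2 + b = 0

plt.plot(x1, x2, label='decision boundary (z = 0)')
plt.legend()
plt.show()
```

Changing w rotates this line; changing b slides it parallel to itself.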
How does it find the right weights?
Update rule: Δw = η(true − predicted) · x,  Δb = η(true − predicted)
η (eta) = learning rate, typically between 0.0 and 1.0
True = 0, Predicted = 0: η(0-0)x = 0
True = 1, Predicted = 1: η(1-1)x = 0
No update needed
True = 1, Predicted = 0: η(1-0)x = ηx
Weights move toward the input
True = 0, Predicted = 1: η(0-1)x = -ηx
Weights move away from the input
Each update tilts the decision boundary
Trying to get the misclassified point on the correct side
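A minimal sketch of one such update, with made-up numbers: the true class is 1 but the current weights predict 0, so the rule moves the weights toward the input.

```python
import numpy as np

eta = 0.1
w = np.array([-0.5, 0.2])
b = 0.0
x = np.array([1.0, 2.0])   # current z = -0.5 + 0.4 + 0.0 = -0.1 -> predicted class 0

update = eta * (1 - 0)     # true = 1, predicted = 0
w = w + update * x         # [-0.4, 0.4]: weights tilt toward x
b = b + update             # 0.1

print(np.dot(w, x) + b)    # new z = 0.5 -> x now lands on the class-1 side
```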
Complete implementation in Python
import numpy as np

class Perceptron:
    """Perceptron classifier with a unit step decision function."""

    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta                    # learning rate
        self.n_iter = n_iter              # passes (epochs) over the training data
        self.random_state = random_state  # seed for weight initialization

    def fit(self, X, y):
        # Small random weights, not zeros (see below for why)
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
        self.b_ = np.float64(0.)
        self.errors_ = []                 # misclassifications per epoch

        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                # Perceptron rule: eta * (true - predicted); zero when correct
                update = self.eta * (target - self.predict(xi))
                self.w_ += update * xi
                self.b_ += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        # Weighted sum: z = w . x + b
        return np.dot(X, self.w_) + self.b_

    def predict(self, X):
        # Unit step function: class 1 if z >= 0, else class 0
        return np.where(self.net_input(X) >= 0.0, 1, 0)
If all weights start at zero...
The learning rate affects only the scale of the weight vector, not its direction
Small random values avoid this problem
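A small sketch of why, using synthetic data and a stripped-down zero-initialized training loop rather than the class above: starting from w = 0, the sequence of predictions is the same for any η, so two learning rates produce weight vectors that point in the same direction and differ only in length.

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -0.5]])
y = np.array([1, 1, 0, 0])

def train_from_zero(eta, n_iter=10):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        for xi, target in zip(X, y):
            pred = 1 if np.dot(w, xi) + b >= 0 else 0
            update = eta * (target - pred)
            w += update * xi
            b += update
    return w, b

w_small, _ = train_from_zero(eta=0.1)
w_big, _ = train_from_zero(eta=1.0)
print(w_big / w_small)   # constant elementwise ratio: same direction, different scale
```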
The Iris Dataset
150 flower samples, 3 species
4 features: sepal length, sepal width, petal length, petal width
We will use 2 species, 2 features for visualization
import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
df = pd.read_csv(url, header=None, encoding='utf-8')
# Extract setosa and versicolor (first 100 samples)
y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', 0, 1)
# Extract sepal length and petal length
X = df.iloc[0:100, [0, 2]].values
ppn = Perceptron(eta=0.1, n_iter=10)
ppn.fit(X, y)
# Check convergence
print(ppn.errors_)
# [2, 2, 3, 2, 1, 0, 0, 0, 0, 0]
Converges after 6 epochs with zero errors
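One way to visualize this convergence, assuming matplotlib is available and ppn is the fitted model from above:

```python
import matplotlib.pyplot as plt

plt.plot(range(1, len(ppn.errors_) + 1), ppn.errors_, marker='o')
plt.xlabel('Epoch')
plt.ylabel('Number of updates')
plt.show()
```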
Interactive demo: click to add points and watch the perceptron learn
What the perceptron cannot do
The XOR problem: (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0
Try drawing a single line to separate the 0s from the 1s
You cannot
Convergence is only guaranteed if classes are linearly separable
If not separable, weights never stop updating
(unless you set a maximum number of epochs)
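A minimal sketch of that failure mode, reusing the Perceptron class from above on the XOR labels: the per-epoch error count never settles at zero.

```python
import numpy as np

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

ppn_xor = Perceptron(eta=0.1, n_iter=20)
ppn_xor.fit(X_xor, y_xor)
print(ppn_xor.errors_)   # never reaches 0: no single line separates these points
```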
1969: Minsky and Papert publish Perceptrons
Proved that a single-layer perceptron cannot solve XOR
Funding dried up, research stopped
The AI Winter began
Add more layers
Multi-Layer Perceptron (MLP)
Neural Networks
Deep Learning
| Concept | Perceptron | Modern Neural Nets |
|---|---|---|
| Weighted sum | Yes | Yes |
| Activation function | Yes (step) | Yes (ReLU, etc) |
| Bias term | Yes | Yes |
| Iterative learning | Yes | Yes |
| Multiple layers | No | Yes |
| Backpropagation | No | Yes |
The perceptron is just: a weighted sum, a threshold, and a simple update rule
But it introduced every key idea we still use today
Every layer of every deep network does the same thing:
weighted sum → activation → repeat
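A toy sketch of that pattern, with shapes and values invented for illustration: one dense layer is a batch of weighted sums followed by a ReLU instead of the unit step.

```python
import numpy as np

def dense_layer(x, W, b):
    z = W @ x + b            # one weighted sum per output unit
    return np.maximum(z, 0)  # ReLU activation in place of the unit step

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # 4 input features
W = rng.normal(size=(3, 4))   # 3 units, 4 weights each
b = np.zeros(3)

print(dense_layer(x, W, b))   # 3 activations, which become the next layer's inputs
```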
Adaline and Gradient Descent
How to optimize with calculus