Wednesday, May 10, 2023

A Simple MLP with Pure JavaScript (Without Third-Party Libraries)

For the Indonesian version of this entry, click here.

Hello!

A few days ago, I watched some videos about how a simple MLP works. Then related videos came up. (Ah, the YouTube recommendation system is something, eh?) One of them showed how to make a simple artificial neural network (ANN) with Python, but without third-party libraries such as TensorFlow or PyTorch.

The video starts by explaining the underlying mathematics. It continues by building up the program code, creating small helper functions to ease the writing and to tidy up the code's structure as a whole. The end result is a single long script that implements an MLP.

What is an MLP?

A multilayer perceptron is one form/structure of ANN that consists of several layers of perceptrons. Usually, the network consists of three main types of layers: (1) an input layer, (2) hidden layers, and (3) an output layer. All layers in an MLP have the same structure, i.e. each consists of a number of perceptrons.

[Figure: an MLP network drawn as three columns labeled input layer, hidden layer, and output layer, with every perceptron in one layer connected to every perceptron in the next]
A diagram of an MLP network with three main layers

A perceptron is basically a simple classification component. It receives inputs that are weighted by certain values, adds its own bias, and returns a value that can be taken as the activity level of the perceptron. Mathematically, a perceptron can be written as below:

o = f_A(Σ_i x_i w_i + b)

  • o is the output value.
  • x_i is the i-th input value.
  • w_i is the weight of the i-th input.
  • b is the bias value.
  • f_A is the activation function.

[Figure: a single perceptron drawn as a circle with a plus sign and a bias b, input lines labeled x and w on the left, and an output line labeled f_A and o on the right]
A diagram of a single perceptron
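To make that concrete, here is the same equation in plain JavaScript (a quick sketch of mine, with sigmoid as an arbitrary choice of activation function):

// a single perceptron: weighted sum of the inputs plus the bias, then activation
function perceptron(inputs, weights, bias, activation) {
	let sum = bias;
	for (let i = 0; i < inputs.length; i ++) {
		sum += inputs[i] * weights[i];
	}
	return activation(sum);
}

// example: a perceptron with two inputs and a sigmoid activation
const sigmoid = (v) => 1 / (1 + Math.exp(-v));
console.log(perceptron([1, 0], [0.5, -0.3], 0.1, sigmoid)); // ≈ 0.65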

Okay, how do we do it?

Note: This explanation is just a summary. Read the full code (linked at the end) for the complete picture.

We start with the model's data structure. What is needed in an MLP? To keep it simple, we need the four things below:

const model = {
	layers:         [], // interlayer weight matrices
	bias:           [], // bias matrices for each layer (after the input layer)
	activations:    [], // list of each layer's activation function (after the input layer)
	derActivations: []  // list of each activation function's (first) derivative (after the input layer)
};
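For concreteness, here is one way those fields could be filled in for a small network with 4 inputs, 8 hidden perceptrons, and 3 outputs. This initialization is my own sketch (randomMatrix, sigmoid, and derSigmoid are illustrative helpers); the matrix shapes follow the feed-forward code below:

// hypothetical helper: an n-by-m matrix of random weights in [-1, 1]
function randomMatrix(n, m) {
	return Array.from({ length: n }, () =>
		Array.from({ length: m }, () => Math.random() * 2 - 1));
}

// activations work on whole matrices because feedForward passes matrices around
const sigmoid = (M) => M.map((row) => row.map((v) => 1 / (1 + Math.exp(-v))));
// the sigmoid's derivative, expressed in terms of its output s: s * (1 - s)
const derSigmoid = (M) => M.map((row) => row.map((v) => v * (1 - v)));

// a model with 4 inputs, one hidden layer of 8 perceptrons, and 3 outputs
const model = {
	layers:         [randomMatrix(8, 4), randomMatrix(3, 8)],
	bias:           [randomMatrix(8, 1), randomMatrix(3, 1)],
	activations:    [sigmoid, sigmoid],
	derActivations: [derSigmoid, derSigmoid]
};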

The next step is to create a function to feed inputs forward through a model. We do a repeated matrix multiplication for each layer, with a few additional steps. In essence, it is the same as the perceptron equation above.

function feedForward(model, X) {
	let values = X;
	let listOfResults = [values];
	for (let i = 0; i < model.layers.length; i ++) {
		values = multiply(model.layers[i], values); // weighted sums
		values = addition(values, model.bias[i]);   // plus biases
		values = model.activations[i](values);      // activation
		listOfResults.push(values);
	}
	return listOfResults;
}
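The code above relies on small matrix helpers (multiply, addition, transpose, and later substract) that live in the full script. Minimal versions might look like the sketch below. Note that the training code further down also calls multiply with a scalar argument, so this sketch handles both cases:

// matrix product of A (n-by-m) and B (m-by-p), optionally scaled by a factor;
// if B is a number, A is simply scaled element-wise
function multiply(A, B, factor = 1) {
	if (typeof B === "number") {
		return A.map((row) => row.map((v) => v * B));
	}
	return A.map((row) =>
		B[0].map((_, j) =>
			row.reduce((sum, v, k) => sum + v * B[k][j], 0) * factor));
}

// element-wise sum and difference of two same-shaped matrices
function addition(A, B) {
	return A.map((row, i) => row.map((v, j) => v + B[i][j]));
}

function substract(A, B) {
	return A.map((row, i) => row.map((v, j) => v - B[i][j]));
}

// swap rows and columns
function transpose(A) {
	return A[0].map((_, j) => A.map((row) => row[j]));
}

With these in place, the feedForward function above is runnable as-is.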

After that, we can start training the model. For each iteration, to keep it simple, each sample/entry from the dataset is fed forward, the error is calculated, and the error is back-propagated, all in a single step.

This is admittedly not an effective way to keep the training stable, because the method is very stochastic. The usual approach is to do the feed-forward, error calculation, and back-propagation for the whole dataset (or in batches) in each iteration, which makes the training more stable.

Note that there isn't any step that multiplies by the derivative of the output (last) layer's activation function. This is because, in my experience, the model's training fails and the error values for the other layers become NaN (or similar) when the last layer's error is multiplied by that derivative.

for (let k = 0; k < X.length; k ++) {
	// feed-forward
	const inputs = transpose([X[k]]);
	const listOfValues = feedForward(model, inputs);
	const outputs = listOfValues[listOfValues.length - 1];
	// back-propagation
	let error = substract(outputs, transpose([Y[k]]));
	totalError += transpose(error)[0].reduce((a, v) => a + v);
	// no multiplication with output layer's activation function's derivation
	for (let i = model.layers.length - 1; i >= 0; i --) {
		const layerError = multiply(
			error,
			transpose(listOfValues[i]),
			learningRate
		);
		const layerBiasError = multiply(
			error,
			learningRate
		);
		model.layers[i] = substract(model.layers[i], layerError);
		model.bias[i] = substract(model.bias[i], layerBiasError);
		if (i > 0) {
			error = multiply(transpose(model.layers[i]), model.derActivations[i - 1](error));
		}
	}
}
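In the full script, this per-sample loop sits inside an outer loop over training iterations. Roughly, and with placeholder values for the iteration count and learning rate:

const learningRate = 0.05; // arbitrary example value
for (let iteration = 0; iteration < 1000; iteration ++) {
	let totalError = 0;
	// ... the per-sample loop above goes here ...
	console.log(`iteration ${iteration}: total error = ${totalError}`);
}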

After the model is trained, we can try making predictions as below. This function doesn't just return the model's choice; it also returns the raw outputs. We assume the model was trained for a classification problem.

function predict(model, X) {
	const result = {
		outputs: [],
		selected: []
	};
	for (let k = 0; k < X.length; k ++) {
		const inputs = transpose([X[k]]);
		const listOfValues = feedForward(model, inputs);
		const outputs = listOfValues[listOfValues.length - 1];
		const selected = argMax(transpose(outputs)[0]);
		result.outputs.push(outputs);
		result.selected.push(selected);
	}
	return result;
}
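The only new helper here is argMax, which returns the index of the largest output. A minimal version, plus a hypothetical call (the feature values below are made up in the style of the Iris dataset mentioned later):

// index of the largest value in an array
function argMax(values) {
	return values.reduce((best, v, i) => (v > values[best] ? i : best), 0);
}

// example: classify two samples with four features each
const result = predict(model, [
	[5.1, 3.5, 1.4, 0.2],
	[6.7, 3.0, 5.2, 2.3]
]);
console.log(result.selected); // e.g. [0, 2]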

... and that's it, generally, I think. There are a lot of details still to fill in, but the above gives the general picture. I used the full Iris dataset as training data, and the model reached 94% accuracy, which is great in my opinion.

The complete version can be read in a GitHub Gist that I made.

I hope this helps and have fun trying it!
