Friday, March 4, 2016

Machine Learning methods, part I

Machine Learning is one of today's most influential technologies. With it, we enable computers to make predictions without being explicitly programmed for the task.

It is important to remember that what we call machine learning is the intersection of applied statistics and computer science. I've met colleagues who are interested in the technology but still view it as a "magic box" into which you put data and out of which you get a functioning computer program. The truth is, the machine learning algorithms you hear about aren't all that complex. Moreover, an understanding of the underlying processes is essential to giving the user a feel for what is and isn't possible with the technology, not to mention how to use it correctly.

Most courses can be boiled down to a handful of key conceptual components. The aim of this post is to summarize the content of Andrew Ng's famous Machine Learning Coursera course as succinctly as possible, giving the reader a glimpse under the hood of these machine learning algorithms once and for all.

Linear Regression

Given a collection of data points consisting of tuples \((x, y)\), an easy problem in statistics is to choose a line of best fit to the data. In a machine learning context, we call the data points the training set, and fitting the line is the process of training the machine learning algorithm.

-> A training set is a set \(D\) whose elements are vectors of \(n\) dimensions. We say that \(D\) contains \(m\) samples.

To use one of Andrew Ng's analogies, given a dataset of past sales from a certain realtor, we could, for instance, create a simple algorithm that when given the size of a house, predicts its market value. This will become more obvious from the graphic below.

Of the class of line-fitting problems, the most fundamental is to fit a linear function to the data. Later, we will expand this to incorporate non-linear functions.

-> Define \(\theta\) to be a row vector interpreted as a model we are fitting to our dataset.

In our case, the models are the set of all possible linear functions. Since we know that all such functions can be written as \(h(x) = mx + b\), we can let \(\theta = \begin{bmatrix}b & m \end{bmatrix}\). Here the letter h stands for "hypothesis". In order to apply this model to the independent variable \(x\), it is now a matter of matrix multiplication. $$\begin{align} h(x) &= \theta x \\ &= \begin{bmatrix}b & m \end{bmatrix} \begin{bmatrix} 1 \\ x \end{bmatrix} \\ &= mx + b \end{align}$$

That extra one above the \(x\) is called the bias unit and is needed so that our predictions can have a vertical offset. Our end goal is to choose \(\theta\) such that, roughly, \(\forall (x, y) \in D,\ h(x) \approx y\).
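To make the bookkeeping concrete, here is a minimal sketch in plain JavaScript (my own illustration, not course code) of the hypothesis evaluated as a dot product with the bias unit prepended:

// Evaluate h(x) = theta · [1, x...] for a single sample.
// theta = [b, m1, m2, ...]; x = [x1, x2, ...] (features only, no bias).
function predict(theta, x) {
    var withBias = [1].concat(x);       // prepend the bias unit
    var sum = 0;
    for (var i = 0; i < theta.length; i++) {
        sum += theta[i] * withBias[i];  // row vector times column vector
    }
    return sum;
}

// predict([2, 3], [4]) === 3 * 4 + 2 === 14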

We're going out of our way to define things in terms of linear algebra. This is intentional. Later on, we'll see that even non-linear terms are going to be incorporated through the framework of linear algebra.

Cost Functions & Gradient Descent

Now that we have a way to talk about the training set and hypotheses, the meat of the algorithm will be determining which hypothesis is the most suitable. For this we will use the method of least squares and gradient descent.

-> For an arbitrary sample \((x, y) \in D\), define the cost as $$J((x, y), \theta) = (\theta x - y)^2 = (h(x) - y)^2$$

-> For an arbitrary hypothesis \(\theta\), define the cost as $$J(\theta) = \frac{1}{m}\sum\limits_{i=1}^m J(D_i, \theta)$$ where \(D_i\) denotes the i-th training example. Recall that \(m\) is the size of the training set, so in this sense we're finding the average cost over all the samples.

You'll notice that the function \(J\) has two meanings: one for an arbitrary sample and one for an arbitrary hypothesis. The hypothesis cost is made by adding together the costs of all the samples under that hypothesis. Exactly which \(J\) we are using should be clear from the context.
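Translating the two definitions directly into the same sketch (again my own JavaScript; predict is the hypothetical function from above, and samples are assumed to look like {x: [...], y: ...}):

// Cost of one sample under hypothesis theta: (h(x) - y)^2
function sampleCost(theta, sample) {
    var diff = predict(theta, sample.x) - sample.y;
    return diff * diff;
}

// Cost of a hypothesis: the average sample cost over the whole training set D
function hypothesisCost(theta, D) {
    var total = 0;
    for (var i = 0; i < D.length; i++) {
        total += sampleCost(theta, D[i]);
    }
    return total / D.length;   // D.length is m
}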

Let us draw a 3d graph consisting of the vector space of all hypotheses \(\theta\) along the bottom plane and their associated costs \(J(\theta)\) along the vertical axis. We want to find the point \(\theta\) that minimizes that cost.

With two parameters, this graph is in the shape of a paraboloid. This is good, because the vertex is the only minimum on the whole graph. Sometimes you get graphs that have many local minima, and finding the global minimum can be quite the challenge.

Proving the paraboloid shape isn't too difficult. Recall that a parabola is some equation of the form \(y = ax^2 + bx + c\) where \(a > 0\) if it is upwards facing. A paraboloid is just this with an extra dimension: \(y = a\theta_0^2 + b\theta_1^2 + c\theta_0\theta_1 + d\theta_0 + e\theta_1 + f\) where \(a, b > 0\). Looking at \(J(\theta)\) for the single-feature case, we find$$\begin{align}J(\theta) &= \frac{1}{m}\sum\limits_{j=1}^m (\theta x^{(j)} - y^{(j)})^2 \\ &= \frac{1}{m}\sum\limits_{j=1}^m (\theta_0 + \theta_1 x^{(j)} - y^{(j)})^2 \\ &= \frac{1}{m}\sum\limits_{j=1}^m (a^{(j)}\theta_0^2 + b^{(j)}\theta_1^2 + c^{(j)}\theta_0\theta_1 + d^{(j)}\theta_0 + e^{(j)}\theta_1 + f^{(j)})\end{align}$$Note that the variables here are the components of \(\theta\), not the features. So really it's the sum of many upward-facing paraboloids, which still remains a paraboloid.

The gradient descent algorithm is just like sliding down a hill. We begin by initializing some point as a guess. We will call this guess \(\theta^{(0)}\) and initialize it at the origin: \(\theta^{(0)} = \begin{bmatrix} 0 & 0 \end{bmatrix}\). Recall that here the origin is the linear function with slope and intercept both zero.

We then compute the gradient of the cost function \(J\). Recall from vector calculus that the gradient, \(\nabla f\), is defined as $$\nabla f = \frac{\partial f}{\partial x_1}\hat{x_1} + \frac{\partial f}{\partial x_2}\hat{x_2} + ... + \frac{\partial f}{\partial x_n}\hat{x_n}$$where \(\hat{x_i}\) are orthogonal unit vectors. Intuitively, we think of the gradient as the derivative generalized to many dimensions. It is a vector field whose arrows point uphill, with magnitudes equal to the steepness of the slope.
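As a quick concrete example of my own (not from the course): for \(f(x_1, x_2) = x_1^2 + x_2^2\), the gradient works out to $$\nabla f = 2x_1\hat{x_1} + 2x_2\hat{x_2}$$which points directly away from the origin (uphill) and grows in magnitude the farther out you go, matching the picture of arrows pointing up the slope.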

The gradient descent algorithm now proceeds as follows.

\(\theta = \theta^{(0)}\)
repeat {
    \(\theta = \theta - \alpha \nabla J(\theta)\)
}

The variable \(\alpha\), called the learning rate, controls how quickly we slide down the slope. If set too high, we converge in fewer iterations but risk jumping back and forth over the true minimum, or overshooting it entirely. If set low, we converge very slowly but settle much closer to the true minimum.

Implementing gradient descent for our least-squares cost function is now quite easy. Our main task is to derive \(\nabla J(\theta)\). $$\begin{align} \nabla J(\theta) &= \frac{\partial J}{\partial \theta_1}\hat{\theta_1} + \frac{\partial J}{\partial \theta_2}\hat{\theta_2} + ... + \frac{\partial J}{\partial \theta_n}\hat{\theta_n} \\ \\ \frac{\partial J}{\partial \theta_i} &= \frac{\partial}{\partial \theta_i}\left(\frac{1}{m}\sum\limits_{j=1}^m (\theta x^{(j)} - y^{(j)})^2\right) \\ &= \frac{1}{m}\sum\limits_{j=1}^m \frac{\partial}{\partial \theta_i}(\theta x^{(j)} - y^{(j)})^2 \\ &= \frac{2}{m}\sum\limits_{j=1}^m (\theta x^{(j)} - y^{(j)})\left(\frac{\partial}{\partial \theta_i}(\theta x^{(j)})\right) \\ &= \frac{2}{m}\sum\limits_{j=1}^m (\theta x^{(j)} - y^{(j)})\, x_i^{(j)} \\ \\ \nabla J(\theta) &= \frac{2}{m} \sum\limits_{i=1}^n \left(\sum\limits_{j=1}^m (\theta x^{(j)} - y^{(j)})\, x_i^{(j)}\right)\hat{\theta_i} \end{align}$$

It's cool that it's possible to find this. This function, \(\nabla J(\theta)\), gives how the fairly complicated cost function changes with respect to \(\theta\) in any number of dimensions. We're not talking about the attributes of any of the samples; the components of \(\theta\) are the slope and intercept of the line itself. Computing this function means we have a sure method to always know where "downhill" is.
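Putting the pieces together, here is a rough JavaScript sketch of batch gradient descent for this cost. It reuses the hypothetical predict function from earlier; alpha and iterations are values you would tune by hand.

// theta := theta - alpha * ∇J(theta), repeated
function gradientDescent(D, alpha, iterations) {
    var m = D.length;
    var n = D[0].x.length + 1;               // number of parameters, including the bias weight

    // Start the guess at the origin
    var theta = [];
    for (var i = 0; i < n; i++) theta.push(0);

    for (var it = 0; it < iterations; it++) {

        // Accumulate the gradient: (2/m) * sum_j (h(x^(j)) - y^(j)) * x_i^(j), with x_0 = 1
        var grad = [];
        for (var i = 0; i < n; i++) grad.push(0);

        for (var j = 0; j < m; j++) {
            var xj = [1].concat(D[j].x);                   // prepend the bias unit
            var error = predict(theta, D[j].x) - D[j].y;   // h(x) - y

            for (var k = 0; k < n; k++) {
                grad[k] += (2 / m) * error * xj[k];
            }
        }

        // Take one step downhill
        for (var k = 0; k < n; k++) {
            theta[k] -= alpha * grad[k];
        }
    }

    return theta;
}

With raw house sizes in the thousands, alpha has to be very small to avoid overshooting; scaling the features down first allows much friendlier learning rates.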

Examples

We have defined everything above in the multivariate case, which can be confusing. Let's begin with the single-variable example.

Now let's extend this to two variables. Instead of just floor size, let's include a dimension for how close the house is to major metropolitan areas. We say we are adding an additional feature to our training set.

Instead of one slope, we will have two slopes. The first one, \(\theta_1\), controls the price sensitivity to the size of the house, and the second one, \(\theta_2\), controls the price sensitivity to its location. \(\theta_0\) is always used for the vertical offset at the origin. When graphed, we have two input dimensions and one output dimension. Instead of a straight line, we get its two-dimensional analogue - a flat plane.

In the two-dimensional case, we also see how linear algebra is beneficial. Our previous definition of the hypothesis expands all by itself.

$$\begin{align} h(x) &= \theta x \\ &= \begin{bmatrix}b & m_1 & m_2 \end{bmatrix} \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix} \\ &= m_1x_1 + m_2x_2 + b \end{align}$$
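With the hypothetical predict sketch from earlier, nothing about the code changes for the extra feature; only the vectors get longer (the numbers below are made up purely for illustration):

var theta = [50000, 120, -3000];        // [b, m1, m2]
var price = predict(theta, [2000, 10]); // 120*2000 + (-3000)*10 + 50000 = 260000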

In the next post we will investigate training non-linear components for each feature. This will allow us to blend certain features together in a unique way. We will also face the problem of overfitting and a simple solution known as regularization.

Monday, February 29, 2016

Get Khaledifier today!

In case you haven't had enough of DJKhaled quite yet, we've made a Chrome extension to seamlessly infuse him into your life. This creation won second place by popular vote at McGill's annual McHacks Hackathon.

Monday, September 1, 2014

GLSL - Passing uninterpolated varyings

Vertex and Fragment Shaders in GLSL

WebGL has a list of steps for rendering an image, known as a pipeline. Here's a quick summary.

  1. The vertices alone are processed through the vertex shader. Among other things, the primary task here is to convert from world space in three dimensions to screen space in two dimensions. In other words, we're projecting the coordinates from our 3d world onto our monitor.

  2. The faces are added in and rasterized. That is, we take the specific positions of vertices and their geometry, and from that generate a bitmap that maps onto the pixels of the screen.

  3. For each pixel, or fragment, we run the fragment shader to determine the final color. This is quite a large task; fragment shaders typically rely on texture data and interpolated values from the vertex shader, such as UV coordinates, normals, and lighting vectors.

Barycentric Interpolation

The reason for interpolation is that meshes cannot keep information at every possible point; they can only store data at their vertices. There are times where, in the fragment shader, you need to access things like lighting vectors and texture coordinates. But remember that the fragment shader runs for every pixel on the screen, which does not necessarily align with how the information is stored per vertex!

How do we solve the problem, you ask? On a triangle, it's possible to have a three-coordinate system, where each coordinate represents the closeness of the given point to one of the vertices. These are called barycentric coordinates. They are normalized, such that all three coordinates always add up to one. Given these coordinates, all we need to do is weight each vertex value by its coordinate and sum the results. Here we get the equation of barycentric interpolation:

$$P=uA+vB+wC$$

And here is a visualization of the process in action, with each vertex of the triangle set to a new primary color.
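As a rough sketch of what the rasterizer does for us at every pixel (plain JavaScript of my own, with vertex values as arrays such as positions or RGB colors):

// P = uA + vB + wC, where u + v + w = 1 are the barycentric coordinates
function barycentricInterpolate(u, v, w, A, B, C) {
    var P = [];
    for (var i = 0; i < A.length; i++) {
        P.push(u * A[i] + v * B[i] + w * C[i]);
    }
    return P;
}

// Dead center of the triangle above: equal weight to each primary-colored vertex
barycentricInterpolate(1/3, 1/3, 1/3, [1, 0, 0], [0, 1, 0], [0, 0, 1]);
// -> [0.333..., 0.333..., 0.333...], an even grey blend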

Interpolation in GLSL

When we write shaders, there are variables that undergo this process automatically. These sorts of values, which are passed from the vertex shader to the fragment shader, are called varyings.

Varyings can be any continuous datatype (float, vec2, vec3, mat4...). Discrete datatypes cannot be interpolated.

Lighting calculations, though possible in the fragment shader, are much faster if done only once per vertex. For this we can generate a vec3 for lightness from the vertex normal and send it as a varying to be applied in the fragment shader. It still works because a vec3 interpolates component-wise, exactly like a single decimal number.

The Problem

Now that we have a little background, I'll describe the topic of this post in more detail. I have a slow function that checks for collisions in a list of edges. A vertex is colored black if it lies behind one of the edges with reference to the player at the center of the screen. The effect is shown below.

Some faces are completely black, and others are unchanged. What's more, there are some faces that lie in the transition zone, where there is one vertex shaded and one vertex unshaded. You can see these in the top left.

I am looking for an effect that looks more like this:

Notice how smooth those transition faces have now become. The difference between the second image and the first image is the shader in which the edge collision checking procedure happens. The vertex shader (first image) is much faster but does not process each pixel independently. Instead it processes only the vertices, leaving us with very bad-looking interpolated values. The fragment shader (second image) looks great but is much too slow for realtime graphics. We must somehow combine the two approaches.

The solution is to (A) do collision checking in the vertex shader and (B) if the face lies in one of the transition zones, do the same on the fragment shader. It's very simple to check - a face will lie in the transition zone if the darkness is not exactly 0.0 or 1.0. We'll only have to check collisions per-pixel on the faces that actually need it.

Quite a nifty solution if I do say so myself, but there's still some more optimization that can be done. Namely, if we check for collisions in the vertex shader and know which edge we're talking about, it becomes quite redundant to cycle through every single edge again in the fragment shader. We need some way of passing an index for the edge into the fragment shader from just one or two vertices.

It's quite different from the problem varyings are meant to solve, because the value must remain constant for all points on the face.

We cannot simply send the value as a varying. Even if you could convert it into a float, the value loses all meaning once it goes through the interpolation process. We would not be able to reconstruct the original because we have no idea where on the triangle we are!

A second thought might be to set all of the vertices in the triangle to that same value so that the interpolation does not actually change anything. This idea is debunked on the basis that vertices have no means of communicating to each other in the shader program.

What about creating a unique varying for each vertex of the triangle, so that we can deduce the coordinates and somehow reverse the barycentric interpolation? This doesn't work either, because vertices are processed before the geometry is generated, and it's impossible to find the unique "role" that a given vertex serves in a given face.


The Solution

The trick to the puzzle is using not one variable, but two! Check it:

// Vertex shader

varying float myConstA;
varying float myConstB;

// Inside main(), after the usual vertex work:
myConstA = 1.0;
myConstB = myVariable;   // the per-face value we want to pass through untouched


// Fragment shader

varying float myConstA;
varying float myConstB;

// Both varyings were interpolated with the same barycentric weights,
// so dividing them recovers the original value
float myVariable = myConstB / myConstA;

Remember that the interpolation done in OpenGL is a matter of multiplying each vertex value by the barycentric coordinates of the point in question and summing. Now we've found a way to preserve information from being destroyed during that multiplication. Supply two varyings, and keep one at a fixed ratio to the other. Because both varyings are weighted by exactly the same barycentric coordinates, the ratio is preserved throughout interpolation, and we still have easy access to the value in the fragment shader.
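Here's a quick numeric check of that claim in plain JavaScript (made-up weights, not shader code):

var c = 7.0;                                  // the value we want to survive interpolation
var A = [1.0, 0.5, 2.0];                      // myConstA at the three vertices
var B = [c * A[0], c * A[1], c * A[2]];       // myConstB kept at the same ratio c

var w = [0.2, 0.5, 0.3];                      // barycentric weights of some pixel

var interpA = w[0] * A[0] + w[1] * A[1] + w[2] * A[2];   // 1.05
var interpB = w[0] * B[0] + w[1] * B[1] + w[2] * B[2];   // 7.35

console.log(interpB / interpA);               // 7 - the ratio is untouched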

Cool stuff!

Thursday, August 28, 2014

Game Project Part 3: Implementing View Occlusion

Commit 90e349e5ca764a71a7dea7cd771d0c2c3a7da11e

Here I will try to tackle a problem that I introduced in the last post - implementing a 2d visibility algorithm. Yes, this is a 3d game, but I'd like an effect similar to the one presented there. My idea is to create an underlying 2d linkage (that is, a faceless mesh composed of only vertices and edges) to represent the "walls" through which no light can pass. From there, we will run a fairly simple line segment collision algorithm on the shader for each edge, then darken any pixels that are in the "shadow".

Creating and Importing the View Occluder

I began by creating the linkage in Blender, which I'll be referring to as the view occluder.

the view occluder in Blender

Very simple, as you can see - about fifty edges.

Exporting it into the type of JSON file used by THREE.js involved some work. By default, the provided THREE.js exporter script only handles vertices and faces. We will have to extend it to support edge connections for the linkage.

Thankfully, I already did this some time ago on my RTS project. I made a small script while I was working on adding pathfinding capabilities. The Python script runs inside Blender's convenient built-in text editor, extracting mesh information from the Blender API. Here is the script:

# Run this from within Blender to export edge data to the desired text file.
# Replace the object name and the output path below to suit your own scene.
import bpy

object = bpy.data.objects['ViewOcclusion'].data

# Generate an empty sublist for every vertex
out = [[] for i in object.vertices]

for edge in object.edges:

    # Doubly linked edge array

    #out[edge.vertices[0]].append(edge.vertices[1])
    #out[edge.vertices[1]].append(edge.vertices[0])


    # Singly linked edge array
    if edge.vertices[0] > edge.vertices[1]:
        out[edge.vertices[1]].append(edge.vertices[0])
    else:
        out[edge.vertices[0]].append(edge.vertices[1])


save = open('/path/to/file.txt', 'w')
save.write(str(out))
save.close()

Inside of file.txt, you'll find an array full of vertex indices that represent edge connections in the mesh. It looks like this:

[[3], [4, 2], [1, 3], [2, 0], [5, 1], [4, 6], [5, 7], [6, 8], [7], [10] ... ]

For more information, check out mapspec.mdown, where I wrote a little bit about the data structure that is used. That link will take you to the file as it is at the time of writing.

I manually added this array into the JSON file as generated by the THREE exporter. THREE.js will only parse its default parameters when a mesh is loaded, but with a little bit of ingenuity, we can overwrite the parse function to add our extra edge data onto the geometry object.

// Create a wrapper for the parsing function, which is automatically called by 
// JSONLoader
var oldParse = THREE.JSONLoader.prototype.parse;
THREE.JSONLoader.prototype.parse = function(json, texturePath) {
    var obj = oldParse(json, texturePath),
        geo = obj.geometry;

    geo.viewOcclusion = oldParse(json.viewOcclusion, texturePath).geometry;
    geo.viewOcclusion.edges = json.viewOcclusion.edges;

    geo.viewOcclusion.edgePairs = parseEdges(json.viewOcclusion.edges);

    return obj;
};

// viewOcclusion.edges are the edges exactly as defined in map.js

// Generates viewOcclusion.edgePairs, which collapse into an array of ints
// [Edge1A, Edge1B, Edge2A, Edge2B, ... ] that can be fed into a shader
function parseEdges(edges) {
    var edgePairs = [];

    for(var i = 0; i < edges.length; i++) {
        for(var j = 0; j < edges[i].length; j++) {
            edgePairs.push(i, edges[i][j]);
        }
    }

    return edgePairs;
}

It's a crude workaround, but it does what it should and is actually a pretty elegant solution! I am embedding the view occluder within the regular JSON file. That's why oldParse is being called twice - once for the regular mesh and once for the view occluder. The only thing I actually need it for in the view occluder is to convert the flat array representation of vertices into an array of THREE.Vector3s. We can skip most of the other dirty work because our mesh has no faces! :)

The second function above is self-explanatory; our nested array structure for the edges, while useful in JavaScript for future applications, cannot be fed into a shader. This function collapses it into a flat integer array of edge-forming vertex pairs.
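For instance, feeding the first few entries of the array above through parseEdges gives (worked out by hand, just to show the shape of the output):

parseEdges([[3], [4, 2], [1, 3]]);
// -> [0, 3,  1, 4,  1, 2,  2, 1,  2, 3]
// i.e. the vertex pairs (0,3), (1,4), (1,2), (2,1), (2,3), read two at a time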

With some extra effort, the view occluder can be visualized with help from the convenient THREE.Line class.

visualizing the view occluder

I had to mess around a little bit with the best way to do it. The lines are defined as a THREE.Geometry object, where the connections are in between all of the vertices. A problem arises because my data structure does not define "chains" of vertices in this way. Apparently you can create disconnected lines with a technique called indexing, but I found no problems with creating each edge as its own individual line and adding all of them as children to one big object! Don't fix what isn't broken, right?

Feeding View Occluder Geometry into a Shader

This proved to be quite a lot more challenging than I anticipated. First, let's try the obvious and add the uniforms to the material.

var vo = engine.map.viewOcclusion;

var uniforms = THREE.UniformsUtils.merge([
    THREE.UniformsLib.common,
    THREE.UniformsLib.lights,
    {
        'uPlayerPosition': {
            type: 'v3',
            value: engine.player.position
        },

        'uVOVerts': {
            type: 'v3v',
            value: vo.vertices
        },

        'uVOEdges': {
            type: 'iv1',
            value: vo.edgePairs
        }
    }
]);

And the accompanying declarations within the fragment shader (all of my experiments here are in the fragment shader for simplicity's sake):

uniform vec3 uPlayerPosition;

uniform vec3 uVOVerts[];
uniform int uVOEdges[];

varying vec3 vLightFront;
varying vec4 vWorldPosition;

But we get an error: ERROR: 0:43: 'uVOVerts' : unsized array declarations not supported. Uh oh. But it's okay! With some more playing around, it turns out that THREE.ShaderMaterial supports adding custom DEFINE statements in the shader prefix. It's a very simple solution.

// JS
var defines = {
    'uVOVertsLength': vo.vertices.length,
    'uVOEdgesLength': vo.edgePairs.length
};

// GLSL
uniform vec3 uVOVerts[ uVOVertsLength ];
uniform int uVOEdges[ uVOEdgesLength ];

And it compiles without error!

Calculating Shadows

Now we have to loop through each and every edge, calculating if the current pixel is within a shadowed area or not. If the line segment formed between the player position and the current pixel intersects the current edge, then the pixel must be shaded black.

Here are the prerequisite functions defined in GLSL:

// Hope to Jesus that no lines are perfectly parallel or have vertices 
// exactly on the origin
vec2 intersectPoint(vec2 a, vec2 b, vec2 c, vec2 d) {
    float slope1, slope2, c1, c2;

    // Calculate slopes of both lines
    slope1 = (b.y - a.y) / (b.x - a.x);
    slope2 = (d.y - c.y) / (d.x - c.x);

    // Calculate y-intercepts of both lines
    c1 = a.y - (a.x * slope1);
    c2 = c.y - (c.x * slope2);

    // (m1x + y1 = m2x + y2) boils down to (x = (y2 - y1) / (m1 - m2))
    float ix = (c2 - c1) / (slope1 - slope2),
          iy = (ix * slope1) + c1;

    return vec2(ix, iy);
}

bool onLine(vec2 point, vec2 a, vec2 b) {

    // Make sure x is in the domain of both lines
    // Checking just x should be enough
    return (point.x < max(a.x, b.x)) && (point.x > min(a.x, b.x));
}

bool intersect(vec2 a, vec2 b, vec2 c, vec2 d) {
    vec2 i = intersectPoint(a, b, c, d);

    return onLine(i, a, b) && onLine(i, c, d);
}

bool withinRadius(vec2 a, vec2 b, float radius) {
    // deltaX^2 + deltaY^2 < r^2 
    return ((b.x - a.x) * (b.x - a.x)) + ((b.y - a.y) * (b.y - a.y)) < (radius * radius);
}

Looping through each edge SHOULD be an easy thing to do but, surprise surprise, GLSL doesn't like it.

vec3 vertA, vertB;
for(int i = 0; i < uVOEdgesLength; i += 2) {

    vertA = uVOVerts[uVOEdges[i]];
    vertB = uVOVerts[uVOEdges[i + 1]];

    if(intersect(vertA.xz, vertB.xz, uPlayerPosition.xz, vWorldPosition.xz)) {
        gl_FragColor *= 0.0;
    }
}

ERROR: 0:255: '[]' : Index expression must be constant. John Smith writes about this issue in Hassles with array access in WebGL. One of his clever workarounds is to, get this, use a for loop to go through every value until we reach the one we want, then use the index variable instead of the original variable! This sounds confusing, so let me explain.

Above, for both vertA and vertB, we have two indexing operations. The first is uVOEdges[i], which compiles happily because WebGL counts loop index variables as constants. The second is when we take that value and use it as an index into uVOVerts, forming the overall expression uVOVerts[uVOEdges[i]]. WebGL doesn't like you using non-constants for indices, hence the error. Smith's workaround uses the index of a second for loop for accessing the second array. Thus the following code will compile:

int edgeA, edgeB;
vec3 vertA, vertB;

for(int i = 0; i < uVOEdgesLength; i += 2) {
    edgeA = uVOEdges[i];
    edgeB = uVOEdges[i + 1];

    for(int j = 0; j < 200; j++) {  // 200 is just a hard-coded upper bound on the vertex count
        if(j == edgeA) {
            vertA = uVOVerts[j];
        }

        if(j == edgeB) {
            vertB = uVOVerts[j];
        }
    }

    //Disable for slowness

    if(intersect(vertA.xz, vertB.xz, uPlayerPosition.xz, vWorldPosition.xz)) {
        gl_FragColor *= 0.0;
    }
}

It's bizarre to be sure, but it works, and it's awesome! I love those moments when your program finally works like it should. Here's a screenshot and a couple bonus images of the debugging process.

view occluder demonstration

There is one problem, though - the performance is horrendous! I can get a solid 20fps when I resize my Chrome window to the smallest possible size. When fullscreened, I get one frame every 5-10 seconds or so. It's no surprise; this system is a combination of impractical workarounds. As the old engineering saying goes, "First make it work, then make it work better!" In the next post, I will document my attempts to improve performance and optimize the system to the point of playability.

Monday, August 25, 2014

Game Project Part 2: Fear of the Dark

Commit 74fec67d1bdee06f65d9f43761020f5956b5f1a3

Development soundtrack of the day:

Yes, we're dealing with darkness, ladies and gentlemen. The effect I plan on implementing into the game can be found, beautifully explained, at redblobgames. Unfortunately, working in 3d as opposed to 2d complicates things, but it's still completely doable - especially considering that we have the power of shaders at our disposal.

Shaders are fantastic because they can run hundreds of calculations concurrently on the GPU. This comes in handy in realms like 3d graphics, where there is a procedure running for each individual pixel on the screen. There are over two million pixels on a standard 1080p monitor. If we wrote a quick program to compute the color of each one, and then ran the calculations the traditional way, one after another on the CPU, it would be extremely difficult to achieve realtime performance (especially if you're using JavaScript!).

In fact, the calculations would have to run one hundred twenty-four million times every second to get up to an ideal 60fps, and that's not even accounting for the considerable complexity of these calculations. In 3d graphics, computing the final colour of a pixel is no easy task. You have to account for every light in the scene, throw in all sorts of textures/shadow maps/normal maps and, if that wasn't enough, do matrix math to project everything from three dimensions onto the screen! The fragment (per-pixel) shader found in THREE.MeshLambertMaterial totals almost five hundred lines of code! That's a lot to ask for, or at least, running it over a hundred million times per second on a CPU is a lot to ask for.
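That figure is nothing exotic - it's just the 1080p pixel count multiplied by the refresh rate:

1920 * 1080 * 60    // = 124,416,000 fragment computations per second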

I am exaggerating some of my points above. Anyone who has ever watched a YouTube video on programming will tell you that counting lines of code as a measure of complexity is stupid to start with. Still, you can see what I'm getting at.

graphics card

GPUs are different from CPUs in that they can run a large number of procedures concurrently. A GPU can process dozens, possibly hundreds, of pixels at a time. But it's not all smooth sailing; this approach has limitations. The cores are forced to operate independently from one another - they cannot share memory. That's the reason why graphics cards aren't called general-purpose multicore processing cards; there are certain tasks that simply cannot be done on them. Tasks that can, like 3d rendering, see significant performance gains. In our case, we get to write the graphics processing code in GLSL - a fast, fairly low-level language based loosely on C.

If you're looking for resources about shaders in THREE.js, check out the tutorials, part 1 & part 2, on Aerotwist by Paul Lewis.

There are quite a lot of good resources on the web for tinkering around with shaders in WebGL. With the exception of a few introductory articles like the ones above, the same is not true for shaders in THREE.js. I had to spend a number of hours digging through the source to try and make sense of things. Here are some of my discoveries.

Hacking Shaders in THREE.js

All of the precompiled shaders are stored in THREE.ShaderLib.

Open up your browser console and explore. Here are the first few lines from the lambert vertex shader:

#define LAMBERT
varying vec3 vLightFront;
#ifdef DOUBLE_SIDED
    varying vec3 vLightBack;
#endif
#if defined( USE_MAP ) || defined( USE_BUMPMAP ) || defined( USE_NORMALMAP ) || defined( USE_SPECULARMAP ) || defined( USE_ALPHAMAP )

    varying vec2 vUv;
    uniform vec4 offsetRepeat;

#endif

#ifdef USE_LIGHTMAP

    varying vec2 vUv2;

#endif
#if defined( USE_ENVMAP ) && ! defined( USE_BUMPMAP ) && ! defined( USE_NORMALMAP )

    varying vec3 vReflect;

    uniform float refractionRatio;
    uniform bool useRefract;

#endif

uniform vec3 ambient;
uniform vec3 diffuse;
uniform vec3 emissive;

uniform vec3 ambientLightColor;

#if MAX_DIR_LIGHTS > 0

    uniform vec3 directionalLightColor[ MAX_DIR_LIGHTS ];
    uniform vec3 directionalLightDirection[ MAX_DIR_LIGHTS ];

#endif

THREE.js shaders are conveniently generated out of chunks

The most surprising thing for me about viewing the built-in shaders was seeing all of the cryptic #if and #endifs. These delimit chunks of smaller shader components that have been combined together to form the final shader. They can be turned on and off with the #if statements; each one checks whether that definition exists in the program. (The definitions above are DOUBLE_SIDED, USE_MAP, et cetera.) If it does, then the subsequent block of code is enabled and executes happily.

This doesn't clear up much - it moves the problem from demystifying the #ifs to demystifying the definitions! How, then, you ask, are the definitions controlled? They are added in a special prefix chunk that comes at the beginning of the shader:

prefix_vertex = [

    "precision " + parameters.precision + " float;",
    "precision " + parameters.precision + " int;",

    customDefines,

    parameters.supportsVertexTextures ? "#define VERTEX_TEXTURES" : "",

    _this.gammaInput ? "#define GAMMA_INPUT" : "",
    _this.gammaOutput ? "#define GAMMA_OUTPUT" : "",

    "#define MAX_DIR_LIGHTS " + parameters.maxDirLights,
    "#define MAX_POINT_LIGHTS " + parameters.maxPointLights,
    "#define MAX_SPOT_LIGHTS " + parameters.maxSpotLights,
    "#define MAX_HEMI_LIGHTS " + parameters.maxHemiLights,

    "#define MAX_SHADOWS " + parameters.maxShadows,

    "#define MAX_BONES " + parameters.maxBones,

    parameters.map ? "#define USE_MAP" : "",
    parameters.envMap ? "#define USE_ENVMAP" : "",
    parameters.lightMap ? "#define USE_LIGHTMAP" : "",
    parameters.bumpMap ? "#define USE_BUMPMAP" : "",
    parameters.normalMap ? "#define USE_NORMALMAP" : "",
    parameters.specularMap ? "#define USE_SPECULARMAP" : "",
    parameters.alphaMap ? "#define USE_ALPHAMAP" : "",
    parameters.vertexColors ? "#define USE_COLOR" : "",

    ...

]

And here is revealed the link between the #ifs within the shaders and the JavaScript objects that control them. parameters.map adds a #define USE_MAP line to the prefix, thus enabling the #if defined( USE_MAP ) blocks lower down. Cool stuff!

We can now explore the source some more to find where THREE sets the parameters, so that we can control this functionality for ourselves. It is found in the initMaterial method of the massive THREE.WebGLRenderer object.

parameters = {

    precision: _precision,
    supportsVertexTextures: _supportsVertexTextures,

    map: !! material.map,
    envMap: !! material.envMap,
    lightMap: !! material.lightMap,
    bumpMap: !! material.bumpMap,
    normalMap: !! material.normalMap,
    specularMap: !! material.specularMap,
    alphaMap: !! material.alphaMap,

    vertexColors: material.vertexColors,

    fog: fog,
    useFog: material.fog,
    fogExp: fog instanceof THREE.FogExp2,

    sizeAttenuation: material.sizeAttenuation,
    logarithmicDepthBuffer: _logarithmicDepthBuffer,

    skinning: material.skinning,
    maxBones: maxBones,
    useVertexTexture: _supportsBoneTextures && object && object.skeleton && object.skeleton.useVertexTexture,

    morphTargets: material.morphTargets,
    morphNormals: material.morphNormals,
    maxMorphTargets: this.maxMorphTargets,
    maxMorphNormals: this.maxMorphNormals,

    maxDirLights: maxLightCount.directional,
    maxPointLights: maxLightCount.point,
    maxSpotLights: maxLightCount.spot,
    maxHemiLights: maxLightCount.hemi,

    maxShadows: maxShadows,
    shadowMapEnabled: this.shadowMapEnabled && object.receiveShadow && maxShadows > 0,
    shadowMapType: this.shadowMapType,
    shadowMapDebug: this.shadowMapDebug,
    shadowMapCascade: this.shadowMapCascade,

    alphaTest: material.alphaTest,
    metal: material.metal,
    wrapAround: material.wrapAround,
    doubleSided: material.side === THREE.DoubleSide,
    flipSided: material.side === THREE.BackSide

};

Nice! It is worth noting here the peculiar JavaScript construct that I like to call the "double bang-pang balooza", or !!. A single exclamation mark acts as a logical NOT; it inverts true to false and false to true. If the value you give it is not already a boolean, it will do an implicit conversion. These sorts of conversions are infamous in JavaScript for making debugging very difficult. I'm sure we have all made a typo when accessing a property of an object, causing the expression to return undefined, and only later in the program, when we see ERROR: undefined is not a function, do we realize the implications of our little typo. Or, worse yet, is when you get NaN in an expression that silently spreads throughout a system without raising any errors or warnings, leaving us baffled when a finished component starts acting buggy. Anyways, back to the !!. The interpreter will follow these three steps when evaluating the expression:

  1. Convert the object to a boolean
  2. NOT the result
  3. NOT the result

What we get is a neat, shorthand way of getting the boolean representation of a value, determined by whether it is "falsey" or "truthy". Saying map: !!material.map is analogous to "set map to true if material.map is defined".
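A few quick examples of the conversion in action (values chosen purely for illustration):

!!undefined          // false
!!null               // false
!!0                  // false
!!""                 // false
!!{}                 // true
!!'texture.png'      // true - any non-empty string is "truthy"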

In order to enable, say, USE_MAP in a custom shader that builds off of the pre-built THREE.js ones, you have to both add it as a uniform (a process that is usually automated, but pretty trivial to do yourself nonetheless) and set it as obj.map on the material, otherwise the code block won't be activated.

Uniforms, too, are built out of chunks!

If you try to run some code using all we know now, you'll still get an error: "Uncaught TypeError: Cannot set property 'value' of undefined". The value in question here belongs to the uniforms that you pass into your THREE.ShaderMaterial, or, rather, the uniforms that you haven't. Obviously, the precompiled shaders in THREE need a lot of information about our scene to function properly; they get all of this information through the uniforms. Luckily for us, there is another module - THREE.UniformsLib - that provides us with everything to get up and running. Like the shaders, it is divided into small chunks that can be combined together.

I ended up with the following code that emulates THREE.MeshBasicMaterial functionality, using the default precompiled shaders. The difference, now, is that we are free to edit them to our liking!

engine.materials.basic = function(config) {

    var uniforms = THREE.UniformsUtils.merge([
        THREE.UniformsLib.common,
        // others go here
    ]);

    uniforms.map.value = config.map;
    var mat = new THREE.ShaderMaterial({
        vertexShader: engine.shaders['basic.vert'],
        fragmentShader: engine.shaders['basic.frag'],
        uniforms: uniforms
    });

    mat.map = config.map;

    return mat;
}

'basic.vert' and 'basic.frag' are saved from THREE.ShaderLib.basic.vertexShader and THREE.ShaderLib.basic.fragmentShader, respectively.

Modifying THREE.MeshBasicMaterial

I said above that my first step was to duplicate THREE.MeshBasicMaterial using shaders that I could customize. Check! The next step was to add lighting functionality. This is actually pretty trivial on the JavaScript side. I edited the above code to make this:

engine.materials.darkness = function(config) {

    var uniforms = THREE.UniformsUtils.merge([
        THREE.UniformsLib.common,
        THREE.UniformsLib.lights
    ]);

    uniforms.map.value = config.map;
    var mat = new THREE.ShaderMaterial({
        lights: true,
        vertexShader: engine.shaders['darkness.vert'],
        fragmentShader: engine.shaders['darkness.frag'],
        uniforms: uniforms
    });

    mat.map = config.map;

    return mat;
}

For the shaders it gets a little bit more complicated, but not by a lot. It turns out that THREE does all of the lighting computations in the vertex shader, and then the vec3 RGB result is sent as a varying under the name vLightFront (or vLightBack if you're into that kind of stuff). Since THREE.MeshBasicMaterial is completely shadeless by default, I ended up copying the Lambert vertex shader, which has a nice little procedure for calculating vLightFront/Back out of all the lighting uniforms that are passed in automatically by THREE.

In the fragment shader, it was just a matter of acknowledging the lighting varyings, then strategically inserting the following line:

gl_FragColor.xyz *= vLightFront;

That's it! And we can refresh our browser to be greeted with some beautiful, new lighting effects.

Fear of the Dark

Now to make things spooky! Again, this is actually a simple process now that everything's set up. A few simple operations on vLightFront transform this lame shader into something that actually starts to look very usable!

vec3 not_vLightFront = vLightFront - 0.9;     // darken everything but the brightest areas
not_vLightFront = not_vLightFront * 3.0;      // exaggerate the remaining contrast
not_vLightFront = min(not_vLightFront, 1.0);  // clamp so nothing blows out past white

gl_FragColor.xyz *= not_vLightFront;

I had to create a temporary variable not_vLightFront because varyings are immutable, in case you're curious why that is.

This system does not actually implement what I opened with. It just exaggerates lighting based on the face normals. We'll look into doing that in the next post!

Sunday, August 24, 2014

Game Project Part 1: Loading a Simple Map

Commit 63a71e1f1562d547fd45d0e91a16fc236707962f

I have decided to build this project off of my old codebase. It contains quite a few useful implementations, such as one for THREE.js octrees, and several pathfinding algorithms. You can view the Github repo at github.com/MontyThibault/JSGame. Since I assume the majority of readers will be reading this at a much later date, I'll be putting the git commit hash at the time of writing at the top of each post - that way, if you want, you can roll the repository back to exactly what it looks like for me right now!

Creating the Assets

I first wanted to emulate the type of terrain that was in my old RTS engine. This consisted of a few valleys and mountains to indicate passable and impassable regions. I'll be able to use those in the future to experiment with pathfinding capabilities for my game.

My background in 3d graphics comes in very useful for this kind of thing. The process goes roughly as follows:

  1. sculpt a high-resolution mesh
  2. run a decimate modifier to reduce polycount
  3. create special linkages for the movement system
  4. set up materials/lighting
  5. bake out the colourmap
  6. export mesh using the Blender THREE.js exporter

Here's the result of playing around in Blender for about half an hour:

Blender terrain render

A fantastic feature of the new Cycles render engine is the ability to bake out textures based on properties of the object/3d scene - e.g. shadow maps, ambient occlusion maps, normal maps, et cetera. Here is a texture for all the color values in the final render, mapped to the terrain's UV coordinates.

Terrain colourmap

I am going to use this as the colourmap for my basic terrain. It includes shadows and the like already baked into it, so it will fake the effect of having lights/more complex shaders in a simple flat-shading system like what I have right now. In the future, I will replace this with a complete system of real lights/shaders in WebGL itself, and then it'll begin to look really good. Until then, this colourmap should provide us with a way to get decently aesthetic results.

Importing into THREE.js

After exporting the model, including the UVs and normals, to a JSON object format thanks to the THREE.js Blender exporter, it's time to load the model into the browser. This process is made simple thanks to the built-in utilities provided by THREE.js. This is the code for processing everything:

function load(callback) {
    var loader = new THREE.JSONLoader();
    $path.text('assets/samplemap/map.js');
    loader.load('assets/samplemap/map.js', function (geometry) {

        $path.text('assets/samplemap/Colormap.png');
        THREE.ImageUtils.loadTexture('assets/samplemap/Colormap.png', 
            THREE.UVMapping, function(texture) {

            texture.magFilter = THREE.NearestFilter;
            texture.minFilter = THREE.NearestFilter;
            texture.anisotropy = 16;

            exports.material = new THREE.MeshBasicMaterial({
                map: texture
            });

            exports.mesh = new THREE.Mesh(geometry, exports.material);
            exports.mesh.scale.set(1, 1, -1);

            callback(exports.mesh);
        });
    });
}

As you can see, it's really not anything too tricky. THREE.js handles all the hard stuff for us. You might notice that I had to invert the Z axis on the model for it to display correctly. It could be a bug with the exporter not converting between coordinate systems, though I'd rather not have to hunt down the reason.

It took a few hours of debugging to update my old code to work with the new system. A combination of the inverted-Z-axis problem, some design issues in the camera module, and the updated THREE.js API caused the program to render a blank screen without giving any error messages. Good ol' JavaScript kept putting NaNs into the calculations without any warnings. Maybe someday I'll create a more robust system to debug scenegraph objects, like you might find in Unity, though that could be considered a new project in itself!

Anyways, the final system is just as beautiful as you'd think it'd be. You can access it online at montythibault.github.com/JSGame/ (this is probably a newer version if you're reading in the future. You'd have to clone the Git repo and go back to the commit with the hash I provided if you're dying to see it, which I just know you are!)

Here's a screenshot:

Screenshot of finished terrain system

Saturday, August 23, 2014