Problem with Python Lambdas

Python is a pretty nice language in many ways. It allows very fast prototyping and experimenting with algorithms. However, its dynamic nature and some of its language design decisions can cause some bizarre bugs. I ran into an interesting example of this recently, relating to lambdas, variable capture, and variable scoping. At first glance, the snippets below should all produce the same output.
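
As a minimal illustration of the kind of behaviour involved (this is not the exact snippet from the post), consider lambdas created inside a loop:

```python
# Late binding: each lambda captures the variable i itself, not its value at
# definition time, so after the loop every lambda sees i == 2.
adders = [lambda x: x + i for i in range(3)]
print([f(10) for f in adders])   # prints [12, 12, 12], not [10, 11, 12]

# A common workaround: bind the current value through a default argument.
adders = [lambda x, i=i: x + i for i in range(3)]
print([f(10) for f in adders])   # prints [10, 11, 12]
```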

Read More

Simulated Hovercraft Control using an RNN

Using Neural Networks for agent control and game playing is a very interesting area. In previous projects I’ve used the DQN algorithm to play connect-four, and later a Recurrent Neural Network (RNN) to control an inverted pendulum balancing agent. I think using RNNs to control agents is a very interesting area of research, since the network can implicitly learn its own suitable state representation and update method for Partially Observable Markov Decision Processes (POMDPs). I had some success training an RNN for a fairly straightforward task (the inverted pendulum), so now I wanted to try something more complex. In this case I chose the task of controlling a “hovercraft” (a nonholonomic system) on a 2D track. I wanted to implement the decision process using an RNN, taking as input only the raw instantaneous simulated sensor data. To make the problem more complex/interesting, the sensor data is in the form of a 1D “stereo” camera on the front of the simulated hovercraft. This means that the RNN has to first correlate the two 1D images to extract depth information, track this state from frame to frame to determine its current velocity and other necessary derived state data, and also learn the correct control policy. While at first glance this toy task looks fairly similar to the numerous other examples that can be found online, it is in fact much harder. The existing examples I’ve seen all seem to use a “sonar” type simulated sensor, and make variables like the velocity of the agent directly available to the policy function. This initial description of the problem isn’t super detailed (more detail below), so I made a quick video of this environment (in this case I am manually controlling the hovercraft):

Read More

Convolutional Neural Nets

This is a very quick and dirty overview of Convolutional Neural Networks. I have not spent too much time working with these, since my primary interest is currently in Recurrent Networks. But this quick writeup will serve for now for the sake of completeness…

Read More

LSTM Networks

In a previous project I experimented with Recurrent Neural Nets (RNNs), their implementation, and their application to character-level language modelling. In that instance I used a simple RNN architecture, more or less a standard feed-forward network with an optional recurrent connection between layers. This type of network can learn short-term correlations in the input stream, but struggles to learn long-term correlations. To address this problem, a special type of RNN called Long Short-Term Memory (LSTM) was proposed in 1997. This network architecture has more in-built structure compared to a series of fully-connected layers. I view LSTMs as modelling a “memory latch”. An LSTM cell has one data input, one output, and three “control” inputs. The controls can be labelled as input, forget, and read. These affect how the LSTM cell treats the incoming data, the current internal state, and the output of the cell. There are several variations of the LSTM cell architecture; a common one is shown below:
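
For a rough feel of the computation (the weight layout and gate names below are my own shorthand, not necessarily the exact variant diagrammed in the post), a single LSTM step can be sketched as:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W (4n x input_dim), U (4n x n) and b (4n) hold the stacked weights for the
    # input, forget and output ("read") gates plus the candidate data transform.
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*n:1*n])   # input gate: how much new data to write
    f = sigmoid(z[1*n:2*n])   # forget gate: how much of the old state to keep
    o = sigmoid(z[2*n:3*n])   # output/"read" gate: how much state to expose
    g = np.tanh(z[3*n:4*n])   # candidate data for the cell
    c = f * c_prev + i * g    # new internal cell state
    h = o * np.tanh(c)        # cell output, fed to the next time step
    return h, c
```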

Read More

Reinforcement Learning using a Recurrent Neural Network

Having done some work with Recurrent Neural Networks and implemented several variations, it was time to apply them to something more interesting than character-level language modelling. To me one of the most interesting applications of RNNs is agent control in a Partially Observable Markov Decision Process (POMDP). First, what is a POMDP? A Markov Decision Process (MDP) is a stochastic control process, where you are in some state s and you can perform some action a. This action results in a transition into a new state s’ and a reward r. Both of these depend only on the tuple (s, a), rather than on the history of states and actions (this is known as the Markov Property). A POMDP is a generalisation of MDPs in that it introduces the concept of observations and the possibility that the current state s can only be deduced from observations, rather than being directly observable. In this case the agent must maintain some internal probability distribution over the states s, and integrate the observations at every time step to update this distribution. The ultimate task of an agent in an MDP or POMDP is to choose actions such that the long-term reward is maximised.
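
For a small discrete POMDP, the belief update the agent has to perform each step can be sketched like this (the T and O arrays and their layout are my own notation, not something from the post):

```python
import numpy as np

def update_belief(b, action, obs, T, O):
    """One Bayesian belief update for a discrete POMDP.

    b: current belief, b[s] = P(current state is s)
    T: transition model, T[action][s, s2] = P(s2 | s, action)
    O: observation model, O[action][s2, obs] = P(obs | s2, action)
    """
    predicted = T[action].T @ b            # predicted distribution over next states
    b_new = O[action][:, obs] * predicted  # weight by likelihood of the observation
    return b_new / b_new.sum()             # renormalise to a probability distribution
```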

Read More

CUDA - RNN Implementation

After implementing a simple Recurrent Neural Network (RNN) for character-level language modelling, the natural next step is to port this implementation to CUDA. RNNs typically take a long time to train, so using the computational power of GPUs is definitely a good idea. I have already developed a CUDA based Feed-Forward Neural Network (FFNN) implementation (as well as several variants), so I could re-use a lot of those foundations. One of the interesting aspects of an RNN is that it offers the opportunity to meaningfully exploit stream-level parallelism in CUDA. For FFNNs there isn’t much to be gained from using parallel streams, as the back-propagation algorithm is mostly sequentially dependent (there is two-way parallelism available in the backprop stage, but it’s not much). This available parallelism meant I had to structure my code a bit better and provide more abstraction than in my FFNN CUDA implementations. In this “blog” entry I’ll describe the details of this implementation and several performance-related tidbits.

Read More

Monte-Carlo Tree Search

I have used Monte-Carlo Tree Search (MCTS) in several projects up to this point, so I thought I should probably write a short entry on this class of algorithms. First, what are Monte-Carlo algorithms in general? These are algorithms that randomly (according to some distribution) sample some system and use the observed outcomes to infer something about the system. This is a very broad definition, but then MC algorithms are very broad as well. I think the simplest example of a Monte-Carlo algorithm is one to approximate the value of Pi. This can be done by generating random (x, y) points, each coordinate uniformly distributed in the range -1.0 .. 1.0. The fraction of these points that satisfies the condition x^2 + y^2 < 1 approaches Pi/4, so scaling it by 4 gives an estimate of Pi. Now this is by no means a quickly converging algorithm or an efficient way of approximating Pi, but I think it is an effective example of a Monte-Carlo algorithm.
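
A minimal sketch of that estimator (not code from the post itself) looks something like this:

```python
import random

def estimate_pi(num_samples=1_000_000):
    inside = 0
    for _ in range(num_samples):
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:      # point landed inside the unit circle
            inside += 1
    # area(circle) / area(square) = pi / 4, so scale the observed fraction by 4
    return 4.0 * inside / num_samples

print(estimate_pi())   # converges (slowly) towards 3.14159...
```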

Read More

Recurrent Neural Networks

Up to this point I had only looked at stateless neural networks, such as feed-forward nets. This means that the network treats every input vector independently of every other. This is quite limiting when dealing with certain tasks where the input is not naturally a fixed-length vector, but is a stream of correlated data (e.g. sentiment analysis based on the words of a document). Recurrent Neural Networks are a class of network that process data as a stream, one element at a time, keeping some internal state or memory between time steps. This class of neural network is very interesting as it can learn a hidden state representation suited to a particular task. To me the most interesting application of RNNs would be reinforcement learning of an agent policy in a Partially Observable Markov Decision Process (POMDP), removing the need to explicitly model and track the hidden state. I’ll explore applying RNNs to this type of problem in a later project; for now I’m going to look into the basics of training an RNN.
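
The core recurrence is only a few lines; here is a sketch of a single step of a simple (vanilla) RNN, with illustrative weight names of my own choosing:

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    # Mix the current input with the previous hidden state (the network's memory).
    h = np.tanh(W_xh @ x + W_hh @ h_prev + b_h)
    y = W_hy @ h + b_y     # output for this time step
    return y, h            # h is carried forward and fed into the next step
```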

Read More

Deep Q-Networks (DQN)

After playing around with the basics of reinforcement learning, I wanted to try something more advanced. The table-based approach I used for Tic-Tac-Toe and a simple Grid World does not scale to more complex domains. The main reason is that by using a table to represent the Q-value function you cannot generalise. You have to perform numerous explicit samples of a given state-action pair to have a reasonable estimate of its Q-value. For a game like 3x3 Tic-Tac-Toe this is possible. But moving up to even a moderately more complex game like connect-four requires much more computation, as there are just too many possible state-action pairs to explicitly represent and explore in a reasonable time.
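
The core change is to replace the table with a function approximator; a rough sketch (the layer sizes here are purely illustrative, e.g. a flattened 6x7 connect-four board with 7 possible moves):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny Q-network: instead of one table entry per (state, action) pair, the
# state is pushed through a small network that outputs a Q-value for every
# action, so experience generalises to states that merely look similar.
n_inputs, n_hidden, n_actions = 42, 64, 7
W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_actions, n_hidden))
b2 = np.zeros(n_actions)

def q_values(state):
    h = np.tanh(W1 @ state + b1)
    return W2 @ h + b2     # one estimated Q-value per possible action

best_action = int(np.argmax(q_values(rng.normal(size=n_inputs))))
```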

Read More

Reinforcement Learning

Reinforcement learning is an interesting learning paradigm, distinct from supervised and unsupervised learning. Supervised learning features an external source providing the learner with the correct/desired classification/value/action for a given input, and the learner’s aim is to learn this mapping and be able to apply it at a later time. Unsupervised learning is concerned with extracting structure and information from input, without having access to a ground truth; clustering and dimensionality reduction are two examples. Reinforcement learning deals with an agent in some environment, performing actions and obtaining rewards. The agent’s goal is to carry out an action policy that maximises long-term reward. The main distinction from, say, supervised learning is that there is no data given to the agent as to what action to perform for various environment states. Instead the agent has to explore different actions to discover their effects, experimenting to discern good and bad actions. Another distinct issue is reward assignment. An agent may only rarely receive rewards from the environment, relative to the frequency of actions performed. This means that it may not be directly clear which actions led to the reward. A simple example is a chess playing agent. The environment is the chess board, and the actions are individual moves. Rewards are assigned for winning or losing a game (losing would result in negative reward), but not for intermediate board states.
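
To make the “explore, then propagate reward back through the value estimates” idea concrete, here is a sketch of one step of tabular Q-learning with an epsilon-greedy policy (the names and constants are mine, not necessarily what the post uses):

```python
import random
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] -> estimated long-term reward

def choose_action(state, actions, epsilon=0.1):
    # Occasionally explore a random action, otherwise exploit the best estimate.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    # Move the estimate for (state, action) towards the observed reward plus the
    # discounted value of the best follow-up action, so delayed rewards leak back.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```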

Read More

CUDA - Neural Network Implementation

After writing the fractal renderer to familiarise myself with CUDA, I wanted to use it to implement a fast neural network. Neural networks are almost the ideal application for GPUs, as most of the computation boils down to matrix multiplications (or similar operations) across tensors. There is little data inter-dependency, and a well-implemented neural network can run 100x faster on a GPU than on a modern CPU. The emergence of GPUs as a platform for general purpose programming is actually one of the big drivers behind the rise of “deep learning” in the last few years. Training deep models is a computationally intensive process, and modern GPUs have made this far more tractable.

Read More

CUDA - Part 0 (Fractals)

Recently I decided to learn CUDA programming (and generally about the GPU programming model). The motivation is that CPU performance is stagnating (certainly in terms of single-threaded performance), while special-purpose hardware like GPUs is becoming more and more powerful and useful. It seems that, going forward, GPUs are going to become the standard platform of computation for many interesting problems. Machine learning is an area that has seen great benefit from the power of GPUs, deep neural networks in particular. A lot of the computation involved in training neural nets boils down to matrix and vector multiplication, something GPUs are exceedingly good at.

Read More

Handwritten Digit Recognition - Part 2 (Going Deeper)

In the previous part I described my very simple first attempt at using neural nets to perform handwritten digit recognition. In this part (in no particular order) I’ll describe some of the variations I tried, improvements I discovered, and lessons learned.

Read More

Handwritten Digit Recognition - Part 1

As mentioned in the previous post, I’ve spent some time working on the problem of hand-written digit recognition. This is a pretty fun problem and also a good domain for experimenting with neural nets. Here I’ll note down some of my experiences so far, lessons learned, and plans for future improvement.

Read More

Handwritten Digit Recognition - Part 0

Recently I remembered a fun/interesting assignment we were given in first year computing at UNSW - handwritten digit recognition. The gist of the assignment was that you were given as input a grey-scale image of a handwritten digit (between 0 and 9), and your program had to output which digit it was. The input images were approximately 30x30 pixels. An extension to this was to recognise postcodes, a sequence of 4 digits, with the added problem of dealing with digit separation (it’s not always trivial to separate the pixels of individual digits from each other).

Read More

Downsides of Garbage Collection

This opinion may be considered backward or anachronistic by many “modern” programmers, but I believe that garbage collection is harmful. To be clear, I don’t just mean harmful for performance, but harmful for program design and architecture. It is a sad reality that GC’ed languages currently dominate the landscape. C and C++ are the only mainstream/mature languages that offer manual memory management, with Rust and D providing additional alternatives if maturity/widespread industry acceptance are not important criteria. I will lay out an argument against GC, approaching it from both ends. I will argue that the benefits GC is thought to provide are overstated, and that it in fact leads to problems that are often overlooked. I will also argue that modern “manual” (non-GC) memory management is not nearly as complicated, error prone, and tedious as many people may think.

Read More

Hello World

Hello, my name is Oleg and welcome to my blog! I intend for this blog to be mainly about programming and software, though a few other topics may get randomly thrown in. My plan is to mostly treat this as a sort of personal diary/stream of consciousness outlet, rather than trying to create some kind of publicly popular blog. So don’t really expect too much in terms of production values.

Read More