Icon Quaternion Weighted Average


Icon BVHView


Icon Dead Blending Node in Unreal Engine


Icon Propagating Velocities through Animation Systems


Icon Cubic Interpolation of Quaternions


Icon Dead Blending


Icon Perfect Tracking with Springs


Icon Creating Looping Animations from Motion Capture


Icon My Favourite Things


Icon Inertialization Transition Cost


Icon Scalar Velocity


Icon Tags, Ranges and Masks


Icon Fitting Code Driven Displacement


Icon atoi and Trillions of Whales


Icon SuperTrack: Motion Tracking for Physically Simulated Characters using Supervised Learning


Icon Joint Limits


Icon Code vs Data Driven Displacement


Icon Exponential Map, Angle Axis, and Angular Velocity


Icon Encoding Events for Neural Networks


Icon Visualizing Rotation Spaces


Icon Spring-It-On: The Game Developer's Spring-Roll-Call


Icon Interviewing Advice from the Other Side of the Table


Icon Saguaro


Icon Learned Motion Matching


Icon Why Can't I Reproduce Their Results?


Icon Latinendian vs Arabendian


Icon Machine Learning, Kolmogorov Complexity, and Squishy Bunnies


Icon Subspace Neural Physics: Fast Data-Driven Interactive Simulation


Icon Software for Rent


Icon Naraleian Caterpillars


Icon The Scientific Method is a Virus


Icon Local Minima, Saddle Points, and Plateaus


Icon Robust Solving of Optical Motion Capture Data by Denoising


Icon Simple Concurrency in Python


Icon The Software Thief


Icon ASCII : A Love Letter


Icon My Neural Network isn't working! What should I do?


Icon Phase-Functioned Neural Networks for Character Control


Icon 17 Line Markov Chain


Icon 14 Character Random Number Generator


Icon Simple Two Joint IK


Icon Generating Icons with Pixel Sorting


Icon Neural Network Ambient Occlusion


Icon Three Short Stories about the East Coast Main Line


Icon The New Alphabet


Icon "The Color Munifni Exists"


Icon A Deep Learning Framework For Character Motion Synthesis and Editing


Icon The Halting Problem and The Moral Arbitrator


Icon The Witness


Icon Four Seasons Crisp Omelette


Icon At the Bottom of the Elevator


Icon Tracing Functions in Python


Icon Still Things and Moving Things


Icon water.cpp


Icon Making Poetry in Piet


Icon Learning Motion Manifolds with Convolutional Autoencoders


Icon Learning an Inverse Rig Mapping for Character Animation


Icon Infinity Doesn't Exist


Icon Polyconf


Icon Raleigh


Icon The Skagerrak


Icon Printing a Stack Trace with MinGW


Icon The Border Pines


Icon You could have invented Parser Combinators


Icon Ready for the Fight


Icon Earthbound


Icon Turing Drawings


Icon Lost Child Announcement


Icon Shelter


Icon Data Science, how hard can it be?


Icon Denki Furo


Icon In Defence of the Unitype


Icon Maya Velocity Node


Icon Sandy Denny


Icon What type of Machine is the C Preprocessor?


Icon Which AI is more human?


Icon Gone Home


Icon Thoughts on Japan


Icon Can Computers Think?


Icon Counting Sheep & Infinity


Icon How Nature Builds Computers


Icon Painkillers


Icon Correct Box Sphere Intersection


Icon Avoiding Shader Conditionals


Icon Writing Portable OpenGL


Icon The Only Cable Car in Ireland


Icon Is the C Preprocessor Turing Complete?


Icon The aesthetics of code


Icon Issues with SDL on iOS and Android


Icon How I learned to stop worrying and love statistics


Icon PyMark


Icon AutoC Tools


Icon Scripting xNormal with Python


Icon Six Myths About Ray Tracing


Icon The Web Giants Will Fall


Icon PyAutoC


Icon The Pirate Song


Icon Dear Esther


Icon Unsharp Anti Aliasing


Icon The First Boy


Icon Parallel programming isn't hard, optimisation is.


Icon Skyrim


Icon Recognizing a language is solving a problem


Icon Could an animal learn to program?




Icon Pure Depth SSAO


Icon Synchronized in Python


Icon 3d Printing


Icon Real Time Graphics is Virtual Reality


Icon Painting Style Renderer


Icon A very hard problem


Icon Indie Development vs Modding


Icon Corange


Icon 3ds Max PLY Exporter


Icon A Case for the Technical Artist


Icon Enums


Icon Scorpions have won evolution


Icon Dirt and Ashes


Icon Lazy Python


Icon Subdivision Modelling


Icon The Owl


Icon Mouse Traps


Icon Updated Art Reel


Icon Tech Reel


Icon Graphics Aren't the Enemy


Icon On Being A Games Artist


Icon The Bluebird


Icon Everything2


Icon Duck Engine


Icon Boarding Preview


Icon Sailing Preview


Icon Exodus Village Flyover


Icon Art Reel




Icon One Cat Just Leads To Another

Encoding Events for Neural Networks

Created on July 8, 2021, 10 p.m.

One of the big advantages of using neural networks for generating animation in video games is that once a neural network has been trained, the animation data it was trained on can be thrown away, and the network used to generated the animation on-the-fly - potentially saving huge amounts of memory.

But when we start getting into the details, things start to get a bit more complicated. For example, there can be all kinds of meta-data that gets attached to animations in games such as tags or markups to label particular states or sections of an animation. There are also events, such as audio events, triggers, and synchronization points. We can even have things like pointers to other assets, comments, timestamps, and all sorts of other unstructured data. If we still want the benefit of throwing away all our animation data we need to encode all of this additional information in a way which is suitable for the neural network to generate at runtime.

Some of these are easier than others. If we have tags or markups associated with different ranges in the animation data we can use a one-hot encoding indicating the presence of each of these different tags at each frame, and train the network to output this additional one-hot vector along with the pose.

More tricky to encode are single frame events such as sound events, synchronization points, triggers, and other indicators. If we try to encode these as one-hot vectors as we did with the tags we have two problems: first, a problem of data imbalance as generally >99% of frames will not contain any events (and as such the neural network can achieve a fantastic accuracy by just ignoring this output altogether), and secondly, because we are only outputting a value at a single specific frame, we may find we miss or skip over an event if the frame-rate drops or the network does not make an accurate enough prediction.

So here is my little trick for encoding single frame events in a way which is far more suitable for neural networks:

The first step is to take your track of single frame events, and instead compute the signed time until the nearest event. So for example, if the nearest event is two seconds in the future you return 2, while if the nearest event was one that happened three seconds ago you return -3. Here is what this might look like visualized, where the single-frame events are shown in red and the signed time until the event is shown in blue:


The intuition behind this transformation is that if we can have the network produce something like the signed time until the event we can trigger the event at the exact moment this output crosses zero. Using this encoding we can also apprehend events if need be by simply triggering the event a little before the output crosses zero. This can be useful for events such as sound or network events where we may need to prepare things a little bit in advance. Finally, since this value is not sparse, the network will not be able to simply ignore it as it would in the single-frame encoding.

However this encoding still has problems. Firstly, this value is unbounded - meaning we will be asking the network to produce a huge number when the nearest event is a very long time away, and secondly, we have a large discontinuity when the nearest event changes and we suddenly go from a large negative value to a large positive value.

To fix the first issue we're going to clip this time to some maximum and minimum time - this is going to limit how much we can apprehend an event (or know how long it has been since an event) but it will mean the value is no longer unbounded, making it more suitable for prediction. For neatness we can also re-scale this into the range -1 to 1 by dividing by this maximum. Here I set the maximum to 30 frames.


To fix the second issue we're going to re-scale this again to the range \( -\pi \) to \( \pi \), treat it like an angle, and convert it to Cartesian coordinates using sin and cos:


If you look closely you will see this has removed all discontinuities and encoded both the minimum time and maximum time as the same value. However, the gradient of this signal can still suddenly change, so I like to apply a tiny bit of Gaussian smoothing to make sure the resulting signal is nice and smooth:


Now we have something that is non-sparse, bounded, smooth, and continuous - the perfect kind of signal for a neural network to predict.

Once trained, if we want to recover the original signed time until event again from our network prediction, all we need to do is apply the same operations in reverse: First, convert back from Cartesian coordinates to an angle using atan2, then re-scale back from the range \( -\pi \), \( \pi \) to the minimum time and maximum time, and voila! Here you can see a reconstruction of what that looks like compared to the original time until event:


One thing you might notice about the reconstruction is that we need to treat the minimum time and maximum time as synonymous in meaning - both represent the same thing: that we don't know about the timing of the event because either it was too long ago, or it will happen at some point too far in the future. Similarly, we can see that our smoothing means we don't get exactly the right time predicted when the event is close to the maximum. But, we did preserve the fact that our prediction smoothly crosses zero at exactly when the event occurs which is more important.

Also important to know (although you cannot see it in this example) is that for events happening very frequently the time until event value can actually cross zero in the wrong direction when it is ramping up again. This means we need to be a little careful when we apprehend events: we should ignore the predicted time until event value completely when it is increasing rather than decreasing (and assume we don't know anything about the occurrence of the event). Similarly, when we check to see if we should trigger an event we need to make sure that the time until event value is crossing zero in the right direction (from positive to negative), moving at an appropriate speed, and in a relatively small range close to zero when doing so.

But other than those small caveats I've found this system to work remarkably well both for frequent events like foot contacts and also very sparse, long term events that only happen every so often. By controlling the maximum time and minimum time we can control how much apprehension is available vs how much precision is desired for the exact timing (I've found it's good to set this value to something close to the average period between events), and the whole thing only requires two additional outputs per event to be generated. Not bad!

github twitter rss