Icon SuperTrack: Motion Tracking for Physically Simulated Characters using Supervised Learning

 

Icon Joint Limits

 

Icon Code vs Data Driven Displacement

 

Icon Exponential Map, Angle Axis, and Angular Velocity

 

Icon Encoding Events for Neural Networks

 

Icon Visualizing Rotation Spaces

 

Icon Spring-It-On: The Game Developer's Spring-Roll-Call

 

Icon Interviewing Advice from the Other Side of the Table

 

Icon Saguaro

 

Icon Learned Motion Matching

 

Icon Why Can't I Reproduce Their Results?

 

Icon Latinendian vs Arabendian

 

Icon Machine Learning, Kolmogorov Complexity, and Squishy Bunnies

 

Icon Subspace Neural Physics: Fast Data-Driven Interactive Simulation

 

Icon Software for Rent

 

Icon Naraleian Caterpillars

 

Icon The Scientific Method is a Virus

 

Icon Local Minima, Saddle Points, and Plateaus

 

Icon Robust Solving of Optical Motion Capture Data by Denoising

 

Icon Simple Concurrency in Python

 

Icon The Software Thief

 

Icon ASCII : A Love Letter

 

Icon My Neural Network isn't working! What should I do?

 

Icon Phase-Functioned Neural Networks for Character Control

 

Icon 17 Line Markov Chain

 

Icon 14 Character Random Number Generator

 

Icon Simple Two Joint IK

 

Icon Generating Icons with Pixel Sorting

 

Icon Neural Network Ambient Occlusion

 

Icon Three Short Stories about the East Coast Main Line

 

Icon The New Alphabet

 

Icon "The Color Munifni Exists"

 

Icon A Deep Learning Framework For Character Motion Synthesis and Editing

 

Icon The Halting Problem and The Moral Arbitrator

 

Icon The Witness

 

Icon Four Seasons Crisp Omelette

 

Icon At the Bottom of the Elevator

 

Icon Tracing Functions in Python

 

Icon Still Things and Moving Things

 

Icon water.cpp

 

Icon Making Poetry in Piet

 

Icon Learning Motion Manifolds with Convolutional Autoencoders

 

Icon Learning an Inverse Rig Mapping for Character Animation

 

Icon Infinity Doesn't Exist

 

Icon Polyconf

 

Icon Raleigh

 

Icon The Skagerrak

 

Icon Printing a Stack Trace with MinGW

 

Icon The Border Pines

 

Icon You could have invented Parser Combinators

 

Icon Ready for the Fight

 

Icon Earthbound

 

Icon Turing Drawings

 

Icon Lost Child Announcement

 

Icon Shelter

 

Icon Data Science, how hard can it be?

 

Icon Denki Furo

 

Icon In Defence of the Unitype

 

Icon Maya Velocity Node

 

Icon Sandy Denny

 

Icon What type of Machine is the C Preprocessor?

 

Icon Which AI is more human?

 

Icon Gone Home

 

Icon Thoughts on Japan

 

Icon Can Computers Think?

 

Icon Counting Sheep & Infinity

 

Icon How Nature Builds Computers

 

Icon Painkillers

 

Icon Correct Box Sphere Intersection

 

Icon Avoiding Shader Conditionals

 

Icon Writing Portable OpenGL

 

Icon The Only Cable Car in Ireland

 

Icon Is the C Preprocessor Turing Complete?

 

Icon The aesthetics of code

 

Icon Issues with SDL on iOS and Android

 

Icon How I learned to stop worrying and love statistics

 

Icon PyMark

 

Icon AutoC Tools

 

Icon Scripting xNormal with Python

 

Icon Six Myths About Ray Tracing

 

Icon The Web Giants Will Fall

 

Icon PyAutoC

 

Icon The Pirate Song

 

Icon Dear Esther

 

Icon Unsharp Anti Aliasing

 

Icon The First Boy

 

Icon Parallel programming isn't hard, optimisation is.

 

Icon Skyrim

 

Icon Recognizing a language is solving a problem

 

Icon Could an animal learn to program?

 

Icon RAGE

 

Icon Pure Depth SSAO

 

Icon Synchronized in Python

 

Icon 3d Printing

 

Icon Real Time Graphics is Virtual Reality

 

Icon Painting Style Renderer

 

Icon A very hard problem

 

Icon Indie Development vs Modding

 

Icon Corange

 

Icon 3ds Max PLY Exporter

 

Icon A Case for the Technical Artist

 

Icon Enums

 

Icon Scorpions have won evolution

 

Icon Dirt and Ashes

 

Icon Lazy Python

 

Icon Subdivision Modelling

 

Icon The Owl

 

Icon Mouse Traps

 

Icon Updated Art Reel

 

Icon Tech Reel

 

Icon Graphics Aren't the Enemy

 

Icon On Being A Games Artist

 

Icon The Bluebird

 

Icon Everything2

 

Icon Duck Engine

 

Icon Boarding Preview

 

Icon Sailing Preview

 

Icon Exodus Village Flyover

 

Icon Art Reel

 

Icon LOL I DREW THIS DRAGON

 

Icon One Cat Just Leads To Another

Encoding Events for Neural Networks

Created on July 8, 2021, 10 p.m.

One of the big advantages of using neural networks for generating animation in video games is that once a neural network has been trained, the animation data it was trained on can be thrown away, and the network used to generated the animation on-the-fly - potentially saving huge amounts of memory.

But when we start getting into the details, things start to get a bit more complicated. For example, there can be all kinds of meta-data that gets attached to animations in games such as tags or markups to label particular states or sections of an animation. There are also events, such as audio events, triggers, and synchronization points. We can even have things like pointers to other assets, comments, timestamps, and all sorts of other unstructured data. If we still want the benefit of throwing away all our animation data we need to encode all of this additional information in a way which is suitable for the neural network to generate at runtime.

Some of these are easier than others. If we have tags or markups associated with different ranges in the animation data we can use a one-hot encoding indicating the presence of each of these different tags at each frame, and train the network to output this additional one-hot vector along with the pose.

More tricky to encode are single frame events such as sound events, synchronization points, triggers, and other indicators. If we try to encode these as one-hot vectors as we did with the tags we have two problems: first, a problem of data imbalance as generally >99% of frames will not contain any events (and as such the neural network can achieve a fantastic accuracy by just ignoring this output altogether), and secondly, because we are only outputting a value at a single specific frame, we may find we miss or skip over an event if the frame-rate drops or the network does not make an accurate enough prediction.

So here is my little trick for encoding single frame events in a way which is far more suitable for neural networks:


The first step is to take your track of single frame events, and instead compute the signed time until the nearest event. So for example, if the nearest event is two seconds in the future you return 2, while if the nearest event was one that happened three seconds ago you return -3. Here is what this might look like visualized, where the single-frame events are shown in red and the signed time until the event is shown in blue:

nn_events_raw

The intuition behind this transformation is that if we can have the network produce something like the signed time until the event we can trigger the event at the exact moment this output crosses zero. Using this encoding we can also apprehend events if need be by simply triggering the event a little before the output crosses zero. This can be useful for events such as sound or network events where we may need to prepare things a little bit in advance. Finally, since this value is not sparse, the network will not be able to simply ignore it as it would in the single-frame encoding.

However this encoding still has problems. Firstly, this value is unbounded - meaning we will be asking the network to produce a huge number when the nearest event is a very long time away, and secondly, we have a large discontinuity when the nearest event changes and we suddenly go from a large negative value to a large positive value.

To fix the first issue we're going to clip this time to some maximum and minimum time - this is going to limit how much we can apprehend an event (or know how long it has been since an event) but it will mean the value is no longer unbounded, making it more suitable for prediction. For neatness we can also re-scale this into the range -1 to 1 by dividing by this maximum. Here I set the maximum to 30 frames.

nn_events_clipped

To fix the second issue we're going to re-scale this again to the range \( -\pi \) to \( \pi \), treat it like an angle, and convert it to Cartesian coordinates using sin and cos:

nn_events_cart

If you look closely you will see this has removed all discontinuities and encoded both the minimum time and maximum time as the same value. However, the gradient of this signal can still suddenly change, so I like to apply a tiny bit of Gaussian smoothing to make sure the resulting signal is nice and smooth:

nn_events_smoothed

Now we have something that is non-sparse, bounded, smooth, and continuous - the perfect kind of signal for a neural network to predict.

Once trained, if we want to recover the original signed time until event again from our network prediction, all we need to do is apply the same operations in reverse: First, convert back from Cartesian coordinates to an angle using atan2, then re-scale back from the range \( -\pi \), \( \pi \) to the minimum time and maximum time, and voila! Here you can see a reconstruction of what that looks like compared to the original time until event:

nn_events_recon

One thing you might notice about the reconstruction is that we need to treat the minimum time and maximum time as synonymous in meaning - both represent the same thing: that we don't know about the timing of the event because either it was too long ago, or it will happen at some point too far in the future. Similarly, we can see that our smoothing means we don't get exactly the right time predicted when the event is close to the maximum. But, we did preserve the fact that our prediction smoothly crosses zero at exactly when the event occurs which is more important.

Also important to know (although you cannot see it in this example) is that for events happening very frequently the time until event value can actually cross zero in the wrong direction when it is ramping up again. This means we need to be a little careful when we apprehend events: we should ignore the predicted time until event value completely when it is increasing rather than decreasing (and assume we don't know anything about the occurrence of the event). Similarly, when we check to see if we should trigger an event we need to make sure that the time until event value is crossing zero in the right direction (from positive to negative), moving at an appropriate speed, and in a relatively small range close to zero when doing so.

But other than those small caveats I've found this system to work remarkably well both for frequent events like foot contacts and also very sparse, long term events that only happen every so often. By controlling the maximum time and minimum time we can control how much apprehension is available vs how much precision is desired for the exact timing (I've found it's good to set this value to something close to the average period between events), and the whole thing only requires two additional outputs per event to be generated. Not bad!

github twitter rss