Robust Solving of Optical Motion Capture Data by Denoising

09/06/2018

This year at SIGGRAPH I will be presenting a paper with the goal of removing the need for manual cleaning of motion capture data. The core component of the paper is a deep neural network which learns a mapping between motion capture marker data (which may be unclean) and the final joint positions and rotations of the character. As it isn't always easy to acquire large databases of paired unclean and cleaned motion capture data, we also present a novel method for data generation: we first attach markers to a character skeleton and then randomly corrupt the marker positions in millions of different ways, using a noise function designed to emulate the typical kinds of errors that appear in motion capture data. This results in a method which is far more accessible, as all it requires is a large database of skeletal motion capture data. Several such databases are freely available online, such as the CMU motion capture database.

Webpage • Paper • Video • Article

Abstract: Raw optical motion capture data often includes errors such as occluded markers, mislabeled markers, and high frequency noise or jitter. Typically these errors must be fixed by hand - an extremely time-consuming and tedious task. Due to this, there is a large demand for tools or techniques which can alleviate this burden. In this research we present a tool that sidesteps this problem, and produces joint transforms directly from raw marker data (a task commonly called "solving") in a way that is extremely robust to errors in the input data using the machine learning technique of denoising. Starting with a set of marker configurations, and a large database of skeletal motion data such as the CMU motion capture database [CMU 2013b], we synthetically reconstruct marker locations using linear blend skinning and apply a unique noise function for corrupting this marker data - randomly removing and shifting markers to dynamically produce billions of examples of poses with errors similar to those found in real motion capture data. We then train a deep denoising feed-forward neural network to learn a mapping from this corrupted marker data to the corresponding transforms of the joints. Once trained, our neural network can be used as a replacement for the solving part of the motion capture pipeline, and, as it is very robust to errors, it completely removes the need for any manual clean-up of data. Our system is accurate enough to be used in production, generally achieving precision to within a few millimeters, while additionally being extremely fast to compute with low memory requirements.
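
To make the data generation step a little more concrete, here is a minimal sketch of the two ideas described above: placing a marker on the skeleton with linear blend skinning, and then corrupting marker positions by randomly removing and shifting them. This is not the noise function from the paper. The function names (`skin_marker`, `corrupt_markers`), the fixed probabilities `p_occlude` and `p_shift`, the `max_shift` range, and the choice to represent an occluded marker as a zero vector are all assumptions made for illustration.

```python
import random

def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def skin_marker(influences, joints):
    """Linear blend skinning for a single marker.

    influences: list of (joint_index, weight, local_offset) tuples, with
                weights assumed to sum to 1.
    joints:     list of (R, t) pairs, the world rotation and translation
                of each joint.
    Returns the weighted sum of the offset transformed by each joint.
    """
    pos = [0.0, 0.0, 0.0]
    for j, w, offset in influences:
        R, t = joints[j]
        rotated = mat_vec(R, offset)
        for i in range(3):
            pos[i] += w * (rotated[i] + t[i])
    return pos

def corrupt_markers(markers, p_occlude=0.1, p_shift=0.1, max_shift=0.5, rng=None):
    """Return a corrupted copy of a list of 3D marker positions.

    Each marker is occluded with probability p_occlude (hypothetical value)
    and replaced by a zero vector. Markers that are not occluded are shifted
    with probability p_shift by a uniform offset in [-max_shift, max_shift]
    on each axis. All other markers are copied unchanged.
    """
    rng = rng or random.Random()
    corrupted = []
    for m in markers:
        if rng.random() < p_occlude:
            corrupted.append([0.0, 0.0, 0.0])
        elif rng.random() < p_shift:
            corrupted.append([c + rng.uniform(-max_shift, max_shift) for c in m])
        else:
            corrupted.append(list(m))
    return corrupted

# Example: one marker influenced equally by two joints with identity rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
joints = [(identity, [0.0, 1.0, 0.0]), (identity, [0.0, 2.0, 0.0])]
marker = skin_marker(
    [(0, 0.5, [0.1, 0.0, 0.0]), (1, 0.5, [0.1, 0.0, 0.0])], joints
)

# A fresh corrupted copy can be drawn every time a pose is sampled for training.
noisy = corrupt_markers([marker], rng=random.Random(0))
```

Because the corruption is applied on the fly, each clean pose in the skeletal database can yield a different corrupted input every time it is sampled, while the training target stays the clean joint transforms for that pose.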