Implementing Control Operators

11/08/2025

This year at SIGGRAPH Asia we are presenting "Control Operators for Interactive Character Animation", which is a paper about our new framework for implementing and thinking about control mechanisms for interactive character controllers.

This topic is something which has been bubbling away in my brain over the last few years, and was an incredibly fun project to work on with Ruiyu Gou, and in particular to see Control Operators combined with the really powerful Flow-Matching model she developed.

But what the hell are Control Operators?... was a question asked by many reviewers and colleagues...

Well, to put it as briefly as possible, Control Operators are a way of differentiably encoding arbitrarily structured data in a way which is principled, controllable, and allows for the construction of abstract, re-usable, parts.

They let you take data that looks like this...

input = [

    # Batch Element 0
    [
        {
            'name': 'slime',
            'class': 'enemy',
            'location': torch.as_tensor([0.1, 0.5, 1.1]),
            'rotation': torch.as_tensor([1.0, 0.0, 0.0, 0.0]),
            'links': [4, 7, 9],
            'aiming': None,
            'state': ['allocated', 'alive']
        },
        {
            'name': 'chair_01',
            'class': 'prop',
            'location': torch.as_tensor([0.1, -10.0, 1.1]),
            'rotation': torch.as_tensor([-1.0, 0.0, 0.0, 0.0]),
            'links': [],
            'aiming': None,
            'state': ['allocated']
        },
        {
            'name': 'shotgun',
            'class': 'weapon',
            'location': torch.as_tensor([-0.5, -0.5, 0.0]),
            'rotation': torch.as_tensor([0.0, 1.0, 0.0, 0.0]),
            'links': [2],
            'aiming': torch.as_tensor([1.0, 0.0, 0.0]),
            'state': []
        },
    ],
    
    # Batch Element 1
    [
        {
            'name': 'flower',
            'class': 'prop',
            'location': torch.as_tensor([1.2, -5.0, 5.1]),
            'rotation': torch.as_tensor([0.0, 0.0, 1.0, 0.0]),
            'links': [23, 5, 8, 12, 5],
            'aiming': None,
            'state': ['allocated', 'alive']
        },
    ]
]

and differentiably encode it into a tensor like this

>>> print(encoder(input))
tensor([[-0.0464, -0.1774, -0.1264,  ..., -0.1592, -0.0488, -0.0406],
        [-0.0097, -0.2288, -0.1953,  ..., -0.2882, -0.1435, -0.0524]],
       grad_fn=<ViewBackward0>)

...which can be further passed into other neural network structures as required.

In video games this is useful because what we want to provide as input to neural networks can vary a lot and be made up of data from all sorts of different sources, that might change from frame to frame, contain missing entries, or include multiple different behaviours or control mechanisms.

For example, we might have an AI system which wants to provide a target location for the character to move to, sometimes this might also have a desired time-until-arrival, or a facing direction - and sometimes it might not. Other times the AI system may want to give a path to follow, sometimes including a style. Sometimes it might want to give the location of other objects or characters in the scene, and the number of those things might vary depending on what is near the character. Control Operators allow you to train a single network which can take in all of those different kinds of inputs, without building a bespoke network structure and data processing pipeline for every different variation.

And although in the paper we only show Control Operators used for interactive character animation, the idea is not limited to animation - and there is no real reason why Control Operators can't be used for all kinds of Machine Learning systems that take as input data which might not be structured in a regular or easy to consume way. For example, Control Operators are essentially already used in Learning Agents (Unreal Engine's Reinforcement Learning plugin), which is much broader in scope than just animation.

In the paper, the implementation we mention uses Unreal Engine's Blueprint Visual Scripting Language. While this makes it accessible to non-technical users, it does make the implementation quite complex - in particular on the C++ side of things. And in general, implementing Control Operators in a way that is maximally efficient and performant can introduce some complications in terms of things such as batching.

But if we focus on just clarity and a pure Python implementation, Control Operators can actually be implemented and understood in just a few hundred lines of code, which is what I was hoping to show in this article - to give a different perspective to how they are explained in the paper.

The Interface
Vector
Typed Operators
And / Struct
Or / Union
Fixed Array
OneOf / Enum & SomeOf / Flags
Index
String
Null
Optional / Maybe
Set
Array
Dictionary
Encoded
Conclusion & Limitations

The Interface

The best way to think about implementing Control Operators in Python is to consider them as special kinds of PyTorch Modules. Like PyTorch Modules, they are also function-like objects, which can have trainable parameters, and which need to be instanced and then evaluated on some input data.

But unlike PyTorch Modules, Control Operators have an additional simple interface that must be respected. They must implement a output_size function which describes the output dimensionality of the vector they produce (we'll call it D), and for the forward function, they must always take as input a list of some objects of length N, and produce as output a tensor of shape (N, D)

class ControlOperator(torch.nn.Module):

    # Must specify the output dimensionality `D` of the tensor produced by `forward` 
    def output_size(self) -> int:
        pass
    
    # Must take as input a list of `N` elements and output a tensor of shape `(N, D)`
    def forward(self, x : List[Any]) -> torch.FloatTensor:
        pass

Vector

One of the most basic operators we can implement which respects this interface is an operator which takes as input a list of one dimensional PyTorch tensors of a fixed size, and stacks those into a single large tensor:

class Vector(ControlOperator):
    
    def __init__(self, size : int) -> None:
        super().__init__()
        self.size = size
    
    def output_size(self) -> int:
        return self.size
        
    def forward(self, x : List[torch.FloatTensor]) -> torch.FloatTensor:
        assert all(len(xb.shape) == 1 for xb in x)
        assert all(xb.shape[0] == self.size for xb in x)
        return torch.stack(x, dim=0)

This we can instantiate and use like we would any other PyTorch module:

encoder = Vector(size=3)

encoded = encoder([
    torch.as_tensor([ 0.2, 0.5, 0.1]),
    torch.as_tensor([-1.1, 0.6, 0.4])
])

It doesn't do anything exciting of course - but represents the basic way Control Operators consume input data and produce tensors as output.

>>> print(encoded)
tensor([[ 0.2000,  0.5000,  0.1000],
        [-1.1000,  0.6000,  0.4000]])

Typed Operators

Something a little more interesting we can do is to specialize Vector by fixing the size and providing a more insightful name.

class Location(Vector):
    def __init__(self) -> None: super().__init__(3)

class Direction(Vector):
    def __init__(self) -> None: super().__init__(3)

I think of these variations as "typed operators", because now they describe the type of input they expect to take - even if the way they encode that input is still trivial.

A more interesting example of a "typed operator" might be a Rotation operator - which we could define to take quaternions as input, and output rotations encoded in the two-axis format (since this is more appropriate for neural networks):

def quat_to_xform_xy(q):
    qw, qx, qy, qz = q[...,0:1], q[...,1:2], q[...,2:3], q[...,3:4]
    
    x2, y2, z2 = qx + qx, qy + qy, qz + qz
    xx, yy, wx = qx * x2, qy * y2, qw * x2
    xy, yz, wy = qx * y2, qy * z2, qw * y2
    xz, zz, wz = qx * z2, qz * z2, qw * z2
    
    return torch.cat([
        torch.cat([1.0 - (yy + zz), xy - wz], dim=-1)[...,None,:],
        torch.cat([xy + wz, 1.0 - (xx + zz)], dim=-1)[...,None,:],
        torch.cat([xz - wy, yz + wx], dim=-1)[...,None,:],
    ], dim=-2)

class Rotation(ControlOperator):
    
    def output_size(self) -> int:
        return 6
        
    def forward(self, x : List[torch.FloatTensor]) -> torch.FloatTensor:
        assert all(len(xb.shape) == 1 for xb in x)
        assert all(xb.shape[0] == 4 for xb in x)
        return quat_to_xform_xy(torch.stack(x, dim=0)).reshape([len(x), 6])

This time things are a little more interesting

encoder = Rotation()

encoded = encoder([
    torch.as_tensor([0.1, 0.2, 0.5, 0.1]), 
    torch.as_tensor([-1.1, 0.6, 0.4, 0.0])
])

as now our operator actually does something:

>>> print(encoded)
tensor([[ 0.4800,  0.1800,  0.2200,  0.9000, -0.0600,  0.1400],
        [ 0.6800,  0.4800,  0.4800,  0.2800,  0.8800, -1.3200]])

So Control Operators can be defined to perform some kind of encoding or pre-processing of inputs of different types if we want. That's nice, but the real power of control operators starts when we begin to combine them together...

And / Struct

The And operator takes multiple other operators as input during construction and concatenates together their outputs during evaluation.

class And(ControlOperator):
    
    def __init__(self, ops : Dict[str,ControlOperator]) -> None:
        super().__init__()
        self.ops = torch.nn.ModuleDict(ops)
        
    def output_size(self) -> int:
        return sum(v.output_size() for v in self.ops.values())
        
    def forward(self, x : List[Dict[str,Any]]) -> torch.FloatTensor:
        assert all(all(k in xb for k in self.ops) for xb in x)
        return torch.cat([v([xb[k] for xb in x]) for k, v, in self.ops.items()], dim=-1)

The forward function here may be a little hard to parse, but effectively it takes a list of dictionaries as input, and applies each sub-operator to each corresponding element in each of the dictionaries.

I think it is clearer in an example. We instantiate it like this:

encoder = And({
    'location': Location(),
    'direction': Direction()
})

And then we can evaluate it on some input like this:

encoded = encoder([
    {
        'location': torch.as_tensor([0.2, 0.5, 0.1]),
        'direction': torch.as_tensor([0.4, 0.4, 0.2]),
    },
    {
        'location': torch.as_tensor([-0.1, 0.3, 0.0]),
        'direction': torch.as_tensor([0.8, -0.3, 0.7]),
    },
])

In this example, given And just concatenates the inputs, the output encoded will be of shape (2, 6):

>>> print(encoded)
tensor([[ 0.2000,  0.5000,  0.1000,  0.4000,  0.4000,  0.2000],
        [-0.1000,  0.3000,  0.0000,  0.8000, -0.3000,  0.7000]])

We might prefer to call this operator Struct if we want a more C-like naming of things:

class Struct(And): pass

encoder = Struct({
    'location': Location(),
    'direction': Direction()
})

Or / Union

So far, we've still not really done anything particularly complex or useful but things are about to get spicy so bear with me. Let's take a look at how we can define an Or operator:

class Or(ControlOperator):
    
    def __init__(self, ops : Dict[str,ControlOperator], encoding_size=256) -> None:
        super().__init__()
        
        self.ops = torch.nn.ModuleDict(ops)
        self.Ws = torch.nn.ModuleDict({k: 
            torch.nn.Linear(v.output_size(), encoding_size) 
            for k, v in ops.items()})
        self.encoding_size = encoding_size
        
    def output_size(self) -> int:
        return self.encoding_size + len(self.ops)
        
    def forward(self, x : List[Tuple[str,Any]]) -> torch.FloatTensor:
        assert(all(xb[0] in self.ops for xb in x))
        
        # Create zero output
        out = torch.zeros([len(x), self.output_size()], dtype=torch.float32)
        
        # Loop over sub-operators
        for ki, (k, v) in enumerate(self.ops.items()):
            
            # Find batch indices for this sub operator
            indices = torch.as_tensor([xi for xi, xb in enumerate(x) if xb[0] == k])
            
            # Generate encoded values using sub-operators
            encoded = v([xb[1] for xb in x if xb[0] == k])
            
            # Pass encoded values through linear layer
            out[indices,:-len(self.ops)] = self.Ws[k](encoded)
            
            # Insert one-hot
            out[indices,-len(self.ops) + ki] = 1.0
        
        return out

Again, this may not be crystal clear without going over it line-by-line, but the general idea is that for each input type we are going to use a linear layer to map it to a dimension of a fixed size, and then we are going to add a one-hot encoding to the end to help indicate which choice was given.

Using this operator is a bit like And, but instead of a list of dictionaries, we pass it a list of tuples. The first item in the tuple is the name of the sub-operator we want to use, and the second item is the input to that sub-operator itself.

encoder = Or({
    'location': Location(),
    'rotation': Rotation()
}, encoding_size=16)

encoded = encoder([
    ('rotation', torch.as_tensor([0.5, -0.1, 0.3, 0.0])),
    ('location', torch.as_tensor([0.2, 0.5, 0.1])),
    ('rotation', torch.as_tensor([0.1, 0.7, 0.2, 0.4])),
])

This will encode each input as a 16 + 2 dimensional vector using the corresponding weight matrices for each different type:

>>> print(encoded)
tensor([[-0.1878, -0.3041, -0.2608,  0.1791,  0.8779,  0.1312, -0.0033, -0.4950,
         -0.2920, -0.4796, -0.1314,  0.5334,  0.4711, -0.1670, -0.3288,  0.4319,
          0.0000,  1.0000],
        [ 0.3279, -0.0771,  0.7842, -0.5520, -0.1030, -0.6392,  0.4021, -0.4236,
         -0.2792, -0.0895, -0.6595, -0.8514, -0.8780,  0.6082,  0.3384, -0.0688,
          1.0000,  0.0000],
        [ 0.1930, -0.2574,  0.3523,  0.1472,  0.0352, -0.3257,  0.1088, -0.4665,
         -0.0969, -0.1376,  0.1153, -0.1602,  0.2666,  0.5245,  0.2892,  0.1348,
          0.0000,  1.0000]], grad_fn=<CopySlices>)

In a similar way to And, you might prefer to call this Union if you prefer that style of naming:

class Union(Or): pass

This operator is really powerful because it allows us to merge multiple different input modalities. For example, in the context of character control we might want to sometimes provide a target point to move towards, and sometimes a desired direction to to face, and just have the network deal with that.

Fixed Array

If we have a fixed number of elements of the same type we could also encode this via concatenation, just like the And operator:

class FixedArray(ControlOperator):

    def __init__(self, op : ControlOperator, num : int) -> None:
        super().__init__()
        self.op = op
        self.num = num
    
    def output_size(self) -> int:
        return self.op.output_size() * self.num
    
    def forward(self, x : List[List[Any]]) -> torch.FloatTensor:
        assert all(len(xb) == self.num for xb in x)
        return self.op(sum(x, [])).reshape([len(x), -1])

We just need to give the number of elements on construction:

encoder = FixedArray(Location(), 5)

encoded = encoder([[
    torch.as_tensor([-0.1,  0.1,  0.0]), torch.as_tensor([-0.2, -0.3,  0.7]),
    torch.as_tensor([ 0.1,  0.4, -0.0]), torch.as_tensor([ 0.8,  0.4, -0.2]),
    torch.as_tensor([-0.2,  0.2,  0.3]),
]])

And it will concatenate them together as expected:

>>> print(encoded)
tensor([[-0.1000,  0.1000,  0.0000, -0.2000, -0.3000,  0.7000,  0.1000,  0.4000,
         -0.0000,  0.8000,  0.4000, -0.2000, -0.2000,  0.2000,  0.3000]])

OneOf / Enum & SomeOf / Flags

If we have some kind of categorical input, we generally want to encoding it using a one-hot encoding. We can do this by defining a OneOf operator.

class OneOf(ControlOperator):
    
    def __init__(self, choices : List[str]) -> None:
        super().__init__()
        self.choices = choices
        
    def output_size(self) -> int:
        return len(self.choices)
        
    def forward(self, x : List[str]) -> torch.FloatTensor:
        return torch.nn.functional.one_hot(
            torch.as_tensor([self.choices.index(xb) for xb in x]), len(self.choices))

This works by providing a number of choices on construction and then looking up the index corresponding to those choices at evaluation time.

encoder = OneOf(["foo", "bar", "baz"])

encoded = encoder([
    "baz",
    "foo",
    "foo",
    "bar",
    "foo",
    "baz"
])

Which produces the one-hot encoding as expected:

>>> print(encoded)
tensor([[0, 0, 1],
        [1, 0, 0],
        [1, 0, 0],
        [0, 1, 0],
        [1, 0, 0],
        [0, 0, 1]])

The C-style naming of this might be Enum.

class Enum(OneOf): pass

Alternatively, if we have a non-exclusive category we define a similar SomeOf operator.

class SomeOf(ControlOperator):
    
    def __init__(self, choices : List[str]) -> None:
        super().__init__()
        self.choices = choices
        
    def output_size(self) -> int:
        return len(self.choices)
        
    def forward(self, x : List[List[str]]) -> torch.FloatTensor:
        out = torch.zeros([len(x), len(self.choices)])
        for xi, xb in enumerate(x):
            for c in xb:
                out[xi,self.choices.index(c)] = 1.0
        return out

Which takes a list of choices and uses an indicator function to say which choices are given.

encoder = SomeOf(["foo", "bar", "baz"])

encoded = encoder([
    ["baz", "foo"],
    ["foo"],
    [],
    ["bar", "baz", "foo"],
    ["foo", "bar"],
    ["baz"]
])

Encoded it looks as expected.

>>> print(encoded)
tensor([[1., 0., 1.],
        [1., 0., 0.],
        [0., 0., 0.],
        [1., 1., 1.],
        [1., 1., 0.],
        [0., 0., 1.]])

Some people might call this Flags.

class Flags(SomeOf): pass

Index

Another useful thing to be able to encode is some kind of positional index. For this we can define an Index operator which takes as input a positional index and encodes it using a position encoding.

class Index(ControlOperator):

    def __init__(self, encoding_size : int = 128) -> None:
        super().__init__()
        
        self.encoding_size = encoding_size
        
        self.freqs = torch.exp(
            torch.arange(0, self.encoding_size * 2, 2) * 
            (-np.log(10000.0) / (self.encoding_size * 2)))[None]
        
    def output_size(self) -> int:
        return self.encoding_size * 2

    def forward(self, x : List[int]) -> torch.FloatTensor:
        
        i = torch.as_tensor(x, dtype=torch.float32)[:,None]
        
        return torch.cat([
            torch.sin(i * self.freqs),
            torch.cos(i * self.freqs)], dim=-1)

This encodes indices as follows:

encoder = Index()

encoded = encoder([0, 3, 4, 0, 1])

Converting them into a series of sin and cos frequencies:

>>> print(encoded)
tensor([[ 0.0000,  0.0000,  0.0000,  ...,  1.0000,  1.0000,  1.0000],
        [ 0.1411,  0.3428,  0.5173,  ...,  1.0000,  1.0000,  1.0000],
        [-0.7568, -0.5486, -0.3167,  ...,  1.0000,  1.0000,  1.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  1.0000,  1.0000,  1.0000],
        [ 0.8415,  0.8020,  0.7617,  ...,  1.0000,  1.0000,  1.0000]])

String

We can also encode strings using any kind of out-of-the-box text embedding. For example, here is how we might make a Control Operator that wraps CLIP:

import clip

clip_model, _ = clip.load("ViT-B/32")

class String(ControlOperator):
    
    def output_size(self) -> int:
        return clip_model.token_embedding.weight.shape[1]

    def forward(self, x : List[str]) -> torch.FloatTensor:
        return clip_model.encode_text(clip.tokenize(x))

Now we can use the String control operator like anything else:

encoder = String()

encoded = encoder([
    "first string", 
    "second string", 
    "hello world!"
])

And it will produce an encoding of the input strings.

>>> print(encoded)
tensor([[-0.2899, -0.2474,  0.0722,  ..., -0.1336,  0.1429, -0.1606],
        [-0.3094, -0.1937, -0.0914,  ..., -0.1078, -0.1488, -0.1906],
        [-0.0205,  0.1111, -0.1053,  ..., -0.2532, -0.2795,  0.2360]],
       grad_fn=<MmBackward0>)

Null

Another useful operator is the Null operator - something that takes None as input and produces the empty vector as output:

class Null(ControlOperator):
    
    def output_size(self) -> int:
        return 0
        
    def forward(self, x : List[None]) -> torch.FloatTensor:
        return torch.empty([len(x), 0], dtype=torch.float32)

This isn't useful on it's own, but can be useful in other places as a symbolic representation of some kind of missing data.

encoder = Struct({
    'location': Location(),
    'null': Null()
})

encoded = encoder([
    {'location': torch.as_tensor([-0.2,  0.2,  0.3]), 'null': None},
    {'location': torch.as_tensor([ 0.8,  0.4, -0.2]), 'null': None}
])

In this example the empty vector produced by the Null operator gets concatenated to the location input, which is basically a no-op:

>>> print(encoded)
tensor([[-0.2000,  0.2000,  0.3000],
        [ 0.8000,  0.4000, -0.2000]])

Optional / Maybe

For example, we can use the Null operator (in combination with Or) to define a very useful operator - Optional. This operator allows us to express that a value may or may not be provided.

class Optional(ControlOperator):

    def __init__(self, op : ControlOperator, **kw) -> None:
        super().__init__()
        self.op = Or({ 'null': Null(), 'valid': op }, **kw)
        
    def output_size(self) -> int:
        return self.op.output_size()

    def forward(self, x : List[Any]) -> torch.FloatTensor:
        return self.op([('null' if xb is None else 'valid', xb) for xb in x])

This we can use when some data is missing from the elements in the batch:

encoder = Optional(Location(), encoding_size=16)

encoded = encoder([
    torch.as_tensor([ 0.5, -0.7, 0.0]),
    None,
    torch.as_tensor([-0.1, 0.3, 0.0]),
])

It still maps each element to a vector of a uniform size:

>>> print(encoded)
tensor([[-0.2692,  0.2094,  0.6121,  0.4173, -0.2211, -0.3560,  0.4648,  0.7399,
          0.5171, -0.6681, -0.0627,  0.1318, -0.4802,  0.0124, -0.3467, -0.4102,
          0.0000,  1.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          1.0000,  0.0000],
        [-0.0703, -0.0221,  0.4773,  0.0047, -0.2852, -0.2211,  0.5099,  0.0796,
          0.0384, -0.3100,  0.7852,  0.0494, -0.5230,  0.5981,  0.1897, -0.3267,
          0.0000,  1.0000]], grad_fn=<CopySlices>)

The missing null element will effectively end up initialised using one of the bias values of the weights in the Or linear layer - which for an untrained network is zero.

Like with Struct and Union you might prefer to call this Maybe.

class Maybe(Optional): pass

Set

One thing we might want to do is to be able to encode a set of elements, all of the same type, but where we don't know exactly how many we are going to give. We can do this by defining a Set operator which makes use of attention to summarize the items in the set.

Here is a slightly simplified version of what we have in the paper:

class Set(ControlOperator):
    
    def __init__(self, op : ControlOperator, 
        head_num=8, query_size=256, encoding_size=256) -> None:
        
        super().__init__()
        self.op = op
        self.head_num = head_num
        self.query_size = query_size
        self.encoding_size = encoding_size
        self.Q = torch.nn.Linear(op.output_size(), head_num * query_size)
        self.K = torch.nn.Linear(op.output_size(), head_num * query_size)
        self.V = torch.nn.Linear(op.output_size(), head_num * encoding_size)

    def output_size(self) -> int:
        return self.head_num * self.encoding_size
        
    def forward(self, x : List[List[Any]]) -> torch.FloatTensor:
        
        encoded = self.op(sum(x, []))
        total = len(encoded)
        
        queries = self.Q(encoded).reshape([total, self.head_num, self.query_size])
        keys = self.K(encoded).reshape([total, self.head_num, self.query_size])
        values = self.V(encoded).reshape([total, self.head_num, self.encoding_size])
        
        output = torch.zeros(
            [len(x), self.head_num, self.encoding_size], dtype=torch.float32)
        
        offset = 0
        for xi, xb in enumerate(x):
            attn = (queries * keys)[offset:offset+len(xb)].sum(dim=-1)
            attn = torch.softmax(attn / np.sqrt(self.query_size), dim=0)
            output[xi] = (attn[...,None] * values[offset:offset+len(xb)]).sum(axis=0)
            offset += len(xb)
        
        return output.reshape([len(x), self.head_num * self.encoding_size])

This uses self-attention to summarize all the items in the set into a single vector. Here is how we might actually use it:

encoder = Set(Location())

encoded = encoder([
    [torch.as_tensor([-0.1,  0.3, 0.0])],
    [torch.as_tensor([-0.8,  0.7, 0.4]), torch.as_tensor([-0.8,  0.7, 0.4])],
    [],
])

Even though the number of elements in the set differs for each item in the batch, the encoded vector is still the same size:

>>> print(encoded)
tensor([[-0.6212, -0.3185,  0.2319,  ...,  0.1747, -0.5789,  0.0832],
        [-0.9693, -0.5274,  0.0776,  ..., -0.0313, -0.8946, -0.2327],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]],
       grad_fn=<ViewBackward0>)

Array

We defined a fixed-size array, but if we want a variable-sized array, we can define it in terms of Set, plus Index for the array index of each element:

class Array(ControlOperator):

    def __init__(self, op : ControlOperator, **kw) -> None:
        super().__init__()
        self.op = Set(And({'index': Index(), 'value': op}), **kw)
    
    def output_size(self) -> int:
        return self.op.output_size()
    
    def forward(self, x : List[List[Any]]) -> torch.FloatTensor:
        return self.op([[{'index': xbi, 'value': xbv} 
            for xbi, xbv in enumerate(xb)] for xb in x])

We can use it something like this:

encoder = Array(Location())

encoded = encoder([
    [torch.as_tensor([-0.2, -0.3,  0.7]), torch.as_tensor([ 0.1,  0.4, -0.0])],
    [],
    [torch.as_tensor([ 0.8,  0.4, -0.2])],
    [torch.as_tensor([-0.2,  0.2,  0.3])],
])

Which produces something like this:

>>> print(encoded)
tensor([[-0.2998,  0.0572, -0.3017,  ..., -0.0926, -0.6604, -0.0490],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [-0.3610,  0.5190, -0.6120,  ..., -0.3639, -0.1028, -0.1387],
        [-0.2540,  0.0418, -0.3737,  ...,  0.0220, -0.6772, -0.0099]],
       grad_fn=<ViewBackward0>)

Unlike the Set Control Operator, the order of the elements in the list will now matter in terms of what encoding is produced.

Dictionary

We can define a dictionary similarly, just with an explicit key and value type.

class Dictionary(ControlOperator):

    def __init__(self, key : ControlOperator, value : ControlOperator, **kw) -> None:
        super().__init__()
        self.op = Set(And({'key': key, 'value': value}), **kw)
    
    def output_size(self) -> int:
        return self.op.output_size()
    
    def forward(self, x : List[Dict[Any,Any]]) -> torch.FloatTensor:
        return self.op([[{'key': xbk, 'value': xbv} 
            for xbk, xbv in xb.items()] for xb in x])

This is pretty neat if we combine it with something like our String operator:

encoder = Dictionary(String(), Location(), encoding_size=16)

encoded = encoder([
    {
        "control": torch.as_tensor([-0.1,  0.1,  0.0]),
        "operators": torch.as_tensor([-0.2, -0.3,  0.7]),
        "rock": torch.as_tensor([ 0.1,  0.4, -0.0]),
    },
    {},
    {
        "blah": torch.as_tensor([-0.5,  0.5,  0.0]),
        "testing": torch.as_tensor([ 0.8, -0.2,  0.2]),
        "foo": torch.as_tensor([-0.5, -0.9, -1.0]),
    },
])

Which produces an encoding like this:

>>> print(encoded)
tensor([[ 0.2419, -0.1294, -0.3512,  0.0264, -0.0391,  0.1934, -0.4428,  0.3578,
         -0.0390, -0.3939,  0.1215,  0.4105,  0.3867,  0.1722, -0.1018,  0.1746,
          0.0983, -0.2184,  0.0430, -0.1131,  0.0123,  0.4871,  0.0461, -0.0035,
          0.2232, -0.4030,  0.2108, -0.3307, -0.0515,  0.4794,  0.0666, -0.1344,
          0.2769,  0.0630,  0.0066, -0.0604,  0.2082, -0.1023,  0.2968,  0.3442,
          0.2031,  0.0118, -0.4333,  0.2422,  0.2477,  0.3807, -0.3074, -0.0906,
         -0.0196,  0.1870,  0.3583, -0.1929,  0.2748,  0.2691,  0.3498, -0.1233,
          0.1292,  0.1912,  0.1758, -0.3564,  0.3516, -0.4300,  0.1132, -0.2948,
          0.0106,  0.1802,  0.0998, -0.1874,  0.1205,  0.2187,  0.3077, -0.5048,
          0.1552,  0.0798,  0.0514,  0.4032,  0.0508, -0.3058,  0.0671,  0.3146,
         -0.2099, -0.1990,  0.0526,  0.1612, -0.4301, -0.3265, -0.0488, -0.1238,
          0.0387, -0.5461,  0.2222, -0.5773,  0.3465, -0.0329, -0.4385,  0.1131,
         -0.0503,  0.1553,  0.2751, -0.6218, -0.0246,  0.2355,  0.3991, -0.1351,
          0.0944,  0.2890, -0.0647, -0.0607,  0.1525, -0.1976,  0.0394,  0.0175,
          0.1347, -0.1093,  0.0972, -0.2148, -0.2075, -0.5322,  0.3142, -0.1472,
         -0.3189,  0.0411, -0.1914, -0.2380, -0.3892, -0.2341, -0.1163, -0.4384],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.1299,  0.0238, -0.2588,  0.0506, -0.1233,  0.4003, -0.5034,  0.4655,
          0.0396, -0.5257,  0.2322,  0.5050,  0.3808,  0.1592, -0.1094,  0.1165,
          0.1001, -0.1724, -0.0230, -0.0819,  0.1041,  0.5542,  0.1500,  0.1374,
          0.4254, -0.4948,  0.4456, -0.4298, -0.0727,  0.4559,  0.2187, -0.1999,
          0.4325,  0.0108,  0.1046, -0.1028,  0.2892, -0.0314,  0.3629,  0.2943,
          0.1698,  0.2504, -0.3948,  0.2457,  0.0996,  0.4317, -0.4504, -0.1272,
         -0.1268,  0.0640,  0.4903, -0.2416,  0.3820,  0.3215,  0.3080, -0.0958,
          0.3426,  0.1858,  0.2266, -0.4813,  0.4663, -0.5764,  0.1538, -0.4412,
         -0.0943,  0.3015,  0.0820, -0.1371,  0.2562,  0.1637,  0.2777, -0.5467,
          0.2250,  0.2646,  0.0830,  0.3998,  0.0129, -0.4364,  0.0539,  0.2807,
         -0.1584, -0.3058,  0.0404,  0.0809, -0.4319, -0.2785, -0.0009, -0.0941,
         -0.0451, -0.5030,  0.4396, -0.6496,  0.4330, -0.1450, -0.2869,  0.1899,
         -0.0548,  0.0452,  0.0685, -0.6486, -0.0376,  0.1951,  0.5730, -0.1085,
          0.0647,  0.2016, -0.0801, -0.0602, -0.0266, -0.2762,  0.0499,  0.1604,
          0.0282, -0.1232, -0.0097, -0.1829, -0.3193, -0.7331,  0.1893, -0.1763,
         -0.2875,  0.1726, -0.2363, -0.2419, -0.3451, -0.3498,  0.0193, -0.3901]],
       grad_fn=<ViewBackward0>)

Encoded

At any point we can always insert some other kind of PyTorch modules to encode the vector output by some other operator:

class Encoded(ControlOperator):

    def __init__(self, op : ControlOperator, 
        encoding_size = 256, activation = torch.nn.functional.elu) -> None:
        
        super().__init__()
        self.op = op
        self.W = torch.nn.Linear(op.output_size(), encoding_size)
        self.encoding_size = encoding_size
        self.activation = activation
    
    def output_size(self) -> int:
        return self.encoding_size
    
    def forward(self, x : List[Any]) -> torch.FloatTensor:
        return self.activation(self.W(self.op(x)))

This lets us map the encoded output vector to whatever size we want.

encoder = Encoded(Location(), encoding_size=32)

encoded = encoder([
    torch.as_tensor([ 0.5, -0.7, 0.0]),
    torch.as_tensor([-0.1, 0.3, 0.0]),
])

For example:

>>> print(encoded)
tensor([[ 0.1044, -0.3420, -0.1119, -0.3230,  0.1396,  0.4583,  0.7204, -0.4526,
         -0.0747,  0.7712, -0.3624,  0.0154, -0.0880, -0.4740,  0.5538,  0.1476,
          0.4227,  0.3701, -0.0232,  0.0404, -0.4097, -0.3988,  0.8864, -0.4473,
          0.7275, -0.0798,  0.5861,  0.1896, -0.0028,  0.1434, -0.5173, -0.2102],
        [ 0.4872, -0.3004, -0.1856, -0.3578,  0.0414,  0.2394, -0.0490, -0.0642,
         -0.1385,  0.3772, -0.0157, -0.2182, -0.3251, -0.1228, -0.2200, -0.2719,
          0.2250,  0.1557,  0.1955,  0.2665, -0.0145, -0.2658,  0.3358, -0.3536,
          0.4866,  0.3880,  0.0411,  0.0530, -0.4671,  0.0044, -0.1518, -0.4055]],
       grad_fn=<EluBackward0>)

Conclusion & Limitations

Above I've defined just a handful of what are probably the most basic operators but I hope it is clear what more can be done with this approach.

Once we have all of these building blocks made we can create networks which can automatically and differentiably encode really quite complex structures. And now perhaps it is clear how I set-up the encoding for the example shown right at the beginning:

encoder = Set(Encoded(Struct({
    'name': String(),
    'class': Enum(['enemy', 'prop', 'weapon']),
    'location': Location(),
    'rotation': Rotation(),
    'aiming': Optional(Direction()),
    'state': Flags(['allocated', 'alive'])
})))

Now we can see how this Control Operator "schema" is linked to our input structure:

encoded = encoder([

    # Batch Element 0
    [
        {
            'name': 'slime',
            'class': 'enemy',
            'location': torch.as_tensor([0.1, 0.5, 1.1]),
            'rotation': torch.as_tensor([1.0, 0.0, 0.0, 0.0]),
            'aiming': None,
            'state': ['allocated', 'alive']
        },
        {
            'name': 'chair_01',
            'class': 'prop',
            'location': torch.as_tensor([0.1, -10.0, 1.1]),
            'rotation': torch.as_tensor([-1.0, 0.0, 0.0, 0.0]),
            'aiming': None,
            'state': ['allocated']
        },
        {
            'name': 'shotgun',
            'class': 'weapon',
            'location': torch.as_tensor([-0.5, -0.5, 0.0]),
            'rotation': torch.as_tensor([0.0, 1.0, 0.0, 0.0]),
            'aiming': torch.as_tensor([1.0, 0.0, 0.0]),
            'state': []
        },
    ],
    
    # Batch Element 1
    [
        {
            'name': 'flower',
            'class': 'prop',
            'location': torch.as_tensor([1.2, -5.0, 5.1]),
            'rotation': torch.as_tensor([0.0, 0.0, 1.0, 0.0]),
            'aiming': None,
            'state': ['allocated', 'alive']
        },
    ]
])

And how it will encode each element in the batch differentiably as a single vector, spitting out a matrix as follows...

>>> print(encoded)
tensor([[ 0.1199,  0.0481, -0.0157,  ..., -0.2353, -0.1024, -0.0566],
        [ 0.1466,  0.0350, -0.1535,  ..., -0.2658, -0.2678, -0.0341]],
       grad_fn=<ViewBackward0>)

...which you can then plug into any other neural network system.

So perhaps you can see the strengths of Control Operators, but what about the limitations?

Performance: If you implement Control Operators in the way shown here then at some point you might have performance issues. As you might have noticed, there are quite a few python for loops over the batch dimension involved in lots of these functions and that can get slow. Implementing things in a performant way over the batch dimension is possible, but the code can start to get pretty hairy, and type systems that are stricter than python might begin to fight you too.

Encoder Only: Control Operators only let you encode structured data, they don't provide an answer for producing it. That means the flow of information is one-directional, which makes it difficult to do things like (for example) train some kind of auto-encoder on structured data. While there are some ways to output some kind of structured outputs (we have a system in Learning Agents for producing "structured actions") I think it largely remains an open problem.

Normalization & Statistics: Certain things become difficult when you use Control Operators. For example, how can we properly normalize structured data like the data shown above? Or compute statistics on it? Things that are simple if your inputs are just flat vectors become complicated when your inputs might be in all kinds of crazy forms.

But ultimately, I hope you've found the idea of Control Operators interesting - because I think there is more than one way to try and tackle this problem and hope this is an idea that can be developed into something even better.

And all the source code for this article can be found here.