Halo Networking: An Interview with Chris Butcher

Chris Butcher is one of four engineering leads at Bungie Studios who were responsible for overseeing the creation of the smash hit "Halo 2." If you've read our article on the A.I. of Halo 2 then you already know that Chris is a really smart dude. When I visited Bungie on November 3rd I actually had a chance to sit down with Chris twice.

One of the largest differences between "Halo 2" and the original "Halo" is that "Halo 2" can be played over Xbox Live. In order to pull that off, Bungie had to create a new way to network Halo multiplayer matches over the Internet. This responsibility fell largely on Chris' shoulders. In my second interview we discussed networking "Halo 2."


Bungie is taking "Halo 2" multiplayer to the world.
Photo courtesy of Bungie.net

Now rather than put this article in the traditional Q&A form that you're used to seeing, I'm just going to let you read what Chris had to say. Because the truth is I only asked Chris one question:

"So, what had to be done to put 'Halo 2' on Live?"


Stay Together Now

"Halo 1 is what's referred to as synchronous networking."
"Halo 1 is what's referred to as synchronous networking."
Photo courtesy of Bungie.net

Chris Butcher:

"Halo 1 was a network game because you could play it over the LAN (Local Area Network) with System Link. That was a very successful part of it. The thing that we really wanted to do for 'Halo 2' was to take that and extend the network model to the Internet so that you could play over Xbox Live.

'Halo 1' networking is what's referred to as synchronous networking. What that means is that you have some number of instances of the same software running on different machines. In this case, the game is on four Xboxes. Those machines are all running the same game simulation and the game simulation is deterministic. That means that if you provide the simulation with the same inputs then it will produce the same outputs when you run all that code. We leverage that fact for the multiplayer game. Instead of sending every machine every piece of information about what's happening in the entire game world, what we send is, 'Here are the inputs that we provide for the game simulation at this instant in time.'

Now our game runs at 30 ticks per second, because NTSC is 60 hertz. So we run one game tick for every two draws of the screen... and we render one frame at a time, so our frame rate is 30 frames per second. Every time we run a game tick, the machine samples what the players are doing in the game at that moment. Like, what they are doing on their controllers, whether they're jumping or getting in vehicles, turning three degrees to the left, pressing the fire button... things like that.

It then sends that information about what the players' inputs are in the game simulation to all the other machines. It also receives information from all of the players who are not on the local machine, and so that means every machine at the same time has all of the inputs it needs and they all run the simulation together... Then all the machines know where [each player's action] is in the world and the consequences, and the consequences are enacted on all machines... So the state of the world is maintained consistently across the machines."
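The lockstep idea Chris describes can be illustrated with a minimal Python sketch. This is not Bungie's code: `PlayerInput`, `step` and `run` are hypothetical names, and the toy rule (move along one axis, remember whether the player jumped) stands in for the real game simulation. The only point is that two machines fed the same input log end up in the same state.

```python
from dataclasses import dataclass

@dataclass
class PlayerInput:
    move: int    # -1 left, 0 still, +1 right (toy stand-in for controller state)
    jump: bool

def step(state, inputs):
    """Advance one tick. The result depends only on `state` and `inputs`."""
    new_state = {}
    for player_id in sorted(state):          # fixed iteration order on every machine
        x, _ = state[player_id]
        inp = inputs[player_id]
        new_state[player_id] = (x + inp.move, inp.jump)
    return new_state

def run(initial, input_log):
    """Replay a whole log of per-tick inputs from the same starting state."""
    state = dict(initial)
    for tick_inputs in input_log:
        state = step(state, tick_inputs)
    return state

initial = {"red": (0, False), "blue": (10, False)}
input_log = [
    {"red": PlayerInput(1, False), "blue": PlayerInput(-1, False)},
    {"red": PlayerInput(1, True), "blue": PlayerInput(0, False)},
]

# Two "Xboxes" that received the same inputs compute the same world.
xbox_a = run(initial, input_log)
xbox_b = run(initial, input_log)
```

Only the small `PlayerInput` records cross the network here; the world state itself is never transmitted.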

Who's in Charge Here?

"The thing with this networking model is if there's a bug in the computer code where two machines could provide the same inputs but get different outputs, there can be problems."
"The thing with this networking model is if there's a bug in the computer code where two machines could provide the same inputs but get different outputs, there can be problems."
Photo courtesy of Bungie.net

Chris Butcher continues:

"Halo is also a client/server based network model, meaning that one machine in the game is the server of the game, and then everybody joins it making that machine the master. If you are a client, you send your actions to the server and then when the server receives the actions from everybody it then sends out everybody's collective actions to all clients. And that's how we make sure everybody's in the same game together.

It's actually the same network model we used in Marathon back in the day, although Marathon had some bugs in it. The thing with this networking model is if there's a bug in the computer code where two machines could provide the same inputs but get different outputs, there can be problems. There are lots of different ways that could happen. It could be a bug where you are using just some random garbage memory in the computer and that would be random from machine to machine. That would be bad.

The other thing is that we're not running exactly the same simulation on all machines. When [the server] is sending information about what actions are occurring on all of the machines [it doesn't] send them to everybody. It would be a pure peer model if we did send it to everybody.

The thing is, we run the simulation and we run the world, that's one part of what we do, but then every frame we also have to do things just for the local player, like you have to figure out what their first person weapon is doing, whether they're reloading or throwing a grenade. We actually render their view of the world as well.

So those actions -- because they only take place on one machine -- those actions can't be allowed to affect the deterministic state of the world. So basically we have a separation inside our game. This is the stuff that is deterministic -- it's all the objects in the world and how they move. This is the stuff that is not deterministic -- the sounds that you can hear on the local machine, what you are rendering with your graphics and a couple of other things. We have to separate those two.

If we keep them separated, then the game will stay in sync between the machines. But if they are not separated correctly -- if there is information transferred between the two -- then the machines will diverge in the simulation. You might not necessarily notice that, because on one machine the player is here but the same player is [in a slightly different place] on somebody else's machine. You wouldn't notice unless you tried to shoot them and the bullet hit them on one machine and missed them on another machine. Then the divergence basically cascades like that until eventually the game is completely different on different machines and then it's meaningless of course."
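To make the failure mode concrete, here is a small Python sketch. Everything in it is invented for illustration: `clean_step`, `leaky_step`, `simulate` and the `camera_shake` value are hypothetical and are not taken from Halo. One step function ignores local-only data, the other lets it leak into the shared world, and the two machines drift further apart every tick.

```python
def clean_step(world, inputs, local_view):
    # Uses only the shared inputs; local-only data never touches the world.
    return {pid: pos + inputs[pid] for pid, pos in world.items()}

def leaky_step(world, inputs, local_view):
    # BUG: a value that differs per machine leaks into the shared position.
    return {pid: pos + inputs[pid] + local_view["camera_shake"]
            for pid, pos in world.items()}

def simulate(step, local_view, ticks):
    world = {"red": 0, "blue": 100}
    inputs = {"red": 1, "blue": -1}      # identical inputs on every machine
    for _ in range(ticks):
        world = step(world, inputs, local_view)
    return world

view_a = {"camera_shake": 0}   # machine A's local-only state
view_b = {"camera_shake": 1}   # machine B's local-only state

clean_a = simulate(clean_step, view_a, 30)
clean_b = simulate(clean_step, view_b, 30)
leaky_a = simulate(leaky_step, view_a, 30)
leaky_b = simulate(leaky_step, view_b, 30)
```

After one second of game time (30 ticks) the clean versions agree, while the leaky versions place the same players in different spots.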

A New Network for Halo 2

Chris Butcher continues:

"So the thing about that is, because the client sends the server their actions and the server sends them back to the client you have to have a round trip between the client and the server. That works fine in the local area networks. The latency is probably two or three milliseconds between boxes. You know, if you use the XB Connect Software where you can have a PC that tunnels traffic from your Xbox over the Internet, you can actually make it work between people over the Internet. But our experience is that because you're a client, you have to wait for the round trip from your server for your action to do anything. All of your movement or your shooting lags by some amount. What we want to avoid with 'Halo 2' is designing a network model that is really susceptible to that.

So moving to 'Halo 2'... rather than sending your actions from machine to machine, what we have to do is we still have a client and server, but rather than the client waiting for the server to tell it exactly what should happen with a turn, the client is predicting the entire world. It is simulating the world exactly as it thinks things should happen, so that will be perfectly in sync with the server. The things the client doesn't know about are other sources of input, like the other players in the game.

So when you take that model you can predict yourself perfectly. You can run and jump and get in elevators and all those things; the client can predict what the player on the client box is doing just fine. Where the differences come in is when you're interacting with other people in the world. The client predicts that the other player is moving left because the last information from the server said that, but if they moved right at this time, there could be a slight difference.

So these are the kind of artifacts of the network model. It's that when you interact with a collective source of input from some other machine, you'll see strangeness that doesn't match what your predicted world was on your machine. So the way that we go about doing that is we still need a client/server model, but the client sends the server not only information about the button pushes that I am doing. It's at one higher level: it's information about where I am in the world and what I am doing at that moment. So rather than saying that I adjusted my joystick 23 degrees, it would just say I am in the Warthog, I am here and I am driving in this direction.

You also send a stream of events to the server saying this is what I think is happening in my world. Like I think I am throwing a grenade, I think I sniped that guy, I think I hit him in the head. Then what happens is the server processes these streams from every machine -- it's all their versions of events. And what's happening is the server is also running its simulation itself. The difference is, it's not predicted; it is the authority. It is responsible for all of the stuff that happens in the game."
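The shift from button presses to higher-level state can be pictured as a change in message shape. The following Python sketch is purely illustrative: `RawInput`, `ClientUpdate` and every field name are hypothetical, and the actual Halo 2 wire format is not described in the interview.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class RawInput:
    """Halo 1 style: what the controller did this tick."""
    stick_degrees: float
    fire_pressed: bool

@dataclass
class ClientUpdate:
    """Halo 2 style: where the client believes it is, plus what it claims happened."""
    player_id: str
    position: Tuple[float, float, float]
    heading_degrees: float
    vehicle: Optional[str] = None
    claimed_events: List[str] = field(default_factory=list)

# "I am in the Warthog, I am here and I am driving in this direction."
update = ClientUpdate("p1", (12.0, 4.0, 0.0), 90.0, vehicle="warthog")

# "I think I am throwing a grenade, I think I hit him in the head."
update.claimed_events.append("threw_grenade")
update.claimed_events.append("headshot:p2")
```

The events are claims rather than facts: the server still decides which of them become part of the authoritative world.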

For Example...

"The server's the only machine that can create a destroying object and do other things like damage people or award kills and stuff like that."
"The server's the only machine that can create a destroying object and do other things like damage people or award kills and stuff like that."
Photo courtesy of Bungie.net

Chris Butcher continues:

"So if I'm the client, and I pull the trigger to throw a grenade, I actually create the grenade in the world. I'll play the animation, I'll play the sound. But the grenade [in terms of actually affecting the world] -- I'm not allowed to create that because that's an action that requires the authority to do so. The server is the only machine that can create a destroying object and do other things like damage people or award kills and stuff like that. So, what happens is that the clients are sending requests to the server such as, 'I request to throw a grenade here.' The server will say, 'All right I believe you because my knowledge says that you are here in this location and that is consistent with my version of events.' So essentially, there's interplay between the client sending their version of events to the server and the server trying to reconcile them and create the authoritative version of the world. It then sends out the authoritative version to everyone in the world.

So the full sequence of events for throwing a grenade is: I throw a grenade, I see the animation, I hear the sound and then some number of milliseconds later the server will start sending the information for this new object in the world which is the grenade that it created as a result of my actions. So from the user's perspective, you see this grenade appear in mid-air right there and we have all this trickery and prediction to make it look like these complex interactions. Interactions like boarding someone's vehicle essentially are made up of five or six different messages from the server. You're on the vehicle, you start the animation to board him, he gets kicked out, and he's here in the world. All the interaction between those events is sent as a separate message from the server."
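A minimal Python sketch of that request-and-reconcile step, with loud assumptions: `AuthoritativeServer`, the distance tolerance and the consistency rule are all hypothetical, since the interview does not say how the server decides whether a claim matches its version of events. The sketch accepts a grenade request only if the claimed position is close to where the server believes the player is, and only the server ever adds the object to the world.

```python
import math

class AuthoritativeServer:
    def __init__(self, tolerance=2.0):
        self.player_positions = {}   # server's own view of where players are
        self.objects = {}            # object_id -> object record
        self.next_object_id = 1
        self.tolerance = tolerance   # hypothetical consistency threshold

    def request_throw_grenade(self, player_id, claimed_position):
        """Return the new object's id if the claim is believed, else None."""
        known = self.player_positions.get(player_id)
        if known is None or math.dist(known, claimed_position) > self.tolerance:
            return None              # claim conflicts with the server's version
        object_id = self.next_object_id
        self.next_object_id += 1
        self.objects[object_id] = {
            "type": "grenade",
            "position": claimed_position,
            "owner": player_id,
        }
        return object_id             # would be replicated to every client

server = AuthoritativeServer()
server.player_positions["p1"] = (0.0, 0.0)
accepted = server.request_throw_grenade("p1", (1.0, 0.0))
rejected = server.request_throw_grenade("p1", (50.0, 0.0))
```

The client can still play its throw animation and sound immediately; what it waits for is the server's object showing up.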

Everything in its Right Place

"Then the client can use that information to predict for maybe the next 300 milliseconds what that's going to look like before the guy actually gets there. "
"Then the client can use that information to predict for maybe the next 300 milliseconds what that's going to look like before the guy actually gets there. "
Photo courtesy of Bungie.net

Chris Butcher continues:

"The last piece of the puzzle is as the client you're trying to present a consistent view of the world to the player. What the server is sending you is maybe it's sending you four updates a second for this player that is running and shooting near you. So basically four times a second you're getting, "This is where that player is, this is where he's looking and this is what he's doing in the world." So if you were to just present that the way it is sent, the client would see a lot of jerky, stuttering behavior because it's not necessarily a smooth packet that's coming from the server.

So we have code that manages the objects in the predicted client world. Basically it tries to smooth out the appearance of what's happening to the players in the world. For example, rather than just sending the player 'He's here, he's here, he's here,' the server sends, 'He's here and this is what he's doing and this is the direction he's heading.' Then the client can use that information to predict for maybe the next 300 milliseconds what that's going to look like before the guy actually gets there. So rather than seeing a guy go jerk, jerk, jerk, jerk, I see him here and he's running here and he's running here.

Once you've predicted the behavior of the guy you basically apply smoothing to smooth out those differences in the transmission of data so that you see the guy running. Maybe the range of his motion isn't completely consistent but we don't jerk him from place to place. We speed him up or slow him down based on where we think he needs to be at that point in time. And of course you do the same for the vehicles, the physical objects in the world and everything like that.
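The predict-then-smooth idea can be sketched in a few lines of Python. This is a generic dead-reckoning illustration, not Bungie's algorithm: `extrapolate`, `smooth_toward` and the `max_step` limit are hypothetical, and the numbers are chosen so that one update arrives every quarter second, matching the "four updates a second" figure above.

```python
def extrapolate(position, velocity, seconds_ahead):
    """Guess where the player will be, given the last reported heading."""
    return tuple(p + v * seconds_ahead for p, v in zip(position, velocity))

def smooth_toward(shown, target, max_step):
    """Move the displayed position toward the target by at most max_step units.

    This speeds the character up or slows him down instead of teleporting him.
    """
    delta = [t - s for s, t in zip(shown, target)]
    distance = sum(d * d for d in delta) ** 0.5
    if distance <= max_step:
        return tuple(target)
    scale = max_step / distance
    return tuple(s + d * scale for s, d in zip(shown, delta))

# Server update: "He's here, and this is the direction he's heading."
update_position = (10.0, 0.0)
update_velocity = (4.0, 0.0)     # units per second

# Predict a quarter second ahead, i.e. until the next update is due.
predicted = extrapolate(update_position, update_velocity, 0.25)

# On screen, close only part of the gap this frame.
shown = smooth_toward((10.0, 0.0), predicted, max_step=0.5)
```

Calling `smooth_toward` once per rendered frame lets the displayed character glide toward the predicted spot rather than jump to it.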

So the problem essentially... is that the server has got to generate this stream of information for clients. And how does it do that? Well, the way we do that is there are two different types of things that the server can send: there is the persistent state of objects, and there are the events that take place.

So imagine the persistent state of this jersey on the table -- that it's in this position [Chris moves the jersey] or it's in that position, right? So for every object in the world, the server is tracking information about what has changed about this object and which machine this information is being sent from.

For example, if I throw a grenade here, and there are a whole lot of objects on the ground in that location and they go flying, they will eventually settle down in new places. Those objects will be marked. Their new position needs to be sent to everybody in the world because their position has been changed.

But people who are a long way away don't care about those objects very much because maybe they're in a battle somewhere. But when they come over to that location they will eventually want those objects to be in the right location. So what that means is that those objects are low priority because they're a long way away. But they are marked because they do need to be transmitted eventually. So over time, the priority of those objects will rise and rise until eventually they pass the threshold to be sent."
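One way to picture a priority that "rises and rises" is a base value plus a bonus for time spent waiting. The Python sketch below is an assumption from top to bottom: the linear growth, the rate of 5 per second, the threshold of 50 and the function names are invented for illustration, and only the general behavior comes from the interview.

```python
SEND_THRESHOLD = 50.0   # hypothetical cutoff for making it into a packet

def effective_priority(base_priority, seconds_since_sent, growth_per_second=5.0):
    """Base priority plus a bonus that grows while the object waits to be sent."""
    return base_priority + growth_per_second * seconds_since_sent

def seconds_until_sent(base_priority, growth_per_second=5.0):
    """Whole seconds a changed object waits before crossing the threshold."""
    seconds = 0
    while effective_priority(base_priority, seconds, growth_per_second) < SEND_THRESHOLD:
        seconds += 1
    return seconds

distant_crate_wait = seconds_until_sent(10.0)   # far away, low base priority
nearby_grenade_wait = seconds_until_sent(60.0)  # already above the threshold
```

With these made-up numbers the distant crate is sent after eight seconds, while the nearby grenade goes out immediately.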

Getting your Priorities Straight

Chris Butcher continues:

"Basically, the system remembers what it sends to people. It knows that I have 5,000 pieces of information that I would like you to have, but I can only send you maybe 50 in this packet because the network only allows for certain size packet, transmitted a certain rate in order to not congest the network.

So it figures out what the most important things are based on where you are, what you're doing, whether you're alive or dead, whether you're shooting at someone. You know, if I'm shooting at someone and they're in front of me, I need to know about them at a very high priority. But if there's somebody behind me that I can't see, I don't need to know about them. The server determines the priority of objects. There are a lot of rules for things. Grenades have a priority between 50 and 70, but just a little object lying on the ground maybe isn't much of a priority -- between 10 and 20, you know, or something like that. There are actually cases in which a low priority object wouldn't get rendered at all on the client side.

There are basically two types of data that get prioritized. There is the persistent state of objects, which will always be sent eventually. It might take a long time, and the reason for that is if there is an object moving continuously with a low priority then you don't want to send information about that object until you've seen it. You might want an update every 10 seconds to say at a low priority, 'Here's where it is.' The priority system manages the priority of the objects, the time since each was last transmitted and how much information is needed to transmit that data.
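Filling a packet from a long wish list can be sketched as a greedy pick by priority under a size budget. This Python example is a simplification under stated assumptions: `fill_packet`, the byte sizes, the 100-byte budget and the specific priority numbers are hypothetical, and the real system also weighs time since last transmission.

```python
def fill_packet(candidates, budget_bytes):
    """Greedy fill: highest priority first, skipping anything that no longer fits."""
    chosen, used = [], 0
    for item in sorted(candidates, key=lambda c: c["priority"], reverse=True):
        if used + item["size"] <= budget_bytes:
            chosen.append(item["name"])
            used += item["size"]
    return chosen, used

candidates = [
    {"name": "enemy_in_view", "priority": 90, "size": 40},
    {"name": "grenade",       "priority": 60, "size": 30},
    {"name": "enemy_behind",  "priority": 20, "size": 40},
    {"name": "crate",         "priority": 15, "size": 30},
]

chosen, used = fill_packet(candidates, budget_bytes=100)
```

Anything left out of this packet stays marked and competes again for the next one.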

There is persistent state and there is also a stream of events that are taking place in the world, like a bullet hit a wall or some guy said 'urg' because he was killed. These events are the information we don't need to send. If it becomes necessary to drop events, we will. So for instance, if I blew up a million grenades in a multiplayer match, many of those grenades would just fall off the end because they couldn't go in the pipeline fast enough. They would be discarded as irrelevant. So there is this balance going on between things you have to have and things that are going on in the world that give you a better picture, but are not necessary.
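The difference between must-send state and droppable events can be sketched as a bounded queue that discards overflow instead of stalling. In this Python illustration `EventPipeline` and its capacity are hypothetical, and the choice to drop the newest event when the queue is full is an assumption, since the interview does not say which events are discarded first.

```python
from collections import deque

class EventPipeline:
    """Bounded queue for cosmetic events; overflow is dropped, never retried."""

    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity
        self.dropped = 0

    def push(self, event):
        if len(self.queue) >= self.capacity:
            self.dropped += 1        # the event simply "falls off the end"
            return False
        self.queue.append(event)
        return True

    def drain(self, count):
        """Take up to `count` events for the next outgoing packet."""
        sent = []
        while self.queue and len(sent) < count:
            sent.append(self.queue.popleft())
        return sent

pipeline = EventPipeline(capacity=4)
for i in range(10):
    pipeline.push(f"grenade_explosion_{i}")
sent = pipeline.drain(2)
```

Persistent state would never go through a queue like this, because it has to arrive eventually.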

This separation of data means we can support all kinds of interesting things. For example, because every client is being sent the persistent state of the entities in the world (the entities are what we call the persistent objects in the world), that means that any client has all the information about the world. Because the client knows everything about the world, we can support making them be a new server, if the server machine was to turn off, get disconnected from Xbox Live or the player just wants to leave the game. What happens is that the game pauses, a new server is determined, control is shifted and the game continues. It works in a LAN game too. So you can finish a game with a completely different set of players than when you started, because we have the technology to replicate that information and reassign a new server.
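Because every client already holds the persistent state, handing over the server role mostly comes down to choosing a successor. The Python sketch below is hypothetical throughout: `migrate_host` and its rule of picking the first remaining player id in sorted order are invented for illustration, and the interview does not say how the new server is actually determined.

```python
def migrate_host(client_worlds, departed):
    """Pick a replacement server from the clients that are still connected.

    client_worlds maps player id -> that client's copy of the persistent state.
    Returns (new_server_id, world_to_continue_from), or (None, None) if nobody is left.
    """
    remaining = sorted(pid for pid in client_worlds if pid != departed)
    if not remaining:
        return None, None
    new_server = remaining[0]        # hypothetical selection rule
    return new_server, client_worlds[new_server]

world = {"crate_7": (3, 4), "warthog_1": (20, 8)}
client_worlds = {"host": world, "p2": dict(world), "p3": dict(world)}

# The host leaves: the game pauses, a new server is chosen, play continues.
new_server, authoritative_world = migrate_host(client_worlds, departed="host")
```

The promoted client continues from its own replicated copy of the world, which is why the game can resume after a pause rather than restart.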

The potential of Xbox Live could be really exciting. There are all kinds of interesting social ramifications to taking this game online. There's so much more to it than even what we've talked about. I'm not the best person to talk to about all of the social aspects of designing a user interface or directing players to games, but I can definitely talk about the network side of things."

You certainly can, Chris, you certainly can.