We fulfill power fantasies

RSS Feed

MUTA devlog 4: solo dev, client entity system, master server rewrite, world database server...

25.6.2019 18:05:54

It's been a while since the last one of these, but now as we're celebrating midsummer and nightless nights here in Finland, I think its time for me to sit down and write a bit of a catch-up post about what's going on with MUTA, the Multi-User Timewasting Activity.

Officially developing an MMORPG solo

To recap, MUTA is a free and open source MMOPRG project started in 2017 as a student project. The idea for it came from myself and another Kajaani UAS student, Lommi, as we had both been talking of writing an MMO for a while. In fact, wanting to write an MMO was the primary reason I personally applied to the school's game development programme.

We spent two full project courses, each 2-3 months long, on writing the game. During that time we got much assistance from many different people. Game art students created art, and programming students helped us with some great tools and certain parts of the engine, and production students helped us get organized.


MUTA after 2 months of development, during a test with around 20 people online. The 64x64 art at this point was created by three Singaporean summer school students.

After (and between) said courses, it was just me and Lommi working on the game. I'm more of an engine guy, he's more of a gameplay guy, though neither of us is one thing exclusively. While it initially only took us two and a half months to get an OK-looking demo game running, writing an MMO engine and toolset properly takes a lot of time. And while such tools are in development, content creation is difficult, if not impossible. The lack of visible progress during a technology development phase such as this, I think, is problematic for people who are primarily driven by gameplay and visuals, and so after a while, I was mostly working on the game alone, always telling my friend I would try to get the engine and tools ready for content-development ASAP.

Last month, Lommi finally told me he felt the project was too big for him to work on it on the side at this point (he's also employed full-time and has been for quite a while). That leaves me as the only official developer of MUTA. But that's not such a big change after all: likely more than 90% of the current codebase was already written by me alone at this point, as Lommi has not really been involved during the last year or more.

In fact, it's a bit of a relief for me. I no longer need to worry about getting the game ready for others to develop content for it. I'm working a full-time job and try to spend the time I can on MUTA, but often time is simply hard to come by.

With this organizational shift in mind, I have some more changes coming up. I'm planning on reworking the theme of the game and possibly making the art myself. I'm a terrible artist, so it will probably come to simple indie pixel crap. But that's alright, the gameplay is the important part. As for the theme, I'm planning on simple high fantasy, due to the fact I'm not good enough of a visual artist to present a universe anything like what we originally planned for - the original idea was a sword and sorcery -type world inspired by the works of Robert E. Howard. I'm of the opinion that the theme should support the gameplay and not the other way around. However, I don't want to see fireballs flying all over in the style of Warcraft either - it's gonna be something lower key than that.


Character "art" from my first-ever game project, Isogen.

For now, MUTA remains a hobby project I try to pour much of my free time into. Time will tell what it actually evolves into, but I've got high hopes that one day it will be a real, online MMORPG. If not that, at least the code will be available for anyone to inspect.

Code changes

Phew, it's been many months since I last wrote about MUTA, so a lot of things have changed in the codebase, and some of them I don't even remember anymore. Some of the changes include (in a semi-chronological order):

       
  • Reworking of the immediate mode GUI into a standalone library.
  •    
  • Shared code cleanup (mostly just renaming things and organizing them into files)
  •    
  • Moving to MojoAL from OpenAL Soft on at least GNU/Linux. I don't know how great of an idea this is, but MojoAL is easy to embed into the project, having only two source files as opposed to OpenAL Soft's CMake hell.
  •    
  • New entity system for the client.
  •    
  • Rewriting the client's rendering.
  •    
  • Completely rewriting the master server.
  •    
  • Writing a world database server.
  •    
  • Writing a new async database API.
  •    
  • packetwriter2 tool for network message serialization.
  •    
  • Shared API and authentication for server-server connections (svchan_server and svchan_client).
  •    
  • New generic hashtable written as a separate library.

Client entity system

The entity system on the client needed a rework. It was something of an entity-component system (and I know how pretentious that term is) and remains so. This job had two distinct motivations:

       
  • Making the code clearer
  •    
  • Performance
I feel like both goals were achieved. First of all, the code needed breaking into more source files as previously the whole world code was in a single file (I feel this sort of isolation is more future-proof for this project), but second, I really wanted less weird macros and more flexibility in component implementation. To recap the new system:
       
  • An entity has a fixed-size array of components.
  •    
  • Each component has an enumerator it's referred with. The enum is an    index to an entity's component array.
  •    
  • Components in an entity generally point to a handler structure. The    component might have an iterable element in a tight array associated with    it, but this is not visible to the component's user - they access it    through a set of functions.
  •    
  • Components communicate mainly through events (callbacks).

Components are defined by creating an instance of a static-duration component_definition_t struct.

struct component_definition_t {
   int (*system_init)(world_t *world, uint32 num_components);
   void (*system_destroy)(world_t *world);
   void (*system_update)(world_t *world, float dt);
   component_handle_t (*component_attach)(entity_t *entity);
   void (*component_detach)(component_handle_t handle);
   entity_event_interest_t     *entity_event_interests;
   uint32                      num_entity_event_interests;
   component_event_interest_t  *component_event_interests;
   uint32                      num_component_event_interests;
   uint32                      index; /* Autofilled, initialize to 0! */
   component_event_callback_t  *component_event_callbacks; /* Autofilled, initialize to 0! */
};

So the component_definition_t structure is really just a set of callbacks. Components are also pooled, but the pools are members of world instances, hence not visible int he above example (the functions just accept a pointer to a world_t, as seen).

Using a component definition, components can be added to an entity and then manipulated.

component_handle_t entity_attach_component(entity_t *entity,
   component_definition_t *component_definition);

The component handle returned by entity_attach_component can be used to access the component. It could be laid out in memory in various ways - the API does not set restrictions on this, except that the handle must be a constant address pointer until destruction.

void mobility_component_set_speed(component_handle_t handle, float speed);

Component event callbacks are attached to the component definitions rather than individual components. This does away with some flexibility, but saves memory and likely performs better in the average case, since in MUTA, certain sets of components in a single entity type are very common (creatures have a certain set of components, players another, etc.) The callbacks get called mostly immediately when a component fires an event. An example use case of events would be animations: when the mobility component fires a "start move" event, the event can trigger the animation component to start playing a different animation.

Rewriting the client's rendering

This one's a pretty simple one. Tile depth sorting was moved to the GPU, and with the new entity system, entity rendering was also changed.

Previously, the world rendering system walked through each rendering component in the world every frame, looked up the entity's position from a separate memory address, then decided whether to cull it or not, and so on. In the new system, positions are cached in more CPU cache-friendly structures. For example, if a entity moves, an event is fired to it's rendering component, and the rendering component logic culls the entity and caches its position in an array of render commands. The array of render commands is iterated through every frame to draw all visible entities - render commands contain all the necessary data to place the entity's sprites properly on the screen.

Master server rewrite

The master server is the authoritative part of a single MUTA shard/world. It knows all the entities in the world and generates unique IDs for everything. Multiple simulation servers connect to the master server, each one of them simulating different parts of the world map.

While writing MUTA's proxy server (of which I also wrote my bachelor thesis), I feel like I finally "got" how I want to do multithreading with servers: handle all state on one thread, have other threads post events to that thread. The event loop works with a wait function akin to poll. Basically, an event-based approach.

Since then, I've been wanting to rewrite the rest of MUTA's server applications to use a similar architecture. To explain a little, the below table displays the programs that make up the server side software.

                                                                                                                                                                                                                                                       
ProgramDirectory name in repoEvent-based?
MasterserverNo
Simulation ServerworlddNo
Login Serverlogin-serverYes
Proxy ServerproxyYes
Old database serverdb-serverNo
World database serverworld_db (server_rewrite branch)Yes

Don't worry about the discrepansies in subproject naming conventions, I do have a plan for them now, believe it or not. It's just that the plan keeps on changing...

For the master server, an architecture change means that the main loop will no longer only run at a fixed rate: it will also be able to respond to events immediately, using blocking event queues. This is achieved with a structure akin to the pseudo-code example below.

int target_delta    = 17; /* Milliseconds */
int last_tick       = time_now();
int time_to_wait    = target_delta;
for (;;) {
   event_t events[64];
   int num_events = event_wait(events, 64, time_to_wait);
   for (int i = 0; i < num_events; ++i)
       _handle_event(&events[i]);
   int delta_time = time_now() - last_tick;
   if (delta_time < target_delta) {
       time_to_wait = target_delta - delta_time;
       continue;
   }
   update(delta_time);
}

Changing the architecture has meant a rather large amount of refactoring, as it affects nearly all systems on the master server. Since this has largely meant a complete rewrite, I have taken to also rewriting some of the systems into a mold I feel is better suited for the future. For example, the world/entity system that's used to control player characters, creatures and other game objects, is being completely written from scratch in the server_rewrite Git branch. The world API contains functions such as the ones below.

... uint32 world_spawn_player(uint32 instance_id, client_t *client,
   player_guid_t id, const char *name, int race, int sex, int position[3], int direction);
void world_despawn_player(uint32 player_index);
int world_player_find_path(uint32 player_index, int x, int y, int z); ...
Calls such as the ones above are asynchronous in nature, as they involve the simulation server instances connected to the master server. Hence, I've been thinking of reworking them in such a way that they would accept a callback, "on_request_finished" (or whatever). That would be alright for code clarity, but then, that would involve some memory overhead. The alternative is to handle finished requests inside the world API itself, meaning it will have to call back to some other API's that called it. You know, I'm constantly pondering where the line of abstraction should lie: tight coupling isn't pretty, but abstraction often comes at a great programmatic resource cost. In the above case, there's little reason to create a datastructure for saving callbacks and their user data (void pointers) if there's really only one logical path the code can take when a response arrives. I try not to fall down the trap of "OOP" and "design patterns" just for the sake of such silly things, but at the same time, sometimes I have an engineer's urge to overengineer things. Usually I end up with the more practical, less abstraction-based approach. After all, I know every dark corner of my own program, or so I at the very least believe.

The rewrite has taken about two months now and I think it will still take some more time, partly because at the same time I must make changes to other programs in the server side stack as well. At the same time, new programs are coming in, such as the world database server. It will be interesting to see how things will work out when the server starts up again for the first time... Well, maybe frustrating is a more appropriate word.

World database server

The world database server is a new introduction to the server side stack. Previously, MUTA had a "db-server" application, but there was no separation between individual world/shard databases and account databases - now, that separation is coming.

The WDB is an intermediate program in between the MySQL server and the MUTA master server. It's sole purpose is to serve requests by master servers through a binary protocol while caching relevant results. The intention is that this is the de-facto way to access a single shard's database.

There's an asynchoronous API associated with the WDB. It's consists of a set of functions, each one of which performs a specific query. The query functions also take in callbacks as parameters - the callbacks are called when the query completes or fails.

wdbc_query_id_t wdbc_query_player_character_list(
   void (*on_complete)(wdbc_query_id_t query_id, void *user_data, int error,
       wdbc_player_character_t *characters, uint32 num_characters),
   void *user_data, account_guid_t account_id);

Not much else to say about it right now... It's event-based like the rest of the newer applications on the server side. Will keep working on it!

packetwriter2

Back when the MUTA project was started, Lommi wrote an application called MUTA_Packetwriter for network packet serialization. It compiles C code, structs and serialization functions, from a simple file format where the fields of each network packet are defined.

Lommi's tool has saved us countless of hours of writing arbitrary code and debugging it, but now that he is no longer working on the project, and there are many new packets making their way into the protocol with new applications such as the world database coming, I've deemed it necessary to write a new version of this program.

packetwriter2 will use a new, simpler file format, the ".def" format used in many of MUTA's data files. It will support features I've wanted for a long time, such as arrays of structs and nested structs. I've started writing the parser, and below is an example of the file format.

include: types.h
include: common_defs.h

group: twdbmsg
   first_opcode = 0

group: fwdbmsg
   first_opcode = 0

struct: wdbmsg_player_character_t
   query_id    = uint32
   id          = uint64
   name        = int8{MIN_CHARACTER_NAME_LEN, MAX_CHARACTER_NAME_LEN}
   race        = int (0 - 255)
   sex         = int (0 - 1)
   instance_id = uint32
   x           = int32
   y           = int32
   z           = int8

packet: fwdbmsg_reply_query_player_character_list
   __group     = fwdbmsg
   query_id    = uint32
   characters  = wdbmsg_player_character_t{MAX_CHARACTERS_PER_ACC}

Along the way, I think the encryption scheme needs a rework, too. Not the basic algorithms behind it (MUTA uses libsodium for that), but the fact that currently, messages are encrypted on a per-message-type basis. Being able to turn encryption on and off in the stream would save bandwith and improve performance, as multiple messages could be encrypted in a single set.

Keepin' busy

Honestly, I've been having a tough time scraping up enough time to work on MUTA after starting my current job. Also, turns out motivating myself is difficult if I don't constantly have something to prove to someone (you know, like progress reports to people you know in real life and stuff). Guess I should go and get one of those productivity self-help books soon.

A generic hashtable in C

2.3.2019 23:58:15

When ever I've implemented hashtables in C in the past, the way I've made them "generic" has been to create macros that declare new types and sets of functions for the tables wanted. For example, taken from the MUTA source code:

#define DYNAMIC_HASH_TABLE_DEFINITION(table_name, value_type, key_type, \
   hash_type, hash_func, bucket_sz) \
   ... \

...

DYNAMIC_HASH_TABLE_DEFINITION(str_int_table, int, char *, uint32,
   fnv_hash32_from_str, 4);

The above macro would declare a new type, str_int_table_t, and functions such as str_int_table_init(), str_int_table_insert() and str_int_table_erase().

With such macros, it's a bother to have to call the macro before using any new type of a hashtable. However, for example the stb library implements easier-to-use, generic dynamic arrays with macro trickery, so why not try the same with hashtables?

Turns out working with a more complex structure like a hashtable takes a little more work than dynamic arrays. We cannot automatically create versions of a single function for different data types in C like we could with C++ templates, so the main problem is passing compile-time data such as struct member types and sizes to functions. And we do want to use functions because we will need complex statements such as loops - such statements do not (very cleanly at least) fit inside plain macros.

The above paragraph might not have made immediate sense, so I'll attempt to clarify my point in practice. Below is the new hashtable declaration macro.

#define hashtable(key_type, value_type) \
   struct { \
       struct { \
           struct { \
               key_type    key; \
               value_type  value; \
               size_t      hash; \
               uint8_t     reserved; \
           } items[HASHTABLE_BUCKET_SIZE]; \
       } *buckets; \
       size_t num_buckets; \
       size_t num_values; \
   }

Below is how it's used.

/* Declare a hashtable called 'table' whose key type is char * and value type
* is int. */
hashtable(char *, int) table;

As may be observed, the type of the resulting struct is anonymous, meaning it cannot even be passed to a struct as a void * and then cast back to its original type. So how do we do what we need to do? Below is the hashtable_insert_ext() macro and the signature of the _hashtable_insert() function the macro calls.

#define hashtable_insert_ext(table_, pair_, compare_keys_func_, \
   copy_key_func_, ret_err_) \
   (table_.buckets = _hashtable_insert((ret_err_), \
       (uint8_t*)table_.buckets, &table_.num_buckets, &table_.num_values,        \
       sizeof(*table_.buckets), sizeof(table_.buckets[0].items[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0], \
           &table_.buckets[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0].key, \
           &table_.buckets[0].items[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0].value, \
           &table_.buckets[0].items[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0].hash, \
           &table_.buckets[0].items[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0].reserved, \
           &table_.buckets[0].items[0]), \
       &pair_.key, sizeof(pair_.key), pair_.hash, &pair_.value, \
       sizeof(pair_.value), compare_keys_func_, copy_key_func_))

void *_hashtable_insert(int *ret_err, uint8_t *buckets, size_t *num_buckets,
   size_t *num_values, size_t bucket_size, size_t item_size,
   size_t items_offset, size_t key_off1, size_t value_off1, size_t    hash_off1,
   size_t reserved_off1, void *key, size_t key_size, size_t hash, void    *value,
   size_t value_size,
   int (*compare_keys)(const void *a, const void *b, size_t size),
   int (*copy_key)(void *dst, const void *src, size_t size));

Yep, instead of passing in a struct to _hashtable_insert(), we pass pointers to each of the struct's relevant members. Sizes of some data types are also passed, and finally, struct offsets using the macro _hashtable_ptr_offset(). That allows us to modify everything at the byte level, which we do inside _hashtable_insert(). As a side note, the function callbacks passed are there to allow for dynamic types such as copyable strings.

One may wonder if this is an efficient way to do things. I would say it probably isn't, not in terms of runtime efficiency anyway. But in terms of programmer efficiency, I think it might be the right choice until I come across a better solution. That's because with the this implementation, doing things such as the following is super easy compared to what I used to have to do.

hashtable(uint32_t, int) table;
hashtable_init(table, 8, 0);

struct {
   uint32_t    key;
   int         value;
   size_t      hash;
} insert_data = {123, 5, hash_u32(123)};
hashtable_insert(table, insert_data);

struct {
   uint32_t    key;
   size_t      hash;
} find_data = {123, hash_u32(123)};
int *value = hashtable_find(table, find_data);
if (value) {
   ...
}

Okay, maybe it isn't the most intuitive API ever, but I like it far more than having to write type/function declaration macros like I used to. And for my hashtable use-cases, I'm willing to pay for the potential runtime-overhead.

Also, I didn't actually have a hashtable that stored the actual keys in MUTA before. While not storing keys is fine for assets and similar items to save on memory, for runtime-generated ID's a table like this will be required.

I've been working on this over the weekend and it likely still has some bugs in it, but in case you're curious, the Git repository can be found here. I've also uploaded the current header and source files here and here, respectively. I'll keep on working on it and start using it in MUTA very soon.

Edit 3.3.2019
After a night of thinking I decided to the change the API so that it would not require special structs created by the user. The reason I used them originally was an attempt to force certain type-safety, namely preventing accidental pointer key casting to void * (since that can happen implicitly in C). An example of such a mistake would be when you try to pass in the address of a char *, but accidentally pass in the pointer's value rather than it's address.

char *key = ...;
hashtable_insert(
   table,
   key, // Should be &key but no error is generated due to implicit cast to void * inside the macro
   &value,
   NULL );

I concluded that similar "type-safety" could be achieved by having the user pass references to variables directly instead of their addresses. The macros will instead fetch the addresses and we should get a compilation error when certain mistakes happen. For example, if the user passes in a key as '&key', the macro would format it to '&&key', which would result in an error. The API change should reduce the amount of lines required to achieve the same result, and I think could also prove out to be more clear. Below is an example of the new API in use.

hashtable(uint64_t, uint32_t) table;
...
uint64_t    key     = 12345;
uint32_t    value   = 54321;
size_t      hash    = hashtable_hash(&key, sizeof(key));
hashtable_insert(table, key, hash, value, NULL);

I've also uploaded the newer source files here and here.

Sorting isometric sprites on the GPU

27.2.2019 15:05:30

I've been reworking much of the MUTA client for the first two months of 2019. One of the reworks has been to rendering, and amongst other changes to it, rendering now uses hardware sorting for sprites to draw 2D tiles from back to front.

Anyone who's put a little thought into isometric 2D games knows the draw order of tiles and objects is a little bit tricky. I love the isometric look and have worked on a bunch of isometric 2D game prototypes, but in all of them before this, I've done sprite sorting on the CPU. Not so this time.

Starting out, as any modern developer would, I searched the internet for the formula to correctly calculate each tile's depth based on it's position in the visible part of the map. Surprisingly, I didn't find it. Guess isometric 2D games aren't so hot anymore. Anyway, that's kind of the reason I'm making this post.

Here's a descriptive picture of the hypothetical area we are rendering.

I first attempted to calculate the depth based on the tile's distance from the camera.

int area_width  = 3;
int area_height = 2;

for (int i = 0; i < area_width; ++i)
   for (int j = 0; j < area_width; ++j)
       for (int k = 0; k < area_height; ++k)
           float z = (float)(i + j + k) / (float)(area_width + area_width + area_height);

This approach works fine if all objects are of the same height, that is, the height of a single tile. But we run into problems when that isn't the case.

Above, the character is being clipped by tiles that are higher than him on the map when the tiles should actually be rendering behind him.

A different forumula does the job better.

int area_width  = 3;
int area_height = 2;

for (int i = 0; i < area_width; ++i)
   for (int j = 0; j < area_width; ++j)
       for (int k = 0; k < area_height; ++k) {
           float row_max   = (float)(i + j + 2) / (float)(area_width + area_width);
           float row_base  = (float)(i + j) / (float)(area_width + area_width);
           float row_diff  = row_max - row_base;
           float z         = row_base + row_diff * ((float)k / (float)area_height);
       }

And the result looks like follows.

The idea is that since the z value must be between zero and 1, we need a maximum value. The render order we need is that the closer to the camera a "pillar" of tiles is (a pillar being a stack area_height of tiles at an x-y coordinate), the later it gets rendered. How ever, we also need to have depth values for the individual tiles that constitute a pillar so that the higher a tile is, the later it gets rendered. So, we have to find the depth value of the highest tile in the pillar and gradually increase each tile's depth value, coming from bottom to top, until we reach that maximum.

Anyway, thought I'd throw that out there.

Own your tech: on the Unity-Improbable incident

15.1.2019 07:32:31

In the past week there's been a debacle between the game technology companies Improbable and Unity Technologies. Improbable, as you probably know, produces a set of software called SpatialOS, apparently intended to make the creation of scalable massively multiplayer online worlds easier. The software is aimed at the users of the major contemporary game engines and, if I'm not mistaken, also has support for custom engines.

As I understand it, the way Unity came in the way of Improbable was by changing their Terms of Service. Currently, it seems that Unity is planning on its own cloud-gaming service, and after failed negotiations with Improbable, decided to cut out the competition via legal means.

There's been a lot of talk about how this change will affect the companies already using Improbable's technology with Unity, many of which are small independent companies. Epic has claimedly used the incident to its advantage, funding the transition of Unity games to Unreal Engine by millions of dollars. It is still unclear to me whether or not existing games or games in development will be allowed to operate with Unity and SpatialOS, but if not, the effects will be devastating for the smaller companies.

The main point to take home from this? In my view, it is that you should always own your technology, or use free and open source software. The IT industry has this obsession today with "saving" development time by buying all the technological parts of products from different sources. But not only does this lead to poorer software due to the commercial parts not being specialized for the types of products being made, but as this incident shows, using    proprietary third party parts can lead to serious legal issues.

While I retain my opinion that all of the best games ever made were written using custom engines tailored specifically for them, if I was to use an existing game engine, I would use a free and open source one, such as Godot, which has been making great strides in the past few years and gained quite a bit of popularity. It takes tech of your own or an open license to ensure no catastrophy such as this one caused by greedy companies fighting each other can ever happen to your game.

Hello proxy

24.11.2018 16:17:03

This just in: MUTA shards can now only be connected to through the MUTA proxy server (still in the login-branch of the Git repository). Pictured is the first play session through a proxy, the server applications running in the terminals above.

Yep, still the same old unfinished developer graphics though.

Buy nothing day

23.11.2018 07:21:24

It's that time of the year again: the year's biggest celebration of consumerism, Christmas, is looming over. But even here in Finland it's not just Christmas anymore that makes people go and mindlessly buy things they don't need; no, we need to start the wasteful buying of things early before December. The American phenomenon of Black Friday has made it's way over here, machined by the merchants who know well how to exploit the generations that grew up on the internet, learning everything about all the cool American "traditions".

If you're about to take part in the consumption mania of today, I would ask you to consider the following: make it a buy nothing day instead. At the very least, ask yourself the following questions.

  • Do you really need the thing you thought of buying today? Would you buy it even if it wasn't on sale? If you have an item similar to what you're about to buy, still in good condition, do you really need a new one this soon?
  • Consider what you might do with the money saved if you didn't spontaneously spend it on things just because they were on sale for one day (or in the case of Finland, for a whole week!)
  • Think about what the impact of overconsuming things such as electronics is to the environment.

Happy buy nothing day.

Event-based server architecture

22.11.2018 12:10:49

What's the right way to do multithreading? There's not one single answer to this as every program has it's specific needs and purposes. But on a per-program basis, one approach will probably be better than another.

Game developers not working on high-end engines rarely have to worry much about multithreading. Even if a game is multithreaded, at the core most games' logic is single-threaded.

For instance, if your game has a dedicated rendering thread, your previous frame might be getting rendered while you simulate the current one on the main thread. But what you're doing is you're just replicating the relevant data from the main thread for use by the render thread, thus not affecting the actual game logic, which remains single-threaded.

For another example, your physics system might parallelize a for-loop to multiple threads, all of which are synchronized at the end of the loop. This also does not change your game's logic from single- to multithreaded, you're just iterating some loops faster by parallelizing them.

There's a reason the above two examples are common in games, and it is that they are easy to program for, while getting some of the benefits of dividing work to multiple threads. When all of the real logic happens on a single thread of execution, the control flow is predictable and easy to understand. It might also be more efficient than a truly multithreaded alternative, in case the multithreading is done badly, or the algorithm is not suitable for threading.

Networking and threads

While you can well write a game using just a single thread, writing network code without threads is more difficult. This is because in network programming many blocking calls are used, such as poll() or recvfrom(). Usually, we want a thread to sit there and wait until network messages or new connections arrive. But a sleeping thread cannot do much else, so the other logic must happen on another thread.

It is usually possible to give blocking functions a timeout, so that even if they have no new events to report to us, they will return after a given amount of time. In this way, one can imagine a game-loop-like server architecture that polls for messages and new connections, all the while also executing all of the logic on the same thread as well. With a timed wait, we could have the loop run at approximately, say, 60 FPS.

But servers that want scalability probably don't run on a single thread. I'm no expert of course, so take this with a grain of salt, but so my investigations on the internet suggest. It seems common to have threads dedicated simply to polling for new network events.

State makes things difficult

If the only job of one of our threads, the network thread, is to sleep on poll(), epoll() or similar call and wait for new incoming messages and connections, the logic of the program is handled somewhere else. Instead of interpreting incoming data right there and then on the network thread as it arrives, the interpretation happens on a different thread.

Applications such as game servers use long-lasting connections instead of quickly terminated ones. For long connections, the server has to store some kind of per-client state: the socket of the client, a session id, incomplete messages, game character variables, etc. After acknowledging we have a lot of state, the question becomes: where's the right place to manipulate it, considering our program is multithreaded?

To store the state of our clients we might have a struct called client_t. We also have some kind of a container that contains those structs. Lets say the structs are stored in a pool of a fixed size.

typedef struct client_t
{
   int     socket;
   uint32  session_id;
   uint8   unread_messages[512];
   int     num_bytes_unread;
} client_t;

client_pool_t clients;

/* Returns NULL if no free clients are left in the pool */
client_t *new_client(client_pool_t *pool);
void free_client(client_pool_t *pool, client_t *client);

With the above premise, let's assume our network thread accepts clients and reserves client_t structs for them. If the clients disconnect (recv() returns 0), the network thread frees the client struct.

Now, if all of our connections need to be stored in our global client_pool_t data structure, how do we access it safely from multiple threads? Remember that the network thread does not handle any of the actual (game) logic in the program: that happens elsewhere, and that logic also touches the client structures as it must send messages to the clients or interpret their messages stored in the unread_messages buffer, etc.

The naive answer to safe access is to use a mutex. Wrap the new_client() and free_client() calls to a mutex and you're fine. Now when the network thread accepts a new client, it can safely call new_client() and then somehow signal the main thread that a new client was accepted, passing a pointer to the newly allocated struct along.

There are two problems with this approach: the possible performance concerns, and the (in my opinion) more important topic of code complexity. These problems are somewhat intertwined.

What kind of problems might the above approach lead to? Say that at some point in development you realize that there are situations where, on the main thread, you have to iterate through all of your clients. For that, you might create an array of pointers to the currently used client structs. So now, when you accept a new client and reserve a new client struct from the pool, you have to also push a pointer to the struct to said array.

But now, what might happen is that while the main thread iterates through all of the clients, one of the clients disconnects. In this case the networking thread frees the client struct of the disconnected client. Since we don't want this to happen while we're iterating through the clients, we have to lock the pool's mutex in both cases: while iterating, and while deleting a client.

The possible performance implications of the above are easy understand: the iteration can leave the mutex locked for a while, meaning the networking thread gets blocked until the iteration finishes.

But more importantly, using the above approach we are forced to constantly remind our selves of the fact we can't just go doing anything to our clients on the main thread without remembering the mutex and the possible race conditions. As we add features, the code becomes more and more complicated, and easier to make a mistake at. The above was a simple and somewhat dumb example. For another example, consider what should happen if the networking thread writes recvd() data into a client's unread_messages buffer, and the main thread handles that data. Are you going to use a per-client mutex to protect access to the data buffer while it's getting written to or read from? That's gonna be some more complex code using locks.

To some extent I'm speaking from experience here. When I started writing the MUTA server, I had little clue about how to write a complex server capable of handling a high amount of clients. I'm not sure if I still have the right idea of that, but I know that multithreading was a pain in my ass for a long while, and to some extent it remains so, although I think it's gettin getter.

Handling state on a single thread

So far I've found that the simplest way of dealing with the above woes is to be simple and handle all state changes on on a single thread. No allocations of stateful data structures or otherwise manipulating them from the networking thread or anywhere else. I've referred to this as event-based architecture, but I'm not sure if that's the correct term; all I know is that I've heard the term elsewhere and assumed this was what they meant. Then again, who cares what it's called.

In this model, when a network thread receives a message, it will not handle the message immediately. Instead it will pass it on to another thread as an event. The thread that handles the events (usually the main thread) can sleep on a blocking function from which it wakes when new events are posted from another thread.

Essentially, both the main and network thread will sleep in this model until something relevant to them happens. Of course, you can have more than one thread pushing events to the main thread though, and you can also have the main thread posting events to other threads. MUTA's login server has a database thread with which the main thread communicates in a two-way fashion, posting and receiving events from it.

Current implementation in MUTA

MUTA's server consists of multiple programs now: the login server, the shard server, the world simulation server, the database server and the proxy server. The two newest applications, the login and proxy servers, implement the architecture described in this post.

The sleeping and waking of a thread is implemented using a condition variable that sleeps on a mutex. The same mutex is locked when items are pushed to the event queue.

The event queue is also a blocking queue. It has a fixed size, and if it fills up, meaning the thread pushing events is pushing them faster than the main thread can handle them, the push function will sleep until space becomes free. This is also implemented using a condition variable.

The event data type of the MUTA login server looks as follow at the moment.

enum event_type
{
   EVENT_ACCEPT_CLIENT,
   EVENT_READ_CLIENT,
   EVENT_ACCOUNT_LOGIN_QUERY_FINISHED,
   EVENT_ACCEPT_SHARD,
   EVENT_READ_SHARD,
   EVENT_LOGIN_REQUEST_RESULT
};

... various event structures ...

struct event_t
{
   union
   {
       int                                     type;
       accept_client_event_t                   accept_client;
       read_client_event_t                     read_client;
       account_login_query_finished_event_t    account_login_query_finished;
       accept_shard_event_t                    accept_shard;
       read_shard_event_t                      read_shard;
   };
};

As an example event, we can take the accept_client event, which is fired when the network thread accept()s a new connection. It looks as follows.

struct accept_client_event_t
{
   int         type;
   socket_t    socket;
   addr_t      address;
};

As we can see, the event data type is a union. That's because they all get posted into the same array. But what about events that have to transfer variable amounts of data from thread to thread, such as socket recvs() (that is, packets received from the network)? In such cases, we use a simply mutex-locked allocation pool. Lock the allocator's mutex, allocate data, unlock the mutex. At read, we free the allocated memory back to the allocator in a similar fashion.

The allocator used here is a simple segregate fit -type of allocator. The logic is this: on malloc-requests, round up the requested amount to the nearest power of two. Find out the highest (leftmost) bit of the amount and use it as an index into an internal pool of memory blocks of the given size.

While the allocator calls real malloc when it needs more memory, it has an upper capacity to which it will allocate, since the event queue is blocking. Thus it is not possible to DDOS the server to run out of memory by simply spamming packets at it.

The event API itself looks as follows.

typedef struct event_buf_t event_buf_t;

void event_init(event_buf_t *buf, uint32 item_size, int32 max);
void event_push(event_buf_t *buf, void *evs, int32 num);
int event_wait(event_buf_t *buf, void *evs, int32 max, int timeout_ms);
/* -1 as timeout means infinite */

And, using the API, the login server's main loop looks as follows now.

for (;;)
{
   event_t events[MAX_EVENTS];
   int num_events = event_wait(event_buf, _events, MAX_EVENTS, 500);
   for (int i = 0; i < num_events; ++i)
   {
       event_t *e = &_events[i];
       switch (e->type)
       {
       case EVENT_ACCEPT_CLIENT:
           cl_accept(&e->accept_client);
           break;
       case EVENT_READ_CLIENT:
           cl_read(&e->read_client);
           break;
       case EVENT_ACCOUNT_LOGIN_QUERY_FINISHED:
           cl_finish_login_attempt(&e->account_login_query_finished);
           break;
       case EVENT_ACCEPT_SHARD:
           shards_accept(&e->accept_shard);
           break;
       case EVENT_READ_SHARD:
           shards_read(&e->read_shard);
           break;
       default:
           muta_assert(0);
       }
   }
   cl_check_timeouts();
   shards_flush(); // Flush queued messages to shards
   cl_flush();     // Flush queued messages to clients
}

And so, most logic is handled on a single thread.

Converting other systems to the same approach

The MUTA server's older parts are not implemented using the model discussed in this post. Although they also handle most logic on a single thread based on events generated by other threads, the model they use is not quite so refined. Instead it is kind of an ad-hoc implementation of the same idea, so instead of having a single event buffer, there are many. These buffers are not blocking like in the case of the login server how ever.

The old parts of the server are also frame-based, in that their main thread runs the logic in specified increments of time (FPS, frames per second). With an event queue the thread can sleep on we can remove some of the latency provided by the frame-based approach. To make timers based on delta-time still function correctly, we can simply calculate a sleep-time for the event_wait() function much like we would sleep during a frame if we were waiting for vsync in 3D program.

I'll be looking at converting the old programs to this model at some point in the distant future, unless I come across something better.

Give me proper networking tools on Windows

10.11.2018 21:44:09

Last week I spent multiple days creating a bad, inefficient abstraction over WSAPoll, which itself is a bad, inefficient abstraction of other functions, simply provided for UNIX compatibility by Microsoft.

The thing is, Windows has efficient, scalable methods of managing network IO, namely, IO completion ports. But it's such a pain to use in a cross-platform application. With IOCP, because you only get informed after a recv() on a socket is done, not when you can do one yourself, you must maintain unique buffers on each one of your sockets. And if your application is a cross-platform one, you basically have to make your Linux abstraction work in a similar fashion, because the socket recv() function just cannot be used the same way with both systems. (As a side note, I've built an API for async socket IO that works on both systems, using IOCP on Windows and epoll on Linux. It's called netqueue and it's included in MUTA's source code.

Linux on the other has epoll, one of the comfiest APIs I've every used. I love epoll in terms of how easy to use it is. Add a file descriptor or many to an epoll instance and just sleep on epoll_wait() until something happens, like you receive some data over the network.

I want epoll on Windows! But I can't get it, not without additional dependencies to external libraries or a lot of work.

My saving grace is that our game's servers don't need to run on Windows in release mode. The only reason the server runs on Windows is that sometimes we have to develop on it, and in such cases its handy to have the server run on the same machine with the client. So what I've done is I've wrapped WSAPoll to look a bit like epoll. It's a terrible hack and certainly inefficient, but it'll do for our purposes. Of course, features such as EPOLLET, edge-triggered behaviour, will not work on Windows (in edge-triggered mode, new events for sockets are only notified of once, not until the socket is recv()d from), and in fact even a library I found, attempting to provide epoll for Windows, didn't support said feature.

Still. I want epoll or something similar to it on Windows. Why does Win32 programming have to always be such a pain?

MUTA devlog 4: proxy and login servers and future art direction

2.11.2018 17:48:17

The past few months have granted me little time to work on MUTA, our 2D MMORPG. There have been events to attend (NGS), another trip to India (IGS), and work. But I've done a bit of MMO work nevertheless.

The proxy server

I began working on MUTA's proxy server. It is a server that sits between the client and the game server, dispatching messages in both directions. The purpose of it is to never reveal the true IP address of the game server, making it more resilient towards DDOS attacks. Each proxy could be connected to multiple game servers, and each game server to multiple proxies, so that two people playing on the same server might be connected to completely different IP addresses.

I'm writing the initial implementation using TCP, in C as the rest of the game is written in. Adding another layer of connectivity, especially with TCP, will of course increase latency, so it remains to be seen if we should do the game-server-to-proxy connection via UDP later.

The idea for the proxy came from many sources, but one of them is CipSoft's Matthias Rudy's GDC talk about the infrastrucuture of Tibia. The slides for the talk are here.

The login server

Once I had sat down and begun writing the proxy server, I went as far as accepting clients and reading their packets before I realized I should probably write the login server at this point, too. Currently in MUTA, the game server acts as a temporary login server. It wouldn't make much sense to write login-related functionality code between the proxy and game servers, only to remove it later. So I also started up the login server project.

Event based architecture

On the login server I thought I might try out something that's new for me. I'm not a very experienced network programmer, and hence I haven't had the chance to try out that many different tricks yet.

Dividing work between threads is still an issue I constantly run into when writing server software. Usually, because of my background in game programming, I've simply settled for having a main loop that runs at a certain frame rate, and the other threads feed work to that thread, such as handling incoming accepts or reads. Work is fed using various buffers that store different kinds of events (for example, a client received something in it's buffer, the message should be handled). This is not very elegant or energy-efficent design.

On the login server, I thought I might still have one thread executing the main logic, but instead of locking the thread's frame rate, I'll try having the main thread sleep until it gets work from other threads. I will also try to unify the event buffers so that most events would be inserted into a single queue. Insertion would block the feeding thread if the queue is full. Well, we'll see how that goes.

Art direction

Something thing we've recently (and before, too) been pondering with Lommi is what would be the most practical direction we could take in regards to art in MUTA. It's been a tough subject!

We know the feeling we want the game to go for: the sort of gritty fantasy provided by Robert E. Howard's Conan stories and other sword-and-sorcery works. We also really dig the artstyle of many of the comics made of Conan. In a 2D game that would look awesome. But 2D art in 8 isometric directions is painstaking, and it's complicated by things such as displaying equipment on characters while keeping animations smooth-looking.

We thought of a couple of directions we could follow. One would be to reduce the game's resolution from 64 pixel tiles to 32 or 16 pixels. Such a low resolution would allow even us programmers to make art for the game, albeit low quality art. We didn't really like the idea, as it doesn't let us present the game world as such a gritty, "mature" type of place as we'd like, and it would limit the art style a lot. Besides, the programmers probably shouldn't do the art anyway.

Another thought we had was going 3D. Neither of us two working on the game can do 3D art, but 3D would mean not needing to draw 8 different directions for each object (or at least each character). Then again, 3D would probably present our style in quite a bit of a worse manner than well-made 2D in isometric space.

We didn't decide anything yet, other than that the direction we have sought so far would look the best.

Going to India Game Summit 2018

21.9.2018 21:46:40

As I wrote last month, I travelled to Andhra Pradesh, India during the summer to teach game programming on a summer course over at a university there. At the end of this September (less than a week from now) I have the pleasure of going back to the country, this time as an organizer for the first ever India Game Summit.

The event this year will be a game jam, held at four different universities and colleges across the country simultaneously. How ever, it is in the hopes of organizers (the main one of whom is Kajaani University of Applied Sciences) that the event will grow into a games industry conference in the future, something akin to the Northern Game Summit we have in Finland, with speakers from different companies and organizations sharing their knowledge.

An interesting point about organizing a game jam event in India is that it seems to be a fairly foreign concept, at least at universities (including the capital area). Even the term game jam has been mostly unheard of; hence, we've dubbed the event a hackathon instead - a term familiar to most IT people. Organizers have also been contacted by to-be participants worried if it is even possible to build a game in 2 days. It may seem funny to a regular game jammer, but then again there are monetary prizes on the line here.

The event will begin on 28th September and end on the 30th of the same month. I myself will be travelling to Aditya Engineering College in Andhra Pradesh. I'm quite excited and interested to see how things work out, to be honest.

New website logo

In other news, this website has a fancy new logo. Our friend Laura made it. Big thanks to her!

Page: 0 1 2