We fulfill power fantasies

RSS Feed

Generic hashtable in C (2): the final API

14.8.2019 18:00:23

Some time back I wrote a generic hashtable in C. Since that post, the API has seen some iteration as I've been using the code in my projects.

This hashtable heavily relies on macros that call type-generic functions which modify struct members based on their offsets and do other evil stuff of that sort, all in a way that isn't necessarily completely standard-conforming nor the most efficient way to do things. In exchange, it accepts any kind of data as keys or values, and accepts any hash function that returns a size_t. Below's a short description of the final API that differs a little from the one described in the original post.

Initialization

hashtable(uint64_t, uint32_t) table;
int err;
hashtable_init(table, 16, &err);
if (err)
   /* Handle error */

Insertion

int         err;
uint64_t    key     = 12345;
uint32_t    value   = 54321;
size_t      hash    = hashtable_hash(&key, sizeof(key));
hashtable_insert(table, key, hash, value, &err);
if (err)
   /* Handle error */

Searching

uint64_t    key     = 12345;
size_t      hash    = hashtable_hash(&key, sizeof(key));
uint32_t    *value  = hashtable_find(table, key, hash);
if (value)
   /* Do something with value. */

Erasing

uint64_t    key  = 12345;
size_t      hash = hashtable_hash(&key, sizeof(key));
hashtable_erase(table, key, hash);

Iteration

uint64_t key;
uint32_t value;
hashtable_for_each_pair(table, key, value) {
   /* Do something with key and value */
}

Strings and other comlex data types

This hashtable saves a copy of each key in its entirety. For non-fixes size types, three functions must be defined compare_keys, copy_key and free_key. Below is an example of insertion into and erasure from a string-int table.

Allocation and copying function examples

int compare_keys(const void *a, const void *b, size_t size)
   {return strcmp(*(const char**)a, *(const char**)b);}

int copy_key(void *dst, const void *src, size_t size)
{
   size_t len = strlen(*(const char**)src);
   *(char**)dst = malloc(len + 1);
   if (!*(char**)dst)
       return 1;
   memcpy(*(char**)dst, *(const char**)src, len + 1);
   return 0;
}

void free_key(void *key)
   {free(*(char**)key);}

Insertion

char    *key    = "one";
size_t  hash    = hashtable_hash(key, strlen(key));
int     value   = 1;
hashtable_insert_ext(table, key, hash, value, compare_keys, copy_key,
   &err);
if (err)
   return -1;

Erasing

char    *key = "one";
size_t  hash = hashtable_hash(key, strlen(key));
hashtable_erase_ext(table, key, hash, compare_keys, free_key);

Typesafe function declarators

Finally, there's a couple of convenience macros for defining type-safe inline functions for tables of a given type. Their purpose is to reduce the need to type in all of the macro parameters every time as there can be quite a lot of them otherwise.

/* Define a struct named struct str_int_table and a set of functions for it */
hashtable_define_ext(str_int_table, const char *, int, compute_hash,
   compare_keys, copy_key, free_key);

...

struct str_int_table table;

if (str_int_table_init(&table, 8))
   /* Handle error */

if (str_int_table_insert(&table, "one", 1))
   /* Handle error */

Might still be bugs in the code, but I'll fix 'em as I come across 'em.

hashtable.h
hashtable.c
git

Crossplatform makefiles (Nmake and GNU Make)

5.8.2019 19:49:37

Here's a handy little cross platform makefile tip I came across recently. The original tip is from user Bevan Collins on StackOverflow. Thanks to him! This tip is useful if you need build stuff using nmake on Windows and GNU Make on what ever platform(s) you're using it on. Heck, it may work with other make implementations, too.

We're going to need 3 files in the same directory:

  • nmake.mk
  • gnu.mk
  • Makefile

Now start your Makefile with the following lines.

# \
!ifndef 0 # \
include nmake.mk # \
!else
include gnu.mk
# \
!endif

Next, place any Nmake-specific variables in nmake.mk and GNU Make specific variables in gnu.mk. You'll want to at least define a variable for the path separator (slash or backslash) and use it for any file paths (I just name the variable 'S').

  • For *nix: S = /
  • For Windows: S = \\

Finally, place all your build targets in the Makefile. Done!

Bevan's include code, as he explains, works because GNU Make recognizes the symbol \ as a line continuation even in comments where as Nmake doesn't.

Why can't we just use ifdefs you ask? That's because conditional statements use different keywords in Nmake and GNU make: nmake ifdef instructions start with an exclamation mark where as GNU make ones don't.

The unfortunate thing about this approach in comparison to two separate, platform-specific makefiles is that you won't be able to benefit from parallelization on Windows if you simply build the same targets on both platforms. That's because while GNU make has the -j flag to process multiple targets in parallel, Nmake lacks similar functionality. If you want to create parallel builds using cl (the Microsoft compiler) on the command line, you must do so at the compiler level (/MP), which means the makefiles' targets are going to inevitably look a little different. If your project isn't too big though, you're unlikely to benefit much from parallel building on Windows - I rather take the maintainability of a single Makefile than that. And I still get to keep away from awful, overly complicated build systems like CMake.

Trying out a different desktop environment

30.6.2019 14:28:44

I'm a fan of tiling window managers on GNU/Linux. There's something real comfy about using your graphical desktop environment with nothing but your keyboard. I find the paradigm is excellent especially for programming, where you often have multiple terminal windows open at the same time. Now, GNU Screen and tmux exist to split up the terminal into multiple windows, but they're a bit of a hassle and come with their own problems. A desktop environment usually automatically supports multiple monitors and virtual desktops, and has sane default keybindings to work with, with the added benefit that you can also tile any non-terminal programs.

i3 has been my window manager of choice for many years now, and it has worked well. I love the workflow I've developed for it. While programming, I usually have...

  • One virtual desktop for Vim.
  • One virtual desktop for compiling and running (2 terminals).
  • One virtual desktop for Firefox and other graphical programs.
  • Possibly one virtual desktop for running multiple programs after compilation at the same time.
  • One virtual desktop on a secondary monitor with IRC, a music player and possible other chat programs.

Typically I spend most of my time in Vim and compiling/running the program I'm currently developing - two different virtual desktops. i3, like many window managers, has great, intuitive virtual desktop support.

i3 vs spectrwm

Having used the same desktop environment for such a long time, I felt like maybe I should try out something new for a change, you know, to give the old brain a tickle of sorts, and also to see if there's optimizations to be made in my workflow. Sometimes you just get stuck doing the same thing because you've always done it that way, right? So, I tried a few different window managers.

Of the ones I tried, the only tiling WM I ended up liking was spectrwm. It's quite similar to i3, but there were a couple of features about it I really liked:

  • The default config is very sane and good-looking.
  • No mouse required (like i3).
  • It feels light. For some reason, opening windows felt faster than i3 - in i3, you can sometimes kind of see the window resizing after it's been opened. Placebo? I don't know!
  • The bottom bar configuration scheme with a single shell script awesome.

Especially the bottom bar configuration is a feature I like. In spectrwm, the bottom bar's text is the output of a user-defined shell script. You can print just about anything there - just write a bash script with a timed while loop in it, print out the current time, network status, etc. and you have a nice-looking bottom bar with all your important info on it. What would make this event nicer is if there was a way to call into spectrwm to refresh the script - that way, when certain events happen, the script could be updated immediately rather than in the next iteration of the timed loop. Maybe there already is a way and I'm ignorant, I don't know. Of course, there are more compilated ways of doing such a thing in bash, but I don't think I'm up for that much work.

All in all, I really like spectrwm, but it has one feature that makes it bad for my desktop use. The feature is that with a dual-monitor setup, selecting any virtual desktop will place it in your current window, so that even if I created the desktop on the right monitor, it may end up in the left monitor later by accident. I find this a very distracting feature, and if there's a way to disable the behaviour, I'd appreciate to hear of it.

After all that, I am keeping i3 on my desktop machine, but on my laptop, I've moved to spectrwm, since it only has a single monitor and thus doesn't suffer from the aforementioned desktop-monitor-switching feature. It hasn't really changed my workflow, but feels faster and looks slicker by default. Besides, the configuration scheme with the shell scripts and all makes me childishly feel like a bit of a hacker.

MUTA devlog 4: solo dev, client entity system, master server rewrite, world database server...

25.6.2019 18:05:54

It's been a while since the last one of these, but now as we're celebrating midsummer and nightless nights here in Finland, I think its time for me to sit down and write a bit of a catch-up post about what's going on with MUTA, the Multi-User Timewasting Activity.

Officially developing an MMORPG solo

To recap, MUTA is a free and open source MMOPRG project started in 2017 as a student project. The idea for it came from myself and another Kajaani UAS student, Lommi, as we had both been talking of writing an MMO for a while. In fact, wanting to write an MMO was the primary reason I personally applied to the school's game development programme.

We spent two full project courses, each 2-3 months long, on writing the game. During that time we got much assistance from many different people. Game art students created art, and programming students helped us with some great tools and certain parts of the engine, and production students helped us get organized.


MUTA after 2 months of development, during a test with around 20 people online. The 64x64 art at this point was created by three Singaporean summer school students.

After (and between) said courses, it was just me and Lommi working on the game. I'm more of an engine guy, he's more of a gameplay guy, though neither of us is one thing exclusively. While it initially only took us two and a half months to get an OK-looking demo game running, writing an MMO engine and toolset properly takes a lot of time. And while such tools are in development, content creation is difficult, if not impossible. The lack of visible progress during a technology development phase such as this, I think, is problematic for people who are primarily driven by gameplay and visuals, and so after a while, I was mostly working on the game alone, always telling my friend I would try to get the engine and tools ready for content-development ASAP.

Last month, Lommi finally told me he felt the project was too big for him to work on it on the side at this point (he's also employed full-time and has been for quite a while). That leaves me as the only official developer of MUTA. But that's not such a big change after all: likely more than 90% of the current codebase was already written by me alone at this point, as Lommi has not really been involved during the last year or more.

In fact, it's a bit of a relief for me. I no longer need to worry about getting the game ready for others to develop content for it. I'm working a full-time job and try to spend the time I can on MUTA, but often time is simply hard to come by.

With this organizational shift in mind, I have some more changes coming up. I'm planning on reworking the theme of the game and possibly making the art myself. I'm a terrible artist, so it will probably come to simple indie pixel crap. But that's alright, the gameplay is the important part. As for the theme, I'm planning on simple high fantasy, due to the fact I'm not good enough of a visual artist to present a universe anything like what we originally planned for - the original idea was a sword and sorcery -type world inspired by the works of Robert E. Howard. I'm of the opinion that the theme should support the gameplay and not the other way around. However, I don't want to see fireballs flying all over in the style of Warcraft either - it's gonna be something lower key than that.


Character "art" from my first-ever game project, Isogen.

For now, MUTA remains a hobby project I try to pour much of my free time into. Time will tell what it actually evolves into, but I've got high hopes that one day it will be a real, online MMORPG. If not that, at least the code will be available for anyone to inspect.

Code changes

Phew, it's been many months since I last wrote about MUTA, so a lot of things have changed in the codebase, and some of them I don't even remember anymore. Some of the changes include (in a semi-chronological order):

       
  • Reworking of the immediate mode GUI into a standalone library.
  •    
  • Shared code cleanup (mostly just renaming things and organizing them into files)
  •    
  • Moving to MojoAL from OpenAL Soft on at least GNU/Linux. I don't know how great of an idea this is, but MojoAL is easy to embed into the project, having only two source files as opposed to OpenAL Soft's CMake hell.
  •    
  • New entity system for the client.
  •    
  • Rewriting the client's rendering.
  •    
  • Completely rewriting the master server.
  •    
  • Writing a world database server.
  •    
  • Writing a new async database API.
  •    
  • packetwriter2 tool for network message serialization.
  •    
  • Shared API and authentication for server-server connections (svchan_server and svchan_client).
  •    
  • New generic hashtable written as a separate library.

Client entity system

The entity system on the client needed a rework. It was something of an entity-component system (and I know how pretentious that term is) and remains so. This job had two distinct motivations:

       
  • Making the code clearer
  •    
  • Performance
I feel like both goals were achieved. First of all, the code needed breaking into more source files as previously the whole world code was in a single file (I feel this sort of isolation is more future-proof for this project), but second, I really wanted less weird macros and more flexibility in component implementation. To recap the new system:
       
  • An entity has a fixed-size array of components.
  •    
  • Each component has an enumerator it's referred with. The enum is an    index to an entity's component array.
  •    
  • Components in an entity generally point to a handler structure. The    component might have an iterable element in a tight array associated with    it, but this is not visible to the component's user - they access it    through a set of functions.
  •    
  • Components communicate mainly through events (callbacks).

Components are defined by creating an instance of a static-duration component_definition_t struct.

struct component_definition_t {
   int (*system_init)(world_t *world, uint32 num_components);
   void (*system_destroy)(world_t *world);
   void (*system_update)(world_t *world, float dt);
   component_handle_t (*component_attach)(entity_t *entity);
   void (*component_detach)(component_handle_t handle);
   entity_event_interest_t     *entity_event_interests;
   uint32                      num_entity_event_interests;
   component_event_interest_t  *component_event_interests;
   uint32                      num_component_event_interests;
   uint32                      index; /* Autofilled, initialize to 0! */
   component_event_callback_t  *component_event_callbacks; /* Autofilled, initialize to 0! */
};

So the component_definition_t structure is really just a set of callbacks. Components are also pooled, but the pools are members of world instances, hence not visible int he above example (the functions just accept a pointer to a world_t, as seen).

Using a component definition, components can be added to an entity and then manipulated.

component_handle_t entity_attach_component(entity_t *entity,
   component_definition_t *component_definition);

The component handle returned by entity_attach_component can be used to access the component. It could be laid out in memory in various ways - the API does not set restrictions on this, except that the handle must be a constant address pointer until destruction.

void mobility_component_set_speed(component_handle_t handle, float speed);

Component event callbacks are attached to the component definitions rather than individual components. This does away with some flexibility, but saves memory and likely performs better in the average case, since in MUTA, certain sets of components in a single entity type are very common (creatures have a certain set of components, players another, etc.) The callbacks get called mostly immediately when a component fires an event. An example use case of events would be animations: when the mobility component fires a "start move" event, the event can trigger the animation component to start playing a different animation.

Rewriting the client's rendering

This one's a pretty simple one. Tile depth sorting was moved to the GPU, and with the new entity system, entity rendering was also changed.

Previously, the world rendering system walked through each rendering component in the world every frame, looked up the entity's position from a separate memory address, then decided whether to cull it or not, and so on. In the new system, positions are cached in more CPU cache-friendly structures. For example, if a entity moves, an event is fired to it's rendering component, and the rendering component logic culls the entity and caches its position in an array of render commands. The array of render commands is iterated through every frame to draw all visible entities - render commands contain all the necessary data to place the entity's sprites properly on the screen.

Master server rewrite

The master server is the authoritative part of a single MUTA shard/world. It knows all the entities in the world and generates unique IDs for everything. Multiple simulation servers connect to the master server, each one of them simulating different parts of the world map.

While writing MUTA's proxy server (of which I also wrote my bachelor thesis), I feel like I finally "got" how I want to do multithreading with servers: handle all state on one thread, have other threads post events to that thread. The event loop works with a wait function akin to poll. Basically, an event-based approach.

Since then, I've been wanting to rewrite the rest of MUTA's server applications to use a similar architecture. To explain a little, the below table displays the programs that make up the server side software.

                                                                                                                                                                                                                                                       
ProgramDirectory name in repoEvent-based?
MasterserverNo
Simulation ServerworlddNo
Login Serverlogin-serverYes
Proxy ServerproxyYes
Old database serverdb-serverNo
World database serverworld_db (server_rewrite branch)Yes

Don't worry about the discrepansies in subproject naming conventions, I do have a plan for them now, believe it or not. It's just that the plan keeps on changing...

For the master server, an architecture change means that the main loop will no longer only run at a fixed rate: it will also be able to respond to events immediately, using blocking event queues. This is achieved with a structure akin to the pseudo-code example below.

int target_delta    = 17; /* Milliseconds */
int last_tick       = time_now();
int time_to_wait    = target_delta;
for (;;) {
   event_t events[64];
   int num_events = event_wait(events, 64, time_to_wait);
   for (int i = 0; i < num_events; ++i)
       _handle_event(&events[i]);
   int delta_time = time_now() - last_tick;
   if (delta_time < target_delta) {
       time_to_wait = target_delta - delta_time;
       continue;
   }
   update(delta_time);
}

Changing the architecture has meant a rather large amount of refactoring, as it affects nearly all systems on the master server. Since this has largely meant a complete rewrite, I have taken to also rewriting some of the systems into a mold I feel is better suited for the future. For example, the world/entity system that's used to control player characters, creatures and other game objects, is being completely written from scratch in the server_rewrite Git branch. The world API contains functions such as the ones below.

uint32 world_spawn_player(uint32 instance_id, client_t *client,
   player_guid_t id, const char *name, int race, int sex, int position[3], int direction);
void world_despawn_player(uint32 player_index);
int world_player_find_path(uint32 player_index, int x, int y, int z);
Calls such as the ones above are asynchronous in nature, as they involve the simulation server instances connected to the master server. Hence, I've been thinking of reworking them in such a way that they would accept a callback, "on_request_finished" (or whatever). That would be alright for code clarity, but then, that would involve some memory overhead. The alternative is to handle finished requests inside the world API itself, meaning it will have to call back to some other API's that called it. You know, I'm constantly pondering where the line of abstraction should lie: tight coupling isn't pretty, but abstraction often comes at a great programmatic resource cost. In the above case, there's little reason to create a datastructure for saving callbacks and their user data (void pointers) if there's really only one logical path the code can take when a response arrives. I try not to fall down the trap of "OOP" and "design patterns" just for the sake of such silly things, but at the same time, sometimes I have an engineer's urge to overengineer things. Usually I end up with the more practical, less abstraction-based approach. After all, I know every dark corner of my own program, or so I at the very least believe.

The rewrite has taken about two months now and I think it will still take some more time, partly because at the same time I must make changes to other programs in the server side stack as well. At the same time, new programs are coming in, such as the world database server. It will be interesting to see how things will work out when the server starts up again for the first time... Well, maybe frustrating is a more appropriate word.

World database server

The world database server is a new introduction to the server side stack. Previously, MUTA had a "db-server" application, but there was no separation between individual world/shard databases and account databases - now, that separation is coming.

The WDB is an intermediate program in between the MySQL server and the MUTA master server. It's sole purpose is to serve requests by master servers through a binary protocol while caching relevant results. The intention is that this is the de-facto way to access a single shard's database.

There's an asynchoronous API associated with the WDB. It's consists of a set of functions, each one of which performs a specific query. The query functions also take in callbacks as parameters - the callbacks are called when the query completes or fails.

wdbc_query_id_t wdbc_query_player_character_list(
   void (*on_complete)(wdbc_query_id_t query_id, void *user_data, int error,
       wdbc_player_character_t *characters, uint32 num_characters),
   void *user_data, account_guid_t account_id);

Not much else to say about it right now... It's event-based like the rest of the newer applications on the server side. Will keep working on it!

packetwriter2

Back when the MUTA project was started, Lommi wrote an application called MUTA_Packetwriter for network packet serialization. It compiles C code, structs and serialization functions, from a simple file format where the fields of each network packet are defined.

Lommi's tool has saved us countless of hours of writing arbitrary code and debugging it, but now that he is no longer working on the project, and there are many new packets making their way into the protocol with new applications such as the world database coming, I've deemed it necessary to write a new version of this program.

packetwriter2 will use a new, simpler file format, the ".def" format used in many of MUTA's data files. It will support features I've wanted for a long time, such as arrays of structs and nested structs. I've started writing the parser, and below is an example of the file format.

include: types.h
include: common_defs.h

group: twdbmsg
   first_opcode = 0

group: fwdbmsg
   first_opcode = 0

struct: wdbmsg_player_character_t
   query_id    = uint32
   id          = uint64
   name        = int8{MIN_CHARACTER_NAME_LEN, MAX_CHARACTER_NAME_LEN}
   race        = int (0 - 255)
   sex         = int (0 - 1)
   instance_id = uint32
   x           = int32
   y           = int32
   z           = int8

packet: fwdbmsg_reply_query_player_character_list
   __group     = fwdbmsg
   query_id    = uint32
   characters  = wdbmsg_player_character_t{MAX_CHARACTERS_PER_ACC}

Along the way, I think the encryption scheme needs a rework, too. Not the basic algorithms behind it (MUTA uses libsodium for that), but the fact that currently, messages are encrypted on a per-message-type basis. Being able to turn encryption on and off in the stream would save bandwith and improve performance, as multiple messages could be encrypted in a single set.

Keepin' busy

Honestly, I've been having a tough time scraping up enough time to work on MUTA after starting my current job. Also, turns out motivating myself is difficult if I don't constantly have something to prove to someone (you know, like progress reports to people you know in real life and stuff). Guess I should go and get one of those productivity self-help books soon.

A generic hashtable in C

2.3.2019 23:58:15

When ever I've implemented hashtables in C in the past, the way I've made them "generic" has been to create macros that declare new types and sets of functions for the tables wanted. For example, taken from the MUTA source code:

#define DYNAMIC_HASH_TABLE_DEFINITION(table_name, value_type, key_type, \
   hash_type, hash_func, bucket_sz) \
   ... \

...

DYNAMIC_HASH_TABLE_DEFINITION(str_int_table, int, char *, uint32,
   fnv_hash32_from_str, 4);

The above macro would declare a new type, str_int_table_t, and functions such as str_int_table_init(), str_int_table_insert() and str_int_table_erase().

With such macros, it's a bother to have to call the macro before using any new type of a hashtable. However, for example the stb library implements easier-to-use, generic dynamic arrays with macro trickery, so why not try the same with hashtables?

Turns out working with a more complex structure like a hashtable takes a little more work than dynamic arrays. We cannot automatically create versions of a single function for different data types in C like we could with C++ templates, so the main problem is passing compile-time data such as struct member types and sizes to functions. And we do want to use functions because we will need complex statements such as loops - such statements do not (very cleanly at least) fit inside plain macros.

The above paragraph might not have made immediate sense, so I'll attempt to clarify my point in practice. Below is the new hashtable declaration macro.

#define hashtable(key_type, value_type) \
   struct { \
       struct { \
           struct { \
               key_type    key; \
               value_type  value; \
               size_t      hash; \
               uint8_t     reserved; \
           } items[HASHTABLE_BUCKET_SIZE]; \
       } *buckets; \
       size_t num_buckets; \
       size_t num_values; \
   }

Below is how it's used.

/* Declare a hashtable called 'table' whose key type is char * and value type
* is int. */
hashtable(char *, int) table;

As may be observed, the type of the resulting struct is anonymous, meaning it cannot even be passed to a struct as a void * and then cast back to its original type. So how do we do what we need to do? Below is the hashtable_insert_ext() macro and the signature of the _hashtable_insert() function the macro calls.

#define hashtable_insert_ext(table_, pair_, compare_keys_func_, \
   copy_key_func_, ret_err_) \
   (table_.buckets = _hashtable_insert((ret_err_), \
       (uint8_t*)table_.buckets, &table_.num_buckets, &table_.num_values,        \
       sizeof(*table_.buckets), sizeof(table_.buckets[0].items[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0], \
           &table_.buckets[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0].key, \
           &table_.buckets[0].items[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0].value, \
           &table_.buckets[0].items[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0].hash, \
           &table_.buckets[0].items[0]), \
       _hashtable_ptr_offset(&table_.buckets[0].items[0].reserved, \
           &table_.buckets[0].items[0]), \
       &pair_.key, sizeof(pair_.key), pair_.hash, &pair_.value, \
       sizeof(pair_.value), compare_keys_func_, copy_key_func_))

void *_hashtable_insert(int *ret_err, uint8_t *buckets, size_t *num_buckets,
   size_t *num_values, size_t bucket_size, size_t item_size,
   size_t items_offset, size_t key_off1, size_t value_off1, size_t    hash_off1,
   size_t reserved_off1, void *key, size_t key_size, size_t hash, void    *value,
   size_t value_size,
   int (*compare_keys)(const void *a, const void *b, size_t size),
   int (*copy_key)(void *dst, const void *src, size_t size));

Yep, instead of passing in a struct to _hashtable_insert(), we pass pointers to each of the struct's relevant members. Sizes of some data types are also passed, and finally, struct offsets using the macro _hashtable_ptr_offset(). That allows us to modify everything at the byte level, which we do inside _hashtable_insert(). As a side note, the function callbacks passed are there to allow for dynamic types such as copyable strings.

One may wonder if this is an efficient way to do things. I would say it probably isn't, not in terms of runtime efficiency anyway. But in terms of programmer efficiency, I think it might be the right choice until I come across a better solution. That's because with the this implementation, doing things such as the following is super easy compared to what I used to have to do.

hashtable(uint32_t, int) table;
hashtable_init(table, 8, 0);

struct {
   uint32_t    key;
   int         value;
   size_t      hash;
} insert_data = {123, 5, hash_u32(123)};
hashtable_insert(table, insert_data);

struct {
   uint32_t    key;
   size_t      hash;
} find_data = {123, hash_u32(123)};
int *value = hashtable_find(table, find_data);
if (value) {
   ...
}

Okay, maybe it isn't the most intuitive API ever, but I like it far more than having to write type/function declaration macros like I used to. And for my hashtable use-cases, I'm willing to pay for the potential runtime-overhead.

Also, I didn't actually have a hashtable that stored the actual keys in MUTA before. While not storing keys is fine for assets and similar items to save on memory, for runtime-generated ID's a table like this will be required.

I've been working on this over the weekend and it likely still has some bugs in it, but in case you're curious, the Git repository can be found here. I've also uploaded the current header and source files here and here, respectively. I'll keep on working on it and start using it in MUTA very soon.

Edit 3.3.2019
After a night of thinking I decided to the change the API so that it would not require special structs created by the user. The reason I used them originally was an attempt to force certain type-safety, namely preventing accidental pointer key casting to void * (since that can happen implicitly in C). An example of such a mistake would be when you try to pass in the address of a char *, but accidentally pass in the pointer's value rather than it's address.

char *key = ...;
hashtable_insert(
   table,
   key, // Should be &key but no error is generated due to implicit cast to void * inside the macro
   &value,
   NULL );

I concluded that similar "type-safety" could be achieved by having the user pass references to variables directly instead of their addresses. The macros will instead fetch the addresses and we should get a compilation error when certain mistakes happen. For example, if the user passes in a key as '&key', the macro would format it to '&&key', which would result in an error. The API change should reduce the amount of lines required to achieve the same result, and I think could also prove out to be more clear. Below is an example of the new API in use.

hashtable(uint64_t, uint32_t) table;
...
uint64_t    key     = 12345;
uint32_t    value   = 54321;
size_t      hash    = hashtable_hash(&key, sizeof(key));
hashtable_insert(table, key, hash, value, NULL);

I've also uploaded the newer source files here and here.

Sorting isometric sprites on the GPU

27.2.2019 15:05:30

I've been reworking much of the MUTA client for the first two months of 2019. One of the reworks has been to rendering, and amongst other changes to it, rendering now uses hardware sorting for sprites to draw 2D tiles from back to front.

Anyone who's put a little thought into isometric 2D games knows the draw order of tiles and objects is a little bit tricky. I love the isometric look and have worked on a bunch of isometric 2D game prototypes, but in all of them before this, I've done sprite sorting on the CPU. Not so this time.

Starting out, as any modern developer would, I searched the internet for the formula to correctly calculate each tile's depth based on it's position in the visible part of the map. Surprisingly, I didn't find it. Guess isometric 2D games aren't so hot anymore. Anyway, that's kind of the reason I'm making this post.

Here's a descriptive picture of the hypothetical area we are rendering.

I first attempted to calculate the depth based on the tile's distance from the camera.

int area_width  = 3;
int area_height = 2;

for (int i = 0; i < area_width; ++i)
   for (int j = 0; j < area_width; ++j)
       for (int k = 0; k < area_height; ++k)
           float z = (float)(i + j + k) / (float)(area_width + area_width + area_height);

This approach works fine if all objects are of the same height, that is, the height of a single tile. But we run into problems when that isn't the case.

Above, the character is being clipped by tiles that are higher than him on the map when the tiles should actually be rendering behind him.

A different forumula does the job better.

int area_width  = 3;
int area_height = 2;

for (int i = 0; i < area_width; ++i)
   for (int j = 0; j < area_width; ++j)
       for (int k = 0; k < area_height; ++k) {
           float row_max   = (float)(i + j + 2) / (float)(area_width + area_width);
           float row_base  = (float)(i + j) / (float)(area_width + area_width);
           float row_diff  = row_max - row_base;
           float z         = row_base + row_diff * ((float)k / (float)area_height);
       }

And the result looks like follows.

The idea is that since the z value must be between zero and 1, we need a maximum value. The render order we need is that the closer to the camera a "pillar" of tiles is (a pillar being a stack area_height of tiles at an x-y coordinate), the later it gets rendered. How ever, we also need to have depth values for the individual tiles that constitute a pillar so that the higher a tile is, the later it gets rendered. So, we have to find the depth value of the highest tile in the pillar and gradually increase each tile's depth value, coming from bottom to top, until we reach that maximum.

Anyway, thought I'd throw that out there.

Own your tech: on the Unity-Improbable incident

15.1.2019 07:32:31

In the past week there's been a debacle between the game technology companies Improbable and Unity Technologies. Improbable, as you probably know, produces a set of software called SpatialOS, apparently intended to make the creation of scalable massively multiplayer online worlds easier. The software is aimed at the users of the major contemporary game engines and, if I'm not mistaken, also has support for custom engines.

As I understand it, the way Unity came in the way of Improbable was by changing their Terms of Service. Currently, it seems that Unity is planning on its own cloud-gaming service, and after failed negotiations with Improbable, decided to cut out the competition via legal means.

There's been a lot of talk about how this change will affect the companies already using Improbable's technology with Unity, many of which are small independent companies. Epic has claimedly used the incident to its advantage, funding the transition of Unity games to Unreal Engine by millions of dollars. It is still unclear to me whether or not existing games or games in development will be allowed to operate with Unity and SpatialOS, but if not, the effects will be devastating for the smaller companies.

The main point to take home from this? In my view, it is that you should always own your technology, or use free and open source software. The IT industry has this obsession today with "saving" development time by buying all the technological parts of products from different sources. But not only does this lead to poorer software due to the commercial parts not being specialized for the types of products being made, but as this incident shows, using    proprietary third party parts can lead to serious legal issues.

While I retain my opinion that all of the best games ever made were written using custom engines tailored specifically for them, if I was to use an existing game engine, I would use a free and open source one, such as Godot, which has been making great strides in the past few years and gained quite a bit of popularity. It takes tech of your own or an open license to ensure no catastrophy such as this one caused by greedy companies fighting each other can ever happen to your game.

Hello proxy

24.11.2018 16:17:03

This just in: MUTA shards can now only be connected to through the MUTA proxy server (still in the login-branch of the Git repository). Pictured is the first play session through a proxy, the server applications running in the terminals above.

Yep, still the same old unfinished developer graphics though.

Buy nothing day

23.11.2018 07:21:24

It's that time of the year again: the year's biggest celebration of consumerism, Christmas, is looming over. But even here in Finland it's not just Christmas anymore that makes people go and mindlessly buy things they don't need; no, we need to start the wasteful buying of things early before December. The American phenomenon of Black Friday has made it's way over here, machined by the merchants who know well how to exploit the generations that grew up on the internet, learning everything about all the cool American "traditions".

If you're about to take part in the consumption mania of today, I would ask you to consider the following: make it a buy nothing day instead. At the very least, ask yourself the following questions.

  • Do you really need the thing you thought of buying today? Would you buy it even if it wasn't on sale? If you have an item similar to what you're about to buy, still in good condition, do you really need a new one this soon?
  • Consider what you might do with the money saved if you didn't spontaneously spend it on things just because they were on sale for one day (or in the case of Finland, for a whole week!)
  • Think about what the impact of overconsuming things such as electronics is to the environment.

Happy buy nothing day.

Event-based server architecture

22.11.2018 12:10:49

What's the right way to do multithreading? There's not one single answer to this as every program has it's specific needs and purposes. But on a per-program basis, one approach will probably be better than another.

Game developers not working on high-end engines rarely have to worry much about multithreading. Even if a game is multithreaded, at the core most games' logic is single-threaded.

For instance, if your game has a dedicated rendering thread, your previous frame might be getting rendered while you simulate the current one on the main thread. But what you're doing is you're just replicating the relevant data from the main thread for use by the render thread, thus not affecting the actual game logic, which remains single-threaded.

For another example, your physics system might parallelize a for-loop to multiple threads, all of which are synchronized at the end of the loop. This also does not change your game's logic from single- to multithreaded, you're just iterating some loops faster by parallelizing them.

There's a reason the above two examples are common in games, and it is that they are easy to program for, while getting some of the benefits of dividing work to multiple threads. When all of the real logic happens on a single thread of execution, the control flow is predictable and easy to understand. It might also be more efficient than a truly multithreaded alternative, in case the multithreading is done badly, or the algorithm is not suitable for threading.

Networking and threads

While you can well write a game using just a single thread, writing network code without threads is more difficult. This is because in network programming many blocking calls are used, such as poll() or recvfrom(). Usually, we want a thread to sit there and wait until network messages or new connections arrive. But a sleeping thread cannot do much else, so the other logic must happen on another thread.

It is usually possible to give blocking functions a timeout, so that even if they have no new events to report to us, they will return after a given amount of time. In this way, one can imagine a game-loop-like server architecture that polls for messages and new connections, all the while also executing all of the logic on the same thread as well. With a timed wait, we could have the loop run at approximately, say, 60 FPS.

But servers that want scalability probably don't run on a single thread. I'm no expert of course, so take this with a grain of salt, but so my investigations on the internet suggest. It seems common to have threads dedicated simply to polling for new network events.

State makes things difficult

If the only job of one of our threads, the network thread, is to sleep on poll(), epoll() or similar call and wait for new incoming messages and connections, the logic of the program is handled somewhere else. Instead of interpreting incoming data right there and then on the network thread as it arrives, the interpretation happens on a different thread.

Applications such as game servers use long-lasting connections instead of quickly terminated ones. For long connections, the server has to store some kind of per-client state: the socket of the client, a session id, incomplete messages, game character variables, etc. After acknowledging we have a lot of state, the question becomes: where's the right place to manipulate it, considering our program is multithreaded?

To store the state of our clients we might have a struct called client_t. We also have some kind of a container that contains those structs. Lets say the structs are stored in a pool of a fixed size.

typedef struct client_t
{
   int     socket;
   uint32  session_id;
   uint8   unread_messages[512];
   int     num_bytes_unread;
} client_t;

client_pool_t clients;

/* Returns NULL if no free clients are left in the pool */
client_t *new_client(client_pool_t *pool);
void free_client(client_pool_t *pool, client_t *client);

With the above premise, let's assume our network thread accepts clients and reserves client_t structs for them. If the clients disconnect (recv() returns 0), the network thread frees the client struct.

Now, if all of our connections need to be stored in our global client_pool_t data structure, how do we access it safely from multiple threads? Remember that the network thread does not handle any of the actual (game) logic in the program: that happens elsewhere, and that logic also touches the client structures as it must send messages to the clients or interpret their messages stored in the unread_messages buffer, etc.

The naive answer to safe access is to use a mutex. Wrap the new_client() and free_client() calls to a mutex and you're fine. Now when the network thread accepts a new client, it can safely call new_client() and then somehow signal the main thread that a new client was accepted, passing a pointer to the newly allocated struct along.

There are two problems with this approach: the possible performance concerns, and the (in my opinion) more important topic of code complexity. These problems are somewhat intertwined.

What kind of problems might the above approach lead to? Say that at some point in development you realize that there are situations where, on the main thread, you have to iterate through all of your clients. For that, you might create an array of pointers to the currently used client structs. So now, when you accept a new client and reserve a new client struct from the pool, you have to also push a pointer to the struct to said array.

But now, what might happen is that while the main thread iterates through all of the clients, one of the clients disconnects. In this case the networking thread frees the client struct of the disconnected client. Since we don't want this to happen while we're iterating through the clients, we have to lock the pool's mutex in both cases: while iterating, and while deleting a client.

The possible performance implications of the above are easy understand: the iteration can leave the mutex locked for a while, meaning the networking thread gets blocked until the iteration finishes.

But more importantly, using the above approach we are forced to constantly remind our selves of the fact we can't just go doing anything to our clients on the main thread without remembering the mutex and the possible race conditions. As we add features, the code becomes more and more complicated, and easier to make a mistake at. The above was a simple and somewhat dumb example. For another example, consider what should happen if the networking thread writes recvd() data into a client's unread_messages buffer, and the main thread handles that data. Are you going to use a per-client mutex to protect access to the data buffer while it's getting written to or read from? That's gonna be some more complex code using locks.

To some extent I'm speaking from experience here. When I started writing the MUTA server, I had little clue about how to write a complex server capable of handling a high amount of clients. I'm not sure if I still have the right idea of that, but I know that multithreading was a pain in my ass for a long while, and to some extent it remains so, although I think it's gettin getter.

Handling state on a single thread

So far I've found that the simplest way of dealing with the above woes is to be simple and handle all state changes on on a single thread. No allocations of stateful data structures or otherwise manipulating them from the networking thread or anywhere else. I've referred to this as event-based architecture, but I'm not sure if that's the correct term; all I know is that I've heard the term elsewhere and assumed this was what they meant. Then again, who cares what it's called.

In this model, when a network thread receives a message, it will not handle the message immediately. Instead it will pass it on to another thread as an event. The thread that handles the events (usually the main thread) can sleep on a blocking function from which it wakes when new events are posted from another thread.

Essentially, both the main and network thread will sleep in this model until something relevant to them happens. Of course, you can have more than one thread pushing events to the main thread though, and you can also have the main thread posting events to other threads. MUTA's login server has a database thread with which the main thread communicates in a two-way fashion, posting and receiving events from it.

Current implementation in MUTA

MUTA's server consists of multiple programs now: the login server, the shard server, the world simulation server, the database server and the proxy server. The two newest applications, the login and proxy servers, implement the architecture described in this post.

The sleeping and waking of a thread is implemented using a condition variable that sleeps on a mutex. The same mutex is locked when items are pushed to the event queue.

The event queue is also a blocking queue. It has a fixed size, and if it fills up, meaning the thread pushing events is pushing them faster than the main thread can handle them, the push function will sleep until space becomes free. This is also implemented using a condition variable.

The event data type of the MUTA login server looks as follow at the moment.

enum event_type
{
   EVENT_ACCEPT_CLIENT,
   EVENT_READ_CLIENT,
   EVENT_ACCOUNT_LOGIN_QUERY_FINISHED,
   EVENT_ACCEPT_SHARD,
   EVENT_READ_SHARD,
   EVENT_LOGIN_REQUEST_RESULT
};

... various event structures ...

struct event_t
{
   union
   {
       int                                     type;
       accept_client_event_t                   accept_client;
       read_client_event_t                     read_client;
       account_login_query_finished_event_t    account_login_query_finished;
       accept_shard_event_t                    accept_shard;
       read_shard_event_t                      read_shard;
   };
};

As an example event, we can take the accept_client event, which is fired when the network thread accept()s a new connection. It looks as follows.

struct accept_client_event_t
{
   int         type;
   socket_t    socket;
   addr_t      address;
};

As we can see, the event data type is a union. That's because they all get posted into the same array. But what about events that have to transfer variable amounts of data from thread to thread, such as socket recvs() (that is, packets received from the network)? In such cases, we use a simply mutex-locked allocation pool. Lock the allocator's mutex, allocate data, unlock the mutex. At read, we free the allocated memory back to the allocator in a similar fashion.

The allocator used here is a simple segregate fit -type of allocator. The logic is this: on malloc-requests, round up the requested amount to the nearest power of two. Find out the highest (leftmost) bit of the amount and use it as an index into an internal pool of memory blocks of the given size.

While the allocator calls real malloc when it needs more memory, it has an upper capacity to which it will allocate, since the event queue is blocking. Thus it is not possible to DDOS the server to run out of memory by simply spamming packets at it.

The event API itself looks as follows.

typedef struct event_buf_t event_buf_t;

void event_init(event_buf_t *buf, uint32 item_size, int32 max);
void event_push(event_buf_t *buf, void *evs, int32 num);
int event_wait(event_buf_t *buf, void *evs, int32 max, int timeout_ms);
/* -1 as timeout means infinite */

And, using the API, the login server's main loop looks as follows now.

for (;;)
{
   event_t events[MAX_EVENTS];
   int num_events = event_wait(event_buf, _events, MAX_EVENTS, 500);
   for (int i = 0; i < num_events; ++i)
   {
       event_t *e = &_events[i];
       switch (e->type)
       {
       case EVENT_ACCEPT_CLIENT:
           cl_accept(&e->accept_client);
           break;
       case EVENT_READ_CLIENT:
           cl_read(&e->read_client);
           break;
       case EVENT_ACCOUNT_LOGIN_QUERY_FINISHED:
           cl_finish_login_attempt(&e->account_login_query_finished);
           break;
       case EVENT_ACCEPT_SHARD:
           shards_accept(&e->accept_shard);
           break;
       case EVENT_READ_SHARD:
           shards_read(&e->read_shard);
           break;
       default:
           muta_assert(0);
       }
   }
   cl_check_timeouts();
   shards_flush(); // Flush queued messages to shards
   cl_flush();     // Flush queued messages to clients
}

And so, most logic is handled on a single thread.

Converting other systems to the same approach

The MUTA server's older parts are not implemented using the model discussed in this post. Although they also handle most logic on a single thread based on events generated by other threads, the model they use is not quite so refined. Instead it is kind of an ad-hoc implementation of the same idea, so instead of having a single event buffer, there are many. These buffers are not blocking like in the case of the login server how ever.

The old parts of the server are also frame-based, in that their main thread runs the logic in specified increments of time (FPS, frames per second). With an event queue the thread can sleep on we can remove some of the latency provided by the frame-based approach. To make timers based on delta-time still function correctly, we can simply calculate a sleep-time for the event_wait() function much like we would sleep during a frame if we were waiting for vsync in 3D program.

I'll be looking at converting the old programs to this model at some point in the distant future, unless I come across something better.

Page: 0 1 2