Jeesdev

Async MySQL C API usage. Also, bugs.

Kuvis

16.3.2018 10:02:26

I believe I wrote in an earlier post that I had lately been working on a (proprietary) database server with MariaDB. I have little experience with database programming, so I've had to look a thing or two up on the internet and in the documentation. What I'll write about here is, I'm sure, very old news to anyone who's done work in this area before, but to me it was fresh knowledge.

So one thing I noticed about the MySQL C API, which MariaDB also uses, and which I expect anyone who starts working with it would notice quickly, is that it isn't exactly asynchronous. mysql_query() is a blocking call that sends the query to the server and then waits for the reply, all in one function.

I didn't want to use the API like this if I didn't have to, and thanks to the internet, finding a solution wasn't too time-consuming. Credit goes to Jan Kneschke, who wrote about a fix about ten years ago.

As it turns out, the MySQL C API has a couple of lesser-documented functions available through the headers, namely 'mysql_send_query()' and 'mysql_read_query_result()'. With these functions available one can:

Initialize multiple MYSQL instances (create multiple connections),
Register each MYSQL struct's socket file descriptor with an event interface, such as Linux's epoll,
Send a bunch of queries, each through a different connection using mysql_send_query(),
Wait on the event interface until an event happens on one of the file descriptors, then call mysql_read_query_result().

The big thing here is opening up multiple MySQL connections and monitoring them all with a select()-like interface. In (more or less pseudo) code on Linux:

#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <mysql.h>

#define NUM_MYSQLS 32

MYSQL mysqls[NUM_MYSQLS];
int epoll_fd = epoll_create(1);

for (int i = 0; i < NUM_MYSQLS; ++i) {
    mysql_init(&mysqls[i]);
    /* Error checks omitted for brevity */
    mysql_real_connect(&mysqls[i], "localhost", "admin", "password",
        "my_database", 3306, 0, 0);

    /* Make non-blocking */
    int flags = fcntl(mysqls[i].net.fd, F_GETFL, 0);
    fcntl(mysqls[i].net.fd, F_SETFL, flags | O_NONBLOCK);

    struct epoll_event ev;
    ev.events   = EPOLLIN | EPOLLRDHUP | EPOLLHUP;
    ev.data.ptr = &mysqls[i];
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, mysqls[i].net.fd, &ev);
}

/* Start some queries */

const char *query = "SELECT * FROM my_table";

for (int i = 0; i < NUM_MYSQLS; ++i)
    mysql_send_query(&mysqls[i], query, strlen(query));

/* Wait for replies to arrive */

struct epoll_event evs[NUM_MYSQLS];

int num_evs = epoll_wait(epoll_fd, evs, NUM_MYSQLS, -1);

for (int i = 0; i < num_evs; ++i) {
    MYSQL *mysql = evs[i].data.ptr;
    int err = mysql_read_query_result(mysql);

    if (err) {
        /* Handle error */
        continue;
    }

    MYSQL_RES *res = mysql_store_result(mysql);
    /* Handle result */

    mysql_free_result(res);
}

Now I thought that was pretty handy. On Windows, I expect you could get similar behaviour using select(). This approach has worked well enough in my current project so far, although obviously things aren't quite as simple as in the example.
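
For reference, here is a minimal sketch of what the waiting step might look like with select() instead of epoll. The helper name wait_readable() is made up for this example, and the code is written against the POSIX version of select(). Winsock has a select() of similar shape (its first argument is ignored there), but the headers and socket types differ, so treat this as a starting point rather than a drop-in Windows solution.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until at least one of the n descriptors in
 * fds is readable. Sets ready[i] to 1 or 0 for each descriptor and returns
 * the number of readable descriptors, or -1 on error. */
int wait_readable(const int *fds, int n, int *ready)
{
    fd_set rset;
    int max_fd = -1;

    FD_ZERO(&rset);
    for (int i = 0; i < n; ++i) {
        /* select() can only watch descriptors below FD_SETSIZE */
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;
        FD_SET(fds[i], &rset);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }

    /* NULL timeout: wait indefinitely, like epoll_wait(..., -1) */
    int num = select(max_fd + 1, &rset, NULL, NULL, NULL);
    if (num < 0)
        return -1;

    for (int i = 0; i < n; ++i)
        ready[i] = FD_ISSET(fds[i], &rset) ? 1 : 0;

    return num;
}
```

In the MySQL example, fds would hold each connection's socket, and mysql_read_query_result() would be called for every index whose ready flag is set.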

The edge-triggered listening socket incident

I'm by no means very experienced in asynchronous programming (or any specific area of programming for that matter). So, writing a server program last week I ran into an interesting bug with a listening socket tracked by an epoll instance, which was really caused by me not thinking properly about the flags I passed to epoll.

The issue was this: a thread was sleeping on an epoll_wait() call until either one of the clients would send something, or a listening socket would receive a new connection that needed to be accept()-ed. When an epoll event on the listening socket fired (a new connection arrived), a notification would be passed to another thread in the program, which would then call the accept() function. An important bit of information is that there was an atomic counter for how many times accept() had to be called. Every time an epoll event was generated by the listening socket, that counter would be incremented, and it was by this counter the other thread would then call accept() when notified - a counter of 10 meant you had to call accept() 10 times.

But the accept queue on the other thread was being starved: after the server had been running for about 15 minutes, it would only accept one client at a time, and clients constantly had to wait for long periods. Obviously there was a problem somewhere.

The solution: don't use the EPOLLET flag when adding the listening socket to the epoll instance. See, when the EPOLLET (ET for edge-triggered) flag is set for a file descriptor as it is registered with epoll, events are only reported when the descriptor's readiness changes, not once for every pending connection. So if accept() wasn't called immediately after an event and more connections arrived in the meantime, they could all end up covered by a single event instead of one event each. Thus, our counter for how many accepts had to be called would fall behind the number of connections actually waiting.
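
If you do want to keep EPOLLET on a listening socket, the usual pattern is to make the socket non-blocking and keep calling accept() after each event until it fails with EAGAIN, instead of counting events. Below is a rough sketch of that idea. It is not the code from my server: add_listener_et() and accept_all() are hypothetical helper names, and binding to 127.0.0.1 is just an assumption to keep the example small.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking TCP listening socket on
 * 127.0.0.1:port and register it with epoll_fd in edge-triggered mode.
 * Returns the socket, or -1 on error. */
int add_listener_et(int epoll_fd, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port        = htons(port);

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events  = EPOLLIN | EPOLLET;
    ev.data.fd = fd;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0 ||
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: call after an edge-triggered event on listen_fd.
 * Accepts until the queue is empty (EAGAIN/EWOULDBLOCK) or out_fds is
 * full, and returns how many descriptors were stored. If out_fds fills up
 * first, connections may remain queued without a new event being reported,
 * so the caller should call this again before waiting. */
int accept_all(int listen_fd, int *out_fds, int max_fds)
{
    int count = 0;

    while (count < max_fds) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, try again */
            break;          /* queue drained, or a real error */
        }
        out_fds[count++] = fd;
    }
    return count;
}
```

With this approach a single event is enough no matter how many connections piled up behind it, so no counter is needed at all.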

Eventually I added another countermeasure, which was to call accept() on the epoll thread immediately after an event and then pass the resulting file descriptor on to the other thread. This way, the listening socket's backlog shouldn't fill up before accept() is called on another thread even under high load.
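
One simple way to pass accepted descriptors between threads is to write them into a pipe that the other thread reads from. The sketch below shows that idea only; it is an assumption for illustration and not necessarily how my server does it, and handoff_send() and handoff_recv() are made-up names. Writes this small to a pipe are atomic, so each descriptor arrives whole.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper for the epoll thread: queue an accepted descriptor
 * for the worker thread. Returns 0 on success, -1 on error. */
int handoff_send(int pipe_wr, int client_fd)
{
    ssize_t n;

    do {
        n = write(pipe_wr, &client_fd, sizeof client_fd);
    } while (n < 0 && errno == EINTR);

    return n == (ssize_t)sizeof client_fd ? 0 : -1;
}

/* Hypothetical helper for the worker thread: block until a descriptor
 * arrives and return it, or return -1 on error or when the pipe closes. */
int handoff_recv(int pipe_rd)
{
    int client_fd;
    ssize_t n;

    do {
        n = read(pipe_rd, &client_fd, sizeof client_fd);
    } while (n < 0 && errno == EINTR);

    return n == (ssize_t)sizeof client_fd ? client_fd : -1;
}
```

The epoll thread would call handoff_send() right after each successful accept(), while the worker loops on handoff_recv() using the read end of a pipe created with pipe().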

Thank goodness for AddressSanitizer

Async bugs aren't the only bugs I've been fighting with this week, although this one was also obscure due to things happening on multiple threads. Just two nights ago I spent multiple hours of work time tracking a heap corruption bug that I thought was caused by either OpenAL or the code calling it (which I didn't write).

But compiling with gcc's -fsanitize=address flag, I realized it was just my own old code, in a completely different place. Oops! Good thing there are tools for easily detecting problems such as this nowadays!
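
To give an idea of the kind of mistake this catches, here is a small made-up example (not the actual bug from my project). The function make_squares() and the SHOW_BUG macro are invented for illustration. Defining SHOW_BUG switches in an off-by-one loop that writes one element past the end of a heap allocation.

```c
#include <stdlib.h>

/* Hypothetical example function: returns a heap array holding the squares
 * of 0..n-1, or NULL if allocation fails. The caller frees the result. */
int *make_squares(size_t n)
{
    int *v = malloc(n * sizeof *v);
    if (!v)
        return NULL;

#ifdef SHOW_BUG
    /* Off-by-one: the last iteration writes v[n], past the allocation */
    for (size_t i = 0; i <= n; ++i)
#else
    for (size_t i = 0; i < n; ++i)
#endif
        v[i] = (int)(i * i);

    return v;
}
```

Built normally with -DSHOW_BUG, the stray write may go unnoticed or corrupt the heap in a way that only shows up somewhere else entirely. Built with gcc -g -fsanitize=address -DSHOW_BUG, AddressSanitizer should stop at the bad write and report a heap-buffer-overflow along with stack traces for the write and the allocation.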

