Blacklist

In most network applications, managing incoming flow is an important thing, and is a quite hard thing to set up. In case your algorithm is too restrictive, you will drop too much connection, and in case it’s too permissive, you will accept undesired connections. The real need is to tell your application: « Accept N connection(s) in a X second(s) time range ».

Concept

The way you should decide if a connection have to be dropped or not is looking in an historic of X second(s) how many connection(s) from an IP have been performed, and then deducing the count. This is the « simple » algorithm that does that:


class Blacklist {
private:
    qint32 m_maxConnectCount;
    quint32 m_historyRetention;

    QHash < quint32, QQueue < quint32 > > m_history;

    bool isBlacklisted(quint32 addr) {
        quint32 curTs = QDateTime::currentDateTime().toTime_t();

        QQueue < quint32 > & addrHistory = m_history[addr];
        while(!addrHistory.isEmpty() && (curTs - addrHistory.head()) > m_historyRetention)
            addrHistory.dequeue();

        bool blacklisted = addrHistory.count() > m_maxConnectCount;
        if(blacklisted) addrHistory.dequeue();

        m_history[addr].enqueue(curTs);
        return blacklisted;
    }

public:
    Blacklist(qint32 maxConnectCount, quint32 historyRetention) {
        m_maxConnectCount = maxConnectCount;
        m_historyRetention = historyRetention;
    }

    bool isBlacklisted(const QHostAddress & addr) {
        if(m_maxConnectCount > 0) {
            switch(addr.protocol()) {
            case QAbstractSocket::IPv4Protocol:
                return isBlacklisted(addr.toIPv4Address());
            case QAbstractSocket::IPv6Protocol:
                break;
            case QAbstractSocket::UnknownNetworkLayerProtocol:
            default:
                return true;
            }

            // ipv6
            union { const quint8 * b; const quint32 * d; } ipv6;
            ipv6.b = (quint8 *)&(addr.toIPv6Address().c);
            return isBlacklisted(ipv6.d[0] ^ ipv6.d[1] ^ ipv6.d[2] ^ ipv6.d[3]);
        }

        return false;
    }
};

All is done in this function « bool isBlacklisted(quint32 addr); ».

Explanations

This function will returns true of false in case this connection should be drop or not. This is simple concept, we have to lookup the historic associated with this connection using the associative array « m_history », pruning out of scope values (if you’re telling your application to drop if an amount of 10 connections have been performed in 10 seconds, historic values that are older than 10 seconds are just out of scope), and then just count how many connections happens in this historic. If this amount is greater than X, we have to drop, else we have to accept. Obviously, we have to take in count the current connection, even if we choose to drop it.

The main issue of this algorithm is you have to keep a full historic, which can lead into an huge amount of memory usage if you defined a large scope (10000 connections in 10 minutes). Moreover the Qt queue (QQueue) is a slow mechanism cause it has to strafe the whole list of value when calling QQueue::dequeue() (remove [0], and while there are value, move his index from i to i – 1). Thanks to fzort for this erratum :)

Which kind of algorithm could leads to a low memory usage and a low CPU usage ?

Solution: average

Average sounds like an like notion we have all learn at elementary school, so how could it solve such a difficult issue ? Well, let’s explain this little trick: If we want to compute the average of the set (12, 13, 14), we have to sum all of those values, and divide by the size of the set. In that case, 13, well. But now we want to take in count a new value, let say 17, in this set without having to reconsider the whole set. We have to know only two thing: the average, and the count of value this average have been computed with: ((13 * 3) + 17) / 4. So more mathematically : ((average * count) + newVal) / (count + 1).

How can this stuff be useful for our issue ?

Well, let’s consider we will now keep in the historic the average time the last N connections have been performed, and just drop if the difference between this average and the current time stamp is greater the X value (the amount of seconds we want to consider a network flood). Here is the new algorithm:

class AverageBlacklist {
private:
    struct AverageHistory {
        quint64 m_averageTime;
        qint32 m_currentCount;

        AverageHistory() {
            reset();
        }

        bool isAverageReached(quint32 ts, qint32 maxConnectCount, quint32 historyRetention) {
            // daylightsaving or time change, or simply no communication since a long time
            if(ts < m_averageTime || ts > (m_averageTime + (historyRetention * 2)))
                reset();

            m_averageTime = ((m_averageTime * m_currentCount) + ts) / (m_currentCount + 1);
            if(m_currentCount >= maxConnectCount)
                return ts <= (m_averageTime + historyRetention);

            m_currentCount++;
            return false;
        }

        void reset() {
            m_averageTime = 0;
            m_currentCount = 0;
        }
    };

    qint32 m_maxConnectCount;
    quint32 m_historyRetention;

    QHash < quint32, AverageHistory > m_averageHash;

    bool isBlacklisted(quint32 addr) {
        return m_averageHash[addr].isAverageReached(QDateTime::currentDateTime().toTime_t(),
                                                    m_maxConnectCount, m_historyRetention);
    }

public:
    AverageBlacklist(qint32 maxConnectCount, quint32 historyRetention) {
        m_maxConnectCount = maxConnectCount;
        m_historyRetention = historyRetention;
    }

    bool isBlacklisted(const QHostAddress & addr) {
        if(m_maxConnectCount > 0) {
            switch(addr.protocol()) {
            case QAbstractSocket::IPv4Protocol:
                return isBlacklisted(addr.toIPv4Address());
            case QAbstractSocket::IPv6Protocol:
                break;
            case QAbstractSocket::UnknownNetworkLayerProtocol:
            default:
                return true;
            }

            // ipv6
            union { const quint8 * b; const quint32 * d; } ipv6;
            ipv6.b = (quint8 *)&(addr.toIPv6Address().c);
            return isBlacklisted(ipv6.d[0] ^ ipv6.d[1] ^ ipv6.d[2] ^ ipv6.d[3]);
        }

        return false;
    }
};

I tested those two algorithms in competition to see if they are rendering the same behavior, and it seems yes. Except in case we’re performing EXACTLY one connection each seconds on a 10/10 configuration (10 connections in 10 seconds), where it seems to be a little bit confusing, but in a real life system, it has exactly the same efficiency.

Feel free to post any feedback on this algorithm, or debating on this topic with me, via private message or comments.

, , , , , , ,
Trackback

3 comments untill now

  1. Nice article, but this sounded a bit strange:

    « Moreover the Qt queue (QQueue) is a slow mechanism cause it has to strafe the whole list of value when calling QQueue::dequeue() (remove [0], and while there are value, move his index from i to i – 1). »

    QQueue is implemented as a linked list. Dequeueing is a constant-time operation; there’s no data copying involved.

  2. Indeed, I’m gonna strike that part :)

  3. softlion @ 2012-01-31 11:28

    Did you reinvented the token bucket algorithm ?
    http://en.wikipedia.org/wiki/Token_bucket

Add your comment now

CommentLuv badge