Walletfox.com

QList vs QSet - containment testing with QBENCHMARK


This article compares QSet and QList with respect to the method contains(). It examines the performance of contains() for container sizes of different orders of magnitude (1, 10, 100, 1000, etc.). The performance is measured with the help of the QBENCHMARK macro from QTestLib.

Expected behavior

We expect QList to have O(n) complexity for containment testing and QSet to have potential (average-case) O(1) complexity for containment testing.

Searching through an unsorted sequential container such as QList requires examining every single element and results in a runtime complexity of O(n). On the other hand, QSet is a hash-table-based container. Unlike sequential containers, sets do not keep elements in the order in which they were inserted. This has some advantages. For example, containment testing is faster for sets, with potential O(1) complexity, as we do not have to perform a linear search.
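The difference between the two lookup strategies can be sketched in standard C++. This is only an illustration: std::vector and std::unordered_set are used as stand-ins for QList and QSet so that the example compiles without Qt, and the helper names linearContains(), hashedContains() and comparisonsNeeded() are hypothetical.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Linear search: if the value is absent, every element gets compared.
bool linearContains(const std::vector<int>& values, int needle)
{
    return std::find(values.begin(), values.end(), needle) != values.end();
}

// Hash lookup: the hash of the needle selects one bucket and only that
// bucket is searched, so on average the cost does not depend on the size.
bool hashedContains(const std::unordered_set<int>& values, int needle)
{
    return values.find(needle) != values.end();
}

// Counts how many elements a linear search inspects before it stops.
int comparisonsNeeded(const std::vector<int>& values, int needle)
{
    int comparisons = 0;
    for (int v : values) {
        ++comparisons;
        if (v == needle)
            break;
    }
    return comparisons;
}
```

For a list holding 0 to 4, comparisonsNeeded() reports one comparison when searching for 0 and five comparisons when searching for a value that is not in the list, which is the worst case discussed above.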

Sets have this capability thanks to a technique called hashing. A hash function computes an integer value (hash code) from an object. It does not necessarily produce a unique hash code for every object. Normally, the hash code maps a key to a bucket, which may contain one or more elements (the number of these buckets is returned by QSet::capacity()). The container then iterates over the contents of that bucket until it finds a match. In the best case, with no collisions and every bucket either empty or holding a single element, checking for containment takes constant O(1) time.
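The bucket idea can be illustrated with a toy hash set for integers. This is a simplified sketch and not how QSet is implemented internally: the class ToyIntSet, its fixed bucket count and its modulo-based hash are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <vector>

// Toy hash set with a fixed number of buckets (must be greater than zero).
class ToyIntSet
{
public:
    explicit ToyIntSet(std::size_t bucketCount) : buckets_(bucketCount) {}

    void insert(int key)
    {
        std::vector<int>& bucket = buckets_[indexFor(key)];
        for (int k : bucket)
            if (k == key)
                return;            // already present, sets hold no duplicates
        bucket.push_back(key);
    }

    // Only the bucket selected by the hash is searched.
    bool contains(int key) const
    {
        for (int k : buckets_[indexFor(key)])
            if (k == key)
                return true;
        return false;
    }

    // Number of elements sharing the bucket that `key` maps to.
    std::size_t bucketSize(int key) const
    {
        return buckets_[indexFor(key)].size();
    }

private:
    // Toy hash function: non-negative remainder of key / bucket count.
    std::size_t indexFor(int key) const
    {
        const long long n = static_cast<long long>(buckets_.size());
        return static_cast<std::size_t>(((key % n) + n) % n);
    }

    std::vector<std::vector<int> > buckets_;
};
```

With four buckets, the keys 1 and 5 collide in the same bucket, so a lookup of either key has to walk over at most two entries, while a lookup of a key whose bucket is empty returns immediately.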

Testing with QBENCHMARK

We are going to measure the performance of our code with QBENCHMARK, which is an extension to the QTestLib framework. To measure the performance of a piece of code, all we need to do is place it in curly brackets preceded by the QBENCHMARK macro. The performance can be measured based on walltime (default), the CPU tick counter, valgrind/callgrind or an event counter. In this example, we use the default method based on walltime.
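To give an idea of what a walltime measurement involves, here is a rough analogue in standard C++. It is not the implementation of QBENCHMARK; the helper averageWallTimeMs() and its fixed iteration count are hypothetical, whereas QBENCHMARK itself decides how often to repeat the measured code.

```cpp
#include <chrono>

// Runs `body` `iterations` times and returns the average wall time
// per run in milliseconds. Hypothetical helper, not part of Qt.
template <typename Callable>
double averageWallTimeMs(Callable body, int iterations)
{
    typedef std::chrono::steady_clock Clock;

    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        body();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;

    return elapsed.count() / iterations;
}
```

Averaging over many iterations matters for operations as short as a single contains() call, because one run can be shorter than the resolution of the clock.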

Our aim is to identify the size of the container for which containment testing could affect the performance of an application in such a way that it is noticeable to the user. To this end, we create containers with sizes that represent different orders of magnitude such as 1, 10, 100, 1000, 10000, 100000 and 1000000.

Implementation specifics

In order to perform the test, we create a class called TestContainer. The test itself is executed by the private slot testContainment(). We perform the test with the use of a data set, which is nothing more than a table of the container types (QSet, QList) and the container sizes (1, 10, 100, 1000, 10000, 100000 and 1000000) that we would like to test. The data set is created by the method testContainment_data(). The data method has to have the same name as the test method, followed by the suffix '_data'. We also introduce an enum ContainerType to distinguish between the container types.

class TestContainer : public QObject
{
    Q_OBJECT
public:
    enum ContainerType {
        Container_Set,
        Container_List
    };

private slots:
    void testContainment_data();
    void testContainment();
};

Q_DECLARE_METATYPE(TestContainer::ContainerType)

Now let's look at our data method testContainment_data(), which constructs the test table. We use QTest::addColumn() to add one column for the container type and one for the container size. The for loop creates the individual test rows with QTest::newRow(). The name of each test row, i.e. the argument in the parentheses, is composed of the name of the container and the size of the container.

void TestContainer::testContainment_data(){
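    // addColumn() is a template: the column type must match the type used in QFETCH
    QTest::addColumn<ContainerType>("type");
    QTest::addColumn<int>("size");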

    for (int size = 1; size <= 1000000; size *= 10) {

        const QByteArray sizeString = QByteArray::number(size);

        QTest::newRow(QByteArray("QSet" + sizeString).constData())
                << Container_Set << size;
        QTest::newRow(QByteArray("QList" + sizeString).constData())
                << Container_List << size;
    }
}

The data method essentially creates the table below:

Row name        type              size
QSet1           Container_Set     1
QList1          Container_List    1
QSet10          Container_Set     10
QList10         Container_List    10
QSet100         Container_Set     100
QList100        Container_List    100
...             ...               ...
QSet1000000     Container_Set     1000000
QList1000000    Container_List    1000000

The test method itself, i.e. testContainment(), can be seen below. The QFETCH macro creates the local variables 'ContainerType type' and 'int size'. These local variables are populated with the data from the test table created by testContainment_data(). Once we have the data, we call the helper function testContains().

void TestContainer::testContainment() {
    QFETCH(ContainerType, type);
    QFETCH(int, size);

    if (type == Container_Set)
        testContains<QSet<int> >(size);
    else
        testContains<QList<int> >(size);
}

The role of testContains() is to populate the container (QList or QSet) with data. Since we want the function to work with any container, we made testContains() a template function. The template function is both declared and defined in the header file.

Once the container is populated with data, we perform the actual measurement with the QBENCHMARK macro, which tests containment of a single value. The integer value that we check is an arbitrary negative value. Since we populated the container with non-negative values only, we can be sure that the negative value is not present in the container. This represents the worst-case scenario for QList, in which the entire list has to be searched. The assertion that comes after the closing brace of QBENCHMARK is there to make sure that 'val' is actually used and not optimized away by the compiler. Note that Q_ASSERT is only active in debug builds.

const int negValue = -1;

template <typename T>
void testContains(int size)
{
    T container;

    for (int i = 0; i < size; ++i)
        container << i;

    bool val;

    QBENCHMARK {
        val = container.contains(negValue);
    }

    Q_ASSERT(val == false);
}

We run the test from main.cpp with QTest::qExec():

int main(int argc, char *argv[])
{
    TestContainer cTest;
    return QTest::qExec(&cTest, argc, argv);
}

Running the test and summarizing the results in a table, we can see that the lookup time for QSet stays practically constant, which is consistent with O(1) complexity for containment testing. In other words, the performance of the contains() method is independent of the number of entries in the QSet. QList, on the other hand, shows O(n) behavior for containment testing: from about 100 elements upwards, the time grows linearly with the number of elements in the list.

Container size    QSet - walltime [ms]    QList - walltime [ms]
1                 0.000038                0.000040
10                0.000038                0.000086
100               0.000049                0.00055
1000              0.000049                0.0051
10000             0.000049                0.050
100000            0.000049                0.50
1000000           0.000038                5.1
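One way to check the scaling behavior is to compute the ratio between consecutive measurements in the table. The sketch below does this in standard C++. The helper growthFactors() is hypothetical, and the two arrays simply repeat the walltime values from the table.

```cpp
#include <cstddef>
#include <vector>

// Ratio between each measurement and the previous one.
std::vector<double> growthFactors(const std::vector<double>& timesMs)
{
    std::vector<double> factors;
    for (std::size_t i = 1; i < timesMs.size(); ++i)
        factors.push_back(timesMs[i] / timesMs[i - 1]);
    return factors;
}

// Wall times in ms from the table, for sizes 1, 10, ..., 1000000.
const std::vector<double> qlistTimesMs =
    {0.000040, 0.000086, 0.00055, 0.0051, 0.050, 0.50, 5.1};
const std::vector<double> qsetTimesMs =
    {0.000038, 0.000038, 0.000049, 0.000049, 0.000049, 0.000049, 0.000038};
```

For QList, each tenfold increase in size from 100 elements upwards multiplies the time by roughly 9 to 10, as expected for linear growth. For QSet, all factors stay close to 1. For the smallest lists the factor is lower, presumably because fixed overhead dominates the measurement.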

Conclusion: To put the results into perspective, in unoptimized Debug mode, checking whether a single integer value is present in a QList of 1,000,000 entries takes around 5 ms, which means that checking 200 integer values against a list of the same size takes about 1 second! The same 200 checks on a QSet would take 200 * 0.000038 ms = 0.0076 ms, i.e. 0.0000076 s.
