Here’s an exercise: given a nice piece of code sitting in a file, how do you take that code and make it generic, in the style of an STL algorithm?
For our example, let’s consider an algorithm that isn’t (yet) in the STL. First, the problem it solves. Imagine that you have a set (in the mathematical sense, not the container sense) of numbers; discontiguous, unordered, and notionally taken from the positive integers, e.g.:
[latex]\{5,2,7,9,4,0,1\}[/latex]
What we want to find is the smallest number that isn’t included in the set. For the example given above, the answer is 3. If the set is sorted, the solution is apparent; adjacent_find
will find the first gap for us. So one solution would be:
// assuming our set is in an array or vector
std::sort(a.begin(), a.end());
auto p = std::adjacent_find(
a.begin(), a.end(),
[] (int x, int y) { return y-x > 1; });
int unused = a.back() + 1;
if (p != a.end())
unused = *p + 1;
The asymptotic complexity of this solution is of course O(n.log(n)), since adjacent_find
is linear and the complexity of sort
dominates. Is there a better solution? It’s not obvious at first glance, but there is a linear time solution to this problem. The key is a divide and conquer strategy using that workhorse of algorithms, partition
.
The smallest unused number
Let’s start by assuming that the minimum unused value is zero. We’ll choose a pivot value, m
, which is the assumed midpoint of the sequence – it doesn’t actually have to be present in the sequence, but we’re going to simply assume that it’s the middle value. Partitioning the sequence about the pivot, we’ll get a situation like this:
where the first value equal to or greater than m
is at position p
(which is returned by the call to partition
). If the lower half of the sequence has no gaps, m
will be equal to p
, and we can recurse on the top half of the sequence, setting the new minimum unused value to m
.
We know that m
cannot be less than p
since there cannot be any repetitions in the set. Therefore, if m
is not equal to p
, it must be greater – meaning that there is at least one gap below p, and that we can recurse on the bottom half of the sequence (keeping the minimum unused value as-is).
The base case of the algorithm is when the sequence we need to partition is empty. At that point we will have found the minimum unused value. Here’s the algorithm in code, with a bit of setup:
void min_unused()
{
// initialize RNG
std::array seed_data;
std::random_device r;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
std::mt19937 gen(seq);
// fill an array with ints, shuffle, discard some
int a[10];
std::iota(&a[0], &a[10], 0);
std::shuffle(&a[0], &a[10], gen);
int first_idx = 0;
int last_idx = 7; // arbitrary truncation
for (int i = first_idx; i < last_idx; ++i)
std::cout << a[i] << '\n';
// the algorithm
int unused = 0;
while (first_idx != last_idx) {
int m = unused + (last_idx - first_idx + 1)/2;
auto p = std::partition(&a[first_idx], &a[last_idx],
[&] (int i) { return i < m; });
if (p - &a[first_idx] == m - unused) {
unused = m;
first_idx = p - &a[0];
} else {
last_idx = p - &a[0];
}
}
std::cout << "Min unused: " << unused << '\n';
}
You can also see the algorithm on wandbox. I leave it to you to convince yourself that the algorithm works. The asymptotic complexity of the algorithm is linear; at each stage we are running partition
, which is O(n), and then recursing, but only on one half of the sequence. Thus, the complexity is:
[latex]O(n + \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + ...) = O(\sum_{i=0}^\infty \frac{n}{2^i}) = O(2n) = O(n)[/latex]
From concrete to generic: first steps
So, now we have an algorithm that works... for C-style arrays, and in one place only. Naturally we want to make it as generic as we can; what, then, are the steps to transform it into a decent algorithm that can be used just as easily as those in the STL? Perhaps you're already thinking that this is overly concrete - but hey, this is a pedagogical device. Bear with me.
First, we can pull the algorithm out into a function. You've already seen the setup code, so you should be able to imagine the calling code from here on. While we're at it, let's change the index-based arithmetic to pointers. We can pass the sequence in to the function as an argument; for now, we may as well say that the sequence is a vector
, because vector
is always the right default choice! Our updated function:
// version 1: a function that takes a vector
int min_unused(std::vector& v)
{
int* first = &v[0];
int* last = first + v.size();
int unused = 0;
while (first != last) {
int m = unused + (last - first + 1)/2;
auto p = std::partition(first, last,
[&] (int i) { return i < m; });
if (p - first == m - unused) {
unused = m;
first = p;
} else {
last = p;
}
}
return unused;
}
This is already looking a little clearer. Using pointers removes some of the syntactic noise and allows us to see what's going on better. But, as you may already have thought, once we have vector
s, the next logical step is to consider using iterators. Let's make that change now so that we can run the algorithm on a subrange within any vector
.
// version 2: use iterators, like a real algorithm
template
inline int min_unused(It first, It last)
{
int unused = 0;
while (first != last) {
int m = unused + (last - first + 1)/2;
auto p = std::partition(first, last,
[&] (int i) { return i < m; });
if (p - first == m - unused) {
unused = m;
first = p;
} else {
last = p;
}
}
return unused;
}
Now instead of using the vector
directly, now we can pass in begin()
and end()
, or any other valid range-delimiting pair of iterators. Note also that because min_unused
is now a function template, we added inline
on line 3, meaning that we can put it in a header without causing link errors when it's instantiated the same way in multiple translation units.
This is a good start, but we can make it more generic yet! At the moment it's still only working on sequences of int
s, so let's fix that.
// version 3: infer the value_type
template
inline typename std::iterator_traits::value_type min_unused(It first, It last)
{
using T = typename std::iterator_traits::value_type;
T unused{};
while (first != last) {
T m = unused + (last - first + 1)/2;
auto p = std::partition(first, last,
[&] (const T& i) { return i < m; });
if (p - first == m - unused) {
unused = m;
first = p;
} else {
last = p;
}
}
return unused;
}
Here we're using iterator_traits
to discover the value_type
of the iterators passed in. This works with anything that satisfies the Iterator concept. Consequently we've also now removed the int
s that were scattered through the code and instead are using T
s. Note that we are value-initializing the T
on line 6, which will properly zero-initialize integral types.
Since we aren't actually calculating the initial value of unused
, we need not assume it to be zero; we could let the caller pass it in, and default the argument for convenience. In order to do that, we'll pull out T
into a defaulted template argument.
// version 4: let the user supply the initial minimum
template ::value_type>
inline T min_unused(It first, It last, T init = T{})
{
while (first != last) {
T m = init + (last - first + 1)/2;
auto p = std::partition(first, last,
[&] (const T& i) { return i < m; });
if (p - first == m - init) {
init = m;
first = p;
} else {
last = p;
}
}
return init;
}
A caller-provided minimum might be quite handy - it's common for zero to be a reserved value, or to have a non-zero starting point for values.
Iterators again
At this point, min_unused
is starting to look more like a real STL algorithm, at least superficially, but it's not quite there yet. Lines 7 and 10 are doing plain arithmetic, which assumes that the iterators passed in support that. A fundamental concept in the STL is the idea of iterator categories: concepts which define operations available on different kinds of iterators. Not all iterators support random access, and any time we are writing a generic algorithm, we had better use the appropriate functions to manipulate iterators rather than assuming valid operations:
// version 5: handle iterator arithmetic generically
template ::value_type>
inline T min_unused(It first, It last, T init = T{})
{
while (first != last) {
T m = init + (std::distance(first, last)+1)/2;
auto p = std::partition(first, last,
[&] (const T& i) { return i < m; });
if (std::distance(first, p) == m - init) {
init = m;
first = p;
} else {
last = p;
}
}
return init;
}
In the code above, the arithmetic on lines 7 and 10 has been replaced with calls to distance. This is a key change! Now we are no longer limited to random access iterators like those of vector
or just plain pointers. Our algorithm now works on all kinds of iterators, but which ones exactly? It behooves us to define exactly what we require of these iterators being used as our arguments. Until C++ has Concepts, we can't enforce this, but we can document it, typically by calling the template argument something more descriptive than 'It
'. Looking at the references for distance
and partition
, we find that a forward iterator is required.
For many algorithms in the STL, there are more efficient versions available for stronger iterator categories. A good example is partition
itself: with just a forward iterator, it may do N swaps, but with a bidirectional iterator, it need only perform, at most, N/2 swaps. Wherever we want to use different algorithms for different iterator categories, overloading with tag dispatch is a common technique; the STL provides iterator tag types for use as arguments to these overloads.
But with min_unused
, we don't have any differences in the top level algorithm that would give us greater efficiency based on iterator category. Both distance
and partition
do that work themselves.
// version 6: document the relaxed iterator category
template ::value_type>
inline T min_unused(ForwardIt first, ForwardIt last, T init = T{})
{
while (first != last) {
T m = init + (std::distance(first, last)+1)/2;
auto p = std::partition(first, last,
[&] (const T& i) { return i < m; });
if (std::distance(first, p) == m - init) {
init = m;
first = p;
} else {
last = p;
}
}
return init;
}
Strength reduction
Now is also a good time to perform some manual strength reduction on our code, making sure that each operation used is the weakest option to do the job. This doesn't really apply to any operation instances in min_unused
, but in practical terms it typically means avoiding postfix increment and decrement. If the algorithm implements more than trivial mathematics it may also require quite a bit of decomposition to discover all potential efficiencies and opportunities to eliminate operations. For more examples, I recommend watching any and all of Alex Stepanov's excellent series of video lectures.
We do have one operation in this algorithm that stands out: on line 7 there is a divide by 2. Twenty years ago, I might have converted this to a shift, but in my opinion the divide is clearer to read, and these days there is no competitive compiler in the world that won't do this trivial strength reduction. According to my experiments with the Compiler Explorer, MSVC and GCC do it even without optimization.
STL-ready?
So there we have it. Our algorithm is looking pretty good now by my reckoning - appropriately generic and, overall, a good addition to my toolkit.
Remaining are just a couple of final considerations. The first is ranges: the Ranges TS will change the STL massively, providing for new algorithms and changes to existing ones. A specific consequence of ranges is that begin and end iterators need not be the same type any more; how that may affect this algorithm I leave as an exercise to the reader.
Even more generic?
The other consideration takes us further still down the path of making our algorithm as generic as possible. A useful technique for discovering the constraints on types is to print out the code in question and use markers to highlight variables that interact with each other in the same colour. This can reveal which variables interact as well as which ones are entirely disjoint in usage - a big help in determining how they should be made into template parameters. In general, examining how variables interact can be a fruitful exercise.
Thinking about how the variables are used in our case, we see (on line 7) that we need the ability to add the result of distance
to T
, and (on line 10) that the difference between T
s is comparable to the result of distance
. We are used to thinking about addition and subtraction on integers, where they are closed operations (i.e. adding two integers or subtracting two integers gives you another integer), but there is another possibility. And in fact, there is already a good example of it in the STL.
Arithmetic the chrono
way
It is possible for the type produced by subtraction on T
s to be something other than T
. This is what happens with chrono
, with time_point
and duration
. Subtracting two time_point
s yields a duration
. It is meaningless - and, therefore, illegal - to add two time_point
s, but duration
s may be added both to themselves and to time_point
s.
In mathematical language, chrono
models a one-dimensional affine space. (Thanks to Gašper Ažman for that particular insight!)
This is also the case with min_unused
: T
is our time_point
equivalent, and the result of distance
is our duration
equivalent, suggesting that we can pull out the difference type as a template parameter:
template ::value_type,
typename DiffT = typename std::iterator_traits::difference_type>
inline T min_unused(ForwardIt first, ForwardIt last, T init = T{})
{
while (first != last) {
T m = init + DiffT{(std::distance(first, last)+1)/2};
auto p = std::partition(first, last,
[&] (const T& i) { return i < m; });
if (DiffT{std::distance(first, p)} == m - init) {
init = m;
first = p;
} else {
last = p;
}
}
return init;
}
The advantage here is that we can now use min_unused
to deal with any code that has different value and difference types, like chrono
:
// a vector of time_points, 1s apart
constexpr auto VECTOR_SIZE = 10;
std::vector v;
auto start_time = std::chrono::system_clock::time_point{};
std::generate_n(std::back_inserter(v), VECTOR_SIZE,
[&] () {
auto t = start_time;
start_time += 1s;
return t;
});
std::shuffle(v.begin(), v.end(), gen);
v.resize(3*VECTOR_SIZE/4);
for (auto i : v)
std::cout
<< std::chrono::duration_cast(i.time_since_epoch()).count()
<< '\n';
// we can now find the minimum time_point not present
auto u = min_unused<
decltype(v.begin()), decltype(start_time), std::chrono::seconds>(
v.begin(), v.end());
cout << "Min unused: "
<< std::chrono::duration_cast(u.time_since_epoch()).count()
<< '\n';
See this code on wandbox.
This concludes our exercise in developing a generic algorithm - at least, for now. Good luck with your own experiments!
Nice blog! The inline keyword is not required. Function templates and constexpr functions are implicitly inline AFAIK.
Templaterex, thanks! And yes, you’re right (although, having searched, this is pretty unclear to me from the standard – I think it’s implied by the template rules interacting with the ODR rules).
I’m still going to prefer and recommend explicit inline usage to be consistent with regular function usage, and function template specializations (17.7.3/12 [temp.expl.spec]).
I am specifically looking at [basic.def.odr]/6, which clearly mentions “template specialization for which some template parameters are not specified” to be allowed to have multiple definitions, that have to consist of the same sequence of tokens. So only full specializations of function templates (which are regular functions) require the inline keyword (or just make them constexpr, which implies inline).
You are right that it is consistent to apply inline also to function templates, but it’s not used in any of the STL implementations or other template heavy code that I know of. So it’s perhaps not idiomatic and surprising/distracting for readers of your code.
Or, one can use counting sort. Here’s the code in Haskell (as it can hardly can get more succinct than that):
Just found the following variant in my code. Slow for large sets but probably OK for small sets.
Advantages: const vector, duplicates allowed