chunkedseq
container library for large in-memory data sets
Weighted container

The chunkedseq containers generalize easily to weighted containers. A weighted container assigns an integral weight to each item it stores. The weight is typically computed by a weight function that the client defines and passes to the container as a template argument.
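To make the idea of a client-supplied weight function concrete, here is a minimal sketch of a toy weighted sequence. It is not the chunkedseq implementation: the names toy_weighted_seq and length_weight are hypothetical, and the storage is a plain std::vector. It only shows how a weight function passed as a template argument lets the container keep a running total weight.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical weight function: weighs each string by its length.
struct length_weight {
  int operator()(const std::string& s) const {
    return static_cast<int>(s.size());
  }
};

// Toy weighted container (an illustration, not chunkedseq): stores items in a
// std::vector and keeps the sum of their weights up to date on insertion.
template <class Item, class Weight, class Weight_fct>
class toy_weighted_seq {
public:
  void push_back(const Item& x) {
    total_ += weight_fct_(x);  // update the cached total weight
    items_.push_back(x);
  }
  // Total weight of all stored items, read in constant time.
  Weight get_cached() const { return total_; }
  std::size_t size() const { return items_.size(); }
private:
  std::vector<Item> items_;
  Weight total_ = Weight();
  Weight_fct weight_fct_;
};
```

With this sketch, a toy_weighted_seq<std::string, int, length_weight> holding "ab" and "cde" would report a cached weight of 5.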

The purpose of the weight is to enable the weighted-split operation, which divides the container into two pieces at a position determined by the accumulated weight of the items. The split operation takes time logarithmic in the number of items.
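The effect of a weighted split can be modeled with a simple linear scan. The sketch below is a reference model only, written against std::vector: the helper split_by_weight is hypothetical, it runs in linear rather than logarithmic time, and its convention (the first item whose inclusive prefix weight satisfies the predicate starts the second piece) is an assumption chosen to agree with the example output on this page.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper, not part of chunkedseq. Scans `src` from the front,
// accumulating weights. The first item whose inclusive prefix weight satisfies
// `pred`, together with everything after it, is moved into `dst`. If `pred`
// is never satisfied, `src` is left unchanged and `dst` ends up empty.
template <class Item, class Weight_fct, class Pred>
void split_by_weight(std::vector<Item>& src, Weight_fct weight, Pred pred,
                     std::vector<Item>& dst) {
  int prefix = 0;  // assumes weights fit in an int
  std::size_t i = 0;
  for (; i < src.size(); ++i) {
    prefix += weight(src[i]);
    if (pred(prefix)) {
      break;
    }
  }
  auto first = src.begin() + static_cast<std::ptrdiff_t>(i);
  dst.assign(std::make_move_iterator(first),
             std::make_move_iterator(src.end()));
  src.erase(first, src.end());
}
```

A weighted container can do better than this scan because it caches accumulated weights, so the split position can be located without visiting every item.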

Example: split sequence of strings by length

The following example program uses weighted split to divide a sequence of strings according to the number of even-length strings it contains. Each string has weight 1 if its length is even and 0 otherwise, so the total weight of the container equals the number of even-length strings. The split leaves the first piece in d and moves the second piece to f: d keeps the longest prefix of the original sequence whose combined weight is below half of the total, and f receives the remaining strings, starting with the string at which the accumulated weight first reaches half of the total. Because the weights are cached internally by the weighted container, the split operation takes time logarithmic in the number of strings.

#include <iostream>
#include <string>
#include "chunkedseq.hpp"
const int chunk_capacity = 512;
int main(int argc, const char * argv[]) {
  using value_type = std::string;
  using weight_type = int;
  class my_weight_fct {
  public:
    // returns 1 if the length of the string is an even number; 0 otherwise
    weight_type operator()(const value_type& str) const {
      return (str.size() % 2 == 0) ? 1 : 0;
    }
  };
  using my_cachedmeasure_type =
  using my_weighted_deque_type =
  my_weighted_deque_type d = { "Let's", "divide", "this", "sequence", "of",
                               "strings", "into", "two", "pieces" };
  weight_type nb_even_length_strings = d.get_cached();
  std::cout << "nb even-length strings: " << nb_even_length_strings << std::endl;
  my_weighted_deque_type f;
  d.split([=] (weight_type v) { return v >= nb_even_length_strings/2; }, f);
  std::cout << "d = " << std::endl;
  d.for_each([] (value_type& s) { std::cout << s << " "; });
  std::cout << std::endl;
  std::cout << std::endl;
  std::cout << "f = " << std::endl;
  f.for_each([] (value_type& s) { std::cout << s << " "; });
  std::cout << std::endl;
  return 0;
}

The program prints the following:

nb even-length strings: 6
d =
Let's divide this

f =
sequence of strings into two pieces
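One way to see why cached weights make a fast split possible: when weights are nonnegative, the accumulated weight never decreases along the sequence, so the split position can be found by binary search instead of a scan. The sketch below illustrates this with a flat array of prefix sums and std::lower_bound. This is a swapped-in technique for illustration, not a description of how chunkedseq stores its cached weights, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Inclusive prefix sums of the example's 0/1 weight
// (1 for an even-length string, 0 otherwise).
std::vector<int> even_length_prefix_weights(
    const std::vector<std::string>& items) {
  std::vector<int> prefix;
  prefix.reserve(items.size());
  int sum = 0;
  for (const std::string& s : items) {
    sum += (s.size() % 2 == 0) ? 1 : 0;
    prefix.push_back(sum);
  }
  return prefix;
}

// Index of the first item whose inclusive prefix weight reaches `target`,
// or prefix.size() if no item does. The prefix sums are nondecreasing, so
// binary search applies and the lookup takes O(log n) comparisons.
std::size_t split_position(const std::vector<int>& prefix, int target) {
  auto it = std::lower_bound(prefix.begin(), prefix.end(), target);
  return static_cast<std::size_t>(it - prefix.begin());
}
```

For the nine strings of the example, the prefix sums are 0, 1, 2, 3, 4, 4, 5, 5, 6, and searching for half of the total (3) yields index 3, the position of "sequence", which matches the first string printed for f.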