Disclaimer
Today we are going to talk about a structure that I'll call segment tree. It might not be the politically correct name for it, but anyway.
Introduction
The segment tree data structure described in this post is useful when the answer to our problem comes from reducing a range of elements of an array with a given operator. Being able to update the original array is a property we'll ask for later, and the time complexity for both the query and the update actions will be O(log(n)).
Let's start with some examples of what it is good for. Suppose we have an array of numbers arr of length n. Given a range of elements [a, b), where 0 <= a < b <= n, we might want:
- Getting the maximum/minimum element of the range. (the max/min operator)
- Getting the sum/product/xor of the elements in the range. (the +/*/^ operator). *
- Getting the maximum sum of a contiguous subsequence within that range. (a custom operator that combines segments)
* OK, we can do these easily in O(1), but if we are going to update the values in the array, we are going to need a better data structure.
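As an aside, here is a minimal sketch of that O(1) static approach for the sum case (build_prefix and prefix_query are my names, not from any library; it assumes the arr and MAXN globals declared in the code further below), just to make explicit why updates ruin it:

long long pre[MAXN + 1]; // pre[i] = arr[0] + ... + arr[i-1]; globals start at zero, so pre[0] = 0

void build_prefix(int n) {
    for (int i = 0; i < n; i++) pre[i + 1] = pre[i] + arr[i];
}

long long prefix_query(int a, int b) { return pre[b] - pre[a]; } // sum over [a, b) in O(1)

// Changing a single arr[idx] invalidates pre[idx+1..n], so every update would cost O(n).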
First approach
Now let's think about how we can accomplish such time complexity without updates. We are going to write a data structure simpler than a segment tree that only answers query operations. To achieve this we are going to store the result of applying the given operation to every segment that starts at any cell of the array and has length 2^k, for k = 0..log2(n), and we'll call the array seg_tree (even though it is not a tree).
So to calculate the operation on the segment of length 2^(k+1) starting at position i we are going to combine the segment of length 2^k starting at i and the segment of length 2^k starting at (i+2^k).
const int MAXN = 100000, LOGN = 18; // bounds; LOGN must be larger than log2(MAXN)
int arr[MAXN];                      // the original array
int seg_tree[MAXN][LOGN];           // seg_tree[i][k] = result over the segment [i, i + 2^k)

void build(int n) {
    for (int i = 0; i < n; i++) seg_tree[i][0] = arr[i];
    for (int k = 1; (1 << k) <= n; k++)
        for (int i = 0; i < n; i++) {
            int half = 1 << (k - 1);
            seg_tree[i][k] = seg_tree[i][k - 1];
            if (i + half < n) seg_tree[i][k] += seg_tree[i + half][k - 1];
        }
}

int query(int a, int b) { // result over [a, b), using + as the operator in this example
    int idx = a, len = b - a, r = 0, rmbit;
    while (len) {
        r += seg_tree[idx][__builtin_ctz(rmbit = len & -len)];
        idx += rmbit;
        len -= rmbit;
    }
    return r;
}
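A quick usage sketch of the structure above (a hypothetical main, assuming the + operator), including the bit decomposition the query performs:

#include <cstdio>

int main() {
    int n = 6;
    int values[6] = {5, 1, 4, 2, 3, 6};
    for (int i = 0; i < n; i++) arr[i] = values[i];
    build(n);
    // query(1, 4): len = 3 = binary 11, so it reads the segment of length 1
    // starting at 1 and then the segment of length 2 starting at 2.
    printf("%d\n", query(1, 4)); // 1 + 4 + 2 = 7
    return 0;
}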
One of the bit tricks we are using is __builtin_ctz, which gives us the number of trailing zeros in the binary representation of its argument. It is a builtin function provided by the GNU C++ compiler; if it is not available we can emulate it with a while loop (sketched below).
The other trick is len & -len: as long as len != 0, this expression isolates the rightmost set bit of len.
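For instance, with 12 = binary 1100 we get 12 & -12 == 4 and __builtin_ctz(12) == 2. If the builtin is not available, a simple while-loop replacement looks like this (a sketch; it assumes a non-zero argument):

int ctz_fallback(unsigned int x) { // assumes x != 0
    int k = 0;
    while ((x & 1) == 0) { x >>= 1; k++; } // strip trailing zeros one by one
    return k;
}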
So basically what we do when answering a query is: we position ourselves at position a and accumulate the result up to b using the binary representation of the distance between them. For every bit set to 1 at position k of that distance, we advance using the segment of length 2^k starting at the current position.
Complexity
build: O(n . log(n))
query: O(log(n))
Now we have the problem of the update. Let's try to get that right.
Second approach
We can start by noticing that there is a lot of redundant information. We can build a different 'segment structure' that only keeps enough segments of length 2^k to cover the entire array, instead of n segments of length 2^k for every k. This is where the binary tree structure comes in.
We are going to store the information of those segments using a binary tree.
The root will contain the information of the segment of length 2^k that covers the array completely (so 2^k >= n). Then the left child will contain the information of the segment of length 2^(k-1) that covers the first half of the array, the right child the segment of length 2^(k-1) that covers the other half, and so on for their children until we are left with segments of length 2^0. Those will be our leaves.
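For instance, with n = 4 the nodes and the segments they are responsible for look like this (using the node * 2 / node * 2 + 1 numbering of the implementation below):

node 1: [0, 4)
  node 2: [0, 2)
    node 4: [0, 1)
    node 5: [1, 2)
  node 3: [2, 4)
    node 6: [2, 3)
    node 7: [3, 4)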
We can see that the resulting tree is a complete binary tree (except possibly for the leaf level), and such trees are very easy to store in practice using an array of length equal to the number of nodes. The number of nodes is going to be about two times the length of the original array. To see this, let's count the nodes: the leaf level has n nodes, the level above it n/2, the one above that n/4, and so on until we hit a single node.
n + n/2 + n/4 + n/8 + .. = n . (1 + 1/2 + 1/4 + 1/8 + ...) = n . 2
Now let's go with the implementation.
int seg_tree[MAXN * 4]; // safe size for the 1-based node*2 / node*2+1 indexing: the tree has
                        // about 2n nodes, but this numbering can leave gaps when n is not a power of two

int combine(int a, int b) {
    // constant-time function that combines two partial results (+, min, max, xor, ...)
    ...
}

void build(int node, int left, int right) {
    if (left + 1 == right) { // leaf case
        seg_tree[node] = arr[left];
        return;
    }
    int mid = (left + right) >> 1;
    build(node * 2, left, mid);
    build(node * 2 + 1, mid, right);
    // we are a non-leaf node, so combine the children's results
    seg_tree[node] = combine(seg_tree[node * 2], seg_tree[node * 2 + 1]);
}

int query(int node, int left, int right, int a, int b) {
    // the node's segment is completely outside the query range
    if (b <= left or a >= right) return -1; // -1 stands for the neutral element of the 'operator': 0 for +, INT_MIN for max, ...
    // the node's segment is completely inside the query range
    if (a <= left and right <= b) return seg_tree[node];
    int mid = (left + right) >> 1;
    int answ_left = query(node * 2, left, mid, a, b);
    int answ_right = query(node * 2 + 1, mid, right, a, b);
    return combine(answ_left, answ_right);
}

int update(int node, int left, int right, int idx, int val) {
    if (left + 1 == right) { // leaf node, implies left == idx
        return seg_tree[node] = arr[left] = val;
    }
    int mid = (left + right) >> 1;
    if (idx < mid) update(node * 2, left, mid, idx, val);
    else update(node * 2 + 1, mid, right, idx, val);
    return seg_tree[node] = combine(seg_tree[node * 2], seg_tree[node * 2 + 1]);
}
The query segments are interpreted as a half-open interval [a, b). We need to call all of these functions with node number 1, left 0, and right n.
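For example (a hypothetical snippet, assuming combine is the + operator and arr already holds n values):

build(1, 0, n);                  // build over the whole array [0, n)
int s = query(1, 0, n, 2, 5);    // combined result over the range [2, 5)
update(1, 0, n, 3, 42);          // set arr[3] = 42 and refresh the segments containing index 3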
And that's pretty much it.
Complexity
Assuming the combine operation runs in constant time.
build: O(n)
The argument is the one we used before, the number of nodes is linear.
query: O(log(n))
Suppose we are queried for a segment that starts where a segment of length 2^k starts in our binary tree, but ends 2^k - 1 positions ahead. That means we are going to use the segments of lengths 2^(k-1), 2^(k-2), 2^(k-3), ... down to 2^0, which is clearly a sequence of at most log(n) segments.
Now let's drop that restriction (that the query starts where a 2^k-length segment does and that all segments used have length less than 2^k). Find the biggest segment whose midpoint splits our range in two (it is also a split in the binary tree), and do the exact same calculation as before, once for the first half and once for the second half. So the maximum number of segments we can touch is two times log(n).
update: O(log(n))
This one is easier: we only update the segments that contain our update point, and there is exactly one such segment per level of the tree, so about log(n) of them.
Lazy Propagation
The idea behind lazy propagation is to postpone part of the work of an update. If we are updating a contiguous range of cells, instead of touching every cell in the range, we propagate the change only while the segment being analysed by the update call is not completely contained in the updated range. Once it is, we just mark a flag on that node saying 'hey, continue updating this range later'. We'll have a problem to practice this.
That delays the actual update until a query comes in and needs the real value, just like in lazy evaluation.
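To make the pattern concrete, here is a minimal sketch of lazy propagation, under the assumption of range-add updates and range-sum queries on [a, b) with the same node/interval conventions and MAXN bound as before (the problems below need different operations, so treat this as an illustration of the flag-and-push idea, not as their solution):

long long lz_tree[MAXN * 4]; // sum over the node's segment, with all pending additions applied to it
long long lz_lazy[MAXN * 4]; // pending amount still to be added to every cell of the children

void push(int node, int left, int right) {
    if (lz_lazy[node] == 0) return;
    int mid = (left + right) >> 1;
    // forward the pending addition to both children and remember it in their own flags
    lz_lazy[node * 2]     += lz_lazy[node];
    lz_tree[node * 2]     += lz_lazy[node] * (mid - left);
    lz_lazy[node * 2 + 1] += lz_lazy[node];
    lz_tree[node * 2 + 1] += lz_lazy[node] * (right - mid);
    lz_lazy[node] = 0;
}

void update_range(int node, int left, int right, int a, int b, long long val) {
    if (b <= left || a >= right) return;      // no overlap with the updated range
    if (a <= left && right <= b) {            // fully covered: stop here and mark the flag
        lz_tree[node] += val * (right - left);
        lz_lazy[node] += val;
        return;
    }
    push(node, left, right);                  // partially covered: push the flag down, then recurse
    int mid = (left + right) >> 1;
    update_range(node * 2, left, mid, a, b, val);
    update_range(node * 2 + 1, mid, right, a, b, val);
    lz_tree[node] = lz_tree[node * 2] + lz_tree[node * 2 + 1];
}

long long query_range(int node, int left, int right, int a, int b) {
    if (b <= left || a >= right) return 0;    // neutral element for +
    if (a <= left && right <= b) return lz_tree[node];
    push(node, left, right);
    int mid = (left + right) >> 1;
    return query_range(node * 2, left, mid, a, b)
         + query_range(node * 2 + 1, mid, right, a, b);
}

// As before, the calls look like update_range(1, 0, n, a, b, val) and query_range(1, 0, n, a, b).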
Problems
The problems are by no means easy, but if you manage to solve them you are pretty much done with learning about segment trees.
- Frequent values: this is a classic example of segment tree. The easy one.
- Can you answer these queries I: not so classic, but fun. The hard one.
- Ahoy, Pirates!: try to solve this problem; it is hard and it also needs lazy propagation to run within the time limit, but it is a very good one.
Other problems
Other problems we are able to solve using this structure are:
- Given a dynamic set of integers, meaning that you can add/remove elements, getting the k-th smallest element in O(log(n)). Note that the sorting approach is not good enough, and the STL's nth_element takes linear time per query and does not help with removing elements. Exercise with K-th Number (a sketch of the idea follows this list).
- Sweep-line 'geometric' algorithms, like the one we need in Election Posters.
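For the k-th element idea mentioned above, here is a minimal sketch (hypothetical names; it assumes the values are integers in a known range [0, MAXV), stored as counts in the leaves, with the same node/interval conventions as before):

const int MAXV = 100000;   // hypothetical upper bound on the values
int cnt_tree[MAXV * 4];    // cnt_tree[node] = how many elements fall in this node's value range

void add_value(int node, int left, int right, int value, int delta) { // delta = +1 to insert, -1 to erase
    cnt_tree[node] += delta;
    if (left + 1 == right) return;
    int mid = (left + right) >> 1;
    if (value < mid) add_value(node * 2, left, mid, value, delta);
    else add_value(node * 2 + 1, mid, right, value, delta);
}

int kth_value(int node, int left, int right, int k) { // requires 1 <= k <= cnt_tree[1]
    if (left + 1 == right) return left;               // this leaf's value is the k-th smallest
    int mid = (left + right) >> 1;
    int in_left = cnt_tree[node * 2];
    if (k <= in_left) return kth_value(node * 2, left, mid, k);
    return kth_value(node * 2 + 1, mid, right, k - in_left);
}

Insert x with add_value(1, 0, MAXV, x, 1), remove one copy with add_value(1, 0, MAXV, x, -1), and get the k-th smallest with kth_value(1, 0, MAXV, k), all in O(log(MAXV)).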
Pretty sure I'm missing some techniques or tricks, so if you know about one, let me know so I can add it.