Profiling revealed that a surprisingly large share of this time (~50%) is actually spent on the lazy computation - this is kinda surprising to me.
That is interesting. Could you place a counter in the lazy computation function to make sure it isn't executed more often than the number of territories?
As you say, though, it's not the bottleneck so the priority is low.
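The suggested counter could look roughly like this sketch; the `Territory` class, `borders` accessor, and `compute_borders` body are made-up names for illustration, since the real code isn't shown here:

```ruby
class Territory
  @@lazy_runs = 0

  # How many times the expensive lazy body actually ran.
  def self.lazy_runs
    @@lazy_runs
  end

  # Memoized accessor: the expensive body runs at most once per territory,
  # and the counter records each real run.
  def borders
    @borders ||= begin
      @@lazy_runs += 1
      compute_borders
    end
  end

  private

  def compute_borders
    # stand-in for the expensive lazy computation
    [:expensive, :result]
  end
end

territories = Array.new(3) { Territory.new }
territories.each { |t| 2.times { t.borders } } # repeat calls hit the memo
```

If the counter ends up equal to the number of territories, the memoization is working as intended; if it's higher, the lazy value is being recomputed somewhere.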
While your algorithm seems to make sense, it looks like there are usually not enough polygons for it to speed the code up much.
The key to the algorithm was in the last part (the part that I don't really remember :P). By choosing the point at which to cut the area up in a "smart" way, one could guarantee that no more than a * lg(n) cuts are needed (where a is a constant), while also ensuring that every polygon is compared to at most a constant number of other polygons, meaning that the algorithm runs in O(n * lg(n)), which is significantly better than the O(n^2) we had before. This means you would probably see gains even at low polygon counts. Unfortunately, though, it probably still wouldn't help much, considering this isn't where we are spending most of our time.
Profiling also revealed that about 40% of the connection checking time is spent computing the distances between points using the #distance_sq method. Since this method can't be made any faster or simpler, we'll have to, well, compute fewer distances :P
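For reference, the point of a squared-distance method like #distance_sq is that comparing against a squared threshold avoids the square root entirely; this little sketch (the function and threshold names are mine, not the project's) shows the idea:

```ruby
# Squared distance between two points; no Math.sqrt needed when we only
# compare against a squared threshold.
def distance_sq(ax, ay, bx, by)
  dx = ax - bx
  dy = ay - by
  dx * dx + dy * dy
end

MAX_GAP = 5.0
# (0,0) to (3,4) is exactly 5 apart, so it is just within the allowed gap.
within = distance_sq(0.0, 0.0, 3.0, 4.0) <= MAX_GAP * MAX_GAP
```

There is indeed not much left to shave off inside such a method itself, which is why the rest of this post is about calling it less often.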
First off, I would suggest actually calculating the distance between the line segments.
Defining distLP(line, point) as the distance between a line and a point, and distPP(point, point) as the distance between two points, we can define distLsP, the distance between a line segment and a point, as follows:
distLsP(lineSegment, point)
{
    linedist = distLP(lineSegment, point)
    endpoint1dist = distPP(lineSegment.endpoint1, point)
    endpoint2dist = distPP(lineSegment.endpoint2, point)
    linelength = distPP(lineSegment.endpoint1, lineSegment.endpoint2)
    if (endpoint1dist ^ 2 > linelength ^ 2 + linedist ^ 2 ||
        endpoint2dist ^ 2 > linelength ^ 2 + linedist ^ 2)
    {
        return min(endpoint1dist, endpoint2dist);
    }
    else
    {
        return linedist;
    }
}
Using this function, one can define the distance between two line segments:
distLsLs(lineSegment1, lineSegment2)
{
    return min(distLsP(lineSegment1, lineSegment2.endpoint1),
               distLsP(lineSegment1, lineSegment2.endpoint2),
               distLsP(lineSegment2, lineSegment1.endpoint1),
               distLsP(lineSegment2, lineSegment1.endpoint2))
}
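In Ruby the two pseudocode functions above could be sketched roughly like this (the `Point`/`Segment` structs and snake_case names are my own; the Pythagorean endpoint test and the min-over-endpoints formula are taken directly from the pseudocode, and distLsLs assumes the segments don't intersect):

```ruby
Point = Struct.new(:x, :y)
Segment = Struct.new(:p1, :p2)

def dist_pp(a, b)
  Math.hypot(a.x - b.x, a.y - b.y)
end

# Perpendicular distance from a point to the infinite line through the
# segment, via the cross-product area formula.
def dist_lp(seg, pt)
  ax, ay = seg.p1.x, seg.p1.y
  bx, by = seg.p2.x, seg.p2.y
  cross = (bx - ax) * (pt.y - ay) - (by - ay) * (pt.x - ax)
  cross.abs / dist_pp(seg.p1, seg.p2)
end

# Segment-to-point distance, using the Pythagorean test from the pseudocode:
# if the point projects beyond either endpoint, the nearest point on the
# segment is an endpoint; otherwise it's the perpendicular foot.
def dist_lsp(seg, pt)
  line_dist = dist_lp(seg, pt)
  d1 = dist_pp(seg.p1, pt)
  d2 = dist_pp(seg.p2, pt)
  len = dist_pp(seg.p1, seg.p2)
  if d1**2 > len**2 + line_dist**2 || d2**2 > len**2 + line_dist**2
    [d1, d2].min
  else
    line_dist
  end
end

# Segment-to-segment distance for non-intersecting segments: the closest
# pair of points always involves at least one endpoint.
def dist_lsls(s1, s2)
  [dist_lsp(s1, s2.p1), dist_lsp(s1, s2.p2),
   dist_lsp(s2, s1.p1), dist_lsp(s2, s1.p2)].min
end
```

For example, two horizontal unit segments at heights 0 and 2 come out at distance 2.0, as expected.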
Unfortunately, this adds distance calculations (linelength and linedist) instead of removing them. (Note that this is where lazy computation actually pays off: at the moment we use each point-to-point distance multiple times, and in a moment we will be removing a number of these checks, so we will no longer need all of the point-to-point distances.)
However, while this was possible before, it now becomes much more interesting to remove some of these checks. We'll just use more bounding boxes. Currently we only use bounding boxes at the level of territories; we can also use them at the level of line segments. While this wouldn't make much sense normally, it does here because we allow a certain distance between the shapes. I think we should really be able to make interesting gains this way, even though we are going for more involved matching.
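A minimal sketch of that segment-level bounding-box filter, assuming an allowed connection distance MAX_GAP (the `Box` struct and names are illustrative, not from the actual code): expand one segment's axis-aligned box by the allowed gap, and only compute a real segment distance when the expanded boxes overlap.

```ruby
Box = Struct.new(:min_x, :min_y, :max_x, :max_y) do
  # Grow the box by the allowed connection distance on all sides.
  def expand(margin)
    Box.new(min_x - margin, min_y - margin, max_x + margin, max_y + margin)
  end

  # Standard axis-aligned overlap test.
  def overlaps?(other)
    min_x <= other.max_x && other.min_x <= max_x &&
      min_y <= other.max_y && other.min_y <= max_y
  end
end

# Axis-aligned bounding box of a line segment.
def segment_box(x1, y1, x2, y2)
  Box.new([x1, x2].min, [y1, y2].min, [x1, x2].max, [y1, y2].max)
end

MAX_GAP = 2.0
a = segment_box(0.0, 0.0, 1.0, 0.0).expand(MAX_GAP)
b = segment_box(0.0, 1.5, 1.0, 1.5)
candidate = a.overlaps?(b) # only now would we pay for the exact distance
```

The box test is a handful of comparisons, so it's much cheaper than the four distLsP calls it can rule out, and it can never produce a false negative: any pair of segments within MAX_GAP of each other necessarily has overlapping expanded boxes.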
I have found something quite interesting: https://en.wikipedia.org/wiki/Gilbert–Johnson–Keerthi_distance_algorithm and an implementation in D: http://code.google.com/p/gjkd/ This looks like exactly what I need here, but the thing seems quite complicated. Maybe I'll get around to implementing this, someday.
That does look quite interesting and might indeed be a much more efficient way to go about this. At the same time, it also looks quite complicated...