I’m working on an application that needs to compute set operations (union, intersection and difference) on possibly infinite ordered lists. In my application I’m working with lists of dates, but dates aren’t important for this problem.
Since it is not generally possible to compute the intersection or difference of two infinite sets, we need an approximation. One possible approach is to truncate the inputs when you perform difference or intersection, reducing the domain to finite lists. This is the approach I’m currently using, largely because my application (which happens to be written in Python) depends on the Labix dateutil.rruleset, and I’m leveraging its set operations.
Today I happened to stumble upon a short discussion of this problem on Brent Yorgey’s blog. In response to a reader’s question, Brent sketches out a clever approach that introduces a new algebraic type to represent infinite lists, normal lists, and ranges. He then suggests that it might be possible to implement difference by converting an infinite list type into a range type. It’s not clear to me that this is the best approach, so I decided to write up my own solution.
I think a more meaningful way to approximate the set operations is by bounding the number of comparisons that you make as you compute each successive item in the difference or intersection. This works well when you can assume that you will either produce an item “nearby”, or not find one at all. This approach has the advantage of preserving infinite lists.
This is a literate Haskell post, so you can copy and paste it verbatim into your editor.
> module InfiniteSets where
First consider the infinite difference function, infDiff. With infinite inputs we run into trouble if the two sets contain a long, equal subsequence. We avoid that situation by returning the empty set after a given number ‘n’ steps.
> infDiff :: (Eq a, Ord a, Enum a) => Int -> [a] -> [a] -> [a] > infDiff n l1 l2 = f n l1 l2 > where f _ [] _ = [] > f _ xs [] = xs > f i (x:xs) (y:ys) | i <= 0 = [] > | x == y = f (i-1) xs ys > | x < y = x : (f n xs (y:ys)) > | otherwise = f n (x:xs) ys
Next consider the infinite intersection function, infIntersect. With infinite inputs we fall into unbounded execution if the two sets contain a long sequence with an empty intersection. Again , we handle that situation by returning an empty set after a given number ‘n’ steps.
> infIntersect :: (Eq a, Ord a, Enum a) => Int -> [a] -> [a] -> [a] > infIntersect n l1 l2 = f n l1 l2 > where f _ [] _ = [] > f _ _ [] = [] > f i (x:xs) (y:ys) | i <= 0 = [] > | x == y = x:(f n xs ys) > | x < y = f (i-1) xs (y:ys) > | otherwise = f (i-1) (x:xs) ys
Here are a few examples followed by the expected output.
> testInfDiff1 = infDiff 10 [1,2,3,5,10,20,50,100] [1,3,4,6,10,15,30,50] > [2,5,20,100] > testInfDiff2 = infDiff 10 [3,5..] [3,5..] > [] > testInfDiff3 = infDiff 10 ([1,2] ++ [3,5..]) [3,5..] > [1,2] > testInfDiff4 = take 20 $ infDiff 10 [1,3..] [2,4..] > [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39] > testInfDiff5 = take 20 $ infDiff 10 ([1,2] ++ [3,5..]) [3,5..25] > [1,2] > testInfIntersect1 = infIntersect 10 [1,2,3,5,10,20,50,100] [1,3,4,6,10,15,30,50] > [1,3,10,50] > testInfIntersect2 = take 20 $ infIntersect 10 [3,5..] [3,5..] > [3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41] > testInfIntersect3 = take 20 $ infIntersect 10 [3,5..] [2,4..] > []
I received some helpful comments about this problem on the #haskell irc channel, but I would appreciate other comments about this problem and this solution. Since I’m such a Haskell novice, I’m particularly interested in alternate solutions.