Differences and Intersections on Infinite Lists

I’m working on an application that needs to compute set operations (union, intersection and difference) on possibly infinite ordered lists. In my application I’m working with lists of dates, but dates aren’t important for this problem.

Since it is not generally possible to compute the intersection or difference of two infinite sets, we need an approximation. One possible approach is to truncate the inputs when you perform difference or intersection, reducing the domain to finite lists. This is the approach I’m currently using, largely because my application (which happens to be written in Python) depends on the Labix dateutil.rruleset, and I’m leveraging its set operations.

Today I happened to stumble upon a short discussion of this problem on Brent Yorgey’s blog. In response to a reader’s question, Brent sketches out a clever approach that introduces a new algebraic type to represent infinite lists, normal lists, and ranges. He then suggests that it might be possible to implement difference by converting an infinite list type into a range type. It’s not clear to me that this is the best approach, so I decided to write up my own solution.

I think a more meaningful way to approximate the set operations is by bounding the number of comparisons that you make as you compute each successive item in the difference or intersection. This works well when you can assume that you will either produce an item “nearby”, or not find one at all. This approach has the advantage of preserving infinite lists.

This is a literate Haskell post, so you can copy and paste it verbatim into your editor.

> module InfiniteSets where

First consider the infinite difference function, infDiff. With infinite inputs we run into trouble if the two sets contain a long, equal subsequence. We avoid that situation by returning the empty set after a given number ‘n’ steps.

> infDiff :: (Eq a, Ord a, Enum a) => Int -> [a] -> [a] -> [a]
> infDiff n l1 l2 = f n l1 l2
>     where f _ [] _  = []
>           f _ xs [] = xs
>           f i (x:xs) (y:ys) | i <= 0    = []
>                             | x == y    = f (i-1) xs ys
>                             | x <  y    = x : (f n xs (y:ys))
>                             | otherwise = f n (x:xs) ys

Next consider the infinite intersection function, infIntersect. With infinite inputs we fall into unbounded execution if the two sets contain a long sequence with an empty intersection. Again , we handle that situation by returning an empty set after a given number ‘n’ steps.

> infIntersect :: (Eq a, Ord a, Enum a) => Int -> [a] -> [a] -> [a]
> infIntersect n l1 l2 = f n l1 l2
>     where f _ [] _  = []
>           f _ _  [] = []
>           f i (x:xs) (y:ys) | i <= 0    = []
>                             | x == y    = x:(f n xs ys)
>                             | x <  y    = f (i-1) xs (y:ys)
>                             | otherwise = f (i-1) (x:xs) ys

Here are a few examples followed by the expected output.

> testInfDiff1      = infDiff 10 [1,2,3,5,10,20,50,100] [1,3,4,6,10,15,30,50]
> [2,5,20,100]
> testInfDiff2      = infDiff 10 [3,5..] [3,5..]
> []
> testInfDiff3      = infDiff 10 ([1,2] ++ [3,5..]) [3,5..]
> [1,2]
> testInfDiff4      = take 20 $ infDiff 10 [1,3..] [2,4..]
> [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39]
> testInfDiff5      = take 20 $ infDiff 10 ([1,2] ++ [3,5..]) [3,5..25]
> [1,2]

> testInfIntersect1 = infIntersect 10 [1,2,3,5,10,20,50,100] [1,3,4,6,10,15,30,50]
> [1,3,10,50]
> testInfIntersect2 = take 20 $ infIntersect 10 [3,5..] [3,5..]
> [3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41]
> testInfIntersect3 = take 20 $ infIntersect 10 [3,5..] [2,4..]
> []

I received some helpful comments about this problem on the #haskell irc channel, but I would appreciate other comments about this problem and this solution. Since I’m such a Haskell novice, I’m particularly interested in alternate solutions.

About these ads

3 Comments »

  1. Brent said

    I like this approach. The one issue I have with it is that you may not like the restriction of having to decide *in advance* how far you want the difference or intersection functions to look before stopping. Also, if and when they do stop, there’s a lot of information that’s lost: although you may know that, say, infDiff went through 10 iterations before giving up, you won’t know anything about the list elements it discarded along the way.

    Here’s a slight variation I thought of which solves these problems. Define a special list type that looks like this:

    data StutterList a = Nil | Cons a (StutterList a) | AtLeast a (StutterList a)

    The value (Cons x xs) of course tells you that the first element in the list is x, and the remaining elements are given by xs. Since we’re dealing with sorted lists we also know that all the elements of xs are at least x. The value (AtLeast x xs), on the other hand, just says that all the values in the list are at least x, without telling you what the first element is (it might be x, or it might be something bigger; we just don’t know yet).

    Using this type, it is not hard to write infDiff and infIntersect in a way that is guaranteed not to go into infinite recursion, even when called on infinite lists, since every recursive call will be guarded by a constructor (a Cons constructor, in the case of finding a value which is actually contained in the output list, and an AtLeast constructor, in the case of finding one that isn’t). More importantly, this means that code calling infDiff and infIntersect doesn’t have to decide up front how far to look — it can decide on the fly, possibly making use of knowledge about the size of potential remaining list elements (i.e., it could keep running until being sure that any remaining elements are greater than 100), or data computed from the initial elements of the result list, and so on. I leave writing such implementations as an exercise. =)

  2. [...] tech guy in midtown the notebook of a computer scientist living in midtown manhattan « Differences and Intersections on Infinite Lists. [...]

  3. harfga said

    Hi Brent,

    Thanks for taking the time to reply. I wrote up another post expanding upon the approach that you suggest.

RSS feed for comments on this post · TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: