Archive for April, 2008

Differences and Intersections on Infinite Lists

I’m working on an application that needs to compute set operations (union, intersection and difference) on possibly infinite ordered lists. In my application I’m working with lists of dates, but dates aren’t important for this problem.

Since it is not generally possible to compute the intersection or difference of two infinite sets, we need an approximation. One possible approach is to truncate the inputs when you perform difference or intersection, reducing the domain to finite lists. This is the approach I’m currently using, largely because my application (which happens to be written in Python) depends on the Labix dateutil.rruleset, and I’m leveraging its set operations.

Today I happened to stumble upon a short discussion of this problem on Brent Yorgey’s blog. In response to a reader’s question, Brent sketches out a clever approach that introduces a new algebraic type to represent infinite lists, normal lists, and ranges. He then suggests that it might be possible to implement difference by converting an infinite list type into a range type. It’s not clear to me that this is the best approach, so I decided to write up my own solution.

I think a more meaningful way to approximate the set operations is by bounding the number of comparisons that you make as you compute each successive item in the difference or intersection. This works well when you can assume that you will either produce an item “nearby”, or not find one at all. This approach has the advantage of preserving infinite lists.

This is a literate Haskell post, so you can copy and paste it verbatim into your editor.

> module InfiniteSets where

First consider the infinite difference function, infDiff. With infinite inputs we run into trouble if the two sets contain a long, equal subsequence. We avoid that situation by returning the empty set after a given number ‘n’ steps.

> infDiff :: (Eq a, Ord a, Enum a) => Int -> [a] -> [a] -> [a]
> infDiff n l1 l2 = f n l1 l2
>     where f _ [] _  = []
>           f _ xs [] = xs
>           f i (x:xs) (y:ys) | i <= 0    = []
>                             | x == y    = f (i-1) xs ys
>                             | x <  y    = x : (f n xs (y:ys))
>                             | otherwise = f n (x:xs) ys

Next consider the infinite intersection function, infIntersect. With infinite inputs we fall into unbounded execution if the two sets contain a long sequence with an empty intersection. Again , we handle that situation by returning an empty set after a given number ‘n’ steps.

> infIntersect :: (Eq a, Ord a, Enum a) => Int -> [a] -> [a] -> [a]
> infIntersect n l1 l2 = f n l1 l2
>     where f _ [] _  = []
>           f _ _  [] = []
>           f i (x:xs) (y:ys) | i <= 0    = []
>                             | x == y    = x:(f n xs ys)
>                             | x <  y    = f (i-1) xs (y:ys)
>                             | otherwise = f (i-1) (x:xs) ys

Here are a few examples followed by the expected output.

> testInfDiff1      = infDiff 10 [1,2,3,5,10,20,50,100] [1,3,4,6,10,15,30,50]
> [2,5,20,100]
> testInfDiff2      = infDiff 10 [3,5..] [3,5..]
> []
> testInfDiff3      = infDiff 10 ([1,2] ++ [3,5..]) [3,5..]
> [1,2]
> testInfDiff4      = take 20 $ infDiff 10 [1,3..] [2,4..]
> [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39]
> testInfDiff5      = take 20 $ infDiff 10 ([1,2] ++ [3,5..]) [3,5..25]
> [1,2]

> testInfIntersect1 = infIntersect 10 [1,2,3,5,10,20,50,100] [1,3,4,6,10,15,30,50]
> [1,3,10,50]
> testInfIntersect2 = take 20 $ infIntersect 10 [3,5..] [3,5..]
> [3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41]
> testInfIntersect3 = take 20 $ infIntersect 10 [3,5..] [2,4..]
> []

I received some helpful comments about this problem on the #haskell irc channel, but I would appreciate other comments about this problem and this solution. Since I’m such a Haskell novice, I’m particularly interested in alternate solutions.

Comments (3)

WSJ: David Einhorn

Yesterday I read a friendly feature article about David Einhorn in the WSJ. The article, titled “A New Face Of Hedge Funds Isn’t Shy”, runs alongside a color photo of Einhorn wearing a lucky sweatshirt while he stands among poker tables during a break at the 2006 WSOP. The authors touch upon statements that he’s made as an activist investor, some of his charitable actions, and his avant guard working style.

Einhorn has recently ruffled some feathers by criticizing regulatory organizations such as the SEC. He makes the point that regulators don’t have sufficient incentives to properly scrutinize corporations. For example, he’s quoted as saying “The SEC is run by a corporate advocate, not an investor advocate, so investors are getting a false sense of security.”

I learned a little (emphasize little) about corporate oversight when I prepared for the Series 7 and Series 63. My studies left me with the impression that regulations are largely motivated by periodic financial crises, when Congress is pushed to take legislative action by their screaming constituents. This methodology seems flawed and error-prone. We’re seeing it right now with calls to take action in this credit crisis. I’d like to learn more about the details of corporate oversight because it touches on the intersection of finance and politics.

I also liked the article’s description of Einhorn’s working style. It says that he takes afternoon naps and is usually home in time to have dinner with his family. I envy that working style because I think it engenders a healthy, relaxed style of thinking. I personally know that winning poker players bring that attitude to the table. My guess is that Einhorn’s working style has played an important role in his success on Wall Street and at the poker table.

Leave a Comment

I’m Learning Haskell

For the past couple of months I have been learning Haskell. I don’t have an immediate need to know the language, but I like looking at solutions through the lens of functional programming.

I caught the functional bug last year when I wrote a distributed computing framework in Erlang for a large investment bank. The distributed computing features of Erlang were certainly nice, but I also appreciated the code’s predictable execution. If the code compiled, it almost always worked as expected.

Sparked by that experience, I started looking around for another functional language to learn. Rumblings in the blogosphere about Haskell’s strict, succinct expressions caught my attention, so I started learning the language and hacking up some code.

Thanks to excellent documentation coupled with expert answers on the #haskell IRC channel, I feel like I’ve ramped up fairly quickly. I intend to give a little back to the Haskell community by writing posts that attempt to clarify concepts that I found confusing. I’m currently preparing a post that offers yet another introduction to monads. I also want to post some code that I’ve written to allow the expression and manipulation of recurring dates.

Leave a Comment

First Post

I’m excited about developing this blog. Years ago I had a personal homepage, where I posted pictures, book reviews, short pieces of fiction, and technical notes. I enjoyed having a public medium to publish my ideas. Doing so not only helped me remember events or refine thoughts, but occasionally other people would stumble upon the page and find something they liked.

I eventually decided to take the page down because my google presence irked me. I wasn’t embarrassed about any of the content. I was bothered for the same reason that I never wear tshirts with funny slogans on them and my car doesn’t have bumper stickers. I didn’t like that new acquaintances tended to frame my personality or our conversations using material from my homepage. So I took my homepage down.

I intend to keep this blog anonymous to avoid this problem. Also, I work at a hedge fund, so I don’t want to cast any unwanted attention on my employer or my coworkers.

Still, my goals for this blog are similar to those for my old homepage. I hope to save and refine my ideas by writing them down. Along the way, hopefully I’ll interest a few visitors. One difference is that blogging software makes it much easier for visitors to submit comments. I hope that I’ll occasionally get a little feedback that will teach me something or amuse me.

Leave a Comment

Follow

Get every new post delivered to your Inbox.