<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Lessons Learned: Parsing CSV in Haskell and Python</title>
	<atom:link href="http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/feed/" rel="self" type="application/rss+xml" />
	<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/</link>
	<description>the notebook of a computer scientist living in midtown manhattan</description>
	<lastBuildDate>Sun, 11 Dec 2011 20:21:06 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Nicolas Bailey</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-79</link>
		<dc:creator><![CDATA[Nicolas Bailey]]></dc:creator>
		<pubDate>Sat, 26 Jul 2008 04:16:01 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-79</guid>
		<description><![CDATA[I suggest getting familiar with the beautiful functions &quot;on&quot; (from Data.Function) and (&amp;&amp;&amp;) (from Control.Arrow, using the Arrow instance for functions). Then you can turn this:

    instance Eq Row where
        a == b = ticker a == ticker b &amp;&amp; time a == time b

    instance Ord Row where
        compare a b = case compare (ticker a) (ticker b) of
            EQ -&gt; compare (time a) (time b)
            e  -&gt; e

    comparePrice a b = compare (price a) (price b)

into this:

    instance Eq Row where
        (==) = (==) `on` ticker &amp;&amp;&amp; time

    instance Ord Row where
        compare = compare `on` ticker &amp;&amp;&amp; time

    comparePrice = compare `on` price

The most interesting lesson is this: if you have some slow code, learn the basics of Haskell and join reddit, and then you&#039;ll have PhDs like dons working their asses off to tune &amp; pimp your code :&gt; (And it&#039;s really great to be part of the Haskell community. What other programming language community would write something like 10 versions of your code to help you?)]]></description>
		<content:encoded><![CDATA[<p>I suggest getting familiar with the beautiful functions &#8220;on&#8221; (from Data.Function) and (&amp;&amp;&amp;) (from Control.Arrow, using the Arrow instance for functions). Then you can turn this:</p>
<pre><code>instance Eq Row where
    a == b = ticker a == ticker b &amp;&amp; time a == time b

instance Ord Row where
    compare a b = case compare (ticker a) (ticker b) of
        EQ -&gt; compare (time a) (time b)
        e  -&gt; e

comparePrice a b = compare (price a) (price b)</code></pre>
<p>into this:</p>
<pre><code>instance Eq Row where
    (==) = (==) `on` ticker &amp;&amp;&amp; time

instance Ord Row where
    compare = compare `on` ticker &amp;&amp;&amp; time

comparePrice = compare `on` price</code></pre>
<p>The most interesting lesson is this: if you have some slow code, learn the basics of Haskell and join reddit, and then you&#8217;ll have PhDs like dons working their asses off to tune &amp; pimp your code :&gt; (And it&#8217;s really great to be part of the Haskell community. What other programming language community would write something like 10 versions of your code to help you?)</p>
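<p>A minimal runnable sketch of the shorter version, assuming a hypothetical Row record that has only the three fields named above (the field types are guesses, not taken from the original post). The parentheses around ticker &amp;&amp;&amp; time are optional and are there only for readability: (&amp;&amp;&amp;) combines the two field accessors into one function that returns a pair, and &#8220;on&#8221; applies the comparison to those pairs.</p>
<pre><code>import Control.Arrow ((&amp;&amp;&amp;))
import Data.Function (on)

-- Hypothetical record: only the field names come from the comment above.
data Row = Row
  { ticker :: String
  , time   :: Int
  , price  :: Double
  } deriving Show

-- Two rows are equal when their (ticker, time) pairs are equal.
instance Eq Row where
  (==) = (==) `on` (ticker &amp;&amp;&amp; time)

-- Order by ticker first, then by time, using the ordering on pairs.
instance Ord Row where
  compare = compare `on` (ticker &amp;&amp;&amp; time)

-- Compare two rows by price alone.
comparePrice :: Row -&gt; Row -&gt; Ordering
comparePrice = compare `on` price

main :: IO ()
main = do
  let a = Row "IBM" 1 120.5
      b = Row "IBM" 2 119.0
  print (a == b)            -- False: same ticker, different time
  print (compare a b)       -- LT: time 1 sorts before time 2
  print (comparePrice a b)  -- GT: 120.5 is greater than 119.0</code></pre>
<p>Note that price plays no part in (==) or compare here, so two rows with the same ticker and time count as equal even if their prices differ.</p>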
]]></content:encoded>
	</item>
	<item>
		<title>By: Don Stewart</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-74</link>
		<dc:creator><![CDATA[Don Stewart]]></dc:creator>
		<pubDate>Sat, 19 Jul 2008 22:14:10 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-74</guid>
		<description><![CDATA[Efficient bytestring-based Double parsing was added this weekend.

    http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-lexing

I&#039;d love to know if that solves your performance issues. (I was able to parse a 50M file of Doubles in around 5 seconds, so parsing a 160k CSV should be easy.)]]></description>
		<content:encoded><![CDATA[<p>Efficient bytestring-based Double parsing was added this weekend.</p>
<p>    <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-lexing" rel="nofollow">http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-lexing</a></p>
<p>I&#8217;d love to know if that solves your performance issues. (I was able to parse a 50M file of Doubles in around 5 seconds, so parsing a 160k CSV should be easy.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-73</link>
		<dc:creator><![CDATA[Greg]]></dc:creator>
		<pubDate>Fri, 18 Jul 2008 17:27:08 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-73</guid>
		<description><![CDATA[ADEpt:

Not initially, because I was getting an error about not having a debug library for bytestring.  But once I installed that library, I profiled it, and most of the time was being spent going ByteString -&gt; String -&gt; Double.

Greg]]></description>
		<content:encoded><![CDATA[<p>ADEpt:</p>
<p>Not initially, because I was getting an error about not having a debug library for bytestring.  But once I installed that library, I profiled it, and most of the time was being spent going ByteString -&gt; String -&gt; Double.</p>
<p>Greg</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ADEpt</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-72</link>
		<dc:creator><![CDATA[ADEpt]]></dc:creator>
		<pubDate>Fri, 18 Jul 2008 14:41:05 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-72</guid>
		<description><![CDATA[Reading the whole series of posts, I can&#039;t help but wonder: did you profile the code to find out what actually consumes the most time?]]></description>
		<content:encoded><![CDATA[<p>Reading the whole series of posts, I can&#8217;t help but wonder: did you profile the code to find out what actually consumes the most time?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: atp</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-71</link>
		<dc:creator><![CDATA[atp]]></dc:creator>
		<pubDate>Thu, 17 Jul 2008 15:27:56 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-71</guid>
		<description><![CDATA[Greg,

Reading my comment now it seems as though I was implying that in this particular function unsafePerformIO might be inappropriate: it&#039;s not.  You are completely correct to use it here, as strtod is a pure function.  I was commenting in a more general way.

Haskell&#039;s type system separates pure functions from impure ones for practical reasons: you can make assumptions about pure functions that you can&#039;t make about impure ones, allowing optimizations and such.  I guess all I really wanted to say was that unsafePerformIO discards the information that marks the function as impure, and that&#039;s why it&#039;s called &quot;unsafe&quot;.  This doesn&#039;t mean you shouldn&#039;t use it, just that you should be aware that the function you&#039;re calling really does need to be pure, in the sense that its behavior can be completely characterized by its arguments (it doesn&#039;t access or depend on values stored in global variables or internal static buffers) and that it doesn&#039;t produce I/O side-effects or similar.

I guess my point was more to add a word of caution -- when using the FFI, make sure that you only use unsafePerformIO in situations when the function truly is pure.  If it&#039;s not pure, just don&#039;t use unsafePerformIO -- leave the value encapsulated in the IO monad, and Haskell&#039;s type system will take care of making sure that everything happens as it should.]]></description>
		<content:encoded><![CDATA[<p>Greg,</p>
<p>Reading my comment now it seems as though I was implying that in this particular function unsafePerformIO might be inappropriate: it&#8217;s not.  You are completely correct to use it here, as strtod is a pure function.  I was commenting in a more general way.</p>
<p>Haskell&#8217;s type system separates pure functions from impure ones for practical reasons: you can make assumptions about pure functions that you can&#8217;t make about impure ones, allowing optimizations and such.  I guess all I really wanted to say was that unsafePerformIO discards the information that marks the function as impure, and that&#8217;s why it&#8217;s called &#8220;unsafe&#8221;.  This doesn&#8217;t mean you shouldn&#8217;t use it, just that you should be aware that the function you&#8217;re calling really does need to be pure, in the sense that its behavior can be completely characterized by its arguments (it doesn&#8217;t access or depend on values stored in global variables or internal static buffers) and that it doesn&#8217;t produce I/O side-effects or similar.</p>
<p>I guess my point was more to add a word of caution &#8212; when using the FFI, make sure that you only use unsafePerformIO in situations when the function truly is pure.  If it&#8217;s not pure, just don&#8217;t use unsafePerformIO &#8212; leave the value encapsulated in the IO monad, and Haskell&#8217;s type system will take care of making sure that everything happens as it should.</p>
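<p>As a rough sketch of the two options, assuming a hand-written binding to strtod (the names c_strtod, parseDoubleIO and parseDouble are made up for illustration and are not from the original post): parseDoubleIO leaves the result in the IO monad, while parseDouble hides the call behind unsafePerformIO. Strictly speaking, strtod also consults the C locale and may set errno, so treating it as pure assumes that neither matters to the program.</p>
<pre><code>{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CDouble)
import Foreign.Ptr (Ptr, nullPtr)
import System.IO.Unsafe (unsafePerformIO)

-- Raw binding, declared with an IO result (the cautious choice for foreign calls).
foreign import ccall unsafe "stdlib.h strtod"
  c_strtod :: CString -&gt; Ptr CString -&gt; IO CDouble

-- Cautious version: the result stays in the IO monad.
-- Passing nullPtr as the end pointer tells strtod not to report where parsing stopped.
parseDoubleIO :: String -&gt; IO Double
parseDoubleIO s =
  withCString s $ \cs -&gt; fmap realToFrac (c_strtod cs nullPtr)

-- Pure wrapper: only justified if the call really behaves like a pure function.
parseDouble :: String -&gt; Double
parseDouble = unsafePerformIO . parseDoubleIO

main :: IO ()
main = do
  parseDoubleIO "3.25" &gt;&gt;= print
  print (parseDouble "1.5")</code></pre>
<p>If there is any doubt about the foreign function, the first form costs little: callers simply stay in IO.</p>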
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-70</link>
		<dc:creator><![CDATA[Greg]]></dc:creator>
		<pubDate>Thu, 17 Jul 2008 14:21:36 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-70</guid>
		<description><![CDATA[atp: Thanks for clarifying that.  This program is single-threaded, but I might write concurrent programs in the future.

Can you suggest an alternate approach?  What can I use instead of unsafePerformIO?


augustss: Got it.  Thanks.]]></description>
		<content:encoded><![CDATA[<p>atp: Thanks for clarifying that.  This program is single-threaded, but I might write concurrent programs in the future.</p>
<p>Can you suggest an alternate approach?  What can I use instead of unsafePerformIO?</p>
<p>augustss: Got it.  Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: augustss</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-69</link>
		<dc:creator><![CDATA[augustss]]></dc:creator>
		<pubDate>Thu, 17 Jul 2008 13:00:49 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-69</guid>
		<description><![CDATA[The documentation suggests NOINLINE because there&#039;s a theoretical chance the call could get duplicated and the IO would happen twice.  In this case that wouldn&#039;t matter for correctness since strtod really is a pure function; it&#039;s just the mechanics of calling it that are impure.]]></description>
		<content:encoded><![CDATA[<p>The documentation suggests NOINLINE because there&#8217;s a theoretical chance the call could get duplicated and the IO would happen twice.  In this case that wouldn&#8217;t matter for correctness since strtod really is a pure function; it&#8217;s just the mechanics of calling it that are impure.</p>
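<p>For contrast, a small sketch of a case where the pragma does matter, using the well-known global-variable idiom rather than anything from the post (the name counter is made up): if the unsafePerformIO call below were inlined and duplicated, each copy could allocate its own IORef.</p>
<pre><code>import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- NOINLINE asks GHC to keep one shared definition, so there is a single IORef.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

main :: IO ()
main = do
  modifyIORef counter (+ 1)
  modifyIORef counter (+ 1)
  -- Both updates are meant to hit the same IORef.
  readIORef counter &gt;&gt;= print</code></pre>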
]]></content:encoded>
	</item>
	<item>
		<title>By: atp</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-68</link>
		<dc:creator><![CDATA[atp]]></dc:creator>
		<pubDate>Thu, 17 Jul 2008 12:55:10 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-68</guid>
		<description><![CDATA[Regarding the FFI and unsafePerformIO: by default the FFI wraps everything in the IO monad, because foreign functions are not guaranteed to be side-effect free.  Using unsafePerformIO in this context is only a good idea if you know the function you&#039;re calling doesn&#039;t have any side effects.  This doesn&#039;t just mean producing I/O.  In general, any function that is not reentrant is not pure and could conceivably cause problems.  Unfortunately, some C library functions use internal static buffers as a relic of older, simpler times.  It&#039;s important to be aware of this when you use unsafePerformIO.  Although, if your application is not and will never be threaded, perhaps this isn&#039;t an issue.]]></description>
		<content:encoded><![CDATA[<p>Regarding the FFI and unsafePerformIO: by default the FFI wraps everything in the IO monad, because foreign functions are not guaranteed to be side-effect free.  Using unsafePerformIO in this context is only a good idea if you know the function you&#8217;re calling doesn&#8217;t have any side effects.  This doesn&#8217;t just mean producing I/O.  In general, any function that is not reentrant is not pure and could conceivably cause problems.  Unfortunately, some C library functions use internal static buffers as a relic of older, simpler times.  It&#8217;s important to be aware of this when you use unsafePerformIO.  Although, if your application is not and will never be threaded, perhaps this isn&#8217;t an issue.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-67</link>
		<dc:creator><![CDATA[Greg]]></dc:creator>
		<pubDate>Thu, 17 Jul 2008 11:26:11 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-67</guid>
		<description><![CDATA[Don: 

Thanks.  I made those changes, and although they didn&#039;t affect the execution time, I think they&#039;re logical.

Augustss:

I wondered that myself because the &lt;a href=&quot;http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html&quot; rel=&quot;nofollow&quot;&gt;documentation for NOINLINE&lt;/a&gt; says something like, &quot;you probably don&#039;t need this.&quot;  But then the &lt;a href=&quot;http://tinyurl.com/62qsum&quot; rel=&quot;nofollow&quot;&gt;documentation for unsafePerformIO&lt;/a&gt; suggests using that pragma.  ]]></description>
		<content:encoded><![CDATA[<p>Don: </p>
<p>Thanks.  I made those changes, and although they didn&#8217;t affect the execution time, I think they&#8217;re logical.</p>
<p>Augustss:</p>
<p>I wondered that myself because the <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html" rel="nofollow">documentation for NOINLINE</a> says something like, &#8220;you probably don&#8217;t need this.&#8221;  But then the <a href="http://tinyurl.com/62qsum" rel="nofollow">documentation for unsafePerformIO</a> suggests using that pragma.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: augustss</title>
		<link>http://techguyinmidtown.com/2008/07/16/lessons-learned-parsing-csv-in-haskell-and-python/#comment-66</link>
		<dc:creator><![CDATA[augustss]]></dc:creator>
		<pubDate>Thu, 17 Jul 2008 10:18:27 +0000</pubDate>
		<guid isPermaLink="false">http://techguyinmidtown.wordpress.com/?p=44#comment-66</guid>
		<description><![CDATA[I don&#039;t think you need that NOINLINE.]]></description>
		<content:encoded><![CDATA[<p>I don&#8217;t think you need that NOINLINE.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

