What it was like writing my first "real" Haskell program

Sometimes, doing the hard thing pays off.

I spent much of my spare time in 2018 studying Haskell. There are a number of good resources, but it was the hardest self-teaching effort I've made.

Did I learn anything? Was it worth it?

To find out, I spent the first weeks of 2019 writing two variations of a "real" program, one in Python and one in Haskell. The program is a web scraper that collects artist and track information from the Kahvi Collective. No, it does not download all their music. That would be legal but really rude. (You should check out their collection though; it's great listening when you need to stay creative and focused.) I chose to write a scraper because I've written a few dozen in Python for clients, and because they cover several important areas such as data storage, network I/O, and parsing potentially messy input.

Python

For the Python version, I used Scrapy to manage scraping. Scrapy is a well-established framework. It includes classes, functions, and documented best practices for extracting and transforming scraped data, plus a custom object type to represent the results. It supports plugins to persist data, manage crawls, avoid duplicate page requests, and, oh yes, a whole lot more. I've used Scrapy on a number of projects, and it has proven effective and flexible.

The biggest source of bugs I've come across with Scrapy is parsing HTML. Typically you'll review a site and anticipate that the data will be presented with specific tags. You may collect several sample pages and build tests around them, only to find out during a live run that you missed an edge case, or that the site itself is inconsistent.

For example, if you look at the first release on Kahvi, you'll see that the artist name links to an artist page. The eighth has no such link. That changes how you need to access the artist's name, and what information can be retrieved about a given artist. Mistakes here can lead to exceptions thrown in production, or to missing data when Scrapy plows through successfully.

Overall, the fact that I cited bad input as the biggest source of trouble with Scrapy should tell you that it's a solid framework that does the job well.

Haskell

For Haskell, I used the Scalpel library to manage scraping. Scalpel is a library rather than a framework: it provides functions to retrieve HTML pages and extract tags and text, and anything beyond that (storage, parsing, transformations) is left to the programmer.
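To give a flavor, here's a hedged sketch of what Scalpel code can look like; the URL and selectors are invented, and Kahvi's real markup differs. It also shows one way to express the inconsistent-artist-link problem from the Python section: try the linked form first, then fall back to bare text, using the Scraper type's Alternative instance.

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Text.HTML.Scalpel

-- Hypothetical selectors: prefer the artist name inside a link,
-- and fall back to the cell's plain text when there is no link.
artistName :: Scraper String String
artistName =
    text ("td" @: [hasClass "artist"] // "a")
        <|> text ("td" @: [hasClass "artist"])

main :: IO ()
main = do
    result <- scrapeURL "http://www.kahvi.org/release/413" artistName
    print result  -- Just an artist name on success, Nothing otherwise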

This was my first time writing any program in Haskell. Getting it to work, at least this first time, was a challenge.

The biggest show-stopper came when I got stuck trying to work with Haskell's strong, static type system. I had a hard time correctly combining pieces of data scraped from the HTML while simultaneously addressing the possibility of data being unavailable.

If you think on that a moment, you may realize that this is the exact same problem I had with the Python scraper! The difference was that Haskell's type system simply would not let me compile until I fully addressed how to handle the Nothing result. Combining possible Nothings was tough for me to get right. I eventually e-mailed Scalpel's author to ask for a hand, and he helped me work through it. Thanks!

The first time it compiled, it worked.

One thing I've noticed with Haskell is that even though the type system has a steep learning curve, it's extremely consistent. The solution I was taught for dealing with Nothing values did not rely on Scalpel at all, and could readily be applied to almost any scenario in Haskell where one needs to build up a piece of data from multiple inputs.
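I won't claim this is letter-for-letter what I was shown, but the general shape is standard Applicative style; in this sketch, the Release type and its fields are made up for illustration. You build the value out of its Maybe pieces, and Nothing propagates on its own.

data Release = Release
    { releaseArtist :: String
    , releaseTitle  :: String
    }

-- If either scraped piece is missing, the whole Release is Nothing;
-- <$> and <*> do all the bookkeeping.
mkRelease :: Maybe String -> Maybe String -> Maybe Release
mkRelease maybeArtist maybeTitle = Release <$> maybeArtist <*> maybeTitle

So mkRelease (Just "Jim Black") Nothing evaluates to Nothing with no explicit branching, and nothing about the technique is specific to scraping.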

Summary

Python's framework was a more complete, batteries-included solution. The learning curve to start applying Python felt much less steep than Haskell's.

Haskell's libraries were more than adequate to get the job done, and less "opinionated" about how you could put the parts together. Haskell was also much less prone to unexpected run-time errors. It would be possible to compile a Haskell program that looked for an HTML comment and got Nothing, but it would be tough to trick Haskell into compiling a program that didn't have a specific, intentional way to handle the Nothing case.
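To make that concrete, here's a minimal sketch; the function and its fallback string are hypothetical. Because the comment arrives as a Maybe String, the type signature forces you to decide, right in the code, what happens when it's Nothing.

import Data.Maybe (fromMaybe)

-- You can't return the comment's text without saying what to do
-- when there is no comment; fromMaybe supplies the fallback.
describeComment :: Maybe String -> String
describeComment = fromMaybe "(no comment found)"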

Was it worth it?

Pragmatically... maybe?

It seems to me that the practice of thinking about types and separating pure functionality from side effects could be helpful and transferable when working in other languages. It's also very clear that Haskell has some real advantages in preventing runtime bugs. Both of those ought to be worth something, but I don't know how hard it would be to land a Haskell job, or whether the benefits justify a pay bump. I might not suggest learning Haskell if your biggest concerns are getting started quickly or making money, unless you know it's necessary for a particular industry. I hope to complete more Haskell projects and may have more to say about the pragmatics then.

Does fun count?

If so, then Haskell was absolutely worth it. Before I learned Haskell I'd seen the term composability tossed around, but didn't have a practical sense of it. In practice, it means being able to string together numerous operations on a single piece of data. Here's an easy snippet I found fun:

strip . takeWhile ('/' /=) . drop 1 . dropWhile (':' /=)

That's four functions strung together that take "#413: Jim Black / Elysian Underground" and return "Jim Black". Haskell uses techniques like this all over the place, not just for strings and not just for doing one thing after another. Because the language is so consistent and composable, overcoming the challenges and getting to the fun parts is a real joy.
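For completeness, here's that pipeline as a tiny self-contained program. One caveat: strip isn't in the Prelude (Data.Text provides one), so this sketch defines its own String version.

import Data.Char (isSpace)

-- A stand-in for Data.Text's strip: trim whitespace from both ends.
strip :: String -> String
strip = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- "#413: Jim Black / Elysian Underground" -> "Jim Black"
artistName :: String -> String
artistName = strip . takeWhile ('/' /=) . drop 1 . dropWhile (':' /=)

main :: IO ()
main = putStrLn (artistName "#413: Jim Black / Elysian Underground")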