Monday, October 21, 2013

I'm moving

Hi,

Due to some irritations with Blogger, I'm moving to bitbucket + Hakyll. The new link is

jozefg.bitbucket.org

Thursday, August 8, 2013

Lisp Ran^H^H Hacking

I've recently been writing a lot of Lisp stuff to deal with AST outputs from some of my hobby compilers, and since it's Lisp, there's an inevitable amount of pain. So much, in fact, that I was planning on writing a lisp-rant blog post.

2 things made me change my mind

  1. I read the comments on Steve Yegge's "Lisp is Not an Acceptable Lisp" and decided that if there's one community not to piss off, it's the Common Lisp community.
  2. I don't think I'm qualified to rant about Lisp after 1 year of using it. Well, at least not in a meaningful, non-"CL sucks for beginners" way.
To clarify 2: it's not that I don't know CL; I've read the HyperSpec and written a lot of Lisp code in the last year. But if someone like me posted a rant on Haskell, which I know much more about, I'd be annoyed. I haven't done anything sufficiently large, or maintained anything sufficiently long, to really know what it's like to use Lisp in the wild.

This can be addressed, though. To give myself a better understanding of Lisp, I've decided to take the tools I've been using to analyze AST dumps from the CML compiler and turn them into 2 things: first, a static analyzer for CML, and second, a reasonable VM for CML.

I have hacked-up versions of these 2 things scattered throughout about 10 Lisp programs, but nothing that I could actually publish as a nice project. These 2 tools are very much needed since, frankly, my current tools are inadequate. Nothing's better motivation to finish a project than the prospect of having to read a 1MB directory full of bytecode or typing derivations if you don't.

Most importantly, this will give me some much needed experience with Lisp in the somewhat-large. I wouldn't say in the large, since these won't be 10k LOC projects, but it's a good way to solve a few non-trivial problems in Lisp. More importantly, it's code that I actually have to maintain, since there are a lot of things I want to do with CML that will need these 2 tools.

Hopefully this is at least some kind of example for people angry at Language X. Yes, it sucks sometimes, and yes, it's crufty, but how well do you really know it? Have you contributed to the compiler for X? Maintained 10k+ lines of code in X? Written a commonly used library for X? Do you really know it well enough to declare that it sucks, completely and totally?

Maybe you do, and X really does suck, but you probably don't. Give it a shot like I'm trying and, who knows, you might learn that you like it.

Here goes nothing

Tuesday, July 30, 2013

The Monomorphism Restriction

In Haskell, there's one quirk of the type inference algorithm that seems to bite people: the Monomorphism Restriction.

It stems from the fact that Haskell thinks of something like

foo = bar baz quux

as a "simple pattern binding". However, if we eta-expand and write

foo eta = bar baz quux eta

then Haskell calls it a "function binding". This distinction matters because when Haskell goes to generalize type variables, it asks the following set of questions:

  1. Is the type variable constrained? (i.e., part of a typeclass constraint)
  2. Does it appear in a simple pattern binding?
  3. Does that binding lack a type signature?
And if all of those answers are yes, Haskell doesn't generalize the variable to be polymorphic and instead leaves it monomorphic. This means that it will unify with the first concrete type it can and then refuse to unify with anything else.

This is what the monomorphism restriction is. There are very good reasons for it, but they have been detailed nicely elsewhere; I just wanted to clarify what the standard specifies, since I've found it poorly explained everywhere else.

As a simple example, consider the code below:

-- A pattern binding:
-- monomorphic, since we provide no type signature.
foo = show

-- A pattern binding:
-- polymorphic, because we have a type signature.
bar :: (Show a) => a -> String
bar = show

-- A function binding:
-- polymorphic, because it is not a pattern binding.
baz x = show x

main =
    mapM_ putStrLn
      [ foo 'a' -- Now foo's monomorphic type unifies with
                -- Char -> String;
                -- giving it anything else is an error.
      , foo 1   -- ERROR
      , bar 'a'
      , bar 1   -- Both fine, bar is polymorphic.
      , baz 'a'
      , baz 1   -- Also fine, baz is polymorphic.
      ]
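If the restriction bites you, there are two standard escapes: give the binding a type signature (as bar does above), or switch the restriction off with GHC's NoMonomorphismRestriction extension. A minimal sketch of the latter:

{-# LANGUAGE NoMonomorphismRestriction #-}

-- With the restriction disabled, foo is generalized as usual
-- and can be used at several types.
foo = show

main = mapM_ putStrLn [foo 'a', foo (1 :: Int)]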

Hopefully this helps clarify the monomorphism restriction for you.

Saturday, July 27, 2013

On Compiler Organization

I've spent the last two days restructuring my compiler for Cute ML. The new structure, in my opinion, is much more pleasant.

Saturday, July 20, 2013

On Starting

Learning to program is hard. It's an uphill battle for anyone, and as soon as a programmer achieves any level of competency, they start to form opinions on how others should learn to program. Everyone has opinions on what literature, language, editor, compiler, and hairstyle are appropriate for learning to program. After chatting with a few people today on Stack Overflow, I thought I'd write down my own set of ill-formed and half-baked thoughts on the matter.

First things first: when you set about learning to program, know what you want out of it. If you are doing this because you want to write Excel macros, then great! Go learn about VB and be productive; more power to you. Otherwise, read on.


1. It's All About High Level Concepts


If you're getting into this because you want to make a career of it, then you should approach this differently. Your goal isn't to learn the appropriate keys to punch to make the screen light up. Rather, you're aiming to learn the concepts that drive the software you'll write.

This is inherently less productive at first. If you're trying to really understand why things are the way they are, it's far more laborious than just accepting it blindly. However, it comes with the benefit that these concepts are timeless.

Computer Science has been around for 60+ years. In that time, C has gone from being high level to very low level, computers have vastly increased in speed, and we've gone from vacuum tubes to multicore machines. But in all that time, algorithmic complexity hasn't suddenly stopped mattering, and data structures haven't vanished. Sure, you may be writing your memoizing algorithm in Haskell instead of assembler, but you still have to know how to memoize!
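To make that concrete, here's one classic way to memoize in Haskell, a quick sketch using Fibonacci (the function is incidental; the sharing trick is the point):

-- Back the recursion with a lazily evaluated list so that each
-- value is computed only once and then shared.
fibs :: [Integer]
fibs = map fib [0 ..]

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibs !! (n - 1) + fibs !! (n - 2)

main :: IO ()
main = print (fib 50) -- fast, despite the naive-looking recursion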

The next intelligent question is: what concepts are important?

Abstraction is King


All these buzzwords people throw around, Object Orientation, Lazy Evaluation, Declarative Programming, Parametric Polymorphism, are all there for one reason: to think as little as possible about how things are done, instead focusing on what things are done. For example:
  • Object Orientation - Hide implementation details
  • Declarative Programming - Don't talk about implementation details
  • Lazy Evaluation - Don't talk about how and when things are evaluated
Notice the theme?
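As a tiny illustration of the what/how split, here's summing a list in Haskell (a toy sketch; the theme scales up):

-- "How": spell out the traversal by hand.
total :: [Int] -> Int
total []       = 0
total (x : xs) = x + total xs

-- "What": name the operation and let the abstraction do the work.
total' :: [Int] -> Int
total' = sum

main :: IO ()
main = print (total [1, 2, 3], total' [1, 2, 3])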

I think this is the single most important concept for new programmers to learn. Good code is all about finding the right level of abstraction for the problem.

Algorithms are Queen


In my opinion, if you're writing code that doesn't deal at all with algorithms, you're not even programming. Doing so is like saying "I'm writing without thinking about plot".

Algorithms are the substance of your programs. Learning them comes in 2 stages.
  1. Algorithmic thinking
  2. Algorithms
Algorithmic thinking is about learning how to make algorithms: stuff like dynamic programming and divide-and-conquer techniques. These techniques provide a nice foundation for how to approach problems and understand other algorithms. I've found that learning them is best done through example, which brings me to 2.

Learning specific algorithms is a tricky business: on one hand you want to learn how to apply algorithmic thinking and techniques, but on the other you don't want to memorize a compendium of algorithms. So don't go and memorize 20 sorting algorithms, but I think a few important ones are
  1. Some greedy algorithms (Dijkstra's is my favorite)
  2. Some divide and conquer (Merge sort is easy to grok)
  3. Some dynamic programming (Knapsack problem)
  4. Some graph/tree search algorithms (BFS & DFS)
And many, many more. People write lots and lots of algorithms books, and I'd encourage any new programmer to read one as a second book. Personally, I favor Skiena's Algorithm Design Manual.
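As a taste of divide and conquer from that list, here's merge sort in Haskell, a minimal sketch of the technique:

-- Merge sort: split the list in half, recursively sort each half,
-- then merge the two sorted halves back together.
mergeSort :: Ord a => [a] -> [a]
mergeSort []  = []
mergeSort [x] = [x]
mergeSort xs  = merge (mergeSort left) (mergeSort right)
  where
    (left, right) = splitAt (length xs `div` 2) xs

-- Merging two sorted lists keeps the result sorted.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

main :: IO ()
main = print (mergeSort [3, 1, 4, 1, 5, 9, 2, 6])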

Data Structures are Bedrock


Data Structures are up there with algorithms; I was actually unsure which should come first. They're bedrock in the sense that your program is fundamentally built on them. If you can't represent data well, then how can you expect to operate on it?

Learning data structures is similar to algorithms, you start with some basic ideas and learn a few examples of how to apply them.

In my opinion, the "basic ideas" of data structures are ADTs, Abstract Data Types. These are just sets of operations defined on however you choose to represent your data. The most common ones are
  1. Lists
  2. Sets
  3. Dictionaries (associative lists)
  4. Graphs
  5. Trees
So you learn what set of operations defines a list and what a graph can do, and then you learn a few implementations like
  1. Linked lists
  2. Hash tables
  3. Balanced binary trees
  4. Association tables
And so on. Much the same as with algorithms, the best way to do this is with a good book. Once again I'll recommend Skiena's.
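To make the ADT-versus-implementation split concrete, here's a toy sketch in Haskell (the Stack class and ListStack type are made up for illustration, not from any standard library):

-- An ADT is a set of operations; a data structure is one
-- particular way of implementing them.
class Stack s where
  empty :: s a
  push  :: a -> s a -> s a
  pop   :: s a -> Maybe (a, s a)

-- One implementation: a plain linked list.
newtype ListStack a = ListStack [a]

instance Stack ListStack where
  empty                    = ListStack []
  push x (ListStack xs)    = ListStack (x : xs)
  pop (ListStack [])       = Nothing
  pop (ListStack (x : xs)) = Just (x, ListStack xs)

main :: IO ()
main = case pop (push (1 :: Int) (empty :: ListStack Int)) of
  Just (x, _) -> print x
  Nothing     -> putStrLn "empty"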

2. Don't Fixate on Details

I admit it: when I first picked up programming, I focused way too much on unimportant stuff. I thought it was cool: "Oh, did you know that <insert absurdly detailed fact> was implemented in C++98 by <insert obscure compiler> and later adopted by the standard for C++03 but dropped for C++11!!" Looking back, I realize that this was a dumb thing to do.

Languages are Tools


This hurt a little to say. It's completely true, but it's still hard to accept.

In the end, the fact that you're using Python instead of Ruby couldn't matter less. That doesn't stop people (including me) from endlessly debating which is better (Haskell!1!!), but really it's like craftsmen discussing power tools. Sure, people have preferences, but when you're starting it matters very little. You'll have plenty of time to become opinionated later :)

In fact, you should plan on learning a new language after a year. And another one a year later on. And another and another. 

I personally maintain that you should be learning a new language once a year. This will make you a better programmer and cause you to focus more on the Big Issues. Plus, if you're developing software in the Real World, you'll often have to deal with new languages and technologies.

There's just no point in fussing over what language you learn.

The important thing is to not get stuck in trivia-style details. This brings me to my next point

Complicated is Bad


I learned C++ first. C++ is complicated. I had a miserable first 2 months just getting off the ground at all. I just said that all languages are equal; well, that was a bit of a fib. All simple languages are equal.

When suggesting a language for a beginner, my philosophy is: 

If a language is making you focus on the language, not abstraction, algorithms, or data structures, it's a bad language for a beginner.

Does this mean C++ is a terrible language? Not necessarily. But it means that it's a bad language to learn with. Starting with a simple, high level language is a good way to let you focus immediately on the important concepts.

After you've gotten the ball rolling, then you can take a look at more complex languages like C++. Your prior knowledge will make learning them much more manageable.

And yes, I've very deliberately avoided mentioning which high level language to start with.

Environments Are Less Than Languages


Often when I get asked how to start programming, 2 questions come up:
  1. What Language?
  2. What IDE/Compiler/OS?
I just said that the language doesn't really matter. Well, the environment matters even less.

Your IDE is a place where you type. Your compiler makes 1s and 0s for you. Your OS allocates memory and lets you play videos of kittens.

Unless you've got 10 million lines of code to crawl through, you'll be just fine with any reasonable environment. Personally, I don't even use an IDE; I just use Emacs. I fluctuate between something like 4 C compilers depending on my mood. I know people who have gone through 30 years of professional software development without using an IDE. They're just not that important.

Pick one that makes you feel comfortable, then get back to focusing on what matters. You can always switch later.

3. Dip Your Toes in Many Areas

Aside from the basics, you shouldn't stagnate in one area. As I've said a few times, any one area of computer science/programming will quickly become outdated and die off.

Just think, until you get a decent overall feel for computer science, you'll have no idea what's useful and interesting. Perhaps you'll discover a hidden talent for compiler or OS design.

Now this is the part that takes a long time, think in the order of years. The good news is that you can (and should) be writing software at this point.

Learn Different Abstractions


This is the big one: learning different abstractions is critical to being a good programmer. When all you know how to do is hammer nails, you'll have a heck of a time working with screws. Learning new abstractions will help you deal with the diverse set of problems you'll encounter.

Now there are 2 main ways to do this
  1. Learn new paradigms
  2. Learn different levels of abstraction
For 1, this means learning about functional programming, object-oriented programming, and so on. Fiddle with different paradigms for different things. Learn what logic programming is good for and when imperative programming is ill-suited.

For 2, this means trying low and high level languages. I still advise starting with a high level language, but don't skimp on low level stuff. Learning C is pretty mind-bending coming from Scheme; learning assembler is pretty mind-bending coming from Haskell. Learning abstractions at different levels means you'll develop an intuition for how some of these abstractions work, which in turn will deepen your understanding of how to use them.

Learn Different Areas of CS


Learning about different areas of CS will help in 2 ways
  1. Give you new ways to think about problems
  2. Let you discover your interests
Some of the most profound advancements come from mixing 2 divergent ideas: Fermat's Last Theorem was finally proven via a theorem about elliptic curves! CS is no different. By learning different areas, you'll be able to apply a whole different set of tools to your problems. For example, Machine Learning offers a whole different set of approaches that my lab applies to a classic Human-Computer Interaction problem. This leads to all sorts of interesting results!

Hopefully 2 is obvious: you just have no idea what you'll fall in love with until you look. I had no idea I'd like compilers so much until I tried reading a book on them.

This last sentence brings me to another key point

Read Lots


This is really the best way to learn new things. Online resources are hit and miss, but books have a much higher signal-to-noise ratio. Personally, I'm always reading a book on computer science (currently Semantics of Programming Languages).

If you're interested in the list of books I think are good, there's a link at the top of my blog. As for what book to start with, I always recommend Structure and Interpretation of Computer Programs. This one comes with the added benefit of being free online! You'd be surprised how many computer science books are free online.

Note the difference between "Computer Science" and "Programming" books: one teaches a concept or theory, the other teaches a tool. Both are important, so make sure you read a mix of each. SICP is kind of an exception in that it teaches both Scheme and basic CS, which is the primary reason I recommend it.

4. Apply Your Knowledge


This is the most obvious and arguably the most important. You have to practice to become a great programmer or computer scientist.

Pick a project that interests you, and work on it in tandem with your readings. Some projects I've done and enjoyed are
  1. Write a series of prime number calculators
  2. Write a calculator (Parser or GUI)
  3. Write a Chat Program
  4. Write a Chat Website
  5. Write a Scheme Interpreter
  6. Write a Scheme Interpreter in Your Scheme Interpreter (Repeat as often as you like)
  7. Implement a logic programming DSL
  8. Write a Garbage Collector
  9. Write an SML compiler (Current Project)
  10. Write an FTP server
Do whatever makes you happy! Scratch an itch and write something you want. Make your project free and open source so people can contribute. You can even contribute to others' projects to see how they program.

Just practice and have fun!


Good Luck!


Thursday, June 27, 2013

Proof General Tidbit

I had trouble finding documentation on this issue so I'm going to blog it in case anyone else needs it. Hopefully it'll help someone.

Working with multiple files in Coq is pretty common for any large project. Equally common is to have a couple of ML files mixed in with the Coq ones.

If you are using Proof General, turn the 'Compile Before Require' option OFF. It has a nasty habit of trying to compile ML modules with coqc.

Monday, June 24, 2013

Unix in Haskell and Ruby: 2

This is the ruby version of cat.

As I discovered, cat in Ruby is so completely trivial it barely needs an explanation:


require 'trollop'

# Parse the --numberLines flag with trollop.
opts = Trollop::options { opt :numberLines, "Number lines" }

# ARGF iterates over each file named on the command line (or stdin).
ARGF.each_line do |line|
  if opts[:numberLines]
    print "#{ARGF.lineno}  "
  end
  print line
end

This is standard Unix cat. Tada! We're using trollop for command-line parsing and then just walking through ARGF. For Perl hackers, ARGF is the equivalent of <>: it reads files named on the command line and all that.

If we want the modified version that I liked more:


require 'trollop'

opts = Trollop::options { opt :numberLines, "Number lines" }
ARGF.each_line do |line|
  if opts[:numberLines]
    print "#{ARGF.lineno}  "
    # Restart the numbering at the start of each new file.
    ARGF.lineno = 0 if ARGF.file.eof?
  end
  print line
end


Ruby is a scripting language, and it shows here. The dash of Perl sprinkled in with stuff like ARGF really makes quick scripting more convenient. Even for newcomers like me :)