Wednesday, July 4, 2012

Haskell ADT


ADT stands for Algebraic Data Type.  Name's not that important here's what it is.

data Device = USB | Bluetooth | Serial

A couple of things to pay attention to. 

"data" is Haskell's keyword for creating a new ADT.  

"Device" is the name of the new ADT.  This also shows up in type signatures, but we can tackle that later.  Also note that "Device" is capitalized.  This is a Haskell thing.  Haskell wants type names capitalized.  This convention is enforced to remove some ambiguity from Haskell's syntax, which results in programs that are less verbose.

"USB", "Bluetooth", and "Serial" are all type constructors.  These things are basically functions that are specifically created in order to bring new objects into existence.  Normally, functions are supposed to be lower case, but type constructors need to be capitalized.  Same reasoning as with the type names. 

The "|" is a bit of syntax from BNFs.  It basically means choice or alternative.  The English grammar equivalent is basically a comma.

Take all of the above together and it means that we are creating a new ADT that has a type name of "Device" and has three possible members (ie USB, Bluetooth, and Serial).

ADTs aren't really that useful without pattern matching, so let's take a quick look at that.

In Haskell pattern matching refers to function dispatch based off of the characteristics of the input parameters.  Typically types of the inputs, but also values of the inputs.  So ...

funcName 4 = 5
funcName 5 = 4
funcName x = x

So, this function returns 4 when it is called with the value 5.  5 when called with the value 4.  And finally it returns whatever value it is given when it is not 4 or 5.  You can using pattern matching on arbitrary many parameters of your function.

funcName x _ = x + 1

The "_" means, "I don't care what this is and I'm not planning on using it".  You can use underscore for multiple parameters (reusing it isn't a problem).  

There's also different ways to indicate that you want to pattern match for your function.

funcName 4 = 5
| 5 = 4
| x = x

So it's a return of the "|" from the ADT.  It means basically the same sort of thing.  This or that or that.

Okay, let's do something with the Device ADT.

funcName USB = ...
funcName Bluetooth = ...
funcName Serial = ...


This is very similar to a switch statement and enum from the C programming language.

However, ADTs can be used for a lot more than as enums.


We can also create ADTs that take parameters.

data Circle = Circle Double

Notice that the type name and the type constructor are the same.  Haskell allows this.

We can now do things like:

Circle 1.0

And then we can write functions like:

area (Circle r) = pi * r ^ 2

Notice that we're using pattern matching again, but this time we are capturing the value that is used to construct a Circle.  Underscore can also be used in this type of situation.

funcName (Circle _) = print "It's a circle"

We can also use the "type" keyword to make things a bit more readable.

type Radius = Double
data Circle = Circle Radius

It's important to keep in mind that the "type" keyword doesn't perform any type checking.  It's just telling the type checker that whenever the user uses the name "Radius" they really mean "Double".  The idea is that this makes it easier on people to understand what's going on.

Finally, let's also use the "newtype" keyword.

type Radius = Double
newtype Circle = Circle Radius

I'm actually not super clear on what the deal is with newtype.  Most documentation doesn't really talk about it.  However, it appears that newtype is only usable when you have an ADT which only has a single type constructor (so for example you would not be able to use newtype with the Device ADT that we were looking at earlier).  newtype type checks parameters just like data (so it's not an alias like type is).  The benefit over the data keyword is that newtype does not affect runtime at all.  Which sort of makes sense.  With data, the runtime environment has to keep type information around so that it knows how to choose the correct choice when someone tries to pattern match a variable.  There's not an explicit if statement floating around, but one has to exist under the covers.  On the other hand, with newtype, the type checker will make sure that the user is using variables correctly, but since there's only a single choice there's no reason to carry around any type data.

You might want to do something like this:

type Radius = Double
type Side = Double

newtype Circle = Circle Radius
newtype Square = Square Side

area (Circle r) = pi * r ^ 2
area (Square s) = s ^ 2

However, this will not work in Haskell.  The problem is that the area function requires a well formed type.  Unfortunately, the type system that Haskell uses does not support merging types like this.  However, there is a way to handle this sort of thing using type classes (which I don't want to talk about yet) or existential types or a combination of both type classes and existential types.  

Okay, so we can make ADTs that take parameters which means we can do something like this:

data IntList = Cons Int IntList | Nil

Lots of new stuff here.  The "Cons" type constructor is taking two parameters and the second parameter is of type IntList, which means this thing is recursive.

Cons 5 (Cons 4 Nil)   yields a list like (5, 4, Nil)

However, the current example means that we will need a new list for each type of thing that we want to put into a list.  Right now we can only handle Ints.  And we'll also need different functions for things like length and append, which only depend on the fact that the structure is a list and not on the internal type, because we saw earlier that Haskell only allows us to have well typed functions.

Well...

There's a logic called System F that was invented by Jean-Yves Girard.  The ideas from this logic were eventually adopted into programming languages as generics.  We are able to use System F with ADT in Haskell.

We can now write our list like:

data List a = Cons a (List a) | Nil

And we can write functions that are typed "List a" and they will work on any type of list that we create.  This is sometimes called type polymorphism.

Everything in our polymorphic List looks the same except the "a".  This is a type parameter.  Normally in books you will see these represented as greek letters, but it's almost always easier to type as actual letters.

You can have more than one type parameter.

data Either a b = Right a | Left b

One more thing.  Haskell also has support for something that is very similar to structs from C.

data Frame a = Frame { header :: [Int]
, body :: [Int]
, crc :: [Int]
, misc :: a }

"header", "body", "crc", and "misc" are all basically fields.  Really there's no difference from this ADT and one that looks like:

data Frame a = Frame [Int] [Int] [Int] a

Except the first one has named parameters.  The first one provides more readability.

All of these things can be combined.  And of course you can create one ADT and use it in other ADTs.  While some people say that type checking is all about having the compiler make sure you do not make mistakes, I like to think that types are about describing your problem (the type checking is just a bonus).  With all the ADT features here, there's quite a few problems that you'll be able to describe.  Add in the power of pattern matching and you will be able to expertly craft solutions.

No comments:

Post a Comment