Monday, November 19, 2012

Assembly

Check out the x86-64 ABI (or I guess analogous documents, assuming they exist, for other architectures) for all sorts of pertinent information on assembly coding on x86-64 chips.  Additionally, using gcc -S blah.c will generate the assembly for your program instead of an executable.  I found this very insightful.


Here's some tips.

There are two major portions to an assembly file.  The assembly directives and the assembly opcodes.  The directives tell the assembler different things that need to be done and the opcodes are what is physically run by the CPU.  So directives will include things like .long to emit integer values into your executable and .data to indicate that the values you are emitting need to be in the data section of the executable.  The opcodes of course are more of what you expect when you think about assembly.  Move, add, call, etc.  What exactly do you want the CPU doing.


By the way, the assembler directives contain a lot of very complicated features.  It actually includes a bunch of interesting options that seem like they would be very useful if you were actually programming directly in it.  Honestly, I didn't expect to see an almost fully fledged language present when I started looking through what was available.


Beware of the .align directive.  As far as I can tell the behavior of this directive is very dependent on the set of circumstances that it is used in.  Not only is it system dependent, but it is also dependent on what type of executable you are generating.  There are two different options:  .balign and .p2align.  These directives each represent the possible behaviors that you will see with .align except they always have the same behavior.  However, supposedly they are not supported on every assembler.


For calling conventions check out the ABI document.  Reverse engineering this by hand isn't too hard (just gcc -S a function with a bunch of parameters), but the ABI made things nice and explicit.  Also, notice that variable parameter functions need to have the number of floating point parameters specified by placing that number into the %rax register.  


You might notice that some assembly code uses labels like this:

leaq    wocky(%rip), %rdi

And some assembly code looks like this:

leaq    wocky, %rdi

I think this is basically equivalent as far as what's actually going to happen when you run the code.  The difference is that one of them is position independent code and the other is position dependent code.  


Dereferencing an address held in a register is done like this:

(%rbp)

If you need an offset from that address you can do something like:

-4(%rbp)


If you need to load an address to a location, use the leaq assembly instruction (load effective address).

leaq    wocky(%rip), %rdi

This is how you could implement something like the & operator (address, not bitwise and) in the C programming language.


Calling or jumping to a function pointer is done by prefixing the value with a *.  Kind of like the following:

call    *%rax
jmp    *%rax

Or if the address is somewhere on your stack:

call    *-8(%rbp)


Normally your functions will look something like this:
    
    .text
.globl wocky
wocky:
    pushq    %rbp
    movq     %rsp, %rbp
    subq       $N, %rsp
    ;;; code
    leave
    ret

"subq    $N, %rsp", is making room on the stack for the local variables inside of the wocky function.  Obviously, you don't need to do this if you don't have any local variables, but it is also unnecessary if you are optimizing your assembly to have tail calls (and you are in a function that can be optimized to be a tail call ... and actually *is* being optimized to be a tail call) or if your function doesn't call any other functions (no reason to save stack space if no one else is going to be stomping on it).



As for me, I'm probably going to play around with assembly and perhaps implement a toy language or two.  However, I'm not sure that I'm really in the mood to try to play catch up with the likes of Mike Pall (check out luajit ... it's kind of frightening what he's doing with it) or the pypy team or the JVM team.  A technology like LLVM actually sounds really attractive to me because I can get low level implementation details (they support tail calls!) on multiple platforms and the whole thing is supported by people smarter and more dedicated than me.  So I get can better, faster, more correct, without having to spend a *lot* of time focusing on hardware details.

Either way though, I feel that poking around in assembly is still useful even in our modern era of managed languages and ultra IDEs.  I think understanding how the universe in which we work functions helps us to avoid certain mistakes and fully utilise the capabilities available to us.

Thursday, November 15, 2012

Haskell Types and Type Classes (4): Misc

Here's a few extra extensions to consider.

The following are legal:


class Jabber a where
    wocky :: a -> a

instance Jabber Int where
    wocky = undefined

instance Jabber [a] where
    wocky = undefined

This one isn't:

instance Jabber [Int] where
    wocky = undefined

Unless you are using the following:

{-# LANGUAGE FlexibleInstances #-}

A similar extension is

{-# LANGUAGE TypeSynonymInstances #-}

which let's you do something like:

type Ball = [Int]

instance Jabber Ball where
    wocky = undefined

This is kind of important because String is just [Char]; you will need it if you want to specialize on String without making some sort of silly proxy type.

{-# LANGUAGE TypeSynonymInstances #-}

instance Jabber String where
    wocky = undefined

Okay, remember this guy?

data SEither a b = SLeft a | SRight b

class Jabber a where
    wocky :: a b -> a b

instance Jabber (SEither e) where
    wocky = undefined

Well, you're going to need FlexibleInstances in order to do something like the following:

instance Jabber (SEither Int) where
    wocky = undefined

and also for the more exotic form:

data SMaybe a = SJust a | SNothing

instance Jabber (SEither (SMaybe a)) where
    wocky = undefined

Finally, last time I mentioned functional dependencies.  There's an alternative to them called Type Families.  I'm not going to go into very much detail about them because I haven't fully wrapped my head around them yet.  However, if you looked at functional dependencies and thought, "yeah that's the functionality that I want, but I really don't agree with the philosophy", then check out type families.  They seem similar to generalized algebraic data types (GADTs) at least philosophically, so if you like GADTs they might be the thing for you.

Wednesday, November 7, 2012

Haskell Types and Type Classes (3): Multiple Parameters and Dependencies


{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}

Again this stuff isn't completely based off of the Haskell spec.  It's some stuff that was thrown into GHC and other Haskell implementations.

Multiple parameter type classes *look* similar to higher kinded type classes, but they are actually quite different.

class Jabber a where    -- a has the kind * -> *
    wocky :: a b -> a c

class Jabber a b where   -- a and b both have the kind *
    wocky :: a -> b

They kind of look the same ... they both have a bunch of letters that show up all over the place.  But once you have a good feeling for either parametric polymorphism and kinds or Haskell's type class system, you'll notice that they are doing different things.

And of course you can go crazy with the kinds of the variables in a multi param type class.

class Jabber a b where  -- a and b have kind * -> *
    wocky :: a c -> b c 

class Jabber a b where -- a has kind * -> * and b has kind *
    wocky :: a b -> a c

class Jabber a b c where  -- three type variables all with kind *
    wocky :: a -> b -> c

This is basically the power of function overloading that you see in languages like C++, C#, etc.  But of course there's still a lot of weird thing that you can't quite do, even though they make sense, *AND* then there's a bunch of compiler flags that allow you to do those things that made sense but you couldn't do.  I'm not even sure I really want to go into all of it.  I've browsed through the options before ... and some of them seem pretty straight forward, but on the other hand I can't make sense out of the others.  I suspect that if you just blaze forward with reckless disregard, you'll be able to pick up what you need to know as you find out you really need to know it.  As for me, I suspect I'll be back to figure it out once I run out of other things to figure out ... but for now I'm not sure I need to know the difference between overlapping, incoherent, and undecidable instances (although right after saying something like that is normally when you find out how important that stuff you rationalized as irrelevant actually turns out to be).

One additional compiler extension that I found myself spending the time to understand was functional dependencies.  They look kind of like this:

class Jabber a b | a -> b where
    wocky :: a -> b

So what is that horrible "|" all about and why would anyone care about it?  Well, once you can have multiple type parameters in a type class you get some funny possible outcomes.  Namely something like this can happen:

class Jabber a b where
    wocky :: a -> b


data Ack = Ack
    deriving Show
data Ball = Ball 
    deriving Show

instance Jabber Ack Ball where
    wocky a = Ball 

instance Jabber Ack Ack where
    wocky a = Ack 

It doesn't look so bad.  And in fact it can compile.  But if you run some code like:

wocky Ack

You will get a runtime error.  Which is kind of weird to think about Haskell doing, but it is a possibility in some cases.  The problem is that Haskell is unable to decide which instance of Jabber to use.  Notice that there is actually return type function overloading going on here (something that you do not see in languages like C++ or C# ... at least not yet).  Which means that if the compiler can tell what the result of "wocky Ack" is being stored in it would be able to select the correct function to call.

(wocky Ack) :: Ball

Ah, that's better.  No more runtime error.  However, perhaps you want to be able to prevent this kind of thing from happening at compile time.  After all, maybe the two instances are created in completely different files and a completely innocent accident is about cause your deployed *Haskell* code to produce a runtime error!!!

Well that's functional dependencies.  The following code refuses to compile.

class Jabber a b | a -> b where
    wocky :: a -> b

instance Jabber Ack Ball where
    wocky = undefined

instance Jabber Ack Ack where
    wocky = undefined


So the functional dependencies is basically a constraint on what instances are allowed to be defined for a given type class.  More or less it's preventing conflicting or redundant instances.

By the way, this shows up in some of the monads out there.  That doesn't mean you need to learn about it now, but keep it in mind for when you're looking at a monad definition and you see a bunch of "|"s popping up in disturbing places.