Friday, December 27, 2013

Figuring out how Category Theory works

People occasionally ask me what I do.  Kind of a weird question really.  Like, I'm pretty big on breathing.  I try to keep on doing it *all* the time.  So statistically speaking the best answer should be something about that.  But that's not what they mean when they ask me what I do.  They want to know about my job.  Again, this seems kind of like a strange question to me, but perhaps it's just a safe question.  I imagine most people have something to say about their job, so people have learned that if you have to engage in conversation with a stranger, you might try asking them what they do.

When I answer this question, I tell a story about developing desktop applications that communicate with and process data from a wide range of devices, typically medical devices.  This provides a satisfactory direction for the conversation to travel in.  Or at least it seems to.

But this isn't really representative of what I do.  It answers the intended question, but ignores the literal question.  What I do is figure out how things work.

I've spent quite a bit of time figuring out how programming languages work.  And I'm not just talking about compilers.  I've also been very interested in understanding the meaning behind different programming language constructs.  A great example is the various flavors of macros from various flavors of Lisp.  It's not entirely obvious what a macro is, and at the same time there is no shortage of talk concerning "macros" when you start to look into them.  It takes a while to sort everything out when your upbringing involves a bunch of plain C with a helping of "don't use macros; they're bad for you."

Anyway, eventually I bumped into Haskell and at this point monads and category theory reared its … well I wouldn't say "ugly", but definitely mysterious and mostly incomprehensible head.  If you've been present in the online, software engineering, y-combinator-ish [1] crowd then by now you've figured out what kind of blog post this is.  This is a "monads are like …" article.  Don't panic though, I suspect things will be a little bit different here.

Monads are like abstract mathematical objects with some properties

Okay, so the heading here isn't quite accurate.  Monads aren't *like* abstract mathematical objects with some properties; monads *are* abstract mathematical objects with some properties.

Here's the thing.  I noticed at some point that mathematicians don't seem to be interested in the same things that software engineers are interested in.  This of course makes sense because if it wasn't true then we wouldn't need different words for mathematicians and software engineers; they would be the same profession.  

Mathematicians seem very interested in taking abstract objects, describing their properties, and then proving different things that you can do with these objects.  Software engineers are interested in generating code that solves problems.

With this in mind, consider where Category theory comes from.  It starts with abstract algebra[2].  Abstract algebra is more or less the properties you see in number systems, but you break things down to be simpler or you generalize to handle more cases.  The result is a bunch of abstract objects that have certain properties.  Move forward by proving new statements.  Kind of an exploration of things that don't concretely exist.  Group theory and ring theory both seem to be rather big.  The extended study of two different kinds of abstract objects.

Next check out algebraic topology[3].  I'm not entirely sure how to introduce this field.  It's kind of like the algebraic study of shapes, but the shape part doesn't matter so much … at least how muggles normally think of the concept of shapes.  Anyway, there's a bunch of fields within algebraic topology.  Knot theory, homotopy, low dimensional algebraic topology, etc.  There's plenty to study.

This is all the origin of category theory[4].  If you don't really want to investigate category theory any further, then perhaps you don't need to think much about group theory or knot theory.  But if you do check out category theory in more depth (like maybe by reading this book [5]), then you're going to find a *lot* of references to these fields (and others not mentioned here).

Category theory deals with abstracting patterns found in other mathematical fields.  You'll see a lot of concepts concerning groups and rings and sets, but ultimately I believe the motivation was to examine different concepts in algebraic topology.  And in the end category theory starts examining itself as an object and things get kind of interesting.

Which brings us back nearly to where we started.  Mathematicians have their own job and it's not the same job as programmers or software engineers or computer scientists.  So what does category theory (and by extension monads) have to do with writing software?  Well for the most part it doesn't.

At least until you take a look at type theory.

Types are Categories

It turns out that if you have a type system like Hindley-Milner[6], or perhaps more formally System F [7] (although I think anything on the Lambda Cube works out [8]), then the types and the functions between them fit the definition of a category in Category Theory.

This means that theorems that you can work out in Category theory hold true for certain programming constructs when the proper type system is attached.  

So technically you don't need Category theory.  There's more than one way to prove that something is true.  Most computer programs don't need the full generality that Category theory provides.  You really only need to prove that your function works correctly for a relatively small number of possible inputs (at least most of the time).  But if you do have a proof in Category theory and your type system qualifies, then why not use the results.

Monads themselves are useful not because they are *like* any particular thing, but because there's a bunch of proofs associated with them.  So if you follow all of the axioms associated with a monad, then you get all of the assurances of one for free.  And for a pure functional language like Haskell[9], you can use monads to represent effects (like IO or changing state) and your type system doesn't go crazy.  More specifically, I think you're able to use results from Kleisli Categories [10] (which you can use to get the corresponding monad) to ensure that functions that go from A to B to C still type check when they instead go from IO A to IO B to IO C.  And in this instance IO <whatever> just means a function that has an IO effect embedded inside of it somewhere and it returns something of type <whatever>.
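
Here's a minimal sketch of that last bit in Haskell (the function names are made up for illustration; nothing beyond the base libraries is assumed).  The operator (>=>) from Control.Monad is exactly Kleisli composition, and it lets steps of type A -> IO B compose the same way ordinary A -> B functions would:

    import Control.Monad ((>=>))

    -- Three effectful steps: each one is A -> IO B rather than A -> B.
    parseNumber :: String -> IO Int
    parseNumber s = return (read s)

    doubleIt :: Int -> IO Int
    doubleIt n = do
      putStrLn ("doubling " ++ show n)
      return (n * 2)

    describe :: Int -> IO String
    describe n = return ("result: " ++ show n)

    -- Kleisli composition glues the steps together just like (.) would
    -- glue String -> Int, Int -> Int, and Int -> String, and it still
    -- type checks even though every step carries an IO effect.
    pipeline :: String -> IO String
    pipeline = parseNumber >=> doubleIt >=> describe

    main :: IO ()
    main = pipeline "21" >>= putStrLn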

Is that all?

Okay, so you can use Category theory to make sure your Haskell type system works correctly.  If that's it, then can we just ignore it?  Like, assuming we stay away from Haskell.

Well, for the most part, sure why not.  But unfortunately, you have to watch out for two things.  The first one is relatively simple.  You have to be able to refute people who try to sell you snake oil because it runs on the power of monads and categories and complicated math stuff.  The second is a little more complicated.

It turns out people are doing crazy things with category theory.  Like, they're getting Haskell to go faster than C [11].  This is kind of a big deal because common sense tells you that a high level language is easier to program in, but slower than a low level language.

I *think* the reason that it works in Haskell is because someone figured out that you can get some surprisingly powerful computations to fit into a sub-Turing-complete system by modeling your recursive data structures non-recursively using F-Algebras [12].  Then you feed the whole thing to your processor's SSE [13] instructions.
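
To be clear, this is not the machinery from that paper; it's just the textbook F-Algebra encoding of lists, to give a flavor of what "modeling a recursive data structure non-recursively" means:

    -- The textbook F-Algebra encoding of lists: Fix ties the knot, and
    -- ListF is a single, non-recursive "layer" of list structure.
    newtype Fix f = Fix (f (Fix f))

    data ListF a r = NilF | ConsF a r

    instance Functor (ListF a) where
      fmap _ NilF        = NilF
      fmap g (ConsF a r) = ConsF a (g r)

    -- A catamorphism folds any Fix f given one non-recursive algebra.
    cata :: Functor f => (f b -> b) -> Fix f -> b
    cata alg (Fix x) = alg (fmap (cata alg) x)

    -- Summing a list is then just an algebra; the recursion lives
    -- entirely inside cata.
    sumAlg :: ListF Int Int -> Int
    sumAlg NilF        = 0
    sumAlg (ConsF a r) = a + r

    sumList :: Fix (ListF Int) -> Int
    sumList = cata sumAlg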

Again, you could probably make this work without the Category theory proofs, but it is a bit suspicious that the guys using Category theory got there first.

The downside is that you can't just assume everyone using Category theory to advertise their product is a crank.  The upside is that you don't have to think of monads as shoeboxes anymore.  






[5] - Categories for the Working Mathematician; ISBN 978-1-4419-3123-8







[11] - Exploiting Vector Instructions with Generalized Stream Fusion




Thursday, November 7, 2013

Mathematical notation

One problem that I've had with mathematics is trying to figure out what all of the jargon and notation means.  You might be able to brush it off as merely shorthand, but that doesn't quite do it justice.  I don't think I'm ready to talk about math notation and vocabulary at length, but I did think of an interesting idea last week that I feel at least highlights why notation is useful.

Exponents

Exponents are straightforward enough; it's just a notation for repeated multiplication.

10^3 = 10 ⋅ 10 ⋅ 10 = 1000

The rules you can use to manipulate exponents are also pretty simple.

x^y ⋅ x^z = x^(y+z)
(x^y)^z = x^(y ⋅ z)

So given an exponent we can continue to manipulate it as a number.  It doesn't become some magical box that can no longer interact with the rest of mathematics.  You can just treat it like a normal number for the most part.
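
For instance, 10^2 ⋅ 10^3 = 10^(2+3) = 10^5 = 100,000, and (10^2)^3 = 10^(2 ⋅ 3) = 10^6 = 1,000,000.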

Big Exponents

Now consider the following number:  10^100

Let's try writing out this number by hand.

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
So this is pretty easy.  Although a bit cumbersome.  One followed by one hundred zeros.  It's very convenient to use 10^100 instead of the monstrosity above, but I think that I could get along without using the exponent notation if I really had to.

This number actually has a special name:  Googol.

Bigger Exponents 

Before we go even bigger, let's consider the number of atoms in the universe.  Supposedly, there are 10^80 atoms in the universe.  I didn't research this number very carefully, it's just what I saw repeated once or twice with some internet searches.  The number itself doesn't matter too much; it's the principle I'm going for.  So for now let's just assume that 10^80 is about right.

I'm going to stack exponents now.  It might look scary, but it's pretty simple when you think about it.  It's just giving our exponent an exponent.

10^(10^100)

Like the number before, this number also has a special name:  Googolplex.

So this might look similar to the rule given earlier ( (x^y)^z = x^(y ⋅ z) ), but it's not.  The rule earlier was just decomposing (or recomposing as the case may be) factors of your exponent.  This number is an exponent that is represented as another exponent.  So what does something like this look like?  Before I was able to write out the actual number, but how do you write out the actual number when your exponent has an exponent?  I already wrote out the exponent 10^100 earlier, and this time that's what our exponent is.  So I'll start by expanding the exponent part.

10^10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Okay.  Again it would be kind of a pain if we had to write it this way all the time.  But not too terrible I guess.  So if googol is just a one followed by one hundred zeros (ten to the power of one hundred), then a googolplex would be a one followed by a googol zeros (ten to the power of googol).  Let's start writing then.

Or maybe not.

Earlier I mentioned that there are only 10^80 atoms in the universe.  That's significantly less than a googol.  And that means that if I wrote one zero on every atom in the universe in order to write out googolplex with all of the zeros that it should have (i.e. a googol zeros), I would need more atoms than there are in the universe.

This is one of the powers that mathematical notation gives us.  Exponents allow us to manipulate numbers which would otherwise be too large to write down even if we had the entire universe at our disposal.

Monday, October 14, 2013

Programming Language Feature Diagrams

When I first started reading about different programming languages, I tried to understand the features that differentiated them as if they were atomic and absolute in nature.  So imagine perhaps magic spells from Dungeons and Dragons or maybe Go Directly To Jail from Monopoly.  If you go directly to jail, you're not asking how or what is the mechanism that propels you to jail.  You simply move your game piece to the jail location and there you remain.  

However at some point this was no longer sufficient for my curiosity.  I started to break apart the programming language features themselves, and began to ask myself how these individual features worked.

Ultimately, everything that is encoded in a program is converted into assembly instructions.  There are instructions for modifying memory locations, moving the values in memory locations, storing values in memory locations, retrieving values from memory locations, jumping from one place in the list of instructions to another, etc.  While it is important to have an idea about what the assembly code looks like, I think for most features you encounter you can imagine things at a slightly more abstract level.

You can sort of imagine functions themselves as boxes that contain assembly instructions.  When you get to a function call, you traverse to a new box.  Once that function call is done, you return to the box that you came from.   So you kind of get a tree structure showing up.




Executing this program would look kind of like a depth first traversal.
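
To make that concrete, here's a tiny made-up Haskell program (all the names are hypothetical) whose call structure forms exactly that kind of tree; running it prints the boxes in the order they are entered, which is a depth first traversal:

    -- main calls f and g; f calls h and i.  The call structure is a tree,
    -- and the printed order (main, f, h, i, g) is its depth first traversal.
    h :: IO ()
    h = putStrLn "h"

    i :: IO ()
    i = putStrLn "i"

    f :: IO ()
    f = do
      putStrLn "f"
      h
      i

    g :: IO ()
    g = putStrLn "g"

    main :: IO ()
    main = do
      putStrLn "main"
      f
      g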


I'm not sure it's worth the time to go through every possible feature and try to show a diagram for it (not to mention some things like call/cc, lisp macros, and compile time type checking don't really fit very neatly into this view), but here are a couple of thought provoking examples.

Threads

When a function creates a new thread, you end up with an additional stack in your program.  Time tends to progress as you would expect it[1] in each thread, but because the operating system is managing when any given thread is running (not to mention the existence of multicore computers) you don't really know where any given point in one stack relates to a point in another stack.  Without the existence of global memory addresses or a shared pointer in memory[2], these two stacks can't communicate with each other.  Because of the ubiquitous nature of multicore computers and the importance of distributed computing, new languages are starting to put a lot of thought into providing not only synchronizing and locking primitives (which allow correct communication via shared state) but also primitives that abstract away the entire thread creation and communication process altogether.

[1] - Actually, both the compiler and the CPU reorder operations.  So time doesn't quite progress as you would expect, but it should progress in a manner which is mathematically identical to the way that you expect it to.  Things get tricky when threads are thrown into the mix, so you get to enter the dark world of atomic operations and memory fences.

[2] - Or if you're absolutely crazy, I suppose you could try to guess the location of the other stacks in memory and start dereferencing pointers.  I imagine there's probably *someone* who makes good usage of this technique, but I doubt that it's something that would lead to code that is at all comprehensible.
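
As a minimal sketch of the shared-state flavor of this (Haskell's forkIO and an MVar, nothing fancier assumed): the second stack hands a value back to the first one through a synchronized slot, so neither stack has to guess where the other one is.

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

    main :: IO ()
    main = do
      box <- newEmptyMVar                -- a shared, synchronized slot
      _ <- forkIO $ do                   -- a second stack starts here
        putStrLn "worker: computing"
        putMVar box (6 * 7)              -- hand the result to the other thread
      answer <- takeMVar box             -- main blocks until the value arrives
      putStrLn ("main: got " ++ show answer)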

Global State

This one is a lot more straightforward than the threads.  At any point in time a function can set or get a value from global memory.  This state can be accessed again by the function itself or by another function in the future.  This can lead to programs which are difficult to understand especially as the number of connections made between functions via global memory increases[1].  But properly managed (like by the compiler) or used sparingly, this technique can make a lot of programs much easier to write.

[1] - The complexity that can arise from mutable global state is actually one of the motivators behind the recent interest in pure functional programming languages.  In a system like Lambda Calculus, all functions work locally or they don't work at all.  I've found this to be a very powerful way to carefully analyze the problem at hand to determine exactly how it functions.  You gain a lot more freedom with how you are able to build your program when you have separated every aspect of the problem.
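
For a deliberately crude sketch: in C this would just be a file-scope variable, while in Haskell you have to reach for the well-known (and discouraged) unsafePerformIO trick to get the same effect.

    import Data.IORef (IORef, newIORef, readIORef, modifyIORef)
    import System.IO.Unsafe (unsafePerformIO)

    -- A global, mutable cell.  The NOINLINE pragma is part of the standard
    -- idiom: without it the cell could get duplicated by the compiler.
    counter :: IORef Int
    counter = unsafePerformIO (newIORef 0)
    {-# NOINLINE counter #-}

    bump :: IO ()
    bump = modifyIORef counter (+ 1)

    report :: IO ()
    report = readIORef counter >>= print   -- sees whatever bump left behind

    main :: IO ()
    main = do
      bump
      bump
      report                               -- prints 2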

Callbacks

I'm not sure that there is much to say about this one.  A callback is relatively simple.  You take a function (which may call other functions) and pass it as a parameter to other functions.  These functions can call the callback and the effect is very much like inserting a new function diagram at the call point of the callback.
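
In a language with first-class functions this is about as small as it gets (made-up names, Haskell):

    -- withEach takes a callback and invokes it once per element; the effect
    -- is as if the callback's body were spliced in at each call point.
    withEach :: [Int] -> (Int -> IO ()) -> IO ()
    withEach xs callback = mapM_ callback xs

    main :: IO ()
    main = withEach [1, 2, 3] (\x -> putStrLn ("got " ++ show x))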

This can get more complicated when you combine callbacks, objects, and threads.  Diagraming out your program's structure in these cases can help you understand bizarre behavior that you might encounter.  Although, as I said concerning threads, languages are beginning to add features that help the programmer manage these situations.

Exceptions

I'm not really a fan of exceptions.  I do have to admit that they can make the structure of your types a lot cleaner in certain situations[1].  However, exceptions really are just dynamic upward stack communication channels; unfortunately most people do not seem to think of them that way.

When a function returns normally, you end up at the location of the function call in the calling function and continue normal code execution.  But when a function throws an exception, you end up at the location of the function call in the calling function and then look for and execute an exception handler (typically called catching an exception) before continuing normal code execution.  If no exception handler is located or if the proper handler does not exist, then the exception goes to that function's calling function.  This continues until either the exception is handled or the program runs out of calling functions.

For simple programs like the example above, it is easy to determine that all exceptions will be handled appropriately.  However, with more dynamic features like objects, interfaces, callbacks, threads, etc, it can be very difficult to determine if all exceptions are being correctly handled.  This can lead to the program crashing unexpectedly.

My theory is that if you think of an exception as a method of dynamically communicating with the functions above you (dynamic because you don't have to know which function called you in order to try to communicate with it), then you are more likely to have all possible calling functions properly equipped with exception handlers.  On the other hand, if you consider exceptions something that should be used in "exceptional" situations or something to be used when problems show up, then I suspect that you're more likely to forget to handle all of the error cases in the appropriate places.

[1] -  For example, division fails when you try to divide any number by zero.  Instead of giving division a weird type that indicates that some failure may occur (or proving that it is never passed a zero) maybe it's better to just let an exception be thrown.
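
Here's the upward communication channel as a minimal Haskell sketch (Control.Exception, made-up names): bottom speaks, middle never mentions the exception and just lets it pass through, and top is the one listening.

    import Control.Exception (Exception, throwIO, catch)

    data Complaint = Complaint String deriving Show
    instance Exception Complaint

    bottom :: IO Int
    bottom = throwIO (Complaint "no value here")   -- speak upward

    middle :: IO Int
    middle = do
      n <- bottom      -- no handler here: the message keeps travelling up
      return (n + 1)

    top :: IO ()
    top = (middle >>= print) `catch` handler
      where handler (Complaint msg) = putStrLn ("handled: " ++ msg)

    main :: IO ()
    main = top         -- prints "handled: no value here"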

Combination

Of course in most programs of any complexity, you're not going to only see any single feature used at a time.  You're likely to see many of them used at the same time.  This can lead to complicated programs, which are very difficult to understand and maintain.  Perhaps this illustration helps explain some buggy programs you've used in the past.

I mentioned that the complexity of global state (and more generally mutable state) is something that has caused interest in functional programming.  When you eliminate all of the features that I have outlined here, you gain the knowledge that the entire program can be drawn as a relatively simple tree like the one I showed at the very beginning of this blog post.  

Personally, I don't think I'm ever going to be the same now that I have learned about lambda calculus, type theory, and the different mathematical properties of functions (reflexivity, associativity, injectivity, etc).  These are some very powerful tools for understanding problems, and they have allowed me to drill a lot deeper than I ever would have been able to on my own.

But, while they do allow you to create some very comprehensible programs, it also takes a lot of work to restructure the combined diagram above into a functionally equivalent simple tree structure.  So while it would be nice to always do everything in an incredibly disciplined manner, the constraints of real life do have to be considered as well.  And I'm not just talking about time and expertise.  Even if you had sufficient time and everyone was an expert with the more mathematical tools, the functional route still requires quite a bit of mental effort.  Some problems are just very difficult to simplify, and perhaps it is better to throw an exception and move on with your life.


Wednesday, August 21, 2013

Communication of the Paradox Parsers

I Know, Right ?

A difficult aspect of communication is that if you really understand a thing, then it is trivial to you.  So why can't all those other people get with the program.  And on the other hand.  If you don't really understand a thing, then maybe you're the one causing the difficulty by expressing ideas which in reality do not make any sense.

I think that I have a pretty good sense of when I don't fully understand something.  Although really, that statement needs about a billion caveats.  For starters, "I know what I don't know", or perhaps more accurately, "I know when I don't know", is really  way too close of a cousin to, "I'll tell *you* about peanut butter!".  You really have to be careful when you start down the path of, "I know that I'm right!".  Because as soon as you encounter a scenario when you're not right, it can easily lead to a disaster.  Reality doesn't care that you think you're right.  

So to be more exact.  I really like to understand how things work.  And I really like to trace my understanding as deeply as I can manage.  And it bothers me on some sort of bizarre semi emotional level when I don't understand "sufficiently" why a given thing is operating in the way that it is.

This bizarre discomfort that I feel about my lack of understanding is a great tool for understanding when I'm missing something potentially important.  So as long as I'm listening to my own internal discomfort and lack of understanding alarm, I have a pretty good sense for when I don't understand what's going on.  At least I suspect that's the case … when you don't know what you don't know it's hard to know when you don't really know something.

If you want out then you'll have to develop those communication skills

So, I've got a skill that helps me to identify things that I don't fully understand.  This comes in handy with communication because I'm always on the lookout for situations where things just aren't making any sense.  So, I'm aware when there's a person who isn't following along in a conversation.  When I'm more of an observer to the conversation, I find it really easy to decide who it is that's missing the big picture.  I can then offer clarifying statements to the clueless or ask stupid questions to get those in the know to offer additional insight.  Of course doing this can jeopardize my observer status, so it does sometimes take a bit of finesse.

If I'm highly active in a conversation and it seems like sense has decided to go on a lunch break, then I have to start running experiments.  Either I'm the person who doesn't understand or one or more of the other active participants are the ones who don't understand.  Bonus points if there's more than one type of misunderstanding going around.

But I didn't really want to talk about communication.  Even though it is a very interesting problem to solve.  I was interested in talking about a couple of things that don't make any sense.

Parsing!

I've had a terrible time with parsing algorithms.  I find the basic idea of converting a stream of symbols into a data structure absolutely fascinating.  But often the algorithms for achieving this are hard for me to really get my head wrapped around.  

I think my blog entries about Cognitive Complexity Classes (CCC)[1] and Problem Shapes[2] actually shed some light here.

First of all, let's talk about the stuff that I don't understand.  Parsing is more complex than just converting a stream of symbols into a data structure.  Different streams of symbols have different implicit structures.  The main ones that people talk about are the different types of grammars that languages fall into, and this is really talking about formal languages (i.e. programming languages) as opposed to natural languages (i.e. French).  Context sensitive, context free, and regular grammars.  Context sensitive grammars are some of the hardest to parse and they require some very powerful algorithms.  Context free grammars are easier and regular grammars are easier still; both can be parsed with increasingly simple algorithms.  And this is only scratching the surface.  There's LL and LR grammars.  That is, the sets of language grammars which can be parsed by LL algorithms and/or LR algorithms.  Then there's LL(1) grammars.  And even more sophisticated algorithms give rise to even more complex sets of language grammars.  And to make things even more complicated, these algorithms aren't just about the power of the grammar they can parse.  They are also concerned with parsing efficiency in space and/or time.

So when you see a string of characters, there's actually a lot of extra hidden structure present that you don't really understand.  Or at the very least *I'm* not happy with my level of understanding.  I get the basic gist of Context Sensitive vs Context Free, but I don't get it sufficiently to look at some example strings and have a good intuition for which type of grammar the language falls into.  And things get more complicated.  If simple things like context baffle my intuition, then it's probably going to be a while before I figure out how to spot GLR grammars.
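
(The textbook example, for what it's worth: strings of the form a^n b^n, some number of a's followed by the same number of b's, form a context free language, generated by the grammar S → a S b | ε, but not a regular one, because a finite automaton has no way to count the a's.  Add a matching block of c's to get a^n b^n c^n and no context free grammar suffices either; that language is context sensitive.)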

The whole CCC and Problem Shapes thing becomes useful in describing why this is all a complicated problem.  

I'm not entirely sure what shape parsing happens to be.  Something tree or graph shaped with backtracking.  What shape is constraint propagation and/or unification?  Anyway, so let's assume that parsing (all parsing) has some sort of known shape.  Then add to the mix different power levels of grammars (regular, context free, etc).  Then add to the mix efficient time and space constraints.  So we begin with the already complex shape of parsing.  Then we add a bunch of very complex shapes to get different characteristics that make parsing practical.  And finally, we squash the whole thing into a table and implement it in a for loop.

So of course parsing is hard to *really* understand.

Although, I didn't really want to talk about parsing.  Well … not computer language parsing.  I actually wanted to talk about math.  And more specifically, why is it so hard to understand mathematical statements.

… math blah blah blah or how paradoxically paradox aversion produces paradoxes 

I was watching a bunch of group theory videos[3] yesterday.  And I noticed that some of the picture examples and intuition behind the presentation were pretty simple and straightforward.  But the precise definitions were hard to wade through even with the intuition.

This got me thinking about parsing.  How I can't quite seem to feel comfortable with grammars and algorithms even though I've put a lot of work into it.  The answer of course is that I still don't have a good feel for the "true shape" of parsing.  Let alone all of the extra complexities that go into making it practical.  

But what I had missed until now about mathematics is that the English statements aren't really English.  And I'm not talking about jargon.  To be sure there is a bunch of math jargon that takes a while to understand, but I don't think the hard part is the jargon.  The hard part is that, just like with the constrained parsing grammars and algorithms, mathematical statements are a constrained form of English.  With the parsing we constrain the grammars in order to be able to write algorithms that halt, fit into memory, and run in a reasonable amount of time.  With mathematics there is a different goal in mind.

Charles H. Bennett wrote a pretty interesting paper titled, "On Random and Hard-to-Describe Numbers"[4].  It contains two essays, and the first essay discusses the paradox, "the first number not namable in under ten words".  This is of course nine words, which means that the first number not namable in under ten words (1,101,121) is in fact namable with nine words (i.e. the above quotation).

Similarly, consider Russell's paradox[5] (also mentioned in Bennett's paper).  

The point is that a full natural language contains facilities that allow you to make statements which do not actually make any sense.  Even mathematical languages like naive set theory allow for statements which don't make sense.

Which is why I think mathematical statements are so hard to make sense of.  It's because a great deal of work was done in order to make sure that the statement made sense.








Thursday, August 15, 2013

Cognitive Complexity Classes: Shapes

In order to expound upon the different types of Cognitive Complexity Classes (CCC), further attention must be applied to the types of shapes which algorithms may be composed of.  This is unfortunate because my current concept of a "shape" of an algorithm is overly vague.  So, to be able to focus on what makes any given CCC different from any other given CCC (or for that matter what makes any two CCCs the same … or even what types of sameness are needed to support this system), I first need to flesh out the idea of algorithmic shape.

Shapes on the cave wall

So my initial description of an algorithm shape invoked the imagery of the input and output of a function.  While that might have been a good way to break the ice, that description is hardly ideal.  After all, if you have a function that takes a graph and then returns the exact same graph, it would be intellectually dishonest to describe it as having anything other than a scalar shape.  Similarly, if you have a TreeMap function that maps all the elements in a tree using a given function and then returns the same tree with different elements, that seems a lot like a List shape.  You are doing operations on trees, but the operation itself is very linear in nature.  Given a function that traverses an entire tree and a function that traverses an entire list, the Map function is nearly identical between trees and lists (or for that matter scalars and graphs).
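
A quick Haskell sketch of what I mean (standard definitions, nothing exotic): the tree version walks a tree, but the work it does is just as listy as the list version.

    data Tree a = Leaf | Node (Tree a) a (Tree a)

    -- Structurally this visits a tree, but the "shape" of the work is listy:
    -- touch every element once, apply f, leave the structure alone.
    treeMap :: (a -> b) -> Tree a -> Tree b
    treeMap _ Leaf         = Leaf
    treeMap f (Node l x r) = Node (treeMap f l) (f x) (treeMap f r)

    listMap :: (a -> b) -> [a] -> [b]
    listMap _ []       = []
    listMap f (x : xs) = f x : listMap f xs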

Comparison sorts are a pretty good class of algorithms to consider.  If you create a Merge sort algorithm that recursively breaks apart the list you want to sort and then glues it back together in the desired order, then you will end up with a tree shape across time.  On the other hand if you create a Heap data structure and then insert and remove all of the elements in your list, then you will end up with a tree shape across space.  This leads me to believe that comparison sort algorithms have a tree shape to them.
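
Here's the merge sort half of that as a small Haskell sketch; the split-then-glue recursion is the tree shape across time I'm talking about.

    -- Merge sort: recursively split the list, then glue the pieces back
    -- together in order.  Tracing the calls over time draws a tree.
    mergeSort :: Ord a => [a] -> [a]
    mergeSort []  = []
    mergeSort [x] = [x]
    mergeSort xs  = merge (mergeSort left) (mergeSort right)
      where
        (left, right) = splitAt (length xs `div` 2) xs
        merge [] bs = bs
        merge as [] = as
        merge (a : as) (b : bs)
          | a <= b    = a : merge as (b : bs)
          | otherwise = b : merge (a : as) bs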

There are two interesting concepts that emerge from algorithmic shape that I want to discuss.  The first one is that different programming languages have different "dimensions", which are determined by the features available in them.  So I assert that a programming language that has recursion has an extra dimension compared to a programming language that does not have recursion.  If you want to implement a comparison sort algorithm in a language without recursion or you want to implement it without recursion, then you have to "flatten" out the comparison sort algorithm.  With recursion the algorithm can take its natural shape; without recursion you have to figure out how to represent the algorithm's "true form" in a flatter context.

Now I'm kind of alluding to something that I'm sure is going to end up getting me in a lot of trouble.  Best case is that I escape by being painted overly naive, ambitious, and/or idealistic.  So, flattening representations isn't that big of a deal.  We do this all the time.  We flatten 3D geometric shapes so that they can be represented on a 2D piece of paper.  We represent 2D matrices as an array of 1D arrays.  I'm sure that "flattening" an algorithm's representation is something that can be formalized.  Now on the other hand, "true form" is clearly getting way too close to philosophic digressions or even worse poetic ruminations; I really don't see how I can hope to get away with that one without something concrete or at the very least wildly convincing.

So, Yuri Gurevich has a very interesting series[1] on Microsoft's Channel 9 research website.  I'm sure I haven't devoted enough time to fully digest everything he said during that series, but a couple of things did stick that I think spell trouble for the direction that I'm hoping to go in.  Turing argued very convincingly for Turing Complete functions.  But he didn't argue at all concerning Turing Complete algorithms.  So while essentially every programming language being Turing complete means that we can compute any function in every programming language, it does not mean that we can implement every algorithm.  And of course this makes sense if you think about it.  In the C programming language you could write an algorithm that writes a bunch of data to a file, calls a Unix command line utility to process that file, and finally reads the result from the Unix command to complete the algorithm.  Implementing this algorithm in lambda calculus is impossible.  Lambda calculus doesn't even recognize that files exist let alone have facilities to call out to Unix command line utilities.

Professor Gurevich, if I recall correctly, argues that the class of digital algorithms cannot possibly contain the class of all algorithms.  Analog algorithms for example make use of reals, which can't be contained in natural numbers let alone digital memory.  Not to mention any all-encompassing theory of algorithms would have to contain all future human invention (who foresaw quantum computers back when the Greeks were first coming up with GCD?).

I'm fairly sure that I'm some flavor of constructivist[2], which I believe Professor Gurevich wasn't too keen on in his video series.  And I suspect that this might explain my idealistic "true form" sentiments.  After all, if the only things that *really* exist are things that can be constructed, why, all you have to ensure is that the Comparison Sort Problem Constructor is only called one time, and it is going to be pretty hard to argue that the result is anything other than the "true form" of comparison sorts.  Of course in that last sentence I just spilled the beans as to why the road ahead of me is likely going to be a tough one, but let's get back to that in a little bit.

First though, let's get to that second interesting concept that I mentioned quite a bit ago.  Let's go ahead and assume that my algorithmic shape idea has all the merit in the world.  And let's assume that programming language features really do correspond to different shapes.  In that case, if statements and simple pattern matching seem to correspond pretty closely to scalar complexity.  Loops, map, fold, unfold, et al seem to fit nicely with list complexity.  Recursion and algebraic data types are pretty clearly tree complex.  Not quite sure what would correspond with graph complexity.  Some sort of state machine maybe?  Well at the very least an actual graph would be graph complex.  Of course we've got an interesting progression here.

Lists are a generalization of scalars.  You got one thing, now you got lots of things.  Trees are generalizations of lists.  Graphs are generalizations of trees.  And hypergraphs[3] are generalizations of graphs.  Now in my previous essay I suggested that maybe the most complex thing we have to worry about is graph complexity because you can implement lambda calculus (a Turing complete language) with a graph reduction machine.  But on the other hand, let's go with the flow for a minute.  What kind of programming language features might show up to make graph complexity as easy to reason about as a Map operation with parametric polymorphism?  What about hypergraphs?  There might be some very powerful abstraction techniques out there, and these complexity shapes might provide some insight to developing them.

Constructivism and the overly poetic imagination

So mathematical constructivism is probably my type of thing.  Turing completeness kind of gets in the way of a constructive view of things.  My thinking goes kind of like this:  If the only things that exist are things that can be constructed, then equivalence is just a hop, skip, and a jump away.  Just trace things back to where they were constructed and if it's the same constructor, then you're in business.  Unfortunately, with Turing completeness we can't even know in general if an arbitrary function halts [4,5].  Forget about showing equivalence between two of these indeterminately halting beasts.

With two Turing complete functions, we can't determine if they're the same thing.  But who's talking about functions?  I wasn't.  I was talking about algorithms.  Which are an even weaker concept than functions.  At least we have a definition for Turing computable functions.  Algorithms are pretty vague and tomorrow a new class of them might show up out of nowhere.

But then again.  Who's talking about algorithms?  I wasn't.  Not really.  In my constructivism heart I was talking about some sort of "true form" not for algorithms, but for problems.  Can't have a true form without equivalence.  At least if you want to stay outside the realm of poetry.  I mean how would you show that something was in its true form if you're not equating it to *the* true form?  And in Turing complete land you can't even have function equivalence.  Let alone algorithmic equivalence.  And Turing complete algorithms aren't even a thing.  How do you have a true form for something that isn't even definable?  Now I'm talking about Problems.  Like not even implementations of algorithms for something.  But some sort of "higher plane of existence" where the Problem itself is a concrete thing.  Pretty ridiculous.

Although … on the other hand … Turing completeness kind of sucks.  I mean don't get me wrong.  All things being equal, it's a pretty nice safety blanket.  After all you wouldn't want to be stuck up the river without a Turing complete paddle to implement that function, which it turns out can only be implemented in a Turing machine.  However, a function that can be generically shown to be total is a function that is a lot easier to understand.  If you had a choice between a function that you knew was mathematically shown to always return a value no matter what value you pass to it and a Turing computable function that had a "quite impressive" 90% test coverage, you're going to choose Mr. Turing-Complete-Is-For-Chumps.  Granted I might be overestimating the typical programmer's fear of ⊥, however I like to think the informed decision is to ditch Turing completeness when you don't actually need it.

So perhaps my Problem as a concrete object isn't as far fetched as it sounds.  You do have to get rid of turing completeness.  Or at the very least have a subset of your workspace reserved for provably halting functions.  Hmm.  Homotopy type theory[6] supposedly says some interesting things about equivalence.  I guess it's time to hit the books.








Wednesday, August 14, 2013

Cognitive Complexity Classes




In a Galaxy far far away

LANGUAGE WARS

When I started my professional career as a software engineer (six years ago yesterday actually), I really only knew a couple of languages.  I received my Bachelor's in Computer Science at Purdue University.  While the breadth of topics offered at Purdue is certainly impressive, the programming languages covered leave some room for improvement.  I had an introductory course in Java and then everything else was pretty much C or C++.  Granted there were some alternative possibilities that cropped up from time to time.

For whatever reason, beginning a career in software engineering really got me interested in programming languages.  I did some research on my own previously, but actually programming for a living filled me with a sense of professional pride.  I ought to understand the tools that I'm using to create products.  Of course I've always been interested in understanding how things worked.  Maybe it was only natural that this would overlap with my work. 

I began trying to figure out the features of programming languages.  So dynamic typing, eval, meta objects, object oriented, lisp macros, pattern matching, automatic theorem proving, the works.  Type theory took a while to figure out.  After all there's a bunch of math involved, and it's not really the kind that I had prior exposure to.

Through all of this a couple of weird issues kept making an appearance.  Turing completeness and the power of more expressive languages.  Any Turing complete language is able to compute all of the functions that any other Turing complete language is able to compute.  Nearly all programming languages are Turing complete (and none of them are "super" Turing complete), so the question naturally arises … "why do we need different programming languages?"  Out come the vague, poetic answers.  You can work better or faster in such and such a language.  Or this language has features that prevent certain mistakes.  Programmers who use language X tend to be better because they are more passionate about programming.  And there are many such ideas like this.  Some have tried to conduct studies by giving students without prior exposure to programming different languages to complete a specific task.  But it's not really clear how this applies to experienced practitioners.  After all, shouldn't knowing both languages in a given study be better than just one?  And how do abstract, language agnostic, problem solving skills factor into any results?

It's Complexity, but not as we know it

Kind of a staple of a computer science education is algorithm complexity theory.  Big O notation and its ilk.  I'm probably being overly cynical, but algorithm complexity theory might be the only real difference between an educated computer scientist and a self taught programmer.  (And I know I'm being overly cynical when I say most computer scientists forget all about Big O after the final, but sometimes it's hard not to think it.)  Honestly though, the theory and concepts from Big O and its friends don't really show up all that often in practice.  But.  When you are able to leverage these disciplines your peers will look at you as if you're a dark wizard who wields powers beyond human comprehension.  Seriously, all I did was replace a list with an array.  Just because it changed the application from being broken and without hope of repair to working better than expected doesn't mean I did anything *that* impressive.  This power can be yours as well.

The basics are actually pretty simple.  Given some number of inputs to an algorithm, how fast do the memory requirements and the time to compute grow as you add more input?  So if an algorithm's time complexity is linear, then the time to complete a computation grows proportionally with the number of inputs.  Or in other words:

input - time
1 - 1
2 - 2 
100 - 100

If an algorithm's time complexity is "N Squared", then we're looking at something like:

input - time
1 - 1
2 - 4
...
10 - 100
...
100 - 10000

And obviously there's a lot more that goes into all of this when you start to get more formal.  But I think you see the point.  For small inputs it doesn't really matter what the complexity of your algorithm is.  But once you hit large input, it really starts to matter.  Maybe 10,000 time units doesn't sound like a big deal, but consider someplace like Google, Facebook, or the IRS.  An input of 300 million can suddenly turn into 9E16.  Or perhaps I should rewrite that as the more impressive, 90000000000000000.  That's kind of a problem.

However, while algorithmic complexity is clearly very important and has an important place in the design of software, it's not the type of complexity that I really want to talk about.  I've noticed a different type of complexity.  One which I feel answers questions about a lot of personal experiences that I've encountered when trying to implement certain types of algorithms, but which more importantly answers the question of "what's the real difference between programming languages" without resorting to a bunch of subjective statements about productivity. 

Instead of (or more likely "in addition to") classifying algorithms by their time and memory complexity, I want to classify algorithms by how hard they are to comprehend.  On its face my goal is violating my desire to avoid subjective statements.  After all who gets to decide the difficulty of comprehending an algorithm.  We don't even have a good definition of what comprehension itself *is* let alone how to prove that someone actually has a comprehension of a given subject.  Actually, if I could define and prove comprehension, I wouldn't even bother with my "new kind of complexity".  I would just set up shop as a recruiter who could mathematically guarantee hires who actually know what they're doing.  And I would retire soon after very wealthy indeed.  

So simply by the elusive nature of comprehension, I think my idea of complexity will be a bit loose around the joints.  However, while a completely formal definition may or may not be forthcoming or even possible, I believe that I can argue convincingly that there are different cognitive complexity classes for algorithms.  With perhaps some fuzzy overlap between the classes, to make room for different types of human cognitive talent.  After all there may be some people who really are better at solving a problem "the hard way" due to varying circumstances.  Anything from how their brain is wired to the way they were raised to solve problems.

It is very complex.  You are likely to be eaten by a grue.

Right now I've got four complexity classes in mind.  This almost definitely needs to be expanded and modified, but I think it's a good place to start.  The way that I'm classifying these classes is by describing the "shape" of the algorithms.  

The simplest class is the Scalar class.  Followed by the List class.  Then the Tree class.  And finally the most complex is the Graph class.  

Like I said, I don't think these categories are absolute, final, or complete.  I think it's just the starting point.  

An algorithm that has a Scalar complexity can be easily imagined as a function that takes a scalar as input and produces a scalar as output.  So logical-and is a good starting place.  Take two booleans and return a third boolean.  The structure of the inputs is atomic.  There's not that much going on here.  It's also easy to see where things get fuzzy.  A function that takes an array and then reallocates the memory of that array in order to expand the total size is probably Scalar complex.  But a function that takes an array and then maps a function over each value in that array, creating a new array with new values, should probably be in the List complexity class.

The Map (or Select depending on your preferred terminology) function that I just described in my previous paragraph may seem like a Scalar complex function.  After all you take a completely opaque function, apply it to every element in a mostly opaque array, and finally return a newly constructed and mostly opaque array.  The structure of the list doesn't really matter *that* much.  So part of this is the unfortunately fuzzy nature of comprehension.  The Map function seems pretty easy to someone who's had a lot of experience with functional languages.  However, there are two things to consider.  The first is that if you look at the "shape" (admittedly a vaguely defined term) of the algorithm, you'll see that it's pretty easy to convince yourself that it has a "Listy" shape … or at the very least it's more listy than a logical-and function.  The second is that incorporating Map into your code in a functional language is very easy to imagine.  It yields clean and composable constructs.  However, attaining Map like functionality in an imperative language like C is a very different story.  A major component of my motivation for this cognitive complexity theory is that it should help explain what the differences between programming languages actually are.  List complexity isn't a big deal in functional programming languages with higher order functions, but in C like languages you need to be more careful with it.  I'll talk more about language differences later.

So, I've already sort of covered List complexity.  But there is some more that needs to be dealt with.  Functionality like Map, Fold, and Unfold all seem like they belong to a class of functions that have a list like "shape".  However, things unfortunately get fuzzy again when you consider lists of lists of scalars.  In some situations you may consider a list of lists to be a tree structure and in other situations the lists could be considered to be atomic in nature.  It really depends on the type of algorithm and how comfortable you are with these types of operations.  Although, the more operations you need to perform on the inner lists the easier it is going to be to argue that the shape of the algorithm is a tree.

Tree complexity.  The most obvious tree complex algorithm that I can think of is the Permutation function.  A function which returns all of the permutations of a list is very naturally described in a recursive manner.  Tracing out the function calls shows what is very obviously a tree.  O(n log(n)) sort methods also have a very tree like shape to them.  After all several of these types of sort methods specifically exploit the nature of trees in order to get their time complexity.  Again you can start to make arguments based on different non-recursive implementations of these algorithms, but I noticed something interesting recently concerning Levenshtein distance.

The definition of Levenshtein distance[1] is about as recursive as you can get.  Each node of the call graph makes multiple calls to itself.  The resulting "structure" over time is a massive tree.  However, implementing the Levenshtein distance formula this way is overly wasteful.  The way that the function is normally implemented is by constructing a table and then finding the minimal path through this table (actually, until recently this was the only algorithm I knew about for Levenshtein distance).  And there's even an optimization which only requires two rows of the table.  Now, these alternative implementations don't necessarily seem tree shaped.  However, I would like to argue that starting with a recursive definition and then transforming it into a non-recursive definition is akin to converting a recurrence into a summation.  So my hope is that it can be argued that the recursive form of algorithms like Levenshtein distance is the "more true" form for purposes of classifying cognitive complexity.  At the very least I think it won't be too hard to argue that the non-recursive form of Levenshtein distance is more *complex* than your typical listy algorithm.
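
For reference, here's the recursive definition written directly in Haskell (the wasteful form; the table version is the usual optimization):

    -- Levenshtein distance straight from the recursive definition.  Each
    -- call can branch three ways, so the call structure over time is a
    -- (hugely redundant) tree; the table implementation flattens it.
    lev :: String -> String -> Int
    lev [] bs = length bs
    lev as [] = length as
    lev (a : as) (b : bs)
      | a == b    = lev as bs
      | otherwise = 1 + minimum [ lev as (b : bs)   -- drop a
                                , lev (a : as) bs   -- drop b
                                , lev as bs ]       -- substitute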

Finally, there is Graph complexity.  I haven't gotten much opportunity to play around with algorithms that seem squarely in the Graph class, so I'm not really going to say too much here.  Parsing certain types of grammars seems like it needs some sort of graph shape or structure in order to function correctly.  Additionally, Simon Peyton Jones published an interesting book back in 1987[2].  In it he describes a G machine (graph reduction machine) that can be used to implement the lambda calculus base for a functional programming language.  My naive thought is that if graph shaped algorithms are strong enough to implement the Turing complete lambda calculus, then maybe we don't need anything more complex.  Perhaps there is a maximum amount of cognitive complexity we have to put up with before we can show that there has to be a simpler way.  I have a feeling that this idea will either lead the way for incredibly useful algorithm simplification tools or turn out to be hopelessly optimistic.

Fight a complexity G specimen without damage control, you're crazy.

A major goal I have here is to show that different programming languages are different in more than subjective anecdotes.  I think that I have at least highlighted one possible route for a different type of complexity.  A complexity that doesn't concern itself with time or space, but instead with comprehensibility.  And I have already mentioned briefly how we might use such a definition of complexity to differentiate programming languages.

Basic List complex algorithms don't seem like a very big deal when you have access to functions like Map and Fold.  Things begin to become more problematic when all you have access to are for and while loops.

Additionally, Map and Fold can be safely used in a dynamically typed language.  However, once you begin to embed lists inside of lists, the parametric polymorphism found in languages like Haskell, OCaml, and C# really starts to pay off.

Finally, just because nearly all of the languages available to use today are Turing complete, doesn't mean that this is beneficial when the complexity class of an algorithm begins to increase.  In a language like C where the main algorithm solution mechanisms of choice are loops and if statements, when Tree or Graph complex problems are encountered the solution will be some sort of state machine constructed in a while loop filled with continue statements.  We've seen with the Levenshtein distance algorithm that such solutions can be of high quality.  However, my fear is that many engineers will not even be aware of the existence of highly complex problems or their need to be carefully transformed in such a way that they can be correctly implemented.  Worse yet, if the only tools that these engineers have a mastery of are loops and if statements, the end result may be an ad-hoc and fatally incomplete state machine, even in more typical languages like Java and C#.  Suddenly, Turing completeness shifts meaning from incredibly useful property to enough rope to hang yourself.

I believe that the goal of my efforts should be to identify and elaborate on different cognitive complexity classes.  Then I can identify which programming language constructs most lend themselves to solving these classes of problems.  This will provide guidance for future programming language advancements as well as educational efforts for the best way to use those advancements.  Developing this idea into a formal system would be icing on the cake, but I feel at the very least that convincing arguments can be made for the utility of cognitive complexity classes.