Monday, October 14, 2013

Programming Language Feature Diagrams

When I first started reading about different programming languages, I tried to understand the features that differentiated them as if they were atomic and absolute in nature.  Imagine magic spells from Dungeons and Dragons, or Go Directly To Jail from Monopoly.  If you go directly to jail, you don't ask what mechanism propels you to jail.  You simply move your game piece to the jail location and there you remain.

However, at some point this was no longer sufficient for my curiosity.  I started to break apart the programming language features themselves and began to ask myself how these individual features worked.

Ultimately, everything that is encoded in a program is converted into assembly instructions.  There are instructions for modifying memory locations, moving the values in memory locations, storing values in memory locations, retrieving values from memory locations, jumping from one place in the list of instructions to another, etc.  While it is important to have an idea about what the assembly code looks like, I think for most features you encounter you can imagine things at a slightly more abstract level.

You can sort of imagine functions themselves as boxes that contain assembly instructions.  When you get to a function call, you traverse to a new box.  Once that function call is done, you return to the box that you came from.   So you kind of get a tree structure showing up.




Executing this program would look kind of like a depth first traversal.
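Here's a tiny sketch of that idea in Python (the functions and names are invented, just for illustration).  Each function is a "box"; calling one descends into a new box, and the printed trace reads exactly like a depth first traversal of the call tree.

# Each function is a "box" of instructions; calling a function descends
# into a new box, and returning climbs back out.
def leaf(name):
    print("  enter", name)
    print("  leave", name)

def branch():
    print(" enter branch")
    leaf("leaf_a")      # descend into a child box
    leaf("leaf_b")      # then its sibling
    print(" leave branch")

def main():
    print("enter main")
    branch()
    leaf("leaf_c")
    print("leave main")

main()   # the trace visits the boxes in depth first order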


I'm not sure it's worth the time to go through every possible feature and try to show a diagram for it (not to mention some things like call/cc, lisp macros, and compile time type checking don't really fit very neatly into this view), but here are a couple of thought provoking examples.

Threads

When a function creates a new thread, you end up with an additional stack in your program.  Time tends to progress as you would expect it[1] in each thread, but because the operating system is managing when any given thread is running (not to mention the existence of multicore computers) you don't really know how any given point in one stack relates to a point in another stack.  Without the existence of global memory addresses or a shared pointer in memory[2], these two stacks can't communicate with each other.  Because of the ubiquitous nature of multicore computers and the importance of distributed computing, new languages are starting to put a lot of thought into providing not only synchronization and locking primitives (which allow correct communication via shared state) but also primitives that abstract away the entire thread creation and communication process altogether.

[1] - Actually, both the compiler and the CPU reorder operations.  So time doesn't quite progress as you would expect, but it should progress in a manner which is mathematically identical to the way that you expect it to.  Things get tricky when threads are thrown into the mix, so you get to enter the dark world of atomic operations and memory fences.

[2] - Or if you're absolutely crazy, I suppose you could try to guess the location of the other stacks in memory and start dereferencing pointers.  I imagine there's probably *someone* who makes good use of this technique, but I doubt that it's something that would lead to code that is at all comprehensible.
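To make the shared state idea above a bit more concrete, here's a minimal sketch in Python (the names and numbers are invented, and nothing about it is specific to Python).  Two threads increment a shared counter, and the lock is what makes that communication via shared state correct.

import threading

counter = 0                # global, shared state both threads can see
lock = threading.Lock()    # synchronization primitive guarding it

def worker():
    global counter
    for _ in range(100000):
        with lock:         # without the lock, increments can be lost
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # reliably 200000 with the lock; unpredictable without it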

Global State

This one is a lot more straightforward than threads.  At any point in time a function can set or get a value from global memory.  This state can be accessed again by the function itself or by another function in the future.  This can lead to programs which are difficult to understand, especially as the number of connections made between functions via global memory increases[1].  But properly managed (like by the compiler) or used sparingly, this technique can make a lot of programs much easier to write.

[1] - The complexity that can arise from mutable global state is actually one of the motivators behind the recent interest in pure functional programming languages.  In a system like Lambda Calculus, all functions work locally or they don't work at all.  I've found this to be a very powerful way to carefully analyze the problem at hand to determine exactly how it functions.  You gain a lot more freedom with how you are able to build your program when you have separated every aspect of the problem.
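Here's a minimal sketch of the pattern itself (the names are invented, Python just for illustration): one function writes to global memory and a completely unrelated function reads from it later, so the two communicate without ever calling each other.

config = {}                          # global state

def load_settings():
    config["verbose"] = True         # one function sets it...

def report(message):
    if config.get("verbose"):        # ...another function reads it later
        print("DEBUG:", message)

load_settings()
report("these two functions never call each other directly")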

Callbacks

I'm not sure that there is much to say about this one.  A callback is relatively simple.  You take a function (which may call other functions) and pass it as a parameter to other functions.  These functions can call the callback and the effect is very much like inserting a new function diagram at the call point of the callback.

This can get more complicated when you combine callbacks, objects, and threads.  Diagramming out your program's structure in these cases can help you understand bizarre behavior that you might encounter.  Although, as I said concerning threads, languages are beginning to add features that help the programmer manage these situations.
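A minimal callback sketch (every name here is invented): retry doesn't know or care what operation is, it just calls it at the designated point, which is effectively inserting the callback's diagram into retry's diagram.

def retry(operation, attempts):
    for attempt in range(attempts):
        try:
            return operation()       # the callback is "inserted" here
        except IOError:
            print("attempt", attempt + 1, "failed, retrying")
    raise IOError("all attempts failed")

calls = {"n": 0}

def flaky_read():
    calls["n"] += 1                  # stand-in for something that sometimes fails
    if calls["n"] < 3:
        raise IOError("disk hiccup")
    return "payload"

print(retry(flaky_read, attempts=3))  # succeeds on the third try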

Exceptions

I'm not really a fan of exceptions.  I do have to admit that they can make the structure of your types a lot cleaner in certain situations[1].  However, exceptions really are just dynamic upward stack communication channels; unfortunately most people do not seem to think of them that way.

When a function returns normally, you end up at the location of the function call in the calling function and continue normal code execution.  But when a function throws an exception, you end up at the location of the function call in the calling function and then look for and execute an exception handler (typically called catching an exception) before continuing normal code execution.  If no exception handler is found, or if none of the handlers present matches the exception, then the exception propagates to that function's calling function.  This continues until either the exception is handled or the program runs out of calling functions.
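For example (a sketch with invented names), parse_int throws, its immediate caller has no handler, and the exception keeps climbing until main catches it two frames above the throw.

def parse_int(text):
    return int(text)             # raises ValueError on bad input

def read_config(text):
    return parse_int(text)       # no handler here: the exception passes through

def main():
    try:
        read_config("not a number")
    except ValueError as err:    # handled two calls above where it was thrown
        print("recovered:", err)

main()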

For simple programs like the example above, it is easy to determine that all exceptions will be handled appropriately.  However, with more dynamic features like objects, interfaces, callbacks, threads, etc, it can be very difficult to determine if all exceptions are being correctly handled.  This can lead to the program crashing unexpectedly.

My theory is that if you think of an exception as a method of dynamically communicating with the functions above you (dynamic because you don't have to know which function called you in order to try to communicate with it), then you are more likely to have all possible calling functions properly equipped with exception handlers.  On the other hand, if you consider exceptions something that should be used in "exceptional" situations or something to be used when problems show up, then I suspect that you're more likely to forget to handle all of the error cases in the appropriate places.

[1] -  For example, division fails when you try to divide any number by zero.  Instead of giving division a weird type that indicates that some failure may occur (or proving that it is never passed a zero) maybe it's better to just let an exception be thrown.

Combination

Of course in most programs of any complexity, you're not going to see any single feature used in isolation.  You're likely to see many of them used at the same time.  This can lead to complicated programs, which are very difficult to understand and maintain.  Perhaps this illustration helps explain some buggy programs you've used in the past.

I mentioned that the complexity of global state (and more generally mutable state) is something that has caused interest in functional programming.  When you eliminate all of the features that I have outlined here, you gain the knowledge that the entire program can be drawn as a relatively simple tree like the one I showed at the very beginning of this blog post.  

Personally, I don't think I'm ever going to be the same now that I have learned about lambda calculus, type theory, and the different mathematical properties of functions (reflexive et al, associative et al, injective et al, etc).  These are some very powerful tools for understanding problems, and they have allowed me to drill a lot deeper than I ever would have been able to on my own. 

But, while they do allow you to create some very comprehensible programs, it also takes a lot of work to restructure the combined diagram above into a functionally equivalent simple tree structure.  So while it would be nice to always do everything in an incredibly disciplined manner, the constraints of real life do have to be considered as well.  And I'm not just talking about time and expertise.  Even if you had sufficient time and everyone was an expert with the more mathematical tools, the functional route still requires quite a bit of mental effort.  Some problems are just very difficult to simplify, and perhaps it is better to throw an exception and move on with your life.


Wednesday, August 21, 2013

Communication of the Paradox Parsers

I Know, Right?

A difficult aspect of communication is that if you really understand a thing, then it is trivial to you.  So why can't all those other people get with the program?  And on the other hand, if you don't really understand a thing, then maybe you're the one causing the difficulty by expressing ideas which in reality do not make any sense.

I think that I have a pretty good sense of when I don't fully understand something.  Although really, that statement needs about a billion caveats.  For starters, "I know what I don't know", or perhaps more accurately, "I know when I don't know", is really way too close a cousin to, "I'll tell *you* about peanut butter!".  You really have to be careful when you start down the path of, "I know that I'm right!".  Because as soon as you encounter a scenario where you're not right, it can easily lead to a disaster.  Reality doesn't care that you think you're right.

So, to be more exact: I really like to understand how things work.  And I really like to trace my understanding as deeply as I can manage.  And it bothers me on some sort of bizarre semi-emotional level when I don't understand "sufficiently" why a given thing is operating in the way that it is.

This bizarre discomfort that I feel about my lack of understanding is a great tool for understanding when I'm missing something potentially important.  So as long as I'm listening to my own internal discomfort and lack of understanding alarm, I have a pretty good sense for when I don't understand what's going on.  At least I suspect that's the case … when you don't know what you don't know it's hard to know when you don't really know something.

If you want out then you'll have to develop those communication skills

So, I've got a skill that helps me to identify things that I don't fully understand.  This comes in handy with communication because I'm always on the lookout for situations where things just aren't making any sense.  So, I'm aware when there's a person who isn't following along in a conversation.  When I'm more of an observer to the conversation, I find it really easy to decide who it is that's missing the big picture.  I can then offer clarifying statements to the clueless or ask stupid questions to get those in the know to offer additional insight.  Of course doing this can jeopardize my observer status, so it does sometimes take a bit of finesse.

If I'm highly active in a conversation and it seems like sense has decided to go on a lunch break, then I have to start running experiments.  Either I'm the person who doesn't understand or one or more of the other active participants are the ones who don't understand.  Bonus points if there's more than one type of misunderstanding going around.

But I didn't really want to talk about communication.  Even though it is a very interesting problem to solve.  I was interested in talking about a couple of things that don't make any sense.

Parsing!

I've had a terrible time with parsing algorithms.  I find the basic idea of converting a stream of symbols into a data structure absolutely fascinating.  But often the algorithms for achieving this are hard for me to really get my head wrapped around.  

I think my blog entries about Cognitive Complexity Classes (CCC)[1] and Problem Shapes[2] actually shed some light here.

First of all, let's talk about the stuff that I don't understand.  Parsing is more complex than just converting a stream of symbols into a data structure.  Different streams of symbols have different implicit structures.  The main ones that people talk about are the different types of grammars that languages fall into, and this is really talking about formal languages (ie programming languages) as opposed to natural languages (ie French): context sensitive, context free, and regular grammars.  Context sensitive grammars are the hardest of the three to parse and require some very powerful algorithms; context free grammars are easier, and regular grammars are easier still, each parsable with simpler algorithms than the last.  And this is only scratching the surface.  There's LL and LR grammars.  That is, the sets of language grammars which can be parsed by LL algorithms and/or LR algorithms.  Then there's LL(1) grammars.  And the rise of even more sophisticated algorithms gives rise to even more complex sets of language grammars.  And to make things even more complicated, these algorithms aren't just interested in the power of the grammar they can parse.  They are also interested in parsing efficiency in both space and time.

So when you see a string of characters, there's actually a lot of extra hidden structure present that you don't really understand.  Or at the very least *I'm* not happy with my level of understanding.  I get the basic gist of Context Sensitive vs Context Free, but I don't get it sufficiently to look at some example strings and have a good intuition for which type of grammar the language falls into.  And things get more complicated.  If simple things like context baffle my intuition, then it's probably going to be a while before I figure out how to spot GLR grammars.
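One toy example that helps a little: the language of balanced parentheses is context free but not regular, because recognizing it requires counting arbitrary nesting depth, which a regular grammar (and so a plain regular expression) can't do.  A sketch of a checker in Python, where the counter plays the role of a degenerate stack:

def balanced(text):
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1               # push
        elif ch == ")":
            depth -= 1               # pop
            if depth < 0:
                return False         # closed more than we opened
    return depth == 0                # everything opened was closed

print(balanced("(()(()))"))   # True
print(balanced("(()"))        # False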

The whole CCC and Problem Shapes thing becomes useful in describing why this is all a complicated problem.  

I'm not entirely sure what shape parsing happens to be.  Something tree or graph shaped with backtracking.  What shape is constraint propagation and/or unification?  Anyway, so let's assume that parsing (all parsing) has some sort of known shape.  Then add to the mix different power levels of grammars (regular, context free, etc).  Then add to the mix efficient time and space constraints.  So we begin with the already complex shape of parsing.  Then we add a bunch of very complex shapes to get different characteristics that make parsing practical.  And finally, we squash the whole thing into a table and implement it in a for loop.

So of course parsing is hard to *really* understand.

Although, I didn't really want to talk about parsing.  Well … not computer language parsing.  I actually wanted to talk about math.  And more specifically, why is it so hard to understand mathematical statements.

… math blah blah blah or how paradoxically paradox aversion produces paradoxes 

I was watching a bunch of group theory videos[3] yesterday.  And I noticed that some of the picture examples and the intuition behind the presentation were pretty simple and straightforward.  But the precise definitions were hard to wade through even with the intuition.

This got me thinking about parsing.  How I can't quite seem to feel comfortable with grammars and algorithms even though I've put a lot of work into it.  The answer of course is that I still don't have a good feel for the "true shape" of parsing.  Let alone all of the extra complexities that go into making it practical.  

But what I had missed until now about mathematics is that the English statements aren't really English.  And I'm not talking about jargon.  To be sure there is a bunch of math jargon that takes a while to understand, but I don't think the hard part is the jargon.  The hard part is that, just like with the constrained parsing grammars and algorithms, mathematical statements are a constrained form of English.  With parsing we constrain the grammars in order to be able to write algorithms that halt, fit into memory, and run in a reasonable amount of time.  With mathematics there is a different goal in mind.

Charles H. Bennett wrote a pretty interesting paper titled, "On Random and Hard-to-Describe Numbers"[4].  It contains two essays, and the first essay discusses the paradox, "the first number not nameable in under ten words".  This is of course nine words, which means that the first number not nameable in under ten words (1,101,121) is in fact nameable with nine words (ie the above quotation).

Similarly, consider Russell's paradox[5] (also mentioned in Bennett's paper): the set of all sets that do not contain themselves must both contain and not contain itself.

The point is that a full natural language contains facilities that allow you to make statements which do not actually make any sense.  Even mathematical languages like naive set theory allow for statements which don't make sense.

Which is why I think mathematical statements are so hard to make sense of.  It's because a great deal of work was done in order to make sure that the statement made sense.








Thursday, August 15, 2013

Cognitive Complexity Classes: Shapes

In order to expound upon the different types of Cognitive Complexity Classes (CCC), further attention must be applied to the types of shapes which algorithms may be composed of.  This is unfortunate because my current concept of a "shape" of an algorithm is overly vague.  So, to be able to focus on what makes any given CCC different from any other given CCC (or for that matter what makes any two CCCs the same … or even what types of sameness are needed to support this system), I first need to flesh out the idea of algorithmic shape.

Shapes on the cave wall

So my initial description of an algorithm shape invoked the imagery of the input and output of a function.  While that might have been a good way to break the ice, that description is hardly ideal.  After all, if you have a function that takes a graph and then returns the exact same graph, it would be intellectually dishonest to describe it as having anything other than a scalar shape.  Similarly, if you have a TreeMap function that maps all the elements in a tree using a given function and then returns the same tree with different elements, that seems a lot like a List shape.  You are doing operations on trees, but the operation itself is very linear in nature.  Given a function that traverses an entire tree and a function that traverses an entire list, the Map function is nearly identical between trees and lists (or for that matter scalars and graphs).
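A quick sketch of the TreeMap point in Python (informal types, and an invented representation where a node is a (value, left, right) tuple): mapping over a tree touches every element exactly once and never compares or rearranges anything, which is the same linear work a list Map does.

def tree_map(f, node):
    if node is None:
        return None
    value, left, right = node
    return (f(value), tree_map(f, left), tree_map(f, right))

def list_map(f, xs):
    return [f(x) for x in xs]

tree = (2, (1, None, None), (3, None, None))
print(tree_map(lambda x: x * 10, tree))        # every element visited once
print(list_map(lambda x: x * 10, [1, 2, 3]))   # same "touch each element" shape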

Comparison sorts are a pretty good class of algorithms to consider.  If you create a Merge sort algorithm that recursively breaks apart the list you want to sort and then glues it back together in the desired order, then you will end up with a tree shape across time.  On the other hand if you create a Heap data structure and then insert and remove all of the elements in your list, then you will end up with a tree shape across space.  This leads me to believe that comparison sort algorithms have a tree shape to them.
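Here's a rough merge sort sketch in Python that makes the "tree shape across time" visible (the print is only there to show the call tree through indentation):

def merge_sort(xs, depth=0):
    print("  " * depth, xs)              # indentation traces the call tree
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], depth + 1)
    right = merge_sort(xs[mid:], depth + 1)
    merged = []
    while left and right:                # glue the halves back together in order
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right

print(merge_sort([5, 3, 8, 1]))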

There are two interesting concepts that emerge from algorithmic shape that I want to discuss.  The first one is that different programming languages have different "dimensions", which are determined by the features available in them.  So I assert that a programming language that has recursion has an extra dimension compared to a programming language that does not have recursion.  If you want to implement a comparison sort algorithm in a language without recursion, or you simply want to implement it without recursion, then you have to "flatten" out the comparison sort algorithm.  With recursion the algorithm can take its natural shape; without recursion you have to figure out how to represent the algorithm's "true form" in a flatter context.

Now I'm kind of alluding to something that I'm sure is going to end up getting me in a lot of trouble.  Best case is that I escape by being painted as overly naive, ambitious, and/or idealistic.  So, flattening representations isn't that big of a deal.  We do this all the time.  We flatten 3D geometric shapes so that they can be represented on a 2D piece of paper.  We represent 2D matrices as an array of 1D arrays.  I'm sure that "flattening" an algorithm's representation is something that can be formalized.  Now on the other hand, "true form" is clearly getting way too close to philosophic digressions or even worse poetic ruminations; I really don't see how I can hope to get away with that one without something concrete or at the very least wildly convincing.

So, Yuri Gurevich has a very interesting series[1] on Microsoft's Channel 9 research website.  I'm sure I haven't devoted enough time to fully digest everything he said during that series, but a couple of things did stick that I think spell trouble for the direction that I'm hoping to go in.  Turing argued very convincingly for Turing computable functions.  But he didn't argue at all concerning Turing complete algorithms.  So while the fact that essentially every programming language is turing complete means that we can compute any computable function in every programming language, it does not mean that we can implement every algorithm.  And of course this makes sense if you think about it.  In the C programming language you could write an algorithm that writes a bunch of data to a file, calls a Unix command line utility to process that file, and finally reads the result from the Unix command to complete the algorithm.  Implementing this algorithm in lambda calculus is impossible.  Lambda calculus doesn't even recognize that files exist let alone have facilities to call out to Unix command line utilities.

Professor Gurevich, if I recall correctly, argues that the class of digital algorithms cannot possibly contain the class of all algorithms.  Analog algorithms for example make use of reals, which can't be contained in natural numbers let alone digital memory.  Not to mention any all-encompassing theory of algorithms would have to contain all future human invention (who foresaw quantum computers back when the Greeks were first coming up with GCD?).

I'm fairly sure that I'm some flavor of constructivist[2], which I believe Professor Gurevich wasn't too keen on in his video series.  And I suspect that this might explain my idealistic "true form" sentiments.  After all, if the only things that *really* exist are things that can be constructed, then all you have to ensure is that the Comparison Sort Problem Constructor is only called one time, and it is going to be pretty hard to argue that the result is anything other than the "true form" of comparison sorts.  Of course in that last sentence I just spilled the beans as to why the road ahead of me is likely going to be a tough one, but let's get back to that in a little bit.

First though, let's get to that second interesting concept that I mentioned quite a bit ago.  Let's go ahead and assume that my algorithmic shape idea has all the merit in the world.  And let's assume that programming language features really do correspond to different shapes.  In that case, if statements and simple pattern matching seem to correspond pretty closely to scalar complexity.  Loops, map, fold, unfold, et al seem to fit nicely with list complexity.  Recursion and algebraic data types are pretty clearly tree complex.  Not quite sure what would correspond with graph complexity.  Some sort of state machine maybe?  Well at the very least an actual graph would be graph complex.  Of course we've got an interesting progression here.

Lists are a generalization of scalars.  You got one thing, now you got lots of things.  Trees are generalizations of lists.  Graphs are generalizations of trees.  And hypergraphs[3] are generalizations of graphs.  Now in my previous essay I suggested that maybe the most complex thing we have to worry about is graph complexity, because you can implement lambda calculus (a turing complete language) with a graph reduction machine.  But on the other hand, let's go with the flow for a minute.  What kind of programming language features might show up to make graph complexity as easy to reason about as a Map operation with parametric polymorphism?  What about hypergraphs?  There might be some very powerful abstraction techniques out there, and these complexity shapes might provide some insight to developing them.

Constructivism and the overly poetic imagination

So mathematical constructivism is probably my type of thing.  Turing completeness kind of gets in the way of a constructive view of things.  My thinking goes kind of like this:  If the only things that exist are things that can be constructed, then equivalence is just a hop, skip, and a jump away.  Just trace things back to where they were constructed and if it's the same constructor, then you're in business.  Unfortunately, with turing completeness we can't even know in general if an arbitrary function halts [4,5].  Forget about showing equivalence between two of these indeterminately halting beasts.  

With two turing complete functions, we can't determine if they're the same thing.  But who's talking about functions?  I wasn't.  I was talking about algorithms.  Which are an even weaker concept than functions.  At least we have a definition for turing computable function.  Algorithms are pretty vague and tomorrow a new class of them might show up out of nowhere.  

But then again.  Who's talking about algorithms?  I wasn't.  Not really.  In my constructivist heart I was talking about some sort of "true form" not for algorithms, but for problems.  Can't have a true form without equivalence.  At least if you want to stay outside the realm of poetry.  I mean how would you show that something was in its true form if you're not equating it to *the* true form?  And in turing complete land you can't even have function equivalence.  Let alone algorithmic equivalence.  And turing complete algorithms aren't even a thing.  How do you have a true form for something that isn't even definable?  Now I'm talking about Problems.  Like not even implementations of algorithms for something.  But some sort of "higher plane of existence" where the Problem itself is a concrete thing.  Pretty ridiculous.

Although … on the other hand … turing completeness kind of sucks.  I mean don't get me wrong.  All things being equal, it's a pretty nice safety blanket.  After all you wouldn't want to be stuck up the river without a turing complete paddle to implement that function which, it turns out, can only be implemented on a turing machine.  However, a function that can be generically shown to be total is a function that is a lot easier to understand.  If you had a choice between a function that you knew was mathematically shown to always return a value no matter what value you pass to it and a turing computable function that had a "quite impressive" 90% test coverage, you're going to choose Mr. Turing-Complete-Is-For-Chumps.  Granted I might be overestimating the typical programmer's fear of ⊥, however I like to think the informed decision is to ditch turing completeness when you don't actually need it.

So perhaps my Problem as a concrete object isn't as far fetched as it sounds.  You do have to get rid of turing completeness.  Or at the very least have a subset of your workspace reserved for provably halting functions.  Hmm.  Homotopy type theory[6] supposedly says some interesting things about equivalence.  I guess it's time to hit the books.








Wednesday, August 14, 2013

Cognitive Complexity Classes




In a Galaxy far far away

LANGUAGE WARS

When I started my professional career as a software engineer (six years ago yesterday actually), I really only knew a couple of languages.  I received my Bachelor's in Computer Science at Purdue University.  While the breadth of topics offered at Purdue is certainly impressive, the programming languages covered certainly leave some room for improvement.  I had an introductory course in Java and then everything else was pretty much C or C++.  Granted there were some alternative possibilities that cropped up from time to time.

For whatever reason, beginning a career in software engineering really got me interested in programming languages.  I did some research on my own previously, but actually programming for a living filled me with a sense of professional pride.  I ought to understand the tools that I'm using to create products.  Of course I've always been interested in understanding how things worked.  Maybe it was only natural that this would overlap with my work. 

I began trying to figure out the features of programming languages.  So dynamic typing, eval, meta objects, object oriented, lisp macros, pattern matching, automatic theorem proving, the works.  Type theory took a while to figure out.  After all there's a bunch of math involved, and it's not really the kind that I had prior exposure to.

Through all of this a couple of weird issues kept making an appearance.  Turing completeness and the power of more expressive languages.  Any turing complete language is able to compute all of the functions that any other turing complete language is able to compute.  Nearly all programming languages are turing complete (and none of them are "super" turing complete), so the question naturally arises … "why do we need different programming languages?"  Out come the vague, poetic answers.  You can work better or faster in such and such a language.  Or this language has features that prevent certain mistakes.  Programmers who use language X tend to be better because they are more passionate about programming.  And there are many such ideas like this.  Some tried to conduct studies by giving students without prior exposure to programming different languages to complete a specific task.  But it's not really clear how this applies to experienced practitioners.  After all, shouldn't knowing both languages in a given study be better than just one?  And how do abstract, language agnostic, problem solving skills factor into any results?

It's Complexity, but not as we know it

Kind of a staple of a computer science education is algorithm complexity theory.  Big O notation and its ilk.  I'm probably being overly cynical, but algorithm complexity theory might be the only real difference between an educated computer scientist and a self-taught programmer.  (And I know I'm being overly cynical when I say most computer scientists forget all about Big O after the final, but sometimes it's hard not to think it.)  Honestly though, the theory and concepts from Big O and its friends don't really show up all that often in practice.  But.  When you are able to leverage these disciplines, your peers will look at you as if you're a dark wizard who wields powers beyond human comprehension.  Seriously, all I did was replace a list with an array.  Just because it changed the application from being broken and without hope of repair to working better than expected doesn't mean I did anything *that* impressive.  This power can be yours as well.

The basics are actually pretty simple.  Given some number of inputs to an algorithm, how fast do the memory requirements and the time to compute grow as you add more inputs?  So if an algorithm's time complexity is linear, then the time to complete a computation grows proportionally with the number of inputs.  Or in other words:

input - time
1 - 1
2 - 2 
100 - 100

If an algorithm's time complexity is "N Squared", then we're looking at something like:

input - time
1 - 1
2 - 4
...
10 - 100
...
100 - 10000

And obviously there's a lot more that goes into all of this when you start to get more formal.  But I think you see the point.  For small inputs it doesn't really matter what the complexity of your algorithm is.  But once you hit large inputs, it really starts to matter.  Maybe 10,000 time units doesn't sound like a big deal, but consider someplace like Google, Facebook, or the IRS.  An input of 300 million can suddenly turn into 9E16.  Or perhaps I should rewrite that as the more impressive, 90000000000000000.  That's kind of a problem.
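As a small sketch of the difference (an invented example, Python just for illustration): checking a list for duplicates with a nested loop is N squared, while a single pass that remembers what it has seen is linear.  On a few hundred items you can't tell them apart; on a few hundred million only the second one finishes in any reasonable amount of time.

def has_duplicate_quadratic(xs):
    for i in range(len(xs)):                # N outer steps...
        for j in range(i + 1, len(xs)):     # ...times up to N inner steps
            if xs[i] == xs[j]:
                return True
    return False

def has_duplicate_linear(xs):
    seen = set()
    for x in xs:                            # one pass, constant work per item
        if x in seen:
            return True
        seen.add(x)
    return False

print(has_duplicate_quadratic([3, 1, 4, 1]))  # True
print(has_duplicate_linear([3, 1, 4, 1]))     # True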

However, while algorithmic complexity is clearly very important and has an important place in the design of software, it's not the type of complexity that I really want to talk about.  I've noticed a different type of complexity.  One which I feel answers questions about a lot of personal experiences that I've encountered when trying to implement certain types of algorithms, but which more importantly answers the question of "what's the real difference between programming languages" without resorting to a bunch of subjective statements about productivity. 

Instead of (or more likely "in addition to") classifying algorithms by their time and memory complexity, I want to classify algorithms by how hard they are to comprehend.  On its face my goal is violating my desire to avoid subjective statements.  After all, who gets to decide the difficulty of comprehending an algorithm?  We don't even have a good definition of what comprehension itself *is*, let alone how to prove that someone actually has a comprehension of a given subject.  Actually, if I could define and prove comprehension, I wouldn't even bother with my "new kind of complexity".  I would just set up shop as a recruiter who could mathematically guarantee hires who actually know what they're doing.  And I would retire soon after very wealthy indeed.

So simply by the elusive nature of comprehension, I think my idea of complexity will be a bit loose around the joints.  However, while a completely formal definition may or may not be forthcoming or even possible, I believe that I can argue convincingly that there are different cognitive complexity classes for algorithms.  With perhaps some fuzzy overlap between the classes, to make room for different types of human cognitive talent.  After all there may be some people who really are better at solving a problem "the hard way" due to varying circumstances.  Anything from how their brain is wired to the way they were raised to solve problems.

It is very complex.  You are likely to be eaten by a grue.

Right now I've got four complexity classes in mind.  This almost definitely needs to be expanded and modified, but I think it's a good place to start.  The way that I'm classifying these classes is by describing the "shape" of the algorithms.  

The simplest class is the Scalar class.  Followed by the List class.  Then the Tree class.  And finally the most complex is the Graph class.  

Like I said, I don't think these categories are absolute, final, or complete.  I think it's just the starting point.  

An algorithm that has a Scalar complexity can be easily imagined as a function that takes a scalar as input and output.  So logical-and is a good starting place.  Take two booleans and return a third boolean.  The structure of the inputs is atomic.  There's not that much going on here.  It's also easy to see where things get fuzzy.  A function that takes an array and then reallocates the memory of that array in order to expand the total size is probably Scalar complex.  But a function that takes an array and then maps a function over each value in that array, creating a new array with new values, should probably be in the List complexity class.  

The Map (or Select depending on your preferred terminology) function that I just described in my previous paragraph may seem like a Scalar complex function.  After all you take a completely opaque function, apply it to every element in a mostly opaque array, and finally return a newly constructed and mostly opaque array.  The structure of the list doesn't really matter *that* much.  So part of this is the unfortunately fuzzy nature of comprehension.  The Map function seems pretty easy to someone who's had a lot of experience with functional languages.  However, there are two things to consider.  The first is that if you look at the "shape" (admittedly a vaguely defined term) of the algorithm, you'll see that it's pretty easy to convince yourself that it has a "Listy" shape … or at the very least it's more listy than a logical-and function.  The second is that incorporating Map into your code in a functional language is very easy to imagine.  It yields clean and composable constructs.  However, attaining Map like functionality in an imperative language like C is a very different story.  A major component of my motivation for this cognitive complexity theory is that it should help explain what the differences between programming languages actually are.  List complexity isn't a big deal in functional programming languages with higher order functions, but in C like languages you need to be more careful with it.  I'll talk more about language differences later.
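A small sketch of that difference (nothing here is specific to Python, and the names are invented): with a higher order Map the listy bookkeeping disappears entirely, without it you write and maintain the index management yourself.

values = [1, 2, 3, 4]

doubled = list(map(lambda x: x * 2, values))   # the list shape is handled for you

doubled_by_hand = []
for i in range(len(values)):                   # the loop you'd write in a
    doubled_by_hand.append(values[i] * 2)      # language without Map

assert doubled == doubled_by_hand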

So, I've already sort of covered List complexity.  But there is some more that needs to be dealt with.  Functionality like Map, Fold, and Unfold all seem like they belong to a class of functions that have a list like "shape".  However, things unfortunately get fuzzy again when you consider lists of lists of scalars.  In some situations you may consider a list of lists to be a tree structure and in other situations the lists could be considered to be atomic in nature.  It really depends on the type of algorithm and how comfortable you are with these types of operations.  Although, the more operations you need to perform on the inner lists the easier it is going to be to argue the shape of the algorithm as a tree.

Tree complexity.  The most obvious tree complex algorithm that I can think of is the Permutation function.  A function which returns all of the permutations of a list is very naturally described in a recursive manner.  Tracing out the function calls shows what is very obviously a tree.  O(n log(n)) sort methods also have a very tree like shape to them.  After all, several of these types of sort methods specifically exploit the nature of trees in order to get their time complexity.  Again you can start to make arguments based on different non-recursive implementations of these algorithms, but I noticed something interesting recently concerning Levenshtein distance.
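Before getting to Levenshtein distance, here's a sketch of that permutation function (Python, invented names): each call branches once per remaining element, so tracing the calls fans out into a tree.

def permutations(xs):
    if not xs:
        return [[]]                                    # one way to arrange nothing
    result = []
    for i, x in enumerate(xs):
        for rest in permutations(xs[:i] + xs[i+1:]):   # one branch per choice of head
            result.append([x] + rest)
    return result

print(permutations([1, 2, 3]))   # 6 permutations, produced by a tree of calls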

The definition of Levenshtein distance[1] is about as recursive as you can get.  Each node of the call graph makes multiple calls to itself.  The resulting "structure" over time is a massive tree.  However, implementing the Levenshtein distance formula this way is overly wasteful.  The way that the function is normally implemented is by constructing a table and then finding the minimal path through this table (actually, until recently this was the only algorithm I knew about for Levenshtein distance).  And there's even an optimization which only requires two rows of the table.  Now, these alternative implementations don't necessarily seem tree shaped.  However, I would like to argue that starting with a recursive definition and then transforming it into a non-recursive definition is akin to converting a recurrence into a summation.  So my hope is that it can be argued that the recursive form of algorithms like Levenshtein distance is the "more true" form for purposes of classifying cognitive complexity.  At the very least I think it won't be too hard to argue that the non-recursive form of Levenshtein distance is more *complex* than your typical listy algorithm.
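Here's a sketch of both forms (Python, and only meant as an illustration): the first follows the recursive definition directly, wasteful recomputation and all, and the second is the usual two row table version.

def lev_recursive(a, b):
    if not a:
        return len(b)
    if not b:
        return len(a)
    cost = 0 if a[0] == b[0] else 1
    return min(lev_recursive(a[1:], b) + 1,           # delete from a
               lev_recursive(a, b[1:]) + 1,           # insert into a
               lev_recursive(a[1:], b[1:]) + cost)    # substitute

def lev_two_rows(a, b):
    prev = list(range(len(b) + 1))                    # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,              # delete
                            curr[j - 1] + 1,          # insert
                            prev[j - 1] + cost))      # substitute
        prev = curr
    return prev[len(b)]

print(lev_recursive("kitten", "sitting"))   # 3
print(lev_two_rows("kitten", "sitting"))    # 3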

Finally, there is Graph complexity.  I haven't gotten much opportunity to play around with algorithms that seem squarely in the Graph class, so I'm not really going to say too much here.  Parsing certain types of grammars seems like it needs some sort of graphical shape or structure in order to function correctly.  Additionally, Simon Peyton Jones published an interesting book back in 1987[2].  In it he describes a G-machine (graph reduction machine) that can be used to implement the lambda calculus base for a functional programming language.  My naive thought is that if graphically shaped algorithms are strong enough to implement the turing complete lambda calculus, then maybe we don't need anything more complex.  Perhaps there is a maximum amount of cognitive complexity we have to put up with before we can show that there has to be a simpler way.  I have a feeling that this idea will either lead the way for incredibly useful algorithm simplification tools or turn out to be hopelessly optimistic.

Fight a complexity G specimen without damage control, you're crazy.

A major goal I have here is to show that different programming languages are different in more than subjective anecdotes.  I think that I have at least highlighted one possible route for a different type of complexity.  A complexity that doesn't concern itself with time or space, but instead with comprehensibility.  And I have already mentioned briefly how we might use such a definition of complexity to differentiate programming languages.

Basic List complex algorithms don't seem like a very big deal when you have access to functions like Map and Fold.  Things begin to become more problematic when all you have access to are for and while loops.

Additionally, Map and Fold can be safely used in a dynamically typed language.  However, once you begin to embed lists inside of lists, parametric polymorphism found in languages like Haskell, Ocaml, and C# really start to pay off.

Finally, just because nearly all of the languages available to us today are turing complete, doesn't mean that this is beneficial when the complexity class of an algorithm begins to increase.  In a language like C where the main algorithm solution mechanisms of choice are loops and if statements, when Tree or Graph complex problems are encountered the solution will be some sort of state machine constructed in a while loop filled with continue statements.  We've seen with the Levenshtein distance algorithm that such solutions can be of high quality.  However, my fear is that many engineers will not even be aware of the existence of highly complex problems or their need to be carefully transformed in such a way that they can be correctly implemented.  Worse yet, if the only tools that these engineers have a mastery of are loops and if statements, the end result may be an ad-hoc and fatally incomplete state machine, even in more typical languages like Java and C#.  Suddenly, turing completeness shifts meaning from incredibly useful property to enough rope to hang yourself.

I believe that the goal of my efforts should be to identify and elaborate on different cognitive complexity classes.  Then I can identify which programming language constructs most lend themselves for solving these classes of problems.  This will provide guidance for future programming language advancements as well as educational efforts for the best way to use those advancements.  Developing this idea into a formal system would be icing on the cake, but I feel at the very least that convincing arguments can be made for the utility of cognitive complexity classes.



Friday, August 9, 2013

non-ubiquitously omniversal, the series

Looking around at our increasingly technical world with four years of formal computer science education and six years of industry experience as a software engineer, I am having some trouble watching old episodes of star trek.

They have access to incredibly powerful technologies.  They can travel faster than the speed of light.  Force fields, inertial dampeners, anti gravity, FTL communication channels, transportation, who cares about the laws of physics.  Perhaps their understanding of physics is just much greater than our own.  They're not violating the rules of reality, they're just playing by a much better understanding of those rules.  However, if that's the case, then where's Facebook?

Seriously.  Well, perhaps you say that they're so advanced they no longer *have* Facebook, ha ha.  But then where're the IDevices™?  I'm not sure that the data pads you sometimes see people carrying around count either.  They're kind of clunky.  And you routinely see them piled up on desks.  Seriously dude, I think one pad plus some 30" monitors would do it.  Or what about this.  Junior officer announces a ship uncloaking.  Long pause.  A console explodes violently (maybe you shouldn't put a high power cable behind every monitor … seriously you would think they'd fix that after the first time someone dies due to monitor explosion).  Raise the shields!  How about you just program the computer to raise the shields automatically whenever a ship de-cloaks.  Ta da, thank me later.  Everyone seems to have these passwords that are like three letters long … which are occasionally defeated by tape recorders.  There was that one episode where Data takes over the entire Enterprise because he can mimic Picard's voice.  Everyone just has to sit around and is completely helpless for the rest of the episode.  How about some PGP guys.  And that's another thing.  It seems like there are way too many plot points where some simple crypto would have prevented everything.  Or there's too much crypto being broken at the drop of a hat.  But whenever the plot needs something to be secret they just duck into an empty room or piggy back data on a modulated inverse tachyon beam.  Like as long as no one's paying attention you're secure.  Seriously, today I could buy 20 raspberry pis, glue a microphone on them, and *bam* I can bug a large facility for under $1000.  And as far as covert subspace channels go.  An introductory book on data mining and a small fleet of very small probes and no one's ever sending anything covertly ever again.

The crypto / covert dynamic is kind of weird though, right?  Like what reality is it where crypto is trivial to just crack every other day, but covert channels are the de facto way to get real work done?  Hmm.

Well, in a universe where you have nearly infinite computer processing power, a lot of crypto is going to be worthless.  I mean crypto works because either you have some math problem that would take too long to solve the hard way or because it would take too long to brute force the key.  That all falls away with infinite processing power.  So crypto is worthless, but then why don't they have drones everywhere constantly monitoring the background radiation levels and automatically throwing the result into some sort of digital processing algorithm.  Heck you could even data mine the signals against a giant database so you automatically adapt to new strategies.  That should get rid of any covert channels.  No sneaky stuff for that universe.  Maybe they're too busy to do it.  Like it would be too much work … although I could probably bang something out in python pretty quickly.  Like … maybe they don't have python.  All of that might be a huge pain in the neck if all you had was assembly … err … they do talk about subroutines an awful lot.  And no one ever mentions higher order functions or even modules.

It sounds suspiciously like in the star trek universe, no one ever discovered lambda calculus.  Or any other higher level maths that would lead to the development of better software engineering techniques.  They still invented computers, but they're programming them in the most basic way possible.  

Why wasn't the math we know about today discovered?  Well it was probably because all of the mathematicians were busy with the discovery of whatever magical physics defying stuff that allows them to travel faster than light and generate gravity.  This is actually the premise of a short story called "The Road Not Taken"[1].  Basically, aliens invade Earth, but they do it with flintlock pistols and cannons.  Turns out that the secret to faster than light travel is really simple, but it's totally nonsensical.  So once a race discovers it, science can't explain it well and proceeds to collapse.  Whatever technological level you were at when you discovered FTL is where your civilization stays.

So it looks like Philip Wadler's Omniversal programming language isn't as ubiquitous as he hoped.  And I guess that sort of makes sense.  Who wants to compute lambda terms or find out how to decrement natural numbers when you could play with infinite computation power and travel to the other side of the universe.  Heck, most people would be more than happy to just sit around in the holodeck.  

Although, that also means that if a copy of SICP or the Lisp User Manual shows up in the star trek universe, then much havoc would be wrought.

Hm.  Kind of like the Death Note anime.  Only instead of bored Japanese death gods you would have ghost John McCarthy following around some sort of intelligent but fatally overly idealistic and arrogant star fleet cadet.  

... that ... that actually sounds like it would make for some compelling tv.


[1] - http://en.wikipedia.org/wiki/The_Road_Not_Taken_%28short_story%29

Thursday, August 8, 2013

All worlds but one

There have been several vastly important revolutions in human knowledge in the last hundred years or so.  Global communication changed how we see the world.  DNA and the human genome project changed how we see life.  General relativity changed how we see space and time.  But the one I'm interested in today is the one that most people don't even realize happened.  In the early 1900's there was a revolution in mathematics.  And it changed the way we see everything.  What started with the likes of Bertrand Russell, Gerhard Gentzen, and David Hilbert ended with the likes of Kurt Gödel, Alonzo Church, and Alan Turing.  Most people didn't notice a difference in the thoughts and methods of abstract, ivory tower academics.  And I doubt they would care even if they did.  But they do notice the IDevice™ in their pocket, the computer that does their taxes, and the massive outgrowth of automation that governs more of our daily lives by the minute.

And I don't quite mean that the mathematical revolution gave us computers.  Computers were on their way regardless of what mathematics was up to.  Mathematics gave us turing completeness and lambda calculus.  I also don't quite mean that our current technological wonderland is due to turing completeness and lambda calculus.  Most of the developers responsible for our current lifestyle are only vaguely aware that such things exist.  These developers learn the bare minimum and then plow relentlessly ahead; following their imaginations.  However, imagination will only get you so far.  Eventually, overly ambitious plans must cede to the limit of human comprehension in the face of overwhelming complexity.  Mathematics, however, provides avenues to tame this complexity.  And it has been sinisterly added to even the simplest developer's toolbox.  Why the honest and practical minded developers of today would be shocked to find that they have been making use of monads and theorem provers for years.  But they have.  And our increasingly technical world would not exist if this wasn't the case.

Philip Wadler is a mathematician who has a certain interest in lambda calculus and computation.  He came up with a marvelous phrase … lambda calculus is the omniversal programming language.  You see, it might be said that lambda calculus is a universal programming language.  Being mathematics, any civilization in the universe could independently come to the same conclusions that Alonzo Church did in the 1930's.  But it goes further than that.  Because it's mathematics, it should be able to also exist in other universes.  So it is truly omniversal.


Of course, personally, I have some reservations about an omniversal programming language.  While I can't imagine the nature of a universe where mathematics does not function, I can imagine that such a place might exist.  Lambda calculus itself argues against its status as omniversal.  In lambda calculus, things are not allowed to come from nowhere.  There *are* free variables, but they are really only free in a local, constrained context.  There must be some larger scope which introduces the free variable, or else it is an absurd entity without any meaning.  If in the world of lambda calculus everything must come from somewhere, then where did lambda calculus come from.  Surely it cannot exist of its own nature by its own nature for its nature is that this cannot happen.  So it points to a world outside of itself; where it does not care to venture.

Sunday, June 2, 2013

Homonyms by Any Other Name: Types

In college I started getting really interested in programming languages.  Somehow, I discovered Lua at approximately the same time I was going through a compilers course.  Once I got my first job, I decided that it was important to understand as many different programming languages as possible.  

Not entirely sure why I did that.  Maybe I had a vague understanding that I derived a great satisfaction from figuring out how things work.  But whatever the reason I spent a lot of time and effort figuring out as much as I could about programming languages.  

I started with huge lists.  Features that I found in different languages.  I wanted to bunch them all together.  Be able to use all of them proficiently.  But as time went by, I noticed that my lists were shrinking.  I had begun to understand the mechanisms behind different features, and I could categorize massive sections of a single language behind a single word.

Types were kind of difficult to figure out though.  What exactly were these things?  People seemed to sort them into different categories.  Dynamic types, static types, strong types, weak types.  And people had wildly different thoughts as to which type of type was best.  And then more people complained that dynamic, static, strong, and weak were all bad categories and didn't really correctly classify types.  It was almost like the word "type" meant different things depending on the context.  And I think that is actually exactly what is going on.

Normally, people would say that types prevent you from making mistakes.  But that's not really what happens in the C programming language.  In C you can cast anything to anything and the compiler will just assume you know what's up.  In C, types mostly determine which assembly opcodes and offsets the compiler will generate.

Well, maybe that's just C ... in C# types prevent you from making mistakes.  Although ...

void fluAdd( IList<int> list ) { 
    list.Add( 5 );
}

fluAdd takes an interface.  This allows us to know that no matter what happens, we will always be passed an object that has an Add method on it.  This protects us from failures.  Until something like this happens:

fluAdd( new List<int>().AsReadOnly() );

So what does happen if you try to add something to a readonly collection ... well you get a NotSupported exception.  Which sounds kind of familiar.  Hmm.

>> 5.add( 5 )
NoMethodError: undefined method `add' for 5:Fixnum
from (irb):1

Aahh!  Ruby's NoMethodError is basically the same thing as C#'s NotSupported exception. So C#'s types don't protect you from making mistakes.  Why they're no better than what Ruby gives us.  Well, don't go quite so far.  Notice that Ruby is letting me try to add a list entry of '5' to the number '5'.  C# won't let you do something that far off the rails.  So while it is true that C#'s types won't protect you from making any mistakes, it isn't true that you might as well have dynamic types.  C#'s types lower the number of possible mistakes you can make.

Okay, so what about Haskell?  People are always saying that Haskell has a really powerful type system and comes from real, hardcore type theory.  Well this is partially true.  Yes, Haskell uses real type theory in its types.  Type theory has this great concept called Soundness.  Basically, it boils down to being able to prove that the type of an expression doesn't change while you evaluate it (preservation) *and* every given step of the evaluation yields a sensible result that can continue to be evaluated (progress).  From this comes the saying, "well typed programs don't get stuck".  So the issue with Ruby's NoMethodError or C#'s NotSupported exception shouldn't happen in a programming language with Sound types.

But I mentioned "partial" truth didn't I?  The first is that Haskell's type system isn't really all that powerful, when compared to other possible candidates from type theory.  Maybe you want to type your types ... there's a type system for that.  Or maybe you want to totally specify every aspect of your program that is mathematically possible ... there's also a type system for that.  Haskell resides in a middle ground that balances out what you can make safe with types and what is practical to make safe with types.  But there are extensions to Haskell and different languages which improve on what you can make safe by using types.

The second is that Haskell also has Type Classes.  Type Classes don't really have much to do with type theory at all.  But they do allow you to specify functionality based on what a type happens to be.  Additionally, Recursive types (totally a type theory thing) are expressed with Algebraic Data Types (not necessarily a type theory thing[1]).  Then come Generalized Algebraic Data Types, existential types, universal quantification, polymorphic kinds, type families, functional dependencies, the list goes on.  Even if you want to ground a programming language completely in type theory, you still have to decide how to implement that and how the programmer will be interfacing with the types.  This means that if all you know is type theory, you will still need to understand the type system of a particular programming language in order to truly understand what types are.

Of course all this really only focuses on static type systems.  Even in the dynamically typed world, the semantics of type isn't this one monolithic thing that is the same in all contexts.  Consider Ruby:

def ikky( a )
    a
end

ikky( 1, 2 )

=> ArgumentError: wrong number of arguments (2 for 1)

So, there aren't any compile time checks to make sure ikky is receiving the correct types, but there are runtime checks to make sure that at least the correct number of arguments are being passed to ikky.

Now consider Lua:

function ikky( a )
    return a
end

ikky( 1, 2, 3, 4 )

=> 1

In Lua, you can pass as many arguments as you want to a function, and it will totally execute the function regardless of how the argument count matches up with the function definition.  You can even pass no arguments to a function that takes many arguments and everything will work fine[2].  

So just because you decide to move to the dynamically typed world, doesn't mean you can ignore what "Type" actually means.  The meaning of this word still depends significantly on where you are.

Whether you want to learn more about types, win the next language "discussion" you take part in, or write good software that works, make sure that you remember that the word "type" depends a lot on the context.

[1] - Okay, so yeah, ADTs are totally a theoretical thing which has mathematical properties, etc.  But the fact of the matter is that you can have a recursively typed, system F lambda calculus that covers all of the things you will see with ADTs, but still not actually know how to use ADTs.  ADTs are a language level interface with the type checker that may have language specific concerns.

[2] - This is of course for varying definitions of the word 'fine'.  If a parameter doesn't have an argument, then it gets set to nil.  Which may or may not be a problem depending on how the function is written.