Monday, April 27, 2015

Programming Language Feature Diagrams as Cognitive Complexity

After observing things in the software engineering world for about eight years now, I’m vaguely concerned that people approach problems in fundamentally incompatible ways.  The adage “the right tool for the job” makes a lot of sense when you compare web programming to embedded development, but I’m afraid that “the right tool for the person” is something we’re also going to have to come to terms with.

Regardless of how we come to terms with differences between individuals, I still really want a universal metric for measuring how difficult a problem actually is.  I’ve made several attempts at this [1][3], but so far I haven’t found anything that would let me assign a problem a number indicating how difficult it is to cognitively comprehend.  An early attempt was Cognitive Complexity Classes [3].  Basically, I noticed that working with lists is pretty easy, but working with a graph is kind of difficult, so I hypothesized that you could divide algorithms based on some associated data structure that *matches* the algorithm.  I don’t think this idea is really valid; however, with my current line of thinking it is actually surprisingly close.

My most recent attempt at cracking this problem was Mental Comprehension Complexity [1].  This guy might actually be viable, but it would need a lot of refinement before it starts to yield results.  The gist is that I hypothesize that a function with simple inputs and outputs but a very complex internal nature would be difficult to comprehend.  Simple vs. complex would be measured by Kolmogorov complexity [4] or something similar to it.  I think Mental Comprehension Complexity isn’t going to go very well because, in order to handle inputs/outputs of an algorithm that contain Turing complete features (e.g. an object that has methods attached to it), you’ll need to do a bunch of work to represent the algorithm in terms of an effectively computable function (like maybe a Turing machine).  In the end I’m not sure that the mathematical statements we come up with will actually apply to the medium that programmers typically work with (something closer to C#).  And that kind of misses the goal I was hoping for (mental or cognitive complexity, not some sort of algorithmic complexity).


Consider the number 5

So start by considering the number 5.  It’s about the simplest thing you can consider; if you encountered it in a programming context, you wouldn’t be intimidated by having to keep track of it mentally.  However, if we add more constraints to the number 5, it begins to become more problematic.  Maybe it’s important that this number must be odd.  Or maybe it’s important that this number be prime.  Or maybe a triangular number.  As we add constraints the amount of mental effort we have to put forth increases.  I think this is because with a single number the space we have to consider is “any one of all the things”, but once our space has constraints we now have to consider, “one of the things, but not any one of the following based off of this complex formula.”
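
To make that concrete, here is a rough Python sketch (the predicates are invented purely for illustration) of the difference between a bare value and a constrained one: every predicate is one more thing you have to re-verify whenever the value is produced or changed.

from math import isqrt

def is_prime(n):
    # Naive trial division; good enough for a sketch.
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

# A bare value: nothing extra to keep in your head.
x = 5

# The same value with constraints attached.  Each predicate is an extra
# rule that has to hold every time x is produced or modified.
constraints = [
    lambda n: n % 2 == 1,   # must be odd
    is_prime,               # must be prime
]
assert all(check(x) for check in constraints)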

So take another step up the Cognitive Complexity Class ladder.  A list of *things* is pretty simple to comprehend, but a list of things that all have constraints is going to be about as hard to comprehend as a constrained number.  And it can get so much worse.  Think about a sorted list.  Now each item in the list carries rules that depend on the other items in the list.  This can get arbitrarily complex.  Perhaps the second element has certain constraints, but only if the fifth element possesses other constraints.
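
A rough sketch of that last point, with deliberately arbitrary invented invariants: each check below is a rule the reader has to carry around for the whole list, not just for a single element.

def is_sorted(xs):
    # Every element's legality now depends on its neighbours.
    return all(a <= b for a, b in zip(xs, xs[1:]))

def cross_element_rule(xs):
    # A contrived cross-element rule of the kind described above: the second
    # element must be even, but only when the fifth element is negative.
    if len(xs) >= 5 and xs[4] < 0:
        return xs[1] % 2 == 0
    return True

xs = [1, 2, 3, 5, 8]
assert is_sorted(xs) and cross_element_rule(xs)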

If we ascend into tree data structures, we can have entire subtrees that have their own special rules based off of the constraints we place on their roots.  And we can go that much further with graph data structures.

This isn’t Cognitive Complexity Classes, because that framework assumed that you could see algorithms as having the “essence” of certain data structures.  However, I am borrowing the basic progression to illustrate how we can get an approximate feeling for what complex and difficult-to-comprehend things look like.  I think the basic forces involved are the size of the space that we are working in and the number of constraints that have been put on the elements in that space.  Additionally, I suspect the nature of the grouping of the elements in any given space is relevant as well.  If we have a simple property like even or odd, that’s not too hard to comprehend; if we have something harder to predict, like primality, then we have to spend additional effort to understand when we are looking at a valid element in that space.

However, this only covers static things like data.  Is there a way to extend this idea to handle things like objects with methods or a function that accesses global mutable data?


Enter Programming Language Feature Diagrams (PLFD)

I came up with an idea a while back about how to talk about certain types of programming language features [2].  I didn’t really like the statement “exceptions are exceptional” because it didn’t actually explain anything that was going on.  So I drew a tree-shaped structure with a box at each node and said, “exceptions go up this thing until they hit a circle (representing a try/catch statement).”  I like PLFDs because they let you think of features in terms of what they do, not in terms of when you would use them.  An exception being a dynamic stack communication device is better than it being “for exceptional situations”, because in the latter case you just know to use them sparingly, while in the former case you know that there needs to be a try/catch statement above you in the call stack or else you’ll crash the program (not the best user experience).  Additionally, when you see a subtree drawn out with an indicator of where the relevant try/catch lives, you know how you can refactor a program without having to worry about adding additional try/catch statements to retain the semantics of the original un-refactored program.
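
Here is a minimal Python sketch of that “dynamic stack communication device” view (the function names are made up): the exception travels up through frames that ignore it until it reaches the nearest enclosing try/catch, and without such a handler the program would crash.

def leaf():
    # Raising here is a message aimed at whoever sits above us on the stack.
    raise ValueError("looking for a handler somewhere up the call stack")

def middle():
    # No handler here, so the exception passes straight through this frame.
    return leaf()

def root():
    # The "circle" in the diagram: the nearest enclosing try/catch.
    try:
        return middle()
    except ValueError as err:
        return f"handled at root: {err}"

print(root())
# If root() had no try/except, the exception would escape and crash the program.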

These things can be used to measure cognitive comprehension complexity in much the same way as we did above with the data structures.

I noticed that PLFDs are graphs (lazily evaluated graphs, I guess, but totally graphs) with a bunch of very strange constraints on them.  Consider a mutable reference.  When you pass a mutable reference to the functions called from the function it was initialized in, what you’re actually doing is setting up a bunch of different constraints between the functions that make use of or alter that reference.  This is exactly the situation we had above with the data structures, except instead of a chunk of data that exists at one time we have a chunk of data that exists over time.  However, if you kept track of each value returned by every function call, you could fill out the entire call tree as if it were a data structure.
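
A small Python sketch of that idea, with made-up function names: a mutable reference is threaded through two calls, and a trace list records what each call saw, which is exactly the “call tree filled out as a data structure” mentioned above.

def step_a(ref, trace):
    ref["count"] += 1
    trace.append(("step_a", ref["count"]))

def step_b(ref, trace):
    # step_b's behaviour depends on what step_a already did to the shared
    # reference, even though nothing in its signature says so.
    ref["count"] *= 2
    trace.append(("step_b", ref["count"]))

def driver():
    ref = {"count": 0}   # the shared mutable reference
    trace = []           # the call tree written down as plain data
    step_a(ref, trace)
    step_b(ref, trace)
    return ref, trace

print(driver())   # ({'count': 2}, [('step_a', 1), ('step_b', 2)])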

Different features used in different ways imply different sets of constraints on the final program.  How big the space is and how many and what kind of constraints you have determine how hard it is to understand the program.

Gist

If you can create a programming language feature diagram for every feature in a programming language, then you should be able to measure the cognitive comprehension complexity of an entire program (including highlighting the sub-complexity of sub-parts of the program, assuming the program is that well structured).  For data I suspect you really need some sort of dependent types to allow the system to know how cognitively complex the values actually are.  But this system would be able to handle data structures, objects, and higher-order functions.  Because a method is just another diagram that travels around with the data, you should be able to use this technique to handle higher-order (and Turing complete) situations in the same terms the programmer is using, which will hopefully mean that any conclusions you draw about cognitive complexity will actually apply to programs that people write instead of mathematical constructs that are used for proofs.

What the actual measurement is at the end of the day is something that I still need to work out.  It seems like the output is going to be a space of some sort and then a description of how easy it is to select legal elements in that space.  Additionally, there might need to be some work done to identify which groupings of legal elements are easier or harder to wrap your head around.





Friday, April 24, 2015

Mental Comprehension Complexity


My initial goal as of late was to be able to identify all problems that are hard for people to devise solutions for.  You may think, “Hard for you, but easy for me.  And isn’t intelligence versatile and non-transitive?  So perhaps something might also be easy for you and hard for me.  If this is the case then how do you plan to have some *universal* metric for how difficult things are for people to accomplish?”  And I think this is a reasonable thing to consider.  Can we really create some sort of ordered lattice of problems where the higher you ascend the harder things get *regardless* of who you are or what experience you have?  Well, I had a hypothesis for trying to accomplish just that.

Step 1

Kolmogorov complexity measures how complex a mathematical object is (roughly, the length of the shortest program that produces it).  I don’t think I can use Kolmogorov complexity itself for my purposes, but I want something similar.  Basically, given an input or an output, I want to be able to rank how much mental energy you have to expend to think about it (at least relative to other inputs and outputs).
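
Kolmogorov complexity itself is uncomputable, so any working version of this would have to lean on a proxy.  One common stand-in is compressed length; the Python sketch below uses zlib purely to illustrate the kind of relative ranking I’m after, not as a serious measure.

import zlib

def complexity_proxy(value) -> int:
    # Crude stand-in for Kolmogorov complexity: the length of the
    # compressed representation.  Only meaningful for relative comparisons.
    return len(zlib.compress(repr(value).encode()))

regular_output = [0] * 100
messy_output = [(i * 2654435761) % 997 for i in range(100)]

print(complexity_proxy(regular_output) < complexity_proxy(messy_output))  # True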

Step 2

Define a hard problem as any problem whose solution is implemented with a function where the inputs and outputs have a low relative mental comprehension complexity compared to the structures internal to the function.  The idea here is that this function is hard to comprehend because considering it solely in terms of its inputs or outputs will miss significant internal features.  Failure to be mindful of these internal features makes the function easier to implement incorrectly (some inputs result in invalid internal states) or easier to use incorrectly (expecting some outputs that will never occur because of certain features of the internal state).  In order to correctly implement and utilize this difficult function you have to be able to project the complex internal nature of the function onto the inputs and the outputs.  The bigger the disparity and the greater the internal complexity, the harder the function is to comprehend.
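
As a toy example of the shape I have in mind (the state machine is deliberately arbitrary): the function below has a trivially simple signature, but using or testing it correctly means mentally simulating internal states the signature never mentions.

def accepts(n: int) -> bool:
    # Simple interface: one int in, one bool out.  The difficulty lives
    # entirely in the internal state machine, which the signature hides.
    state = 0
    transitions = {
        (0, 0): 1, (0, 1): 2,
        (1, 0): 2, (1, 1): 0,
        (2, 0): 0, (2, 1): 1,
    }
    for bit in map(int, bin(n)[2:]):
        state = transitions[(state, bit)]
    return state == 0

# Which inputs are accepted?  The signature gives no hint; you have to
# project the internal state machine onto the inputs and outputs yourself.
print([n for n in range(16) if accepts(n)])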

Step 3

If you can find a mathematically solid definition for the hard problem in step 2, then you can build a machine that recognizes hard problems (in terms of people being able to find solutions for them).  This is the first goal: a machine for recognizing hard problems.

Step 4

Try to see if the machine from step 3 can be adapted to generate an infinite number of unique hard problems.  The naive solution is to just generate a random problem and then use the machine from step 3 to decide if the problem is hard.  If the problem is hard, return it; if it is not, generate another random problem.  I’m not sure if this machine is particularly useful by itself, but how we use it in step 5 is important.
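
The naive version of step 4 is just rejection sampling.  In the Python sketch below, is_hard and random_problem are placeholders for the machine from step 3 and for whatever problem encoding ends up being used; only the loop structure is the point.

import random

def is_hard(problem) -> bool:
    # Placeholder for the recognizer from step 3.
    return sum(problem) % 7 == 0

def random_problem():
    # Placeholder for "generate a random problem".
    return [random.randint(0, 9) for _ in range(5)]

def generate_hard_problem():
    # Keep sampling until the recognizer accepts the candidate.
    while True:
        candidate = random_problem()
        if is_hard(candidate):
            return candidate

print(generate_hard_problem())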

Step 5

Begin giving people the hard problems generated in step 4.  Some people will be able to solve familiar hard problems simply because they have seen them before.  But if the hard problem definition really does encode what it means for a problem to be hard, then eventually you’ll generate a hard problem that they do not know how to solve from experience.  This step really only describes a way to frustrate people, but the end goal is to find or train a person who can successfully solve hard problems.

Step 6

Encounter a person who can consistently solve hard problems even when they are not familiar with them.  The hope is that their technique will either be to find a way to decrease the required internal complexity of the difficult problem OR to find a way to divide the hard problem into two or more non-hard problems.  With any luck these two techniques (or any others that show up) can be adapted to arbitrary hard problems.  This is the second goal: a method for dealing with arbitrary hard problems.

Of course, if you encounter a person who can easily solve hard problems, but who does so without making the problems more approachable and without being able to explain why they find them easy, then we haven’t really gained anything.  Although I suppose you could just hire that person to solve your hard problems.


But now what

Unfortunately I’m not sure my hypothesis is likely to be viable.  To start with, it seems like you could make a locally hard problem easy by making the inputs and outputs of the function more complex.  The end result is that hardness increases globally, but becomes harder to detect locally.  This means that if my technique were successful, it might end up making software solutions *harder* to work with.

The second thing to consider is that inputs and outputs that include objects with methods or first class functions are kind of hard to assess for complexity.  A first class function might look very simple, but its internal nature might include global mutable state that is accessed concurrently in several different places.

And the third issue is that because some inputs or outputs might be hard to assess, you really need to take a very abstract mathematical view on things.  Kolmogorov complexity itself seems like it has some mathematical limitations that we don’t want, and the alternatives don’t seem much better.  The root of the problem is that we have to evaluate inputs and outputs in terms of effectively computable mathematical objects.  We’re no longer really talking about constructs that people work with when they try to solve problems, but recursive topological spaces.  I’m not sure that we’re going to end up with results that actually reflect how mentally difficult it is to comprehend a problem, because there may be effective abstractions of an otherwise complicated mathematical object, and that’s going to throw off all our assessments of complexity.


Monday, March 9, 2015

garbage collection

Finished reading The Garbage Collection Handbook[1].  I’m glad that I read through it, but I’m pretty sure that my takeaway is something other than what the authors would have hoped for.  I completely believe the subtitle when it says, “The Art of Automatic Memory Management.”  Something *that* overly complex has got to be an “art” as opposed to something you might be able to build a discipline around.

To be fair, the concept behind garbage collection is actually relatively straightforward.  If a programmer allocates dynamic memory, track all references to each allocated entity, and once all references to an entity are gone, delete that memory.  Unfortunately, the modern world we live in necessitates making the whole process rather more complicated in order to support the lavish lifestyles we are currently enjoying.
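
The most literal reading of that description is reference counting.  The Python sketch below is a toy, hand-rolled version of the idea (it ignores cycles and everything else that makes real collectors complicated):

class Ref:
    # A manually reference-counted cell: the whole concept in a few lines.
    def __init__(self, payload):
        self.payload = payload
        self.count = 0

    def incref(self):
        self.count += 1

    def decref(self):
        self.count -= 1
        if self.count == 0:
            print(f"freeing {self.payload!r}")
            self.payload = None   # "delete that memory"

cell = Ref("some big buffer")
cell.incref()   # a reference appears
cell.incref()   # another reference appears
cell.decref()   # one reference goes away
cell.decref()   # the last reference goes away, so the payload is freed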

Near as I can tell there are three problems: 1) mutable state means old memory can reference new memory; 2) hierarchical memory architectures mean memory locality trumps any level of sophistication; 3) concurrent programs mean the simple must be made complex in order to avoid locks.  With their powers combined, it looks suspiciously like you have to be a super expert in order to create a garbage collector and a skilled expert if you need to troubleshoot one.

My guess is that we’re going to see mutable state and concurrency transform into something very different than what we see today in the industry standard languages.  Of course it’s also possible that something like memristors[2] might change everything, but I’m not sure that’s going to show up soon enough to avoid a paradigm shift.

Also, C has its own problems[3], so I’m not convinced we can just ‘move back to the trees’ as it were.

I guess it’s time to figure out linear typing and region calculus and hope the Rust[4] guys are able to blaze a trail for the rest of us.


Saturday, January 24, 2015

Distance Type Checking

Someone mentioned to me recently that they were interested in having a way to encode spatial constraints in a type system, such that you can tell whether one piece of memory is close to or far from another.  Honestly I’m not sure what their ultimate goal is (the only thing I can think of is that they want to ensure good cache usage), but by itself this is a very interesting problem to consider.

My immediate thought is that you can use a dependent typing system and embed pointer addresses into the type signatures.  Then all you are going to need is a “these objects are sufficiently close” proof and you’re done.

However, I doubt my naive solution would ever actually be useful for anything.  For starters, stack randomization is going to ensure that none of your proofs about stack-allocated data hold.  But maybe that’s not really a big problem.  You can just create an abstract address that’s relative to wherever your program’s stack starts.  The big problem, though, is going to be dynamically allocated memory.  Last I checked, we don’t have a way to control heap fragmentation in a manner that’s analyzable at compile time.  And really this is just the start.  For example, I don’t think I really want to be the one in charge of type checking that an arbitrary value on the stack is within a specific distance of an arbitrary value on the heap.

I assert that the problem has to do with trying to get whole program validation.  If we narrow our vision a bit, I suspect we might end up with a plausible solution.

If we can subdivide our memory-constrained problems into atomic chunks, then we can implement them, individually, as part of a DSL.  The DSL itself can be type checked (the easiest example would be to just use GADTs) to ensure that our memory distance constraints hold.  Additionally, the DSL can ensure that our mini program’s total memory usage stays smaller than whatever amount we are confident our allocator can contiguously provide at runtime.  At runtime our program can still fail if the allocator can’t provide a sufficient amount of contiguous memory, but that’s the world we live in right now anyway (malloc can fail and GCs can throw out-of-memory exceptions).
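
I haven’t built this, but here is a rough Python sketch of the shape of such a DSL.  The real thing would enforce these checks statically (with GADTs or dependent types); this version just validates abstract offsets, a distance constraint, and a contiguous-memory budget before the chunk would be allowed to run.  All names and numbers are invented.

class LayoutDSL:
    # A dynamically checked analogue of the statically typed DSL described
    # above: allocations get abstract offsets, and the constraints are
    # validated before the chunk runs.
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.offsets = {}
        self.cursor = 0
        self.constraints = []

    def alloc(self, name, size):
        self.offsets[name] = self.cursor
        self.cursor += size

    def require_close(self, a, b, max_distance):
        self.constraints.append((a, b, max_distance))

    def check(self):
        if self.cursor > self.budget:
            raise MemoryError("chunk exceeds the contiguous budget")
        for a, b, max_distance in self.constraints:
            if abs(self.offsets[a] - self.offsets[b]) > max_distance:
                raise ValueError(f"{a} and {b} are too far apart")

dsl = LayoutDSL(budget_bytes=4096)
dsl.alloc("header", 64)
dsl.alloc("payload", 1024)
dsl.require_close("header", "payload", max_distance=128)
dsl.check()   # passes: header at offset 0, payload at offset 64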


Wednesday, December 17, 2014

Syntax Matters

Originally, I didn’t worry too much about programming language syntax.  I think, starting out, I simply didn’t have enough exposure to the different options.  Then as I started to really get interested in programming languages, my interest was focused on the semantics of different features.  How they looked didn’t seem to make any difference when I had no idea what they actually did.  Lisp and Forth were probably my first clue that syntax would be significant.  And that makes sense because in those languages the syntax allows certain constructs to exist.  However, I think that even if there does not exist any language construct that necessitates a specific syntax, the syntax of your language still has effects.

In Let Over Lambda [1], Doug Hoyte talks about “Duality of Syntax”.  I’m not going to go into the details of this concept, but I believe he uses this principle to justify deviating from typical Common Lisp conventions as well as to decide how to design new conventions.  This is important because it sets up a theory of what makes good programs and then makes decisions within that framework.

You also see this (but sort of in the other direction) with Douglas Crockford’s Javascript: The Good Parts [2].  And the punchline is something that I talked about in a previous blog post [3].  Normally, where you put a curly brace doesn’t really matter, but in javascript it does matter, because semicolon insertion means that a return statement for an object literal might instead return ‘undefined’.  Douglas’ solution is to say that in javascript you always put the open curly brace on the same line, because that way you won’t run into the semicolon insertion problem.  Here we’re using a syntax convention in order to avoid bad parts of the language, but my thought is that we should observe situations like what has occurred with javascript and instead create new languages that do not have these defects.

Also observe Jonathan Blow’s video series for the new programming language that he is developing [4].  Specifically, try to keep an eye out for how he’s designing the syntax for functions.  A locally defined function has nearly the exact same syntax as a globally defined function; that’s on purpose.  This design decision allows for a workflow that is much better optimized than if local and global functions had different syntaxes.

And finally take a look at my previous blog post [3] where I propose that we can decide what good syntax is by determining if it possesses certain mathematical properties that are advantageous.

Now I don’t think that there is going to be a ‘one syntax to rule them all’, but I do think that the reasoning behind our syntax and conventions needs to be based off of what properties we want our code to have.  Just choosing an arbitrary set of rules for the sake of uniformity is missing the point (although it’s clearly on the right track).  What we need is a careful analysis so that we can show in a very methodical fashion that our rules are helping us and not hindering us.







Wednesday, December 10, 2014

85%

The indie game developer Jonathan Blow is currently trying his best to create a replacement for the C programming language with the intended use case of developing games (go figure).  In general I think that replacing C/C++ is a good call; my personal belief being that undefined behavior [1] is a bigger problem than the college lectures let on.  And honestly, replacing C/C++ seems to be en vogue these days [2].  

Now Jonathan’s project is still a work in progress, but he’s racked up quite an impressive length of video digressions on this subject [3].  If you watch through those videos, you’ll get the impression (and part of this comes from explicit statements) that the goal of this language is to be an 85% language.  Fix the parts of C that are rather archaic, optimize for the use cases that game designers care about, avoid syntactic and semantic pitfalls that have no reason to exist and can be fixed at zero cost, but don’t ground the thing in whatever crazy lambda calculus the cool kids are playing with this decade.  After all, we would like to be able to get some work done by Thursday without having to learn category theory in order to do it.

I think that the 85% goal is a pretty good one if you already know how you want to be programming.  If your patterns stay the same and are always doing *basically* the same things, ensuring that weird edge cases fit into your system might actually be a waste of time.  You can handle those issues with code reviews, stack traces, and debuggers.

However, on the other hand, my personality doesn’t get along very well with 85% solutions.  Economically, I see why they are absolutely necessary, and I wouldn’t hesitate to use an 85% solution as an implementation detail if my focus is on whatever the bigger picture is (assuming, of course, the 85% can be verified to fully satisfy the requirements).  But if my goal and focus is to understand the thing (whatever it is), I’m going to lean towards 100% solutions nearly every time.

I think the reason for this is that my learning style seems to involve some sort of combination of counterexample search and backtracking.  Break everything apart into the smallest pieces you can and then try to put the pieces back together again.  Any time you think you have a semantic meta-statement, look for counterexamples.  If you find a counterexample then something is wrong, so backtrack until you can’t find any counterexamples.  Once you can’t find any counterexamples for anything, you’re done.  Not quite a proof of correctness, but success by exhausting all avenues of failure seems like it ought to work better than “but this worked last time”.

Here comes the punch line:  I’ve already mused about mathematical properties being valuable for programming correctness [4].  And I think the reason for that is that the context of a property is 100% (and, by the way, we have proofs).  You don’t have to worry about things going wrong tomorrow.  I think the reason that we need to be replacing C/C++ today is that they were the 85% languages of yesterday.  The programming techniques we’ve developed just don’t fit into languages that were developed in a very different environment.  Maybe you want an 85% solution or programming language because you need to get work done today and you don’t want to worry about proofs of correctness.  And maybe that’s going to work.  On the other hand, maybe someone will misuse your solution, or new techniques will arrive that can’t be safely utilized in your language.



[2] - Microsoft is supposedly developing *something*, Apple has developed Swift, Mozilla is developing Rust, Facebook seems interested in D, Google has developed Go, the C++ standard is trying to modernize.


Tuesday, December 2, 2014

Compiler : Syntax -> Program

It has seemed to me for a while now that the significant whitespace in languages like F#, Haskell, and Python is actually a really good thing.  After all, correctly formatted code already has indentation delimiting scope.  And there exist compilers that will correctly compile code that has significant whitespace (i.e. F#, Haskell, Python).  Conclusion:  curly brackets et al. exist only to make the compiler writers’ lives easier; the right way to delimit scope is with significant whitespace.

However, one of the more horrifying things that I’ve encountered in programming languages is the bizarre errors that can occur from Javascript’s semicolon insertion.  Basically, the problem is that:

return {
    a : b
}

is different from 

return
{
    a : b
}

The first one returns an object that has a field ‘a’ with a value ‘b’ and the second one returns ‘undefined’.

Why is semantic whitespace in the same post as javascript semicolon insertion?  Because I realized that they both share the same property (or lack thereof).  Both are discontinuous.  

The title to this post is actually an indirect reference to my previous post [1].  In that post I said that there are a bunch of mathematical properties that we really should make sure our programming constructs have (for example … some constructs really should be continuous).  *This* post is about how compilers are really just functions that map syntax into programs.  

The gist behind continuous functions is that a small change in the input should result in a small change in the output [2].  The problem with the javascript example is that a small change in the input (a single invisible character that only changes formatting and almost never has any semantic value in other languages) results in a very large change in the resulting program semantics.  If you accept that this is a problem in javascript, it seems like you have to also accept that it is a problem in languages like python (where a single invisible character has non-trivial semantic results).

The thesis here is that a continuous compiler is good and a discontinuous compiler is bad.  One solution might be to add in curly braces and remove semicolon injection … but I suspect there may be better ways to salvage the situation.

ReSharper is a neat add-on to Visual Studio that adds a bunch of additional syntax highlighting (among other features).  One of the things I’ve really enjoyed is the grey out effect that occurs when a function is not called anywhere.  It becomes rather easy to determine at a glance when a function has become obsolete.  And I suspect that this is also the answer to solve the semantic whitespace discontinuity.

The only real problem I have is that I want a continuous syntax-to-semantics mapping.  So if I can convert the small change we see with semantic whitespace into a big change, then the problem should be solved.  If we take a page from ReSharper, then we can change the color scheme slightly (maybe only a tint change) when the scope of an expression is different.  Each scope can have a different tint, and then a small change that causes a large difference in output will be converted into a noticeable change that causes a noticeably different output.  Additionally, this technique could also be leveraged to address some of the issues with semicolon insertion in javascript.
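
As a rough sketch of the editor side of this idea (using indentation as a crude stand-in for real scope analysis, with made-up tint values): each nesting depth gets its own tint, so a one-character indentation change becomes a visible colour change.

TINTS = ["#ffffff", "#f3f7ff", "#e7efff", "#dbe7ff"]  # one shade per depth

def tint_lines(source: str, indent_width: int = 4):
    # Derive nesting depth from indentation and map each line to a tint.
    for line in source.splitlines():
        depth = (len(line) - len(line.lstrip(" "))) // indent_width
        yield TINTS[min(depth, len(TINTS) - 1)], line

example = "def f():\n    if True:\n        return 1\n"
for colour, line in tint_lines(example):
    print(colour, line)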





[2] - Things get complicated when you aren’t dealing with geometry or metric spaces.  What do you mean by “small change” when distance doesn’t exist in your domain?  Well, topology has a solution for that, but I’m not going to try to build a topological space for syntax and program semantics.  I think the informal idea of a “small change” is clear enough to get the point across in this instance.