Monday, April 27, 2015

Static v. Dynamic

I have a previous post about static and dynamic types [1] and I think I still agree with everything my past self said.  However, I think there’s a couple additional points that I can make now that I either couldn’t think of a good way to say previously or have come up with since my original post.

The typical discourse seems to be that those in favor of statically typed languages indicate that their programs ‘just work’ while those in favor of dynamically typed languages will say that static types get in their way and their programs totally just work as well.

I think the position of static type system proponents can be disingenuous.  A static type system isn’t going to solve all of your problems; it’s just going to ensure that your “micro” interfaces line up correctly.  This will keep you from using an integer instead of a list of integers, but it isn’t necessarily going to prevent you from building a program that violates the semantics of the problem you’re actually trying to solve with your program.

Consider if you had a Haskell like language, but you only used tuples and integers.  It doesn’t seem like the type system is really going to be helping you very much with ensuring that your program solves the problem at hand correctly.  However, on the other hand, if you had a Lua like language, but it was also equipped with algebraic data types and pattern matching.  *That* language would probably help you encode and represent your problem and greatly assist you in coming to a correct solution even without static checks.

Additionally, learning a type system isn’t free.  There’s a lot of potential work to do in order to be able to effectively handle a type checker.  Some of that work is understanding the type theory that they were implementing (and there’s more than one possibility to choose from), and some of that work is understanding the implementation details that affect what you’re able to do.  Sometimes the things that you want to do, which make perfect sense and are completely consistent and safe, are not allowed because of theoretical or implementation artifacts.

Finally, those who are really good at programming in dynamically typed languages are running a kind of a type checker in their minds.  They understand the nature of what their functions accomplish and they’re able to ensure that they work correctly with the rest of the program.  They don’t have to wait for anyone to figure out how to type check what they’re able to work out for themselves.

But that isn’t the final paragraph … is it?  So far this post *sounds* like it’s in favor of dynamically typed languages, but this is really a pro-statically typed language post.  However, before I went into how I feel about static typing, I really wanted to point out that dynamically typed proponents have legitimate view points and concerns.  

My first point was that micro interface type checking (or perhaps I should say sub-semantic type checking) isn’t all that great.  And I think before you get the hang of dealing with a type checker like what Haskell or ML provides this *is* a thing that is very frustrating.  Mysterious “failed to unify” statements that you have no control over complaining about silly things that just work in other mediums really hurts and that’s not a problem you can just ignore.  However, once you do get the hang of it … not understanding stupid things that don’t work really hurts, and the last thing I want to do is have my program produce incorrect output because I forgot that the function I’m calling returns a “List (Maybe Int)” and I’m mistakenly believing it returns a “Maybe (List Int)”.  Having something sit there and tap me on the shoulder to help me with silly mistakes that don’t have anything to do with the actual problem I care about is actually really nice because it means I can put more of my effort into thinking about the problem I actually care about.  If my program type checks and it still doesn’t work, I don’t have to worry about a lot of types of errors because the type checker already took care of those problems.  I can devote all of my cognitive effort towards the problem that matters.

Additionally, I mentioned that a language with algebraic data types and pattern matching, but no static type checking might be just as good as a language with static type checking.  However, advanced techniques allow you to push more and more of your real world problem’s semantics into the type system.  Phantom types, general algebraic data types, dependent types, etc.  If we can use advanced techniques to represent the problems we care about in the type system, then we can not only get automatic assistance in checking for stupid mistakes, but we can get similar assistance in checking for big relevant domain specific mistakes.

Finally, I’m not exactly sure where else dynamic typing has to go.  It would be nice to be pleasantly surprised by new developments in what you can do with dynamic typing, but I suspect there’s not that much more you can do with it.  On the other hand, the ultimate goal of static typing is to be nearly as permissive as a dynamically typed language, but also provide you with checks on low level mistakes, checks on semantic mistakes, and even suggestions for how you can proceed with a problem when you get stuck.  Today I don’t think it’s fair to tell everyone that they need to switch to a statically typed language, but I believe that tomorrow it’s going to be more easy and also more compelling to be solving your problems with a static type checker.

Programming Language Feature Diagrams as Cognitive Complexity

After observing things in the software engineering world for about eight years now, I’m vaguely concerned that people approach problems in fundamentally incompatible ways.  The adage “the right tool for the job” makes a lot of sense when you compare web programming to embedded development, but I’m afraid that “the right tool for the person” is something we’re also going to have to come to terms with.

Regardless of how we come to terms with differences between individuals, I still really want a universal metric for measuring how difficult a problem actually is.  I’ve made several attempts at this [1][3], but so far I haven’t encountered anything that would allow me to assign numbers to a problem that indicates how difficult a problem is to cognitively comprehend.  An early attempt was Cognitive Complexity Classes [3].  Basically, I noticed that working with lists is pretty easy, but working with a graph is kind of difficult, so I hypothesized that you could divide algorithms based off of some associated data structure that *matches* the algorithm.  I don’t think this idea is really valid, however with my current line of thinking it is actually surprisingly close.

My most recent attempt at cracking this problem was Mental Comprehension Complexity [1].  This guy might actually be viable, but it would need a lot of refinement before it starts to yield results.  The gist here is that I hypothesize that a function with simple inputs and outputs and also with a very complex internal nature would be difficult to comprehend.  Simple vs. complex would be measured by Kolmogorov Complexity [4] or something similar to it.  I think Mental Comprehension Complexity isn’t going to go very well because in order to handle inputs/outputs of an algorithm that contain turing complete features (i.e. an object that has methods attached to it) you’ll need to do a bunch of work to represent the algorithm in terms of an effectively computable function (like maybe a turing machine).  In the end I’m not sure that the mathematical statements we’ll be coming up with will actually apply to the medium that programmers typically work with (something closer to C#).  And that’s kind of misses the goal I was hoping for (Mental or Cognitive complexity not some sort of algorithmic complexity).

Consider the number 5

So start by considering the number 5.  It’s about the simplest thing you can consider; if you encountered it in a programming context, you wouldn’t be intimidated by having to keep track of it mentally.  However, if we add more constraints to the number 5, it begins to become more problematic.  Maybe it’s important that this number must be odd.  Or maybe it’s important that this number be prime.  Or maybe a triangular number.  As we add constraints the amount of mental effort we have to put forth increases.  I think this is because with a single number the space we have to consider is “any one of all the things”, but once our space has constraints we now have to consider, “one of the things, but not any one of the following based off of this complex formula.”

So take another step up on the Cognitive Complexity Class ladder.  A list of *things* is pretty simple to comprehend, but a list of things that all have constraints is going to be similarly hard to comprehend as a constrained number.  And it can get so much worse.  Think about a sorted list.  In this case each item in the list now has certain rules that go with it that are based on the other items in the list.  This can get arbitrarily complex.  Perhaps the second element has certain constraints, but only if the fifth element possesses other constraints.

If we ascend into tree data structures, we can have entire subtrees that have their own special rules based off of the constraints we place on their roots.  And we can go that much further with graph data structures.

This isn’t Cognitive Complexity Classes because that framework assumed that you could see algorithms as having the “essence” of certain data structures.  However, I am borrowing the basic procession to illustrate how we can create and get an approximate feeling for what complex and difficult to comprehend things look like.  I think the basic forces involved are the size of the space that we are working in and the number of constraints that have been put on the elements in that space.  Additionally, I suspect the nature of the grouping of the elements in any given space is relevant as well.  If we have a simple property like even or odd, that’s not too hard to comprehend; if we have something harder to predict like primality, then we have to spend additional effort to understand when we are looking at a valid element in that space.  

However, this is just static things like data.  Is there a way to extend this idea to handle things like objects with methods or a function that accesses global mutable data?  

Enter Programming Language Feature Diagrams (PLFD)

I came up with an idea a while back about how to talk about certain types of programming language features [2].  I didn’t really like the statement “exceptions are exceptional” because it didn’t actually explain anything that was going on.  So I drew a tree shaped structure with a box at each node and said, “exemptions go up this thing until it hits a circle (representing try/catch statement).”  I like PLFDs because they let you think of features in terms of what they do, not in terms of when you would use them.  An exception being a dynamic stack communication device is better than it being for exceptional situations because in the latter case you just know to only use them sparingly, but in the first case you know that there needs to be a try/catch statement above you in the call stack or else you’ll crash the program (not the best user experience).  Additionally, when you see a subtree drawn out with an indicator of where the relevant try/catch lives, you know how you can refactor a program without having to worry about adding additional try/catch statements to retain the semantics of the original un-refactored program.

These things can be used to measure cognitive comprehension complexity in much the same way as we were above with the data structures.

I noticed that PLFDs are graphs (lazy evaluated graphs I guess, but totally graphs) with a bunch of very strange constraints on them.  Consider a mutable reference.  When you have a mutable reference and you pass it to functions you call in the function the reference is initialized in, what you’re actually doing is setting up a bunch of different constraints between the functions that make use of or alter that mutable reference.  This is exactly the situation we have above with the data structures except instead of a chunk of data that exists at one time we have a chunk of data that exists over time.  However, if you kept track of each value returned by every function call, you could fill out the entire call tree as if it were a data structure.

Different features used in different ways imply different sets of constraints on the final program.  How big the space is and how many and what kind of constraints you have determine how hard it is to understand the program.


If you can create a programming language feature diagram for every feature in a programming language, then you should be able to measure the cognitive comprehension complexity of the entire program (including highlighting sub-complexity of sub-parts of the program (assuming the program is that well structured)).  For data I suspect you really need some sort of dependent types to allow the system to know how cognitively complex the values actually are.  But this system will be able to handle both data structures and objects and higher order functions.  Because a method is just another diagram that travels around with data, you should be able to use this technique to also handle higher order (and turing complete) situations in the same terms as the programmer is using, which will hopefully mean that any conclusions you draw from cognitive complexity will actually apply to programs that people write instead of mathematical constructs that are used for proofs.

What the actual measurement is at the end of the day is something that I still need to work at.  It seems like the output is going to be a space of some sort and then a description of how easy it is to select legal elements in that space.  Additionally, there might need to be some work done to identify what groupings of legal elements are easier or harder to wrap your head around.

Friday, April 24, 2015

Mental Comprehension Complexity

My initial goal as of late was to be able to identify all problems that are hard for people to devise solutions for.  You may think, “Hard for you, but easy for me.  And isn’t intelligence versatile and non-transitive?  So perhaps something might also be easy for you and hard for me.  If this is the case then how do you plan to have some *universal* metric for how difficult things are for people to accomplish?”  And I think this is a reasonable thing to consider.  Can we really create some sort of ordered lattice of problems where the higher you ascend the harder things get *regardless* of who you are or what experience you have?  Well, I had a hypothesis for trying to accomplish just that.

Step 1

Kolmogorov complexity describes how complex mathematical objects are.  I don’t think I can use Kolmogorov complexity itself for my purposes, but I wanted something similar.  Basically, given an input or an output I want to be able to rank how much mental energy you have to expend to think about it (at least relative to other inputs and outputs).

Step 2

Define a hard problem as any problem that has a solution implemented with a function where the inputs and outputs have a low relative mental comprehension complexity when compared to structures internal to the function.  The idea here is that this function is hard to comprehend because considering it solely in terms of its inputs or outputs will miss out on significant internal features.  Failure to be mindful of these internal features makes the function easier to implement incorrectly (some inputs result in invalid internal states) or easier to use incorrectly (expecting some outputs that will never occur because of certain features of the internal state).  In order to correctly implement and utilize this difficult function you have to be able to project the complex internal nature of the function onto the inputs and the outputs.  The bigger the disparity and the greater the internal complexity the harder the function is to comprehend.

Step 3

If you can find a mathematically solid definition for the hard problem in step 2, then you can build a machine that recognizes hard problems (in terms of people being able to find solutions for them).  This is the first goal.  We have a machine for recognizing hard problems.

Step 4

Try to see if the machine from step 3 can be adapted to generate an infinite number of unique hard problems.  The naive solution is to just generate a random problem and then use the machine from step 3 to decide if the problem is hard.  If the problem is hard you return it if the problem is not hard you generate another random problem.  I’m not sure if this machine is particularly useful by itself, but how we use it in step 5 is important.

Step 5

Begin giving people the hard problems generated in step 4.  Some people will be able to solve familiar hard problems simply because they have seen them before.  But if the hard problem definition really does encode what it means for a problem to be hard, then eventually you’ll generate a hard problem that they do not know how to solve from experience.  This step is really only describing a way to frustrate people, but end goal is to find or train a person who can successfully solve hard problems.

Step 6

Encounter a person who can consistently solve hard problems even when they are not familiar with them.  The hope is that their technique will either be to find a way to decrease the required internal complexity of the difficult problem OR to find a way to divide the hard problem into two or more non-hard problems.  With any luck these two techniques (or any others that show up) can be adapted to arbitrary hard problems.  This is the second goal.  A method for dealing with arbitrary hard problems.

Of course if you encounter a person who can easily solve hard problems, but not by making the problem more approachable or being able to explain why they are easy, then we haven’t really gained anything.  Although I suppose you could just hire that person to solve your hard problems.

But now what

Unfortunately I’m not sure my hypothesis is likely to be viable.  To start with, it seems like you could make a local hard problem easy by making the inputs and outputs of the function more complex.  The end result is that hardness increases globally, but becomes harder to detect locally.  This means my technique being successful might end up making software solutions *harder* to work with.

The second thing to consider is that inputs and outputs that include objects with methods or first class functions are kind of hard to assess for complexity.  A first class function might look very simple, but it’s internal nature might include global mutable state that is accessed concurrently in several different places.

And the third issue is that because some inputs or outputs might be hard to assess, you really need to take a very abstract mathematical view on things.  Komlogorov complexity itself seems like it has some mathematical limitations that we don’t want, and the alternatives don’t seem much better.  The root of the problem is that we have to evaluate inputs and outputs in terms of effectively computable mathematical objects.  We’re no longer really talking about constructs that people work with when they try to solve problems, but recursive topological spaces.  I’m not sure that we’re going to end up with results that actually reflect how mentally difficult it is to comprehend a problem because there may be effective abstractions to a otherwise complicated mathematical object and that’s going to throw off all our assessments for complexity.

Monday, March 9, 2015

garbage collection

Finished reading The Garbage Collection Handbook[1].  I’m glad that I read through it, but I’m pretty sure that my takeaway is something other than what the authors would have hoped for.  I completely believe the subtitle when it says, “The Art of Automatic Memory Management.”  Something *that* overly complex has got to be an “art” as opposed to something you might be able to build a discipline around.

To be fair, the concept behind garbage collection is actually relatively straightforward.  If a programmer allocates dynamic memory, then track all references to any specific entity of dynamic memory and once all references to an entity are gone then delete that memory.  Unfortunately, the modern world we live in necessitates making the whole process rather more complicated in order to support the lavish lifestyles we are currently enjoying.

Near as I can tell there’s three problems:  1) mutable state means old memory can reference new memory 2) hierarchical memory architectures means memory locality trumps any level of sophistication 3) concurrent programs means the simple must be made complex in order to avoid locks.  With their powers combined, it looks suspiciously like you have to be a super expert in order to create a garbage collector and a skilled expert if you need to troubleshoot one.

My guess is that we’re going to see mutable state and concurrency transform into something very different than what we see today in the industry standard languages.  Of course it’s also possible that something like memristors[2] might change everything, but I’m not sure that’s going to show up soon enough to avoid a paradigm shift.

Also C has it’s own problems[3] so I’m not convinced we can just ‘move back to the trees’ as it were.

I guess it’s time to figure out linear typing and regional calculus and hope the Rust[4] guys are able to blaze a trail for the rest of us.

Saturday, January 24, 2015

Distance Type Checking

Someone mentioned to me recently that they were interested in having a way to indicate space constraints into a type system such that you can tell if memory is close or far from any other given memory.  Honestly I’m not sure what their ultimate goal is (the only thing I can think of is that they want to ensure good cache usage), but by itself this is a very interesting problem to consider.

My immediate thought is that you can use a dependent typing system and embed pointer addresses into the type signatures.  Then all you are going to need is a “these objects are sufficiently close” proof and you’re done.

However, I doubt my naive solution would ever actually be useful for anything.  For starters stack randomization is going to ensure that none of your proofs for stack allocated data hold.  But maybe that’s not really a big problem.  You can just create an abstract address that’s relative to wherever your program’s stack starts.  The big problem, though, is going to be dynamically allocated memory.  Last I checked we didn’t have a way to control when heap space starts to get fragmented in a way that’s analyzable at compile time.  And really this is just the start.  For example, I don’t think I really want to be the one in charge of type checking that an arbitrary value on the stack is within a specific distance of an arbitrary value on the heap.

I assert that the problem has to do with trying to get whole program validation.  If we narrow our vision a bit, I suspect we might end up with a plausible solution.

If we can subdivide our memory constrained problems into atomic chunks, then we can implement them, individually, as part of a DSL.  The DSL itself can be type checked (easiest example would to be to just use GADTs) to ensure that our memory distance constraints hold.  Additionally the DSL can also ensure that our mini program’s total memory usage will stay smaller than whatever amount we are confident our allocator can contiguously provide at runtime.  At runtime our program can still fail if the allocator can’t provide a sufficient amount of contiguous memory, but that’s the world we live in right now anyway (malloc can fail and GCs can throw out of memory exceptions).