Wednesday, August 6, 2014

two three one

The Feynman problem-solving algorithm was a joke that Murray Gell-Mann [0] made in order to talk about the seemingly magical way Richard Feynman came up with solutions to problems (a trait that was noted by more than one person [12]).  The algorithm works like this:  1) write down the problem, 2) think very hard, 3) write down the solution.  While I think the reason why this is a joke is pretty obvious, the frustrating thing is that occasionally this is the only honest way to proceed with solving a problem.  Looking back on the landscape of dead ends, blind alleys, and failed techniques, the victors are forced to concede that the answer really did come from the 20-minute break when they went off to grab some yogurt.  The solution came from a process that was not only non-continuous, but, as far as anyone can tell, actively disjoint from the rest of reality.

I’ve got a different algorithm that I go through, although alas it seems rather less directed than Feynman’s:  1) think very hard, 2) write down the solution, 3) write down the problem.  Like … it’s nice when you realize that what you’ve been thinking about actually applies to things the rest of the world cares about; however, the process of getting there involves thoughts that are difficult to justify.

What I’ve been thinking about ‘recently’ feels like a *thing*, but unfortunately I’m pretty deep into the first step of Inman’s problem solving algorithm.  This might simply be me rediscovering something that everyone else is already aware of or it might be me noticing a bunch of suspicious looking things that in reality do not have any actual relation to one another.  Hard to tell, but I want to try to express some of this.

This probably started with type theory.  Which then led to category theory.  It turns out you can use F-algebras [3] to convert a recursive data structure into a non-recursive data structure.  And I guess this is just the category theory description of work from domain theory [4] (check out lecture 2 about recursively defined programs).  Self-referential mathematical objects caused a lot of problems in set theory, so there was some interest in eliminating these problems from recursive definitions in lambda calculus in order to ensure there were no formal issues that would prevent it from being used as a model for denotational semantics.  And while I’m on the domain theory topic, I’ll go ahead and mention that domain theory’s creator Dana Scott also came up with a rather neat observation:  computable functions are just Scott-continuous functions [13].
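
Just to make that first weird thing concrete, here’s a minimal Haskell sketch of the F-algebra trick: the data type itself carries no recursion, the recursion is factored out into a fixed point, and an algebra says how to collapse one layer at a time.  The names here are my own, not anything from [3].

    {-# LANGUAGE DeriveFunctor #-}

    -- One non-recursive "layer" of a list; the recursion lives in Fix.
    data ListF e a = NilF | ConsF e a deriving Functor

    newtype Fix f = Fix (f (Fix f))

    -- A catamorphism collapses the structure one layer at a time,
    -- using an F-algebra (f a -> a).
    cata :: Functor f => (f a -> a) -> Fix f -> a
    cata alg (Fix x) = alg (fmap (cata alg) x)

    sumAlg :: ListF Int Int -> Int
    sumAlg NilF        = 0
    sumAlg (ConsF x s) = x + s

    example :: Int
    example = cata sumAlg (Fix (ConsF 1 (Fix (ConsF 2 (Fix NilF)))))  -- 3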

Okay, so far we’ve seen two weird things.  The first is that there is some sort of relationship between recursive data and non-recursive data.  The second is that we can treat computability as a mathematical object.  Let’s move on.

While poking around in category theory and F-algebras, of course I couldn’t help noticing discussions regarding corecursion [2].  Specifically, an HN user, Pitarou, mentioned something about corecursion and transforms [1].  The gist of the comment is that there is an asserted, but not proved, path from a recursive algorithm to a stream algorithm via corecursion, and then from the stream algorithm to an iterative algorithm via deforestation techniques and tail call optimization.
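
I can’t prove Pitarou’s whole path any more than the comment does, but here’s a hedged sketch of the first leg: the same computation written with explicit recursion and then corecursively, as an unfold into a stream that a fold consumes.  Fusing the unfold with the fold (deforestation) and making the fold strict and tail recursive is the part that would turn it into a plain loop.

    import Data.List (unfoldr)

    -- Explicit recursion: sum the first n squares.
    sumSquaresRec :: Int -> Int
    sumSquaresRec 0 = 0
    sumSquaresRec n = n * n + sumSquaresRec (n - 1)

    -- Corecursive version: unfold a stream of squares, then fold it up.
    squares :: Int -> [Int]
    squares n = unfoldr step 1
      where step i | i > n     = Nothing
                   | otherwise = Just (i * i, i + 1)

    sumSquaresStream :: Int -> Int
    sumSquaresStream = foldl (+) 0 . squares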

Much like the first observation concerning a transform from recursive data structures to non-recursive data structures, we now see a transform from recursive algorithms to iterative algorithms.  Granted, we don’t see the same rigor here as we do in domain theory’s explorations; however, just the idea is the important bit.  Keep moving.

Derandomization [5] is a technique for converting a randomized algorithm into a non-randomized algorithm, possibly with some slowdown.  There are a bunch of techniques for accomplishing this, but I don’t think there’s any proof yet that all randomized algorithms can be derandomized (although I believe it is thought that this is the case).  This of course sounds vaguely similar to Pitarou’s recursive-algorithm-to-iterative-algorithm transform.  A thing that sounds about right, but with no proof that it’s possible for all inhabitants of its space.

Relatedly (because complexity theory) is the subtitle to Scott Aaronson’s blog [6].  “Quantum computers … can be simulated classically with exponential slowdown.”  While I’m aware that blog subtitles aren’t proofs, I have heard this before from several independent sources (and Scott Aaronson is a much better complexity theorist than I will probably ever be), so let’s call this one “proved” for the time being.

Interesting, we’ve got a relationship between recursive data and non recursive data and a relationship between classical computers and quantum computers both of which stand on firm theoretical grounding.  Additionally, we’ve got derandomization that sounds good but is unproved and a recursion to iterative process that also sounds good but is unproven.  All of that seems vaguely like the same sorts of things are going on (in some sense of those words).

Along comes HN user tel who starts a pretty interesting blog series concerning mutable algorithms in an immutable context [7].  Meanwhile HN user jerf clarifies and indicates that a transform from mutable algorithms to immutable algorithms [8] really ought to exist.  I’m not sure that I buy jerf’s argument, but that’s just because I haven’t been able to get the whole algorithm into my head yet.  I think there’s some ambiguity in the description that will be resolved if I actually step through it manually with an example; however, I’m not sure what problem I’m actually trying to solve, so I haven’t convinced myself to go through that exercise yet.  Either way, I’m already dealing with several other transforms that don’t have "totality" proofs yet, so I’m perfectly happy placing this (rather incredible even if only partial) transform in a pending state mentally for now.
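
For what it’s worth, I haven’t digested tel’s or jerf’s construction, so this isn’t their transform; it’s just the standard small-scale version of the idea, where a mutable-looking algorithm runs in an immutable setting by threading the "mutated" value through a State monad (from the usual mtl/transformers packages).

    import Control.Monad.State (State, get, put, evalState)

    -- An imperative-looking counter, written against an immutable
    -- substrate: the "mutation" is just threading a new value along.
    countUp :: Int -> State Int [Int]
    countUp 0 = return []
    countUp n = do
      i <- get
      put (i + 1)
      rest <- countUp (n - 1)
      return (i : rest)

    example :: [Int]
    example = evalState (countUp 3) 0  -- [0,1,2]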

The point is that this is yet another computer science thingy transform from one side of the fence to the other.  Normally the discourse concerns iterative vs recursive, or immutable vs mutable, or classical vs quantum.  However, again and again I’m bumping into arguments that these things are really the same thing, or perhaps that one of them is just the ‘bigger’ version of the other.  Okay, only two more things to work through and I think I’m done.

I’ve long thought that lisp macros [10] and Λ abstraction in System F [11] feel really similar.  I’m not exactly sure what I’m suggesting here, but it feels sort of like it belongs in the same (or a similar) category as the rest of the transforms here.
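
To at least pin down the System F half of the feeling: the big lambda is type-level abstraction, and in Haskell you can write it out with an explicit forall and instantiate it with a type application.  The macro analogy is entirely my speculation; the only solid part is that the type parameter is handled at compile time and leaves no trace at runtime.

    {-# LANGUAGE ExplicitForAll, TypeApplications #-}

    -- System F:  Λα. λx:α. x  :  ∀α. α → α
    idPoly :: forall a. a -> a
    idPoly x = x

    -- Instantiating the type parameter happens entirely at compile time.
    idInt :: Int -> Int
    idInt = idPoly @Int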

Finally, I wrote kind of a weird monadic parser [9].  I’ve also used a similar method to convert a data structure that had a lot of implicit properties into a data structure that made those properties explicit.  The idea being that maybe there’s some sort of relationship between unstructured data and structured data that can be represented by monads, and maybe that sort of thing is similar to some of the other ideas I’ve been encountering.
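
The parser in [9] is its own thing, so I won’t reproduce it here; this is just the generic shape of a monadic parser, enough to show the "unstructured string in, structured value out" idea.

    newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

    instance Functor Parser where
      fmap f (Parser p) = Parser $ \s -> fmap (\(a, rest) -> (f a, rest)) (p s)

    instance Applicative Parser where
      pure a = Parser $ \s -> Just (a, s)
      Parser pf <*> Parser pa = Parser $ \s -> do
        (f, s')  <- pf s
        (a, s'') <- pa s'
        Just (f a, s'')

    instance Monad Parser where
      Parser p >>= f = Parser $ \s -> do
        (a, s') <- p s
        runParser (f a) s'

    -- Consume one character matching a predicate.
    satisfy :: (Char -> Bool) -> Parser Char
    satisfy ok = Parser $ \s -> case s of
      (c:rest) | ok c -> Just (c, rest)
      _               -> Nothing

    -- Unstructured text in, a structured pair out.
    digitPair :: Parser (Char, Char)
    digitPair = do
      a <- satisfy (`elem` ['0'..'9'])
      b <- satisfy (`elem` ['0'..'9'])
      return (a, b)
    -- runParser digitPair "42x"  ==  Just (('4','2'),"x")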

Short story long.  It looks a lot like *something* is going on between things that are normally thought of as opposite sides of the spectrum.  

Thursday, January 9, 2014

Good Software

For some time now, I've been working on the theory that people are different.  Go ahead and pause to consider whether you really want to continue reading.

Seriously?  People are different ... and you've been working on this theory "for some time"?  Maybe if I left now I would have enough time to finish that sudoku.

Well, there's a little bit more to it than just that.  So let me set the scene a bit and I think you'll see that there's something here to consider.  Also there could be some horrifying consequences that will rend the Software Engineering Industry asunder.  Destroying the lives of hard-working men and women whose only crime was doing their jobs.  Meanwhile these same consequences will allow the first megalomaniac sociopath who realizes the implications and has the means and ambition to act on them to rebuild the Software Engineering Industry in their own image.  You know of what I speak.  A barren wasteland where Software Engineers form primitive tribes to survive.  And tire armor.  Yeah!  Tire armor for everybody!  TIRE ARMOR FOREVER!!!

Okay, maybe not that.  But it might explain *some* of the reasons why everyone yells at each other so much about programming languages.  And … it could allow someone who's really clever (and owns a software engineering company) to build more efficient teams and ensure better code base quality over time.

The situation today

Is that software kind of sucks.  The best in the world are able to just barely create software that does what we want without emailing our social security numbers straight to identity thieves.

Right now your best bet for getting software that works is to find a small group of highly motivated smart people who get along (but who also secretly hate each other, just a little bit) and put them into a room under the guidance of "that guy".  You know.  That guy who pesters all of your stakeholders for every minor detail of what they want until the stakeholders decide that they were better off doing it manually.  That guy who refuses to sign off on the release until you get 200% code coverage, user studies successfully completed by a homeless man from a different country who doesn't speak the language the application displays and who is completely illiterate, and finally all of the integration tests successfully run by a blind monkey.  No, not a code monkey.  Like, a real monkey.

Then you give that team complete autonomy and a budget that is two hundred percent of what the estimates tell you, and they still fail to deliver anything useful half of the time.

All of this assumes that you don't need any security in your application.  If it needs to be secure, then you need the same thing with some minor changes.  Your highly motivated team of smart people also need to have PhDs.  Like each one needs to have multiple PhDs.  Also they each need no less than five (5) and no more than eight (8) graduate students all of whom were child prodigies.  Finally, they each need to have been doing computer security work for at least 30 years (and it wouldn't hurt if they were spies in the cold war).

You still need "that guy", but he also needs some upgrades.  And unfortunately, you'll never find him.  He has to find you.  Just assemble your team and put out a press release that you're interested in maybe creating a secure application some day.  Pay the team to do research or something.  Perhaps they can create prototypes of what the final secure application could be like.  Just don't get jittery and use one of the prototypes because it's "good enough".  Because it's not good enough.  

Eventually, that guy will show up.  You'll probably find him in your house one day after work.  He will ask you questions and you will answer them.  Don't worry about getting them correct because you cannot get them correct.  He is only interested in how you fail.  

Once "that guy" finds you and he leads your team to construct your secure application, do not forget the final step.  It is vital that you have a plan for when some bored Russian undergrad proves that the Chinese have been putting hardware viruses in the sand that is used to create the computer chips that your application runs on and *that* allows the NSA to trigger the backdoor that they hid in number theory, which devastates the entire security model of your application causing it to immediately email your social security number to every identity thief on the planet.

The right way to write software

Is almost definitely a myth.  Which is actually kind of the point that I wanted to start with.  People are different.  This has consequences for how they create software.  Unfortunately, the completely true picture I drew above concerning the state of software development today gives a very big incentive to charlatans who will try to sell you "the right way to write software".  Try not to be too hard on them; there's a large chance they don't realize that they are charlatans.  Their methods might even work … for them.  And maybe their methods even work for people sufficiently like them.  But their methods don't work for everyone.  Also, if you try to cheat by only employing people who are all sufficiently similar, you'll get destroyed when they encounter a problem that they can't get their heads wrapped around, but which literally any other developer on the planet could solve in twenty minutes (but not if they have to code like *that*).

At one point in time I think I really wanted to find "the one way" to program.  It was going to be the right way to program, it would let me work faster, it would eliminate bugs, and I could teach it to everyone.  Then one day, when I was knee-deep in type theory, trying to decide if domain theory or category theory should be the next topic that I ventured into, I happened to consider the economic side of perfect software.  Can a company actually bill its clients enough, or can an internal software division procure a sufficient budget, to pay for the software engineer who is also a computer science category theorist?  The only reason that I spend time trying to decide whether I like dependent types or generalized algebraic data types with polymorphic kinds better is because I'm lonely and weird.  Normal people have lives.

And then there's the evidence.  I kept on bumping into problems that I couldn't solve without much frustration, but that normal software engineers seem to be able to tackle without any problems.  Can trying to figure out Sequent Calculus in your free time ruin you as a software engineer?  Well, no, that doesn't make sense either, because I also kept bumping into problems that normal software engineers found incomprehensible, but that I could solve trivially.  A suspicious pattern that suggests that there's no best way of thinking that allows you to solve all problems.  Let's take a look at an example.

TDD vs Parametric Polymorphism (ie type theory stuff)

There are two claims that have been floating around on the internet.  They sound very similar to each other, but you wouldn't want to assert that they are the same.  At least don't if you want to stay out of trouble.  The first is the TDD claim:  "I write my tests first and that allows me to write correct code."  The second is the Hindley-Milner type inference claim:  "When my program type checks, then I know it's correct."

To me, both of these claims sound like magical thinking.  I do this thing, which has a totally intrinsic and completely unexplained and unexplored ability to manifest "correct" software regardless of the constraints of physical reality, and my software just somehow works completely correctly.  The first time.  Every time.

I don't buy that TDD is *the* way to write software because I have rarely found anyone who can actually explain to me what it is or why it works.  The people who do give me an actionable definition have always given me a different actionable definition than the previous person who gave me an actionable definition.  I suppose I wouldn't be opposed to trying TDD in practice; however, I would first need to actually understand what it is I'm supposed to be doing.  "Write the test first and then correct software is emitted" isn't a well-defined algorithm.

I don't buy that Hindley-Milner type inference is *the* way to write software because I know about dependent types.  And universal quantification, and polymorphic kinds, and generalized algebraic data types.  Even assuming you upgrade your parametric polymorphism, popularized in languages that use Hindley-Milner-style inference such as ML or Haskell, to a dependently typed system and then go on to prove everything you can think of about every last function, you still have to worry about an impedance mismatch between the problem you're solving and the actual code.  A perfectly proven algorithm doesn't protect you against misidentifying the problem you were trying to solve.
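
Here's a toy illustration of that impedance mismatch.  Both functions below have the same perfectly respectable type and both type check; only one of them is the function you actually wanted, and neither signature can tell you which.

    -- Intended: the average of a list of measurements.
    average :: [Double] -> Double
    average xs = sum xs / fromIntegral (length xs)

    -- Also type checks, looks just as plausible, and is wrong for the
    -- problem (and the first one quietly returns NaN on an empty list).
    average' :: [Double] -> Double
    average' xs = sum xs * fromIntegral (length xs)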

However!  Both types of people write software that works!  It isn't perfect, but it does mostly solve the problem and allows you to get on with your life.


So my theory at least for now is that people are different.  They don't think the same, they don't work the same, and they don't write software the same.  And that's just something we're going to have to deal with moving forward in the Software Engineering Industry.  I'm not sure exactly what this means that we should do, but I do think this means that we can't settle on "the one way" to write software.  At least if we want to be able to write good software.

Friday, December 27, 2013

Figuring out how Category Theory works

People occasionally ask me what I do.  Kind of a weird question really.  Like, I'm pretty big on breathing.  I try to keep on doing it *all* the time.  So statistically speaking the best answer should be something about that.  But that's not what they mean when they ask me what I do.  They want to know about my job.  Again, this seems kind of like a strange question to me, but perhaps it's just a safe question.  I imagine most people have something to say about their job, so people have learned that if you have to engage in conversation with a stranger that you might try asking them what they do.

When I answer this question, I tell a story about developing desktop applications that communicate and process data from a wide range of devices, typically medical devices.  This provides a satisfactory direction for the conversation to travel in.  Or at least it seems to.

But this isn't really representative of what I do.  It answers the intended question, but ignores the literal question.  What I do, is figure out how things work.

I've spent quite a bit of time figuring out how programming languages work.  And I'm not just talking about compilers.  I've also been very interested in understanding the meaning behind different programming language constructs.  A great example is the various flavors of macros from various flavors of Lisp.  It's not entirely obvious what a macro is, and at the same time there is no shortage of talk concerning "macros" when you start to look into them.  It takes a while to sort everything out when your upbringing involves a bunch of plain C with a helping of "don't use macros; they're bad for you."

Anyway, eventually I bumped into Haskell, and at this point monads and category theory reared their … well I wouldn't say "ugly", but definitely mysterious and mostly incomprehensible head.  If you've been present in the online, software engineering, y-combinator-ish [1] crowd then by now you've figured out what kind of blog post this is.  This is a "monads are like …" article.  Don't panic though, I suspect things will be a little bit different here.

Monads are like abstract mathematical objects with some properties

Okay, so the heading here isn't quite accurate.  Monads aren't *like* abstract mathematical objects with some properties; monads *are* abstract mathematical objects with some properties.

Here's the thing.  I noticed at some point that mathematicians don't seem to be interested in the same things that software engineers are interested in.  This of course makes sense because if it wasn't true then we wouldn't need different words for mathematicians and software engineers; they would be the same profession.  

Mathematicians seem very interested in taking abstract objects, describing their properties, and then proving different things that you can do with these objects.  Software engineers are interested in generating code that solves problems.

With this in mind, consider where Category theory comes from.  It starts with abstract algebra[2].  Abstract algebra is more or less the study of the properties you see in number systems, except you break things down to be simpler or you generalize to handle more cases.  The result is a bunch of abstract objects that have certain properties.  Move forward by proving new statements.  Kind of an exploration of things that don't concretely exist.  Group theory and ring theory both seem to be rather big.  The extended study of two different kinds of abstract objects.

Next check out algebraic topology[3].  I'm not entirely sure how to introduce this field.  It's kind of like the algebraic study of shapes, but the shape part doesn't matter so much … at least not how muggles normally think of the concept of shapes.  Anyway, there's a bunch of fields within algebraic topology.  Knot theory, homotopy, low-dimensional algebraic topology, etc.  There's plenty to study.

This is all the origin of category theory[4].  If you don't really want to investigate category theory any further, then perhaps you don't need to think much about group theory or knot theory.  But if you do check out category theory in more depth (like maybe by reading this book [5]), then you're going to find a *lot* of references to these fields (and others not mentioned here).

Category theory deals with abstracting patterns found in other mathematical fields.  You'll see a lot of concepts concerning groups and rings and sets, but ultimately I believe the motivation was to examine different concepts in algebraic topology.  And in the end category theory starts examining itself as an object and things get kind of interesting.

Which brings us back nearly to where we started.  Mathematicians have their own job and it's not the same job as programmers or software engineers or computer scientists.  So what does category theory (and by extension monads) have to do with writing software?  Well for the most part it doesn't.

At least until you take a look at type theory.

Types are Categories

It turns out that if you have a type system like Hindley-Milner[6], or perhaps more formally System F [7] (although I think anything on the Lambda Cube works out [8]), then the type system fits the definition of a category in Category Theory.
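
Roughly (and glossing over the usual caveats about laziness and bottom), the correspondence looks like this in Haskell: types are the objects, functions are the morphisms, id is the identity, and (.) is composition.

    -- Objects: types.  Morphisms: functions.
    --   id  :: a -> a
    --   (.) :: (b -> c) -> (a -> b) -> (a -> c)
    --
    -- The category laws, read as equations between functions:
    --   id . f      == f
    --   f . id      == f
    --   (f . g) . h == f . (g . h)

    -- A concrete composite morphism from String to Bool:
    parseAndTest :: String -> Bool
    parseAndTest = even . length . words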

This means that theorems that you can work out in Category theory hold true for certain programming constructs when the proper type system is attached.  

So technically you don't need Category theory.  There's more than one way to prove that something is true.  Most computer programs don't need the full generality that Category theory provides.  You really only need to prove that your function works correctly for a relatively small number of possible inputs (at least most of the time).  But if you do have a proof in Category theory and your type system qualifies, then why not use the results?

Monads themselves are useful not because they are *like* any particular thing, but because there's a bunch of proofs associated with them.  So if you follow all of the axioms associated with a monad, then you get all of the assurances of one for free.  And for a pure functional language like Haskell[9], you can use monads to represent effects (like IO or changing state) and your type system doesn't go crazy.  More specifically, I think you're able to use results about Kleisli Categories [10] (which correspond to monads) to ensure that functions that go from A to B to C still compose and type check when they instead go from A to IO B and from B to IO C.  And in this instance IO <whatever> just means a computation that has an IO effect embedded inside of it somewhere and returns something of type <whatever>.
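
A minimal sketch of how I read the Kleisli part: effectful steps of type a -> IO b compose with (>=>) the same way plain functions compose with (.), so the types keep lining up.  The file-reading example is made up, and it assumes the file really does contain a number.

    import Control.Monad ((>=>))

    readNumber :: FilePath -> IO Int      -- A -> IO B
    readNumber path = read <$> readFile path

    describe :: Int -> IO String          -- B -> IO C
    describe n = return (if even n then "even" else "odd")

    -- Kleisli composition glues the effectful steps together,
    -- just like (.) glues pure functions together.
    pipeline :: FilePath -> IO String     -- A -> IO C
    pipeline = readNumber >=> describe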

Is that all?

Okay, so you can use Category theory to make sure your Haskell type system works correctly.  If that's it, then can we just ignore it?  Like, assuming we stay away from Haskell.

Well, for the most part, sure why not.  But unfortunately, you have to watch out for two things.  The first one is relatively simple.  You have to be able to refute people who try to sell you snake oil because it runs on the power of monads and categories and complicated math stuff.  The second is a little more complicated.

It turns out people are doing crazy things with category theory.  Like, they're getting Haskell to go faster than C [11].  This is kind of a big deal because common sense tells you that a high-level language means it's easier to program in, but slower than low-level languages.

I *think* the reason that it works in Haskell is because someone figured out that you can get some surprisingly powerful computations to fit into a sub-Turing-complete system by modeling your recursive data structures non-recursively using F-Algebras [12].  Then you feed the whole thing to your processor's SSE [13] instructions.
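
I can't reproduce the generalized stream fusion machinery from [11] here, but the basic move in stream-style code is representing a sequence as a seed plus a non-recursive step function, which gives the compiler something it can fuse into a single loop.  A toy version, with made-up names, looks like this.

    {-# LANGUAGE ExistentialQuantification, BangPatterns #-}

    -- A stream is a seed plus a non-recursive step function.
    data Step s a = Done | Yield a s
    data Stream a = forall s. Stream (s -> Step s a) s

    fromTo :: Int -> Int -> Stream Int
    fromTo lo hi = Stream step lo
      where step i | i > hi    = Done
                   | otherwise = Yield i (i + 1)

    -- Consuming the stream with a strict accumulator is already just a
    -- loop; no recursive data structure is ever built.
    sumS :: Stream Int -> Int
    sumS (Stream step s0) = go 0 s0
      where go !acc s = case step s of
              Done       -> acc
              Yield x s' -> go (acc + x) s'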

Again, you could probably make this work without the Category theory proofs, but it is a bit suspicious that the guys using Category theory got there first.

The downside is that you can't just assume everyone using Category theory to advertise their product is a crank.  The upside is that you don't have to think of monads as shoeboxes anymore.  






[5] - Categories for the Working Mathematician; ISBN 978-1-4419-3123-8







Exploiting Vector Instructions with Generalized Stream Fusion




Thursday, November 7, 2013

Mathematical notation

One problem that I've had with mathematics is trying to figure out what all of the jargon and notation means.  You might be able to brush it off as merely shorthand, but that doesn't quite do it justice.  I don't think I'm ready to talk about math notation and vocabulary at length, but I did think of an interesting idea last week that I feel at least highlights why notation is useful.

Exponents

Exponents are straightforward enough; they're just a notation for repeated multiplication.

10^3 = 10 ⋅ 10 ⋅ 10 = 1000

The rules you can use to manipulate exponents are also pretty simple.

x^y ⋅ x^z = x^(y+z)
(x^y)^z = x^(y⋅z)

So given an exponent we can continue to manipulate it as a number.  It doesn't become some magical box that can no longer interact with the rest of mathematics.  You can just treat it like a normal number for the most part.

Big Exponents

Now consider the following number:  10^100

Let's try writing out this number by hand.

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

So this is pretty easy.  Although a bit cumbersome.  One followed by one hundred zeros.  It's very convenient to use 10^100 instead of the monstrosity above, but I think that I could get along without using the exponent notation if I really had to.

This number actually has a special name:  Googol.

Bigger Exponents 

Before we go even bigger, let's consider the number of atoms in the universe.  Supposedly, there are 10^80 atoms in the universe.  I didn't research this number very carefully, it's just what I saw repeated once or twice with some internet searches.  The number itself doesn't matter too much; it's the principle I'm going for.  So for now let's just assume that 10^80 is about right.

I'm going to stack exponents now.  It might look scary, but it's pretty simple when you think about it.  It's just giving our exponent an exponent.

10^(10^100)

Like the number before, this number also has a special name:  Googolplex.

So this might look similar to the rule given earlier ( (x^y)^z = x^(y⋅z) ), but it's not.  The rule earlier was just decomposing (or recomposing as the case may be) factors of your exponent.  This number is an exponent that is itself represented as another exponent.  So what does something like this look like?  Before, I was able to write out the actual number, but how do you write out the actual number when your exponent has an exponent?  I already wrote out the exponent 10^100 earlier, and this time that's what our exponent is.  So I'll start by expanding the exponent part.

10^10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Okay.  Again it would be kind of a pain if we had to write it this way all the time.  But not too terrible I guess.  So if googol is just a one followed by one hundred zeros (ten to the power of one hundred), then a googolplex would be a one followed by a googol zeros (ten to the power of googol).  Let's start writing then.

Or maybe not.

Earlier I mentioned that there are only 10^80 atoms in the universe.  That's significantly less than a googol.  And that means that if I tried to write out googolplex with all of the zeros that it should have (ie googol zeros) by writing one zero on every atom in the universe, I would run out of atoms long before I ran out of zeros.

This is one of the powers that mathematical notation gives us.  Exponents allow us to manipulate numbers which would otherwise be too large to write down, even if we had the entire universe at our disposal.

Monday, October 14, 2013

Programming Language Feature Diagrams

When I first started reading about different programming languages, I tried to understand the features that differentiated them as if they were atomic and absolute in nature.  So imagine perhaps magic spells from Dungeons and Dragons or maybe Go Directly To Jail from Monopoly.  If you go directly to jail, you're not asking how or what is the mechanism that propels you to jail.  You simply move your game piece to the jail location and there you remain.  

However at some point this was no longer sufficient for my curiosity.  I started to break apart the programming language features themselves, and began to ask myself how these individual features worked.

Ultimately, everything that is encoded in a program is converted into assembly instructions.  There are instructions for modifying memory locations, moving the values in memory locations, storing values in memory locations, retrieving values from memory locations, jumping from one place in the list of instructions to another, etc.  While it is important to have an idea about what the assembly code looks like, I think for most features you encounter you can imagine things at a slightly more abstract level.

You can sort of imagine functions themselves as boxes that contain assembly instructions.  When you get to a function call, you traverse to a new box.  Once that function call is done, you return to the box that you came from.   So you kind of get a tree structure showing up.




Executing this program would look kind of like a depth first traversal.
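
In place of the diagram, here's a tiny sketch of the same idea: a call tree (the function names are made up) and the order a depth-first execution would visit the boxes.

    data Call = Call String [Call]

    program :: Call
    program = Call "main" [Call "parseInput" [Call "readLine" []], Call "report" []]

    executionOrder :: Call -> [String]
    executionOrder (Call name callees) = name : concatMap executionOrder callees
    -- executionOrder program == ["main","parseInput","readLine","report"]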


I'm not sure it's worth the time to go through every possible feature and try to show a diagram for it (not to mention some things like call/cc, lisp macros, and compile time type checking don't really fit very neatly into this view), but here are a couple of thought provoking examples.

Threads

When a function creates a new thread, you end up with an additional stack in your program.  Time tends to progress as you would expect it to[1] in each thread, but because the operating system is managing when any given thread is running (not to mention the existence of multicore computers) you don't really know how any given point in one stack relates to a point in another stack.  Without the existence of global memory addresses or a shared pointer in memory[2], these two stacks can't communicate with each other.  Because of the ubiquitous nature of multicore computers and the importance of distributed computing, new languages are starting to put a lot of thought into providing not only synchronization and locking primitives (which allow correct communication via shared state) but also primitives that abstract away the entire thread creation and communication process altogether.

[1] - Actually, both the compiler and the CPU reorder operations.  So time doesn't quite progress as you would expect, but it should progress in a manner which is mathematically identical to the way that you expect it to.  Things get tricky when threads are thrown into the mix, so you get to enter the dark world of atomic operations and memory fences.

[2] - Or if you're absolutely crazy, I suppose you could try to guess the location of the other stacks in memory and start dereferencing pointers.  I imagine there's probably *someone* who makes good usage of this technique, but I doubt that it's something that would lead to code that is at all comprehensible.
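
To make the two-stacks picture above concrete, here's a minimal sketch in Haskell, with GHC's forkIO plus an MVar standing in for the shared, synchronized state (the numbers are arbitrary).

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

    main :: IO ()
    main = do
      box <- newEmptyMVar               -- the one piece of shared state
      _ <- forkIO $ do                  -- a second thread, with its own stack
             let answer = sum [1 .. 100 :: Int]
             putMVar box answer         -- communicate through the MVar
      result <- takeMVar box            -- blocks until the other thread is done
      print result                      -- 5050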

Global State

This one is a lot more straightforward than the threads.  At any point in time a function can set or get a value from global memory.  This state can be accessed again by the function itself or by another function in the future.  This can lead to programs which are difficult to understand, especially as the number of connections made between functions via global memory increases[1].  But properly managed (like by the compiler) or used sparingly, this technique can make a lot of programs much easier to write.

[1] - The complexity that can arise from mutable global state is actually one of the motivators behind the recent interest in pure functional programming languages.  In a system like Lambda Calculus, all functions work locally or they don't work at all.  I've found this to be a very powerful way to carefully analyze the problem at hand to determine exactly how it functions.  You gain a lot more freedom with how you are able to build your program when you have separated every aspect of the problem.
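
And for completeness, the classic (and slightly frowned-upon) way to get an honest-to-goodness global mutable cell in Haskell; the NOINLINE pragma keeps the compiler from duplicating the reference.  Consider this a sketch of the concept rather than a recommendation.

    import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
    import System.IO.Unsafe (unsafePerformIO)

    -- A global, mutable counter visible to every function below.
    hitCount :: IORef Int
    hitCount = unsafePerformIO (newIORef 0)
    {-# NOINLINE hitCount #-}

    recordHit :: IO ()
    recordHit = modifyIORef' hitCount (+ 1)

    main :: IO ()
    main = do
      recordHit
      recordHit
      readIORef hitCount >>= print   -- 2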

Callbacks

I'm not sure that there is much to say about this one.  A callback is relatively simple.  You take a function (which may call other functions) and pass it as a parameter to other functions.  These functions can call the callback and the effect is very much like inserting a new function diagram at the call point of the callback.
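
A minimal example of that splicing-in effect, with made-up names: the callback is just a function handed in as a parameter, and every call to it behaves like inserting its little diagram at that point.

    withRetries :: Int -> (Int -> IO Bool) -> IO Bool
    withRetries 0 _        = return False
    withRetries n callback = do
      ok <- callback n
      if ok then return True else withRetries (n - 1) callback

    main :: IO ()
    main = do
      done <- withRetries 3 $ \attempt -> do
        putStrLn ("attempt " ++ show attempt)
        return (attempt == 1)   -- only the last try "succeeds"
      print done                -- True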

This can get more complicated when you combine callbacks, objects, and threads.  Diagramming out your program's structure in these cases can help you understand bizarre behavior that you might encounter.  Although, as I said concerning threads, languages are beginning to add features that help the programmer manage these situations.

Exceptions

I'm not really a fan of exceptions.  I do have to admit that they can make the structure of your types a lot cleaner in certain situations[1].  However, exceptions really are just dynamic upward stack communication channels; unfortunately most people do not seem to think of them that way.

When a function returns normally, you end up at the location of the function call in the calling function and continue normal code execution.  But when a function throws an exception, you end up at the location of the function call in the calling function and then look for and execute an exception handler (typically called catching an exception) before continuing normal code execution.  If no exception handler is located, or if the proper exception handler does not exist there, then the exception goes to that function's calling function.  This continues until either the exception is handled or the program runs out of calling functions.

For simple programs like the example above, it is easy to determine that all exceptions will be handled appropriately.  However, with more dynamic features like objects, interfaces, callbacks, threads, etc, it can be very difficult to determine if all exceptions are being correctly handled.  This can lead to the program crashing unexpectedly.

My theory is that if you think of an exception as a method of dynamically communicating with the functions above you (dynamic because you don't have to know which function called you in order to try to communicate with it), then you are more likely to have all possible calling functions properly equipped with exception handlers.  On the other hand, if you consider exceptions something that should be used in "exceptional" situations or something to be used when problems show up, then I suspect that you're more likely to forget to handle all of the error cases in the appropriate places.

[1] -  For example, division fails when you try to divide any number by zero.  Instead of giving division a weird type that indicates that some failure may occur (or proving that it is never passed a zero) maybe it's better to just let an exception be thrown.
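
The footnote's division example, sketched both ways: letting the exception travel up the stack until some caller that knows what to do catches it, versus making the failure part of the type.  This is just an illustration, not a claim about which is better.

    import Control.Exception (ArithException, evaluate, try)

    -- Let the failure travel up the stack as an exception...
    unsafeRatio :: Int -> Int -> Int
    unsafeRatio x y = x `div` y   -- throws DivideByZero when y == 0

    -- ...and catch it in whichever caller knows what to do about it.
    main :: IO ()
    main = do
      result <- try (evaluate (unsafeRatio 10 0)) :: IO (Either ArithException Int)
      case result of
        Left err -> putStrLn ("caught: " ++ show err)
        Right n  -> print n

    -- The alternative: make the failure explicit in the type instead.
    safeRatio :: Int -> Int -> Either String Int
    safeRatio _ 0 = Left "division by zero"
    safeRatio x y = Right (x `div` y)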

Combination

Of course in most programs of any complexity, you're not going to see any single feature used in isolation.  You're likely to see many of them used at the same time.  This can lead to complicated programs, which are very difficult to understand and maintain.  Perhaps this illustration helps explain some buggy programs you've used in the past.

I mentioned that the complexity of global state (and more generally mutable state) is something that has caused interest in functional programming.  When you eliminate all of the features that I have outlined here, you gain the knowledge that the entire program can be drawn as a relatively simple tree like the one I showed at the very beginning of this blog post.  

Personally, I don't think I'm ever going to be the same now that I have learned about lambda calculus, type theory, and the different mathematical properties of functions (reflexive et al, associative et al, injective et al, etc).  These are some very powerful tools for understanding problems, and they have allowed me to drill a lot deeper than I ever would have been able to on my own. 

But, while they do allow you to create some very comprehensible programs, it also takes a lot of work to restructure the combined diagram above into a functionally equivalent simple tree structure.  So while it would be nice to always do everything in an incredibly disciplined manner, the constraints of real life do have to be considered as well.  And I'm not just talking about time and expertise.  Even if you had sufficient time and everyone was an expert with the more mathematical tools, the functional route still requires quite a bit of mental effort.  Some problems are just very difficult to simplify, and perhaps it is better to throw an exception and move on with your life.


Wednesday, August 21, 2013

Communication of the Paradox Parsers

I Know, Right ?

A difficult aspect of communication is that if you really understand a thing, then it is trivial to you.  So why can't all those other people get with the program?  And on the other hand, if you don't really understand a thing, then maybe you're the one causing the difficulty by expressing ideas which in reality do not make any sense.

I think that I have a pretty good sense of when I don't fully understand something.  Although really, that statement needs about a billion caveats.  For starters, "I know what I don't know", or perhaps more accurately, "I know when I don't know", is really  way too close of a cousin to, "I'll tell *you* about peanut butter!".  You really have to be careful when you start down the path of, "I know that I'm right!".  Because as soon as you encounter a scenario when you're not right, it can easily lead to a disaster.  Reality doesn't care that you think you're right.  

So to be more exact.  I really like to understand how things work.  And I really like to trace my understanding as deeply as I can manage.  And it bothers me on some sort of bizarre semi emotional level when I don't understand "sufficiently" why a given thing is operating in the way that it is.

This bizarre discomfort that I feel about my lack of understanding is a great tool for understanding when I'm missing something potentially important.  So as long as I'm listening to my own internal discomfort and lack of understanding alarm, I have a pretty good sense for when I don't understand what's going on.  At least I suspect that's the case … when you don't know what you don't know it's hard to know when you don't really know something.

If you want out then you'll have to develop those communication skills

So, I've got a skill that helps me to identify things that I don't fully understand.  This comes in handy with communication because I'm always on the lookout for situations where things just aren't making any sense.  So, I'm aware when there's a person who isn't following along in a conversation.  When I'm more of an observer to the conversation, I find it really easy to decide who it is that's missing the big picture.  I can then offer clarifying statements to the clueless or ask stupid questions to get those in the know to offer additional insight.  Of course doing this can jeopardize my observer status, so it does sometimes take a bit of finesse.

If I'm highly active in a conversation and it seems like sense has decided to go on a lunch break, then I have to start running experiments.  Either I'm the person who doesn't understand or one or more of the other active participants are the ones who don't understand.  Bonus points if there's more than one type of misunderstanding going around.

But I didn't really want to talk about communication.  Even though it is a very interesting problem to solve.  I was interested in talking about a couple of things that don't make any sense.

Parsing!

I've had a terrible time with parsing algorithms.  I find the basic idea of converting a stream of symbols into a data structure absolutely fascinating.  But often the algorithms for achieving this are hard for me to really get my head wrapped around.  

I think my blog entries about Cognitive Complexity Classes (CCC)[1] and Problem Shapes[2] actually shed some light here.

First of all, let's talk about the stuff that I don't understand.  Parsing is more complex than just converting a stream of symbols into a data structure.  Different streams of symbols have different implicit structures.  The main ones that people talk about are the different types of grammars that languages fall into, and this is really talking about formal languages (ie programming languages) as opposed to natural languages (ie French).  Context sensitive, context free, and regular grammars.  Context sensitive grammars are some of the hardest to parse, and they require some very powerful algorithms.  Context free grammars are easier and regular grammars are easier still; both can be parsed with increasingly simple algorithms.  And this is only scratching the surface.  There's LL and LR grammars.  That is, the sets of language grammars which can be parsed by LL algorithms and/or LR algorithms.  Then there's LL(1) grammars.  And the rise of even more sophisticated algorithms gives rise to even more complex sets of language grammars.  And to make things even more complicated, these algorithms aren't just interested in the power of the grammar they can parse.  They are also interested in parsing efficiency in both space and/or time.
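
The textbook example of the jump between levels, just as an illustration: strings of n a's followed by n b's form a context-free language that no regular expression or finite automaton can recognize, because matching the two halves needs an unbounded counter (or a stack); add a third block of n c's and you've left context-free territory too.  A recognizer for the first one is a few lines, and the length comparison is exactly the counter a finite automaton doesn't have.

    -- Recognize a^n b^n (n >= 0).
    anbn :: String -> Bool
    anbn s = null rest && length as == length bs
      where (as, afterAs) = span (== 'a') s
            (bs, rest)    = span (== 'b') afterAs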

So when you see a string of characters, there's actually a lot of extra hidden structure present that you don't really understand.  Or at the very least *I'm* not happy with my level of understanding.  I get the basic gist of Context Sensitive vs Context Free, but I don't get it sufficiently to look at some example strings and have a good intuition for which type of grammar the language falls into.  And things get more complicated.  If simple things like context baffle my intuition, then it's probably going to be a while before I figure out how to spot GLR grammars.

The whole CCC and Problem Shapes thing becomes useful in describing why this is all a complicated problem.  

I'm not entirely sure what shape parsing happens to be.  Something tree or graph shaped with backtracking.  What shape is constraint propagation and/or unification?  Anyway, so let's assume that parsing (all parsing) has some sort of known shape.  Then add to the mix different power levels of grammars (regular, context free, etc).  Then add to the mix efficient time and space constraints.  So we begin with the already complex shape of parsing.  Then we add a bunch of very complex shapes to get different characteristics that make parsing practical.  And finally, we squash the whole thing into a table and implement it in a for loop.

So of course parsing is hard to *really* understand.

Although, I didn't really want to talk about parsing.  Well … not computer language parsing.  I actually wanted to talk about math.  And more specifically, why is it so hard to understand mathematical statements.

… math blah blah blah or how paradoxically paradox aversion produces paradoxes 

I was watching a bunch of group theory videos[3] yesterday.  And I noticed that some of the picture examples and the intuition behind the presentation were pretty simple and straightforward.  But the precise definitions were hard to wade through, even with the intuition.

This got me thinking about parsing.  How I can't quite seem to feel comfortable with grammars and algorithms even though I've put a lot of work into it.  The answer of course is that I still don't have a good feel for the "true shape" of parsing.  Let alone all of the extra complexities that go into making it practical.  

But what I had missed until now about mathematics is that the English statements aren't really English.  And I'm not talking about jargon.  To be sure there is a bunch of math jargon that takes a while to understand, but I don't think the hard part is the jargon.  The hard part is that, just like with the constrained parsing grammars and algorithms, mathematical statements are a constrained form of English.  With the parsing we constrain the grammars in order to be able to write algorithms that halt, fit into memory, and run in a reasonable amount of time.  With mathematics there is a different goal in mind.

Charles H. Bennett wrote a pretty interesting paper titled, "On Random and Hard-to-Describe Numbers"[4].  It contains two essays, and the first essay discusses the paradox, "the first number not namable in under ten words".  This is of course nine words, which means that the first number not namable in under ten words (1,101,121) is in fact nameable with nine words (ie the above quotation).  

Similarly, consider Russell's paradox[5] (also mentioned in Bennett's paper).  

The point is that a full natural language contains facilities that allow you to make statements which do not actually make any sense.  Even mathematical languages, like naive set theory, allow for statements which don't make sense.

Which is why I think mathematical statements are so hard to make sense of.  It's because a great deal of work was done in order to make sure that the statement made sense.








Thursday, August 15, 2013

Cognitive Complexity Classes: Shapes

In order to expound upon the different types of Cognitive Complexity Classes (CCC), further attention must be applied to the types of shapes which algorithms may be composed of.  This is unfortunate because my current concept of a "shape" of an algorithm is overly vague.  So, to be able to focus on what makes any given CCC different from any other given CCC (or for that matter what makes any two CCCs the same … or even what types of sameness are needed to support this system), I first need to flesh out the idea of algorithmic shape.

Shapes on the cave wall

So my initial description of an algorithm shape invoked the imagery of the input and output of a function.  While that might have been a good way to break the ice, that description is hardly ideal.  After all, if you have a function that takes a graph and then returns the exact same graph, it would be intellectually dishonest to describe it as having anything other than a scalar shape.  Similarly, if you have a TreeMap function that maps all the elements in a tree using a given function and then returns the same tree with different elements, that seems a lot like a List shape.  You are doing operations on trees, but the nature of the operation is very linear in nature.  Given a function that traverses an entire tree and a function that traverses an entire list, the Map function is nearly identical between trees and lists (or for that matter scalars and graphs).

Comparison sorts are a pretty good class of algorithms to consider.  If you create a Merge sort algorithm that recursively breaks apart the list you want to sort and then glues it back together in the desired order, then you will end up with a tree shape across time.  On the other hand if you create a Heap data structure and then insert and remove all of the elements in your list, then you will end up with a tree shape across space.  This leads me to believe that comparison sort algorithms have a tree shape to them.
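
A minimal merge sort, just to make the "tree across time" picture concrete: each call splits the list in half and the recursive calls fan out into exactly that tree, with the merges walking back up it.

    msort :: Ord a => [a] -> [a]
    msort []  = []
    msort [x] = [x]
    msort xs  = merge (msort left) (msort right)
      where
        (left, right) = splitAt (length xs `div` 2) xs
        merge [] ys = ys
        merge ys [] = ys
        merge (a:as) (b:bs)
          | a <= b    = a : merge as (b:bs)
          | otherwise = b : merge (a:as) bs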

There are two interesting concepts that emerge from algorithmic shape that I want to discuss.  The first one is that different programming languages have different "dimensions", which are determined by the features available in them.  So I assert that a programming language that has recursion has an extra dimension compared to a programming language that does not have recursion.  If you want to implement a comparison sort algorithm in a language without recursion, or you want to implement it without recursion, then you have to "flatten" out the comparison sort algorithm.  With recursion the algorithm can take its natural shape; without recursion you have to figure out how to represent the algorithm's "true form" in a flatter context.

Now I'm kind of alluding to something that I'm sure is going to end up getting me in a lot of trouble.  Best case is that I escape by being painted overly naive, ambitious, and/or idealistic.  So, flattening representations isn't that big of a deal.  We do this all the time.  We flatten 3D geometric shapes so that they can be represented on a 2D piece of paper.  We represent 2D matrices as an array of 1D arrays.  I'm sure that "flattening" an algorithm's representation is something that can be formalized.  Now on the other hand, "true form" is clearly getting way too close to philosophic digressions or even worse poetic ruminations; I really don't see how I can hope to get away with that one without something concrete or at the very least wildly convincing.

So, Yuri Gurevich has a very interesting series[1] on Microsoft's Channel 9 research website.  I'm sure I haven't devoted enough time to fully digest everything he said during that series, but a couple of things did stick that I think spell trouble for the direction that I'm hoping to go in.  Turing argued very convincingly for Turing computable functions.  But he didn't argue at all concerning Turing computable algorithms.  So while the fact that essentially every programming language is Turing complete means that we can compute any function in every programming language, it does not mean that we can implement every algorithm.  And of course this makes sense if you think about it.  In the C programming language you could write an algorithm that writes a bunch of data to a file, calls a Unix command line utility to process that file, and finally reads the result from the Unix command to complete the algorithm.  Implementing this algorithm in lambda calculus is impossible.  Lambda calculus doesn't even recognize that files exist, let alone have facilities to call out to Unix command line utilities.

Professor Gurevich, if I recall correctly, argues that the class of digital algorithms cannot possibly contain the class of all algorithms.  Analog algorithms, for example, make use of the reals, which can't be contained in the natural numbers, let alone digital memory.  Not to mention that any all-encompassing theory of algorithms would have to contain all future human invention (who foresaw quantum computers back when the Greeks were first coming up with GCD?).

I'm fairly sure that I'm some flavor of constructivist[2], which I believe Professor Gurevich wasn't too keen on in his video series.  And I suspect that this might explain my idealistic "true form" sentiments.  After all, if the only things that *really* exist are things that can be constructed, why, all you have to ensure is that the Comparison Sort Problem Constructor is only called one time, and it is going to be pretty hard to argue that the result is anything other than the "true form" of comparison sorts.  Of course in that last sentence I just spilled the beans as to why the road ahead of me is likely going to be a tough one, but let's get back to that in a little bit.

First though, let's get to that second interesting concept that I mentioned quite a bit ago.  Let's go ahead and assume that my algorithmic shape idea has all the merit in the world.  And let's assume that programming language features really do correspond to different shapes.  In that case, if statements and simple pattern matching seem to correspond pretty closely to scalar complexity.  Loops, map, fold, unfold, et al seem to fit nicely with list complexity.  Recursion and algebraic data types are pretty clearly tree complex.  Not quite sure what would correspond with graph complexity.  Some sort of state machine maybe?  Well at the very least an actual graph would be graph complex.  Of course we've got an interesting progression here.
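
Roughly what I mean by that correspondence, as a sketch: the same "combine the numbers" job written with only the features claimed for each level.

    -- Scalar: a single case split.
    clamp :: Int -> Int
    clamp x = if x < 0 then 0 else x

    -- List: a fold, no explicit recursion needed.
    total :: [Int] -> Int
    total = foldr (+) 0

    -- Tree: an algebraic data type plus recursion.
    data Tree a = Leaf a | Node (Tree a) (Tree a)

    totalTree :: Tree Int -> Int
    totalTree (Leaf x)   = x
    totalTree (Node l r) = totalTree l + totalTree r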

Lists are a generalization of scalars.  You got one thing, now you got lots of things.  Trees are generalizations of lists.  Graphs are generalizations of trees.  And hypergraphs[3] are generalizations of graphs.  Now in my previous essay I suggested that maybe the most complex thing we have to worry about is graph complexity, because you can implement lambda calculus (a Turing complete language) with a graph reduction machine.  But on the other hand, let's go with the flow for a minute.  What kind of programming language features might show up to make graph complexity as easy to reason about as a Map operation with parametric polymorphism?  What about hypergraphs?  There might be some very powerful abstraction techniques out there, and these complexity shapes might provide some insight into developing them.

Constructivism and the overly poetic imagination

So mathematical constructivism is probably my type of thing.  Turing completeness kind of gets in the way of a constructive view of things.  My thinking goes kind of like this:  If the only things that exist are things that can be constructed, then equivalence is just a hop, skip, and a jump away.  Just trace things back to where they were constructed, and if it's the same constructor, then you're in business.  Unfortunately, with Turing completeness we can't even know in general if an arbitrary function halts [4,5].  Forget about showing equivalence between two of these indeterminately halting beasts.

With two Turing computable functions, we can't determine if they're the same thing.  But who's talking about functions?  I wasn't.  I was talking about algorithms.  Which are an even weaker concept than functions.  At least we have a definition for a Turing computable function.  Algorithms are pretty vague, and tomorrow a new class of them might show up out of nowhere.

But then again.  Who's talking about algorithms?  I wasn't.  Not really.  In my constructivist heart I was talking about some sort of "true form" not for algorithms, but for problems.  Can't have a true form without equivalence.  At least if you want to stay outside the realm of poetry.  I mean, how would you show that something was in its true form if you're not equating it to *the* true form?  And in Turing complete land you can't even have function equivalence.  Let alone algorithmic equivalence.  And Turing complete algorithms aren't even a thing.  How do you have a true form for something that isn't even definable?  Now I'm talking about Problems.  Like not even implementations of algorithms for something.  But some sort of "higher plane of existence" where the Problem itself is a concrete thing.  Pretty ridiculous.

Although … on the other hand … Turing completeness kind of sucks.  I mean, don't get me wrong.  All things being equal, it's a pretty nice safety blanket.  After all, you wouldn't want to be stuck up the river without a Turing complete paddle to implement that function which, it turns out, can only be implemented on a Turing machine.  However, a function that can be generically shown to be total is a function that is a lot easier to understand.  If you had a choice between a function that you knew was mathematically shown to always return a value no matter what value you pass to it and a Turing computable function that had a "quite impressive" 90% test coverage, you're going to choose Mr. Turing-Complete-Is-For-Chumps.  Granted, I might be overestimating the typical programmer's fear of ⊥; however, I like to think the informed decision is to ditch Turing completeness when you don't actually need it.

So perhaps my Problem-as-a-concrete-object isn't as far-fetched as it sounds.  You do have to get rid of Turing completeness.  Or at the very least have a subset of your workspace reserved for provably halting functions.  Hmm.  Homotopy type theory[6] supposedly says some interesting things about equivalence.  I guess it's time to hit the books.