Tuesday, December 2, 2014

Compiler : Syntax -> Program

It has seemed to me for a while now that the significant whitespace in languages like F#, Haskell, and Python is actually a really good thing.  After all, correctly formatted code already uses indentation to delimit scope.  And there exist compilers that correctly compile code with significant whitespace (e.g. those for F#, Haskell, and Python).  Conclusion:  curly brackets and the like exist only to make compiler writers’ lives easier; the right way to delimit scope is with significant whitespace.

However, one of the more horrifying things that I’ve encountered in programming languages is the set of bizarre errors that can occur from JavaScript’s semicolon insertion.  Basically, the problem is that:

return {
    a : b
}

is different from

return
{
    a : b
}

The first one returns an object that has a field ‘a’ with a value ‘b’ and the second one returns ‘undefined’, because JavaScript inserts a semicolon right after a ‘return’ that is followed by a line break.
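To try this out, here is a minimal runnable sketch of the two forms.  The function names `sameLine` and `nextLine` are made up for illustration, and the value is the string "b" rather than a variable so that the snippet stands on its own.

```javascript
// The opening brace sits on the same line as `return`,
// so the braces are parsed as an object literal.
function sameLine() {
    return {
        a: "b"
    };
}

// The opening brace sits on the line after `return`.
// A semicolon is inserted after `return`, and the braces that follow
// are parsed as an unreachable block containing a labeled statement.
function nextLine() {
    return
    {
        a: "b"
    }
}

console.log(sameLine()); // an object with a field `a`
console.log(nextLine()); // undefined
```

Nothing here raises an error or a warning at runtime, which is a big part of why the behaviour is so easy to miss.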

Why is semantic whitespace in the same post as JavaScript semicolon insertion?  Because I realized that they both share the same property (or lack thereof).  Both are discontinuous.

The title of this post is actually an indirect reference to my previous post [1].  In that post I said that there are a bunch of mathematical properties that we really should make sure our programming constructs have (for example … some constructs really should be continuous).  *This* post is about how compilers are really just functions that map syntax into programs.

The gist behind continuous functions is that a small change in the input should result in a small change in the output [2].  The problem with the JavaScript example is that a small change in the input (a single invisible character that only changes formatting and almost never has any semantic value in other languages) results in a very large change in the semantics of the resulting program.  If you accept that this is a problem in JavaScript, it seems like you also have to accept that this is a problem in languages like Python, where a single invisible character that only changes formatting also has non-trivial semantic results.
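The "small change in, large change out" claim can be made a little more concrete.  The sketch below is only an informal illustration, not a real metric on syntax: it assumes that counting differing characters is a fair stand-in for "small change", and the helper `countDifferingCharacters` is invented for this example.  It uses the standard `Function` constructor to run two function bodies that differ by exactly one character.

```javascript
// Two function bodies of equal length that differ in one position:
// a space versus a newline right after `return`.
const bodyWithSpace = "return {\n    a: 1\n}";
const bodyWithNewline = "return\n{\n    a: 1\n}";

// Hypothetical helper: counts the positions at which two
// equal-length strings differ.
function countDifferingCharacters(left, right) {
    if (left.length !== right.length) {
        throw new Error("expected strings of equal length");
    }
    let count = 0;
    for (let i = 0; i < left.length; i++) {
        if (left[i] !== right[i]) {
            count++;
        }
    }
    return count;
}

// Turn each body into a function and call it.
const resultWithSpace = new Function(bodyWithSpace)();
const resultWithNewline = new Function(bodyWithNewline)();

console.log(countDifferingCharacters(bodyWithSpace, bodyWithNewline)); // 1
console.log(resultWithSpace);   // an object with a field `a`
console.log(resultWithNewline); // undefined
```

One character of difference in the input, and the output goes from an object to no value at all.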

The thesis here is that a continuous compiler is good and a discontinuous compiler is bad.  One solution might be to add in curly braces and remove semicolon insertion … but I suspect there may be better ways to salvage the situation.

ReSharper is a neat add-on to Visual Studio that adds a bunch of additional syntax highlighting (among other features).  One of the things I’ve really enjoyed is the grey-out effect that occurs when a function is not called anywhere.  It becomes rather easy to tell at a glance when a function has become obsolete.  And I suspect that this is also the answer to the semantic whitespace discontinuity.

The only real problem I have is that I want a continuous mapping from syntax to semantics.  So if I can convert the small change we see with semantic whitespace into a big, visible change, then the problem should be solved.  If we take a page from ReSharper, then we can change the color scheme slightly (maybe only a tint change) when the scope of an expression is different.  Each scope can have a different tint, and then a small change that causes a large difference in the output becomes a noticeable change that causes a noticeable difference in the output.  Additionally, this technique could also be leveraged to address some of the issues with semicolon insertion in JavaScript.
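As a rough sketch of the tinting idea, the snippet below assigns a background tint to each line of an indentation-sensitive snippet.  Everything in it is an assumption made for illustration: the names `scopeDepths` and `tintLines`, the palette, and above all the shortcut of treating leading spaces as the scope depth.  A real editor plugin would ask the language's parser for the scope rather than count spaces.

```javascript
// Hypothetical palette: one tint per nesting depth, deepest tint reused
// for anything nested further than the palette is long.
const tints = ["#ffffff", "#f2f6ff", "#e6eeff", "#d9e5ff"];

// Hypothetical helper: estimates the scope depth of each line from its
// leading spaces (a stand-in for real scope information from a parser).
function scopeDepths(source, indentWidth = 4) {
    return source.split("\n").map((line) => {
        const leadingSpaces = line.length - line.trimStart().length;
        return Math.floor(leadingSpaces / indentWidth);
    });
}

// Maps each line to the tint for its estimated depth.
function tintLines(source, indentWidth = 4) {
    return scopeDepths(source, indentWidth).map(
        (depth) => tints[Math.min(depth, tints.length - 1)]
    );
}

// Two Python-like snippets that differ only in the indentation of the last line.
const insideBlock = "if ready:\n    start()\n    log()";
const outsideBlock = "if ready:\n    start()\nlog()";

console.log(tintLines(insideBlock));  // last line shares the tint of `start()`
console.log(tintLines(outsideBlock)); // last line falls back to the top-level tint
```

The point is only that the last line changes color when it changes scope, so the otherwise invisible edit becomes something you can see.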

[2] - Things get complicated when you aren’t dealing with geometry or metric spaces.  What do you mean by “small change” when distance doesn’t exist in your domain?  Well, topology has a solution for that, but I’m not going to try to build a topological space for syntax and program semantics.  I think the informal idea of a small change is clear enough to get the point across in this instance.
