Wednesday, December 17, 2014

Syntax Matters

Originally, I didn’t worry too much about programming language syntax.  Starting out, I simply didn’t have enough exposure to the different options.  Then, as I started to really get interested in programming languages, my attention was focused on the semantics of different features.  How they looked didn’t seem to make any difference when I had no idea what they actually did.  Lisp and Forth were probably my first clue that syntax would be significant, and that makes sense, because in those languages the syntax is what allows certain constructs to exist.  However, I now think that even when no language construct necessitates a specific syntax, the syntax of your language still has effects.

In Let Over Lambda [1], Doug Hoyte talks about “Duality of Syntax”.  I’m not going to go into the details of this concept, but I believe he uses this principle to justify deviating from typical Common Lisp conventions as well as to decide how to design new conventions.  This is important because it sets up a theory of what makes good programs and then makes decisions within that framework.

You also see this (but sort of in the other direction) with Douglas Crockford’s JavaScript: The Good Parts [2].  And the punchline is something that I talked about in a previous blog post [3].  Normally, where you put a curly brace doesn’t really matter, but in JavaScript it does matter, because semicolon insertion means that a return statement meant to return an object literal might instead return ‘undefined’.  Crockford’s solution is to say that in JavaScript you always put the opening curly brace on the same line, because that way you won’t run into the semicolon insertion problem.  Here we’re using a syntax convention in order to avoid bad parts of the language, but my thought is that we should observe situations like what has occurred with JavaScript and instead create new languages that do not have these defects.
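A minimal sketch of the convention at work (the function names here are my own, just for illustration):

```javascript
// Opening brace on the same line as `return`: returns the object.
function makePoint() {
  return {
    x: 1
  };
}

// Opening brace on the next line: semicolon insertion turns the bare
// `return` into `return;`, and the braces below become dead code,
// so this function returns undefined.
function makePointBroken() {
  return
  {
    x: 1
  };
}
```

Crockford’s rule sidesteps the problem entirely by making the second form impossible in well-formatted code.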

Also observe Jonathan Blow’s video series for the new programming language that he is developing [4].  Specifically, try to keep an eye out for how he’s designing the syntax for functions.  A locally defined function has nearly the exact same syntax as a globally defined function; that’s on purpose.  This design decision allows for a workflow that is much better optimized than if local and global functions had different syntaxes.
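JavaScript function declarations happen to share this property, so the workflow benefit can be sketched there (the function names are mine, not Blow’s):

```javascript
// A globally defined helper.
function distance(x, y) {
  return Math.sqrt(x * x + y * y);
}

// A locally defined function uses exactly the same declaration
// syntax, so promoting `normalize` to global scope later is a pure
// cut-and-paste, with no rewriting required.
function scale(x, y) {
  function normalize(v) {
    return v / distance(x, y);
  }
  return [normalize(x), normalize(y)];
}
```

The uniform syntax is what makes the “start local, promote when needed” workflow frictionless.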

And finally take a look at my previous blog post [3] where I propose that we can decide what good syntax is by determining if it possesses certain mathematical properties that are advantageous.

Now I don’t think that there is going to be a ‘one syntax to rule them all’, but I do think that the reasoning behind our syntax and conventions needs to be based on what properties we want our code to have.  Just choosing an arbitrary set of rules for the sake of uniformity misses the point (although it’s clearly on the right track).  What we need is a careful analysis, so that we can show in a very methodical fashion that our rules are helping us and not hindering us.







Wednesday, December 10, 2014

85%

The indie game developer Jonathan Blow is currently trying his best to create a replacement for the C programming language with the intended use case of developing games (go figure).  In general I think that replacing C/C++ is a good call; my personal belief being that undefined behavior [1] is a bigger problem than the college lectures let on.  And honestly, replacing C/C++ seems to be in vogue these days [2].

Now, Jonathan’s project is still a work in progress, but he’s racked up quite an impressive number of hours of video digression into this subject [3].  If you watch through those videos, you’ll get the impression (and part of this comes from explicit statements) that the goal of this language is to be an 85% language.  Fix the parts of C that are rather archaic, optimize for the use cases that game developers care about, avoid the syntactic and semantic pitfalls that have no reason to exist and that we can fix at zero cost, but don’t ground the thing in whatever crazy lambda calculus the cool kids are playing with this decade.  After all, we would like to be able to get some work done by Thursday without having to learn category theory in order to do it.

I think that the 85% goal is a pretty good one if you already know how you want to be programming.  If your patterns stay the same and you are always doing *basically* the same things, ensuring that weird edge cases fit into your system might actually be a waste of time.  You can handle those issues with code reviews, stack traces, and debuggers.

However, on the other hand, my personality doesn’t get along very well with 85% solutions.  Economically, I see why they are absolutely necessary, and I wouldn’t hesitate to use an 85% solution as an implementation detail if my focus is on whatever the bigger picture is (assuming, of course, that the 85% can be verified to fully satisfy the requirements).  But if my goal and focus is to understand the thing (whatever it is), I’m going to lean towards 100% solutions nearly every time.

I think the reason for this is that my learning style seems to involve some combination of counterexample search and backtracking.  Break everything apart into the smallest pieces you can, and then try to put the pieces back together again.  Any time you think you have a general semantic statement, look for counterexamples.  If you find a counterexample, then something is wrong, so backtrack until you can’t find any counterexamples.  Once you can’t find a counterexample for anything, you’re done.  It’s not quite a proof of correctness, but success by exhausting all avenues of failure seems like it ought to work better than “but this worked last time”.

Here comes the punch line:  I’ve already mused about mathematical properties being valuable for programming correctness [4].  And I think the reason for that is that the context of a property is 100% (and, by the way, we have proofs).  You don’t have to worry about things going wrong tomorrow.  I think the reason that we need to be replacing C/C++ today is that they were the 85% languages of yesterday.  The programming techniques we’ve developed since just don’t fit into languages that were designed in a very different environment.  Maybe you want an 85% solution or programming language because you need to get work done today and you don’t want to worry about proofs of correctness.  And maybe that’s going to work.  On the other hand, maybe someone will misuse your solution, or new techniques will arrive that can’t be safely utilized in your language.



[2] - Microsoft is supposedly developing *something*, Apple has developed Swift, Mozilla is developing Rust, Facebook seems interested in D, Google has developed Go, and the C++ standard is trying to modernize.


Tuesday, December 2, 2014

Compiler : Syntax -> Program

It has seemed to me for a while now that the significant whitespace in languages like F#, Haskell, and Python is actually a really good thing.  After all, correctly formatted code already has indentation delimiting scope.  And there exist compilers that will correctly compile code that has significant whitespace (i.e. F#, Haskell, Python).  Conclusion:  curly brackets et al. exist only to make the compiler writer’s life easier; the right way to delimit scope is with significant whitespace.

However, one of the more horrifying things that I’ve encountered in programming languages is the bizarre errors that can occur from JavaScript’s semicolon insertion.  Basically, the problem is that:

return {
    a : b
}

is different from 

return
{
    a : b
}

The first one returns an object that has a field ‘a’ with the value ‘b’, and the second one returns ‘undefined’, because semicolon insertion turns the bare return into “return;” and the braces below it into an unreachable block.
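Semicolon insertion also cuts the other way: sometimes a newline does *not* end a statement where you expect it to.  A small sketch (the identifiers here are my own):

```javascript
function id(x) { return x; }

// The newline after `id` does NOT trigger semicolon insertion,
// because the next line begins with `(`. The parser reads this as
// a single call expression:
//   const g = id(function () { return "arg"; });
const g = id
(function () { return "arg"; })

// So `g` ends up being the anonymous function (what `id` returned),
// not the function `id` itself. A single semicolon after `id` would
// change the meaning of the program entirely.
```

Either way, the damage is done by one character’s worth of difference in the source.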

Why is significant whitespace in the same post as JavaScript’s semicolon insertion?  Because I realized that they both lack the same property.  Both are discontinuous.

The title of this post is actually an indirect reference to my previous post [1].  In that post I said that there are a bunch of mathematical properties that we really should make sure our programming constructs have (for example … some constructs really should be continuous).  *This* post is about how compilers are really just functions that map syntax to programs.

The gist of continuous functions is that a small change in the input should result in a small change in the output [2].  The problem with the JavaScript example is that a small change in the input (a single invisible character that only changes formatting and almost never has any semantic value in other languages) results in a very large change in the resulting program semantics.  If you accept that this is a problem in JavaScript, it seems you also have to accept that it is a problem in languages like Python (where a single invisible character that only changes formatting has non-trivial semantic results).

The thesis here is that a continuous compiler is good and a discontinuous compiler is bad.  One solution might be to add curly braces and remove semicolon insertion … but I suspect there may be better ways to salvage the situation.

ReSharper is a neat add-on for Visual Studio that adds a bunch of additional syntax highlighting (among other features).  One of the things I’ve really enjoyed is the grey-out effect that occurs when a function is not called anywhere.  It becomes rather easy to tell at a glance when a function has become obsolete.  And I suspect that this is also the answer to the significant-whitespace discontinuity.

The only real problem I have is that I want a continuous syntax-to-semantics mapping.  So if I can convert the small change we see with significant whitespace into a big, visible change, then the problem should be solved.  If we take a page from ReSharper, we can change the color scheme slightly (maybe only a tint change) whenever the scope of an expression differs.  Give each scope a different tint, and a small change causing a large difference in output becomes a noticeable change causing a noticeable difference in output.  Additionally, this technique could also be leveraged to address some of the issues with semicolon insertion in JavaScript.





[2] - Things get complicated when you aren’t dealing with geometry or metric spaces.  What do you mean by “small change” when distance doesn’t exist in your domain?  Well, topology has a solution for that, but I’m not going to try to build a topological space for syntax and program semantics.  I think the informal idea of a “small change” is clear enough to get the point across in this instance.