Sunday, June 2, 2013

Homonyms by Any Other Name: Types

In college I started getting really interested in programming languages.  Somehow, I discovered Lua at approximately the same time I was going through a compilers course.  Once I got my first job, I decided that it was important to understand as many different programming languages as possible.  

Not entirely sure why I did that.  Maybe I had a vague understanding that I derived great satisfaction from figuring out how things work.  But whatever the reason, I spent a lot of time and effort figuring out as much as I could about programming languages.  

I started with huge lists.  Features that I found in different languages.  I wanted to bunch them all together.  Be able to use all of them proficiently.  But as time went by, I noticed that my lists were shrinking.  I had begun to understand the mechanisms behind different features, and I could categorize massive sections of a single language behind a single word.

Types were kind of difficult to figure out though.  What exactly were these things?  People seemed to sort them into different categories.  Dynamic types, static types, strong types, weak types.  And people had wildly different thoughts as to which type of type was best.  And then more people complained that dynamic, static, strong, and weak were all bad categories that didn't really correctly classify types.  It was almost like the word "type" meant different things depending on the context.  And I think that is actually exactly what is going on.

Normally, people would say that types prevent you from making mistakes.  But that's not really what happens in the C programming language.  In C you can cast anything to anything and the compiler will just assume you know what's up.  In C, types mean what kinds of assembly opcodes and offsets the compiler will generate.

Well, maybe that's just C ... in C# types prevent you from making mistakes.  Although ...

void fluAdd( IList<int> list ) { 
    list.Add( 5 );
}

fluAdd takes an interface.  This allows us to know that no matter what happens, we will always be passed an object that has an Add method on it.  This protects us from failures.  Until something like this happens:

fluAdd( new List<int>().AsReadOnly() );

So what does happen if you try to add something to a readonly collection ... well you get a NotSupportedException.  Which sounds kind of familiar.  Hmm.

>> 5.add( 5 )
NoMethodError: undefined method `add' for 5:Fixnum
from (irb):1

Aahh!  Ruby's NoMethodError is basically the same thing as C#'s NotSupportedException.  So C#'s types don't protect you from making mistakes.  Why, they're no better than what Ruby gives us.  Well, don't go quite so far.  Notice that Ruby is letting me try to add a list entry of '5' to the number '5'.  C# won't let you do something that far off the rails.  So while it is true that C#'s types won't protect you from every mistake, it isn't true that you might as well have dynamic types.  C#'s types lower the number of possible mistakes you can make.

Okay, so what about Haskell?  People are always saying that Haskell has a really powerful type system and comes from real, hardcore type theory.  Well, this is partially true.  Yes, Haskell uses real type theory in its types.  Type theory has this great concept called Soundness.  Basically, it boils down to being able to prove that the type of an expression doesn't change while you evaluate it (preservation) *and* every step of the evaluation yields a sensible result that can continue to be evaluated (progress).  From this comes the saying, "well typed programs don't get stuck".  So the issue with Ruby's NoMethodError or C#'s NotSupportedException shouldn't happen in a programming language with Sound types.  
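
To make that concrete, here's a minimal Haskell sketch (the names okAdd and badAdd are mine, purely for illustration) of the moral equivalent of the Ruby snippet above.  The ill-typed version never gets the chance to blow up at runtime, because the compiler rejects it outright:

-- A tiny illustration of "well typed programs don't get stuck".
-- Appending a list of ints to a list of ints is fine:
okAdd :: [Int]
okAdd = [5] ++ [5]

-- The analogue of Ruby's `5.add( 5 )` is rejected before anything runs.
-- Uncommenting the next line produces a compile time type error,
-- because an Int is not a list:
-- badAdd = (5 :: Int) ++ [5]

main :: IO ()
main = print okAdd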

But I mentioned "partial" truth, didn't I?  The first caveat is that Haskell's type system isn't really all that powerful when compared to other possible candidates from type theory.  Maybe you want to type your types ... there's a type system for that.  Or maybe you want to totally specify every aspect of your program that is mathematically possible ... there's also a type system for that.  Haskell resides in a middle ground that balances what you can make safe with types against what is practical to make safe with types.  But there are extensions to Haskell, and different languages, which improve on what you can make safe by using types.

The second caveat is that Haskell also has Type Classes.  Type Classes don't really have much to do with type theory at all.  But they do allow you to specify functionality based on what a type happens to be.  Additionally, Recursive types (totally a type theory thing) are expressed with Algebraic Data Types (not necessarily a type theory thing[1]).  Then come Generalized Algebraic Data Types, existential types, universal quantification, polymorphic kinds, type families, functional dependencies; the list goes on.  Even if you want to ground a programming language completely in type theory, you still have to decide how to implement that and how the programmer will be interfacing with the types.  This means that if all you know is type theory, you will still need to understand the type system of a particular programming language in order to truly understand what types are.
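
Just to have something concrete on the page, here's a rough Haskell sketch (all the names are invented for illustration) of a few of the features just listed: a plain recursive algebraic data type, a type class hanging functionality off of a type, and a small GADT:

{-# LANGUAGE GADTs #-}

-- A plain recursive algebraic data type.
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- A type class: functionality chosen based on what a type happens to be.
class Sized t where
  size :: t -> Int

instance Sized (Tree a) where
  size Leaf         = 0
  size (Node l _ r) = 1 + size l + size r

-- A small GADT: each constructor pins down the result type more precisely.
data Expr a where
  IntLit  :: Int  -> Expr Int
  BoolLit :: Bool -> Expr Bool
  Add     :: Expr Int -> Expr Int -> Expr Int

eval :: Expr a -> a
eval (IntLit n)  = n
eval (BoolLit b) = b
eval (Add x y)   = eval x + eval y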

Of course all this really only focuses on static type systems.  Even in the dynamically typed world, the semantics of type isn't this one monolithic thing that is the same in all contexts.  Consider Ruby:

def ikky( a )
    a
end

ikky( 1, 2 )

=> ArgumentError: wrong number of arguments (2 for 1)

So, there aren't any compile time checks to make sure ikky receives the correct types, but there are runtime checks to make sure that at least the correct number of arguments is being passed to ikky.

Now consider Lua:

function ikky( a )
    return a
end

ikky( 1, 2, 3, 4 )

=> 1

In Lua, you can pass as many arguments as you want to a function, and it will totally execute the function regardless of how the argument count matches up with the function definition.  You can even pass no arguments to a function that takes many arguments and everything will work fine[2].  

So just because you decide to move to the dynamically typed world doesn't mean you can ignore what "Type" actually means.  The meaning of this word still depends significantly on where you are.

Whether you want to learn more about types, win the next language "discussion" you take part in, or write good software that works, make sure that you remember that the word "type" depends a lot on the context.

[1] - Okay, so yeah, ADTs are totally a theoretical thing which has mathematical properties, etc.  But the fact of the matter is that you can have a recursively typed, system F lambda calculus that covers all of the things you will see with ADTs, but still not actually know how to use ADTs.  ADTs are a language level interface with the type checker that may have language specific concerns.

[2] - This is of course for varying definitions of the word 'fine'.  If a parameter doesn't have an argument, then it gets set to nil.  Which may or may not be a problem depending on how the function is written.

Monday, May 6, 2013

Industry Languages VS Super Languages (part 2)


If you want some insight into my motivation, mood, and personality when it comes to the difference between industry standard languages like Java and the esoteric languages, like Lisp or Haskell, that commonly get passed over in corporate environments, then this post has a part 1.  Go check that out.  However, if you're strapped for time or just want to get to the point and pass judgment, feel free to start here.

I already mentioned the classic objective measure for programming languages.  Is it turing complete or is it sub-turing complete?  With a turing complete language, we know how to create functions that can perform every calculation that we know how to perform.  With sub-turing complete languages, we can still create functions that can perform calculations, but we can't create all of the functions that can be created in a fully turing complete language.  

I want to add one more idea.  Fixed functionality.  Think about it this way.  If a turing complete language can create a light, it can create lights that can do anything that is doable.  Red lights, blinking lights, moving lights, whatever.  Well, sub-turing languages can create lights as well.  They can't do as much as the turing complete language can, but perhaps they can make red lights and blinking lights.  The fixed functionality language can't even be called a language.  It's either there or it's not there.  If you use it you get a light, but you have no control over the light.  It's a red light that's always on.  If you don't use it, it's not there.  

Now, fixed functionality isn't necessarily bad.  The downside is pretty obvious.  If you want a green blinking swaying light, then you are out of luck.  You only get red lights with fixed functionality.  But on the other hand, if you happen to need a red light and you don't want to think about it very hard, then you get yourself a red light.  You don't have to tell the language to make a light, and then make it red, and then make it stationary, and then make it constant.  You just say LIGHT.  And you get exactly what you wanted.  So there's a sort of cognitive simplicity provided.

So now we can classify languages as turing complete, sub-turing complete, and fixed.  

Before we go on to differentiating languages, let's talk about some of the features in languages.  Well, first of all, every language is parsed.  That is, you provide some sort of textual input to the computer that humans can understand, and the computer turns the text into a bunch of bytes that the CPU can understand.  Also there are keywords.  These are specific special words in the language that have a single meaning to the computer.  And finally, let's say function dispatch.  Normally, if you call a function, then you get a very specific set of calculations.  Sometimes, though, you can get one of several different sets of calculations depending on the application of some rules.

Okay, so as far as a whole language goes, it doesn't make a lot of sense to have a fixed language.  That's more of a light switch than something that you would program with.  But what about those features I just mentioned?  Well, in Java the parsing is fixed, the keywords are fixed, and the function dispatch is sub-turing complete (interfaces and inheritance allow a non-fixed decision as to which function actually gets called).  The C programming language has a sub-turing complete parsing process (macros), fixed keywords, and mostly fixed function dispatch (although there are function pointers).  Now consider Common Lisp.  Turing complete parsing (reader macros can call turing complete functions), turing complete keywords (macros can call turing complete functions), and … well I'm not sure if the Common Lisp Object System (CLOS) actually allows arbitrary function dispatch, but it does allow quite a bit, so at the very least we can call it sub-turing complete.  

So nearly every language is turing complete, but the differentiation I'm trying to raise here is where else it is turing complete.  The super languages tend to have a lot more turing completeness available.  Haskell has turing complete parsing (at least it has a quasi quoting extension that provides this), turing complete keywords (again, the Template Haskell extension), and a type system that becomes turing complete once you turn on a few extensions.  Forth has a turing complete parsing process.  Etc, etc.
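
As a small taste of what computation in the type system looks like, here's a hedged sketch using type families to add Peano naturals at compile time.  (Strictly speaking, full turing completeness needs further extensions like UndecidableInstances, but this gives the flavor.)

{-# LANGUAGE DataKinds, TypeFamilies #-}

import Data.Proxy

-- Type-level Peano naturals; the "computation" below runs inside the type checker.
data Nat = Z | S Nat

type family Add (a :: Nat) (b :: Nat) :: Nat where
  Add 'Z     b = b
  Add ('S a) b = 'S (Add a b)

-- This only compiles because GHC reduces Add ('S 'Z) ('S 'Z) to 'S ('S 'Z)
-- while it is type checking, i.e. before the program ever runs.
onePlusOne :: Proxy (Add ('S 'Z) ('S 'Z)) -> Proxy ('S ('S 'Z))
onePlusOne = id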

The industry languages tend to have fewer turing complete features available.  With the canonical example of Java, nearly every aspect of the language is sub-turing complete.  And generally, features that are turing complete are considered harmful.  Most people distrust macros in C and templates in C++.

This is why people talk about being able to do *more* with Lisp.  The final product is just as turing complete as everything else, but with Lisp there's so many more places where you can take advantage of turing completeness to help you do your job.

However, this isn't always a good thing.  And I think this is why we won't be able to say that the Super languages are better than the industry languages for a long while.  Most people aren't entirely sure what to do with turing completeness.  On the one hand, if you give them a sub-turing complete language, they won't be able to figure out how to transform their problem into a representation that is actually solvable by their language.  And when you give them a turing complete language, they immediately begin messing things up.  After all, if you're able to calculate all calculations that are possible … you can write a bunch of terrible code (up to the limit of what humans can comprehend and beyond as long as you remain lucky) and there's not much objective evidence for claiming that it's bad … because it does sort of work in the end.  Then once you give them a language that has a lot more turing complete aspects and features, they mess things up in more places.  I think this is a natural consequence of computer programming being a relatively new discipline.  Most people don't seem to understand the right way to do things.  Those who do don't understand how to teach it to others.  And finally it's not clear how much of a good technique is due to its objective goodness and how much of it is due to the way the person using it thinks.

Anyway, at the end of the day I don't think it's useful to give high praise to many of the esoteric super languages.  I think a much better categorization technique is to indicate which features are turing complete, sub-turing complete, or fixed.  Then go on to describe which features need which status in order to best solve the problems in your given domain.

Industry Languages VS Super Languages (part 1)


I'm not sure that it is entirely accurate to describe myself as a programmer.  I have a very strong interest in understanding how things work.  This trait makes me an effective programmer and software engineer, but understanding how things work is more important to me than programming or engineering.

I started off using C at Purdue.  Java, Perl, and some other languages showed up.  But the vast majority of everything was C.  When I started working, I used a lot of C#.  These are the typical programming languages that people are used to.  It's what you're likely to see if you walk into an arbitrary software development company or division.  However, because I'm very interested in how things work, I spent a lot of time investigating more exotic programming languages.  

Almost immediately I encountered languages like OCaml, Forth, and Lisp.  These languages have many interesting features that are typically completely unrepresented, or at the very least shunned, in the industry standard languages.  I spent a bunch of time getting my head wrapped around complex features like macros and highly expressive type systems.  And I also spent a bunch of time trying to wrap my head around why these features aren't present in the industry standard languages.  Apparently, a bunch of people who really like the less popular languages have also spent a bunch of time trying to explain why their super languages aren't more popular.

I suspect that my perspective is different from most.  Even among the outliers, my focus has been different, so I believe that this has influenced my outlook to also be different.  I sought out different programming languages because I was unsatisfied with what the traditional languages offered, but most of my motivation was based on pure curiosity.  As I moved from language to language, I never found an environment which made me feel comfortable and satisfied with what is possible.  I found that I was much more interested in learning and understanding the languages as opposed to actually using them for anything.  Most of the defenses of Lisp and Haskell that I've seen seem to be from the perspective of people who actually use them to do things.  My not-quite-a-defense but more of a categorization of Lisp and Haskell (and other languages in a related position) is coming from a perspective of trying to figure out how these languages relate to the rest of reality.

There are two obvious categories you can place a language into.  Turing complete and non-turing complete.  This distinction isn't really that useful to people because it's not really obvious, even to people who know the definitions, what the difference is.  A turing machine is able to perform the most powerful calculations that we know how to calculate.  We can propose and describe calculations which would require a machine more powerful than a turing machine, but we don't know how to actually perform those calculations.  They are mathematical problems that we don't know how to solve.  A programming language being turing complete means that it can perform all of the same calculations that a turing machine is able to perform.  Or in other words it means that it is able to perform the most powerful calculations that we know how to calculate.  A non-turing complete language is then a language which is not able to perform all of the calculations that a turing machine can.  We could call it a sub-turing complete language because its abilities are below that of a turing complete language.

Now, in practice even people who get the gist of what turing complete means wouldn't actually be able to describe the difference between calculations that require a turing machine and calculations that can be done in a sub-turing complete language.  But the idea of turing complete languages is useful at the very least because it's an objective measure.  We can mathematically show that a given language is turing complete or not.  Unfortunately, we do not really have any other objective measures for programming languages.  Don't misunderstand.  There are other objective-ish things we can say about programming languages, but few people actually agree which of those objective things are better and which are worse.  And when the pedants and/or experts get involved the first thing they do is muddle the water as to what commonly understood features actually mean (because at the end of the day, things are always more complicated than we would prefer).  At the very least with turing complete vs not, we are able to say that if a language is turing complete then it can do all of the calculations that we know how to do, and if it is not then it can't do all of them.  So if you think about it, you probably want the turing complete language because that way if the problem is solvable, then it is solvable in your language.  (Of course, there are some interesting situations that arise where you might actually want a sub-turing complete language, but let's ignore that for now.)

Basically every language is turing complete.  There's a couple of outliers, but by and large all of them are turing complete.  So, Java is turing complete and so is Lisp.  This is a problem for people who really like Lisp (and/or the other misfit languages) because when they try to convince other people about the superiority of their language, the best objective measure (turing completeness) indicates that Lisp is in fact equivalent to the boring language that everyone else is using.  Now I think that part of the problem is that different people think different ways.  Some people will work better in Lisp and some people will work better in Java.  It's difficult to argue about superiority when what you are actually trying to express is very specific thought processes that are unique to yourself.  It seems that people usually resort to poetry.  

There's the turing tarpit … where everything is possible but nothing is easy.  A language that doesn't change the way you think isn't worth learning.  You can write the solution to your problem in your language, or you can write the language that solves your problem.  And this is probably even applicable to language slogans.  Consider perl's "there's more than one way to do it" and python's "there's only one way to do it".  I didn't even have to look these up, they're just in my DNA because of how many times they show up when people start to talk about the differences between languages.  And while many of these sound nice, at the end of the day they're not much more than poetry.  It's people trying to tell a story or convey a personal experience.  And even if these people are correct, they are really only correct for themselves and people like them.  After all the perl slogan is diametrically opposed to the python slogan.  When you express something personal, you run the risk that what you say is only true for yourself.  Of course that doesn't mean that you shouldn't still try.  After all, communicating with other people is vitally important to living.  And, like with any art, by expressing yourself you open the possibility of profoundly affecting and changing the lives of others.

I've said a lot here and I haven't quite gotten to what I actually started out to say.  I think if I continue, I'll end up making this post too long and too fragmented.  So next time, what does David say is the real difference between languages like Java and Lisp.

Wednesday, February 20, 2013

Halting problem


The question posed by the halting problem is:  Given an arbitrary program from a turing complete system and an input for that program, determine if the program yields output or if it processes the input forever, never yielding output.

It's named the halting problem because we are trying to decide whether the program halts (yields output).  

The halting problem is not decidable.  That is to say that using turing complete systems, it is not possible to create a program that is able to examine arbitrary programs and decide if they halt or not for a given input.  Unfortunately, the proof is a proof by contradiction, so there's not really a lot of insight as to why the halting problem cannot be solved.  But it is a useful exercise in understanding the types of thinking that come up when analyzing a rule system.  You may think a set of rules sounds just fine, but someone clever may be able to subvert your entire system.

The first thing to consider is a program that doesn't halt.  In lambda calculus this would look something like this:

( λ x . x x ) ( λ x . x x )

Okay, so the second ( λ x . x x ) is fed into the first ( λ x . x x ).  The second ( λ x . x x ) then becomes "x".  So every instance of x needs to be replaced by ( λ x . x x ).  And the result is:

( λ x . x x ) ( λ x . x x )

The exact same thing that we started with.  And then we do the same thing again and again.  This process continues forever.  This is a program that doesn't halt.
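
If it helps to see the same trick outside of raw lambda calculus, here's a small Haskell rendering of it (a sketch only; the newtype is there because the untyped term ( λ x . x x ) has no simple type):

-- Self application needs a little help from a recursive newtype.
newtype Self a = Fold { unfold :: Self a -> a }

-- w is ( λ x . x x ), with the wrapping made explicit.
w :: Self a -> a
w x = unfold x x

-- omega is ( λ x . x x ) ( λ x . x x ); evaluating it never halts.
omega :: a
omega = w (Fold w)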

Next, let's define what the interface for a halting problem function would look like.  

halt?( input, program ) -> true if the program halts with input, false otherwise

It's a function that takes a program and the target input as parameters and then it returns true or false depending on whether or not the program halts given that input.

A naive implementation of this function might look something like:

function halt?( input, program )
    if program( input ) then
        return true
    else
        return false
    end
end

This will work perfectly if the program does in fact halt on the input.  But if the program does not halt on the input, then our halt? function will also never halt.  So we'll never get to return false.  Whatever the halt? function does, it can't actually do it by running the program and seeing what the result is.  It needs to do some sort of more sophisticated analysis.  However, this more sophisticated analysis turns out to be impossible (for arbitrary programs at least).  Let's consider the following function:

function G( input )
    if F( input, input ) == false then
        return false
    else
        ( λ x . x x ) ( λ x . x x ) -- ie loop forever
    end
end

The function F is to be considered as potentially any function that takes two inputs and returns true or false.  

Now let's see what kind of behavior the G function has when we pass it itself.

G( G )

Well, it really depends on what the behavior of F is right?  Fortunately, F either returns true or it returns false, so we can define the G function's behavior piecewise.

G( G ) = false, when F( G, G ) = false
G( G ) = ⊥, when F( G, G ) = true

Hmm … what's the deal with '⊥'?  That symbol actually has a lot of different meanings.  Like most things in mathematics, it's horribly overloaded and means a lot of different things depending on the context.  It means things like 'does not terminate', undefined, falsity (a concept I'm not sure I fully grasp), etc.  This symbol is useful here because G might not return any value.  And in that case, instead of saying, "G doesn't return any value because it fails to terminate," we can instead say ⊥.  It's like a shortcut.

Let's redefine the halt? function using the new symbol and the piecewise style.

halt?( input, program ) = false, when program( input ) = ⊥
halt?( input, program ) = true, otherwise

Okay, this seems simple enough, but what's the connection with proving that the halting problem can't be solved?  Well earlier I said that the F function can be any function that takes two parameters and returns true or false.  Well, the halt? function is a function that takes two parameters and returns true or false.  Let's see what happens when the F function is actually our halt? function.  First notice that G is taking itself as input.  So let's describe the behavior that we expect from the halt? function if we were using it to analyze whether or not the G function halts when given itself as an input.

halt?( G, G ) = true, when G( G ) = false
halt?( G, G ) = false, when G( G ) = ⊥

Okay, that's basically the piecewise description of halt? above, but with the parameters input and program both replaced by G.  Additionally, we can replace the "true, otherwise" portion with G( G ) = false because we know that's the only time that the function G halts with an input of itself.  Now let's combine the behavior we expect from the halt? function with the behavior that we expect from the G function.

halt?( G, G ) = true, when G( G ) = false, when F( G, G ) = false
halt?( G, G ) = false, when G( G ) = ⊥, when F( G, G ) = true

Now because F is actually halt?, let's replace the occurrences of F with halt?.

halt?( G, G ) = true, when G( G ) = false, when halt?( G, G ) = false
halt?( G, G ) = false, when G( G ) = ⊥, when halt?( G, G ) = true

And now we are left with the contradiction.  halt?( G, G ) is equal to true when G( G ) is equal to false, and that only happens when halt?( G, G ) is equal to false.  And vice versa, halt?( G, G ) is equal to false when G( G ) does not terminate, and that only happens when halt?( G, G ) is equal to true. 

That might be a bit confusing, so let's reorganize the piecewise description to highlight the contradiction.

G( G ) = false, when halt?( G, G ) = false

Meaning that G( G ) is equal to false when the halt? function returns false which in turn means that G( G ) does halt when halt? says that G( G ) does not halt.

G( G ) = ⊥, when halt?( G, G ) = true

Meaning that G( G ) does not halt when the halt? function says that it should halt.

So in other words.  The function G with input G halts when the halting function says that it doesn't halt and it doesn't halt when the halt? function says that it does halt.  Or …

halt?( G, G ) = false, when halt?( G, G ) = true
halt?( G, G ) = true, when halt?( G, G ) = false

Proof by contradiction works by assuming a certain thing exists.  Then it shows that as a result of that thing existing, something else that's impossible must also exist.  Finally, we conclude that the thing cannot exist because if it does then something that is definitely impossible must then also exist.

In this instance we assumed that there existed a function halt? that returns true when a program halts with a given input and false when a program does not halt with a given input.  From there we create a very real function G and explore its behavior when it is passed itself as a parameter.  Finally, we place the halt? function inside of the G function such that we receive a result that contradicts the very nature of the halt? function.  And it's pretty obvious that halt?( G, G ) being equal to true only when it is also equal to false (and vice versa) is impossible.


Unfortunately, this doesn't really explain why the halting problem can't be solved.  It just shows that having a function that solves the halting problem is nonsensical.

Even though this seems like very esoteric material, it actually has important implications for everyone.  Just because you make a rule or a request that sounds like it makes sense doesn't mean that it actually does make sense.  After all, a function that solves the halting problem sounds like a perfectly reasonable thing to exist.  And it's not quite obvious why it can't exist.  What is obvious is that if it did exist, then it wouldn't exist (it fails to work for G( G )).

Additionally, notice what happened when we created what we thought was a reasonable request.  Some very clever person showed up and created a very real function G.  G was able to make our reasonable request say 'yes' when it should have said 'no' and 'no' when it should have said 'yes'.  

If we set up rules which seem reasonable, but in reality are nonsensical, we run the risk of encountering clever people who can subvert our rules and manipulate them to enforce the exact opposite of what we actually wanted to happen.

Monday, February 11, 2013

Proving it with math

There are many complicated and sophisticated ways of proving things using mathematics. Unfortunately, I haven't been able to investigate most of them.  But there are a few simple techniques that I've thought about.


Proving it with calculation.  

Okay, so this one isn't what people normally think of as a proof.  But it seems to be one of the easier ones to do.  You just write a program that does the thing.  And then you run it.  It works?  You're done.  It happened, it's proved.  QED.  

So on one hand, you have the Curry-Howard correspondence, which says that programs are proofs.  And on the other hand you have ad hoc constructs, inconsistent compilers, undefined behavior, gamma rays from space flipping bits in your RAM, sloppy programmers, and totally ill-defined inputs, outputs, or even any idea of what the thing is even supposed to do.  

Personally, I like this one the best.  Probably because it's the one I'm good at.  But there's also something satisfying about it to me.  Not sure I can put my finger on it, but for an arbitrary *thing* I would rather show it with a lambda term than show it with type theory.


Proving it with deduction.

This one is pretty straightforward.  If you have an 'A' and you have a rule which states 'A -> B', then you can conclude 'B'.  
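
This is also where the Curry-Howard correspondence from the previous section shows up in miniature: the deduction step is literally function application.  A one-line Haskell sketch:

-- If you have a proof of A and a rule A -> B, you can conclude B.
-- Under Curry-Howard that is just applying a function to an argument.
modusPonens :: a -> (a -> b) -> b
modusPonens a rule = rule a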

So basically, you gather all of the rules and initial conditions and then you see which way you can point it in.  

This one makes me think about parsing.  You have a language of primitive atoms (or letters) and a set of allowable transform rules.  Then your goal is to search through a very large or infinite tree of possible transformation sequences ... looking for something that is "interesting".  

I'm not exactly sure what to think about this one.  However, it does seem really useful.


Proving it with induction.

That thing from high school or wherever.  You know.  Prove it for 0, then prove it for x (what did that step even mean?), then prove it for x + 1 (why does this help?), and then you're done.   

So basically what's going on is that you're proving the *thing* for a real, specific base case, then assuming it for a general case x, and then using that assumption to prove it for x + 1, in such a way that the last step could theoretically be repeated for every item in your set.  ... so with the traditional x + 1 proof, this really only works if your set of possible inputs is the natural numbers (ie 0 to infinity).  If you start with zero and an increment function (ie + 1), then you are able to generate every element in the set of natural numbers.  

So again, the theory is more or less that you just proved it for x + 1, so you could theoretically prove it for *every* natural number, therefore you did prove it for every natural number.

This always bothered me as a child because I really didn't understand anything that was going on.  Then I learned set theory and got used to recursion (check out The Little Schemer and The Seasoned Schemer for great help getting your mind around recursion).  Now when I think about induction, I just think recursion.
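
Here's that idea as a (sketchy, illustration-only) Haskell definition: the base case is the "prove it for 0" step, and the recursive call is the "given it for x, extend it to x + 1" step:

-- Natural numbers built from a zero and an increment.
data Nat = Z | S Nat

-- A definition with exactly the base case / inductive step shape:
double :: Nat -> Nat
double Z     = Z                 -- the "prove it for 0" step
double (S n) = S (S (double n))  -- the "given it for n, extend it to n + 1" step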


Proving it with contradiction.

So this one is probably the least satisfying proof method.  And at the same time the most satisfying proof method.  With programming you can basically assemble a set of instructions to do a *thing*.  With deduction you have all of the steps to get to the *thing*.  With induction you have a recursive sketch of the structure of the *thing*.  With contradiction you just have the *thing* because the alternative is impossible.

So the basic idea is this.  I don't know how to prove the *thing*.  But let's assume the alternative, i.e. the *thing* doesn't exist or isn't true.  Now, all we have to do is find a side effect of the *thing* not existing which is totally impossible (the contradiction).  And presto chango, *thing* must be true because otherwise *impossible* must be true ... and *impossible* ISN'T true (because it can't be ... that would be impossible).

This isn't very satisfying if you are trying to understand why the *thing* is true or isn't true, or if you're trying to understand the structure or form of the problem.

On the other hand.  If you don't care about the structure or form of the problem.  You just need the answer.  Or if someone wants *thing* to be true or wants you to do *thing* (and you don't ... and you want to be a little bit too clever for your own good).  For these types of things I have a hard time thinking of something that's more satisfying than proof by contradiction.  

"Hey, here's a bunch of busy work to do in order to make sure that we are correctly handling situation X."

"Hmm.  If X ever did happen, it would wipe out all of reality.  Past and future.  We're still here, so clearly X is never going to happen ... because then we wouldn't be here.  We're handling situation X appropriately by definition."

ahh


Prove it

Normally, when we say "prove it", what we are really talking about is the process of strengthening the neurological pathways in our brain that allow us to short-circuit the expectation of certain outcomes.  The more the expected outcome occurs, the stronger the connections become, and the better we are able to react to the likely outcome.  

The stronger our brain believes something to occur, the more freaked out it is when that thing *doesn't* occur.  Our brains have a pretty good handle on how to deal with situations they have seen before.  A brand new situation can be very dangerous or fatal.  "Being surprised" and "being shocked" are synonyms.  The shock is our brain freaking out from a non-standard occurrence that does not have well-worn neural pathways.

That is of course my armchair neuro-socio explanation for what most people seem to mean about proving something.


Scientific endeavors have a different meaning of that phrase.  Proving something scientifically is about Bayesian statistics.  We run experiments to generate evidence to support that what we believe is correct, or more importantly generate evidence to support that what we believe is incorrect.  The more evidence that we have for something being true, the higher the expected probability that thing is actually true.  The phrase, "extraordinary evidence for extraordinary claims," is about the Bayesian nature of this process.  If we have a bunch of very good experiments leading us to believe a certain thing is true and we have a so-so experiment leading us to believe that same thing is not true, then we will tend to believe that the large amount of good evidence trumps the small amount of so-so evidence.  After all, it is more likely that the so-so small amount of evidence is wrong or indicating something other than the stated conclusions than for all of the large amount of good evidence to be wrong.  
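
To put toy numbers on that, here's a minimal sketch of a single Bayesian update (all of the probabilities are made up for illustration):

-- P(H|E) = P(E|H) P(H) / ( P(E|H) P(H) + P(E|not H) P(not H) )
posterior :: Double -> Double -> Double -> Double
posterior prior pEgivenH pEgivenNotH =
  (pEgivenH * prior) / (pEgivenH * prior + pEgivenNotH * (1 - prior))

main :: IO ()
main = print (posterior 0.5 0.9 0.2)  -- one decent experiment moves a 50/50 prior up to about 0.82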


I have no idea what to think about philosophical methods of 'proving' something.  I should probably start reading the classic philosophical works when I get a chance.  On the one hand it kind of seems like a bunch of hot air, but then again they do come up with some surprisingly good observations.  For example, isn't it suspicious that induction can only be shown to work via a process of induction?  And speaking of induction ...


Proving things in mathematics is much more certain than in other fields.  The proofs have nothing to do with intuition or what is expected.  The proofs have nothing to do with statistical extrapolation or gathering evidence.  The proofs can be verified regardless of how your thought process works.  The proofs start with a system of rules and follow them exactly, until they reach a useful conclusion.  And within their system (assuming no mistakes were made ... and assuming that the system is valid) they are absolute.  

... Of course outside of their system they are meaningless.  So a good question is, "What's the point?"  Isn't this the same question that's asked in every math class ever?  

I think I have an answer to this question.  I didn't come up with it because I was trying to answer the question.  I came up with it because I was trying very hard to figure out how mathematics works.  So if you have category theory and lambda calculus, why are they both classified as math?  Surely this is not just someone's opinion.  There's got to be some unifying characteristic that makes them both math.  Additionally, if you have an arbitrary math, how do you know it works?  Maybe you can prove that it's sound and complete and consistent, but how are we so sure that our proofs won't evaporate before our eyes?  The rules don't have any special powers, we just made them up.  Why should we be able to prove anything lasting about them?

My conclusion is that mathematics are abstract causality machines.  Causality has the same philosophical problems that math does.  We don't really know why effects have causes.  And as far as the physical universe goes ... even if we are able to eventually prove (in the Bayesian sense) that everything is in reality very small strings that are vibrating in 11 dimensions, we still won't know why those strings feel compelled to obey a certain set of rules.

So why is studying math useful?  Well, we can't prove that it is useful (in the mathematical sense).  But mathematics are just abstract causality machines.  And as far as we can tell, the vast majority of reality "runs" on causality.  Understanding mathematics helps you to understand reality, in the general sense.

Thursday, February 7, 2013

Hard problems


Lambda calculus was created by Alonzo Church in the early 1930s.  This tiny logic is turing complete.  In fact, it was turing complete before turing completeness was a thing.  Alonzo Church published his lambda calculus shortly before Alan Turing published his work with turing machines.  

Every computable function is computable in a system which is turing complete.  And what this means for lambda calculus is, basically, anything you can do in math you can represent in lambda calculus.  However, originally, Alonzo Church was having some problems making lambda calculus do everything.  He was having problems making it do a very simple function.  The predecessor function.  Or in other words ... subtract one from a number.

The story goes that Church thought about it very hard and was considering giving up on lambda calculus when his student, Stephen Kleene, went to the dentist.  There Kleene reasoned out a way to implement the predecessor function by using triples (a list with exactly three items).  The more modern version uses pairs.  

So here's the point.  Alonzo Church was smart enough to come up with lambda calculus in the first place.  This is a system which is powerful enough to describe everything that we are capable of formally describing.  But he wasn't smart enough to figure out the missing piece that would make basic subtraction work.  That was solved by Kleene.  Additionally, it's not like Kleene just walked in and solved the thing.  He was clearly thinking about it very hard.  So hard in fact that he was thinking about it even when he was off at the dentist.

Solving truly hard problems isn't a question of raw intelligence.  Church was clearly very intelligent.  And it's also not a question of just putting in the hours.  Kleene solved the predecessor function while he was on a personal errand.  

If you're working on a truly hard problem, you might not be able to solve it.  And this is regardless of how smart you are.  Additionally, there isn't a set algorithm for coming up with the solution.



I once tried to come up with the solution to the predecessor function on my own.  I almost got there without cheating, but in the end I checked around online.  Had I known that the predecessor function was so difficult to originally come up with, I probably would have worked harder to solve it on my own.  Although I would have still been cheating.  Because I knew that it was actually possible to solve.

This is the solution that I found that I was able to actually understand in the end.  It took a lot of work to figure out, but my intuition was correct.  My idea was that you needed to "unwrap the number" and then just "rewrap" it one less time than its original total.  [In lambda calculus numbers are typically represented as a number of times that you do something else.  Unwrapping a number is cognitively similar to unrolling a loop as an optimization technique.  Except you're not in it for the optimization.]

0 = λ f . λ v . v
1 = λ f . λ v . f v
2 = λ f . λ v . f ( f v )

pred = λ n . λ f . λ v . n ( λ a . λ b . b ( a f ) ) ( λ u . v ) ( λ i . i )
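
And for what it's worth, here's the same encoding transliterated into Haskell (a sketch; RankNTypes is just there so Church numerals can be given a type), which at least lets a computer check that pred of 2 really is 1:

{-# LANGUAGE RankNTypes #-}

-- Church numerals: a number n is "apply f to v, n times".
type Church = forall a. (a -> a) -> a -> a

zero, one, two :: Church
zero = \_ v -> v
one  = \f v -> f v
two  = \f v -> f (f v)

-- The predecessor trick: rewrap the number one less time than its original total.
predC :: Church -> Church
predC n = \f v -> n (\a b -> b (a f)) (\u -> v) (\i -> i)

-- Convert back to an ordinary Int so we can look at the result.
toInt :: Church -> Int
toInt n = n (+ 1) 0

main :: IO ()
main = print (toInt (predC two))  -- prints 1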