Stack Allocated: February 2013

Wednesday, February 20, 2013

Halting problem

The question posed by the halting problem is: Given an arbitrary program from a turing complete system and an input for that program, determine if the program yields output or if it processes the input forever, never yielding output.

Named the halting problem because we are trying to decide if the program halts (yields output).

The halting problem is not decidable. That is to say that using turing complete systems, it is not possible to create a program that is able to examine arbitrary programs and decide if they halt or not for a given input. Unfortunately, the proof is a proof by contradiction, so there's not really a lot of insight as to why the halting problem cannot be answered. But it is a useful exercise in understanding the types of thinking that come up when analyzing a rule system. You may think a set of rules sound just fine, but someone clever may be able to subvert your entire system.

The first thing to consider is a program that doesn't halt. In lambda calculus this would look something like this:

( λ x . x x ) ( λ x . x x )

Okay, so the second ( λ x . x x ) is fed into the first ( λ x . x x ). The second ( λ x . x x ) then becomes "x". So every instance of x needs to be replaced by ( λ x . x x ). And the result is:

( λ x . x x ) ( λ x . x x )

The exact same thing that we started with. And then we do the same thing again and again. This process continues forever. This is a program that doesn't halt.

Next, let's define what the interface for a halting problem function would look like.

halt?( input, program ) -> true if the program halts with input, false otherwise

It's a function that takes a program and the target input as parameters and then it returns true or false depending on whether or not the program halts given that input.

A naive implementation of this function might look something like:

function halt?( input, program )

if program( input ) then

return true

else

return false

end

This will work perfectly if the program does in fact halt on the input. But if the program does not halt on the input, then our halt? function will also never halt. So we'll never get to return false. Whatever, the halt? function does, it can't actually do it by running the program and seeing what the result it. It needs to do some sort of more sophisticated analysis. However, this more sophisticated analysis turns out to be impossible (for arbitrary programs at least). Let's consider the following function:

function G( input )

if F( input, input ) == false then

return false

else

( λ x . x x ) ( λ x . x x ) // ie loop forever

end

The function F is to be considered as potentially any function that takes two inputs and returns true or false.

Now let's see what kind of behavior the G function has when we pass it itself.

G( G )

Well, it really depends on what the behavior of F is right? Fortunately, F either returns true or it returns false, so we can define the G function's behavior piecewise.

G( G ) = false, when F( G, G ) = false

G( G ) = ⊥, when F( G, G ) = true

Hmm … what's the deal with "⊥"? That symbol actually has a lot of different meanings. Like most things in mathematics, its horribly overloaded and means a lot of different things depending on the context. It means things like 'does not terminate', undefined, falsity (a concept I'm not sure I fully grasp), etc. This symbol is useful here because G might not return any value. And in that case instead of saying, "G doesn't return any value because it fails to terminate," we can instead say ⊥. It's like a shortcut.

Let's redefine the halt? function using the new ⊥ symbol and the piecewise style.

halt?( input, program ) = false, when program( input ) = ⊥

halt?( input, program ) = true, otherwise

Okay, this seems simple enough, but what's the connection with proving that the halting problem can't be solved? Well earlier I said that the F function can be any function that takes two parameters and returns true or false. Well, the halt? function is a function that takes two parameters and returns true or false. Let's see what happens when the F function is actually our halt? function. First notice that G is taking itself as input. So let's describe the behavior that we expect from the halt? function if we were using it to analyze whether or not the G function halts when given itself as an input.

halt?( G, G ) = true, when G( G ) = false

halt?( G, G ) = false, when G( G ) = ⊥

Okay, that's basically the piecewise description of halt? above, but with the parameters input and program both replaced by G. Additionally, we can replace the "true, otherwise" portion with G( G ) = false because we know that's the only time that the function G halts with an input of itself. Now let's combine the behavior we expect from the halt? function with the behavior that we expect from the G function.

halt?( G, G ) = true, when G( G ) = false, when F( G, G ) = false

halt?( G, G ) = false, when G( G ) = ⊥, when F( G, G ) = true

Now because F is actually halt?, let's replace the occurrences of F with halt?.

halt?( G, G ) = true, when G( G ) = false, when halt?( G, G ) = false

halt?( G, G ) = false, when G( G ) = ⊥, when halt?( G, G ) = true

And now we are left with the contradiction. halt?( G, G ) is equal to true when G( G ) is equal to false, and that only happens when halt?( G, G ) is equal to false. And vice versa, halt?( G, G ) is equal to false when G( G ) does not terminate, and that only happens when halt?( G, G ) is equal to true.

That might be a bit confusing, so let's reorganize the piecewise description to highlight the contradiction.

G( G ) = false, when halt?( G, G ) = false

Meaning that G( G ) is equal to false when the halt? function returns false which in turn means that G( G ) does halt when halt? says that G( G ) does not halt.

G( G ) ⊥, when halt?( G, G ) = true

Meaning that G( G ) does not halt when the halt? function says that it should halt.

So in other words. The function G with input G halts when the halting function says that it doesn't halt and it doesn't halt when the halt? function says that it does halt. Or …

halt?( G, G ) = false, when halt?( G, G ) = true

halt?( G, G ) = true, when halt?( G, G ) = false

Proof by contradiction works by assuming a certain thing exists. Then it shows that as a result of that thing existing, something else that's impossible must also exist. Finally, we conclude that the thing cannot exist because if it does then something that is definitely impossible must then also exist.

In this instance we assumed that there existed a function halt? that returned true when a program halted with a given input and false when a program does not halt with a given input. From there we create a very real function G and explore its behavior when it is passed itself as a parameter. Finally, we place the halt? function inside of the G function such that we receive a result that contradicts the very nature of the halt? function. It's pretty obvious that halt?( G, G ) is equal to true only when it is also equal to false and vice versa is impossible.

Unfortunately, this doesn't really explain why the halting problem can't be solved. It just shows that having a function that solves the halting problem is nonsensical.

Even though this seems like very esoteric material, it actually has important implications for everyone. Just because you make a rule or a request that sounds like it makes sense doesn't mean that it actually does make sense. After all, the halting problem sounds like it should be perfectly reasonable to exist. And it's not quite obvious why it can't exist. What is obvious is that if it did exist, then it wouldn't exist (it fails to work for G( G )).

Additionally, notice what happened when we created what we thought was a reasonable request. Some very clever person showed up and created a very real function G. G was able to make our reasonable request say 'yes' when it should have said 'no' and 'no' when it should have said 'yes'.

If we set up rules which seem reasonable, but in reality are nonsensical, we run the risk of encountering clever people who can subvert our rules and manipulate them to enforce the exact opposite of what we actually wanted to happen.

Monday, February 11, 2013

Proving it with math

There are many complicated and sophisticated ways of proving things using mathematics. Unfortunately, I haven't been able to investigate most of them. But there are a few simple techniques that I've thought about.

Proving it with calculation.

Okay, so this one isn't what people normally think of as a proof. But it seems to be one of the easier ones to do. You just write a program that does the thing. And then you run it. It works? You're done. It happened, it's proved. QED.

So on one hand, you have the curry howard correspondence, which says that programs are proofs. And on the other hand you have ad hoc constructs, inconsistent compilers, undefined behavior, gamma rays from space flipping bits in your ram, sloppy programmers, and totally ill defined inputs, outputs, or even any idea of what the thing is even supposed to do.

Personally, I like this one the best. Probably because it's the one I'm good at. But there's also something satisfying about it to me. Not sure I can put my finger on it, but for an arbitrary *thing* I would rather show it with a lambda term than show it with type theory.

Proving it with deduction.

This one is pretty straight forward. If you have an 'A' and you have a rule which states 'A -> B', then you can conclude 'B'.

So basically, you gather all of the rules and initial conditions and then you see which way you can point it in.

This one makes me think about parsing. You have a language of primitive atoms (or letters) and a set of allowable transform rules. Then your goal is to search through a very large or infinite tree of possible transformation sequences ... looking for something that is "interesting".

I'm not exactly sure what to think about this one. However, it does seem really useful.

Proving it with induction.

That thing from high school or wherever. You know. Prove it for 0, then prove it for x (what did that step even mean?), then prove it for x + 1 (why does this help?), and then you're done.

So basically what's going on is that you're proving the *thing* for a real, specific, base case, then proving it for a general case, and then proving it for a general case modified in such a way that the last step could theoretically be repeated for every item in your set. ... so with the traditional x + 1 proof, this really only works if your set of possible inputs are natural numbers (ie 0 to infinity). If you start with zero and an increment function (ie + 1), then you are able to generate every element in the set of natural numbers.

So again, the theory is more or less that you just proved it for x + 1, so you could theoretically prove it for *every* natural number, therefore you did prove it for every natural number.

This always bothered me as a child because I really didn't understand anything that was going on. Then I learned set theory and got used to recursion (check out the little schemer and the seasoned schemer for great help getting your mind around recursion). Now when I think about induction, I just think recursion.

Proving it with contradiction.

So this one is probably the least satisfying proof method. And at the same time the most satisfying proof method. With programming you can basically assemble a set of instructions to do a *thing*. With deduction you have all of the steps to get to the *thing*. WIth induction you have a recursive sketch of the structure of the *thing*. With contradiction you just have the *thing* because the alternative is impossible.

So the basic idea is this. I don't know how to prove the *thing*. But let's assume the alternative. IE the *thing* doesn't exist or isn't true. Now, all we have to do is find a side effect of the *thing* not existing which is totally impossible (the contradiction). And presto chango *thing* must be true because otherwise *impossible* must be true ... and *impossible* ISNT true (because it can't be ... that would be impossible).

This isn't very satisfying if you are trying to understand why the *thing* is true or isn't true. If you're trying to understand the structure or form of the problem.

On the other hand. If you don't care about the structure or form of the problem. You just need the answer. Or if someone wants *thing* to be true or wants you to do *thing* (and you don't ... and you want to be a little bit too clever for your own good). For these types of things I have a hard time thinking of something that's more satisfying than proof by contradiction.

"Hey, here's a bunch of busy work to do in order to make sure that we are correctly handling situation X."

"Hmm. If X ever did happen, it would wipe out all of reality. Past and future. We're still here, so clearly X is never going to happen ... because then we wouldn't be here. We're handling situation X appropriately by definition."

ahh

Prove it

Normally, when we say "prove it", what we are really talking about is the process of strengthening the neurological pathways in our brain that allows us to short-circuit the expectation of certain outcomes. The more the expected outcome occurs, the stronger the connections become, the better we are able to react to the likely outcome.

The stronger our brain believes something to occur, the more freaked out it is when that thing *doesn't* occur. Our brains have a pretty good handle on how to deal with situations it has seen before. A brand new situation can be very dangerous or fatal. "Being surprised" and "being shocked" are synonyms. The shock of our brain freaking out from a non-standard occurrence that does not have well worn neuro pathways.

That is of course my armchair neuro-socio explanation for what most people seem to mean about proving something.

Scientific endeavors have a different meaning of that phrase. Proving something scientifically is about bayesian statistics. We run experiments to generate evidence to support that what we believe is correct, or more importantly generate evidence to support that what we believe is incorrect. The more evidence that we have for something being true the higher the expected probability that thing is actually true. The phrase, "extraordinary evidence for extraordinary claims," is about the bayesian nature of this process. If we have a bunch of very good experiments leading us to believe a certain thing is true and we have a so-so experiment leading us to believe that same thing is not true, then we will tend to believe that the large amount of good evidence trumps the small amount of so-so evidence. After all, it is more likely that the so-so small amount of evidence is wrong or indicating something other than the stated conclusions than for all of the large amount of good evidence to be wrong.

I have no idea of what to think about philosophical methods of 'proving' something. I should probably start reading the classic philosophical works when I get a chance. On the one hand it kind of seems like a bunch of hot air, but then again they do come up with some surprisingly good observations. For example, isn't it suspicious that induction can only be shown to work via a process of induction. And speaking of induction ...

Proving things in mathematics is much more certain than in other fields. The proofs have nothing to do with intuition or what is expected. The proofs have nothing to do with statistical extrapolation or gathering evidence. The proofs can be verified regardless of how your thought process works. The proofs start with a system of rules and follow them exactly, until they reach a useful conclusion. And within their system (assuming no mistakes were made ... and assuming that the system is valid) they are absolute.

... Of course outside of their system they are meaningless. So a good question is, "What's the point?" Isn't this the same question that's asked in every math class ever?

I think I have an answer to this question. I didn't come up with it because I was trying to answer the question. I came up with it because I was trying very hard to figure out how mathematics works. So if you have category theory and lambda calculus, why are they both classified as math. Surely this is not just someone's opinion. There's got to be some unifying characteristic that makes them both math. Additionally, if you have an arbitrary math, how do you know it works. Maybe you can prove that it's sound and complete and consistent, but how are we so sure that our proofs won't evaporate before our eyes. The rules don't have any special powers, we just made them up. Why should we be able to prove anything lasting about them.

My conclusion is that mathematics are abstract causality machines. Causality has the same philosophical problems that math does. We don't really know why effects have causes. And as far as the physical universe goes ... even if we are able to eventually prove (in the bayesian sense) that everything is in reality very small strings that are vibrating in 11 dimensions, we still won't know why those strings feel compelled to obey a certain set of rules.

So why is studying math useful? Well, we can't prove that it is useful (in the mathematical sense). But mathematics are just abstract causality machines. And as far we can tell, the vast majority of reality "runs" on causality. Understanding mathematics helps you to understand reality, in the general sense.

Thursday, February 7, 2013

Hard problems

Lambda calculus was created by Alonzo Church in the early 1930's. This tiny logic is turing complete. In fact, it was turing complete before turing completeness was a thing. Alonzo Church published his lambda calculus shortly before Alan Turing published his work with turing machines.

Every computable function is computable in a system which is turing complete. And what this means for lambda calculus is, basically, anything you can do in math you can represent in lambda calculus. However, originally, Alonzo Church was having some problems making lambda calculus do everything. He was having problems making it do a very simple function. The predecessor function. Or in other words ... subtract one from a number.

The story goes that Church thought about it very hard and was considering giving up on lambda calculus when his student, Stephen Kleene, went to the dentist. There Kleene reasoned out a way to implement the predecessor function by using triples (a list with exactly three items). The more modern version uses pairs.

So here's the point. Alonzo Church was smart enough to come up with lambda calculus in the first place. This is a system which is powerful enough to describe everything that we are capable of formally describing. But he wasn't smart enough to figuring the missing piece that would make basic subtraction work. That was solved by Kleene. Additionally, it's not like Kleene just walked in and solved the thing. He was clearly thinking about it very hard. So hard in fact that he was thinking about it even when he was off at the dentist.

Solving truly hard problems isn't a question of raw intelligence. Church was clearly very intelligent. And it's also not a question of just putting in the hours. Kleene solved the predecessor function while he was on a personal errand.

If you're working on a truly hard problem, you might not be able to solve it. And this is regardless of how smart you are. Additionally, there isn't a set algorithm for coming up with the solution.

I once tried to come up with the solution to the predecessor function on my own. I almost got there without cheating, but in the end I checked around online. Had I known that the predecessor function was so difficult to originally come up with, I probably would have worked harder to solve it on my own. Although I would have still been cheating. Because I knew that it was actually possible to solve.

This is the solution that I found that I was able to actually understand in the end. It took a lot of work to figure out, but my intutition was correct. My idea was that you needed to "unwrap the number" and then just "rewrap" it one less time than it's original total. [In lambda calculus numbers are typcially represented as a number of times that you do something else. Unwrapping a number is cognatively similar to unwrapping a loop as an optimization techique. Except you're not in it for the optimization.]

0 = λ f . λ v . v
1 = λ f . λ v . f v
2 = λ f . λ v . f ( f v )

pred = λ n . λ f . λ v . n ( (λ a . λ b . b ( a f )) (λ u . v) (λ i . i) )