Friday, April 26, 2019

Problem Calculus: Spaces

  • Semantically Homogeneous:  A space where each dimension can be combined into a larger uniform multidimensional space.  Movement in each dimension is effectively the same as movement in any other dimension, differing only by a rotation.  For example, a function that has the signature "int * int * int -> unit" where we are representing a point in 3D Euclidean space.  Each parameter is a different axis (x,y,z) in the coordinate system.
  • Semantically Heterogeneous:  A space where each dimension cannot be combined into a larger uniform space because each dimension serves a radically different purpose.  For example, take the function signature from before "int * int * int -> unit", but this time the first "int" represents "age", the second one represents "width", and the last one represents "RGB color".  In this instance the different dimensions of the space aren't uniform and each has a different meaning.  You can still plot a 3D point in this space, but the meaning of moving in one dimension is radically different from the meaning of moving in another dimension.  
  • Indexed Dimension:  Where the value of one of the dimensions modifies the semantics of another of the dimensions.  For example, the type "bool * int * int" may have the meaning:  "true * x * y" and "false * r * θ".  When the boolean is true, then the space indicates a typical coordinate plane.  But when the boolean is false, then the space indicates a polar coordinate plane.  
  • Indirectly Related Dimension:  Typically, a space will have dimensions where each one is free to associate and relate to any other without any effort required.  For example, if you have "x,y,z" then you are free to consider "x + y" or "x + z" or "y + z".  Each dimension can reach every other dimension directly.  However, if your space instead looks like "linked_list<int; 3>" then the first dimension reaches only the second one directly, and the second dimension reaches the third one.  In general it makes sense to start thinking about all dimensions that are related by some predicate.  For example, all even dimensions or all dimensions separated by 18 other dimensions.  
    • Omni Related:  Typical dimensional relationships.  Everything can access everything else and it doesn't make sense to pair off the dimensions.
    • List Related:  Like the linked_list example above.  The dimensions are related to each other like on a number line.
    • Tree Related:  If you're building a tree, then the relationships can get more complicated yet.  Perhaps you want all dimensions that belong in a certain sub tree.  Or worse yet perhaps you care about dimensions that are siblings or cousins with some specific path or families of paths between them.
    • Graph Related:  The worst-case scenario is that your dimensions have arbitrary relationships to other dimensions, but are not omni related (or maybe sometimes they are omni related ... you have to check each time).
    • Other?:  There are generalizations of graphs, but I'm not sure things fundamentally change.  A hypergraph can be represented by a specialized graph and the relationships between dimensions are going to be preserved.
  • Abstraction Confusion:  Occurs when you have a graph (also applies to lists and trees) and the objects contained in each node can reference the containing graph nodes.
  • Isolation Confusion:  Occurs when you have a graph (also applies to lists and trees) and the objects contained in each node can reference the objects contained in other nodes of the graph.
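
As a rough illustration of the Indexed Dimension entry above, here is a minimal Python sketch.  The function name "to_cartesian" is made up for this example, and it uses floats instead of the ints in "bool * int * int" so that the polar case is easy to compute.  The point is only that the first dimension decides what the other two mean.

```python
import math
from typing import Tuple


def to_cartesian(flag: bool, a: float, b: float) -> Tuple[float, float]:
    """Interpret (flag, a, b) the way the Indexed Dimension entry describes.

    flag is True  -> (a, b) is already (x, y)
    flag is False -> (a, b) is (r, theta) in polar form
    """
    if flag:
        return (a, b)
    # The same two numbers now mean radius and angle.
    return (a * math.cos(b), a * math.sin(b))


# Identical numeric values, different meaning depending on the flag.
as_plane_point = to_cartesian(True, 2.0, 0.0)
as_polar_point = to_cartesian(False, 2.0, 0.0)
```

Any code that touches the second and third values has to look at the first one before it knows what it is holding, which is where the extra effort comes from.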

Friday, February 1, 2019

Problem Calculus: Index




Problem Calculus: Paths and Non-Overlapping Arrow Spaces

In addition to everything we've covered so far, you can also encounter families of spaces that have properties that make them complex.

The first thing to consider is when you have more than one path that gets you to the same destination space when starting from a given source space.


In the diagram above, we're starting with a starting element called "A".  Two arrows can take A and go to one space or another.  Finally, they arrive at a final space.  The question is:  Are the final elements equal to each other?  Ideally, you want the final two elements to be equal.  This property allows you to worry less about the spaces you're going to and the spaces you're coming from.  If this holds for all elements in the space, then you can consider the multiple paths as if they were the same and it is easier to form an intuition about the system.
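
A small Python sketch of the "do both paths end at the same element" question, checked over a finite sample of starting elements.  The arrows here (double, add_three, add_six) are invented for the illustration and are not taken from the diagram.

```python
def paths_agree(path_one, path_two, elements):
    """True if both paths send every sampled element to the same final element."""
    return all(path_one(a) == path_two(a) for a in elements)


# Hypothetical arrows between integer spaces.
def double(n):
    return n * 2


def add_three(n):
    return n + 3


def add_six(n):
    return n + 6


# Path 1: double, then add six     -> 2a + 6
# Path 2: add three, then double   -> 2(a + 3) = 2a + 6
def upper_path(a):
    return add_six(double(a))


def lower_path(a):
    return double(add_three(a))


# A path that does not agree: double, then add three -> 2a + 3
def mismatched_path(a):
    return add_three(double(a))


sample = range(-5, 6)
same_destination = paths_agree(upper_path, lower_path, sample)       # True
different_destination = paths_agree(mismatched_path, lower_path, sample)  # False
```

Checking a sample only gives evidence, not proof, that the paths agree for every element in the space.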

Additionally, you have to be concerned about arrow spaces (input and output) that are non-overlapping in any given main space.  The issue is that it becomes difficult to know when the output of one arrow can be used for the input of another arrow.  Additionally, when trying to decide what paths exist in your system, you have to be concerned about whether the non-overlapping arrow spaces allow a path to exist.  And some paths will only sometimes exist if the arrow spaces are only partially overlapping.

Here are two arrow spaces where the incoming arrows form partial overlapping spaces.


Here are two arrow spaces where the outgoing arrows form partial overlapping spaces.


Finally, here is an incoming arrow forming a partial overlapping space with an outgoing arrow space.


All of those spaces are arrow input or output spaces that live inside of a source/destination space.  The outer space is being left out in the previous examples.

Consider the following diagram.


This diagram shows two possible paths that go through the featured space.  In one instance the arrows form partially overlapping arrow spaces.  This path will only sometimes work because some elements in the output space will be outside of the input space for the next arrow.  Additionally, one of the output spaces is completely non-overlapping with the input space of the exiting arrow.  Even though the arrow looks like it should produce values that can be used for the exiting arrow, further analysis shows that the elements will never be usable.
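
If the arrow spaces are small enough to list out, the three situations described here (a path that always exists, sometimes exists, or never exists) can be told apart with plain set operations.  This is only a sketch with made-up finite sets standing in for the output space of one arrow and the input space of the next.

```python
def classify_path(output_space: set, input_space: set) -> str:
    """Classify whether the output of one arrow can feed the next arrow.

    "always"    -> the output space sits entirely inside the input space
    "sometimes" -> the spaces partially overlap
    "never"     -> the spaces do not overlap at all
    """
    shared = output_space & input_space
    if not shared:
        return "never"
    if output_space <= input_space:
        return "always"
    return "sometimes"


# Hypothetical input space of the exiting arrow: the integers 0 through 9.
exit_input = set(range(10))

fully_inside = classify_path({2, 4, 6}, exit_input)        # "always"
partial = classify_path({8, 9, 10, 11}, exit_input)        # "sometimes"
disjoint = classify_path({20, 30}, exit_input)             # "never"
```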

Wednesday, January 30, 2019

Problem Calculus: Arrow Complexity

With complexity projection we saw how an arrow could be difficult to comprehend because of the complexity of the spaces that it is related to.  However, arrows can also be hard to comprehend (or at the very least increase the complexity of the system as a whole) because of qualities that they have which have little to do with the spaces they are involved with.

The first type of arrow that is difficult to comprehend is the discontinuous arrow.


A discontinuous arrow is problematic because small changes in the input of the arrow will result in large (or unpredictable) changes in the output.  This will hinder building an intuition for how the arrow works.  This can be highly problematic because it means that the most likely way people will use it will be to just kind of try things until they get the right output.  This is mostly fine as long as two things are true:  1) the input remains static so it doesn't change once the 'correct' value is picked and 2) the value never needs to be changed without allowing for sufficient analysis to ensure that the output will be okay.  

Discontinuous arrows are also problematic if the destination space is complex.  For example, a destination space with many invalid elements will not pair well with a discontinuous arrow because it will be very difficult to determine when a small change will knock your output value into an invalid value.
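
Here is a tiny Python sketch of what a discontinuous arrow looks like from the caller's side.  The arrow "shipping_cost" and its threshold are invented for the example.  The same small nudge to the input does nothing in one place and changes the output a lot in another.

```python
def shipping_cost(weight_kg: float) -> int:
    """Hypothetical step-shaped arrow: the output jumps at 1.0 kg."""
    return 5 if weight_kg < 1.0 else 50


def output_jump(arrow, x: float, delta: float) -> int:
    """How far the output moves when the input moves by delta."""
    return abs(arrow(x + delta) - arrow(x))


# Far from the threshold, a small change in input changes nothing.
calm_region = output_jump(shipping_cost, 0.5, 0.001)      # 0

# Near the threshold, the same small change moves the output by 45.
near_threshold = output_jump(shipping_cost, 0.9995, 0.001)  # 45
```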

Next, non-deterministic arrows.


Non-deterministic arrows are difficult to comprehend because a single input element will produce different output elements at different invocations of the arrow.  This is problematic because you have to handle all possible output elements, but if you're not paying close enough attention you might not notice that different possibilities are even something that can occur.
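
One way to notice a non-deterministic arrow is to invoke it repeatedly with the same input element and collect what comes back.  The sketch below uses a made-up arrow "flaky_lookup" backed by a seeded random generator; both the name and the behavior are assumptions for the illustration.

```python
import random

# Seeded so the sketch behaves the same on every run.
_rng = random.Random(0)


def flaky_lookup(key: int) -> int:
    """Hypothetical non-deterministic arrow: returns key or key + 1."""
    return key + _rng.choice([0, 1])


def steady_lookup(key: int) -> int:
    """Deterministic arrow for comparison."""
    return key + 1


def observed_outputs(arrow, element, trials: int) -> set:
    """Every distinct output seen for one input element across many invocations."""
    return {arrow(element) for _ in range(trials)}


flaky_results = observed_outputs(flaky_lookup, 10, 200)
steady_results = observed_outputs(steady_lookup, 10, 200)
```

A caller of flaky_lookup has to handle every element that can show up in flaky_results, not just the one that happened to come back the first time it was tried.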

And finally there are non-injective arrows.


An injective arrow is one where every input element maps to its own unique output element.  Non-injective arrows have more than one input element that can map to a given output element.  This is problematic when you are unaware or forget that you are dealing with a non-injective arrow.  Seeing a given output, it is easy to make a bad assumption and believe that you know what the input element was.  The assumption will even sometimes be correct, which will further hinder a correct understanding of the system.
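
For a finite set of inputs you can check injectivity directly by grouping inputs by the output they produce.  This is a sketch using squaring as a stand-in arrow; the helper names "preimages" and "is_injective" are made up for the example.

```python
from collections import defaultdict


def preimages(arrow, inputs) -> dict:
    """Group the input elements by the output element they map to."""
    groups = defaultdict(list)
    for x in inputs:
        groups[arrow(x)].append(x)
    return dict(groups)


def is_injective(arrow, inputs) -> bool:
    """True if no two sampled inputs share an output."""
    return all(len(xs) == 1 for xs in preimages(arrow, inputs).values())


def square(x: int) -> int:
    return x * x


# On -2..2 an output of 4 could have come from -2 or from 2.
by_output = preimages(square, range(-2, 3))
ambiguous = is_injective(square, range(-2, 3))      # False

# Restricting the inputs to 0..2 makes the same arrow injective.
unambiguous = is_injective(square, range(0, 3))     # True
```

Seeing the output 4 and assuming the input was 2 is exactly the kind of guess that is right only some of the time.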

Problem Calculus: Complexity Projection

Early on in my investigation into cognitive complexity, I noticed that you couldn't necessarily separate the difficulty of understanding a concept from the other concepts that interact with it in the system that you are concerned with.  This was a bit surprising because before that I had done a bunch of work with type theory, lambda calculus, and functional programming.  In all of these fields being able to compose functions such that the internals are hidden is a major and recurring concept.

I didn't immediately realize that it should be part of my development of problem calculus, but I set it apart to be examined later.  However, eventually I realized that separating concepts into different modules, objects, or functions (to take a programming perspective) doesn't necessarily make a concept easier to deal with.  Sometimes the complexity of a concept leaks.  I noticed that my previous idea already stated this exact concept.  And thus Complexity Projection was born.


We've already encountered arrows in the previous problem calculus diagrams.  However, this arrow doesn't indicate a transform from one space to another.  This arrow indicates the complexity from one space leaking into another space.

Currently, my assertion is that complexity leaks in the following manner.


The destination space will leak complexity into the output space for any given arrow.  The output space will leak complexity into the input space of the related arrow.  The input space will leak complexity into the source space.  And finally the source space will leak complexity into any arrows that map it to other spaces.

Leaking complexity basically means that if one thing is difficult to deal with cognitively, then anything that the concept leaks complexity into will also be similarly difficult to deal with cognitively even if it is otherwise simple.  The term I've been using for this leakage is complexity projection.

For justification about the direction of the projection, consider the previous blog post about invalid elements.  If you have an output space with no invalid elements, but the destination space has many invalid elements, then the output space is going to be harder to handle because it will admit elements that do not work with the space it is a subspace of.  For a contrived example imagine the function "f(x) = x + 1".  The output of this function is any number.  Now imagine that the destination space is the set of all odd numbers.  Because the destination space only lets you have odd numbers, the function output space is harder to deal with because you have to make sure that you do not clash with the requirements of the destination space even though it is not a requirement for the output space.

Similarly, the output space projects complexity onto the input space because a set of valid inputs will need to walk on eggshells to avoid failure from the set of invalid outputs.  The same principle again arises when projecting complexity from the input space to the source space.  And finally any arrow that goes from one space to the other will have all the complexity from the entire system projected onto it.  Or in other words, arrows between two spaces are hard to use because the spaces themselves are hard to comprehend.
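
The contrived "f(x) = x + 1" example can be sketched in Python to show the requirement of the destination space working its way back to the inputs.  The helper names "in_destination" and "safe_inputs" are invented for this sketch.

```python
def f(x: int) -> int:
    """The arrow from the example: any integer in, any integer out."""
    return x + 1


def in_destination(n: int) -> bool:
    """The destination space from the example: odd numbers only."""
    return n % 2 != 0


def safe_inputs(candidates) -> list:
    """Inputs whose output lands on a valid element of the destination space."""
    return [x for x in candidates if in_destination(f(x))]


# f itself has no opinion about parity, but the caller now has to:
# only even inputs produce odd outputs.
usable = safe_inputs(range(6))   # [0, 2, 4]
```

Nothing about f changed, yet every caller has to carry the "odd numbers only" rule around with them when choosing an input.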

As an example of this principle consider compilers.  A compiler is an arrow that transfers an element from the space of strings (i.e. source code) to the space of binary executables.  The space of strings is actually really simple.  It's the set of all possible character combinations.  However, the input space is already highly constrained.  It has many invalid elements (e.g. invalid parsing tokens) and the space itself is highly semantically heterogeneous.  There are many different concepts that all get mashed together in the source code.  The input space projects its complexity onto the source space.  Similarly, the output space also has many invalid elements (e.g. ill typed programs).  This also projects complexity.  Finally, the whole system projects onto the arrow (i.e. the compiler).  So we shouldn't be surprised that compilers are hard for beginners to get used to even though the input (just a bunch of words and brackets) doesn't seem to be innately complicated.

Tuesday, January 29, 2019

Problem Calculus: Diagrams and Invalid Elements

Previously, we talked about Semantically Homogeneous and Semantically Heterogeneous spaces.  Let's use the following diagram to represent a semantically homogeneous space.



Because the space we're interested in can contain arbitrary dimensions, we're just going to use a single continuous line to represent a single semantic concept.  Similarly, we're going to represent semantically heterogeneous spaces as follows:


Multiple semantic concepts are represented by multiple lines.

In addition to degrees of semantically heterogeneous spaces causing difficulty in comprehending a problem, there are also issues when you encounter a space where there are invalid elements within it.

The original idea was to be concerned with topological holes in a space.  Similar to the issues with path connected spaces, topological holes may look like they have a good theoretical basis, but then you have to worry about constructing topological spaces.  Topological holes definitely need some more attention, but being concerned about invalid elements should be sufficient for now.

For example, if your space is intended to be even integers then any given odd integer would be an invalid item.  Consider a more complicated example.  Imagine a binary frame or packet used in a communication protocol.  Because binary data is binary data, in order to represent a list of items you are required to either provide a count or only be allowed a predetermined number of items.  An invalid binary packet would be a packet that had an invalid count field.
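
To make the packet example concrete, here is a minimal Python sketch of a count-prefixed packet.  The layout (one count byte followed by that many single-byte items) is an assumption made up for this illustration and not any particular protocol.

```python
def parse_packet(data: bytes) -> list:
    """Parse a hypothetical packet: 1 count byte, then `count` one-byte items.

    Raises ValueError for the invalid elements of the space, i.e. packets
    whose count field does not match the items that are actually present.
    """
    if len(data) < 1:
        raise ValueError("packet is missing its count field")
    count = data[0]
    items = data[1:]
    if len(items) != count:
        raise ValueError(
            f"count field says {count} items but {len(items)} are present"
        )
    return list(items)


def is_valid_packet(data: bytes) -> bool:
    """True if the bytes are a valid element of the packet space."""
    try:
        parse_packet(data)
    except ValueError:
        return False
    return True


good = is_valid_packet(bytes([3, 10, 20, 30]))   # True
bad = is_valid_packet(bytes([5, 10, 20, 30]))    # False: count says 5, only 3 items
```

Every possible byte string is an element of the space, but only the ones with a consistent count field are valid.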


Once you consider that a space might have invalid elements in it, then you can also be concerned about the complexity of the invalid elements themselves.


Ideally, your invalid elements will form a semantically homogeneous space.  Or it might form a heterogeneous space.  Or the invalid elements might be sporadic and random without pattern.  Similarly, any given invalid subset of elements within a sporadic collection might itself form a homogeneous space, a heterogeneous space, or a sporadic space. 

Monday, January 28, 2019

Problem Calculus: Semantically Homogeneous vs Semantically Heterogeneous

My previous Problem Calculus and Cognitive Complexity outlines assert that spaces should be path connected in order to avoid complexity.  I think that this concept can be related to Semantically Homogeneous and Semantically Heterogeneous.  The argument is that Semantically Homogeneous spaces approximate path connected spaces AND the quality that you need in a space to have an easily understood space is a semantically homogeneous space.

So what's the difference between a function that looks like "func(x : int, y : int, z : int ) -> int" and a function that looks like "func( p : point3d ) -> int"?  The problem calculus argument is that they are probably the same.  The source spaces for both functions are semantically homogeneous.
  
The source space involves three different dimensions that all "belong" together.  The space is one coherent component.

In order to create a semantically heterogeneous space, you can take the same function from before "func(x : int, y : int, z : int ) -> int", but instead of having the input parameters representing a point in 3D space you can have it represent some 2D point in a family of 2D spaces.

Here we have what appears to be the same thing, but the problem calculus diagram is radically different.  

Semantically homogeneous spaces are easy to comprehend because the dimensions that go into them only mean one thing.  Semantically heterogeneous spaces are hard to comprehend because the different dimensions can all indicate something potentially different.  Each different thing needs to be considered separately from the other things.  And this can become arbitrarily difficult as the space contains arbitrarily more concepts that need to be understood on a case by case basis.

Here's an example of a space that is composed of a color dimension and an intensity dimension.  The space is semantically heterogeneous because the meaning of the intensity dimension changes depending on what color is present.  The basic idea is the same, but the eventual output is going to be radically different.

Ideally, I think that path connected spaces probably provide a better theoretical basis.  However, having to worry about constructing topological spaces is probably more work than it's worth.  Additionally, there is going to be some difficulty in ensuring that the function used to show that the space is path connected is the "right" one.  There may be a space that is difficult to comprehend, but a sufficiently clever individual can figure out a function that shows that it's path connected.  Semantically homogeneous vs heterogeneous is hopefully easier to determine objectively.

Thursday, January 24, 2019

Problem Calculus: Some Terminology

We're going to be using Spaces and Arrows to form the problem calculus diagrams.  Let's look at some basic definitions.


Given an arrow there is a Source space and a Destination space that can be thought of as the type of the inputs and the type of the outputs of the arrow.  Similarly, the arrow might only accept a subset of the objects in the Source space and produce a subset of the objects in the Destination space.  The subsets are called the Input space and Output space, respectively.
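
As a sketch of these four terms, take an integer square root as the arrow.  The Source and Destination spaces are both "int" (the types), while the Input and Output spaces are the subsets the arrow actually accepts and produces.  The name "isqrt_arrow" and the sampled range are assumptions for this example.

```python
import math


def isqrt_arrow(n: int) -> int:
    """Arrow from int (Source space) to int (Destination space).

    It only accepts non-negative integers (its Input space).
    """
    if n < 0:
        raise ValueError("outside the Input space")
    return math.isqrt(n)


def accepts(arrow, element) -> bool:
    """True if the element of the Source space is also in the Input space."""
    try:
        arrow(element)
    except ValueError:
        return False
    return True


# A finite sample of the Source space.
source_sample = range(-3, 10)

# Input space: the part of the sample the arrow accepts (0 through 9).
input_space = [n for n in source_sample if accepts(isqrt_arrow, n)]

# Output space: the part of the Destination space the arrow actually produces.
output_space = {isqrt_arrow(n) for n in input_space}   # {0, 1, 2, 3}
```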

Wednesday, January 23, 2019

Problem Calculus: Motivation

Why a calculus for problems?  

With respect to programming, the software engineering industry doesn't really have very good metrics for what makes for good source code.  There is the concept of code smells (the name of which is already suspicious) where an individual "knows" that something is wrong, but is relying on "good taste" to justify the decision instead of being able to actually describe why things are problematic.  Best practices won't necessarily cross problem domain or programming language boundaries.  And they have the dubious property of being attached to successful people and projects as a justification.  Is the practice a predictor of success, or was it the people, or was it the problem?  There's always experience, which always works until of course it doesn't.  The underlying theme here is ultimately appeal to authority.  Problem calculus suggests a way to have objective arguments about what code is good and what code is bad.  The principles will also cross domain and language barriers.  

Let's move onto studies for programming languages and development methodologies.  The biggest factor in all of the studies I've seen is that they do not want to contaminate the study by using experts.  The argument goes that if you run a study with someone who is an expert then their familiarity will be the determining factor, but not whether or not the concept under study is good or bad.  So, the studies use novices and students.  This is of course problematic because any conclusion you reach can really only be known to apply to novices.  The hope of course is that knowing that some concept is better for a novice will also hold for an expert, but that's wishful thinking at best.  What we want is a way to run these studies with experts without having to worry about previous experience influencing the results.  Using problem calculus we can create novel problems that do not have their origin in anything familiar.  Because the problem is artificially created, we can control how difficult the problem is and we can make sure that the problem does not have its origin in a known concept.

Finally, when should you try to solve hard problems and when should you look for a way to simplify your situation?  Using problem calculus you can determine the nature of the difficulty that you are facing.  Some difficult issues can be simplified and some cannot.  There are trade-offs that you can make so that you can control the type of difficulty that you're facing.  When are you facing a hard problem and when are you facing an unfamiliar problem?  Verify the situation you're in.

Saturday, January 19, 2019

Problem Calculus: Introduction

The concept of problems is too general and open-ended to be measured.  A fence may represent a problem for a criminal but it's the solution for the homeowner.  But if the homeowner has a dog then the fence might also be the solution for the criminal to defend against the dog.
 

When something can be a problem and a solution depending on your perspective or even a problem and a solution regardless of perspective, then you've got to narrow down what you mean by 'problem' to make any headway into describing the difficulty of any given problem.

For problem calculus, we're going to focus on concepts that are difficult for people to form an intuition about.  Ideas that are hard to navigate in a mental landscape.  It will be interesting to see if the framework will be applicable for machine learning algorithms as well as people.

One thing to keep in mind is that we will be able to measure some problems and form intuitions both that they are difficult and why they are difficult, but we may not be able to determine which problem is more difficult. 

Different problems may be difficult for different reasons.  And for that matter, different solutions to the same problem may be difficult for different reasons.  In this instance the best we can attempt is to determine how to make a given problem more simple or more complex.  For example, we can't tell which is harder, brain surgery or rocket science, but we can tell that they are hard AND we can tell that if you had to play tag while also doing brain surgery it would be harder than doing brain surgery without playing tag.