Contemplating the comments on my last post, I began thinking about Ockham’s Razor versus Darwinian Evolution. Both of them can be used as heuristics or algorithms for creation, invention, and discovery. In 1964, Ray Solomonoff proposed A Formal Theory of Inductive Inference (Parts I and II). His theory is an Ockhamian algorithm for searching through the space of computer programs. It inspired Gold’s language identification in the limit and Kolmogorov complexity. On the other hand, genetic programming is an evolutionary algorithm for searching through the space of computer programs. It is interesting to compare these two approaches to searching in program space.
In Solomonoff’s 1964 approach, each program is treated as a black box. This is inefficient, because the knowledge that is gained from evaluating one program cannot be applied when evaluating another program. On the other hand, genetic programming combines pieces from programs to create new programs. The knowledge that is gained from past programs is used in the construction of new programs. Thus we should expect this Darwinian algorithm for searching in program space to perform better than an Ockhamian algorithm for searching in program space. In fact, genetic programming has some impressive successes. As far as I know, Ockhamian algorithms for searching in program space have not achieved anything this impressive. (Let me know if I’m wrong.)
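To make the contrast concrete, here is a toy sketch in Python. It is neither Solomonoff's actual method (which weights universal Turing machine programs by length) nor Koza's system. The tiny expression language, the function names, and all parameters are assumptions made for illustration. `ockham_search` tries expression trees shortest-first and builds each candidate from scratch, while `darwin_search` recombines pieces of the better candidates found so far.

```python
import random

LEAVES = ["x", 1, 2]
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(expr, x):
    """Evaluate an expression tree: a leaf, or a tuple (op, left, right)."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr

def size(expr):
    """Number of nodes; this plays the role of program length."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1

def error(expr, examples):
    return sum(abs(evaluate(expr, x) - y) for x, y in examples)

def expressions_of_size(n):
    """Yield every expression tree with exactly n nodes."""
    if n == 1:
        yield from LEAVES
        return
    for left_size in range(1, n - 1):
        for op in OPS:
            for left in expressions_of_size(left_size):
                for right in expressions_of_size(n - 1 - left_size):
                    yield (op, left, right)

def ockham_search(examples, max_size=7):
    """Shortest-first enumeration; every candidate is built from scratch."""
    for n in range(1, max_size + 1):
        for expr in expressions_of_size(n):
            if error(expr, examples) == 0:
                return expr
    return None

def random_expr(rng, depth=2):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    op = rng.choice(list(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def subtrees(expr):
    yield expr
    if isinstance(expr, tuple):
        yield from subtrees(expr[1])
        yield from subtrees(expr[2])

def crossover(a, b, rng):
    """Replace one randomly chosen subtree of a with a random subtree of b."""
    if not isinstance(a, tuple) or rng.random() < 0.3:
        return rng.choice(list(subtrees(b)))
    op, left, right = a
    if rng.random() < 0.5:
        return (op, crossover(left, b, rng), right)
    return (op, left, crossover(right, b, rng))

def darwin_search(examples, rng, pop_size=60, generations=40, max_size=15):
    """Keep the better half each generation and recombine it (no mutation)."""
    population = [random_expr(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda e: (error(e, examples), size(e)))
        if error(population[0], examples) == 0:
            break
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if size(child) <= max_size:  # crude guard against bloat
                children.append(child)
        population = parents + children
    return min(population, key=lambda e: error(e, examples))

examples = [(x, x * x + 1) for x in range(-3, 4)]
print(ockham_search(examples))
print(darwin_search(examples, random.Random(0)))
```

On a target this small the enumeration is hard to beat. The comparison only becomes interesting once the shortest fitting program is too long to reach by exhaustive search.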
I looked in the index of Peter Grünwald’s The Minimum Description Length Principle and in the index of Marcus Hutter’s Universal Artificial Intelligence, and I found Solomonoff and Kolmogorov, of course, but there was no mention of genetic programming or John Koza. I looked in the index of Koza’s Genetic Programming, and there was no mention of Solomonoff or Kolmogorov, but Chaitin is mentioned. Most interestingly, Ray Solomonoff has a 2006 paper, Machine Learning - Past and Future, in which he talks about his research in Genetic Programming:
Genetic Programming is my second area of recent research. Koza’s Genetic Programming system has been able to solve many difficult problems of very varied kinds.
Solomonoff was born in 1926. There’s a lesson for us young people.
Isn’t a more rational statement of the “Ockhamian” hypothesis:
Given a training bit string and a test bit string, and two genetic algorithms evolved to fit the training bit string, the performance of the shorter of the two programs will tend to be better on the test bit string.
?
Given a training bit string and a test bit string, and two genetic algorithms evolved to fit the training bit string, the performance of the shorter of the two programs will tend to be better on the test bit string.
This is a reasonable hypothesis. It is worth performing some experiments to test it. I predict that there are some tasks (i.e., problems, applications) that will support the hypothesis and others that will not. This is consistent with my argument that simplicity is merely a particular kind of inductive bias. Like any inductive bias, it works when it corresponds with the world, but it doesn’t always correspond.
Isn’t a more rational statement of the “Ockhamian” hypothesis …
More rational than what? I am calling the approach of Solomonoff (1964), Grünwald, and Hutter “Ockhamian” and the approach of Koza “Darwinian”. I am pointing out that Koza’s approach has been more successful. What statement are you calling less rational than your hypothesis?
Peter,
evolutionary algorithms are cool, and I agree with your response to me on the previous post that for creativity, the evolutionary approach seems to be more able to account for innovation than pure Occam’s razor.
Evolution does its work by variation and selection - so you have to have random mutations (to introduce new “information”, not previously contained in the population) - and then the environment weeds out individuals which deteriorated in performance compared to previous iterations (the environment defines fitness, that is the important part for my argument below).
So let us posit for a moment that this is what happens in the brain: neurons forming new synapses, rewiring etc (http://en.wikipedia.org/wiki/Neural_plasticity); this happens mostly (only?) in the subconscious; and would correspond to our variation step above.
The ideas begin to trickle upward into consciousness (that is, more and more parts of the brain are picking up the pattern, or it is appropriately assembled in the brain part picked out by your favorite theory of consciousness (http://en.wikipedia.org/wiki/Consciousness#Cognitive_neuroscience)).
Incidentally, mine (at the moment :-)) is this one by Tononi:
An information integration theory of consciousness
http://www.biomedcentral.com/1471-2202/5/42
But the problem is this: there are so many possible theories, neuron connections, symbol aggregations, what have you, that it is surprising that we come up with good theories at all - so there must be some kind of razor/bias at work in the brain - a bias selected for by evolution.
This leads to the question why it was selected - because it conformed to the environment! So you can get a justification for using Occam’s razor (or it being also used by your subconscious brain) by appealing to evolution and then inferring from the results to the environment.
For the metaphysical justification of Occam’s Razor, see for instance:
Russell Standish’s “Why Occam’s Razor?”
http://arxiv.org/abs/physics/0001020
So, true creativity seems to need to combine random combinations with rapid pruning by Occam’s razor.
it is surprising that we come up with good theories at all - so there must be some kind of razor/bias at work in the brain - a bias selected for by evolution.
I agree the brain must use some kind of inductive bias, and that evolution has selected that inductive bias. The question is, what exactly is the inductive bias that is embedded in our brains? I don’t think anybody has a good answer to this question, yet.
You believe that the inductive bias in our brains is Ockham’s razor. It seems that the basis for this belief is introspection, but introspection is known to be misleading and unreliable. Furthermore, the various formalizations of Ockham’s razor don’t always agree on what is simple and what is complex.
What science requires is experimental comparison of a wide variety of precisely defined inductive biases on a wide variety of tasks. We cannot use pure argumentation to show that Ockham’s razor is the best inductive bias. We need to do experiments. Geoff Webb has done some experiments, and the results don’t look good for Ockham.
Like any inductive bias, it works when it corresponds with the world, but it doesn’t always correspond.
So the problem for our experimental design is how to fairly sample “the world”.
For example, it is fairly easy to cheat Ockham here: generate a small set of noisy data points from a simple function like x^2 for the training set, then find a function that fits the noisy training data without error, and use that more complex function to generate the test data.
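The rigging described above can be sketched in a few lines of Python. The noise level, the sample points, and the choice of a Lagrange interpolant as the "more complex function" are assumptions made for illustration.

```python
import random

def lagrange(points):
    """Return the unique polynomial of degree < len(points) through all points."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

def mean_squared_error(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train = [(x, x ** 2 + rng.gauss(0.0, 0.5)) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

def simple(x):
    """The simple function that really generated the training data."""
    return x ** 2

complex_fit = lagrange(train)  # degree-4 polynomial through every noisy point

test_x = [-1.5, -0.5, 0.5, 1.5]
rigged_test = [(x, complex_fit(x)) for x in test_x]  # the cheat
honest_test = [(x, simple(x)) for x in test_x]       # what a fair sample looks like

for name, data in (("rigged", rigged_test), ("honest", honest_test)):
    print(name,
          "simple:", mean_squared_error(simple, data),
          "complex:", mean_squared_error(complex_fit, data))
```

Whichever function generates the test labels wins by construction, so the outcome says nothing about simplicity. It only shows why the sampling question matters.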
More rational than what?
By “rational” I mean it in the literal sense of taking a “ratio” with proper dimensionality — of zero in the case of probabilities such as number of apples that fall without shaking the tree to total number of apples as opposed to apples that fall from the tree to nouns in the English language.
The Ockhamian approach can be used without regard to the method of coming up with candidate programs, so to be really fair you have to use the same method, whatever it is, to find the two programs to be compared for length under the training data. Length under the training data, of course, includes encoding the errors in a way that is sensitive to the measurement error.
So the problem for our experimental design is how to fairly sample “the world”.
This is the right question to ask. The UCI Data Repository is one answer; maybe not the best, but it’s a start.
The Ockhamian approach can be used without regard to the method of coming up with candidate programs
This is good: You’re backing away from the claim that an Ockhamian approach is suitable for both creating (i.e., discovering, inventing) and evaluating, to the claim that it is only suitable for evaluating.
The UCI Data Repository is one answer
What is the prior research using the UCI Data Repository as a fair sample of “the world” to test hypotheses of machine learning?
Since virtually all machine learning systems divide datasets up into training and test parts, those previously established results can be leveraged in a test of Ockham’s Razor by finding pairs of programs that are equally fit on the training part, encoding their errors to yield lossless compression of the training sets and then seeing which of those compressed training sets are shortest. Ockham’s Razor predicts that the program with the shortest compressed representation of the training data will, in general, be the best fit of the lot on the test data.
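One rough way to sketch the proposed comparison in Python is a two-part code: the length of the program plus the length of its encoded errors on the training data. Everything specific here is an assumption made for illustration. Program length is the byte count of a Python expression, residuals are integers, and they are encoded with an Elias gamma code after a zigzag mapping. The test-set step is left out.

```python
def elias_gamma_bits(n):
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * n.bit_length() - 1

def zigzag(r):
    """Map signed residuals 0, -1, 1, -2, ... to positive integers 1, 2, 3, 4, ..."""
    return 2 * r + 1 if r >= 0 else -2 * r

def description_length(source, data):
    """Bits for the program text plus bits for its residuals on the data."""
    # eval is tolerable here only because the candidate sources are hard-coded.
    predict = eval("lambda x: " + source)
    model_bits = 8 * len(source.encode("utf-8"))
    residual_bits = sum(elias_gamma_bits(zigzag(y - predict(x))) for x, y in data)
    return model_bits + residual_bits

def rank_by_description_length(sources, data):
    """Shortest total description first."""
    return sorted(sources, key=lambda s: description_length(s, data))

# Hypothetical training set: roughly 3*x + 2, with one point off by 1.
train = [(0, 2), (1, 5), (2, 9), (3, 11)]
candidates = ["3*x+2", "(4+3*x+4*x*x-x*x*x)//2"]  # the second fits every point exactly

for source in rank_by_description_length(candidates, train):
    print(source, description_length(source, train))
```

The remaining step is to score the same candidates on held-out data and check whether the shortest description is also the best predictor there.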
You’re backing away from the claim that an Ockhamian approach is suitable for both creating (i.e., discovering, inventing) and evaluating, to the claim that it is only suitable for evaluating.
Well, actually, I don’t recall claiming that it is suitable for creating as opposed to evaluating. But in the case of evolution, since evaluation is part of creation, the difference appears to be less than clear cut.
I can’t, of course, speak for others, such as Grünwald or Hutter, since I am incompetent to understand, let alone critique or defend, their “top down” approach to Universal AI.
Very interesting weblog. I’m interested both in machines through genetic algorithms and in the reason why reality appears to adopt the simplest mathematical laws among the ones compatible with the facts.
My conjecture links both aspects: By the anthropic principle and the multiverse hypothesis, it appears that the universe in which we live is the most simple possible, because for biological organisms to “learn” instinctively through variation and selection (and, thus, to learn the world through genetic algorithms), it is a requisite that the fitness landscape, the world, must obey simple, linear, and smooth laws at the macroscopic scale most of the time for most of the environments.
A more complex universe may need much more time for life to evolve, and this time could be more than the life span of the entire universe.
Chaotic and nonlinear phenomena must be marginal effects of underlying microscopic linear laws that describe the rest of the world (for example, a local turbulence of water obeys the same simple hydrodynamic laws as laminar flow).
May this be the deep link between Occam’s razor and genetic algorithms?
it is a requisite that the fitness landscape, the world, must obey simple, linear, and smooth laws at the macroscopic scale
You are essentially repeating what Günther Greindl said above. Let me repeat my answer: I agree the brain must use some kind of inductive bias, and that evolution has selected that inductive bias. The question is, what exactly is the inductive bias that is embedded in our brains? I don’t think anybody has a good answer to this question, yet.
You say the answer is “simple, linear, and smooth laws”. There are three hypotheses here that require empirical testing. As I mentioned, Geoff Webb has done some experiments, and the results don’t look good for Ockham (i.e., simplicity as an inductive bias).
Reading briefly over Webb’s paper, it appears he is not defining Ockham’s Razor in terms of approximating Kolmogorov complexity.
A lot of semantic and epistemological issues evaporate if you simply adopt Kolmogorov complexity as a framework.
Do you have any empirical evidence against Ockham’s Razor, where it is defined as approximating Kolmogorov complexity?
A lot of semantic and epistemological issues evaporate if you simply adopt Kolmogorov complexity as a framework.
I disagree. For example, there is the problem that Kolmogorov complexity is not computable.
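In practice, “approximating” Kolmogorov complexity usually means computing an upper bound with an ordinary compressor. A minimal Python sketch follows, with zlib as the stand-in compressor. That choice, and the two sample strings, are assumptions made for illustration. The bound can be far above the true value, and no computable procedure can tell how far.

```python
import random
import zlib

def compressed_bits(data: bytes) -> int:
    """Upper bound (up to an additive constant) on the complexity of data, in bits."""
    return 8 * len(zlib.compress(data, 9))

rng = random.Random(0)
regular = b"01" * 500                                   # 1000 bytes, obvious pattern
noisy = bytes(rng.getrandbits(8) for _ in range(1000))  # 1000 pseudo-random bytes

print("regular:", compressed_bits(regular))
print("noisy:  ", compressed_bits(noisy))
```

The pseudo-random bytes come from a short seeded program, so their true complexity is small even though zlib cannot compress them. That gap is the practical face of the computability problem.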
Do you have any empirical evidence against Ockham’s Razor where it is defined as approximating Kolmogorov complexity?
Genetic programming has been more successful as a research paradigm than Kolmogorov complexity. This is empirical evidence against Kolmogorov complexity as a research paradigm. Solomonoff himself has admitted as much; he seems to feel that the best hope for the Kolmogorov (Solomonoff) paradigm is to merge with the genetic programming paradigm.
A better question is, do you have any empirical evidence in support of Kolmogorov complexity? If not, then why do you believe something without evidence?
There are two main factors in Kolmogorov complexity as a research paradigm: (1) the idea of searching in the space of computer programs (i.e., universal Turing machine programs) and (2) the idea of preferring shorter programs. The first idea is also part of genetic programming. I don’t have any problem with the first idea. The problem I have with the second idea is that there is no evidence to support it. Why are you arguing so strongly for Kolmogorov complexity? I think you are attracted to the power of the first idea. But you can keep the first idea and drop the second idea, or at least remain agnostic about the second idea, until it is supported by empirical evidence.
Peter,
I don’t think that I say anything similar to what Günther Greindl says. I say that, considering
- the weak anthropic principle
- the multiverse hypothesis, where each universe can have its own mathematical formulation (see Max Tegmark, Which mathematical structure is isomorphic to our Universe?)
- the fact that Darwinian evolution needs a relatively simple, continuous, smooth fitness landscape, not plagued with random peaks
it follows that we must live in a certain universe where macroscopic phenomena must obey smooth, continuous, and parsimonious laws for the fitness landscape to be that way; that is, to permit life.
It is our universe that has been selected for life to exist inside it, precisely because it is simple, so we succeed when we try to explain real phenomena through the smoothest, simplest, most continuous, and most parsimonious laws (though this does not always work, I agree with you). Our universe has the bias. It is not by chance.
Our universe has the bias.
I agree that some inductive biases work better in our universe than others, and that this is due to the structure of our universe. Where I disagree with you and Günther is concerning exactly what inductive biases work best. I believe that we have not yet done the required experiments that would support any well-founded claims about the inductive biases that work best in our universe. You claim the biases are smoothness, continuity, simplicity, and parsimony. These are certainly popular biases, but what evidence do we really have for them? It seems to me that almost all of the arguments in favour of these biases rely on appeal to intuition and introspection. Where are the experiments? Where is the science?
Furthermore, these terms, smoothness, continuity, simplicity, and parsimony, are vague. There are formal definitions for all of these terms. Now we need to show that the formal definitions really work, on a wide variety of real-world problems. Instead, researchers tend to justify the formal definitions by claiming that they capture our intuitions. This is not science.
You are trying to support your favourite inductive biases (smoothness, continuity, simplicity, and parsimony) by pure argument, without experimentation. This is not science. If you want to convince me, stop arguing and start experimenting.
By the way, I would call parsimony and simplicity synonymous, but smoothness and continuity are distinct from each other and distinct from simplicity. So you are advocating three distinct inductive biases (smoothness, continuity, and simplicity), yet you talk about them as if they were all the same. They are not.
Here is a real-world example of a situation in which the continuity bias leads to false predictions. Suppose that shares in the XYZ company are selling for $100 each at 1:00 PM. Then there is a news report about fraud and scandal in the XYZ company, and the share prices drop to $5 each at 2:00 PM. It is wrong to assume that there is a time between 1:00 PM and 2:00 PM at which the share price is $50. Are you going to tell me that the share price must pass through $50 at some time between 1:00 PM and 2:00 PM, because of the anthropic principle?
Yes, I am too fuzzy because this line of thinking has just started. It is in a state of speculation. Sorry.
About your counter-example, I have to say that discontinuous phenomena do not imply discontinuous causal laws. For example, even the most turbulent liquid flow is produced by simple linear hydrodynamic laws. This also applies to the weather.
In your example, linear, continuous laws, applied to complex decision-making systems such as brains and computers, produce a final result that is sometimes discontinuous or chaotic, but this says nothing about the basic laws that govern brains and computers.
Another example: an asteroid on a direct path to the Earth follows linear and continuous physical laws of inertial movement, gravitation, conservation of energy, etc. But the macroscopic effects are destructive, abrupt, discontinuous, and chaotic.
But the other way around is not true: chaotic or discontinuous laws do not produce continuous or non-chaotic phenomena.
- complicated laws produce complicated phenomena
- in some cases, underlying linear laws produce linear environments (with smooth fitness landscapes) where life can evolve.
These are the strong points in my argumentation.
Anyway, I take your advice.
About your counter-example, I have to say that discontinuous phenomena do not imply discontinuous causal laws.
It appears that the fundamental laws of quantum mechanics are discontinuous, according to our current best physical theories. Space is divided into Planck lengths and time is divided into chronons. Therefore stock prices are likely discontinuous at both the macroscopic and microscopic scales.
Fractals seem to be good at modeling many phenomena. A fractal may have a regular, predictable structure, yet be nonlinear, non-smooth, and discontinuous.
Related:
(1) Simplicity (Stanford Encyclopedia of Philosophy)
(2) David Dowe’s Occam’s razor links