The Theoretical Foundation of Statistical Semantics

George Furnas defines Statistical Semantics as the study of “how the statistical patterns of human word usage can be used to figure out what people mean, at least to a level sufficient for information access”. Experiments show that statistical semantics works, and it has many applications, but we may wonder why it works. How is it possible to figure out what words mean simply by looking at patterns of words in huge collections of text? What is the theoretical foundation of statistical semantics?

I don’t have a complete and satisfying answer to this question, and we might not have an answer until we achieve true artificial intelligence. It’s tempting to say that the success of neural networks, machine learning, and statistical natural language processing is evidence that the human brain is fundamentally based on statistical algorithms, and therefore that it is not surprising that semantics should be based on statistics. However, this answer only avoids the question by expanding its scope: why do neural networks work?

In artificial intelligence, there is a debate between connectionism and computationalism. We may say that connectionism corresponds to statistical approaches to artificial intelligence and computationalism corresponds to logical approaches. It seems that statistical semantics must fall in the connectionist camp. There is a long tradition of looking at semantics from a logical perspective, and this leads to the question of how semantics can survive in the connectionist camp without its traditional foundation in logic.

Peter Gärdenfors suggests a way out of this dilemma that I find appealing. In Conceptual Spaces: The Geometry of Thought, he presents a geometrical approach to semantics, as a kind of bridge between connectionist and symbolic approaches. I think this geometrical approach may provide a good foundation for statistical semantics. A geometrical approach is also offered in Geometry and Meaning and The Geometry of Information Retrieval.

Linear algebra, which is the basis for Latent Semantic Analysis, brings together all three of these themes: statistics, geometry, and logic. Geometry and Meaning shows how we can do Boolean logic with linear algebra operations. LSA is statistical in the sense that it can be used to find statistical patterns in data; it is geometrical in the sense that it is based on vectors, lines, and spaces; and it is logical in the sense that it supports Boolean operations. There are many gaps that need to be filled in here, but I can dimly see a theoretical foundation for statistical semantics.
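To make the statistical and geometrical sides of this concrete, here is a minimal sketch of the core LSA step in Python with NumPy: a truncated singular value decomposition (SVD) of a term-document count matrix, followed by cosine similarity between term vectors. The toy matrix, the terms, and the helper names (`lsa_term_vectors`, `cosine`) are hypothetical illustrations, not taken from any particular LSA implementation. Real systems usually apply a weighting scheme (such as log-entropy or tf-idf) to the counts before the SVD; that step is omitted here.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents.
# d0 = "car engine", d1 = "automobile engine", d2 = "flower petal", d3 = "flower"
terms = ["car", "automobile", "engine", "flower", "petal"]
counts = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)


def lsa_term_vectors(matrix, k):
    """Project terms into a k-dimensional latent space via truncated SVD.

    Each row of the result is a row of U_k scaled by the top-k singular
    values, i.e. the term's coordinates in the reduced space.
    """
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine of the angle between two vectors (geometric similarity)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vecs = lsa_term_vectors(counts, k=2)
idx = {t: i for i, t in enumerate(terms)}

# In the raw counts, "car" and "automobile" never share a document, so the cosine is 0.
raw = cosine(counts[idx["car"]], counts[idx["automobile"]])
# In the rank-2 space they become (nearly) parallel, because both co-occur with "engine".
latent = cosine(vecs[idx["car"]], vecs[idx["automobile"]])
# Terms from unrelated documents stay (nearly) orthogonal.
unrelated = cosine(vecs[idx["car"]], vecs[idx["flower"]])

print(f"raw car/automobile: {raw:.3f}, latent: {latent:.3f}, car/flower: {unrelated:.3f}")
```

With k=2 the reduced space keeps roughly one dimension per topic, which is what lets “car” and “automobile” come out as similar even though they never appear together; in practice the choice of k is made empirically.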

4 Responses

  1. “we might not have an answer until we achieve true artificial intelligence”

    It depends on what you mean by “having true AI”. If you achieve this goal by building a machine that can simulate the brain very well, how would this tell you anything about semantics?

    Apparently, we can almost clone human beings right now. Assuming a human being is a Turing machine, we have, in effect, (almost) already achieved true AI. What did this buy us in terms of understanding what intelligence is?

    OK, so a human being is not a digital computer. But conceptually, if we accept that a human being is a Turing machine and that a digital computer is a Turing machine, then what is the difference, really?

    Soooo… maybe you shouldn’t be so pessimistic. Maybe we can make progress in understanding what intelligence is without caring so much about achieving true AI.

  2. The geometric approach to articulation in general is interesting, since the very concept of articulation is used in both a linguistic and a spatial sense. In this respect, I wonder whether formal theorem proving within formal languages has been pursued within a geometric paradigm. It seems that the existence of many spatial paths between points corresponds to the many ways in which expressions may articulate the same identity within a formal system.

  3. Briefly, if you have two vectors, x and y, the Boolean operation x OR y is represented by the subspace spanned by x and y. A limited kind of negation, x NOT y, is represented by the projection of x onto the subspace that is orthogonal to y. For more information:

    Dominic Widdows and Stanley Peters. Word Vectors and Quantum Logic: Experiments with negation and disjunction. Eighth Mathematics of Language Conference, Bloomington, Indiana, June 20-22, 2003, pages 141-154.
    http://infomap.stanford.edu/papers/quantum-senses.pdf

    This paper discusses an approach to conjunction (AND), but that approach has not yet been implemented. A limited kind of conjunction can be built from disjunction (OR) and the limited negation.

    More papers by Widdows are here:

    http://www.maya.com/local/widdows/

    Geometry and Meaning is an excellent introduction to this material.

  4. By the way, my PhD supervisor, Alasdair Urquhart, discovered a connection between relevance logic and projective geometry:

    Alasdair Urquhart, “Relevant Implication and Projective Geometry.” Logique et Analyse (September 1983), 26 [103-104]:345-357.

    Abstract: “This is an informal exposition of recently discovered connections between the areas of the title. I show how to construct a model for the logic of relevant implication from a geometry. This construction leads to an undecidability proof for the propositional logics R and E, and many other relevant logics.”

    http://en.wikipedia.org/wiki/Relevance_logic
    http://en.wikipedia.org/wiki/Projective_geometry
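The two operations described in the third response above, disjunction as the subspace spanned by x and y and negation as the projection of x onto the subspace orthogonal to y, can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions: the toy vectors and the function names (`vector_not`, `vector_or`, `similarity_to_subspace`) are hypothetical, Gram-Schmidt is just one convenient way to represent a spanned subspace, and measuring a vector's similarity to a subspace by the length of its projection relative to its own length is an assumed choice for this sketch, not necessarily the exact formulation in the Widdows and Peters paper.

```python
import numpy as np


def vector_not(x, y):
    """x NOT y: project x onto the subspace orthogonal to y."""
    y_unit = y / np.linalg.norm(y)
    return x - (x @ y_unit) * y_unit


def vector_or(*vectors, tol=1e-12):
    """x OR y: orthonormal basis (Gram-Schmidt) for the subspace the vectors span."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float)
        for b in basis:
            w = w - (w @ b) * b  # remove the component already covered by the basis
        norm = np.linalg.norm(w)
        if norm > tol:  # skip vectors that are linearly dependent on earlier ones
            basis.append(w / norm)
    return basis


def similarity_to_subspace(z, basis):
    """Assumed measure: length of z's projection onto the subspace, divided by |z|."""
    z = np.asarray(z, dtype=float)
    proj = np.zeros_like(z)
    for b in basis:
        proj += (z @ b) * b
    return float(np.linalg.norm(proj) / np.linalg.norm(z))


# Hypothetical 3-dimensional word vectors.
x = np.array([1.0, 1.0, 0.0])
y = np.array([0.0, 1.0, 0.0])

x_not_y = vector_not(x, y)  # keeps only the part of x orthogonal to y
print(x_not_y, x_not_y @ y)

x_or_y = vector_or(x, y)  # basis for the plane spanned by x and y
print(similarity_to_subspace([0.0, 0.0, 1.0], x_or_y))  # a vector outside the plane
print(similarity_to_subspace([2.0, 3.0, 0.0], x_or_y))  # a vector inside the plane
```

The negation removes exactly one direction: the result has zero cosine with y but keeps every component of x that is orthogonal to y, which is one way to read the “limited” in “limited kind of negation”.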
