Unified Latent Analysis

In Latent Semantic Analysis, we use a large collection of text to build a matrix, in which the rows represent words and the columns represent chunks of text. A chunk can be a sentence, a paragraph, a document, or any sequence of words. The value in a cell in the matrix is based on the frequency of the corresponding word (row) in the corresponding chunk (column). The matrix is called a term-document (word-chunk) matrix. With this matrix, we can calculate the attributional similarity between a pair of words (two rows), a pair of chunks (two columns), or a word and a chunk (a row and a column). LSA has been successfully applied to information retrieval, synonym recognition, essay scoring, and clustering.
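To make the word-chunk matrix concrete, here is a minimal sketch in plain Python. The toy corpus, the raw-count weighting, and the helper names (`build_word_chunk_matrix`, `cosine`, `column`) are assumptions made for illustration. It also leaves out the singular value decomposition that LSA normally applies to the matrix, so it only shows how rows and columns are compared, not the latent part.

```python
import math
from collections import Counter

# Toy corpus (hypothetical); each string is one chunk, here a sentence.
chunks = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "stocks fell as markets closed",
]

def build_word_chunk_matrix(chunks):
    """Return (vocab, matrix), where matrix[i][j] counts word i in chunk j."""
    counts = [Counter(chunk.split()) for chunk in chunks]
    vocab = sorted({word for c in counts for word in c})
    matrix = [[c[word] for c in counts] for word in vocab]
    return vocab, matrix

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocab, matrix = build_word_chunk_matrix(chunks)
row = {word: matrix[i] for i, word in enumerate(vocab)}  # word -> row vector

def column(j):
    """Column vector for chunk j."""
    return [r[j] for r in matrix]

word_sim = cosine(row["dog"], row["cat"])   # two rows: a pair of words
chunk_sim = cosine(column(0), column(1))    # two columns: a pair of chunks
print(f"dog~cat: {word_sim:.3f}, chunk0~chunk1: {chunk_sim:.3f}")
```

Comparing a word with a chunk (a row with a column) is not shown, because in the raw matrix the two vectors have different lengths; that comparison only becomes possible after rows and columns are projected into a shared reduced space.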

In Latent Relational Analysis, we use a large collection of text to build a matrix, in which the rows represent pairs of words and the columns represent patterns of text. A pattern is a short sequence of words and wild cards, where a wild card can match any word. The value in a cell in the matrix is based on the frequency of strings of words in the text collection that match the corresponding pair of words (row) and the corresponding pattern (column). The matrix is called a pair-pattern matrix. With this matrix, we can calculate the relational similarity between two pairs of words (two rows). LRA has been successfully applied to answering multiple-choice word analogy questions from the SAT college entrance test and to classifying the semantic relations between nouns and their modifiers.
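The pair-pattern matrix can be sketched in the same spirit. Everything specific here is an assumption for illustration: the toy corpus, the `MAX_GAP` limit, the choice to generate patterns by replacing any subset of the intervening words with the wild card `*`, and the helper names. It uses raw counts with no smoothing or weighting, so it is a simplified picture rather than the full LRA procedure.

```python
import math
from collections import Counter
from itertools import product

# Toy corpus and word pairs (hypothetical).
corpus = [
    "the dog made a loud bark",
    "the cat made a soft meow",
    "the mason cut the stone",
    "the carpenter cut the wood",
]
pairs = [("dog", "bark"), ("cat", "meow"), ("mason", "stone"), ("carpenter", "wood")]

MAX_GAP = 3  # assumed limit on the number of words between the pair

def patterns_for(pair, sentence):
    """Yield every pattern 'X ... Y' that this sentence supports for the pair."""
    x, y = pair
    words = sentence.split()
    if x not in words or y not in words:
        return
    i, j = words.index(x), words.index(y)
    gap = words[i + 1:j]
    if i >= j or len(gap) > MAX_GAP:
        return
    # Each intervening word is either kept or replaced by the wild card "*".
    for mask in product([False, True], repeat=len(gap)):
        middle = ["*" if wild else w for w, wild in zip(gap, mask)]
        yield " ".join(["X", *middle, "Y"])

def build_pair_pattern_matrix(pairs, corpus):
    """Return {pair: Counter(pattern -> frequency)}, a sparse pair-pattern matrix."""
    rows = {pair: Counter() for pair in pairs}
    for pair in pairs:
        for sentence in corpus:
            rows[pair].update(patterns_for(pair, sentence))
    return rows

def cosine(u, v):
    """Cosine between two sparse rows stored as Counters (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(a * a for a in u.values())) * math.sqrt(sum(b * b for b in v.values()))
    return dot / norm if norm else 0.0

rows = build_pair_pattern_matrix(pairs, corpus)
sim_sound = cosine(rows[("dog", "bark")], rows[("cat", "meow")])
sim_craft = cosine(rows[("mason", "stone")], rows[("carpenter", "wood")])
sim_cross = cosine(rows[("dog", "bark")], rows[("mason", "stone")])
print(sim_sound, sim_craft, sim_cross)
```

In this toy setup, dog:bark and cat:meow share only the patterns in which the differing adjective is replaced by a wild card, while dog:bark and mason:stone share no patterns at all. That is the sense in which two rows of the matrix measure relational similarity.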

LSA and LRA have much in common and it is natural to wonder whether they can be unified. I’ve been trying to think of an elegant scheme that combines them, allowing us to calculate both attributional and relational similarity in the same framework. Patterns can subsume words, chunks, and pairs, so one possibility is a pattern-pattern matrix, but there are so many possible patterns that this matrix could easily exceed the capability of today’s computers, unless the patterns are constrained in some way. I haven’t yet figured out a clean way to constrain the patterns.

More exotic ideas for unification involve some kind of multi-resolution data structure, with chunks arranged in layers corresponding to their sizes. A related idea comes from Gentner’s paper, “Why We’re So Smart”. Gentner points out that many words that at first seem to refer to objects turn out, on closer examination, to refer to relations. For example, the word “weapon” seems to refer to an object, but whether an object is a weapon depends on the intention of an agent towards the object. A stone can be simply a stone, or it can be a weapon if an agent intends to use it as such. A gun can be a weapon, or it can be sporting equipment in the hands of a sport shooter. By mapping relations between pairs of words (e.g., instrument and aggressor) to single words (e.g., weapon), it may be possible to unify attributional similarity (similarity between single words) and relational similarity (similarity between word pairs).

4 Responses to “Unified Latent Analysis”

  1. Great post (for the few who are into these things).

    I am not sure that I understand the rationale behind Relational Analysis. Why is Semantic Analysis not enough? Why can’t the relations between two words be inferred from the LSA matrix?

  2. “Great post (for the few who are into these things).”

    Thanks. You must be near the tip of the long tail:

    http://en.wikipedia.org/wiki/Long_tail

    “I am not sure that I understand the rationale behind Relational Analysis. Why is Semantic Analysis not enough? Why can’t the relations between two words be inferred from the LSA matrix?”

    If you want to understand the relation between a pair of words, such as dog:bark, then you need to look at sentences that contain both of these words together (“My dog made a loud bark”). It is not enough to look at sentences that contain only dog (“I took my dog for a walk”) and only bark (“I heard a bark”). LRA does the former and LSA does the latter.

    For more information, please read this paper:

    http://arxiv.org/abs/cs.CL/0608100

    If you still think LSA is enough, I would be happy to give you a copy of the SAT analogy questions, so that you can prove me wrong.

  3. Thanks. I see the difference. Of course, there is some weak dependency between the methods, if there are enough sentences that contain both dog and bark.

  4. The post made me think of a hierarchical model where first word-words are analyzed, then pair-patterns, then higher-order relations. One could define an optimization model that handles all of these levels at once, but it would probably be more efficient to handle each layer independently.
