Posted on November 1, 2008 by Peter Turney
There is a steady trickle of visitors to my post on Why Does SVD Improve Similarity Measurement?, so I gave this question a bit more thought. In that post, I offered three hypotheses about why SVD helps — high-order co-occurrence, latent meaning, and noise reduction — and I said that I didn’t know which hypothesis [...]
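Of the three hypotheses, noise reduction is the easiest to illustrate on its own. The sketch below uses synthetic data only. It assumes a "true" rank-2 matrix, adds Gaussian noise, and checks whether a rank-2 truncated SVD lands closer to the true matrix than the noisy observation does. The matrix size, rank, noise level, and the helper name `truncated_svd_approx` are arbitrary illustrative choices. A result on synthetic data does not say which hypothesis explains the gains on real text.

```python
import numpy as np

def truncated_svd_approx(matrix, k):
    """Best rank-k approximation of `matrix` (in the Frobenius norm) via SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

rng = np.random.default_rng(0)
# Hypothetical "true" signal: a rank-2 matrix with 50 rows and 40 columns.
signal = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 40))
# Observed data = signal + small independent Gaussian noise.
noisy = signal + 0.1 * rng.normal(size=signal.shape)

# Keep only the two largest singular values of the noisy matrix.
smoothed = truncated_svd_approx(noisy, k=2)

error_noisy = np.linalg.norm(noisy - signal)
error_smoothed = np.linalg.norm(smoothed - signal)
print(f"distance from signal before SVD: {error_noisy:.3f}")
print(f"distance from signal after rank-2 SVD: {error_smoothed:.3f}")
```

Truncation discards the directions that carry mostly noise, so the smoothed matrix should sit noticeably closer to the signal than the raw observation.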
Filed under: Computational Linguistics, Semantics | Tagged: data analysis, SVD, text analysis
Posted on September 18, 2007 by Peter Turney
Recently I’ve been experimenting with algorithms for the Singular Value Decomposition and the Tucker Decomposition, with the goal of processing large matrices (more than 10⁵ rows and columns) and large tensors (more than 10⁴ rows, columns, and tubes) that are relatively sparse (about 10% density). The problem with matrices and tensors of this size is [...]
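One standard way to compute a Tucker decomposition is the truncated higher-order SVD (HOSVD): take the leading left singular vectors of each mode unfolding as the factor matrices, then project the tensor onto them to obtain the core. The sketch below is a small dense NumPy illustration, not necessarily the algorithm the post experimented with, and it makes no attempt to handle matrices or tensors at the scale or sparsity described above. The helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    # tensordot contracts matrix axis 1 with tensor axis `mode`; the new axis comes first.
    result = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor, ranks):
    """Truncated HOSVD: one way (not the only one) to fit a Tucker model."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding give that mode's factor.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)  # project onto each factor
    return core, factors

def reconstruct(core, factors):
    """Rebuild the (approximate) tensor from the Tucker core and factors."""
    result = core
    for mode, factor in enumerate(factors):
        result = mode_product(result, factor, mode)
    return result

# Usage: a random tensor built to have multilinear rank (2, 2, 2).
rng = np.random.default_rng(0)
true_core = rng.normal(size=(2, 2, 2))
true_factors = [rng.normal(size=(n, 2)) for n in (6, 5, 4)]
tensor = reconstruct(true_core, true_factors)

core, factors = hosvd(tensor, ranks=(2, 2, 2))
approx = reconstruct(core, factors)
print(core.shape)                   # (2, 2, 2)
print(np.allclose(approx, tensor))  # True: the ranks match the data exactly
```

With smaller ranks than the data supports, the same code returns a lossy approximation rather than an exact reconstruction.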
Filed under: Computational Linguistics, Computer Science, Philosophy of Mind | Tagged: data analysis, SVD, tensors, text analysis
Posted on July 24, 2007 by Peter Turney
For the last several months, I’ve been playing with tensors as an approach to data and text analysis. Here are some pointers to get started on tensors.
Tensors are a generalization of matrices to higher dimensions:
order 0 tensor = scalar
order 1 tensor = vector
order 2 tensor = matrix
order n > 2 tensor = higher order tensor
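In NumPy, the order of a tensor is the number of axes, reported as `ndim`. (Order counts indices; it is not the same thing as matrix rank.) The arrays below are arbitrary examples:

```python
import numpy as np

scalar = np.array(3.0)              # order 0
vector = np.array([1.0, 2.0, 3.0])  # order 1
matrix = np.ones((2, 3))            # order 2
tensor = np.zeros((2, 3, 4))        # order 3: rows, columns, and tubes

for name, array in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor", tensor)]:
    print(name, "order", array.ndim, "shape", array.shape)

# Fixing the row and column index leaves a tube (a mode-3 fiber).
tube = tensor[0, 1, :]
print(tube.shape)  # (4,)
```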
PARAFAC [...]
Filed under: Computational Linguistics, Computer Science, Semantics | Tagged: data analysis, SVD, tensors, text analysis
Posted on January 24, 2007 by Peter Turney
In response to my earlier post on Effects of High-Order Co-occurrences on Word Semantic Similarity, Tom Landauer sent me the following note:
You have given me an idea. Because I have just been asked again to review papers that say that the way LSA works is by indirect associations, it seems that few have seen my [...]
Filed under: Computational Linguistics, Semantics | Tagged: data analysis, SVD, text analysis
Posted on January 19, 2007 by Peter Turney
Comments on:
Benoît Lemaire and Guy Denhière
Effects of High-Order Co-occurrences on Word Semantic Similarity
PMI-IR estimates the semantic similarity between a pair of words by how frequently they co-occur within a certain window of text. This simple measure of similarity is surprisingly good at recognizing synonyms: it seems that synonyms often appear close together in text. [...]
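PMI compares how often two words appear together with how often they would co-occur by chance: PMI(w1, w2) = log2(p(w1, w2) / (p(w1) p(w2))). PMI-IR obtained its counts by querying a web search engine; the sketch below is a hypothetical offline stand-in that counts co-occurrences in an in-memory token list. The window size and the choice to estimate p(w1, w2) from the number of co-occurring pairs are assumptions, and `pmi_scores` and `pmi` are illustrative names.

```python
import math
from collections import Counter

def pmi_scores(tokens, window=5):
    """PMI for unordered word pairs that co-occur within `window` tokens.

    p(w) is estimated from unigram counts; p(w1, w2) is estimated from the
    count of the pair divided by the total number of co-occurring pairs.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + 1 + window]:
            if w1 != w2:
                pair_counts[frozenset((w1, w2))] += 1
    total_words = len(tokens)
    total_pairs = sum(pair_counts.values())
    scores = {}
    for pair, count in pair_counts.items():
        w1, w2 = tuple(pair)
        p_joint = count / total_pairs
        p1 = word_counts[w1] / total_words
        p2 = word_counts[w2] / total_words
        scores[pair] = math.log2(p_joint / (p1 * p2))
    return scores

def pmi(scores, w1, w2):
    """Look up PMI; pairs never seen together get -inf (the log of zero)."""
    return scores.get(frozenset((w1, w2)), float("-inf"))

# Usage with a tiny corpus and a window of one (adjacent words only).
tokens = "big large the cat the dog big large".split()
scores = pmi_scores(tokens, window=1)
print(pmi(scores, "big", "large"))  # log2(32/7), about 2.19
print(pmi(scores, "cat", "dog"))    # -inf: never adjacent
```

Using frozensets as keys makes the score symmetric, which matches PMI's definition: the order of the two words does not matter.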
Filed under: Computational Linguistics, Semantics | Tagged: LSA, PMI-IR, semantic similarity, SVD, text analysis
Posted on January 13, 2007 by Peter Turney
In Latent Semantic Analysis, we use a large collection of text to build a matrix, in which the rows represent words and the columns represent chunks of text. A chunk can be a sentence, a paragraph, a document, or any sequence of words. The value in a cell in the matrix is based on the [...]
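A toy version of this pipeline can be sketched in NumPy. The excerpt breaks off before saying how cell values are computed, so this sketch simply uses raw counts (an assumption; LSA is often run on weighted counts instead). The corpus, the choice k = 2, and all names are hypothetical. In this corpus "car" and "automobile" never share a chunk, yet they occur with the same context words, so after truncation their vectors point the same way.

```python
import numpy as np

# Hypothetical toy corpus: each chunk is a short list of words.
chunks = [
    ["car", "engine"],
    ["car", "wheel"],
    ["automobile", "engine"],
    ["automobile", "wheel"],
    ["banana", "fruit"],
    ["apple", "fruit"],
]
vocab = sorted({word for chunk in chunks for word in chunk})
row = {word: i for i, word in enumerate(vocab)}

# Word-by-chunk matrix: rows are words, columns are chunks, cells are raw counts.
counts = np.zeros((len(vocab), len(chunks)))
for j, chunk in enumerate(chunks):
    for word in chunk:
        counts[row[word], j] += 1

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Truncated SVD: keep only the k largest singular values.
k = 2
u, s, vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = u[:, :k] * s[:k]  # each row is a word in the reduced space

raw = cosine(counts[row["car"]], counts[row["automobile"]])
reduced = cosine(word_vectors[row["car"]], word_vectors[row["automobile"]])
print(f"car vs automobile, raw counts: {raw:.2f}")          # 0.00
print(f"car vs automobile, after SVD (k={k}): {reduced:.2f}")  # 1.00
```

This is the behaviour that the high-order co-occurrence discussion above is about; a toy example like this does not by itself settle why it happens on real corpora.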
Filed under: Computational Linguistics, Semantics | Tagged: analogy, LSA, semantic similarity, SVD