Posted on March 9, 2007 by Peter Turney
Measures of semantic distance (or, inversely, semantic relatedness) have many applications in Computational Linguistics. There are three basic approaches to measuring semantic distance: lexicon-based algorithms, corpus-based algorithms, and hybrids. In an otherwise excellent paper on lexicon-based measures, Budanitsky and Hirst criticize corpus-based measures. I discuss their criticisms here.
Filed under: Computational Linguistics, Semantics | Tagged: lexicons, corpora, semantic distance, semantic similarity
Posted on January 19, 2007 by Peter Turney
Comments on:
Benoît Lemaire and Guy Denhière
Effects of High-Order Co-occurrences on Word Semantic Similarity
PMI-IR estimates the semantic similarity between a pair of words by how frequently they co-occur within a certain window of text. This simple measure of similarity is surprisingly good at recognizing synonyms: it seems that synonyms often appear close together in text. [...]
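To make the co-occurrence idea concrete, here is a minimal sketch of a windowed pointwise mutual information score. It is not the PMI-IR procedure itself: it assumes a small in-memory token list rather than a large text collection, and the function name `pmi_scores`, the `window` parameter, and the crude normalization by corpus length are all choices made for illustration.

```python
import math
from collections import Counter

def pmi_scores(tokens, window=5):
    """Build a pmi(w1, w2) function from a single token sequence.

    Assumption: probabilities are estimated by dividing raw counts by the
    number of tokens, which is a rough normalization for a toy example.
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        # Look ahead only, so each co-occurrence is counted once.
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                pair_counts[frozenset((w, v))] += 1

    def pmi(w1, w2):
        joint = pair_counts[frozenset((w1, w2))]
        if joint == 0:
            return float("-inf")  # the pair was never seen within the window
        p_joint = joint / n
        p1 = word_counts[w1] / n
        p2 = word_counts[w2] / n
        return math.log2(p_joint / (p1 * p2))

    return pmi

tokens = "big large house big large car small tiny dog".split()
pmi = pmi_scores(tokens, window=2)
print(pmi("big", "large"), pmi("big", "dog"))
```

Words that keep turning up near each other ("big" and "large" here) get a higher score than words that never share a window.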
Filed under: Computational Linguistics, Semantics | Tagged: text analysis, SVD, semantic similarity, LSA, PMI-IR
Posted on January 13, 2007 by Peter Turney
In Latent Semantic Analysis, we use a large collection of text to build a matrix in which the rows represent words and the columns represent chunks of text. A chunk can be a sentence, a paragraph, a document, or any other sequence of words. The value in a cell in the matrix is based on the [...]
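The word-by-chunk matrix described above can be sketched in a few lines. This is only an illustration: the function name `word_chunk_matrix` is hypothetical, chunks are split on whitespace, and each cell holds a raw count, which is an assumption made here because the excerpt is cut off before it says how cell values are computed.

```python
def word_chunk_matrix(chunks):
    """Return (vocab, matrix) with one row per word and one column per chunk.

    Assumption: each cell is the raw number of times the word occurs in the
    chunk; a real system may weight these counts differently.
    """
    vocab = sorted({w for chunk in chunks for w in chunk.split()})
    row_of = {w: i for i, w in enumerate(vocab)}
    matrix = [[0] * len(chunks) for _ in vocab]
    for j, chunk in enumerate(chunks):
        for w in chunk.split():
            matrix[row_of[w]][j] += 1
    return vocab, matrix

chunks = ["cat sat mat", "dog sat mat", "cat dog"]
vocab, matrix = word_chunk_matrix(chunks)
for word, row in zip(vocab, matrix):
    print(word, row)
```

Each row can then be read as a vector describing where a word occurs, which is the starting point for the dimensionality reduction step (SVD) that Latent Semantic Analysis applies next.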
Filed under: Computational Linguistics, Semantics | Tagged: analogy, SVD, semantic similarity, LSA