Lexicons versus Corpora for Measures of Semantic Distance

Measures of semantic distance (or, inversely, semantic relatedness) have many applications in computational linguistics. There are three basic approaches to measuring semantic distance: lexicon-based algorithms, corpus-based algorithms, and hybrids. In an otherwise excellent paper on lexicon-based measures, Budanitsky and Hirst criticize corpus-based measures. I discuss their criticisms here.

Effects of High-Order Co-occurrences on Word Semantic Similarity

Comments on: Benoît Lemaire and Guy Denhière, "Effects of High-Order Co-occurrences on Word Semantic Similarity"

PMI-IR estimates the semantic similarity between a pair of words by how frequently they co-occur within a certain window of text. This simple measure of similarity is surprisingly good at recognizing synonyms: it seems that synonyms often appear close together in text. [...]
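
To make the window-based idea concrete, here is a minimal sketch that counts co-occurrences within a fixed window over a tokenized text and converts the counts into a pointwise mutual information score. It is a toy illustration only: the function name `pmi_scores`, the default window size, the tiny corpus, and the way probabilities are estimated are all assumptions made for this example, and it works on a local token list rather than on whatever text collection PMI-IR actually queries.

```python
import math
from collections import Counter

def pmi_scores(tokens, window=5):
    """Score word pairs by PMI, counting co-occurrences within `window` tokens.

    Hypothetical helper for illustration; not the PMI-IR implementation.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        # Look ahead at the next `window` tokens; pairs are unordered.
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                pair_counts[frozenset((w, v))] += 1

    n_tokens = len(tokens)
    n_pairs = sum(pair_counts.values())
    scores = {}
    for pair, count in pair_counts.items():
        a, b = tuple(pair)
        p_ab = count / n_pairs            # estimated joint probability
        p_a = word_counts[a] / n_tokens   # estimated marginal probabilities
        p_b = word_counts[b] / n_tokens
        scores[pair] = math.log2(p_ab / (p_a * p_b))
    return scores

# Toy corpus (an assumption for this example).
tokens = ("big large house big large car "
          "small tiny house small tiny car").split()
scores = pmi_scores(tokens, window=1)
print(scores[frozenset(("big", "large"))])
```

In this toy corpus, "big" and "large" always appear side by side, so the pair scores higher than a pair such as "large" and "house" that co-occurs only once.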

Unified Latent Analysis

In Latent Semantic Analysis, we use a large collection of text to build a matrix, in which the rows represent words and the columns represent chunks of text. A chunk can be a sentence, a paragraph, a document, or any sequence of words. The value in a cell in the matrix is based on the [...]
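
The matrix layout described above can be sketched in a few lines. This sketch covers only the construction of a word-by-chunk matrix, with rows for words and columns for chunks. The excerpt is cut off before it says how cell values are computed, so the raw occurrence counts used here are a placeholder assumption, not the weighting the text goes on to describe. The function name `word_by_chunk_matrix` and the sample chunks are likewise hypothetical.

```python
from collections import Counter

def word_by_chunk_matrix(chunks):
    """Build a matrix with one row per word and one column per chunk.

    Cell values are raw counts, which is an assumption for illustration.
    """
    tokenized = [chunk.split() for chunk in chunks]
    vocab = sorted({w for words in tokenized for w in words})
    counts = [Counter(words) for words in tokenized]
    # matrix[i][j] = number of times vocab[i] occurs in chunk j
    matrix = [[c[w] for c in counts] for w in vocab]
    return vocab, matrix

# Here each chunk is a sentence; it could equally be a paragraph or document.
chunks = ["the cat sat", "the dog sat down"]
vocab, matrix = word_by_chunk_matrix(chunks)
for word, row in zip(vocab, matrix):
    print(word, row)
```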