
Corpus Resources

Corpora are electronic bodies of linguistic data (texts) that linguists extract (isolating passages from their larger texts) and concordance (aligning occurrences by keyword) to generate natural-language samples for modeling terms, phrases, or syntax.
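Concordancing by keyword is usually displayed as a keyword-in-context (KWIC) listing, with every hit of the search word lined up in one column. The Python sketch below shows the idea. It is not the method of any particular concordancing site. It assumes naive lowercase regex tokenization, and `kwic` and its `width` parameter are illustrative names, not a real tool's API.

```python
import re


def kwic(text: str, keyword: str, width: int = 4) -> list[str]:
    """Return keyword-in-context lines for every occurrence of `keyword`.

    Each line shows up to `width` tokens of left and right context, with the
    left context right-aligned so all keywords fall in the same column.
    """
    # Naive tokenization (an assumption): lowercase runs of word characters.
    # Real concordancers use language-aware tokenizers.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context in 30 characters so keywords line up.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


sample = "The bank raised rates. She sat on the river bank and watched the bank of fog."
for line in kwic(sample, "bank", width=2):
    print(line)
```

Reading the aligned lines makes the financial and the geographical senses of "bank" easy to separate by eye. This is the kind of evidence a translator can use to check an intuition about sense.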

Corpora can help translators empirically verify their intuitions about sense, connotation, and near-synonymy. They can show patterns of actual frequency or potential language use, reveal the lexical density of a text (particularly in translation research), and identify semantic prosodies (connotations) and semantic preferences (the “clustering” of words around certain poles of meaning). They can also help translators overcome imperfect overlap between the collocational ranges of different languages. Hatim and Munday (2005) map the use of corpora in translation as an interface with the discipline of language engineering. Customized corpora may be generated with leasable software, while “found” corpora, some running to many millions of words, are available on web-based concordancing sites.
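Two of these measures can be illustrated with simple counts. Counting the collocates that fall within a few words of a node word can reveal a semantic prosody or preference. Lexical density is commonly computed as the share of tokens that are content (lexical) words rather than function words. The following Python sketch computes both under loudly stated assumptions. It uses naive regex tokenization, performs no lemmatization (so "cause" and "caused" are counted separately), and relies on a hypothetical, deliberately tiny function-word list, `FUNCTION_WORDS`, in place of a proper list or part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list for illustration only.
# Real studies use a full language-specific list or a part-of-speech tagger.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "on",
                  "is", "was", "it", "that", "with", "for", "as", "by"}


def tokenize(text: str) -> list[str]:
    """Naive tokenization: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def collocates(tokens: list[str], node: str, span: int = 3) -> Counter:
    """Count content words within `span` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            # Skip function words so the counts show lexical collocates.
            counts.update(w for w in window if w not in FUNCTION_WORDS)
    return counts


def lexical_density(tokens: list[str]) -> float:
    """Proportion of tokens that are content words (not in FUNCTION_WORDS)."""
    if not tokens:
        return 0.0
    content = sum(1 for t in tokens if t not in FUNCTION_WORDS)
    return content / len(tokens)


text = "Unemployment caused distress. The flood caused damage and the fire caused chaos."
tokens = tokenize(text)
print(collocates(tokens, "caused", span=2).most_common())
print(lexical_density(tokens))
```

In this toy sample, the right-hand collocates of "caused" (distress, damage, chaos) are all unpleasant. On a large corpus, that kind of pattern is what is meant by a negative semantic prosody. The figures from so small a sample, however, are illustrative only.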