
The CQPweb family


CQPweb based corpus interfaces
(Please refer to for the relationship between CQPweb and CWB. In simple terms, the back end, concordancing in particular, is CWB, but CQPweb is more than that.)

CQPweb at Lancaster, UK, 71 corpora as of 20 August 2015 (maintained by Andrew Hardie) (English, Arabic, Chinese, Punjabi, Norwegian, Latin, Russian, Italian, Hindi etc.)
BNCweb at Lancaster University (English)

Corpora of biomedical and health literature (Neil Millar of (English)

BFSU CQPweb, 41 corpora as of 20 August 2015, the National Research Centre for Foreign Language Education, Beijing Foreign Studies University (maintained by Jiajin Xu and Liangping Wu) (English, Chinese, Japanese, Arabic, Icelandic, German, Spanish, Russian, etc.)
Department of English, The Hong Kong Polytechnic University, Hong Kong (English)
CQPweb at Huazhong Agricultural University, Wuhan, Hubei Province (English)
National Taiwan Normal University (NTNU) (English and Chinese)

Israel (Hebrew, English etc.)
CQPweb interface at MILA, Knowledge Center for Processing Hebrew, Technion Faculty of Computer Science, Technion City, Haifa and the Computational Linguistics Group (, the University of Haifa (see for a quick user's guide). The MILA CQP interface is a lightly modified version of CQPweb v3.0.

CQPweb at University for Foreigners Perugia (Perugia corpus - a reference corpus of written and spoken Italian and CAIL2 - a written learner corpus of Italian) (Italian)

The Maltese Language Resource Server (MLRS) at the Institute of Linguistics and the Department of Intelligent Computer Systems of the University of Malta (Corpus of Learner English in Malta and Korpus Malti) (English)

University of Lisbon

Universitat Autònoma de Barcelona

CQPweb at the Zurich Center for Linguistics, Universität Zürich

Taner Sezer Turkish Corpus Server

CQPweb at Department of Linguistics, Georgetown University, 51 corpora as of 21 August 2015 (including BNC, Brown, ICE family, COCA, WaC family etc.)


CWB/CQP based corpus interfaces

IMS Corpus Workbench (CWB), University of Stuttgart, Institute for Natural Language Processing

IntelliText Corpus Queries, the Centre for Translation Studies (CTS) at the University of Leeds

Denmark (Danish, Portuguese, German, English, French, Spanish, Esperanto, Italian, Romanian, Swedish, Norwegian, Icelandic etc.)
CorpusEye at the Institute of Language and Communication (ISK) at the University of Southern Denmark
KorpusDK (Danish) at the Department for Digital Dictionaries and Corpora, Society for Danish Language and Literature, Copenhagen

Colibri², the German Grammar Group, Freie Universität Berlin, developed by Roland Schäfer (German, English, Spanish, Dutch, and Swedish)
ParaSol: A Parallel Corpus of Slavic and other languages, developed by Ruprecht von Waldenfels and hosted at the Humboldt University of Berlin
PolMine Corpus Server at Universität Duisburg-Essen

Italy (Italian and English)
CQP based corpora at Corpus and Computational Linguistics Research Group, University of Bologna

Norway (Norwegian, French, etc.)
Corpora at the Text laboratory (e.g. The Corpus for Bokmål Lexicography LBK, The French Newspaper Corpus, Two Corpora with music reviews, NoWaC, SKRIV Corpus, The BigBrother Corpus, Corpus of Doctor-Patient Conversations from Ahus, Nordic Dialect Corpus, Norwegian in America, NoTa-Oslo, The Ruija Corpus, Talko, TAUS), The University of Oslo

Linguateca, AC/DC corpora, or Internet Access to Corpora: The AC/DC project, Oslo, Norway, Lisbon and other places.
Parallel corpora involving Portuguese

OPUS: The open parallel corpus at the Department of Linguistics and Philology, Uppsala University
Korp (Språkbanken)

Center for the Study of Language and Society, University of Berne
Roland Meyer, Ruprecht von Waldenfels, Michal Wozniak, Andreas Zeman (2006-2015): ParaVoz: A simple web interface for querying parallel corpora. Second Version. Bern, Regensburg, Berlin, Krakow.

Key references

Hardie, A. (2012). CQPweb – combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics, 17(3): 380-409.

Christ, O. (1994). A modular and flexible architecture for an integrated corpus query system. Proceedings of COMPLEX'94, 3rd Conference on Computational Lexicography and Text Research, Budapest, Hungary, July 7-9, pp. 23-32.

More information about the birth and evolution of CQPweb (cited from the 'About CQPweb--Who did it' section of

"CQPweb was created by Andrew Hardie, Lancaster University, UK.
Most of the architecture, the look-and-feel, and even some snippets of code were shamelessly half-inched from BNCweb.
BNCweb's most recent version was written by Sebastian Hoffmann (University of Trier) and Stefan Evert (University of Osnabrück). It was originally created by Hans-Martin Lehmann, Sebastian Hoffmann, and Peter Schneider.

The underlying technology of CQPweb is manifold.
Concordancing is done using the IMS Corpus Workbench with its CQP corpus query processor. Thus the name. Other functions (collocations, corpus management etc.) are powered by MySQL databases.
The system uses Stefan Evert's Simple Query (CEQL) parser, which is written in Perl. The web-scripts are written in PHP. Some JavaScript is used to create interactive links and forms. The look-and-feel relies on Cascading Style Sheets plus good old fashioned HTML."
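
The passage above describes a two-stage design: a user-friendly simple query is parsed and then handed to CQP for concordancing. As a rough illustration of that idea only, the sketch below turns a wildcard query into a CQP token expression. The function name `wildcard_to_cqp` and the tiny wildcard subset (`*` and `?`) are assumptions made for this example; it is not the CEQL parser, which is written in Perl and supports far more syntax.

```python
import re


def wildcard_to_cqp(simple_query: str, ignore_case: bool = True) -> str:
    """Translate a whitespace-separated wildcard query into a CQP token sequence.

    Toy subset (an assumption for illustration, not the real CEQL grammar):
      *  matches zero or more characters
      ?  matches exactly one character
    """
    tokens = []
    for word in simple_query.split():
        # Escape regex metacharacters, then restore the two supported wildcards.
        pattern = re.escape(word).replace(r"\*", ".*").replace(r"\?", ".")
        # %c is the CQP flag for case-insensitive matching.
        flag = "%c" if ignore_case else ""
        tokens.append(f'[word="{pattern}"{flag}]')
    return " ".join(tokens)


print(wildcard_to_cqp("lov* it"))
# prints: [word="lov.*"%c] [word="it"%c]
```

The resulting string has the form of a CQP query that could be typed at a CQP prompt; how CQPweb itself passes queries to CQP is not shown here.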

(Updated 26 August, 2015 by Jiajin Xu)
