Tools

Corpus tools developed by members of BFSU Corpus Research Group

(Free corpus software developed by members of the BFSU corpus team)

Concordancers and query tools (语料库检索工具)

  • BFSU PowerConc 1.0 beta 21c: A freeware concordancer for Windows (1.8MB).
  • BFSU PowerConc 1.0 beta 25b (new)
  • BFSU CQPweb online concordancer (a CQPweb tutorial is available for download)
  • BFSU ParaConc 1.2.2: A freeware parallel concordancer (4.6MB)
  • Colligator 2.0: A colligation query and analysis tool (1.4MB)
  • SearchSubtitle: A programme for video-based, time-aligned subtitle concordancing (Chinese user interface). The tool was designed by Wenzhong Li and programmed by Zhaoyang Han (533KB).
  • PatCount 1.0: PatCount is short for 'pattern counting'. It is a query tool for counting the frequency of lexical, syntactic, and discoursal features in texts. The results are displayed and can be exported as 'feature(s) x text(s)' matrices, which are well suited to follow-up advanced (inferential) statistical analyses. Regular expressions are fully supported. The Microsoft .NET Framework must be installed before you run the tool. The tool was designed by Maocheng Liang and Wenxin Xiong and programmed by Wenxin Xiong (3.6MB).
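
The 'feature(s) x text(s)' matrix that PatCount exports can be illustrated with a minimal Python sketch. This is not PatCount's own code or file format: the feature labels, the regular expressions, and the function names `feature_text_matrix` and `to_tsv` are all hypothetical, chosen only to show how regex counts per text line up as rows and columns.

```python
import re

# Hypothetical feature definitions (label -> regular expression).
# These patterns are illustrative assumptions, not PatCount's built-in features.
FEATURES = {
    "modal_verbs": r"\b(?:can|could|may|might|must|shall|should|will|would)\b",
    "first_person": r"\b(?:I|we|me|us|my|our)\b",
    "passive_like": r"\b(?:is|are|was|were|been|being)\s+\w+ed\b",
}


def feature_text_matrix(texts, features=FEATURES):
    """Count each regex feature in each text.

    texts: dict mapping a text name to its content.
    Returns (header, rows) with one row per text and one column per feature.
    """
    compiled = {name: re.compile(pat, re.IGNORECASE) for name, pat in features.items()}
    header = ["text"] + list(features)
    rows = []
    for text_id, content in texts.items():
        counts = [len(compiled[name].findall(content)) for name in features]
        rows.append([text_id] + counts)
    return header, rows


def to_tsv(header, rows):
    """Serialise the matrix as tab-separated text for a statistics package."""
    lines = [header] + rows
    return "\n".join("\t".join(str(cell) for cell in line) for line in lines)
```

Calling `to_tsv(*feature_text_matrix({"a.txt": "We should go."}))` gives a small table with one line per text, which most statistics packages can import directly.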

Annotation tools (语料库标注工具)

Statistical tools for corpus analysis (语料库统计工具)

Specialised corpus tools (语料库分析专用工具)

  • BFSU Collocator (835KB) is a search-based collocation extraction tool which yields MI, MI3, T-score, Z-score, Log-Log and log-likelihood scores of collocational strength. The tool works with raw and CLAWS PoS-tagged English texts, and does not work for texts in Chinese or other languages.
  • BFSU English Sentence Segmenter 1.1 (447KB)
  • Concordance Randomizer 1.2 (531KB)
  • The Edinburgh Associative Thesaurus query tool (EAT) (3.9MB)
  • Keywords Plus 1.0 (1.87 MB) (an earlier release of the Keywords Plus tool, in which the resulting keywords are linked to their original concordance lines. This feature has not been retained in version 2.)
  • Keywords Plus 2.0 (5.67 MB) (a free keyword generation tool based on the comparison of two corpora or wordlists. The tool is helpful for creating Chinese and English keyword lists, key n-gram lists and key POS-gram lists.)
  • Pattern Builder (7.2 MB) is an aid for those who are not familiar with regular expressions in searching PoS-tagged English texts.
  • Readability Analyzer 1.0 (1.1MB): A tool which yields readability indices, type/token ratio (TTR), standardised type/token ratio (STTR), lemmatised TTR, lemmatised STTR, average word length (AWL), average sentence length, etc.
  • Readability Analyzer 1.1. This version fixes a bug that prevented some users from saving the results. A known issue in the current build is that the functions from Reading Ease to AWL do not work properly.
  • Sub-corpus Creator (2.9MB): Sub-corpora can be extracted based on the text strings contained in filenames of texts or in-text metadata markup.
  • Text Cleaning Library for PowerGREP (5.5KB)
  • TextSmith Tools (6.1MB): This tool showcases a methodological innovation: building a genre-informed phraseological profile across discourse segments. TextSmith segments texts into equal proportions, based on the user's own intuitive estimate of how many sections the imported texts might contain.
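
For readers unfamiliar with the association scores listed for BFSU Collocator above, the following Python sketch shows commonly used textbook definitions of MI, MI3, t-score, z-score and log-likelihood. It is an illustration under stated assumptions, not the tool's implementation: the exact formulas BFSU Collocator uses (for example, how it adjusts for the collocation span) are not given here, so this sketch ignores the span and uses a simplified z-score. The function name `association_scores` and its parameters are hypothetical.

```python
import math


def association_scores(o11, f_node, f_coll, n):
    """Textbook association measures for a node word and a collocate.

    o11:    observed co-occurrence frequency of node and collocate
    f_node: corpus frequency of the node word
    f_coll: corpus frequency of the collocate
    n:      corpus size in tokens
    Assumption: no window-span adjustment is applied to the expected frequency.
    """
    if o11 <= 0:
        raise ValueError("the pair must co-occur at least once")

    e11 = f_node * f_coll / n  # expected co-occurrence frequency under independence

    mi = math.log2(o11 / e11)            # mutual information
    mi3 = math.log2(o11 ** 3 / e11)      # cubed variant, favours frequent pairs
    t_score = (o11 - e11) / math.sqrt(o11)
    z_score = (o11 - e11) / math.sqrt(e11)  # simplified form

    # Log-likelihood (G2) over the 2x2 contingency table of the pair.
    observed = [o11, f_node - o11, f_coll - o11, n - f_node - f_coll + o11]
    expected = [
        f_node * f_coll / n,
        f_node * (n - f_coll) / n,
        (n - f_node) * f_coll / n,
        (n - f_node) * (n - f_coll) / n,
    ]
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

    return {"MI": mi, "MI3": mi3, "T": t_score, "Z": z_score, "LL": g2}
```

MI tends to favour rare but exclusive pairs, while t-score and log-likelihood favour frequent ones, which is one reason tools of this kind report several scores side by side.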

Data driven learning tools and resources (数据驱动学习工具)

  • BFSU Sentence Collector (new) is a pedagogically motivated concordancing tool which allows users to refine search results according to sentence length and lexical difficulty. Results are displayed as complete sentences rather than in KWIC mode. To prepare your own textual data for sentence collection, first segment the English texts on your own hard drive with BFSU Sentence Segmenter 1.0, then mark up the unknown/new words against a base word list with BFSU NewWords Marker 1.0, and finally save the data as an *.idx file in the index folder of BFSU Sentence Collector.
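
The idea of refining hits by sentence length and lexical difficulty can be sketched in a few lines of Python. This is only an illustration of the principle, assuming that "lexical difficulty" is approximated by the number of words missing from a base word list; it does not reproduce BFSU Sentence Collector's *.idx format or its actual filtering rules, and the function name `collect_sentences` and all its parameters are hypothetical.

```python
import re


def collect_sentences(sentences, query, base_words,
                      min_len=5, max_len=20, max_unknown=1):
    """Return whole sentences containing `query`, filtered by length and difficulty.

    sentences:   list of already segmented sentences
    query:       search word, matched as a whole word, case-insensitively
    base_words:  set of lower-case words the learner is assumed to know
    min_len, max_len: allowed sentence length in word tokens
    max_unknown: maximum number of tokens not found in base_words
    """
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    hits = []
    for sentence in sentences:
        if not pattern.search(sentence):
            continue
        # Crude tokenisation: runs of letters and apostrophes, lower-cased.
        tokens = re.findall(r"[A-Za-z']+", sentence.lower())
        if not (min_len <= len(tokens) <= max_len):
            continue
        unknown = [tok for tok in tokens if tok not in base_words]
        if len(unknown) <= max_unknown:
            hits.append(sentence)
    return hits
```

Returning full sentences rather than KWIC lines keeps each example readable on its own, which suits classroom use.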

Useful tools and resources that were not developed by BFSU CRG members