The following machine-readable resources are publicly available and may be useful for the task.
Language | Name | Type | Author | Description | Free? |
---|---|---|---|---|---|
JA | Textual Entailment Evaluation Data | Collection of labeled entailment pairs | Kyoto University | An evaluation dataset with 2700 labeled textual entailment pairs. Each pair comes with a 4-level label {◎,◯,△,×} indicating the likelihood of entailment, and another label for one of 5 categories {implication, lexicon (noun), lexicon (verb), syntax, inference}. | Yes |
JA | Japanese WordNet | Lexical DB | NICT | Adds Japanese equivalents to the synsets of Princeton WordNet 3.0. As of v1.0, 56,741 concepts (synsets) and 92,241 words are available. A demo is also available. | Yes |
JA | Wikipedia hypernym-hyponym pairs from Hyponymy extraction tool | Ontology | NICT | This tool can extract about 6 million hypernym-hyponym and category-instance pairs from a Japanese Wikipedia dump, with about 90% accuracy. | Yes |
JA | 京都大学格フレーム(Kyoto Univ Case Frame) | Frame Dict | Kurohashi Lab, Kyoto University | Case frame dictionary automatically built from the web text. Search UI available here. | Yes |
JA | 単語感情極性対応表 (Semantic Orientations of Words) | Polarity Weighted Word List | Okumura Lab, Titech | List of words with semantic orientation value ranging between -1 and +1. E.g. “great” with +1 and “painful” with -1. | Yes |
JA | EDR電子化辞書(The EDR Electronic Dictionary) | Lexical DB | NICT | Japanese general-vocabulary dictionary covering 270,000 words and the corresponding 410,000 concepts, among other data. | No |
JA, CS, CT | Wikipedia | Encyclopedia | | Free encyclopedia. | Yes |
JA | 日本語語彙大系(GoiTaikei) | Lexical DB | NTT | Contains 300,000 Japanese words marked with part-of-speech and semantic classes, originally developed by NTT for the ALT-J/E Japanese-to-English machine translation system. | No |
JA | 分類語彙表(Bunrui Goihyo) | Lexical DB | National Institute for Japanese Language and Linguistics (国立国語研究所) | | No |
JA | 動詞含意関係データベース(Entailment Verb DB) | Lexical DB | ALAGIN | Large-scale collection of Japanese verb phrase pairs consisting of 52,689 positive examples (entailing pairs) and 68,819 negative examples (non-entailing pairs). This resource is available to ALAGIN members only (membership requires residency in Japan). | Yes |
CS | 知网(HowNet) | Lexical DB | Dong Zhendong & Dong Qiang | Static demo available. Must submit an agreement form to download and use it. | Yes (conditional) |
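CS | 同义词词林(TongYiCi CiLin) | Lexical DB | Mei Jiaju (梅家驹), Zhu Yiming (竺一鸣), Gao Yunqi (高蕴琦), et al. (eds.). Shanghai Lexicographical Publishing House (上海辞书出版社), 1983. | Thesaurus of synonyms and antonyms. | ? |
CS | 哈工大《同义词词林》共享版的若干改进 (Improvements to the HIT shared edition of TongYiCi CiLin) | Lexical DB | Harbin Institute of Technology (哈工大) | Improved version of TongYiCi CiLin. | Yes |
CT | BOW | Lexical DB | Academia Sinica | This database is built on the structure of the English WordNet and is empirically grounded in language use in Taiwan. | ? |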
JA: Japanese, CS: Simplified Chinese, CT: Traditional Chinese
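
As a rough illustration of how the labels in the Textual Entailment Evaluation Data row might be consumed, the sketch below collapses the 4-level label {◎,◯,△,×} into a binary entailment decision and tallies the results per category. It rests on several assumptions that do not come from the dataset documentation: that the labels are ordered from most likely (◎) to least likely (×), that a cut-off at ◯ is a sensible default, and that each pair is available as a `(text, hypothesis, label, category)` tuple. The names `LABEL_SCORE`, `to_binary`, `summarize`, and the sample pairs are hypothetical.

```python
from collections import Counter

# Assumption: labels run from most likely entailment (◎) to least likely (×).
LABEL_SCORE = {"◎": 3, "◯": 2, "△": 1, "×": 0}

# The five categories named in the table above.
CATEGORIES = {
    "implication",
    "lexicon (noun)",
    "lexicon (verb)",
    "syntax",
    "inference",
}


def to_binary(label, threshold=2):
    """Return True when the 4-level label scores at or above the threshold."""
    return LABEL_SCORE[label] >= threshold


def summarize(pairs, threshold=2):
    """Count (category, is_entailment) outcomes over an iterable of pairs.

    Each pair is a hypothetical (text, hypothesis, label, category) tuple;
    the real distribution format of the dataset may differ.
    """
    counts = Counter()
    for _text, _hypothesis, label, category in pairs:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        counts[(category, to_binary(label, threshold))] += 1
    return counts


# Made-up placeholder pairs, not taken from the dataset.
SAMPLE_PAIRS = [
    ("t1", "h1", "◎", "syntax"),
    ("t2", "h2", "△", "syntax"),
    ("t3", "h3", "×", "inference"),
    ("t4", "h4", "◯", "lexicon (noun)"),
]

if __name__ == "__main__":
    for (category, entails), n in sorted(summarize(SAMPLE_PAIRS).items()):
        print(category, entails, n)
```

Lowering `threshold` to 1 would also treat △ as entailment; which cut-off is appropriate depends on how the evaluation is defined.
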
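Other resources to be added to the table soon: OpenMWE for Japanese, IPAL dictionary, 動詞項構造シソーラス (Verb Argument Structure Thesaurus), 基本語データベース:語義別単語親密度 (Basic Word Database: Word Familiarity by Sense), つつじ:日本語機能表現辞書 (Tsutsuji: Dictionary of Japanese Functional Expressions), and some Chinese data listed in CNLP Platform.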