Commit Graph

  • eea8992024 Clean up hankcs 2022-03-22 18:04:36 -04:00
  • b74d4ff1db tok and dict_combine supports tokens containing spaces hankcs 2022-03-22 16:39:28 -04:00
  • a8734c1386 Check the type of dict_tags hankcs 2022-03-22 11:29:53 -04:00
  • 912fd38225 Test on ubuntu-latest, macos-latest, windows-latest hankcs 2022-03-11 17:04:06 -05:00
  • 3a353769c2 Update semeval16.md frank1998sj 2022-03-11 11:08:35 +08:00
  • 7924bb58a9 Revise documentation hankcs 2022-03-10 14:33:15 -05:00
  • 69506a7429 Add a configuration method to Segment for enabling or disabling Normalize close https://github.com/hankcs/HanLP/issues/1714 hankcs 2022-03-08 15:58:37 -05:00
  • 4b43124fe2 Treat <> as delimiters fix https://bbs.hankcs.com/t/topic/4527 hankcs 2022-02-26 19:37:52 -05:00
  • 86d46514f5 Check edge cases that tok key is not presented in Document hankcs 2022-02-23 10:25:56 -05:00
  • 7b8d81e8e3 Fix edge cases on empty str hankcs 2022-02-23 10:23:11 -05:00
  • 8888ba6626 stop mirroring ernie weights hankcs 2022-02-23 09:33:35 -05:00
  • d994f8e35a Simplify context layer in span ranking SRL hankcs 2022-02-21 21:35:58 -05:00
  • f6e085a37b Revise documentation hankcs 2022-02-21 10:17:49 -05:00
  • 51b97e9bdf Upgrade Portable in sync to v1.8.3 hankcs 2022-02-20 23:38:55 -05:00
  • 473776687c Merge branch '1.x' into portable hankcs 2022-02-20 23:38:25 -05:00
  • 8e750ee621 Fix the interaction between the dynamic custom dictionary and CustomDictionaryForcing fix https://github.com/hankcs/HanLP/issues/1712 v1.8.3 hankcs 2022-02-20 23:36:38 -05:00
  • 2f796dfd35 Remove several "名 + noun" entries hankcs 2021-12-31 15:38:56 -05:00
  • 19f0331516 Update COARSE_ELECTRA_SMALL_ZH hankcs 2022-02-20 01:38:08 -05:00
  • 1034c6a99b Support traditional Chinese tok, pos, ner, dep, con, srl hankcs 2022-02-17 10:24:49 -05:00
  • 3df547533b Release a dep model trained on PKU Multi-view Chinese Treebank (PMT) hankcs 2022-02-16 10:06:25 -05:00
  • 8c8e5731bb Upgrade tok, ner, con to support traditional Chinese hankcs 2022-02-15 20:57:48 -05:00
  • 9f0a92b8cb Add extra transform to SRL components hankcs 2022-02-15 17:03:25 -05:00
  • ab6dab37e5 Support conversion from Penn Treebank to Universal Dependencies hankcs 2022-02-15 16:40:17 -05:00
  • b6dcb2a9b9 Release a pos model with radical embeddings hankcs 2022-02-15 12:26:04 -05:00
  • 5ad3242306 Release scripts for PKU Multi-view Chinese Treebank (PMT) 1.0 hankcs 2022-02-15 04:48:52 -05:00
  • d3a9c848ba Implement extra features for Transformer tagger hankcs 2022-02-14 16:04:04 -05:00
  • c7e2b6210c Revise documentation hankcs 2022-02-10 15:11:07 -05:00
  • ef6ba7e3fa Fix invalid escape sequence hankcs 2022-02-08 16:00:34 -05:00
  • 809b21d904 Release a SDP model hankcs 2022-02-08 15:48:42 -05:00
  • 9e5de0a446 Revise documentation hankcs 2022-02-08 15:48:26 -05:00
  • ffdf1eca8a Add tokenizer_config.json to ernie-gram mirror hankcs 2022-02-07 10:41:43 -05:00
  • be601433ce Release a small fine-grained tok model hankcs 2022-02-05 19:06:19 -05:00
  • f297d22149 Support mengzi PLMs hankcs 2022-02-05 18:56:34 -05:00
  • f44e1329c4 Fix offset mapping in transformer_tokenizer hankcs 2022-02-05 13:25:48 -05:00
  • ca56508434 Rename to CTB9_DEP_ELECTRA_SMALL hankcs 2022-02-05 00:13:32 -05:00
  • de161d084c Add a conll=True parameter to parsers hankcs 2022-02-05 00:07:25 -05:00
  • 923d91186f Rename to CTB9_CON_ELECTRA_SMALL for clarity hankcs 2022-02-04 23:22:19 -05:00
  • 3af1ba5e8d Add a conll=True parameter to parsers hankcs 2022-02-04 23:19:54 -05:00
  • 20cef743cf Improve spelling checking hankcs 2022-02-04 23:17:51 -05:00
  • 997a2e5ec8 Release a CTB9 dep model hankcs 2022-02-04 22:20:00 -05:00
  • e4039c0f23 Release a CTB9 tok model hankcs 2022-02-04 18:57:12 -05:00
  • 2c78f64209 Remove experimental StructuralAttentionModel hankcs 2022-02-01 19:59:18 -05:00
  • 7f241465b9 Allow the user to disable IPYTHON hankcs 2022-02-01 15:58:00 -05:00
  • fbdb4e2ad0 Fix visualization html in Jupyter hankcs 2022-02-01 15:29:07 -05:00
  • 1d19d7412d Revise documentation hankcs 2022-02-01 11:49:42 -05:00
  • 36437b2a19 Add batch_size to most_similar hankcs 2022-01-31 19:43:59 -05:00
  • 171be441af Add version info into word2vec hankcs 2022-01-31 17:04:20 -05:00
  • cefbd4f50e Block special tokens from the output of MLM hankcs 2022-01-31 11:05:26 -05:00
  • 42de9b6e0b Mirror Chinese word vectors from https://github.com/Embedding/Chinese-Word-Vectors hankcs 2022-01-30 22:59:53 -05:00
  • 4fc0537a5a Enable word2vec to load arbitrary txt vector files hankcs 2022-01-30 22:01:15 -05:00
  • 92a4e8cf25 Fix unk in word2vec hankcs 2022-01-30 21:29:37 -05:00
  • 3589f0aa9d Improve typing for save_json hankcs 2022-01-30 19:08:44 -05:00
  • 190fc31134 Revise documentation hankcs 2022-01-30 18:18:47 -05:00
  • b9af710b62 Implement Masked Language Model for filling blank hankcs 2022-01-30 14:34:20 -05:00
  • d88ce5b61f Implement most_similar for word2vec hankcs 2022-01-30 14:33:35 -05:00
  • 81ccd122cb Revise documentation hankcs 2022-01-26 20:32:01 -05:00
  • 00eaae9789 Guide the offline user to https://hanlp.hankcs.com/docs/install.html#server-without-internet instead of repeating questions hankcs 2022-01-26 20:26:45 -05:00
  • ca76dc60e1 Revise documentation hankcs 2022-01-26 19:28:50 -05:00
  • 006c323750 Check version conflicts for some careless users hankcs 2022-01-26 18:52:51 -05:00
  • 52337a1c95 Revise documentation hankcs 2022-01-25 20:53:25 -05:00
  • ed9066ae28 Release a state-of-the-art AMR model for English hankcs 2022-01-25 12:05:17 -05:00
  • f2d7b3e647 Revise documentation hankcs 2022-01-18 19:53:57 -05:00
  • 8aab9edde6 Improve the __repr__ of Pipe hankcs 2022-01-18 18:57:17 -05:00
  • 17fb9f7cd1 Release a CTB9 pos model hankcs 2022-01-18 18:47:44 -05:00
  • dcd70cdb62 Improve pipeline inputs hankcs 2022-01-18 18:47:27 -05:00
  • 096d780e30 Improve constituency tree visualization hankcs 2022-01-18 18:44:51 -05:00
  • 7c09e39868 Release two constituency models hankcs 2022-01-18 11:28:16 -05:00
  • 6c02812969 Fix loading legacy NgramConvTokenizer components: https://bbs.hankcs.com/t/topic/4440 hankcs 2022-01-15 12:35:00 -05:00
  • e8044b27ae Fix tf memory leak: https://github.com/tensorflow/tensorflow/issues/37653#issuecomment-1000517720 v2.1.0-beta.0 hankcs 2021-12-28 22:05:32 -05:00
  • 1cbcdfea98 Enrich the SpanF1 metric hankcs 2021-10-26 19:46:16 -04:00
  • 6df38e33fa Beta Launch hankcs 2021-12-28 21:14:50 -05:00
  • 0a3a9b506a Make the pipeline API compatible with both TensorFlow and PyTorch backends hankcs 2021-12-28 21:04:09 -05:00
  • bf5cf5663f Fix en pipeline hankcs 2021-12-28 19:30:21 -05:00
  • 9d292be2f3 Rename some modules hankcs 2021-12-28 19:26:53 -05:00
  • b461e23b61 Separate resource from datasets and group tf components together with their torch siblings hankcs 2021-12-28 19:24:07 -05:00
  • 195071b49a Improve logging on empty data file hankcs 2021-12-27 17:26:55 -05:00
  • d9765845cf Revise documentation hankcs 2021-12-27 13:37:29 -05:00
  • 8a55380b7e Implement a simple component for fasttext such that it can be loaded using hanlp.load hankcs 2021-12-27 13:31:49 -05:00
  • ca5e5cddd0 Remove dependency on bert-for-tf2 hankcs 2021-12-26 23:27:30 -05:00
  • b88049ea83 Remove dependency on alnlp hankcs 2021-12-26 19:32:37 -05:00
  • 283e0052ce Clean up parsers that are not interesting hankcs 2021-12-26 18:59:40 -05:00
  • 9f6bfaaa18 Revise documentation hankcs 2021-12-20 12:55:02 -05:00
  • 83db2ea0b3 Implement a simple component for word2vec such that it can be loaded using hanlp.load hankcs 2021-12-12 23:19:13 -05:00
  • 46a838fa58 Clean up hankcs 2021-12-12 18:24:39 -05:00
  • 9060034d98 Translate documents to Japanese doc-ja hankcs 2021-12-08 19:22:45 -05:00
  • 119116b510 Use Token as the header for NER/SRL/CON hankcs 2021-12-01 23:39:57 -05:00
  • 0a3f3f74b8 Improve hints for downloading hankcs 2021-11-30 14:56:37 -05:00
  • 67202fe927 Revise documentation hankcs 2021-11-16 23:56:29 -05:00
  • 2de961b246 Merge pull request #1699 from TITC/patch-1 hankcs 2021-12-07 12:42:53 -05:00
  • d34dab3aa5 Update DoubleArrayTrie.java Yuhang Tao 2021-12-07 21:45:56 +08:00
  • d8835821ba Requires keras==2.6.0 fix https://github.com/hankcs/HanLP/issues/1693 hankcs 2021-11-07 14:27:34 -05:00
  • 5d4b4013e7 Logging TensorFlow version on exceptions hankcs 2021-11-07 13:15:01 -05:00
  • bf3da0394d Update EMNLP21 paper: https://aclanthology.org/2021.emnlp-main.451/ hankcs 2021-11-06 18:14:21 -04:00
  • 6cff68915f Dynamically determine the default frequency of out-of-vocabulary words based on the total word frequency hankcs 2021-11-04 23:21:17 -04:00
  • 756fd157fe Revise documentation hankcs 2021-11-04 21:29:49 -04:00
  • c36dbaa260 Accelerating on Apple Silicon M1 chips hankcs 2021-11-04 04:41:56 -04:00
  • e658a88ba2 Enhance tags before longest prefix matching hankcs 2021-11-01 09:57:23 -04:00
  • 439711ef57 Revise documentation hankcs 2021-10-28 01:03:48 -04:00
  • f02ac9b57e pos and ner now support conditional-matching custom dict called dict_tags hankcs 2021-10-27 20:11:54 -04:00
  • 9705b1f263 Trie supports matching on Tuple[str] now hankcs 2021-10-27 19:50:08 -04:00