Commit Graph

  • 91ec9fed80 Add a regression component that can directly load HF models hankcs 2023-02-22 13:50:09 -05:00
  • 52b3d2af57 Training script for UD-MTL hankcs 2023-02-21 19:42:47 -05:00
  • 36aace0d7f Add a classifier that can directly load HF models hankcs 2023-02-17 19:26:25 -05:00
  • 9d0f8804a4 Release abstractive_summarization APIs hankcs 2023-02-07 12:57:46 -05:00
  • 1323221c38 Fix passing the custom dictionary path as a parameter fix: https://github.com/hankcs/HanLP/issues/1799 hankcs 2023-01-13 10:02:08 -05:00
  • 4a7dd629b4 Upgrade Jackson Databind hankcs 2022-12-10 01:20:46 -05:00
  • fbfc8fb230 Add support for Python 3.10 hankcs 2022-12-07 16:33:47 -05:00
  • 651acff985 Test on ubuntu-20.04, macos-latest, windows-latest hankcs 2022-12-07 16:28:42 -05:00
  • a8ac5e188e Implementation of "Graph Pre-training for AMR Parsing and Generation" hankcs 2022-12-06 22:50:19 -05:00
  • 266a7832d2 Add dependency on sentencepiece hankcs 2022-11-04 06:25:08 -04:00
  • 5d46c4b9e0 Improve log hankcs 2022-11-04 06:17:21 -04:00
  • 188604d8c3 Test on ubuntu-latest, macos-latest, windows-latest hankcs 2022-11-03 19:46:52 -04:00
  • 025f964807 Improve how MTL handles empty strings hankcs 2022-11-02 23:16:52 -04:00
  • 7f7d7b62b1 Fix printing dummy root constituent hankcs 2022-11-02 23:13:02 -04:00
  • dd989cca7a allow for zero length dataset hankcs 2022-11-02 22:58:55 -04:00
  • 863aa36bb8 Remove PKU98_POS_ELECTRA_SMALL as it is replaced by PKU_POS_ELECTRA_SMALL hankcs 2022-10-06 14:24:12 -04:00
  • 8a4af18bd2 Revise documentation hankcs 2022-09-28 20:02:31 -04:00
  • 5ba95eabc9 Release language identification APIs which can recognize 176 languages hankcs 2022-09-28 17:38:18 -04:00
  • 125b2b01e0 Revise documentation hankcs 2022-09-15 11:41:14 -04:00
  • b165273646 Support the .tsv format for custom dictionaries fix: https://github.com/hankcs/HanLP/issues/1785 hankcs 2022-09-15 16:00:23 -04:00
  • 87a5e9b9fc add TLS cert verify switch to support network env behind private TLS gateway Hu Dengke 2022-09-15 20:24:03 +08:00
  • 01331888c5 Make empty string consistent between STL and MTL: https://github.com/hankcs/HanLP/issues/1778 hankcs 2022-08-26 21:02:53 -04:00
  • 9210d5aa78 Fix empty string tokens in TransformerSequenceTokenizer fix: https://github.com/hankcs/HanLP/issues/1778 hankcs 2022-08-26 02:56:48 -04:00
  • 76bde7817d Script to train SOTA PKU CWS hankcs 2022-08-24 12:53:45 -04:00
  • b216b242e1 Merge pull request #1775 from carl10086/bugfix/bintrie_parsetext hankcs 2022-08-12 01:30:27 -04:00
  • 3dddcfe82a Revise documentation hankcs 2022-08-12 01:06:58 -04:00
  • 551d578ab5 bugfix: fix a bug where full segmentation with the bintrie exits the loop prematurely carl10086 2022-08-12 00:15:44 +08:00
  • ea17ae2283 Fix tokenizer evaluation during training fix: https://github.com/hankcs/HanLP/issues/1773 hankcs 2022-08-11 02:08:13 -04:00
  • 866d8a639a Ask users to read the doc when they try to set dict for a MTL component hankcs 2022-08-10 22:21:43 -04:00
  • 8fe00ad4c5 Update MUL-MTL model with SDP fixed hankcs 2022-07-31 16:51:34 -04:00
  • 6d1d6a0f19 Release grammatical_error_correction APIs hankcs 2022-07-29 19:11:29 -04:00
  • c3d90154c3 Disable MPS on M1 due to its poor robustness hankcs 2022-07-19 17:57:09 -04:00
  • 75c95b9fbe Update the SDP model hankcs 2022-07-19 17:22:17 -04:00
  • 342481b245 Fix SDP where the root doesn't get learnt hankcs 2022-07-18 19:54:39 -04:00
  • 4339b09a29 avoid cycle between a pair of nodes hankcs 2022-07-18 18:57:48 -04:00
  • 6dbd5a8e42 Revise documentation hankcs 2022-07-15 20:20:19 -04:00
  • 2a628ead79 Fix decompression on Windows fix: https://github.com/hankcs/HanLP/issues/1757 hankcs 2022-07-07 01:39:34 -04:00
  • 5d14517cf5 Fix cases where a single char gets split into multiple subtokens fix: https://github.com/hankcs/HanLP/issues/1756 fix: https://github.com/hankcs/HanLP/issues/1754 hankcs 2022-07-05 21:46:07 -04:00
  • eca5f99e58 Improve pretty_print style hankcs 2022-07-02 16:01:20 -04:00
  • 08724dd82e Improve helper functions for Document hankcs 2022-06-28 21:58:16 -04:00
  • 7a71ceedeb Release a small MTL model trained on our new corpora hankcs 2022-06-26 17:58:57 -04:00
  • 5c53c38f40 Release mMiniLMv2L12 version of MTL on UD210 hankcs 2022-06-21 19:03:20 -04:00
  • 16e32af76d Revise documentation hankcs 2022-06-19 10:23:59 -04:00
  • 3d01174e87 Replace XLM_SMALL with MMINILMV2L6 hankcs 2022-06-19 09:08:50 -04:00
  • 9c8b620df5 Fix transformer tokenizer on CIMERLI™ hankcs 2022-06-17 23:49:24 -04:00
  • 9d9f45c14f Release multilingual tokenizers trained with MiniLMv2 hankcs 2022-06-17 02:32:55 -04:00
  • c48049287e Prepare to retire SUBWORD_ENCODING_CWS hankcs 2022-06-16 10:10:46 -04:00
  • 3a3b246d62 Release mMiniLMv2 with spaces pruned hankcs 2022-06-16 09:57:40 -04:00
  • 7616dbb4cf Update two tok models trained on 100m corpora hankcs 2022-06-15 23:24:53 -04:00
  • 4fbd69bf82 Release a multilingual MTL model trained with MiniLMv2 hankcs 2022-06-15 22:13:26 -04:00
  • 3e0d16eb0b Update UD_TOK_XLM_SMALL model hankcs 2022-06-15 21:37:58 -04:00
  • d16773de6b Revise documentation hankcs 2022-06-15 20:43:33 -04:00
  • 65b1e58044 transformer_layers means number of bottom layers hankcs 2022-06-15 18:07:05 -04:00
  • 18275b54f4 Expose only split_sentence hankcs 2022-06-15 16:19:49 -04:00
  • c9317aeb80 Fix edge cases in split_sentence hankcs 2022-06-15 16:16:50 -04:00
  • 584ce7e5ed Release a multilingual tokenizer trained with MiniLMv2 hankcs 2022-06-14 20:14:02 -04:00
  • 52cf3b5ecc Activate dict_force in load hankcs 2022-06-14 20:13:55 -04:00
  • df8308af42 Release xlm-roberta-small-no-space which has spaces pruned hankcs 2022-06-13 12:02:44 -04:00
  • 044156a6dc Revise documentation hankcs 2022-06-12 14:07:20 -04:00
  • ee3d178fd3 Update two tok models with F1 > 98% hankcs 2022-06-12 11:43:39 -04:00
  • 606fe2fb3c Release xlm-roberta-base-no-space which has spaces pruned hankcs 2022-06-10 23:52:34 -04:00
  • 17492c1edb Fix pruning using max_seq_len hankcs 2022-06-10 23:52:13 -04:00
  • 9b1ed200b8 Support eval_trn to speed up training hankcs 2021-06-02 20:48:14 -04:00
  • 1c474e3f1f Revise documentation hankcs 2022-06-09 21:58:16 -04:00
  • e1c07005b0 max_sequence_length of TransformerEncoder defaults to max_position_embeddings hankcs 2022-06-09 20:07:43 -04:00
  • 646ca7d57e Support 130 languages trained on Universal Dependencies 2.10 hankcs 2022-06-08 01:15:25 -04:00
  • c78515c872 Fix offset generated with dict_force hankcs 2022-06-07 22:07:37 -04:00
  • 924c768ba7 Support accelerated PyTorch on macOS M1 chips: https://www.hankcs.com/nlp/hanlp-official-m1-support.html hankcs 2022-06-06 23:22:33 -04:00
  • 8721373f15 Support Universal Dependencies 2.10 hankcs 2022-05-18 22:25:45 -04:00
  • 68e6527dfd Improve error log hankcs 2022-05-12 22:37:49 -04:00
  • 672d662c9a Deprecated length_field. Since the memory consumption is dominated by encoders, input_ids is always the field that determines the length of a sample. hankcs 2022-05-11 18:07:16 -04:00
  • 11113b8027 Release MSR_TOK_ELECTRA_BASE_CRF model hankcs 2022-05-07 19:50:55 -04:00
  • 3fb16ccc0a Revise documentation hankcs 2022-05-04 20:23:24 -04:00
  • 65058e4033 Release RESTful extractive_summarization APIs hankcs 2022-05-04 11:51:00 -04:00
  • c371be1719 Release two Electra base tok models trained on CTB9 hankcs 2022-04-26 11:34:24 -04:00
  • 94b88c0997 Warn the user that only zh supports coarse tokenization hankcs 2022-04-24 23:10:15 -04:00
  • 2d5aba2b09 Improve the robustness of SRL visualization hankcs 2022-04-20 19:53:42 -04:00
  • eb35d12533 Revise documentation hankcs 2022-04-20 12:55:08 -04:00
  • 7af95780e7 Fix the len of trie fix: https://github.com/hankcs/HanLP/issues/1728 hankcs 2022-04-30 14:46:01 -05:00
  • 77217d5e3e Fix output_spans with dict_combine fix: https://github.com/hankcs/HanLP/issues/1727 hankcs 2022-04-20 10:04:05 -04:00
  • 396568c355 Give PadSequenceDataLoader the option to skip padding hankcs 2022-04-19 13:59:12 -04:00
  • 53179223de Fix fasttext URL in PTB_POS_RNN_FASTTEXT_EN hankcs 2022-04-18 10:20:06 -04:00
  • 86c68657e3 Revise documentation hankcs 2022-04-16 16:06:13 -04:00
  • e52dc9f4d0 Fix matching issue caused by dict_force in e7eb64b05b fix: https://github.com/hankcs/HanLP/issues/1722 hankcs 2022-04-16 16:06:13 -04:00
  • d90717d6a1 Revise documentation hankcs 2022-04-16 15:37:29 -04:00
  • 95f89563c5 Release RESTful keyphrase_extraction APIs hankcs 2022-04-15 23:48:45 -04:00
  • 9b3a786ea5 Release a Chinese MRP model with Mengzi PLM hankcs 2022-04-15 12:29:06 -04:00
  • 15bb02f3d5 Fix edge cases of empty inputs for MTL hankcs 2022-04-14 11:50:39 -04:00
  • ea11f96778 Use the latest perin-parser hankcs 2022-04-14 00:28:29 -04:00
  • 19eb659fee Release RESTful abstract_meaning_representation APIs hankcs 2022-04-13 02:45:59 -04:00
  • 26ff093c0a Release a SOTA joint Chinese-English AMR model hankcs 2022-04-12 22:41:09 -04:00
  • a808b0fa27 Fix training CRF in TaggingNamedEntityRecognition: https://bbs.hankcs.com/t/topic/4132/3?u=hankcs hankcs 2022-04-11 23:42:07 -04:00
  • 867cc8da53 Fix the scorer.boost bug in score calculation of the text recommendation scorer fix: https://github.com/hankcs/HanLP/issues/1718 hankcs 2022-04-09 13:40:25 -04:00
  • 342a2b272b Revise documentation hankcs 2022-04-07 23:59:21 -04:00
  • 740e37d4d1 Add language parameter to hanlp_restful.HanLPClient.__call__ hankcs 2022-04-06 19:20:40 -04:00
  • dd18590d4e Improve visualization of constituency tree hankcs 2022-04-01 18:40:39 -04:00
  • cb6ee6c167 Release an ERNIE-GRAM constituency model hankcs 2022-03-31 12:50:29 -04:00
  • 4467a3c88f Optimize merging sub-tokens hankcs 2022-03-22 19:10:59 -04:00
  • e843a4e66a Test on ubuntu-latest, macos-latest, windows-latest hankcs 2022-03-22 18:51:44 -04:00
  • e7eb64b05b Let dict_force match original text directly hankcs 2022-03-22 18:15:37 -04:00