HanLP

hankcs/HanLP

mirror of https://github.com/hankcs/HanLP.git synced 2026-04-02 16:58:28 +00:00

Commit Graph

91ec9fed80 Add a regression component that can directly load HF models hankcs 2023-02-22 13:50:09 -05:00
52b3d2af57 Training script for UD-MTL hankcs 2023-02-21 19:42:47 -05:00
36aace0d7f Add a classifier that can directly load HF models hankcs 2023-02-17 19:26:25 -05:00
9d0f8804a4 Release abstractive_summarization APIs hankcs 2023-02-07 12:57:46 -05:00
1323221c38 Fix passing of the custom dictionary path argument fix: https://github.com/hankcs/HanLP/issues/1799 hankcs 2023-01-13 10:02:08 -05:00
4a7dd629b4 Upgrade Jackson Databind hankcs 2022-12-10 01:20:46 -05:00
fbfc8fb230 Add support for Python 3.10 hankcs 2022-12-07 16:33:47 -05:00
651acff985 Test on ubuntu-20.04, macos-latest, windows-latest hankcs 2022-12-07 16:28:42 -05:00
a8ac5e188e Implementation of "Graph Pre-training for AMR Parsing and Generation" hankcs 2022-12-06 22:50:19 -05:00
266a7832d2 Add dependency on sentencepiece hankcs 2022-11-04 06:25:08 -04:00
5d46c4b9e0 Improve log hankcs 2022-11-04 06:17:21 -04:00
188604d8c3 Test on ubuntu-latest, macos-latest, windows-latest hankcs 2022-11-03 19:46:52 -04:00
025f964807 Improve how MTL handles empty strings hankcs 2022-11-02 23:16:52 -04:00
7f7d7b62b1 Fix printing dummy root constituent hankcs 2022-11-02 23:13:02 -04:00
dd989cca7a allow for zero length dataset hankcs 2022-11-02 22:58:55 -04:00
863aa36bb8 Remove PKU98_POS_ELECTRA_SMALL as it is replaced by PKU_POS_ELECTRA_SMALL hankcs 2022-10-06 14:24:12 -04:00
8a4af18bd2 Revise documentation hankcs 2022-09-28 20:02:31 -04:00
5ba95eabc9 Release language identification APIs which can recognize 176 languages hankcs 2022-09-28 17:38:18 -04:00
125b2b01e0 Revise documentation hankcs 2022-09-15 11:41:14 -04:00
b165273646 Support the .tsv format for custom dictionaries fix: https://github.com/hankcs/HanLP/issues/1785 hankcs 2022-09-15 16:00:23 -04:00
87a5e9b9fc add TLS cert verify switch to support network env behind private TLS gateway Hu Dengke 2022-09-15 20:24:03 +08:00
01331888c5 Make empty string consistent between STL and MTL: https://github.com/hankcs/HanLP/issues/1778 hankcs 2022-08-26 21:02:53 -04:00
9210d5aa78 Fix empty string tokens in TransformerSequenceTokenizer fix: https://github.com/hankcs/HanLP/issues/1778 hankcs 2022-08-26 02:56:48 -04:00
76bde7817d Script to train SOTA PKU CWS hankcs 2022-08-24 12:53:45 -04:00
b216b242e1 Merge pull request #1775 from carl10086/bugfix/bintrie_parsetext hankcs 2022-08-12 01:30:27 -04:00
3dddcfe82a Revise documentation hankcs 2022-08-12 01:06:58 -04:00
551d578ab5 bugfix: fix the bug where the bintrie exits the loop early during full segmentation carl10086 2022-08-12 00:15:44 +08:00
ea17ae2283 Fix tokenizer evaluation during training fix: https://github.com/hankcs/HanLP/issues/1773 hankcs 2022-08-11 02:08:13 -04:00
866d8a639a Ask users to read the doc when they try to set dict for a MTL component hankcs 2022-08-10 22:21:43 -04:00
8fe00ad4c5 Update MUL-MTL model with SDP fixed hankcs 2022-07-31 16:51:34 -04:00
6d1d6a0f19 Release grammatical_error_correction APIs hankcs 2022-07-29 19:11:29 -04:00
c3d90154c3 Disable MPS on M1 due to its poor robustness hankcs 2022-07-19 17:57:09 -04:00
75c95b9fbe Update the SDP model hankcs 2022-07-19 17:22:17 -04:00
342481b245 Fix sdp that root doesn't get learnt hankcs 2022-07-18 19:54:39 -04:00
4339b09a29 avoid cycle between a pair of nodes hankcs 2022-07-18 18:57:48 -04:00
6dbd5a8e42 Revise documentation hankcs 2022-07-15 20:20:19 -04:00
2a628ead79 Fix decompression on Windows fix: https://github.com/hankcs/HanLP/issues/1757 hankcs 2022-07-07 01:39:34 -04:00
5d14517cf5 Fix cases that a single char gets split into multiple subtokens fix: https://github.com/hankcs/HanLP/issues/1756 fix: https://github.com/hankcs/HanLP/issues/1754 hankcs 2022-07-05 21:46:07 -04:00
eca5f99e58 Improve pretty_print style hankcs 2022-07-02 16:01:20 -04:00
08724dd82e Improve helper functions for Document hankcs 2022-06-28 21:58:16 -04:00
7a71ceedeb Release a small MTL model trained on our new corpora hankcs 2022-06-26 17:58:57 -04:00
5c53c38f40 Release mMiniLMv2L12 version of MTL on UD210 hankcs 2022-06-21 19:03:20 -04:00
16e32af76d Revise documentation hankcs 2022-06-19 10:23:59 -04:00
3d01174e87 Replace XLM_SMALL with MMINILMV2L6 hankcs 2022-06-19 09:08:50 -04:00
9c8b620df5 Fix transformer tokenizer on CIMERLI™ hankcs 2022-06-17 23:49:24 -04:00
9d9f45c14f Release multilingual tokenizers trained with MiniLMv2 hankcs 2022-06-17 02:32:55 -04:00
c48049287e Prepare to retire SUBWORD_ENCODING_CWS hankcs 2022-06-16 10:10:46 -04:00
3a3b246d62 Release mMiniLMv2 with spaces pruned hankcs 2022-06-16 09:57:40 -04:00
7616dbb4cf Update two tok models trained on 100m corpora hankcs 2022-06-15 23:24:53 -04:00
4fbd69bf82 Release a multilingual MTL model trained with MiniLMv2 hankcs 2022-06-15 22:13:26 -04:00
3e0d16eb0b Update UD_TOK_XLM_SMALL model hankcs 2022-06-15 21:37:58 -04:00
d16773de6b Revise documentation hankcs 2022-06-15 20:43:33 -04:00
65b1e58044 transformer_layers means number of bottom layers hankcs 2022-06-15 18:07:05 -04:00
18275b54f4 Expose only split_sentence hankcs 2022-06-15 16:19:49 -04:00
c9317aeb80 Fix edge cases in split_sentence hankcs 2022-06-15 16:16:50 -04:00
584ce7e5ed Release a multilingual tokenizer trained with MiniLMv2 hankcs 2022-06-14 20:14:02 -04:00
52cf3b5ecc Activate dict_force in load hankcs 2022-06-14 20:13:55 -04:00
df8308af42 Release xlm-roberta-small-no-space which has spaces pruned hankcs 2022-06-13 12:02:44 -04:00
044156a6dc Revise documentation hankcs 2022-06-12 14:07:20 -04:00
ee3d178fd3 Update two tok models with F1 > 98% hankcs 2022-06-12 11:43:39 -04:00
606fe2fb3c Release xlm-roberta-base-no-space which has spaces pruned hankcs 2022-06-10 23:52:34 -04:00
17492c1edb Fix pruning using max_seq_len hankcs 2022-06-10 23:52:13 -04:00
9b1ed200b8 Support eval_trn to speed up training hankcs 2021-06-02 20:48:14 -04:00
1c474e3f1f Revise documentation hankcs 2022-06-09 21:58:16 -04:00
e1c07005b0 max_sequence_length of TransformerEncoder defaults to max_position_embeddings hankcs 2022-06-09 20:07:43 -04:00
646ca7d57e Support 130 languages trained on Universal Dependencies 2.10 hankcs 2022-06-08 01:15:25 -04:00
c78515c872 Fix offset generated with dict_force hankcs 2022-06-07 22:07:37 -04:00
924c768ba7 Support accelerated PyTorch on macOS M1 chips: https://www.hankcs.com/nlp/hanlp-official-m1-support.html hankcs 2022-06-06 23:22:33 -04:00
8721373f15 Support Universal Dependencies 2.10 hankcs 2022-05-18 22:25:45 -04:00
68e6527dfd Improve error log hankcs 2022-05-12 22:37:49 -04:00
672d662c9a Deprecated length_field. Since the memory consumption is dominated by encoders, input_ids is always the field that determines the length of a sample. hankcs 2022-05-11 18:07:16 -04:00
11113b8027 Release MSR_TOK_ELECTRA_BASE_CRF model hankcs 2022-05-07 19:50:55 -04:00
3fb16ccc0a Revise documentation hankcs 2022-05-04 20:23:24 -04:00
65058e4033 Release RESTful extractive_summarization APIs hankcs 2022-05-04 11:51:00 -04:00
c371be1719 Release two Electra base tok models trained on CTB9 hankcs 2022-04-26 11:34:24 -04:00
94b88c0997 Warn the user that only zh supports coarse tokenization hankcs 2022-04-24 23:10:15 -04:00
2d5aba2b09 Improve the robustness of SRL visualization hankcs 2022-04-20 19:53:42 -04:00
eb35d12533 Revise documentation hankcs 2022-04-20 12:55:08 -04:00
7af95780e7 Fix the len of trie fix: https://github.com/hankcs/HanLP/issues/1728 hankcs 2022-04-30 14:46:01 -05:00
77217d5e3e Fix output_spans with dict_combine fix: https://github.com/hankcs/HanLP/issues/1727 hankcs 2022-04-20 10:04:05 -04:00
396568c355 Give PadSequenceDataLoader the option to skip padding hankcs 2022-04-19 13:59:12 -04:00
53179223de Fix fasttext URL in PTB_POS_RNN_FASTTEXT_EN hankcs 2022-04-18 10:20:06 -04:00
86c68657e3 Revise documentation hankcs 2022-04-16 16:06:13 -04:00
e52dc9f4d0 Fix matching issue caused by dict_force in e7eb64b05b fix: https://github.com/hankcs/HanLP/issues/1722 hankcs 2022-04-16 16:06:13 -04:00
d90717d6a1 Revise documentation hankcs 2022-04-16 15:37:29 -04:00
95f89563c5 Release RESTful keyphrase_extraction APIs hankcs 2022-04-15 23:48:45 -04:00
9b3a786ea5 Release a Chinese MRP model with Mengzi PLM hankcs 2022-04-15 12:29:06 -04:00
15bb02f3d5 Fix edge cases of empty inputs for MTL hankcs 2022-04-14 11:50:39 -04:00
ea11f96778 Use the latest perin-parser hankcs 2022-04-14 00:28:29 -04:00
19eb659fee Release RESTful abstract_meaning_representation APIs hankcs 2022-04-13 02:45:59 -04:00
26ff093c0a Release a SOTA joint Chinese-English AMR model hankcs 2022-04-12 22:41:09 -04:00
a808b0fa27 Fix training CRF in TaggingNamedEntityRecognition: https://bbs.hankcs.com/t/topic/4132/3?u=hankcs hankcs 2022-04-11 23:42:07 -04:00
867cc8da53 Fix the scorer.boost bug in the score computation of the text recommendation scorer fix: https://github.com/hankcs/HanLP/issues/1718 hankcs 2022-04-09 13:40:25 -04:00
342a2b272b Revise documentation hankcs 2022-04-07 23:59:21 -04:00
740e37d4d1 Add language parameter to hanlp_restful.HanLPClient.__call__ hankcs 2022-04-06 19:20:40 -04:00
dd18590d4e Improve visualization of constituency tree hankcs 2022-04-01 18:40:39 -04:00
cb6ee6c167 Release an ERNIE-GRAM constituency model hankcs 2022-03-31 12:50:29 -04:00
4467a3c88f Optimize merging sub-tokens hankcs 2022-03-22 19:10:59 -04:00
e843a4e66a Test on ubuntu-latest, macos-latest, windows-latest hankcs 2022-03-22 18:51:44 -04:00
e7eb64b05b Let dict_force match original text directly hankcs 2022-03-22 18:15:37 -04:00