Commit Graph

  • 54713244e0 Add a SetPhraseHead operation for rearranging words, useful in the short term for updating name phrases in the UD datasets (especially the Sindhi one) so the first word in a name is the head via flat relations John Bauer 2025-05-05 22:32:39 -07:00
  • f2c6d40f29 Oops, allow an EditNode which only removes morpho features John Bauer 2025-05-05 14:56:08 -07:00
  • 6e6a7d0697 Allow lowercase -removemorphofeatures John Bauer 2025-05-05 14:52:20 -07:00
  • 5e4ca9cae6 The previous formatting looked horrible. Try to update the javadoc layout for Ssurgeon John Bauer 2025-05-01 08:48:20 -07:00
  • cf540f9ccd Add a Semgrex feature to test for connection in either direction, either gov or dep. Greatly simplifies expressions which previously needed to look in both directions John Bauer 2025-04-19 00:45:57 -07:00
  • 77f72914ac When using Ssurgeon to split a word, and we know exactly how we want the word split, we can provide those splits rather than trying to regex our way to the right solution John Bauer 2025-04-15 13:22:10 -07:00
  • 634536287e Chrome is blocking .bib downloads with http, so switch the link to https John Bauer 2025-04-07 16:32:15 -07:00
  • 54fec11b6b Add the ability to search for morphofeatures using a regex in the key. Greatly simplifies certain searches for whether or not a feature exists, for example John Bauer 2025-04-06 17:24:48 -07:00
  • 0f1997f1ec Merge branch 'dev' of github.com:stanfordnlp/CoreNLP into dev John Bauer 2025-04-06 17:22:59 -07:00
  • fbca04d1dc Add the ability to remove just a single Morpho feature without having set it. Turns out, the one who needed that feature was me. Also, leave a note on a bomb that can happen if an operation is put over two lines John Bauer 2025-04-06 17:22:52 -07:00
  • ff8d700ea0 Mention other places where jakarta.json is used. Christopher Manning 2025-03-28 14:21:04 -07:00
  • 96e5cdd266 Update links for version 4.5.9 John Bauer 2025-03-24 01:00:25 -07:00
  • cabc020812 Update readmes etc for a version 4.5.9 release [v4.5.9] John Bauer 2025-03-23 23:38:02 -07:00
  • 94739c7062 Added lemmas to tsv even without verbose output format option churow 2025-03-17 10:52:30 +09:00
  • bf3af4027b Remove the functionality to allow any user specified deserializer - it is pointed out that it may be possible for a properly constructed deserializer to execute arbitrary code. See https://github.com/stanfordnlp/CoreNLP/security/advisories/GHSA-wv35-hv9v-526p John Bauer 2025-03-22 01:10:33 -07:00
  • 346259c0c1 Fix a buggy test for broken semgrex contains features John Bauer 2025-03-21 13:40:34 -07:00
  • b5719a2d57 Allow negative constraints that match an existing positive constraint. Actually, there may not even be a reason to disallow two positive constraints if they are both regex John Bauer 2025-03-21 13:38:56 -07:00
  • 821b8fd0fe Add documentation for some of the features recently added to Semgrex and Ssurgeon John Bauer 2025-03-08 16:46:42 -08:00
  • dd511c44b6 Remove CoreNLP naturalli/demo, now archived in CoreNLP-research John Bauer 2025-03-07 20:02:20 -08:00
  • 3c30b3b8c9 Rewrite the contains syntax to look a bit more like a map. John Bauer 2025-03-05 23:53:21 -08:00
  • ff1d903d62 Add a negative containment to semgrex to match the containment option John Bauer 2025-02-27 08:08:18 -08:00
  • 84ac9328e2 Add a node containment option to semgrex that works on CoreAnnotations which are Maps. John Bauer 2025-02-26 12:53:20 -08:00
  • 3a89d67637 Verify that two more keywords used in Ssurgeon don't conflict with AnnotationLookup keys John Bauer 2025-03-06 11:50:37 -08:00
  • 8e7d1218a7 Add functionality to Ssurgeon that allows for removing a field (such as lemma) from a node John Bauer 2025-03-06 11:43:28 -08:00
  • 156fad1352 Add to Ssurgeon a ReindexGraph operation which recounts the indices on the nodes John Bauer 2025-03-04 23:26:04 -08:00
  • 499eb5b096 Move the Ssurgeon reindex operations from AddDep to SsurgeonUtils John Bauer 2025-03-04 18:23:38 -08:00
  • dcee001522 Small semgrex doc fix John Bauer 2025-02-26 16:47:22 -08:00
  • 0de1865fbb Refactor checking matches for Attributes. Also, functionality change - if an Attribute is negated, accept the node not having any value at all John Bauer 2025-02-26 16:44:19 -08:00
  • bff35886ad When creating a NodePattern from NodeAttributes in Semgrex, pass around the NodeAttributes as a single object instead of passing individual pieces. Will make it easier to add more pieces to the NodeAttributes John Bauer 2025-02-26 13:01:50 -08:00
  • 4bdd5fd6d2 This is a Map, not a List John Bauer 2025-02-26 11:37:23 -08:00
  • d2895eebd7 Add a tiny bit of doc John Bauer 2025-02-26 11:31:21 -08:00
  • d1e483dce4 Update a comment on the -updateMorphoFeatures ability of Ssurgeon EditNode John Bauer 2025-02-26 11:08:14 -08:00
  • f16b16e113 Simplify - remove duplicate CoNLLUTagUpdater script (only difference being, write to disk or write to stdout) John Bauer 2025-02-24 23:57:14 -08:00
  • 1a1f4b147a Add a small link to the python version of the nndep trainer John Bauer 2025-02-24 17:48:30 -08:00
  • 7de9961e71 Update some more links & tables John Bauer 2025-02-24 17:37:01 -08:00
  • cb35ff53d1 Attempt to fix some tables and links in the dependency parser page John Bauer 2025-02-24 17:29:58 -08:00
  • ec801b3d2d Try to fix missing nndep-example.png John Bauer 2025-02-24 17:14:10 -08:00
  • 7399e9b416 Add the capacity to negate attributes in a node, rather than requiring negative lookahead regex John Bauer 2025-02-19 01:19:56 -08:00
  • 81290ba443 Whitespace in Semgrex John Bauer 2025-02-12 18:02:08 -08:00
  • 298df018b0 Turn isLink final John Bauer 2025-02-12 17:29:44 -08:00
  • 4f15b08899 Add a couple more demonyms from the LinES and ParTUT treebanks John Bauer 2024-12-30 23:43:23 -08:00
  • 0be6b7f03f Whitespace John Bauer 2025-02-11 22:57:29 -08:00
  • 5004daa9f4 Update for release of 4.5.8 John Bauer 2024-12-28 23:42:24 -08:00
  • 2241e83553 Update info for .NET Christopher Manning 2024-12-23 17:16:56 -08:00
  • 59706394c2 Update the README for a full release of 4.5.8 [v4.5.8] John Bauer 2024-12-12 13:31:47 -08:00
  • 9b1c058d8e Oops, address of new json library was incorrect John Bauer 2024-12-11 12:28:33 -08:00
  • 698d84b82f Add a prop file for a newer version of the French WikiNER dataset John Bauer 2024-12-04 03:12:16 -08:00
  • 766013c1bf UD converter now doesn't put SpaceAfter=no on the first or middle words of an MWT John Bauer 2024-11-28 11:23:17 -08:00
  • 87fe2ef82e use the get which returns null instead of throwing John Bauer 2024-11-28 03:16:51 -08:00
  • 2341d3310c Don't put SpaceAfter=No annotations on words which are at the start or middle of an MWT John Bauer 2024-11-27 23:20:02 -08:00
  • f8b69c46ca Add a line about how to use a specific tokenizer model, properly escaping the model name, as part of wget. https://github.com/stanfordnlp/CoreNLP/issues/1470 John Bauer 2024-11-27 15:07:32 -08:00
  • 6f6eb93585 Need to cache tokenize annotators based on the segment properties as well John Bauer 2024-11-27 01:33:01 -08:00
  • 7a0dc7b60a Add an explanation on how to escape the Arabic segmenter name John Bauer 2024-11-27 01:20:13 -08:00
  • 9732f82e63 Add a bit of doc about a particular converter feature John Bauer 2024-11-12 13:27:00 -08:00
  • 26f40ee9c8 When processing "not only" and similar phrases into UD, separate them from the CONJP (sometimes ADVP by error) that they show up in. This allows the later part of the converter to connect both of them to the parent with advmod. John Bauer 2024-11-07 21:05:56 -08:00
  • 7ee07602ff update LS First -> RB First in the Treebank Corrector John Bauer 2024-11-07 09:34:46 -08:00
  • 1fe3a3e717 Sort the lines of the graph when printing it for debug purposes, so that random hash ordering doesn't cause the graph to have a different output John Bauer 2024-06-04 20:37:50 -07:00
  • 819126fdbb Fixes Yoda speak dependencies John Bauer 2024-06-04 17:53:52 -07:00
  • 580a135b00 Add a bunch of words not previously included in the say regex John Bauer 2024-05-18 12:23:51 -07:00
  • cf11fbf309 Alphabetize the regex for English say patterns John Bauer 2024-05-18 11:35:01 -07:00
  • d9f6589de1 Move the corrector earlier in the UDConverter process. Uses the corrected trees for the structure of the UD graphs, not just the tags. Noticeably reduces the number of validator errors John Bauer 2024-04-30 19:09:14 -07:00
  • f2df23094d Add an array-like constructor for the CompositeTreeTransformer John Bauer 2024-04-29 23:02:33 -07:00
  • 3fef54a8fd If the Corrector is used, use its xpos tags as well when building the conll John Bauer 2024-04-29 19:31:03 -07:00
  • cd9482375b Update a couple trees to have adverb types instead of ADP. Need to make sure the XPOS tags get updated in the converter when using the PTB corrector John Bauer 2024-04-29 18:53:04 -07:00
  • 5bac5eb3e8 Also flatten combined RB or ADVP phrases John Bauer 2024-04-29 09:29:41 -07:00
  • 62c4ea4579 Useful debugging output from CoordinationTransformer. Could think about changing TreeGraphNode to print out the whole tree, but that would presumably mess up some various dependency outputs John Bauer 2024-04-29 09:28:00 -07:00
  • 339e301309 Remove the QP in a structure such as '(NP (QP About a) day)' so that the resulting dependencies both connect to day instead of from about -> a, changing the UD nummod to a det John Bauer 2024-04-29 08:06:58 -07:00
  • e27d041936 UPOS for LS can be NUM, not X - for example, first, 1), a) John Bauer 2024-04-28 01:17:15 -07:00
  • 018ad17263 Change the dependency relation of list items to discourse instead of nummod, as described in https://github.com/UniversalDependencies/UD_English-EWT/issues/518 John Bauer 2024-04-27 16:13:03 -07:00
  • 5047ac1a2d Add files such as the fonts and brat js libraries to the server as local files John Bauer 2024-11-06 17:18:56 -08:00
  • 461db9114d Only allow the contextRoot to process annotations. Should make it so the server no longer appears to have a WordPress blog for the automated security checks John Bauer 2024-11-06 01:42:32 -08:00
  • 2c574ad3f3 Update Lucene from 7.5.0 to 9.9.2 as requested in https://github.com/stanfordnlp/CoreNLP/issues/1408 to address https://osv.dev/vulnerability/OSV-2023-705 [lucene_update] John Bauer 2024-01-17 15:31:27 -08:00
  • cfa4349bc4 Apparently MDT is too ambiguous to expect SUTime to get a canonical time zone, unless we add our own locale-aware time zone disambiguator. https://stackoverflow.com/questions/79116972 John Bauer 2024-10-23 08:49:07 -07:00
  • c55691c262 Update SUTime to pick timezones based on the current year John Bauer 2024-10-22 16:53:48 -07:00
  • 50179209d2 Test that icecream is merged as expected (a test case from a user) John Bauer 2024-10-21 17:58:40 -07:00
  • e83284920c Update joda-time to 2.13.0 to hopefully fix the Moscow timezone unit test. Probably should update to Java time instead John Bauer 2024-10-18 17:46:22 -07:00
  • c3be0fdb8c Remove WTS annotator in sutime itest John Bauer 2024-10-18 13:39:13 -07:00
  • 98c0b7d50a Add an option to skip the MWT in a conllu file when training a tagger John Bauer 2024-10-16 21:47:10 -07:00
  • e87f437888 Also output the known tags in a dataset after the dataset has been retagged in the srparser John Bauer 2024-10-16 17:22:18 -07:00
  • 614b9368d9 Add a logging line which tells us which tags are in the tagger used by the srparser John Bauer 2024-10-16 17:09:46 -07:00
  • 83b38bb394 Update test score for German model to reflect latest UD dataset. Add a bit more explanation to the missed accuracy assertion John Bauer 2024-10-15 14:33:56 -07:00
  • e1e7227ec8 update lucene to 7.7.3 https://github.com/stanfordnlp/CoreNLP/issues/1465 John Bauer 2024-10-08 19:22:03 -07:00
  • 79ddc313b4 Oops, need to update the sample .xml as well John Bauer 2024-10-08 19:52:46 -07:00
  • f8e48099da Update javax to jakarta json 1.1.6 to hopefully fix some security issues. https://github.com/stanfordnlp/CoreNLP/issues/1465 John Bauer 2024-10-08 16:59:50 -07:00
  • f2b9441333 Update protobuf from 3.19.6 to 3.25.5 https://github.com/stanfordnlp/CoreNLP/issues/1465 John Bauer 2024-09-24 13:56:54 -07:00
  • 05804a35df Remove recursion - really weird graphs can cause a stack overflow. Instead, just loop until the vertexIterator is used up. Observed in https://github.com/stanfordnlp/CoreNLP/issues/1461 John Bauer 2024-08-15 05:26:18 -07:00
  • 154fd142ce Move LuceneSentenceIndex out of util to patterns (only place where used) Christopher Manning 2024-07-29 10:27:32 -07:00
  • 404adab67f Remove single dependency of another package on edu.stanford.nlp.patterns Christopher Manning 2024-07-22 16:05:26 -07:00
  • afb76925e8 Tiny cleanup; no functional changes Christopher Manning 2024-07-22 15:13:12 -07:00
  • 0e39b3731d Add the ability to mark newly created nodes with names in the SemgrexMatcher, allowing for a compound operation which then assigns more fields to that node John Bauer 2024-07-02 13:46:22 -07:00
  • 13ede5a265 Add an Ssurgeon feature which splits a word into pieces based on regex matches. A word can be specified as the head of the new pieces, along with the relation. Other words are pushed down the sentence to make the indices line up John Bauer 2024-07-02 01:00:09 -07:00
  • bf8ee06747 weight was not being used in AddDep... generally not likely to matter though John Bauer 2024-07-01 01:00:20 -07:00
  • 147552b56e My fault for not noticing this BS earlier John Bauer 2024-06-27 10:28:57 -07:00
  • a3532c2c69 Oops, got the semantics of subList wrong in the debugging code John Bauer 2024-06-26 16:33:49 -07:00
  • 3a35e53909 Whitespace John Bauer 2024-06-26 16:32:35 -07:00
  • 899204a68b Refactor somewhat? Will allow for logging more stuff upon failure John Bauer 2024-06-25 20:08:15 -07:00
  • 6e554c7f64 Try to throw an exception with the end of the log file if the coref benchmark test doesn't work as expected John Bauer 2024-06-25 19:59:42 -07:00
  • 243a8df032 austrian german month names for german tokenizer post processor Bernhard 2024-06-24 21:53:18 +02:00
  • 935d2fa409 Print out LOG FILE when printing the log file. Whitespace align some stuff John Bauer 2024-06-18 01:01:53 -07:00
  • ce88c9cd68 why :( John Bauer 2024-06-04 15:55:20 -07:00