Commit Graph

  • 8bdf456c20 Remove obsolete documentation from RemoveNamedEdge John Bauer 2023-03-01 14:21:24 -08:00
  • 9ad7886057 Update comments in Semgrex - not sure that single root matches is still a limitation John Bauer 2023-01-26 08:51:58 -08:00
  • b77103cd68 Fix a comment John Bauer 2023-01-23 18:18:21 -08:00
  • edaaf83ecd Add an exception for a missing node name to setRoots. Test the results of some basic setRoots operations John Bauer 2023-01-23 16:06:11 -08:00
  • e94c812ab1 Some Ssurgeon commands don't use the standard argsbox. Parse those separately before using the argsbox (otherwise SetRoots won't correctly parse, for example) John Bauer 2023-01-23 15:12:42 -08:00
  • 45f8c945f2 Brief test that killAllIncomingEdges does what we expect. Use a list instead of an iterator to find the edges to be deleted (to ensure the iteration doesn't mess up the maps while iterating) John Bauer 2023-01-21 09:00:34 -08:00
  • 1b9203e81b No checked Exceptions when building a CoreLabel from items. Throw UnsupportedOperationException immediately for a type which can't be converted when building a CoreLabel from items John Bauer 2023-03-04 19:17:38 -08:00
  • e7a7657713 Get rid of setRelation - it was simply broken as the various edges might be in Set or Map, and changing the relation would change the hash key John Bauer 2023-03-03 18:13:41 -08:00
  • e3e01a5c23 Make this test junit 4 and make it not hide failures John Bauer 2023-03-05 11:10:46 -08:00
  • 78cf1afb3e Add a test that the ProtobufAnnotationSerializer keeps the tree labels as the same objects when deserializing a document John Bauer 2023-02-27 18:04:43 -08:00
  • e0b6e4a2f6 The punchline to the previous few commits - now the words from the gold tree can be used to determine whether or not to eliminate the words in the guess tree. This will make it so the test & gold trees are the same, hopefully eliminating most or all of the 'Unable to evaluate...' that happens after retagging trees with the POS tagger John Bauer 2023-02-22 23:43:07 -08:00
  • 954facf8c2 Update the Collinizer interface to allow for two trees, both the test tree and the gold tree John Bauer 2023-02-20 12:41:02 -08:00
  • 996e4e9cbd Separate the AbstractCollinizer from the TreeTransformer John Bauer 2023-02-25 16:12:00 -08:00
  • 5e2ec0fbeb Make a new base class for the Collinizers - eventually we will separate the transform method so the gold & guess trees can have the same punctuation treatment John Bauer 2023-02-20 09:02:38 -08:00
  • c2aef03813 Remove a bunch of obscure eval code which is not used anywhere in the main repo - it has been moved to -research John Bauer 2023-02-22 22:15:47 -08:00
  • e274c4605b Collinizer is not used in this script, apparently. Not sure what this script even does (cdm had mentioned deleting it) John Bauer 2023-02-21 23:14:32 -08:00
  • 49650f643a Whitespace John Bauer 2023-02-21 21:34:04 -08:00
  • 33e6c42b37 Add the choice of dependency graph to output to the TextOutputter, as requested in #1339 John Bauer 2023-02-18 23:25:45 -08:00
  • 5c085f0cdf Fix a weird typo John Bauer 2023-02-18 22:45:34 -08:00
  • 812317be1d Switch the semantics of = and ~ in Semgrex. Now, =foo on an edge means subsequent uses of = must match the same exact edge, whereas ~ means the relation name must be the same. Hopefully all existing uses are not broken since they care about the relation string but not the match type, and the relation string can still be matched John Bauer 2023-01-31 18:34:52 -08:00
  • 74fc2bdcca resetRoots clears roots if there are no nodes at all. This addresses a very minor bug where deleting an entire graph (such as with a Ssurgeon rule that accidentally deleted everything) results in a graph that can't be printed without NPE John Bauer 2023-02-03 09:14:48 -08:00
  • 52d05fc84b Use edgeName to represent both the edge itself and its relation. We want to switch = and ~ in the SemgrexPattern without ruining old patterns. Since the Javadoc (incorrectly) stated that backreferences were not working for =, hopefully no one will have patterns using =, so as long as we still support the naming function, switching won't break anything... John Bauer 2023-01-31 18:05:29 -08:00
  • 00dee0732b Debug output using the SemgrexPattern to show how something matched John Bauer 2023-01-29 19:51:06 -08:00
  • dbdb55b32f Fix typo in per_children.rules John Bauer 2023-01-30 16:17:34 -08:00
  • 5a34b90c94 Save the semgrex rules for EN, ZH, ES to git - will be useful for tracking changes or just so other people can see the rules John Bauer 2023-01-30 09:53:03 -08:00
  • 60c685e777 MentionsAnnotation is used by KBP but isn't set... this certainly looks like it should be a requirement, not a satisfied John Bauer 2023-01-29 14:04:38 -08:00
  • 92a7227a7f Add release doc for some recent CoreNLP - some notes are missing though John Bauer 2023-01-25 16:14:34 -08:00
  • b61f768628 Update 4.5.1 -> 4.5.2 John Bauer 2023-01-20 17:01:25 -08:00
  • a8aaaf2002 Update download links for 4.5.2 (tag: v4.5.2) John Bauer 2023-01-20 16:58:17 -08:00
  • 7c3b91a0d7 Update poms & readmes for 4.5.2 John Bauer 2023-01-19 23:53:27 -08:00
  • 232931dd8d Iterate through all the patterns in an Ssurgeon, even if one does (or doesn't) change the graph John Bauer 2023-01-19 19:01:57 -08:00
  • 9c3dfee5af Add a CLI processor which processes Ssurgeon requests. Will be the CoreNLP side of an Ssurgeon interface for Python. Included in this change is adding optional tokens to the DependencyGraph John Bauer 2023-01-19 17:24:56 -08:00
  • 9a39cf0a84 Add any named edges to the results of a Semgrex CLI request John Bauer 2023-01-18 13:29:41 -08:00
  • 4ed3dc020a Allow reln == null to represent not having any requirements on the label of the edge to be deleted. Essentially, it deletes all the edges between the two nodes. Still need to test the gov / dep wildcard variants of those - they seem less useful when calling iterate(), though John Bauer 2023-01-15 23:41:19 -08:00
  • 195e259e36 Zero argument commands in Ssurgeon need slightly different arg parsing. This is the only bugfix needed to make killNonRooted work as expected John Bauer 2023-01-15 20:17:13 -08:00
  • 2823808cfa Upgrade Ssurgeon RemoveNamedEdge to remove edges based on their name (it was currently unused everywhere, including RTE, so changing the interface should be no harm done) John Bauer 2023-01-15 15:54:51 -08:00
  • f9ee36f4b7 Ssurgeon KillNonRootedNodes: comment on how this works John Bauer 2023-01-13 07:50:02 -08:00
  • 7ad1d9c9bb Update the prune operation to only reset the roots if a root is deleted John Bauer 2023-01-13 01:35:26 -08:00
  • b606bec620 Add a test to Ssurgeon - removeEdge already works as we expect it to John Bauer 2023-01-10 22:14:06 -08:00
  • e870f2acd0 Add an 'iterate' method to Ssurgeon which repeats an operation until it no longer does anything new John Bauer 2023-01-10 11:12:16 -08:00
  • 63921945b1 Ssurgeon Javadoc fix John Bauer 2022-12-07 17:43:53 -08:00
  • 0688a0118d Add a boolean which returns whether or not something was changed by a SsurgeonEdit evaluation. Add a bunch of notes to the operations, including some notes on whether or not they are used in RTE John Bauer 2022-11-04 18:30:04 -07:00
  • 6deae6726f Executes a single addEdge. Perhaps need to investigate making this work multiple times over a single graph instead of returning all possible single modifications of the graph John Bauer 2022-11-03 00:14:07 -07:00
  • 774e59981c wait a second... John Bauer 2022-11-02 23:53:25 -07:00
  • 91ded427df Add an empty test to check the simplest XML format for an Ssurgeon operation John Bauer 2022-11-02 23:45:35 -07:00
  • 0c70cef986 Turn a bunch more exceptions into RuntimeException instead of checked in Ssurgeon John Bauer 2022-11-02 23:13:39 -07:00
  • b5453d5e32 Simplify exception handling in Ssurgeon John Bauer 2022-11-02 23:04:52 -07:00
  • 3e9cc91851 Make a few things final for better readability in Ssurgeon John Bauer 2022-11-02 23:01:20 -07:00
  • 2ddef3d11d Align some whitespace for better readability in Ssurgeon John Bauer 2022-11-02 23:00:33 -07:00
  • c39d25bc6a Save the edge being matched in the GraphRelation. John Bauer 2023-01-14 21:29:57 -08:00
  • 4b1728ccb1 Backreferencing works fine for relation names. Update the documentation and add a test of the feature John Bauer 2023-01-14 22:28:55 -08:00
  • 352f01e194 Make a Semgrex reference an actual link for use in javadoc John Bauer 2022-11-08 00:35:56 -08:00
  • e1080d3906 Add to Semgrex some of the other relations defined by spaCy's implementation of semgrex: <++ <-- >++ >-- John Bauer 2022-11-02 18:21:19 -07:00
  • f4497f889f Add a couple left/right relations similar to the existing one to Semgrex. Implementations of the spaCy extensions of the same operations John Bauer 2022-11-01 17:57:41 -07:00
  • 379503c2f7 Make a couple items in SemanticGraphEdge final John Bauer 2023-01-13 14:48:39 -08:00
  • c8ead705e1 Knowing that Semgraph member is a Set should be easier to understand when looking at the collection of roots John Bauer 2023-01-13 01:31:50 -08:00
  • a48971134c Add the ability to reuse indices in SemanticGraph.valueOf. This possibly changes the meaning of existing expressions, since it was previously possible to assign multiple words to the same index, but that was a bad feature anyway John Bauer 2023-01-10 22:00:45 -08:00
  • 8c01472dbb Add a couple simple tests of SemanticGraph.valueOf John Bauer 2023-01-10 21:54:45 -08:00
  • a536c80c0d Use the incoming edge iterator instead of the parent nodes. A bit cleaner, and a later change to keep the matched edge will be much easier to write John Bauer 2023-01-15 11:05:58 -08:00
  • c2bfb57186 Return dependencies with punctuation dependencies when converting constituencies to dependencies in the CLI John Bauer 2023-01-09 15:05:07 -08:00
  • d978de1795 Update xom from 1.3.7 to 1.3.8 to hopefully make the transitive dependency xerces unneeded (it has a security vulnerability). I do not believe the features of xom which CoreNLP uses need xerces John Bauer 2023-01-04 01:07:24 -08:00
  • d9d5fb9e8e Add a Java command line tool which converts trees to dependency graphs using protobufs. Included is an update to SemanticGraph.valueOf to set a sentIndex and an option in Tree to yield CoreLabels with the word as the Value instead of the tag John Bauer 2023-01-02 12:08:07 -08:00
  • 67783e3328 Add a proto field for a conversion from constituency to dependency trees John Bauer 2023-01-02 12:05:07 -08:00
  • 3864cd4d11 You can't tell me what to do! John Bauer 2023-01-02 10:22:46 -08:00
  • 8709dc0e8e Instead of deserializing ParseTree objects before reconstructing the tokens, wait until we have the final sentence tokens so we can attach those to the leaves of the Tree(s) being created John Bauer 2022-12-15 10:35:03 -08:00
  • 83612aafec Refactor some shared code for processing serialized annotations and combining them with the tokens John Bauer 2022-12-15 10:07:01 -08:00
  • b6c0ccde11 Update single letter variable name John Bauer 2022-12-14 10:20:07 -08:00
  • efdf511898 If the entitymentions annotator ran (indicated by MentionsAnnotation being set), return an empty list of entityMentions for an empty document. Addresses #1322 John Bauer 2022-12-07 17:13:40 -08:00
  • 98f8ba38d9 Update libraries used in the corenlp webapp John Bauer 2022-11-24 14:11:43 -08:00
  • c4c294e226 fix the 'Last updated' field John Bauer 2022-11-24 13:45:10 -08:00
  • 3013cb11f0 Update parser demo to use serDictionary and a more recent segmenter model John Bauer 2022-11-24 13:43:22 -08:00
  • 77ce2968df Update documentation on splitting c'mon Christopher Manning 2022-11-22 10:29:16 -08:00
  • ffdf0125bf Fix the tokenization of 'email' or other things that start with 'em while hopefully not affecting the tokenization of other words. Addresses #1316 John Bauer 2022-11-19 20:13:31 -08:00
  • 10aed64750 Fix an incorrect comment John Bauer 2022-11-19 18:39:42 -08:00
  • 520a0c1c57 Apparent issue in the reading code: the data might not come to Java fast enough from the Python client (or wherever) for it to read the whole thing in one bite. To fix this, we read chunks until reading a chunk fails or there are no more chunks. John Bauer 2022-11-08 18:11:26 -08:00
  • 17b5487c4c Update protos in dev branch to use the new 3.19.6 compiler John Bauer 2022-11-04 00:15:33 -07:00
  • 5b1266892a Update protobuf to 3.19.6 - will avoid a reported vulnerability. Eventually will need to upgrade to a newer version 3.21 or 4 John Bauer 2022-11-04 00:08:24 -07:00
  • 34f4bddba0 Reverse the order of I->J so that tests looking at direction can be used with parents & children John Bauer 2022-11-02 10:09:52 -07:00
  • 8e6cc983b6 Fix a few random javadoc errors. No functional changes John Bauer 2022-11-01 15:52:00 -07:00
  • 52e4f93f99 Add a processor which adds lemmas to word_tag John Bauer 2022-10-13 23:01:31 -07:00
  • e280030435 Fix PTB tokenizer unit test to match new tokenization of c'mon John Bauer 2022-10-08 10:08:28 -07:00
  • 002baa3819 Turn off the 'Testing on Treebank' logging if every other logging line is disabled. Allows for multiple uses in a row without being too noisy John Bauer 2022-10-08 06:10:07 -07:00
  • fe02e741ae Add test Christopher Manning 2022-10-04 11:37:46 -07:00
  • 769273d5b9 Add c'mon and 1782117821 to known Christopher Manning 2022-10-04 11:31:29 -07:00
  • 3b852d8037 Add a description for other version lamrongol 2022-12-25 13:29:00 +09:00
  • c61acf3432 Add a guide to install by Gradle on README.md lamrongol 2022-12-25 13:22:41 +09:00
  • 5231c1104f Tiny updates to mentions of java versions Christopher Manning 2022-11-22 10:50:30 -08:00
  • d6ab4d37a8 Remove -Xmx... and update tokenize,ssplit to tokenize in the documentation John Bauer 2022-09-13 23:57:47 -07:00
  • ebf988ea41 Update what's new note for 4.5.1 John Bauer 2022-08-30 13:04:58 -07:00
  • c895e82cf8 Update various links for 4.5.1 John Bauer 2022-08-29 21:16:23 -07:00
  • f7782ff5f2 Merge branch 'main' into dev (tag: v4.5.1) John Bauer 2022-08-29 20:58:39 -07:00
  • d1dd7a66ef Also update readme for new version (we even forgot to do this for 4.5.0) John Bauer 2022-08-29 20:58:27 -07:00
  • af0ec98261 Also update some more pom files to 4.5.1 John Bauer 2022-08-29 20:48:37 -07:00
  • 440b44809c Update various readmes for a 4.5.1 bugfix release John Bauer 2022-08-29 17:24:26 -07:00
  • f99b5ab87f Remove an optimization in semgrex which may be causing crashes John Bauer 2022-08-12 13:47:50 -07:00
  • 755edcf3ee Add a test of the leading comma being split John Bauer 2022-08-23 22:36:42 -07:00
  • 1b12faa64b Make the fallthrough character tokenization also capture unpaired surrogates. Putting it in an | expression should make it so that full codepoints are preferred and half codepoints are only used in an emergency John Bauer 2022-08-18 11:31:18 -07:00
  • 63fda499e9 Allow larger number with comma in fraction denominator and three dollar signs Christopher Manning 2022-08-23 21:01:26 -07:00
  • 945b2764f0 Merge remote-tracking branch 'refs/remotes/origin/dev' into dev Christopher Manning 2022-08-23 20:19:56 -07:00
  • 974383ab73 Adjust NUMBER to not be able to start with a comma Christopher Manning 2022-08-23 20:19:53 -07:00