Commit Graph

  • 682ec349df Also remove AppleJavaExtensions.jar from .classpath (branch: dev) Christopher Manning 2026-02-10 09:29:09 -08:00
  • f1b362f980 Add the VariableStrings to the response to a semgrex query John Bauer 2026-02-04 07:53:30 -08:00
  • 80d35c98f5 Add support for returning matches in order if there is a ::sort operation in the Semgrex John Bauer 2026-02-04 00:23:05 -08:00
  • 3b6eb8e210 Delete old no longer applicable special purpose license Christopher Manning 2026-02-03 11:17:00 -08:00
  • bd2017efe8 Use IdentityHashMaps to find the indices of the sentence and semgrex pattern (changing graphIndex to sentenceIndex). John Bauer 2026-02-02 21:44:47 -08:00
  • 2747d8f5b0 Add a Sort operation... need to make it properly communicate with the python client, though John Bauer 2026-01-28 00:40:50 -08:00
  • 4bfb0de34e Allow for uniq in Semgrex over edge names (which checks based on the relation type) John Bauer 2025-11-26 23:44:55 -08:00
  • 1edb99a23f Add the ability to semgrex to uniq matches based on a regex varstring, including a couple tests of the ability. Also checks for name conflicts with nodes so that there are no weird ambiguous cases John Bauer 2025-11-19 15:50:55 -08:00
  • 3b6063759c Use Deprecated instead of deprecated, so the compiler can stop complaining John Bauer 2025-11-19 13:43:01 -08:00
  • 029417f399 Add a small bit of doc on the variable groups in Semgrex John Bauer 2025-11-19 09:09:56 -08:00
  • 3c2c6a2723 Upgrade Apache commons lang3 to version 3.19.0 Christopher Manning 2025-11-11 16:32:39 -08:00
  • d4dbc90384 Continued cleanup of old AppleJavaExtensions and moving to Java 11 Christopher Manning 2025-11-11 14:59:44 -08:00
  • 5ef39a57a9 Update TregexGUI to use Java 11+ Desktop class (not historical Apple stuff) Christopher Manning 2025-11-11 13:05:02 -08:00
  • 645f407229 Replace deprecated newInstance() method Christopher Manning 2025-11-09 06:01:58 -08:00
  • ea0b5a9628 Replace deprecated newInstance() method Christopher Manning 2025-11-08 13:47:33 -08:00
  • ddd3ba20f2 Added to history gh-pages Christopher Manning 2025-10-27 10:38:57 -07:00
  • 87f92a72b6 Fix a piece of doc John Bauer 2025-10-26 17:55:43 -07:00
  • 09b8ecdb28 fix(io): use Thread.currentThread().getContextClassLoader() instead of system class loader when loading resources Clément Doumouro 2025-10-23 17:49:18 +02:00
  • e2c67a07d8 Connect varGroups (branch: semgrex_edge_variables) John Bauer 2025-08-07 23:08:32 -07:00
  • e7b9256445 Rebuild John Bauer 2025-08-07 23:00:34 -07:00
  • 18816bc1ad Get started connecting edges to the VariableStrings as well. This gonna suck, considering how many relations need to be edited... probably the satisfies() and iterate() methods both need updates John Bauer 2025-06-29 22:47:32 -07:00
  • adb8b7adf7 Oops, should pass around the edgename for the CONNECTED relation John Bauer 2025-08-07 23:10:24 -07:00
  • 8d890c9347 Add an error check for a missing -node parameter in the Ssurgeon SplitWord operation John Bauer 2025-07-15 11:46:39 -07:00
  • 3b6684c438 Add some doc on existing features for SplitWord John Bauer 2025-07-15 11:33:34 -07:00
  • 1c6bf4c20f Fix date and Maven (SonaType) link Christopher Manning 2025-06-30 11:02:44 -07:00
  • b301397207 Implement VariableStrings for the attributes. John Bauer 2025-06-27 22:21:28 -07:00
  • 8eba4a35dc Whitespace John Bauer 2025-06-27 22:40:35 -07:00
  • c414b5e73c Upgrade joda-time to 2.14.0 Christopher Manning 2025-06-27 13:48:23 -07:00
  • 24a00d4f75 Pass the graphNumber to (almost all) the GraphRelations built. Still need to pass multiple graphs to the Semgrex and then to the relations (branch: ssurgeon_features) John Bauer 2025-02-13 00:49:24 -08:00
  • 90749918a7 Add *number as a parsed item in a Semgrex expression. Not implemented in the SemgrexPattern yet, though John Bauer 2023-10-26 16:44:22 -07:00
  • 4f915f1b70 Add the capacity for MergeNodes to merge nodes with the same head. Useful for a case we found where two parts of an over-tokenized time were pointing to the same head, rather than the time being one self-contained phrase John Bauer 2025-06-16 14:23:09 -07:00
  • 120dbba797 Allow for MergeNodes where multiple nodes point out of the group, but to the same node John Bauer 2025-06-14 02:37:02 -07:00
  • 2b4bb1e854 Post 4.5.10 updates to the gh-pages John Bauer 2025-06-07 13:06:59 -07:00
  • 1b7edd19c4 Version bump - remove lucene, add a :: uniq operator to Semgrex (branch: main, tag: v4.5.10) John Bauer 2025-06-07 12:56:15 -07:00
  • 02296fee9d Remove lucene from pom files John Bauer 2025-06-06 09:39:10 -07:00
  • 2d64892bab Add 'uniq' as a keyword to SemgrexParser.jj. Add a UniqPattern which removes duplicates based on the node names given (using the values of those nodes) John Bauer 2025-05-31 12:12:33 -07:00
  • 95cc7ca5f7 SemgrexMatch now supports the other getters for the SemgrexMatcher results John Bauer 2025-06-04 17:23:30 -07:00
  • 5d3367ed17 Upgrade Ssurgeon MergeNodes to treat links inside the same subtree as not relevant to which node is the head. Only consider links outside the subtree when picking a parent John Bauer 2025-06-03 00:13:38 -07:00
  • dc898c58de Attempt to read -pattern as a filename - presumably filenames won't typically work directly as a search string. Read in multiple conllu files if multiple files are provided. John Bauer 2025-06-02 15:58:41 -07:00
  • 3d2c5d4f5d Fix the sentence fiddling when a document boundary is reached John Bauer 2025-06-02 15:50:43 -07:00
  • 8baa096016 Move the batch processing higher up in the file, before the compile methods John Bauer 2025-06-02 13:31:48 -07:00
  • d622867cb6 Add a test of the numbers of things returned by a batch search over a set of graphs John Bauer 2025-06-02 13:27:15 -07:00
  • 570af1ff68 Add a basic toString() to summarize a SemgrexMatch John Bauer 2025-06-02 13:26:48 -07:00
  • 3010f9a97b Refactor a method which processes a list of sentences all at once for a given Semgrex expression. Will make it easier to extend, such as with postprocessing steps John Bauer 2025-06-02 02:35:29 -07:00
  • 3da798173f This is done, actually John Bauer 2025-06-02 02:01:41 -07:00
  • f61ca87762 Separate out the finding of the matches in a Semgrex from the printing of the matches. Will make it easier to do further operations such as sorting or uniqing the matches John Bauer 2025-06-01 01:24:31 -07:00
  • 9358205752 Remove a noisy and mostly useless log line John Bauer 2025-06-01 01:23:35 -07:00
  • 91446297ae Move VariableStrings to util (this may wreck serialized tregex results, if such a thing exists) John Bauer 2025-05-31 18:55:10 -07:00
  • 06e4677879 Consolidate VariableStrings into one location - the two were almost exact copies of each other John Bauer 2025-05-31 18:47:55 -07:00
  • 79fee94843 Add a copy constructor for VariableStrings, switch out the usage of Generics. Will need to condense this into one class with the tregex version John Bauer 2025-05-31 18:41:04 -07:00
  • ebdac385c2 MergeNodes can now handle multiple nodes at the same time John Bauer 2025-05-31 12:12:10 -07:00
  • d9083ce510 Print out EmptyIndex in the CoNLLUDocumentWriter if needed on enhanced dependency graphs. Currently, no special technique to separate copy nodes from empty nodes - presumably no conllu file will ever have both John Bauer 2025-05-31 00:28:51 -07:00
  • d4162c6237 Use // instead of /* when possible John Bauer 2025-05-30 21:39:23 -07:00
  • 638216c075 Mark a bug John Bauer 2025-05-30 21:12:01 -07:00
  • 873b136c5f When outputting MWT, output the SpaceAfter/SpacesAfter of the last word on the MWT token rather than on the last word itself. This better agrees with the UD standard for where to print these things John Bauer 2025-05-30 21:08:13 -07:00
  • 8f120b9e93 Add a check for the sentIndex on the IndexedWords John Bauer 2025-05-30 20:49:14 -07:00
  • 8ded0138b7 Set the sentenceIdx *before* building the SemanticGraph, so that the hashCodes don't get messed up later when setting the sentenceIdx John Bauer 2025-05-30 20:45:18 -07:00
  • 9914d88ef3 Oops, need to update the test for the new relations as well John Bauer 2025-05-30 18:36:55 -07:00
  • b01834445c Use a LinkedHashMap when building misc key values, allowing the pieces to stay in the same order. Use UniversalEnglish as the language for GrammaticalRelation so that the default separator is : John Bauer 2025-05-30 18:33:00 -07:00
  • c7b15fdeb8 Save the SpacesBefore on an MWT. Save the rest of the MWT misc annotations on the first word of the MWT. Test both operations John Bauer 2025-05-30 18:20:56 -07:00
  • 09428318e1 Oops, forgot to include the test file John Bauer 2025-05-30 18:10:15 -07:00
  • 6152f3a2bd Refactor the mwt misc key values - will want to keep the non-space ones separately. Check that the SpacesAfter is correctly processed John Bauer 2025-05-30 18:09:48 -07:00
  • e5d494ed38 Keep the Misc fields on a CoreLabel John Bauer 2025-05-30 17:58:33 -07:00
  • a9bef7d3c8 Don't use lineSeparator for the 'after' between sentences. Just put a normal whitespace unless the CoNLLU document specifically has a SpacesAfter John Bauer 2025-05-30 17:00:52 -07:00
  • 8d6ea40613 Oops, need to check the comments exist in the unit test John Bauer 2025-05-30 15:22:53 -07:00
  • 723f20d92e Put comments from the CoNLLU on the annotation John Bauer 2025-05-30 14:48:29 -07:00
  • e2acb52131 Include SpacesAfter as well as SpaceAfter=No in the CoNLLUDocumentWriter John Bauer 2025-05-30 09:34:29 -07:00
  • 9eaa7d154f V1 of multigraph (branch: multigraph_ssurgeon) John Bauer 2023-10-15 23:27:43 -07:00
  • d9b61c4c59 Add CoNLLU as an output format to SemgrexPattern John Bauer 2025-05-30 09:18:18 -07:00
  • c3d2dec05b Update whitespace John Bauer 2025-05-30 08:54:47 -07:00
  • b6ba831e3d Oops, was missing the conllu file for one of the reader tests John Bauer 2025-05-30 00:20:06 -07:00
  • 9caaec5ee9 Switch the SemgrexPattern reader used to the pipeline.CoNLLUReader, which now supports reading more features from the SemanticGraphs John Bauer 2025-05-29 23:36:27 -07:00
  • 78cd91825c Properly handle a graph of one word (previously caused a crash because there were no Edges from which to extract the roots) John Bauer 2025-05-29 23:35:31 -07:00
  • c5d5548fb3 Verify that the enhanced graph isn't present from the enhanced-free conllu John Bauer 2025-05-29 23:31:25 -07:00
  • d95e8b808b Fix a comment John Bauer 2025-05-29 23:26:21 -07:00
  • 378e02edd4 Simple test of the reading code when there's no enhanced graph - clearly was needed, considering the previous bug John Bauer 2025-05-29 23:18:30 -07:00
  • 91cb8e0aa7 Oops, introduced a bug when there's no enhanced graph John Bauer 2025-05-29 23:11:05 -07:00
  • b3f8f97397 Read a SpacesBefore at the start of a sentence / document and keep it on the token. Other spaces are generally set to match the after of the previous token. Include a short test of that in the unit test John Bauer 2025-05-29 15:48:15 -07:00
  • c59716b670 Read the enhanced graph in the CoNLLUReader John Bauer 2025-05-29 10:07:46 -07:00
  • 40cb460186 Oops, need to set the SentenceIndexAnnotation on the empty tokens John Bauer 2025-05-29 10:03:32 -07:00
  • 54041db865 Prebuild the IndexedWords when building the SemanticGraph. Will make it easier to build an enhanced graph, since we can easily reuse the index with the empty index from the enhanced column John Bauer 2025-05-28 23:39:31 -07:00
  • 75bec7c143 CoNLLUReader processes empty tokens and adds them to the Sentence CoreMap with the EmptyTokensAnnotation list. Add a check in the CoNLLUReaderITest that checks that a sentence with an empty token is properly read. John Bauer 2025-05-28 23:24:51 -07:00
  • 5c470ce3bf Refactor the code that processes one line, and change it to get the index directly from that line instead of counting the index for each line of the conllu sentence John Bauer 2025-05-28 22:57:01 -07:00
  • 86308933fc Remove from CoNLLUReader a method not used anywhere John Bauer 2025-05-28 21:00:05 -07:00
  • a9937e27c6 Make this test work on Windows John Bauer 2025-05-28 16:56:30 -07:00
  • 01302ee8bc Process SpaceAfter/SpacesAfter on only the last token of an MWT. All others are automatically set to '' John Bauer 2025-05-28 14:28:08 -07:00
  • b2a76458ab Process SpacesAfter as well as SpaceAfter in the CoNLLUReader John Bauer 2025-05-28 14:11:15 -07:00
  • 3d57346907 Add a comment on something wrong John Bauer 2025-05-28 14:04:01 -07:00
  • 2ddf9aaa0c Properly read XPOS in the CoNLLUReader John Bauer 2025-05-27 09:02:45 -07:00
  • 3996bacecd Update the CoNLLUReader test to read from the actual AnCora UD sentences, using expected gold values instead of reparsed values as the gold. This will let us test reading things the pipeline doesn't produce (such as UPOS vs XPOS or an enhanced graph) once those features are added John Bauer 2025-05-27 08:48:16 -07:00
  • 063595ab0f Use lineSeparator() for the end of line. Add several comments on things we could upgrade John Bauer 2025-05-26 01:35:18 -07:00
  • a88bc562fa Move the conllu reader example file to data (easier to edit / find for viewing) John Bauer 2025-05-25 23:01:55 -07:00
  • 3219884a58 Remove lucene from the core repo now that patterns is moved John Bauer 2025-05-25 02:09:14 -07:00
  • 39f2158c6c Remove the patterns directory from core (now moved, with its history, to CoreNLP-research) John Bauer 2025-05-25 00:16:22 -07:00
  • 3d6a8e29d7 Add another demonym from ParTUT to the lemmatizer John Bauer 2025-05-17 00:04:28 -07:00
  • 50fdd771d6 Add some comments on MergeNodes John Bauer 2025-05-16 23:27:27 -07:00
  • 0c4135390d Add a comment on something that needs fixing in the reader John Bauer 2025-05-12 13:44:40 -07:00
  • b7e302a4af Comment on a potential simplification John Bauer 2025-05-12 00:52:14 -07:00
  • c905429ee1 Process features in the CoNLLUReader. Also need to process xpos, keep the misc, and possibly build an extra semantic graph John Bauer 2025-05-12 12:03:41 -07:00
  • 259c2bfa5e Fix documentation bug John Bauer 2025-05-11 02:26:02 -07:00