Parses for Gutenberg Children corpus

BERTRAM-weights-parses - MST-parses using word-pair weights extracted from BERT-like Neural Net. Different experiments
capital-LGEnglish-noQuotes-manual - 200+ selected sentences manually parsed by linguist, using LG Parser version 5.5.1 tokenization
capital-random - random parses generated from LG5.5.1/capital/parses
capital-sequential - sequential parses generated from LG5.5.1/capital/parses
ChildrenGutenberg_dotless_annotated_MP005W2 - Different MST-Parses of Adagram pre-disambiguated corpus with no dots (annotator params Min Prob=0.005, Window=2)
GC_lgAll - MST-Parses with LG-ANY-all mode (since 2019-04 - not in use because of tokenization inconsistent with LG-English)
GC_parses_win6_omdist - MST-Parses with clique-dist-win6, mst-dist modes (since 2019-04 - not in use because of tokenization inconsistent with LG-English)
GC-lg-cleaned - MST-Parses of GC corpus tokenized by LG-English?.?.? (older than 5.5.1, tokenization difference less than 1% against 5.5.1)
LG - LG-English parses with some old LG Parser version (since 2019-04 - not in use)
LG5.5.1 - LG-English parses with LG Parser version 5.5.1
LG5.6.2 - LG-English parses with LG Parser version 5.6.2
GC_LGEnglish_noQuotes_fullyParsed_psg - Fully parsed sentences with no direct speech parsed with Anton's PSG parser
GC_LGEnglish_noQuotes_fullyParsed_win6_omdist_parses - Fully parsed sentences with no direct speech parsed with MST-Parser R=6, Weight = 6/r, mst-dist = +1/r. Weights calculated only from lowercased version of http://langlearn.singularitynet.io/data/cleaned/English/Gutenberg-Children-Books/capital-LGEnglish-noQuotes-fullyParsed/GC_LGEnglish_noQuotes_fullyParsed.txt
GC_psg - Parsed with Anton's PSG parser in non-incremental mode
GC_psg_inc - Parsed with Anton's PSG parser in incremental mode
test - test corpora for references

For all MST-parsed files (also inside the folders above), the filename specifies the counting method used, as well as the mst-dist mode:
- For lg-ANY, the suffix "lg" is used, followed by the number of parses used (or All if all parses returned by LG-ANY were used). E.g. EnglishPOC_disamb-parses-lgAll.ull
- For clique methods, the suffix "win" is used, followed by the window size used. E.g. EnglishPOC_disamb-parses-win6.ull
- If distance weighting was used, the suffix "dist" is added at the end, prefixed by "o" if the weighting was used during the observe (counting) phase, or "m" if during mst-parsing. E.g. EnglishPOC_disamb-parses-win6-omdist.ull
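
The suffix rules above can be sketched as a small filename parser. This is only an illustration: the function name describe_parse_file and the regular expression are hypothetical, inferred from the three example filenames rather than taken from the project's own tooling. It assumes the parts are joined with "-" as in the examples, and that "dist" always carries an "o", "m" or "om" prefix.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern inferred from the example filenames above;
# it is not part of the project's own tooling.
_SUFFIX = re.compile(
    r"-(?:lg(?P<lg>All|\d+)|win(?P<win>\d+))"  # counting method and its size
    r"(?:-(?P<dist>o|m|om)dist)?$"             # optional distance weighting
)


def describe_parse_file(filename):
    """Decode the counting method and distance-weighting flags of a .ull file."""
    stem = PurePosixPath(filename).stem  # drop the ".ull" extension
    match = _SUFFIX.search(stem)
    if match is None:
        raise ValueError(f"no recognised counting suffix in {filename!r}")

    if match["lg"] is not None:
        method = "lg-ANY"
        # "All" means every parse returned by LG-ANY was used.
        size = "All" if match["lg"] == "All" else int(match["lg"])
    else:
        method = "clique"
        size = int(match["win"])  # window size

    flags = match["dist"] or ""
    return {
        "method": method,
        "size": size,
        "observe_dist": "o" in flags,  # weighting during the observe (counting) phase
        "mst_dist": "m" in flags,      # weighting during mst-parsing
    }


if __name__ == "__main__":
    for name in (
        "EnglishPOC_disamb-parses-lgAll.ull",
        "EnglishPOC_disamb-parses-win6.ull",
        "EnglishPOC_disamb-parses-win6-omdist.ull",
    ):
        print(name, describe_parse_file(name))
```

Folder names in the listing, such as GC_parses_win6_omdist, join the same parts with "_" instead of "-"; the sketch does not handle that variant and would need its separator relaxed to cover it.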