Gut_16_10_xx_text.txt:
- Gut_16_10: Gutenberg corpus, sentences 16+ words long, words with frequency 10+
- xx = 10,20,30...90: number of KMeans clusters
- text:
  - clusters_overview
  - cluster_top_words
  - cluster_words -- /detailed_data/ dir
  - cluster_similarities - cosine similarities to cluster centroids for words in ...cluster_words
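
The cluster_similarities values can be illustrated with a minimal sketch. It assumes each word has a numeric vector and that the cluster centroid is the mean of its member vectors, as in standard KMeans. The function names and the toy vectors are hypothetical and are not taken from the project code.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def similarities_to_centroid(word_vectors):
    """Map each word of one cluster to its cosine similarity to the centroid."""
    center = centroid(list(word_vectors.values()))
    return {word: cosine_similarity(vec, center)
            for word, vec in word_vectors.items()}

# Toy 2-dimensional vectors, invented for illustration only.
toy_cluster = {"king": [1.0, 0.0], "queen": [0.0, 1.0]}
sims = similarities_to_centroid(toy_cluster)
```

With a fitted KMeans model the stored centroid would normally be used instead of recomputing the mean.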

/detailed_data/ ...cluster_words, cluster_similarities

/cluster_words+weighted_similarity/... = |-delimited cluster words \t SWF/Lexicon \t SWF/Cluster
 - SWF/Lexicon - similarity weighted by word frequency within lexicon
 - SWF/Cluster - similarity weighted by word frequency within cluster
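
A line in this layout could be read with a short sketch like the one below. It assumes the two SWF columns are decimal numbers and that every line has exactly three tab-separated fields. The function name and the sample line are made up for illustration, so the values are not real data.

```python
def parse_cluster_line(line):
    """Split one line into its word list and the two SWF scores."""
    words_field, swf_lexicon, swf_cluster = line.rstrip("\n").split("\t")
    return {
        "words": words_field.split("|"),    # |-delimited cluster words
        "swf_lexicon": float(swf_lexicon),  # assumed to be numeric
        "swf_cluster": float(swf_cluster),  # assumed to be numeric
    }

# Invented sample line, not taken from the data files.
sample = "sea|ship|sail\t0.42\t0.57\n"
record = parse_cluster_line(sample)
```

A line with a different number of fields raises ValueError, which makes malformed rows easy to spot.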

/shorter_lexicons/ - lexicons pruned based on word and word pair frequencies.
- objective: find compact lexicons with better "natural" clustering
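
One possible way to prune by word and word pair frequencies is sketched below. The exact procedure is not described here, so this is an assumption: a "pair" is taken to mean two words co-occurring in the same sentence, and both thresholds are hypothetical parameters.

```python
from collections import Counter
from itertools import combinations

def prune_lexicon(sentences, min_word_count, min_pair_count):
    """Keep words that are frequent enough and occur in a frequent pair.

    sentences: list of tokenized sentences (lists of words).
    A pair is counted once per sentence in which both words occur
    (an assumption of this sketch).
    """
    word_counts = Counter(w for sentence in sentences for w in sentence)
    kept = {w for w, c in word_counts.items() if c >= min_word_count}

    pair_counts = Counter()
    for sentence in sentences:
        present = sorted(set(sentence) & kept)  # sorted for stable pair order
        pair_counts.update(combinations(present, 2))

    paired = set()
    for (a, b), count in pair_counts.items():
        if count >= min_pair_count:
            paired.update((a, b))
    return kept & paired

# Toy input, invented for illustration only.
toy_sentences = [["a", "b", "c"], ["a", "b"], ["a", "d"]]
pruned = prune_lexicon(toy_sentences, min_word_count=2, min_pair_count=2)
```

Raising either threshold gives a more compact lexicon; whether the clustering then looks more "natural" has to be checked on the resulting clusters.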