Subtree Classifier Properties window
You can configure the Subtree Classifier properties with this window.
You must retrain the project before any changes to these settings take effect.
- Text Filtering
-
This group has the following settings:
- Use digits
-
This setting controls whether the classifier uses digits as features or ignores them during text filtering. (Default: Cleared)
- Min. word length
-
All words that are shorter than this value are ignored during text filtering. Independently of word length, features with a very low or high frequency are also not taken into account. (Default: 3)
- Training
-
This group has the following settings:
- Max. number of features
-
Limits the maximum number of internally generated features per class. (Default: 5000)
- Min. feature length
-
Specifies the minimum number of characters that should be used for a feature. This value cannot be smaller than the Min. word length. (Default: 3)
- Max. feature length
-
Specifies the maximum number of characters that are used for a feature. This value should not exceed 64 characters. (Default: 50)
- Automatic selection of Min. feature frequency
-
Enables the Min. feature frequency to be set automatically. If this setting is selected, you cannot manually assign a Min. feature frequency value. (Default: Cleared)
- Min. feature frequency
-
Specifies how often a substring must occur in the training set of a class to be used as a feature for content classification. (Default: 2)
- Start features at beginning of words
-
Specifies that a feature substring must start at the beginning of a word. If this setting is cleared, the substring can start anywhere. (Default: Selected)
- Max. words per feature (0-n)
-
Limits the number of words per feature. A value of zero means unlimited words, although the total number of characters of the words per feature cannot exceed the "Max. feature length" property. (Default: 2)
- Use fuzzy string match
-
Enables fuzzy string matching, at the cost of slower classification. (Default: Cleared)
- Fuzzy length (5-10)
-
Configures the fuzzy string comparison. (Default: 5)
- Automatic selection of Min. class entropy
-
Enables the Min. class entropy to be set automatically. If this setting is selected, you cannot manually assign a Min. class entropy value. (Default: Cleared)
- Min. class entropy (0.0 - 1.0)
-
Controls the importance of a feature, depending on the number of classes in which it occurs. A value of 1.0 requires that a feature occurs only in the sample documents of a single class; otherwise, it is not used for classification. The lower the value, the more classes can contain the feature in the training set. (Default: 0.600)
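To illustrate how several of these settings interact, the following Python sketch mimics text filtering, substring feature generation, the Min. feature frequency check, and the Min. class entropy check. It is not the product's implementation. All function names are hypothetical, frequency is assumed to be the number of sample documents that contain a substring, and the entropy score (1 minus the normalized entropy of the feature's class distribution) is an assumption chosen only because it matches the behavior described above. Max. number of features and fuzzy matching are not modeled.

```python
import math
import re
from collections import Counter


def filter_words(text, use_digits=False, min_word_length=3):
    """Text filtering: optionally ignore digits, then drop short words."""
    pattern = r"[A-Za-z0-9]+" if use_digits else r"[A-Za-z]+"
    return [w.lower() for w in re.findall(pattern, text)
            if len(w) >= min_word_length]


def substring_features(words, min_len=3, max_len=50,
                       start_at_word=True, max_words=2):
    """Return the set of candidate feature substrings for one document."""
    text = " ".join(words)
    if start_at_word:
        # Only positions that begin a word.
        starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == " "]
    else:
        starts = [i for i, ch in enumerate(text) if ch != " "]
    features = set()
    for s in starts:
        for e in range(s + min_len, min(s + max_len, len(text)) + 1):
            candidate = text[s:e]
            if candidate.endswith(" "):
                continue
            # max_words == 0 means "unlimited", as in the setting above.
            if max_words and len(candidate.split()) > max_words:
                break
            features.add(candidate)
    return features


def class_score(feature, counts):
    """Assumed score: 1.0 for a single-class feature, 0.0 if spread evenly."""
    occurrences = [c[feature] for c in counts.values() if c[feature] > 0]
    if len(counts) < 2 or len(occurrences) <= 1:
        return 1.0
    total = sum(occurrences)
    entropy = -sum((n / total) * math.log(n / total) for n in occurrences)
    return 1.0 - entropy / math.log(len(counts))


def select_features(training_set, min_frequency=2, min_entropy=0.6):
    """training_set maps a class name to a list of sample texts."""
    counts = {}
    for cls, samples in training_set.items():
        counter = Counter()
        for text in samples:
            # Count each substring once per sample document (assumption).
            counter.update(substring_features(filter_words(text)))
        counts[cls] = counter
    selected = {}
    for cls, counter in counts.items():
        frequent = [f for f, n in counter.items() if n >= min_frequency]
        selected[cls] = sorted(
            f for f in frequent if class_score(f, counts) >= min_entropy)
    return selected


if __name__ == "__main__":
    training = {
        "Invoice": ["Invoice number 123", "Invoice total due"],
        "Letter": ["Dear customer", "Dear customer, thank you"],
    }
    print(select_features(training))
```

In this sketch, a substring such as "number" is dropped for the Invoice class because it occurs in only one sample, which is below the default Min. feature frequency of 2.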
The following buttons are available at the bottom of this window:
Button | Description |
---|---|
OK | Closes the window and saves your changes. |
Cancel | Closes the window without saving your changes. |
Apply | Applies your changes without closing the window. |