

Starting in PolyAnalyst 6.0.950, a new node named Categorize Binaries is available. The node derives a compact representation of hundreds of multiple-choice variables by converting a large set of binary variables into a small number of categorical variables.
For example, suppose you have a dataset of sales transactions involving hundreds of products, in which each product is represented by its own column. A product column is true if that product was present in the sales transaction. Take a popular product line such as beer brands (Pilzner, Carlsberg, Guinness, etc.).
Customer Id | Bought Pilzner? | Bought Carlsberg? | Bought Guinness? |
---|---|---|---|
1 | Yes | No | No |
2 | No | No | Yes |
3 | No | Yes | No |
In most of the transactions only one brand is selected, so most of the binary product attributes are false. In such a situation, the Categorize Binaries node can provide a more compact representation of the dataset that other nodes may be able to use more successfully. The node can generate a dataset similar to the input dataset of transactions, except that all of the product variables are stored in a small number of categorical variables such as Choice 1, Choice 2, and Choice 3.
Customer Id | Choice 1 |
---|---|
1 | Pilzner |
2 | Guinness |
3 | Carlsberg |
Choice 1 represents the favorite brand, Choice 2 the second favorite, and so on. The only required parameter is the desired maximum number of customer preferences.
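
To make the transformation concrete, here is a minimal Python sketch of the same idea outside PolyAnalyst. It is not the node's implementation: the function name `categorize_binaries` and its parameters are hypothetical, and because plain true/false flags carry no ranking, the sketch assumes that choices are filled in the order the product columns are listed. How the node itself orders preferences is not described here.

```python
from typing import Any, Dict, List


def categorize_binaries(
    rows: List[Dict[str, Any]],
    binary_columns: List[str],
    max_choices: int,
) -> List[Dict[str, Any]]:
    """Collapse many true/false columns into a few 'Choice N' columns.

    Hypothetical helper for illustration only. Assumption: choices are
    ordered by the position of the column in `binary_columns`.
    """
    compact_rows = []
    for row in rows:
        # Names of the binary columns that are true for this row,
        # truncated to the requested maximum number of choices.
        chosen = [col for col in binary_columns if row.get(col)][:max_choices]
        # Pad with None so every output row has the same Choice columns.
        chosen += [None] * (max_choices - len(chosen))

        # Keep the non-binary columns (for example, Customer Id) unchanged.
        compact = {k: v for k, v in row.items() if k not in binary_columns}
        for position, product in enumerate(chosen, start=1):
            compact[f"Choice {position}"] = product
        compact_rows.append(compact)
    return compact_rows


# The beer example from the tables above, with Yes/No written as booleans.
transactions = [
    {"Customer Id": 1, "Pilzner": True, "Carlsberg": False, "Guinness": False},
    {"Customer Id": 2, "Pilzner": False, "Carlsberg": False, "Guinness": True},
    {"Customer Id": 3, "Pilzner": False, "Carlsberg": True, "Guinness": False},
]

compact = categorize_binaries(
    transactions, ["Pilzner", "Carlsberg", "Guinness"], max_choices=1
)
for row in compact:
    print(row)
```

With `max_choices=1` the sketch reproduces the one-column result shown above. A larger value adds Choice 2, Choice 3, and so on, left empty for customers who selected fewer products.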