Labelling statistics
Picking the right label
Now that reasonably accurate labelling has been implemented, and a mechanism has been fixed for picking a single malware label out of all VT predictions (the majority class among families with at least 5 predictions), it is possible to produce some statistics about the initial dataset.
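As an illustration of the selection rule, here is a minimal sketch. It assumes the VT predictions for one binary have already been normalised into a list of family names. The function name `pick_label`, the constant `MIN_VOTES`, and the tie-breaking behaviour (first family seen wins) are hypothetical and not taken from the actual implementation.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumed threshold, taken from the rule stated above:
# a family needs at least 5 predictions to be accepted.
MIN_VOTES = 5


def pick_label(predictions: Iterable[str], min_votes: int = MIN_VOTES) -> Optional[str]:
    """Return the most frequently predicted family, or None if no family
    reaches min_votes predictions.

    `predictions` is assumed to hold one normalised family name per
    engine that produced a usable prediction.
    """
    counts = Counter(predictions)
    if not counts:
        return None
    # most_common(1) yields the family with the highest count; on a tie,
    # the family encountered first in the input is returned.
    family, votes = counts.most_common(1)[0]
    return family if votes >= min_votes else None
```

For example, a binary with six predictions for one family and two for another gets the first family as its label, while a binary whose best family has only four predictions stays unlabelled (`None`).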
200 binaries dataset
Here are statistics for the labels of 200 binaries for each of the 5 labelling categories.