Picking the right label

Now that reasonably accurate labelling has been implemented, and the mechanism for picking a single malware label out of all VirusTotal (VT) predictions has been settled (take the majority family, provided it has at least 5 predictions), it is possible to produce some statistics about the initial dataset.
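The label-picking rule can be sketched as a simple vote over the per-engine predictions. This is a minimal sketch, not the actual implementation: it assumes each engine's VT verdict has already been normalised to a plain family name (or None when an engine gives no usable label), and the function name `pick_family` and the example family names are hypothetical. "Majority" is read here as the most frequent family (a plurality), which is then accepted only if at least 5 engines agree on it.

```python
from collections import Counter
from typing import Iterable, Optional


def pick_family(predictions: Iterable[Optional[str]], min_votes: int = 5) -> Optional[str]:
    """Return the most common family among per-engine predictions,
    or None if no family reaches min_votes."""
    # Ignore engines that produced no (normalised) family label.
    counts = Counter(p for p in predictions if p)
    if not counts:
        return None
    # most_common(1) gives the single most frequent family and its vote count;
    # on a tie, the family encountered first wins.
    family, votes = counts.most_common(1)[0]
    # Only accept the majority family if enough engines agree on it.
    return family if votes >= min_votes else None


engine_labels = ["emotet"] * 6 + ["trickbot"] * 2 + [None] * 3
print(pick_family(engine_labels))  # emotet
print(pick_family(["zbot"] * 4))   # None (only 4 votes)
```

Returning None rather than a weak guess keeps binaries with too little engine agreement out of the family statistics instead of mislabelling them.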

200-binary dataset

Below are the label statistics for the 200 binaries, broken down by each of the 5 labelling categories: platform, meta-type, type, family and CVE.
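Per-category statistics like these boil down to counting label values across the dataset. The sketch below is illustrative only: it assumes each binary's final labels are stored as a dictionary keyed by category, and the key names, the "unlabelled" bucket and the sample label values are all hypothetical rather than taken from the actual dataset.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical keys for the 5 labelling categories.
CATEGORIES = ["platform", "meta_type", "type", "family", "cve"]


def label_stats(samples: List[Dict[str, Optional[str]]]) -> Dict[str, Counter]:
    """Count how often each label value occurs, per labelling category."""
    stats = {cat: Counter() for cat in CATEGORIES}
    for sample in samples:
        for cat in CATEGORIES:
            # Missing or empty labels are counted under a placeholder bucket.
            stats[cat][sample.get(cat) or "unlabelled"] += 1
    return stats


# Two made-up binaries, just to show the shape of the input.
samples = [
    {"platform": "win32", "meta_type": "malware", "type": "trojan", "family": "emotet", "cve": None},
    {"platform": "android", "meta_type": "malware", "type": "adware", "family": None, "cve": None},
]
stats = label_stats(samples)
for cat in CATEGORIES:
    print(cat, stats[cat].most_common())
```

Counting unlabelled binaries explicitly makes it visible how many of the 200 samples fail the labelling threshold in each category.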

Platform

[Figure: Platform label statistics]

Meta-type

[Figure: Meta-type label statistics]

Type

[Figure: Type label statistics]

Family

[Figure: Family label statistics]

CVE

[Figure: CVE label statistics]