Normalising the variety of VirusTotal malware names

Week 2 went into the hard work of turning the mediocre VT name normalisation implemented in lib.cuckoo.common.virustotal into something that can serve as the labeller needed for the later malware clustering.

To start, I introduced five categories that need to be revised before moving on to the next step of the project: choosing a strategy for picking a representative set of labels from all the normalised VT predictions for a given binary. The five categories are:

  • Platform - the operating system that the binary can harm;
  • CVE (Common Vulnerabilities and Exposures) - the vulnerability that the malware exploits;
  • Meta-type - either trojan (covering clicker, downloader, dropper, notifier, proxy, spyware, backdoor) or riskware (covering adware, softwarebundler, hacktool, rogue, and any other lower-priority threat such as grayware, hktl, keygen, onlinegames, scareware, startpage, suspicious, unwanted);
  • Type - any of: adware, behavior, browsermodifier, constructor, ddos, dialer, dos, exploit, hacktool, joke, misleading, monitoringtool, program, pws, ransom, remoteaccess, riskware, rogue, rootkit, settingsmodifier, softwarebundler, spammer, spoofer, tool, trojan, clicker, downloader, dropper, notifier, proxy, spyware, backdoor, virtool, virus, worm;
  • Family - all remaining tokens that are not blacklisted.
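
To make the five categories concrete, here is a minimal sketch of how a single VT label could be split into them. It is not the code in lib.cuckoo.common.virustotal: the function name `categorise`, the CVE pattern, the example platform tokens and the blacklist entries are all assumptions made for illustration. Only the meta-type and type vocabularies come from the lists above.

```python
import re

# Assumed example platform tokens (the post does not list them).
PLATFORMS = {"win32", "win64", "linux", "android", "osx"}

# Meta-type vocabularies taken from the category list above.
TROJAN = {"trojan", "clicker", "downloader", "dropper", "notifier",
          "proxy", "spyware", "backdoor"}
RISKWARE = {"riskware", "adware", "softwarebundler", "hacktool", "rogue",
            "grayware", "hktl", "keygen", "onlinegames", "scareware",
            "startpage", "suspicious", "unwanted"}

# Type vocabulary taken from the category list above.
TYPES = {"adware", "behavior", "browsermodifier", "constructor", "ddos",
         "dialer", "dos", "exploit", "hacktool", "joke", "misleading",
         "monitoringtool", "program", "pws", "ransom", "remoteaccess",
         "riskware", "rogue", "rootkit", "settingsmodifier",
         "softwarebundler", "spammer", "spoofer", "tool", "trojan",
         "clicker", "downloader", "dropper", "notifier", "proxy",
         "spyware", "backdoor", "virtool", "virus", "worm"}

# Assumed blacklist of uninformative tokens (hypothetical entries).
BLACKLIST = {"generic", "gen", "variant", "heur"}

# Assumed CVE spelling: "cve", a 4-digit year, then 4 or more digits,
# with optional "-" or "_" separators.
CVE_RE = re.compile(r"cve[-_]?(\d{4})[-_]?(\d{4,})")


def add_unique(items, value):
    """Append value unless it is already present."""
    if value not in items:
        items.append(value)


def categorise(label):
    """Split one VT label into the five categories."""
    result = {"platform": [], "cve": [], "meta_type": [],
              "type": [], "family": []}
    lowered = label.lower()

    # Pull CVE identifiers out first so their digits are not split up.
    for match in CVE_RE.finditer(lowered):
        add_unique(result["cve"], "cve-%s-%s" % match.groups())
    lowered = CVE_RE.sub(" ", lowered)

    for token in re.split(r"[^a-z0-9]+", lowered):
        if not token:
            continue
        if token in PLATFORMS:
            add_unique(result["platform"], token)
            continue
        matched = False
        if token in TROJAN:
            add_unique(result["meta_type"], "trojan")
            matched = True
        elif token in RISKWARE:
            add_unique(result["meta_type"], "riskware")
            matched = True
        if token in TYPES:
            add_unique(result["type"], token)
            matched = True
        # Everything else is a family candidate unless blacklisted.
        if not matched and token not in BLACKLIST and not token.isdigit():
            add_unique(result["family"], token)
    return result


print(categorise("Trojan-Downloader.Win32.Zlob.gen"))
```

A token such as downloader lands in both the type list and, via its group, the meta-type list, which mirrors the overlap between the two vocabularies above.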

Normalisation results for some binaries

The following plots are included for each binary, where available:

  • Platform - count-plot over all VT platform tokens;
  • CVE - count-plot over all VT CVE tokens;
  • Meta-type - count-plot over all VT meta-type tokens;
  • Type - count-plot over all VT type tokens;
  • Family - count-plot over all VT family tokens.
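
The numbers behind such count-plots can be sketched as a tally of tokens per category across all scanners that reported on a binary. The function name `count_tokens` and the input shape (one dict of category lists per scanner) are assumptions, and the sample input is made up for illustration; it is not data from any of the binaries below.

```python
from collections import Counter


def count_tokens(per_scanner):
    """Tally tokens per category over all scanner results for one binary."""
    counts = {}
    for categories in per_scanner:
        for category, tokens in categories.items():
            counts.setdefault(category, Counter()).update(tokens)
    return counts


# Hypothetical normalised output of three scanners for one binary.
example = [
    {"platform": ["win32"], "type": ["trojan"], "family": ["zlob"]},
    {"platform": ["win32"], "type": ["downloader"], "family": ["zlob"]},
    {"platform": ["w32"], "type": ["trojan"], "family": []},
]

counts = count_tokens(example)
for category, counter in sorted(counts.items()):
    print(category, counter.most_common())
```

Each `Counter` holds exactly the bar heights of one plot, so it could be handed to any plotting library as token/count pairs.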

#20

Platform

#20 platform

Meta-type

#20 meta-type

Type

#20 type

Family

#20 family

#22

Platform

#22 platform

Meta-type

#22 meta-type

Type

#22 type

Family

#22 family

#97

Platform

#97 platform

Meta-type

#97 meta-type

Type

#97 type

Family

#97 family

#138

Platform

#138 platform

Meta-type

#138 meta-type

Type

#138 type

Family

#138 family

#149

Platform

#149 platform

CVE

#149 cve

Meta-type

#149 meta-type

Type

#149 type

Family

#149 family

#154

Platform

#154 platform

Meta-type

#154 meta-type

Type

#154 type

Family

#154 family

#172

Platform

#172 platform

Meta-type

#172 meta-type

Type

#172 type

Family

#172 family

#192

Platform

#192 platform

Meta-type

#192 meta-type

Type

#192 type

Family

#192 family