To integrate my code with the Cuckoo Sandbox, I’ve created a command line interface and a simple way of invoking the clustering, with the whole configuration stored in a single file: conf/cuckooml.conf.
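
Since every setting lives in conf/cuckooml.conf, it can be handy to inspect that file before a run. The sketch below assumes the file uses the same INI syntax as Cuckoo's other configuration files, and it assumes nothing about the section or option names inside it. It relies on Python 3's standard configparser module; the helper name read_cuckooml_conf is hypothetical and not part of CuckooML.

```python
import configparser


def read_cuckooml_conf(path="conf/cuckooml.conf"):
    """Return every section of an INI-style config file as a plain dict.

    Hypothetical helper: assumes conf/cuckooml.conf is INI-formatted like
    Cuckoo's other conf files; no section or option names are assumed.
    """
    # interpolation=None so that values containing "%" are read verbatim
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files, so check that something was loaded
    if not parser.read(path):
        raise FileNotFoundError(path)
    # Map each section name to its options; all values stay strings
    return {section: dict(parser.items(section)) for section in parser.sections()}
```

Called from the Cuckoo root directory, read_cuckooml_conf() returns a dictionary mapping each section name to its options, which makes it easy to check which parameters a run will use.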

To run the clustering with the default parameters, it’s enough to execute:

    python cuckoo.py --ml

You can also run the clustering from the Python interface with these two lines of code:

    from modules.cuckooml.cuckooml import init_cuckooml
    init_cuckooml()

These commands produce CSV files with all sorts of statistics about the clustering:

  • clustering results,
  • clustering fit,
  • label distribution among clusters,
  • samples that behave abnormally,
  • anomalies among samples, and
  • clustering of test samples.
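
Because the outputs are plain CSV, they can be post-processed with Python's standard library. To illustrate what a label distribution among clusters amounts to, here is a minimal sketch that tallies how many samples of each label fall into each cluster. The column names "cluster" and "label", the helper name label_distribution, and the four-row input are all hypothetical placeholders, not the actual layout or content of CuckooML's output files.

```python
import csv
import io
from collections import Counter, defaultdict


def label_distribution(rows, cluster_key="cluster", label_key="label"):
    """Count how often each label occurs in each cluster.

    `rows` is any iterable of dicts, e.g. a csv.DictReader. The default
    column names are hypothetical, not CuckooML's real CSV headers.
    """
    distribution = defaultdict(Counter)
    for row in rows:
        distribution[row[cluster_key]][row[label_key]] += 1
    # Convert to plain dicts for easy printing and comparison
    return {cluster: dict(counts) for cluster, counts in distribution.items()}


# Made-up toy input standing in for a clustering-results CSV file
example = io.StringIO(
    "sample,cluster,label\n"
    "a.exe,0,trojan\n"
    "b.exe,0,trojan\n"
    "c.exe,0,adware\n"
    "d.exe,1,worm\n"
)
print(label_distribution(csv.DictReader(example)))
# → {'0': {'trojan': 2, 'adware': 1}, '1': {'worm': 1}}
```

A cluster dominated by a single label suggests that the behavioural features agree with the existing labelling, while a mixed cluster is a natural place to look more closely.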

A set of these files, generated by running CuckooML with the default configuration on this sample of malware reports, can be found here.