DigDNA Genetic Algorithm

A Genetic Algorithm based on the digital fingerprint encoding.

Summary

For decades, genetic algorithms have been used as an effective heuristic for solving optimisation problems. However, genetic algorithms require a string-based genetic encoding of information, which has severely limited their applicability to online accounts. Recently, a behavioural modelling technique inspired by biological DNA has been proposed, and successfully applied, for monitoring and detecting spambots in online social networks (OSNs). In this so-called digital DNA representation, the behavioural lifetime of an OSN account is encoded as a sequence of characters, namely a digital DNA sequence. Combining the digital DNA modelling technique with the evolutionary simulations of genetic algorithms opens up the unprecedented opportunity to synthesise online accounts that resemble the behaviour of genuine ones, thus evading the most advanced detection techniques. The prototype presented here implements an ad-hoc genetic algorithm for the synthesis of online accounts and tests the ability of the synthetic accounts to pass as genuine ones, with successful results.

This tool accompanies:

  • Stefano Tognazzi, Stefano Cresci, Marinella Petrocchi, Angelo Spognardi. “A Masked Ball: Synthesis and Robustness of SocialBots 2.0”, recently submitted to the International Conference on Knowledge Discovery and Data Mining (KDD 2018).

Code

DigDNA

Usage

The prototype features:

  • A compiled version of GLCR, a tool on which our algorithm relies to compute the Longest Common Substring (LCS). The source of that tool is available here and its description can be found here.
  • A pre-generated LCS curve of the benchmark group of 100 genuine accounts discussed in the paper.
  • A compiled binary and the C++ source code of the algorithm described in the paper, already configured with the settings used in the experimental evaluation.

To run the prototype, open a console, then compile and run it with the following commands:

  1. g++ -c GANCkld.cpp
  2. g++ -o GANCkld GANCkld.o
  3. ./GANCkld

Please note that the experiments presented in the paper were run on a server with substantial computational resources; the prototype may therefore run very slowly on a standard personal machine (desktop/laptop).

About

For suggestions, remarks, bugs or requests please do not hesitate to contact any of us.

  • stefano.tognazzi@imtlucca.it
  • stefano.cresci@iit.cnr.it
  • marinella.petrocchi@iit.cnr.it
  • spognardi@di.uniroma1.it