Download the latest version here.
This implementation, t_MS4, is a beta release of the MS4
algorithm. t_MS4 builds on two earlier developments: NLD-decoding, which finds the NLD
classes (the C part of the code, included), and altgraph, which manages trees. It requires Python 2.
MS4 is a method that selects, among all the segments of similarity detected by the N-local decoding algorithm (Didier et al., 2007), those on which a classification of a set of unaligned sequences is based. N-local decoding detects local similarities of size 2N-1 containing a variable number of mismatches, and it has proved successful for alignment-free classification with a fixed value of N. The aim of MS4 is to automatically adapt N (the size of the detected similarities) to the local context and to the data set under consideration. It then computes a dissimilarity matrix based on the detected similarities, which is used to classify the sequences. For low values of N, similarities are spurious and many hits occur within a single sequence. For large values of N, similarities are exact words shared by no more than 2 sequences. MS4 sets N so that the average number of occurrences per sequence is smaller than a given parameter Kappa (Corel et al., 2010).
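
The trade-off between small and large similarity sizes can be illustrated with a minimal Python sketch. It is not the t_MS4 code and makes two simplifying assumptions: exact words of length k stand in for the NLD classes, and a single global word size is chosen rather than adapting N to the local context. The function names occurrences_per_sequence and smallest_word_size, and the parameter k_max, are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def occurrences_per_sequence(sequences, k):
    """Map each word of length k to its average number of occurrences
    per sequence, counting only the sequences that contain the word."""
    total = defaultdict(int)    # word -> occurrences over all sequences
    holders = defaultdict(set)  # word -> indices of sequences containing it
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            total[word] += 1
            holders[word].add(idx)
    return {w: total[w] / float(len(holders[w])) for w in total}


def smallest_word_size(sequences, kappa, k_max):
    """Return the smallest k in 1..k_max whose mean number of occurrences
    per sequence is strictly below kappa, or None if there is none.

    Assumption: this global rule only mimics the role of Kappa;
    MS4 itself adapts N locally on NLD classes.
    """
    for k in range(1, k_max + 1):
        ratios = occurrences_per_sequence(sequences, k)
        if not ratios:
            # Every sequence is shorter than k: no words left to count.
            return None
        mean = sum(ratios.values()) / len(ratios)
        if mean < kappa:
            return k
    return None
```

For the two toy sequences "AAAA" and "AAAT" with kappa = 1.5, the mean number of occurrences per sequence is 2.25 for k = 1, 1.75 for k = 2 and 1.25 for k = 3, so smallest_word_size returns 3. Short words are repeated inside a single sequence, and the repeats disappear as the word size grows.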