Finding duplicate BlitzMax code with simian
I’ve recently started using simian to track down duplicate code blocks in my BlitzMax projects. Although simian doesn’t come with BlitzMax support built-in, it’s possible to use some of the other settings to work with bmx code.
Examples
I have java -jar simian.jar aliased to just simian so the commands are easier to read.
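For reference, the alias could be set up roughly like this. It is a minimal sketch: the SIMIAN_JAR variable and the ~/tools/simian.jar location are assumptions for illustration, so point them at wherever the jar actually lives.

```shell
# Assumed install location (hypothetical); override SIMIAN_JAR to match your setup.
SIMIAN_JAR="${SIMIAN_JAR:-$HOME/tools/simian.jar}"

# Lets "simian ..." stand in for "java -jar /path/to/simian.jar ...".
# Single quotes delay expansion, so the current value of SIMIAN_JAR is used on each run.
alias simian='java -jar "$SIMIAN_JAR"'
```

Putting these lines in a shell startup file such as ~/.bashrc or ~/.zshrc makes the alias available in every new interactive session.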
An example run looks a little like this (scanning pangolin/contentdb):
$ simian -excludes="**/*.tests.bmx" -language=vb -threshold=5 **/*.bmx
Similarity Analyser 2.5.10 - http://www.harukizaemon.com/simian
Copyright (c) 2003-2018 Simon Harris. All rights reserved.
Simian is not free unless used solely for non-commercial or evaluation purposes.
{failOnDuplication=true, ignoreCharacterCase=true, ignoreCurlyBraces=true, ignoreIdentifierCase=true, ignoreModifiers=true, ignoreStringCase=true, language=VB, threshold=5}
Found 5 duplicate lines with fingerprint 58349a293c079cddfe30357e667a5aa4 in the following files:
Between lines 193 and 206 in pangolin.mod/contentdb.mod/src/entity_template.bmx
Between lines 151 and 163 in pangolin.mod/contentdb.mod/src/component_schema.bmx
Found 7 duplicate lines with fingerprint 285244b124a3a29e5c65d4106d0aba7d in the following files:
Between lines 57 and 70 in pangolin.mod/contentdb.mod/src/component_schema.bmx
Between lines 81 and 96 in pangolin.mod/contentdb.mod/src/component_field.bmx
Found 24 duplicate lines in 4 blocks in 3 files
Processed a total of 748 significant (1598 raw) lines in 9 files
Processing time: 0.099sec
Analysis is quick; the entire pangolin project takes under a second to check.
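On larger projects it can help to see which files turn up most often in the report. The following is a rough sketch that tallies the "Between lines ... in ..." entries per file. The function name count_duplicate_blocks is made up for this example, and it assumes the report format shown above and file paths without spaces.

```shell
# Hypothetical helper: read a simian report on stdin and print
# "<number of duplicate blocks> <file>" with the most affected files first.
count_duplicate_blocks() {
  awk '
    # Match report lines such as "Between lines 57 and 70 in path/to/file.bmx".
    /^[ \t]*Between lines [0-9]+ and [0-9]+ in / { count[$NF]++ }
    END { for (file in count) print count[file], file }
  ' | sort -rn
}
```

It would be used by piping a run into it, for example: simian -language=vb -threshold=5 **/*.bmx | count_duplicate_blocks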
Treating BlitzMax code as plaintext
By default simian treats files with unknown extensions as plaintext.
The following will find files in the “src” directory that have 6 or more lines duplicated (6 is simian’s default threshold):
simian src/**/*.bmx
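One thing to watch for when the shell expands the pattern: in bash, ** only recurses into subdirectories when the globstar option is enabled (bash 4 or newer), while zsh recurses by default. The sketch below shows the bash setup. The list_bmx_files helper is hypothetical and only prints what the glob would pass to simian.

```shell
# Bash only: make ** match any depth of subdirectories (needs bash 4 or newer).
shopt -s globstar

# Hypothetical helper: print the .bmx files the glob expands to, one per line.
# Defaults to the "src" directory when no argument is given.
list_bmx_files() {
  local dir="${1:-src}"
  printf '%s\n' "$dir"/**/*.bmx
}
```

Running list_bmx_files src before a scan is a quick way to confirm that nested files are being picked up.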
Ignoring BlitzMax comments
Plaintext is enough for most use cases, but it also includes comments in duplication checks. To exclude comments, we can set the language to Visual Basic, which uses the same ' comment syntax as BlitzMax.
The adjusted version will now ignore duplicated comments:
simian -language=vb src/**/*.bmx
Adjusting the threshold
The -threshold parameter adjusts the number of duplicated lines that triggers a warning.
The following will trigger a warning if 5 lines or more match in separate files:
simian -language=vb -threshold=5 src/**/*.bmx