Saturday 6 May 2017

Sharp Tools for Java Refactoring: Byte Code Analysis

I'm currently refactoring parts of the CDK core classes. As part of this I often need to find specific patterns/idioms that need to be changed. Whilst source code inspections and an IDE can make this task easy sometimes the tools aren't quite sharp enough.

I needed to find all occurrences of a reference (instead of value) comparison on a particular class. In Java there is no operator overload and so you can have situations like this:
Integer a = new Integer(25);
Integer b = new Integer(25);
if (a == b) {} // false
if (a.equals(b)) {} // true
I mentioned operating overloading but it's more subtle and is more about comparing reference vs. value comparison. In C/C++ we can have similar behaviour:
int aval = 25, bval = 25;
int *a = &aval;
int *b = &bval;
if (a == b) {} // false
if (*a == *b) {} // true
Most IDE's and code inspection programs will warn about common occurrences (for example Integer) but I wanted to find places where the CDK's classes were used like this. A simple text grep will find some but will have false positives and negatives requiring lots of manual checking. Daniel suggested the well known FindBugs might be able help.

Rather than analyze source code like PMD and Checkstyle, FindBugs analyses Java byte code with a set of defaults detectors to find often subtle but critical mistakes. FindBugs can be configured with custom detectors (see here), however the inspection I needed (RC: Suspicious reference comparison to constant) was almost there. After digging around in the source code I found you can provide a list of custom classes to detect. However it took a bit of trial and error to get what I needed.

First up we turn off all inspections except for the one we're looking for (we need to fix many others reported but I was looking for something specific). To do this we create an XML config that will only run the specific inspection (RC for Reference Comparison):
findbugs-include.xml
<?xml version="1.0" encoding="UTF-8"?>
<FindBugsFilter>
  <Match>
    <Bug code="RC"/>
  </Match>
</FindBugsFilter>
We then run findbugs with this configuration and provide the frc.suspicious property.
Running findbugs
$> findbugs -textui \
            -include findbugs-include.xml \
            -property "frc.suspicious=org.openscience.cdk.interfaces.IAtom" \
            base/*/target/cdk-*-2.0-SNAPSHOT.jar 
This produces an accurate report of all the places the references are compared. Here's a sample:
H C RC: Suspicious comparison of org.openscience.cdk.interfaces.IAtom references in org.openscience.cdk.Bond.getOther(IAtom)  At Bond.java:[line 253]
H C RC: Suspicious comparison of org.openscience.cdk.interfaces.IAtom references in org.openscience.cdk.Bond.getConnectedAtom(IAtom)  At Bond.java:[line 265]
H C RC: Suspicious comparison of org.openscience.cdk.interfaces.IAtom references in org.openscience.cdk.Bond.getConnectedAtoms(IAtom)  At Bond.java:[line 281]
H C RC: Suspicious comparison of org.openscience.cdk.interfaces.IAtom references in org.openscience.cdk.Bond.contains(IAtom)  At Bond.java:[line 300]
...