Sunday, August 30, 2009

Selecting A Coverage Tool

At work, we use Clover to track coverage.

For programming at home, a Clover license is way too expensive (not as bad as those software vendors that used to not list prices and instead said "Call for quote", but close).

Also, we've had some issues with Clover:
  1. It sometimes falsely claims that code is covered when it isn't.
  2. It doesn't support branch coverage.
  3. It can't instrument assignments in conditionals.
(The Clover rep confirmed the above problems a couple of years ago. These may have been fixed by now. However, because the licenses are so expensive, we haven't upgraded, so we're stuck with the problems even if they're fixed in the current version. Also, the Clover rep didn't think the third issue would ever be fixed due to the way they instrument the code.)

For a simple example of problem #2, start with a function like this:
public boolean eval(boolean x, boolean y, boolean z)
{
    return x && (y || z);
}
Clover scores 100% coverage if the entire expression evaluates at least once to true and at least once to false, so this is sufficient:
assertTrue(eval(true, true, false));
assertFalse(eval(true, false, false));
But exercising every meaningfully distinct way the expression can evaluate requires more tests:
assertTrue(eval(true, true, false));
assertTrue(eval(true, false, true));
assertFalse(eval(true, false, false));
assertFalse(eval(false, true, false));
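To see why four tests are the minimum, it can help to enumerate the short-circuit paths through the expression. The following is only an illustrative sketch (the class and method names are made up for this example and are not part of any coverage tool): it labels each input by the operands that actually get evaluated and groups the eight possible inputs by that label.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists the distinct short-circuit paths through x && (y || z).
class ShortCircuitPaths
{
    // Labels an input by the operands that are actually evaluated, plus the result.
    static String path(boolean x, boolean y, boolean z)
    {
        if (!x) return "x=F -> false";              // y and z are never evaluated
        if (y) return "x=T, y=T -> true";           // z is never evaluated
        if (z) return "x=T, y=F, z=T -> true";
        return "x=T, y=F, z=F -> false";
    }

    // Groups all eight inputs by the path they take.
    static Map<String, List<String>> distinctPaths()
    {
        Map<String, List<String>> paths = new LinkedHashMap<>();
        boolean[] values = {true, false};
        for (boolean x : values)
            for (boolean y : values)
                for (boolean z : values)
                    paths.computeIfAbsent(path(x, y, z), k -> new ArrayList<>())
                         .add("eval(" + x + ", " + y + ", " + z + ")");
        return paths;
    }

    public static void main(String[] args)
    {
        distinctPaths().forEach((label, inputs) -> System.out.println(label + "  " + inputs));
    }
}
```

There are four groups, and the four assertions above pick one input from each.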
Problem #3 means that this code can't be covered:
while ((currentLine = stream.readLine()) != null)
Covering that code with Clover requires rewriting it as a do/while.
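One possible shape for that rewrite is sketched below. This is an assumption about what such a loop could look like, not the code from our project; the class name, the method name, and the use of BufferedReader are placeholders for illustration. The point is that the assignment becomes its own statement, so nothing is assigned inside a condition.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: reading lines without an assignment in the loop condition.
class ReadLines
{
    static List<String> readAll(BufferedReader reader) throws IOException
    {
        List<String> lines = new ArrayList<>();
        String currentLine;
        do
        {
            currentLine = reader.readLine();   // the assignment is a plain statement
            if (currentLine != null)
            {
                lines.add(currentLine);
            }
        }
        while (currentLine != null);           // the condition only tests the value
        return lines;
    }

    public static void main(String[] args) throws IOException
    {
        BufferedReader reader = new BufferedReader(new StringReader("first\nsecond"));
        System.out.println(readAll(reader));
    }
}
```

The price is an extra null check inside the loop body, which is why the original while form is usually preferred when the tool allows it.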

For the above reasons, I went in search of a coverage tool other than Clover for programming at home.

Googling around, EMMA and EclEmma (an Eclipse plugin for EMMA) kept showing up, so I tried them.

Unfortunately, where Clover incorrectly says that uncovered code is covered, EMMA incorrectly says that covered code is not covered:



True, it only screws up occasionally, but if you're five lines short of 100% coverage, and the five lines are spurious tool errors, it's annoying.

This was unfortunate, because the EclEmma plugin is very fast and completely non-intrusive, and would be great to use. Plus, it's currently the only coverage tool that works with Android.

After reporting these problems, I looked around for alternatives. A bunch of coverage tools were listed here, and I tried each of them out. Most didn't install, or weren't compatible with the latest Eclipse, or hadn't been maintained for a number of years, etc.

But Cobertura works really well.

Unfortunately, there's no Eclipse plugin for Cobertura, and it's slower than the other two tools, but compared to getting the wrong answer, that's not too much to give up. The dream tool would combine Cobertura's accuracy with EclEmma's speed and usability, but no such tool exists.

This table compares the three tools:

                            Clover    EMMA/EclEmma    Cobertura
Statement coverage          yes       yes             yes
Branch coverage             no        no              yes
No quirks or bugs           no        no              yes
Fast                        yes       yes             no
Reports                     yes       yes             yes
Maven integration           yes       yes             sort of
Works with mocking tools    yes       yes             yes
Eclipse plugin              yes       yes             sort of
Free                        no        yes             yes
Active community            yes       yes             sort of

Notes:
  • Maven integration for Cobertura is provided by a separate open-source project. This project has been around for a while. There are some problems using the integration.
  • Eclipse integration for Cobertura is provided by a separate open-source project. This project is new and not very mature yet.
  • Cobertura's community support is rated "sort of" because the only forum is via an email distribution list, and responses can take a couple of days. On the other hand, the code is mature and easy to use, so there isn't much need for help (most of the emails I sent were questions about setting up the ant scripts).
  • JMockit can be configured to generate coverage reports, but currently only statement coverage is supported. The author plans to add additional features (including branch coverage), at which point it should be evaluated like the other three tools.
  • For a comparison of coverage tools that arrives at the opposite conclusion to mine (partly due to other requirements), see http://javapulse.net/2008/09/02/coverage-emma-cobertura-maven.

Tuesday, August 25, 2009

Eclipse Plugins For Java Dependency Analysis

Once I got our project's code imported into Eclipse in a usable way, the real work started.

The plan is to upgrade the code to use newer APIs and services. To do that, we need a way to analyze the dependencies, to identify cycles, determine the easiest places to refactor, etc.

A web search found a useful evaluation of various dependency-analyzers for Eclipse.

I tried all of the tools listed in the evaluation, and liked STAN the best.

STAN installs easily, is intuitive to use, and doesn't choke on our million-plus lines of code (after configuring it to analyze class-to-class dependencies instead of at the method level). Plus, questions emailed to STAN are answered quickly--the support is good. We bought a license.

But STAN is a commercial product, and I also needed something to use at home. Something inexpensive.

That's when I noticed a comment at the bottom of the evaluation. (It had been there all the time, but who reads comments, right?)

The comment recommends CAP, and it's a good recommendation. CAP is similar to STAN in terms of ease of installation and use, it doesn't choke on our million-line project, and it's faster than STAN and uses less memory. Plus it's free, which is hard to beat.

However, CAP hasn't been upgraded for a while, and the author is intermittently difficult to contact. Also, STAN has a better display of rolled-up package dependencies (for example, if you have com.abc and com.abc.def, you can see dependencies on com.abc, or com.abc.def, or com.abc*, but in CAP you can only see com.abc or com.abc.def individually).

STAN doesn't offer floating licenses, which makes it pretty expensive if more than a couple of engineers will be using it. Dependency analysis is something an engineer might do while learning or refactoring a code base, and then not do again for months, so floating licenses would make sense.

Both tools are good. STAN's package roll-ups are really useful when you have a lot of subpackages. CAP's price is hard to beat.

If you do use CAP, please support open source and send the author a donation. I was the first person to do that, which is kind of a shame.

Update: See this post for another good Java dependency analyzer.

Sunday, August 23, 2009

Using Eclipse With Large Code Bases, Part IV

Previous posts described how we created a set of Eclipse projects that break up our million-line, single-rooted source tree into manageable chunks.

Today, I'll describe how we solved a similar problem with our runtime classes:

5. JAXB is used to generate some .class files into a runtime directory "rt", but that same directory contains all of the .class files for the system

Some details on the situation:
  • Our project currently uses gmake to compile the million-plus lines of code.
  • Builds output to a runtime directory on the developer's machine called "rt". All classes required to launch the application are either in rt, or in JARs in a "lib" directory.
  • Many of the classes in the rt directory are duplicates of classes Eclipse compiles in the projects, but some of the classes in the rt directory are generated by JAXB (which is executed by gmake as part of the builds), and are not in the source tree.
  • Eclipse needs to see the generated classes in order to compile.
For example:
package com.parent.child1;

import com.parent.Parent;
import com.parent.child1.generated.Gen1;

public class Child1 extends Parent
{
    private Gen1 gen1;
}
Eclipse supports linked resources, so at first it seemed like we just needed to define a linked resource for external classes that pointed to rt. Unfortunately, there are so many classes in rt that Eclipse again ground to a halt.

Fortunately, by now we were experienced with using links to subset a source directory hierarchy, so I just used the same approach to subset the runtime directory hierarchy.

First, run commands to add a link into the rt directory at the desired location:

mkdir C:\projects\Child1\rt\com\parent\child1\generated
junction C:\projects\Child1\rt\com\parent\child1\generated C:\rt\com\parent\child1\generated


Then add the rt directory to the .classpath:
<classpathentry kind="lib" path="rt"/>
Refresh in Eclipse, and the code builds:



Notes:
  • As was the case with source files, multiple links to runtime-class directories can be created for a single project.
  • If multiple projects need the same .class files from an rt directory, the rt link should be set only in the project that generates the .class files. Other projects should point to that rt link via Build Path → Add Class Folder...

In Part V, I'll describe how we fixed a problem with the Perforce Eclipse plugin caused by using links.

Saturday, August 22, 2009

Using Eclipse With Large Code Bases, Part III

In Part II, we saw how to use links to break a large single-rooted source tree into separate Eclipse projects.

In those examples, the packages didn't have mutual dependencies--they were a directed, acyclic graph.

But in my project's legacy code base, there is another complication:

4. Packages and layers have mutual dependencies (for example, business logic in the UI layer), but Eclipse treats cycles among projects as compile errors

Of course, it would be better not to have cycles, but in a legacy code base it's not always easy, or even tractable, to remove them. Remember that my project has to work with this constraint:

6. None of this can be changed, at least not any time soon

So, a more realistic example has child1 depend on child2, and vice-versa:
package com.parent.child1;

import com.parent.Parent;
import com.parent.child2.Child2;

public class Child1 extends Parent
{
    private Child2 child2;
}

package com.parent.child2;

import com.parent.Parent;
import com.parent.child1.Child1;

public class Child2 extends Parent
{
    private Child1 child1;
}
This is allowed by Java, but not allowed (by default) by Eclipse for packages in separate projects, even after adding the project dependencies:



Fortunately, Eclipse allows cycles to be a warning instead of an error:


Unfortunately, that just converts the errors into warnings, which clutter the window (we already know we have cycles):


Fortunately, Eclipse supports filtering out specific warnings. In the Problems window, click the down-arrow icon on the right (View Menu), select Configure Contents..., and add a filter:


And now the Problems window is empty.

Part IV describes how we fixed this remaining issue:

5. JAXB is used to generate some .class files into a runtime directory "rt", but that same directory contains all of the .class files for the system

Friday, August 21, 2009

Using Eclipse With Large Code Bases, Part II

In Part I, linked source with excludes failed to solve the problem of how to break up a large, single-rooted source tree into multiple projects in Eclipse.

After a night off, I thought of using file-system links to create the illusion of multiple directory roots. But we needed directory-to-directory links, not file-to-file links, and Windows doesn't directly support those.

A coworker found the Windows-specific "junction" command, which can be downloaded from Microsoft. It's an add-on to Windows, not part of the standard set of shell commands. With the junction command, I was able to create a multi-rooted source tree that points to the source it needs from the single-rooted source tree.

For example, starting with:
C:\src\
    com\
        parent\
            Parent.java
            child1\
                Child1.java
            child2\
                Child2.java
Create a parallel structure:
C:\projects\
    Child1\
        .classpath
        .project
        src\
    Child2\
        .classpath
        .project
        src\
The .project files don't have linked-source directives in them, and the .classpath files just have the standard <classpathentry including="**/*.java" kind="src" path="src"/> entries.

From a command prompt, execute commands to create links from the project src directories into the real source tree:


mkdir C:\projects\Child1\src\com\parent\child1
junction C:\projects\Child1\src\com\parent\child1 C:\src\com\parent\child1

mkdir C:\projects\Child2\src\com\parent\child2
junction C:\projects\Child2\src\com\parent\child2 C:\src\com\parent\child2


(The commands for creating directory and file links on Linux/Mac are of course different, but the concepts are the same.)
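For a platform-neutral illustration of the same concept, the JDK can create a directory link itself. The sketch below uses a symbolic link, which is a different mechanism from a Windows junction, and everything in it (the class name, the temp-directory layout) is made up for the example. Note that on Windows, creating symbolic links may require elevated privileges.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example: linking a project's package directory into the real source tree.
class DirectoryLink
{
    // Creates the link's parent directories, then makes 'link' point at 'target'.
    // The link itself must not exist yet.
    static Path linkDirectory(Path link, Path target) throws IOException
    {
        Files.createDirectories(link.getParent());
        return Files.createSymbolicLink(link, target);
    }

    public static void main(String[] args) throws IOException
    {
        // Assumed layout under a temp directory, mirroring the example tree.
        Path root = Files.createTempDirectory("links");
        Path realPackage = Files.createDirectories(root.resolve("src/com/parent/child1"));
        Files.createFile(realPackage.resolve("Child1.java"));

        Path link = linkDirectory(
            root.resolve("projects/Child1/src/com/parent/child1"), realPackage);

        // The source file is visible through the link.
        System.out.println(Files.exists(link.resolve("Child1.java")));
    }
}
```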

The resulting directory structure looks like this:
C:\projects\
    Child1\
        .classpath
        .project
        src\
            com\
                parent\
                    child1 → linked to C:\src\com\parent\child1
    Child2\
        .classpath
        .project
        src\
            com\
                parent\
                    child2 → linked to C:\src\com\parent\child2
After refreshing the projects in Eclipse, the unwanted packages and source files are gone:


Unfortunately, the code doesn't compile:


Remember the third item in the list of problems?

3. Some source files used throughout the code are located in the top of the source tree

It has come back to haunt us. Both projects need to see Parent.java, but we can't link to the root of the source tree, because that's the problem we're trying to solve with links.

To fix this, create another project, Parent, but use a file link instead of a directory link:

mkdir C:\projects\Parent\src\com\parent
fsutil hardlink create C:\projects\Parent\src\com\parent\Parent.java C:\src\com\parent\Parent.java


Then add the Parent project to the dependencies of Child1 and Child2, and now it does compile:


This approach works very well--the entire million-plus lines of code is broken up into 40+ projects in Eclipse, and the code compiles quickly after the initial import.

You can envision an Eclipse plugin that would semi-automate this process. At a minimum it would be nice to generate the projects and link scripts from some kind of description, instead of editing the files by hand. Unfortunately, by the time I had worked out the pattern, the projects and links were mostly already finished.
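As a rough idea of what the "generate the link scripts from a description" half could look like, here is a minimal sketch. It is not something we built; the class name, the method, and the project-to-packages map are all hypothetical. It emits only the mkdir/junction pairs used above and ignores file links and .project/.classpath generation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generator: turns a project -> packages description into link commands.
class LinkScriptGenerator
{
    static List<String> commands(String projectsRoot, String sourceRoot,
                                 Map<String, List<String>> packagesByProject)
    {
        List<String> lines = new ArrayList<>();
        packagesByProject.forEach((project, packages) ->
        {
            for (String pkg : packages)
            {
                // com.parent.child1 becomes com\parent\child1
                String relative = pkg.replace('.', '\\');
                String link = projectsRoot + "\\" + project + "\\src\\" + relative;
                lines.add("mkdir " + link);
                lines.add("junction " + link + " " + sourceRoot + "\\" + relative);
            }
        });
        return lines;
    }

    public static void main(String[] args)
    {
        Map<String, List<String>> description = new LinkedHashMap<>();
        description.put("Child1", Arrays.asList("com.parent.child1", "com.parent.otherChild"));
        description.put("Child2", Arrays.asList("com.parent.child2"));
        commands("C:\\projects", "C:\\src", description).forEach(System.out::println);
    }
}
```

Redirecting the output to a .bat file would give a script that could be rerun whenever the description changes.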

Note: Although it's not shown in the examples above, this approach can also be used to link to multiple child nodes in a source tree to produce a combined tree for a project. For example:

mkdir C:\projects\Child1\src\com\parent\child1
junction C:\projects\Child1\src\com\parent\child1 C:\src\com\parent\child1


mkdir C:\projects\Child1\src\com\parent\otherChild
junction C:\projects\Child1\src\com\parent\otherChild C:\src\com\parent\otherChild


In Part III, I'll describe how we dealt with this issue:

4. Packages and layers have mutual dependencies (for example, business logic in the UI layer), but Eclipse treats cycles among projects as compile errors

Thursday, August 20, 2009

Using Eclipse With Large Code Bases, Part I

The project I'm currently working on has more than a million lines of source code. Some of the code was written as long ago as 1998, so as odd as it sounds to call anything involving Java "legacy", this is a legacy Java codebase.

I wanted to bring the code into Eclipse, but not as one giant million-line project. Instead, I wanted to break it up into smaller projects.

But there were complications:
  1. The source tree has a single root directory
  2. Eclipse can't nest projects
  3. Some source files used throughout the code are located in the top of the source tree
  4. Packages and layers have mutual dependencies (for example, business logic in the UI layer), but Eclipse treats cycles among projects as compile errors
  5. JAXB is used to generate some .class files into a runtime directory "rt", but that same directory contains all of the .class files for the system
  6. None of this can be changed, at least not any time soon
Items #1 and #2 mean that the Eclipse projects have to be located outside the source tree, and point to source in the source tree.

Fortunately, Eclipse supports linked source, so I created the Eclipse projects in a different location, and set their build paths to have linked-source entries that pointed to the source tree.

The first step is to create a global linked-resource variable that points to the root of the source tree:


Then right-click on each project and select Build Path → Configure Build Path... Link Source... → Variables..., and select the global linked-resource variable.

Saving the changes results in a .classpath entry like this:
<classpathentry kind="src" path="src"/>
and a .project entry like this:
<linkedResources>
    <link>
        <name>src</name>
        <type>2</type>
        <locationURI>src</locationURI>
    </link>
</linkedResources>
for each project.

(Because I had so many projects to manage, I edited the .project files directly, instead of interactively.)

Unfortunately, item #1 complicated linking to the source, because every Eclipse project linked to the same root directory, which meant every Eclipse project saw the same source instead of just seeing the source for that project.

For example, if the source tree looks like:
C:\src\
    com\
        parent\
            Parent.java
            child1\
                Child1.java
            child2\
                Child2.java
and we want two Eclipse projects, "Child1" and "Child2", they both have to start their source trees at C:\src. So Child1 sees Child2's code in the child2\ directory, and Child2 sees Child1's code in the child1\ directory:


Fortunately, Eclipse supports source exclusion. To exclude source, right-click on a project and select Build Path → Configure Build Path... Source, Excluded Edit... Exclusion Patterns: Add..., and add every package and/or file you don't want included.

This modifies the .classpath files to have entries like:
<classpathentry kind="src" path="src"
excluding="com/parent/child1/"/>
Using this approach, I was able to exclude unwanted packages from each project. Because there are a lot of packages, this was tedious and took hours, but it worked:
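As an aside, the exclusion lists don't have to be typed by hand. The sketch below is a hypothetical helper, not something I used at the time; it builds the excluding attribute from a list of package directories and the ones a project should keep. It assumes that multiple patterns in a single attribute are separated by "|", which is how Eclipse appears to write them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: generates a .classpath source entry with exclusions.
class ExclusionPatterns
{
    // Every package directory except the ones to keep, each with a trailing slash.
    static String excluding(List<String> allPackageDirs, List<String> keep)
    {
        return allPackageDirs.stream()
            .filter(dir -> !keep.contains(dir))
            .map(dir -> dir + "/")
            .collect(Collectors.joining("|"));   // assumed pattern separator
    }

    static String classpathEntry(List<String> allPackageDirs, List<String> keep)
    {
        return "<classpathentry kind=\"src\" path=\"src\" excluding=\""
            + excluding(allPackageDirs, keep) + "\"/>";
    }

    public static void main(String[] args)
    {
        List<String> all = Arrays.asList("com/parent/child1", "com/parent/child2");
        // Entry for the Child2 project: keep child2, exclude everything else listed.
        System.out.println(classpathEntry(all, Arrays.asList("com/parent/child2")));
    }
}
```

This only removes the typing. It does nothing about how long Eclipse takes to load the excluded source.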


Unfortunately, once everything was configured and I launched Eclipse, it took 35 minutes to load.

35. Minutes.

It turned out that Eclipse bogs down if it has to import a lot of source code and then filter it out. Ideally it would filter it out first and only load the remainder, but it doesn't seem to do that.

After reporting this problem, I switched to plan B, which is described in Part II.