Revapi

Full-featured API checker for Java and beyond.

Revapi has a very simple architecture consisting of a simple linear pipeline that processes a "forest" (a set of trees) of individual API elements.

The API elements come from archives. As of now, the only possible source of archives is the file system (future versions of Revapi may introduce other implementations for different sources of archives).

A set of archives containing the old version of API is passed to an archive analyzer that produces a forest of API elements. Then archives of the new version are analyzed the same way.

The two API forests are then consistently sorted and traversed in parallel. New or missing elements are detected and equivalent elements are compared using an element analyzer, producing a set of reports that contain the differences found while comparing the elements. The differences are then simply reported.

The following diagram depicts the work flow graphically.

Each of the stages in that work flow is configurable. The configuration is supplied as JSON files, that are validated against JSON schemas that each extension can supply.

Extension points

The diagram above hints at several extension points available in Revapi.

Archives

An archive is a very simple abstraction over a file that represents the archive containing some API elements (like Java’s jar archive or a class file, an XML document, properties file, etc).

Revapi actually doesn’t provide any implementation of it on its own (only the standalone Revapi’s CLI contains a private implementation able to read from files) but it is trivial to implement one.

API Analyzer

An API analyzer (which is kinda implicit in the diagram) is the main interface for implementing API checks for custom "language". It provides and configures the analyzers of the archives and API difference analyzers both to be detailed below.

Archive Analyzer

An archive analyzer is instantiated and configured by the API analyzer to analyze archives of a version of API. It represents the results of the analysis as an element forest (i.e. a set of element trees).

Element Filter

An element filter can filter out elements from the element forest before they are passed further down the API analysis pipeline. The same set of element filters is applied to both the old API and new API element forests.

Difference Analyzer

The magic happens in the difference analyzers. Revapi simultaneously traverses the two element forests discovering new or removed elements and matching the comparable elements in them (using a co-iterator). It then passes the matched pairs to the difference analyzer that performs the actual analysis of changes and provides the reports summarizing them.

Reports

A report summarizes the differences found between 2 elements - one from the old API and the other from the new API (accounting for removals or additions by one of the pair being null).

In addition to the two elements in comparison, the report also contains the list of the differences the analyzer found between the two.

Each difference is identified by its code. The code is a textual ID of the difference that should be unique. In addition to that, the difference can also define a human readable name and description. The difference has a classification - a mapping between a compatibility type and the severity - basically saying how severe the difference is for given type of compatibility. In addition to that, a difference contains attachments which is a varying list of additional information about the difference.

Revapi recognizes 4 types of compatibility:

  • SOURCE - old and new API is source compatible if the code using the old API can be compiled against the new API without modification.

  • BINARY - old and new API is binary compatible if the code compiled against the old API can run using the new API with modification and error.

  • SEMANTIC - old and new API is semantically compatible if they behave the same

  • OTHER - other type of compatibility not captured by the previous three.

And here are the severities of differences:

  • BREAKING - the differences breaks the API compatibility (of given type)

  • POTENTIALLY_BREAKING - the difference may break the API compatibility (of given type) under some specific circumstances

  • NON_BREAKING - the difference doesn’t break the API

  • EQUIVALENT - "there is no change" - this is provided so that transforms and other tools can declare that certain changes are not even non-breaking - they are effectively non-existent.

Difference Transform

Once the differences are found they are supplied to the difference transforms. These extensions can, as the name suggests, transform the found differences into different ones or altogether remove them from the results.

Transformation Algorithm

As briefly explained above, Revapi compares element pairs from old and new API one at a time. For each element pair, a report is produced detailing all the found differences. Each such report is then processed by the transformers. Each transformer is given a chance to transform the differences from the original report and their intended changes are gathered. After the "round", the changes are applied to the list of differences for the element pair and all the transformers can again react on the new list of differences. This repeats until no further changes are made to the list by the transformers.

You can spot in the explanation above that there is a good chance for an infinite loop if two or more transformers form a "cycle", meaning that a difference that produced by one transformer is changed again into the original by a second transformer, which again is transformed by the first transformer, etc.

Revapi guards against this simply by doing at most 1 000 000 such iterations and then throwing an error.

Transformation Blocks

One thing was not explicitly mentioned in the basic description of the transformation algorithm. Transformations can be grouped into blocks that then act as a single transformation in the above algorithm.

What is this good for?

You can notice that it is hard (read impossible without transformation blocks) to "prepare" differences using one transform and then produce the final difference using a different transform.

As an example, let’s suppose that we would like to use Revapi for checking semantic versioning of our code but we would only like to base our semantic version on the binary compatibility of the code, disregarding any source or semantic incompatibilities.

Such thing would be impossible without transformation blocks because the transformation algorithm makes sure each transform sees all the differences and all changes to the original differences are transferred to the next "transformation round".

So, how would we use transformation blocks and how would we configure Revapi to only consider binary compatibility?

Let’s use Maven for our example:

<analysisConfiguration>
  <revapi.semver.ignore>
    <enabled>true</enabled>
  </revapi.semver.ignore>
  <revapi.reclassify>
    <item>
      <regex>true</regex>
      <code>.*</code>
      <classify>
        <SOURCE>EQUIVALENT</SOURCE>
        <SEMANTIC>EQUIVALENT</SEMANTIC>
        <OTHER>EQUIVALENT</OTHER>
      </classify>
    </item>
  </revapi.reclassify>
</analysisConfiguration>
<pipelineConfiguration>
  <transformBlocks>
    <block>
      <item>revapi.reclassify</item>
      <item>revapi.semver.ignore</item>
    </block>
  </transformBlocks>
</pipelineConfiguration>

What have we done there? The analysis configuration looks "normal". We enable the revapi.semver.ignore extension and leave it with the default configuration. We additionally configure revapi.reclassify to tone down any difference (with any code, by using .* as the regex to match any difference code) to EQUIVALENT, effectively "switching them off" for all compatibility types but BINARY.

The new thing is the pipelineConfiguration. This tells Revapi to group the two transforms together and consider them as one - the "output" difference of revapi.reclassify is used as "input" difference to revapi.semver.ignore and "output" of that is used for the reporting purposes. The important thing is that revapi.semver.ignore never sees the original differences as reported by the analyzer. It only ever sees the differences first transformed by revapi.reclassify.

Reporter

Finally, after the final set of differences is settled, it is passed to the reporters. These are responsible to report the found differences to the caller somehow (standard output, database, xml files, whatever one imagines).

Back to top

Msb3 Maven skin by Marek Romanowski.