Code Generation

Model-driven engineering (MDE), or model-driven software engineering (MDSE), is a software development methodology that focuses on creating and using domain models. Domain models are abstract representations of an excerpt of the real world. They contain and organize the knowledge and activities that govern a particular application domain.

Code generation is often seen as a minor step in model-driven software development. The following presentation from the Code Generation 2014 conference gives a good example of a sophisticated code generation environment. This code generation functionality is part of our MetaModules language workbench.

(To watch the video in full HD resolution, go to the Vimeo website and switch to fullscreen.)

Transcript

Welcome to our presentation about Model Driven Software Development for Source Code Generation.

My name is Jürgen Mutschall and I am the development lead of the MetaModules Language Workbench.

This was a short presentation at the Language Workbench Challenge during the Code Generation conference in Cambridge, April 2014.

I would like to show that it is possible to generate, fully automatically, model-based relational database structures that remain efficient for complex model structures and large amounts of data.

Model-driven software development is split into three phases, which can be iterated again and again to create the final software solution:

In the first phase you use text editors, graphical editors (like UML editors) and other import mechanisms to collect model information.

In the second phase you transform between models and validate the models against given constraints.

In the third phase you generate output (export data from the model repository). This can be source code, configuration information, descriptions and reports.

In this presentation I will focus on the generation of source code for JEE persistence through Hibernate.

But let’s hold on for a second. What are the challenges of source code generation? Every developer knows a little bit about template-based generation: prepared text files with some “holes” that the generator fills in during processing.
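As a rough illustration of what template-based generation with “holes” means, here is a minimal sketch. The class SimpleTemplate and the ${name} hole syntax are assumptions made for this example; they are not part of MetaModules.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal template engine: "${name}" marks a hole in the prepared text.
final class SimpleTemplate {
    private static final Pattern HOLE = Pattern.compile("\\$\\{(\\w+)\\}");

    private final String text;

    SimpleTemplate(String text) {
        this.text = text;
    }

    // Replaces every hole with the value bound to its name; fails on unbound holes.
    String render(Map<String, String> bindings) {
        Matcher matcher = HOLE.matcher(text);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = bindings.get(name);
            if (value == null) {
                throw new IllegalArgumentException("No value for hole: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Rendering the template "public class ${name} { private ${type} id; }" with name bound to Customer and type bound to long yields a small class declaration. Each hole is filled independently of the others, which is the kind of stateless processing that stops being sufficient once the model contains references and inheritance.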

But if you have a complex model structure with references, tags, associations, inheritance, and so on, it becomes difficult to generate the output with a stateless model visitor.

Besides this, you may want to generate complex code structures for different, independent aspects combined in one target source code object, e.g. persistence, web services, …

Maybe you want to generate separate code for hidden, system-managed code parts and for managed user code, which can be extended by the developer but is controlled by the runtime system.

And in the end, the resulting code should be readable and should provide maximum performance.

The “Generation Gap” pattern is an often-used mechanism for combining generated code with user-managed code provided by the developer. This pattern is too simple for huge software solutions with many independent generation aspects that must be coordinated by the code generator and the runtime environment.
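For readers who have not met the pattern, a minimal sketch of the Generation Gap idea follows. The class names CustomerBase and Customer are illustrative assumptions, not taken from the presentation.

```java
// Generated part (hypothetical): regenerated on every run, never edited by hand.
abstract class CustomerBase {
    private String name;

    String getName() { return name; }
    void setName(String name) { this.name = name; }

    // Default behavior that the developer may override in the subclass.
    String displayLabel() { return "Customer " + name; }
}

// User-managed part: created once, then owned by the developer.
final class Customer extends CustomerBase {
    @Override
    String displayLabel() { return getName().toUpperCase(); }
}
```

In Java the single superclass slot is taken by the generated base class, so a second, independent generation aspect has no obvious place to go. That is one way to see why the pattern becomes too simple once several aspects must be combined.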

We would like to propose using an “aspect pattern” for every generation aspect, together with a generated controller structure of classes that encapsulates and controls the source code provided by the developer. The controller classes also manage the delegates from the modeled class contract to the aspect-specific classes. This even allows generating manageable code for target platforms which do not support features like multiple inheritance and dynamic dispatch.

As an example generator domain we have chosen the domain of relational database persistence in combination with an object-relational mapper, in our case Hibernate.

We want to generate a fully functional database from an arbitrarily complex UML class diagram.

Using the JPA mechanism provided by Hibernate, we have to deal with the difficulties of JPA-based persistence:
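The following is only a sketch of how such a delegation structure could look; the presentation does not show the generated classes at this level of detail. All names (Person, PersonPersistenceAspect, PersonAuditAspect, PersonUserCode, PersonController) and the choice of a second "audit" aspect are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Modeled class contract (hypothetical name), derived from the model.
interface Person {
    String getName();
    void setName(String name);
}

// Generated persistence aspect: tracks whether the object must be written back.
final class PersonPersistenceAspect {
    private boolean dirty;

    void markDirty() { dirty = true; }
    boolean isDirty() { return dirty; }
}

// Generated second aspect (illustrative): records which operations were called.
final class PersonAuditAspect {
    private final List<String> calls = new ArrayList<>();

    void record(String operation) { calls.add(operation); }
    List<String> calls() { return calls; }
}

// User-managed code: a plain POJO that knows nothing about the aspects.
final class PersonUserCode implements Person {
    private String name;

    @Override public String getName() { return name; }
    @Override public void setName(String name) { this.name = name; }
}

// Generated controller: implements the contract, wraps the user code,
// and delegates to each aspect without relying on class inheritance.
final class PersonController implements Person {
    private final Person userCode;
    private final PersonPersistenceAspect persistence = new PersonPersistenceAspect();
    private final PersonAuditAspect audit = new PersonAuditAspect();

    PersonController(Person userCode) { this.userCode = userCode; }

    @Override public String getName() {
        audit.record("getName");
        return userCode.getName();
    }

    @Override public void setName(String name) {
        audit.record("setName");
        userCode.setName(name);
        persistence.markDirty();
    }

    boolean isDirty() { return persistence.isDirty(); }
    List<String> auditedCalls() { return audit.calls(); }
}
```

Because the controller composes the aspects instead of inheriting from them, adding a further aspect means adding one more field and a few delegating calls. The same structure can be emitted for platforms without multiple inheritance.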

  • The impedance mismatch between the object-oriented model and the relational model, which is not hidden by the Hibernate layer.
  • The poor and incomplete low-level annotation mechanism of JPA.
  • Using JPA “right”. Naive use of JPA leads to poor scalability and performance.

Our main goal is a high-performance database with no manual “tuning”.

As a showcase for the capability of the source generation module and the performance of the resulting system we selected a persistent repository of Java source code.

The input for the generator is a complete model of the Java syntax, consisting of hundreds of entities and relations. We derived this model from the Eclipse JDT EMF Java model.

The target system is a small notebook running Derby, a Java-based relational database that is not built for high performance, and a JBoss application server.

To show the performance of the demo solution we imported the complete Eclipse Kepler source code into the generated code repository (34,000 Java classes). The client for the demo repository is a small web application that allows navigating the source code graph and is able to incrementally load parts of the source code.

I will show that we can use standard JPQL mechanisms to selectively retrieve and refactor parts of the whole model (30 million objects) with good performance.

Let’s start with the preparation of the model: We used emf2UML to generate the UML model for the given JDT Java model and added some OO features which are not supported by the EMF metamodel.

We are using the Eclipse workbench, and here we see the JDT ecore file with the JavaAST model. It contains the primitive types and the structure of the complete Java syntax. You can see that a lot of inheritance is used, e.g. for expressions and statements. Here we see, for example, the nested structure of the Java “switch” construct with the contained expressions.

Here we see the derived UML model, which contains 150 associations, 90 classes, and 94 generalizations.

The UML model contains all the information of the EMF based ecore-Model and adds some additional hints. Again, you can see the complex inheritance relationships and the containment hierarchy according to the Java syntax definition.

Of course, all the needed primitive types of UML and the Java language library are defined.

Primarily, UML is a graphical notation language. Of course, you can use your preferred graphical UML editor to edit the UML model. For this presentation we used the TopCased UML system.

Here you can see our changes of the derived UML model. For scalability and performance reasons we have “interrupted” the Expression and Statement inheritance hierarchy by some interfaces.

And you can use the editor to show additional aspects of the underlying model or modify the UML model.

Next, we would like to show the general performance of the generator.

For this, we delete the generated code, clean the project and generate everything from scratch.

This takes a few seconds, but during the generation a LOT of code is generated (~50,000 LOC). After the generation, the Java code is automatically updated and compiled.

Here you can see the generated code. A lot of code.

And if we take a close look at the generated code, you can see that it is readable and looks quite similar to manually written code. For example, the files with the prefix JPA are the aspects for the persistence management.

If we look at the managed user code, which is generated into the src folder instead of the src-gen folder, you see nothing of the hidden, complex code hierarchy. You see a quite simple POJO which implements some contract interfaces.

Now the developer can start to override some of the default implementations or add some custom functionality to the POJO, independently of the persistence aspect. As you can see, the associations are completely hidden in the generated contracts and implementation, but can be overridden by the developer.

The demo web application is a simple, standard JSF-based web application which shows a two-level list. We can page through all entity types in the database while the web application concurrently counts the entities in the database. When we click on an entity instance, the application retrieves the entity instance and its children, reconstructs the Java AST, reconstructs the Java source code, formats the code, and sends it to the client web browser, where it is highlighted by a JavaScript syntax highlighter.

Now we start the JBoss application server. The server is initialized, Hibernate is started, Hibernate initializes and checks the table structure, and the server is ready. Please keep in mind that we have imported 34,000 Java classes as source code into the database up front, about 20 GB of data.

We start a standard web browser. The web browser connects to the server and shows the first page of all entity types in alphabetical order. Immediately the web application retrieves the count of the entities in the database. Here you see that there are, for example, more than 13,000 anonymous class declarations in the Eclipse Kepler source code.

We page through the list of entity types; for example, there are more than 150,000 casts in the source code. Now we click on the CastExpression link and a detailed browser for the 150,000 CastExpressionEntities is opened. Each entity instance is identified by a UUID.

We can page through the list of entities. By selecting one of the entities, the web application retrieves the JavaAST fragment from the database, reconstructs the Cast-Expression with all containing elements like Literals and generates source code which is sent to the browser.

Here you can see the CastExpression source code, in this simple case, consisting of 5 entities and on the top of the screen you see the hierarchy of this code fragment in the containing compilation unit.

We can go up the containment hierarchy, and the application incrementally retrieves more and more of the Java source code. For example, this type declaration, which is part of the ANTLR parser generator, consists of 631 entities.

Let’s go back to the browser. We page through some other entity types and select the InstanceOfExpression. Here we see a type declaration, and it takes some time to retrieve the whole JavaAST because it contains a lot of references to other Java declarations. These are more than 7,000 nested entities.

As the last part of the presentation, we would like to show that you can use standard JPQL, the SQL-like query language of JPA, to retrieve information and to do refactoring. In this example we show a renaming of methods.

Here we see a Java JUnit test, which calls a rename method on the Java repository service. If we run this test, it connects to the server, retrieves all declarations and uses of methods named “getName” and replaces them with “getMyName”, committing in a standard database transaction.

As you can see, this does not take very long. The test found about 19,000 occurrences in the whole Eclipse Kepler source code, replaced them, and committed the transaction. It took 5 seconds on our small notebook.

Let’s have a look at the implementation of the method “renameMethods” of the service. It is quite simple. We create a query that selects the names of the MethodInvocations and MethodDeclarations which match the given name, and replace the Identifier in each SimpleName. That’s all.

All other typical operations can be implemented with a few simple lines of JPQL and Java.
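The service code itself is only shown in the video. To make the select-then-replace logic concrete, here is an in-memory sketch that does not use JPA at all: the list of nodes stands in for the result of the JPQL query, and every class shape below (SimpleName, NamedNode, MethodDeclaration, MethodInvocation, RenameSketch) is an assumption modeled on the names mentioned above.

```java
import java.util.List;

// In-memory stand-ins for the generated entities (shapes are assumptions).
final class SimpleName {
    private String identifier;

    SimpleName(String identifier) { this.identifier = identifier; }
    String getIdentifier() { return identifier; }
    void setIdentifier(String identifier) { this.identifier = identifier; }
}

// Anything that carries a SimpleName, i.e. declarations and invocations.
interface NamedNode {
    SimpleName getName();
}

final class MethodDeclaration implements NamedNode {
    private final SimpleName name;

    MethodDeclaration(String name) { this.name = new SimpleName(name); }
    @Override public SimpleName getName() { return name; }
}

final class MethodInvocation implements NamedNode {
    private final SimpleName name;

    MethodInvocation(String name) { this.name = new SimpleName(name); }
    @Override public SimpleName getName() { return name; }
}

final class RenameSketch {
    // Selects the names that match oldName and replaces their identifier.
    // Returns the number of renamed occurrences.
    static int renameMethods(List<? extends NamedNode> nodes, String oldName, String newName) {
        int renamed = 0;
        for (NamedNode node : nodes) {
            SimpleName name = node.getName();
            if (name.getIdentifier().equals(oldName)) {
                name.setIdentifier(newName);
                renamed++;
            }
        }
        return renamed;
    }
}
```

In the real repository the filtering is done by the database through the query, so only the matching SimpleName entities are loaded and modified inside the transaction. The loop above merely mirrors that behavior on plain objects.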

This is the end of my presentation. Thank you very much for your attention.

If you have any questions, please contact me by email.