Installation

Prerequisites
Lavoisier is written in Java and can be deployed on any system that has JDK 1.6 installed.
Download
Download Lavoisier from here.
Install

Unzip the downloaded file on your disk.

Notice: Make sure that Lavoisier can write to the default "/tmp" directory of your system, where it stores its cache files. You can override the default location by editing the file etc/engine/engine.properties: lavoisier.cache.directory = /my_path

Usage

Start the service
Start Lavoisier with the script bin/lavoisier.sh or bin/lavoisier.bat:
bin/lavoisier console

Query data views

Open the example URLs with any tool, such as cURL or wget, or just click on the links in your browser.

Get data of the view named "example":

Notice: Clicking on this link will add ?accept=xml at the end of the URL, in order to prevent your browser from generating a web page with downloaded XML data. You can also see the XML data by displaying the source code of the generated web page.

Get the list of average values from view "example":

Get this list in JSON format:


List data views
You can see the list of configured data views by opening the console in your web browser: http://localhost:8080/lavoisier

Build your first data view

Edit configuration file

In Lavoisier, a data aggregation application is built by assembling plugins. This is done by editing the XML configuration files under the directory:

etc/app/views/

A configuration file is structured as follows:

<views xmlns="http://software.in2p3.fr/lavoisier/application.xsd">
    <view name="my_data_view">
        <!-- plugins here -->
    </view>

    <!-- other data views here -->
</views>

Notice: Using an editor with at least XSD-based auto-completion and syntax checking is highly recommended. See the instructions on how to configure this with a JetBrains editor here.

Notice: Testing configuration changes requires reloading the configuration. This can be done by invoking the reload operation at http://localhost:8080/reload or by clicking the "Reload" button in the Lavoisier web console.

Notice: The full configuration example of this tutorial is available at the end of this page.


Data view with a single plugin

Create a data view named "profit":

<view name="profit"></view>

Add an HTTPConnector plugin to this data view to download XML data:

        <connector type="HTTPConnector">
            <parameter name="url">http://software.in2p3.fr/lavoisier/adaptors.xml</parameter>
        </connector>

url: The URL of the data to download.

Then you can query this data view:


Data view with a plugins chain

Change the parameter "url" of your connector to download CSV data:

        <connector type="HTTPConnector">
            <parameter name="url">http://software.in2p3.fr/lavoisier/input.csv</parameter>
        </connector>

url: The URL of the data to download.

The data view can no longer be queried, because its connector no longer generates XML data.

You must add a CSVSerializer plugin to convert CSV input data to XML format:

        <serializer type="CSVSerializer">
            <parameter name="header">true</parameter>
        </serializer>

header: If set to true, the first row is treated as a header.

Now you can query this data view (do not forget to reload the configuration!).

Output:

<tables title="profit" xmlns="http://software.in2p3.fr/lavoisier/tables.xsd">
    <table>
        <column_labels>
            <column_label>Month</column_label>
            <column_label>Sales</column_label>
            <column_label>Profit</column_label>
        </column_labels>
        <row>
            <column>March</column>
            <column>28</column>
            <column>89</column>
        </row>
        <!-- other rows... -->
    </table>
</tables>
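
To make the conversion more concrete, here is a minimal Python sketch of how CSV text with a header row could be mapped to the structure shown above. It is an illustration only, not the CSVSerializer implementation: the function name csv_to_tables and the sample CSV text are assumptions, and the namespace and title attribute of the real output are left out for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_tables(csv_text, header=True):
    """Map CSV text to a <tables>/<table>/<row>/<column> tree (illustrative only)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    tables = ET.Element("tables")
    table = ET.SubElement(tables, "table")
    if header and rows:
        # With header=True, the first row provides the column labels.
        labels = ET.SubElement(table, "column_labels")
        for label in rows[0]:
            ET.SubElement(labels, "column_label").text = label
        rows = rows[1:]
    for values in rows:
        row = ET.SubElement(table, "row")
        for value in values:
            ET.SubElement(row, "column").text = value
    return tables


# Hypothetical sample input; only the "March" row appears in the tutorial output.
sample = "Month,Sales,Profit\nMarch,28,89\n"
tree = csv_to_tables(sample)
print(ET.tostring(tree, encoding="unicode"))
```

With header set to False, the first row would simply be emitted as a regular data row.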

Add an XML template to select only the data of the column "Profit":

        <processors xmlns:e="http://software.in2p3.fr/lavoisier/entries.xsd" xmlns:ns="http://software.in2p3.fr/lavoisier/tables.xsd">
            <element in="ns:tables" out="e:entries">
                <element-ignore in="ns:table">
                    <element-ignore in="ns:row">
                        <element in="ns:column" if="@label='Profit'" out="e:entry">
                            <attribute-ignore in="label"/>
                        </element>
                    </element-ignore>
                </element-ignore>
            </element>
        </processors>

Notice: If the XML document uses namespaces, each namespace must be mapped to a prefix, and this prefix must be used in your XPath expressions. In our example, the node /tables (i.e. an element "tables" with no namespace) does not exist, while the node /ns:tables exists and belongs to the namespace "http://software.in2p3.fr/lavoisier/tables.xsd".
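
The same namespace rule applies in any XPath-like API, not only in Lavoisier. As a rough illustration, the Python sketch below uses the standard xml.etree.ElementTree module on a shortened, hand-written copy of the output above. The prefix "ns" is an arbitrary choice; only the namespace URI has to match.

```python
import xml.etree.ElementTree as ET

TABLES_NS = "http://software.in2p3.fr/lavoisier/tables.xsd"

document = ET.fromstring(
    '<tables xmlns="%s"><table><row><column>89</column></row></table></tables>'
    % TABLES_NS
)

# Prefix-to-namespace mapping; the prefix "ns" is arbitrary, only the URI matters.
prefixes = {"ns": TABLES_NS}

# Without a prefix, the path looks for elements in no namespace and finds none.
unqualified = document.findall("table/row/column")

# With the prefix mapped to the namespace, the same path matches.
qualified = document.findall("ns:table/ns:row/ns:column", prefixes)

print(len(unqualified), len(qualified))
```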

You can query this data view to see the result of the selection.

Output:

<e:entries xmlns:e="http://software.in2p3.fr/lavoisier/entries.xsd">
    <e:entry>89</e:entry>
    <e:entry>1587</e:entry>
    <e:entry>529</e:entry>
    <e:entry>2103</e:entry>
    <e:entry>1985</e:entry>
    <e:entry>345</e:entry>
</e:entries>

Add a cache to optimize latency and availability of the data:

        <cache type="FileCache">
            <trigger type="ViewCreatedTrigger"/>
            <trigger type="DeltaTimeTrigger">
                <parameter name="hours">1</parameter>
            </trigger>
        </cache>

hours: The number of hours in the period.

At least one trigger plugin must be declared inside the cache element. The trigger plugins used in this example are:
  • ViewCreatedTrigger: triggers a cache refresh when the view is created (i.e. at startup)
  • DeltaTimeTrigger: triggers a cache refresh when the configured period has elapsed
These plugins enable Lavoisier to know when to refresh this cache.
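
The combined effect of the two triggers can be pictured with the following Python sketch. It is a toy model under stated assumptions, not how FileCache is implemented: the class name CachedView and the injectable clock are hypothetical, and the elapsed time is checked lazily on each read, which is a simplification.

```python
import time


class CachedView:
    """Toy cache refreshed at creation and then whenever a period has elapsed."""

    def __init__(self, load, period_seconds, clock=time.monotonic):
        self._load = load
        self._period = period_seconds
        self._clock = clock
        # Analogous to ViewCreatedTrigger: fill the cache as soon as the view exists.
        self._refresh()

    def _refresh(self):
        self._value = self._load()
        self._loaded_at = self._clock()

    def get(self):
        # Analogous to DeltaTimeTrigger: refresh once the period has elapsed.
        if self._clock() - self._loaded_at >= self._period:
            self._refresh()
        return self._value


calls = []
fake_now = [0.0]  # controllable clock, so the example does not have to wait an hour


def load():
    calls.append(len(calls) + 1)
    return "data v%d" % len(calls)


view = CachedView(load, period_seconds=3600, clock=lambda: fake_now[0])
first = view.get()      # served from the cache filled at creation
fake_now[0] = 3600.0    # one hour later
second = view.get()     # period elapsed, so the cache is refreshed first
print(first, second)
```

Between refreshes every read is answered from the cached value, which is what improves latency and availability when the remote source is slow or unreachable.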

You can query the data view and notice that the result is still the same.


Dynamically evaluate some plugin parameters

Any parameter of any plugin can have its value set dynamically. This is done by adding an attribute "eval" containing an expression written in the standard XPath language.

Replace the text value of the parameter "url" of your HTTPConnector with an attribute "eval":

        <connector type="HTTPConnector">
            <parameter name="url" eval="concat(property('baseurl'), 'input.csv')"/>
        </connector>

url: The URL of the data to download.

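This expression uses two XPath functions: concat(), which concatenates its string arguments, and property(), which returns the value of the named property.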

Set the property "baseurl" in the file etc/app/app.properties:

baseurl=http://software.in2p3.fr/lavoisier/
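
To see what the evaluated parameter resolves to, the Python sketch below mimics the expression concat(property('baseurl'), 'input.csv') with plain string handling. The helper parse_properties is hypothetical and supports only simple key=value lines; it is not the mechanism Lavoisier uses to read its properties.

```python
def parse_properties(text):
    """Parse simple key=value lines (a small subset of the .properties format)."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


properties = parse_properties("baseurl=http://software.in2p3.fr/lavoisier/\n")

# Equivalent of: concat(property('baseurl'), 'input.csv')
url = properties["baseurl"] + "input.csv"
print(url)
```

Since the concatenation adds no separator, the value of "baseurl" has to end with a slash, as it does above.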

Notice: The pre-configured data view system-properties shows the list of non-hidden properties.

You can query the data view and notice that the result is still the same.

Join with a second data view

Build your second data view

Create a data view named "profit_invoices":

<view name="profit_invoices"></view>

Add a StringConnector plugin to this data view to provide inline data in JSON format:

        <connector type="StringConnector">
            <parameter name="content">{"invoices":["128","243","187","306","247","203"]}</parameter>
        </connector>

content: The string value.

Your data view cannot be queried yet because its connector does not generate XML data.

You must add a JSONSerializer plugin to convert JSON input data to XML format:

        <serializer type="JSONSerializer"/>

Then you can query this data view:

Output:

<object>
    <invoices>
        <item>128</item>
        <item>243</item>
        <item>187</item>
        <item>306</item>
        <item>247</item>
        <item>203</item>
    </invoices>
</object>
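
The mapping visible in this output (a root element "object", one element per key, one "item" element per array entry) can be reproduced with a short Python sketch. The convention is inferred from the output above only; the function json_to_xml is hypothetical and does not claim to cover everything JSONSerializer does.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, name="object"):
    """Convert parsed JSON to XML: dict keys become elements, list entries become <item>."""
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(json_to_xml(child, key))
    elif isinstance(value, list):
        for child in value:
            element.append(json_to_xml(child, "item"))
    else:
        # Scalars are rendered with str(), which is a simplification.
        element.text = str(value)
    return element


root = json_to_xml(json.loads('{"invoices":["128","243","187","306","247","203"]}'))
print(ET.tostring(root, encoding="unicode"))
```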

Add a <processors> element with the prefix/namespace associations that will be needed by the XML templates added in the next two steps of this tutorial:

<processors xmlns:e="http://software.in2p3.fr/lavoisier/entries.xsd" xmlns:ns="http://software.in2p3.fr/lavoisier/tables.xsd">
</processors>

Since the JSON hash-table contains a single entry ("invoices"), the generated root element "object" is useless and can be removed with a simple XML template:

            <element-ignore in="object">
                <element in="invoices"/>
            </element-ignore>

Output:

<invoices>
    <item>128</item>
    <item>243</item>
    <item>187</item>
    <item>306</item>
    <item>247</item>
    <item>203</item>
</invoices>

Replace the number of invoices with an element node containing this value, in order to avoid mixed content later (when adding child nodes to the element "item"). Let's modify the previous XML template to do this:

            <element-ignore in="object">
                <element in="invoices">
                    <element in="item">
                        <element-create-as-parent out="invoices">
                            <text/>
                        </element-create-as-parent>
                    </element>
                </element>
            </element-ignore>

Output:

<invoices>
    <item>
        <invoices>128</invoices>
    </item>
    <item>
        <invoices>243</invoices>
    </item>
    <!-- other rows... -->
</invoices>
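
For readers more familiar with procedural code, the effect of wrapping the text of each "item" in a new child element can be sketched in Python as follows. This only mirrors the transformation on a two-item sample; it says nothing about how the template engine works internally.

```python
import xml.etree.ElementTree as ET

invoices = ET.fromstring("<invoices><item>128</item><item>243</item></invoices>")

for item in invoices.findall("item"):
    # Move the text of <item> into a new child element <invoices>.
    child = ET.SubElement(item, "invoices")
    child.text = item.text
    item.text = None

print(ET.tostring(invoices, encoding="unicode"))
```

After this step each "item" contains only element children, so further children can be added without producing mixed content.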

Join it with view "profit"

Add an element <processors></processors> that will contain the XML template below.

Data from the view "profit" is retrieved with the function view() and inserted into the current view with the template rule "attribute-create". This can be done by adding a new XML template or by modifying the previous one:

            <element-ignore in="object">
                <set variable="profit">view('profit')/e:entries/e:entry</set>
                <set variable="position">0</set>
                <element in="invoices">
                    <element in="item">
                        <set variable="position">$position + 1</set>
                        <attribute-create out="profit">$profit[$position]</attribute-create>
                        <element-create-as-parent out="invoices">
                            <text/>
                        </element-create-as-parent>
                    </element>
                </element>
            </element-ignore>

You can query this data view to see the result of the join.

Output:

<invoices>
    <item profit="89">
        <invoices>128</invoices>
    </item>
    <item profit="1587">
        <invoices>243</invoices>
    </item>
    <!-- other rows... -->
</invoices>
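
The join is purely positional: the n-th item receives the n-th entry of the view "profit". A minimal Python sketch of the same logic, using the values shown in the outputs of the two views, might look like this. The variable names mirror the template variables but are otherwise arbitrary.

```python
profits = ["89", "1587", "529", "2103", "1985", "345"]    # entries of view "profit"
invoices = ["128", "243", "187", "306", "247", "203"]     # items of "profit_invoices"

items = []
position = 0
for count in invoices:
    position += 1
    # XPath positions are 1-based, so $profit[$position] is profits[position - 1].
    items.append({"profit": profits[position - 1], "invoices": count})

print(items[0], items[1])
```

A positional join like this relies on both views listing their rows in the same order and with the same length.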

Build a parameterizable data view

Define view parameters

Create a data view named "ratio":

<view name="ratio"></view>

Add arguments into this data view:

        <argument name="profit"/>
        <argument name="invoices"/>

Add an XMLConnector plugin to create an XML element containing the ratio of these arguments:

        <connector type="XMLConnector">
            <parameter name="content" eval="new_element('ratio', 100 * $profit div $invoices)"/>
        </connector>

content: The XML data.


Invoke your parameterizable data view from a web client

You can query this data view from a web client:

Output:

<ratio>200.0</ratio>

Notice: See user documentation for other ways to send the argument values.

You can also query it from the automatically generated web form.


Invoke your parameterizable data view from view "profit_invoices"

Create a new XML template to insert, in each selected item, the ratio between the profit and the number of invoices:

            <element in="invoices">
                <element in="item">
                    <element-create>view('ratio', entries(entry('profit',../@profit), entry('invoices',../invoices/text())))</element-create>
                </element>
            </element>

The XPath function view() takes 2 arguments:

  • The first argument is the name of the invoked view ("ratio").
  • The second argument is a hash-table containing the values of the view arguments ("profit" and "invoices"). This map is filled by adding entries with function entry(key,value).
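
As a loose analogy, calling view() with a hash-table of arguments resembles calling a function with keyword arguments. The Python sketch below assumes a hypothetical registry VIEWS that maps view names to callables; it illustrates the calling convention only and is not Lavoisier's API.

```python
def ratio_view(profit, invoices):
    # Same arithmetic as the XPath expression: 100 * $profit div $invoices
    # (unlike XPath, Python raises ZeroDivisionError for a zero divisor).
    return 100 * float(profit) / float(invoices)


# Hypothetical registry mapping view names to callables.
VIEWS = {"ratio": ratio_view}


def view(name, arguments):
    """Invoke the named view, passing the hash-table entries as its arguments."""
    return VIEWS[name](**arguments)


result = view("ratio", {"profit": "89", "invoices": "128"})
print(result)
```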

You can query the modified data view "profit_invoices" to check the result.

Output:

<invoices>
    <item profit="89">
        <invoices>128</invoices>
        <ratio>69.53125</ratio>
    </item>
    <item profit="1587">
        <invoices>243</invoices>
        <ratio>653.0864197530864</ratio>
    </item>
    <!-- other rows... -->
</invoices>

Rendering of data views

Default rendering

With your web browser, check some of the default renderings of data view "profit":

Now do the same with data view "profit_invoices":


Configure 2D-based rendering

Tree-based rendering (e.g. XML, JSON) always contains the full information since it has the same structure as the internal structure of data views (XML), while 2D-based rendering (e.g. HTML table, chart) often requires some configuration to better match user needs.

We want to make the following improvements to the 2D-based rendering of data view "profit_invoices":

  • Set a better title to HTML and chart renderings
  • Set better labels to columns of HTML table and axes of chart rendering
  • Round values of "ratio" to display only 2 digits after the decimal point in HTML and chart renderings
  • Order rows of HTML table with descending "profit" values
  • Share the same Y-axis for "profit" and "ratio" in chart rendering

All these improvements are made by adding the following at the end of the data view "profit_invoices":

        <pre-renderers>
            <title>'Profit / Number of invoices'</title>
            <row foreach="/invoices/item">
                <column unit="$" order="descending">@profit</column>
                <column unit="nb">invoices</column>
                <column unit="$" label="ratio">(round(ratio * 100) div 100)</column>
            </row>
        </pre-renderers>

foreach: An absolute XPath expression to select the rows.

column: A relative XPath expression to select the column.

unit: The data unit for the content of the column.

label: The label of the column (default label is the name of the selected node).

Now check the 2D-based renderings (e.g. HTML table, chart) of data view "profit_invoices" again to see how this small configuration has improved them.
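
The rounding expression and the descending order can be checked by hand with a few lines of Python. This sketch reuses the first two rows of the earlier output and assumes XPath 1.0 semantics for round(), which returns the closest integer and rounds halves upward. The helper xpath_round is hypothetical.

```python
import math

rows = [
    {"profit": 89, "invoices": 128},
    {"profit": 1587, "invoices": 243},
]
for row in rows:
    row["ratio"] = 100 * row["profit"] / row["invoices"]


def xpath_round(number):
    # XPath 1.0 round() returns the closest integer, rounding halves upward.
    return math.floor(number + 0.5)


# (round(ratio * 100) div 100) keeps two digits after the decimal point.
for row in rows:
    row["ratio"] = xpath_round(row["ratio"] * 100) / 100

# order="descending" on the profit column.
rows.sort(key=lambda row: row["profit"], reverse=True)
print(rows)
```

Multiplying by 100 before rounding and dividing afterwards is the usual way to round to two decimals when only an integer round() is available.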


Override the default configuration of a renderer plugin
TODO
Configure document-based rendering
TODO

Full configuration of this tutorial

Data view "profit"
    <view name="profit">
        <connector type="HTTPConnector">
            <parameter name="url" eval="concat(property('baseurl'), 'input.csv')"/>
        </connector>
        <serializer type="CSVSerializer">
            <parameter name="header">true</parameter>
        </serializer>
        <processors xmlns:e="http://software.in2p3.fr/lavoisier/entries.xsd" xmlns:ns="http://software.in2p3.fr/lavoisier/tables.xsd">
            <element in="ns:tables" out="e:entries">
                <element-ignore in="ns:table">
                    <element-ignore in="ns:row">
                        <element in="ns:column" if="@label='Profit'" out="e:entry">
                            <attribute-ignore in="label"/>
                        </element>
                    </element-ignore>
                </element-ignore>
            </element>
        </processors>
        <cache type="FileCache">
            <trigger type="ViewCreatedTrigger"/>
            <trigger type="DeltaTimeTrigger">
                <parameter name="hours">1</parameter>
            </trigger>
        </cache>
        <renderers>
            <renderer type="HTMLRenderer">
                <parameter name="template">html/profit.html</parameter>
            </renderer>
        </renderers>
    </view>

Data view "profit_invoices"
    <view name="profit_invoices">
        <connector type="StringConnector">
            <parameter name="content">{"invoices":["128","243","187","306","247","203"]}</parameter>
        </connector>
        <serializer type="JSONSerializer"/>
        <processors xmlns:e="http://software.in2p3.fr/lavoisier/entries.xsd">
            <element-ignore in="object">
                <set variable="profit">view('profit')/e:entries/e:entry</set>
                <set variable="position">0</set>
                <element in="invoices">
                    <element in="item">
                        <set variable="position">$position + 1</set>
                        <attribute-create out="profit">$profit[$position]</attribute-create>
                        <element-create-as-parent out="invoices">
                            <text/>
                        </element-create-as-parent>
                    </element>
                </element>
            </element-ignore>
            <element in="invoices">
                <element in="item">
                    <element-create>view('ratio', entries(entry('profit',../@profit), entry('invoices',../invoices/text())))</element-create>
                </element>
            </element>
        </processors>
        <pre-renderers>
            <title>'Profit / Number of invoices'</title>
            <row foreach="/invoices/item">
                <column unit="$" order="descending">@profit</column>
                <column unit="nb">invoices</column>
                <column unit="$" label="ratio">(round(ratio * 100) div 100)</column>
            </row>
        </pre-renderers>
        <renderers>
            <renderer type="CsvRenderer">
                <parameter name="separator">;</parameter>
            </renderer>
        </renderers>
    </view>

Data view "ratio"
    <view name="ratio">
        <argument name="profit"/>
        <argument name="invoices"/>
        <connector type="XMLConnector">
            <parameter name="content" eval="new_element('ratio', 100 * $profit div $invoices)"/>
        </connector>
    </view>

Role "super-user"