Spring Web Service (2.0) using xmlbeans as XML marshaling

When I googled “spring webservice xmlbean”, I found a post that shows how to use xmlbeans for XML marshalling when implementing a Spring web service.

The example is based on a Spring web service version older than 2 (spring-ws 1.5.6 with spring 2.5.6), while at the time of writing this post Spring is on version 3 and Spring-WS on version 2. When I tried the example, I could not make it work straight away on the new Spring/Spring-WS versions because some Java classes and configuration had changed.

The latest Spring-WS Tutorial does provide an example on how to develop web services using the latest spring-ws version (version 2). However, the simple example does not cover XML marshaling (e.g. xmlbeans).

So the purpose of this post is to provide an example that uses spring-ws version 2 to implement a web service with xmlbeans as the XML marshaller. This example is based on Developing Spring Web Services with XML Marschalling – XMLBeans example.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
	attributeFormDefault="unqualified" elementFormDefault="qualified"
	targetNamespace="http://robin.mytechtip.com/springws2example/temperature/schemas">
	<xs:element name="GetTemperaturesRequest">
		<xs:complexType><xs:sequence>
				<xs:element name="city" type="xs:string" />
				<xs:element maxOccurs="5" minOccurs="1" name="date" type="xs:date" />
		</xs:sequence></xs:complexType>
	</xs:element>
	<xs:element name="GetTemperaturesResponse">
		<xs:complexType><xs:sequence>
				<xs:element maxOccurs="5" minOccurs="1" name="TemperatureInfo">
					<xs:complexType>
						<xs:sequence>
							<xs:element name="min" type="xs:float" />
							<xs:element name="max" type="xs:float" />
							<xs:element name="average" type="xs:float" />
						</xs:sequence>
						<xs:attribute name="city" type="xs:string" use="optional" />
						<xs:attribute name="date" type="xs:date" use="optional" />
					</xs:complexType>
				</xs:element>
		</xs:sequence></xs:complexType>
	</xs:element>
</xs:schema>
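
Before generating any classes, it can help to see what a valid request for this schema looks like. The following is only a sketch using the JDK's built-in validator: the class name and helper methods are made up for illustration, only the request half of the schema is inlined, and the namespace is assumed to be the one used by the endpoint later in this post.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

class TemperatureRequestCheck {
    // Assumption: same namespace as the endpoint's namespaceUri constant.
    static final String NS = "http://robin.mytechtip.com/springws2example/temperature/schemas";

    // Request half of the schema only, inlined so the sketch is self-contained.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
      + "<xs:element name='GetTemperaturesRequest'><xs:complexType><xs:sequence>"
      + "<xs:element name='city' type='xs:string'/>"
      + "<xs:element name='date' type='xs:date' minOccurs='1' maxOccurs='5'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Builds a request document for one city and any number of ISO dates (no escaping). */
    static String request(String city, String... isoDates) {
        StringBuilder sb = new StringBuilder("<GetTemperaturesRequest xmlns='" + NS + "'>");
        sb.append("<city>").append(city).append("</city>");
        for (String d : isoDates) {
            sb.append("<date>").append(d).append("</date>");
        }
        return sb.append("</GetTemperaturesRequest>").toString();
    }

    /** Returns true if the XML validates against the inlined schema. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // SAXException for invalid documents, IOException for read problems
            return false;
        }
    }

    public static void main(String[] args) {
        // One date is within the 1..5 range the schema allows
        System.out.println(isValid(request("Sydney", "2011-01-01")));
        // No date at all violates minOccurs="1"
        System.out.println(isValid(request("Sydney")));
    }
}
```

A request with between one and five dates should validate, while one with no dates or more than five should be rejected.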

Using xmlbeans, we can generate the Java classes for this XML schema. The following command produces the jar file temperature.jar, which we need when we create the service endpoint.

scomp -out temperature.jar temperature.xsd

Then we create the plain web service interface

package com.mytechtip.robin.springws2example;

import java.util.Date;
import java.util.List;

public interface TemperatureService {
    public List<TemperatureInfo> getTemperatures(String city, List<Date> date);
}

and a sample implementation.

package com.mytechtip.robin.springws2example;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Random;

public class TemperatureServiceImpl implements TemperatureService {

	private Random rand = new Random(7);

	public List<TemperatureInfo> getTemperatures(String city, List<Date> dates) {
		List<TemperatureInfo> temperatures = new ArrayList<TemperatureInfo>();
		// Just return some random data
		for (Date date : dates) {
			temperatures.add(new TemperatureInfo(city, date, rand.nextInt(15),
					rand.nextInt(15) + 15, (rand.nextInt(30) + 15) / 2.0));
		}
		return temperatures;
	}
}

This service requires a data model, “TemperatureInfo”, which is a plain Java object:

package com.mytechtip.robin.springws2example;

import java.io.Serializable;
import java.util.Date;

public class TemperatureInfo implements Serializable {

    private static final long serialVersionUID = 1L;

    private String city;
    private Date date;
    private double min;
    private double max;
    private double average;

    public TemperatureInfo() {}

    public TemperatureInfo(String city, Date date, double min, double max, double average) {
        this.city = city;
        this.date = date;
        this.min = min;
        this.max = max;
        this.average = average;
    }

    // Some getter and setter methods
    // ...
}

Based on the above, we create the end point – “TemperatureMarshallingEndpoint”

package com.mytechtip.robin.springws2example;

import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.ws.server.endpoint.annotation.Endpoint;
import org.springframework.ws.server.endpoint.annotation.PayloadRoot;

import com.mytechtip.robin.springws2Example.temperature.schemas.GetTemperaturesRequestDocument;
import com.mytechtip.robin.springws2Example.temperature.schemas.GetTemperaturesRequestDocument.GetTemperaturesRequest;
import com.mytechtip.robin.springws2Example.temperature.schemas.GetTemperaturesResponseDocument;
import com.mytechtip.robin.springws2Example.temperature.schemas.GetTemperaturesResponseDocument.GetTemperaturesResponse;

@Endpoint
public class TemperatureMarshallingEndpoint {

	private static final String namespaceUri = "http://robin.mytechtip.com/springws2example/temperature/schemas";

	private TemperatureService temperatureService;

	@Autowired
	public void setTemperatureService(TemperatureService tempService) {
		this.temperatureService = tempService;
	}

	@PayloadRoot(localPart = "GetTemperaturesRequest", namespace = namespaceUri)
	public GetTemperaturesResponseDocument getTemperatures(
			GetTemperaturesRequestDocument request) {
		GetTemperaturesRequestDocument requestDoc = request;
		GetTemperaturesRequest in = requestDoc.getGetTemperaturesRequest();
		// xmlbeans maps xs:date to java.util.Calendar, the service expects java.util.Date
		List<Date> dates = new ArrayList<Date>();
		for (Calendar calendar : in.getDateArray()) {
			dates.add(calendar.getTime());
		}
		List<TemperatureInfo> infos = temperatureService.getTemperatures(
				in.getCity(), dates);
		// Copy the service result into the xmlbeans response document
		GetTemperaturesResponseDocument responseDoc = GetTemperaturesResponseDocument.Factory
				.newInstance();
		GetTemperaturesResponse response = responseDoc
				.addNewGetTemperaturesResponse();
		for (TemperatureInfo info : infos) {
			GetTemperaturesResponse.TemperatureInfo out = response
					.addNewTemperatureInfo();
			out.setAverage((float) info.getAverage());
			out.setCity(info.getCity());
			Calendar calendar = Calendar.getInstance();
			calendar.setTime(info.getDate());
			out.setDate(calendar);
			out.setMax((float) info.getMax());
			out.setMin((float) info.getMin());
		}
		return responseDoc;
	}
}
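
Most of the endpoint is type conversion: xmlbeans hands out xs:date values as java.util.Calendar, while the plain service works with java.util.Date. Here is a small JDK-only sketch of that conversion in both directions. The class and method names are hypothetical helpers, not part of xmlbeans or Spring-WS.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;

class DateConversions {

    /** Calendar[] (how xs:date values arrive) to the List<Date> the service takes. */
    static List<Date> toDates(Calendar[] calendars) {
        List<Date> dates = new ArrayList<Date>();
        for (Calendar calendar : calendars) {
            dates.add(calendar.getTime());
        }
        return dates;
    }

    /** Date back to a Calendar, as needed when filling in the response. */
    static Calendar toCalendar(Date date) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(date);
        return calendar;
    }

    public static void main(String[] args) {
        Calendar c = Calendar.getInstance();
        c.clear();
        c.set(2011, Calendar.JANUARY, 15);

        List<Date> dates = toDates(new Calendar[] { c });
        Calendar back = toCalendar(dates.get(0));
        System.out.println(back.get(Calendar.YEAR) + "-" + (back.get(Calendar.MONTH) + 1)
                + "-" + back.get(Calendar.DAY_OF_MONTH));
    }
}
```

Both directions use the JVM's default time zone here, which is good enough for a sample service but worth a second look if the dates must be exact across time zones.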

We need to configure the web application and the spring framework to make the web service work.  So the web.xml looks like this.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"

and we have the corresponding spring context configuration (“springws2example-servlet.xml” under WEB-INF) for the servlet “springws2example” defined in “web.xml”. NOTE: we need to make sure the schema “temperature.xsd” is in the “WEB-INF” folder so that the WSDL file can be generated automatically. It can then be accessed at “http://<host>:<port>/<context-root>/services/temperature.wsdl”.

<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context"
	xmlns:sws="http://www.springframework.org/schema/web-services"
	xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
  http://www.springframework.org/schema/web-services http://www.springframework.org/schema/web-services/web-services-2.0.xsd
  http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd">
	<context:component-scan base-package="com.mytechtip.robin.springws2example" />
	<sws:annotation-driven />
	<bean class="com.mytechtip.robin.springws2example.TemperatureServiceImpl"></bean>
		<property name="marshaller" ref="marshaller" />
		<property name="unmarshaller" ref="marshaller" />
	<bean id="marshaller" class="org.springframework.oxm.xmlbeans.XmlBeansMarshaller"></bean>
	<sws:dynamic-wsdl id="temperature" portTypeName="TempeatureService"
		locationUri="/services/temperature" >
		<sws:xsd location="/WEB-INF/temperature.xsd" />
	</sws:dynamic-wsdl>
</beans>
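
Once the application is deployed, the service can be tried out with a plain SOAP 1.1 request over HTTP. The following is a rough sketch with the JDK only. It assumes the service listens under “/services/temperature” below the context root (taken from the locationUri above), and the class name, the host and the context root in the comment are placeholders of mine.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class TemperatureSoapClient {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Assumption: same namespace as the endpoint's namespaceUri constant.
    static final String NS = "http://robin.mytechtip.com/springws2example/temperature/schemas";

    /** Wraps a single-date GetTemperaturesRequest in a SOAP 1.1 envelope (no escaping). */
    static String envelope(String city, String isoDate) {
        return "<soapenv:Envelope xmlns:soapenv=\"" + SOAP_NS + "\">"
             + "<soapenv:Body>"
             + "<GetTemperaturesRequest xmlns=\"" + NS + "\">"
             + "<city>" + city + "</city><date>" + isoDate + "</date>"
             + "</GetTemperaturesRequest>"
             + "</soapenv:Body></soapenv:Envelope>";
    }

    /** Posts the envelope to the given URL and returns the HTTP status code. */
    static int post(String endpointUrl, String soapXml) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(endpointUrl).toURL().openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        con.setRequestProperty("SOAPAction", "\"\"");
        try (OutputStream out = con.getOutputStream()) {
            out.write(soapXml.getBytes(StandardCharsets.UTF_8));
        }
        return con.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        String xml = envelope("Sydney", "2011-01-01");
        System.out.println(xml);
        // Placeholder URL, adjust host, port and context root to your deployment:
        // System.out.println(post("http://localhost:8080/springws2example/services/temperature", xml));
    }
}
```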

Since I don’t use maven, I need to manually put the following list of required jars into WEB-INF/lib



Pentaho Community Edition (Data Integration)

Pentaho provides two different editions: Community Edition and Enterprise Edition. Community Edition is free and is what I want to discuss.

Pentaho seems to provide more comprehensive coverage of BI than Eclipse BIRT and Jaspersoft. It has the following components:

  • Data Integration – Kettle
  • Analysis Service (OLAP) – Mondrian
  • Reporting
  • Data Mining – Weka
  • Dashboard
  • Large Volume Data Handling (through Hadoop)

Since there is already a comparison of reporting functionality between Pentaho, Eclipse BIRT and JasperReports, I am not going to get deep into its reporting functionality.

The component that I’ve tried is data integration. It helps me do some data integration tasks without writing my own custom code. Here I just give a brief introduction to this component.

Data Integration – Kettle
Data Integration is the first thing I tried when I picked up Pentaho. It has a GUI tool (called Spoon) built on SWT, the Eclipse widget toolkit. With the GUI tool, it’s very easy to define a data integration process.

There are two main elements in the Pentaho data integration process: Transformation and Job. A Transformation, as the name suggests, is a process that does the data manipulation, including data export, cleansing, format changes, import, etc. A Job may contain one or more transformations and adds sanity checks (such as whether a file exists) and utilities (e.g., emailing the result).

Both Transformation and Job are made up of steps. Pentaho already includes many types of steps that perform the most common tasks. These steps serve as bricks that you can use to build up the whole data integration process. In the GUI, you can easily use drag and drop to define the steps and the hops that connect them to form the process. You can even preview the transformed data at some steps to make sure they are doing the right thing.
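
To make the idea of steps and hops a bit more concrete, here is a toy model in plain Java. It is purely conceptual and has nothing to do with the actual Kettle API: every name in it is invented, and it passes whole lists from step to step, whereas the real tool moves rows between steps as they are produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class TransformationSketch {

    /** A "step" here is just a function from a list of rows to a list of rows. */
    interface Step extends Function<List<String>, List<String>> {}

    /** Runs the steps in order; each hand-over plays the role of a hop. */
    static List<String> run(List<String> input, List<Step> steps) {
        List<String> rows = input;
        for (Step step : steps) {
            rows = step.apply(rows);
        }
        return rows;
    }

    /** Cleansing step: strip surrounding whitespace. */
    static final Step TRIM = rows -> {
        List<String> out = new ArrayList<>();
        for (String row : rows) out.add(row.trim());
        return out;
    };

    /** Cleansing step: drop rows that are empty. */
    static final Step DROP_EMPTY = rows -> {
        List<String> out = new ArrayList<>();
        for (String row : rows) if (!row.isEmpty()) out.add(row);
        return out;
    };

    /** Format-change step: upper-case every row. */
    static final Step UPPER = rows -> {
        List<String> out = new ArrayList<>();
        for (String row : rows) out.add(row.toUpperCase());
        return out;
    };

    public static void main(String[] args) {
        List<String> input = Arrays.asList("  sydney ", "", " perth");
        System.out.println(run(input, Arrays.asList(TRIM, DROP_EMPTY, UPPER)));
    }
}
```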

From the Spoon GUI tool welcome page, you can find a “Get Started” document that helps you build the first working example. In addition, the pentaho community web site provides useful documentation on how to use this data integration tool.


Eclipse BIRT and JasperReports

Neither Eclipse BIRT nor Jasper seems to support the whole functionality of BI. Their main focus is “reporting“, which is a core part of BI. Both of them are written in Java.

Eclipse BIRT has two components: a report designer and a report engine. The report designer interacts with the user to generate an XML report design using its report design engine, while the report engine reads the data and the XML report design to generate reports in different formats (HTML, PDF, Excel, Word, etc.). The following chart from the Eclipse BIRT website illustrates this well.
Eclipse BIRT Arch

Now to Jasper. I say “Jasper” because two different things go by that name: JasperReports and Jaspersoft. JasperReports is the open source Java reporting library, while Jaspersoft is a collection of software that includes JasperReports. Jaspersoft is clearly heading straight in the BI direction, as it includes more enterprise BI functions such as ETL, analysis and dashboards. Jaspersoft as a whole does not seem to be open source, so here we only talk about JasperReports.

JasperReports claims to be “the world’s most popular open source reporting engine”. Its website says:

It is entirely written in Java and it is able to use data coming from any kind of data source and produce pixel-perfect documents that can be viewed, printed or exported in a variety of document formats including HTML, PDF, Excel, OpenOffice and Word.

Similar to Eclipse BIRT, JasperReports has a JasperReports Engine that reads data from a wide range of data sources (DB, XML, CSV, etc.) and generates reports in different formats. This process is controlled via configuration files or run time parameters, so it can be customized to an extent. The following chart from the JasperReports website shows the process.
JasperReports Architecture

It seems that bare JasperReports does not have a component that allows users to design reports (there’s an additional one called iReport for report design), which Eclipse BIRT has. A built-in report designer makes the software more user friendly. On the other hand, the lightweight nature of JasperReports makes it easier to embed or integrate into a bigger software system.

Stack Overflow has a question about the difference between the two tools (BIRT vs JasperReports), and one answer links to a comparison of BIRT, JasperReports and even Pentaho in terms of their reporting features.


Open Source Business Intelligence Software and Tools

Business Intelligence (BI) is not a new term. Nowadays it covers a set of fields such as business reporting, OLAP, decision making support, data analysis and data mining.

There are a lot of tools available for this big area. A Wikipedia page (http://en.wikipedia.org/wiki/Business_intelligence_tools) lists most BI-related software. Of course, in this area we can hardly miss big players such as IBM, Oracle and Microsoft. As always, these big software vendors offer proprietary software.

Apart from them, there are open source BI tools available as well. As I am new to the BI area, I am not sure how these open source BI tools are used in the market (if they are used at all) or whether they are just for academic purposes. I am also not sure whether the BI market allows this type of open source tool to grow or whether they will be knocked out by the big players. Anyway, these questions are not important for this post. What I want to do here is briefly check some major open source business intelligence tools. Let’s get started.

Eclipse BIRT and Jasper
Please check this post: Eclipse BIRT and Jasper

This will be discussed later

This will be discussed later


Lucene IndexWriter optimize() behavior change since version 3.0.3

Lucene is a powerful full-text indexing and search library written in Java. Over more than ten years, Lucene has evolved to version 3 (stable version).

Recently, I upgraded the Lucene library (the core jar file) in my project from version 3.0.2 to version 3.0.3 (which was released in December 2010). The only purpose of the upgrade was to keep up to date and stick with a release that has fewer bugs.

However, after upgrading, I noticed that the optimized index folder contains more index files than it did with version 3.0.2. It looked as if the index merge during IndexWriter.optimize() stopped at some point. I am not sure whether the extra index files cause any performance degradation during index search, but I am not happy that so many files stay in the index folder (although there are not too many).

After reading the change document for Lucene 3.0.3 release, I realized that some changes had been made to avoid high disk usage during indexing. The original change log item from Lucene 3.0.3 is stated as follows:

LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default = 0.1), which means any time a merged segment is greater than 10% of the index size, it will be left in non-compound format even if compound format is on. This change was made to reduce peak transient disk usage during optimize which increased due to LUCENE-2762.
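
The rule in this change log item can be written down in a few lines. This is only my reading of the change log text, not Lucene's actual source, and the class and method names are invented for illustration.

```java
class NoCfsRatioSketch {

    /**
     * Sketch of the rule described in the change log: a merged segment is kept in
     * compound format only if it is no larger than noCFSRatio times the index size.
     */
    static boolean useCompoundFormat(long mergedSegmentBytes, long totalIndexBytes,
            double noCFSRatio) {
        return mergedSegmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        // Default ratio 0.1: a segment holding 5% of the index stays compound
        System.out.println(useCompoundFormat(5, 100, 0.1));
        // Default ratio 0.1: a segment holding 50% of the index is left non-compound
        System.out.println(useCompoundFormat(50, 100, 0.1));
        // Ratio 1.0: even a segment that is the whole index stays compound
        System.out.println(useCompoundFormat(100, 100, 1.0));
    }
}
```

After optimize() the merged segment is essentially the whole index, so with the default ratio of 0.1 it is left as separate files. A ratio of 1.0 can never be exceeded, which is why raising the ratio brings the single compound file back.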

Since my index is not very big and I don’t care about the “peak transient disk usage”, I still want the index to be created and optimized in a cleaner way. This means the merged segments should still end up in compound format.

Obviously I now need to change the default Lucene indexing and optimizing behavior by adding extra code in my project. The following is my tweak:

    // an IndexWriter instance created through method getWriter
    IndexWriter writer = getWriter(dir);
    // The tweak starts here
    MergePolicy mp = writer.getMergePolicy();
    if (mp instanceof LogByteSizeMergePolicy) {
        LogByteSizeMergePolicy lbsmp = (LogByteSizeMergePolicy) mp;
        // a ratio of 1.0 keeps every merged segment in compound format
        lbsmp.setNoCFSRatio(1.0);
    }

Compiled, deployed, run. Hooray! The cleaner index folder is back!