EWM REST API to Access Existing Work Item Queries

After looking into the EWM OSLC CM API, including the OSLC Query API, a natural question that comes up is: Is it possible to take advantage of the existing EWM work item queries? Is it possible to use that query mechanism and run existing work item queries to get the result set back?

I had to look into this for a customer and an internal project and was experiencing some difficulties. So I think it is valuable to share my experiences and the examples that finally worked for me.

Context of the blog post is the series

This is the series of planned posts I intend to publish over time. Most of the examples will be EWM based, but quite a lot of the content applies to other ELM applications as well.

External Links

I used at least the following links for exploring this mechanism.

EWM Stored Query API Introduction

EWM provides its own REST API that supports finding and running predefined, shared, or personal work item queries that are stored in the repository. The API allows querying for all the stored work item queries available, running such a query, and accessing the result information to present the work items and their data. This API is EWM only as far as I can tell; it is not documented to be available for any of the other products. It is also important to note that this API is not an OSLC API, even if the query URL contains oslc as part of the path.

The examples for how to use built-in queries are nice to have, but I found the documentation was lacking essential information. Maybe it is just me and it comes naturally to all the cloud-native whiz kids. I struggled to get the examples to work. One pattern I have now seen several times in documentation and examples is a lack of information about required and supported headers and about how to URL encode query parameters. I will try to explain what worked for me. The documentation has recently been updated and now contains more details compared to when I first explored it.

For this specific part of the API, my observation was that getting the query parameters wrong often resulted in the query returning results, but not the expected results. Usually the result set was way bigger than expected in those cases.

One important behavior that is very sensitive to the Accept header is how the information is returned and presented in the responses. There are two completely different patterns.

  • Only the URLs to the results are available in the response. To get any details about the result referenced by a URL, it is necessary to query that URL. This corresponds to the situation for OSLC queries with a missing or empty oslc.select statement.
  • Additional information is provided inline in the response. It is not necessary to query the result URI to be able to access the most relevant information. This corresponds to the situation for OSLC queries with an oslc.select=* statement.

When I first looked at the API, I missed information about the valid headers. As far as I can tell, the valid headers have now been added to the wiki page. The supported and working headers are mentioned below as well.

Headers and performance

Which header to use seems like a rather unimportant detail, but it can have a huge impact on communication performance. Let's try a Gedankenexperiment (thought experiment). Assume we perform a request to the server, and we want to search the response data for the URI of a query with specific property values such as a project area, query name, and creator. Assume the following characteristics for the request:

  • Result set size is 130 items
  • Paging size is 50 items

To process the response with the properties inline, it is necessary to run at most 3 requests. If the desired information is on page 1, only one request is necessary. In the worst case, all related data needs to be sent and received.

To process the data without the properties inline, it is necessary to load a page and then run another request for each of the URLs in the result set page and process the received data. In the worst case this requires 3 + 130 = 133 requests. In the worst case, all related data needs to be sent and received as well.

Why is that a problem? In the worst case, if the desired data is in the last link processed, both methods require transferring the same amount of data. In many cases one could argue the first approach would even require more data to be transferred on average than the second approach.

However, in my experience the information transfer between two network nodes is significantly slower than the time the server would need to gather the desired information. Each message that needs to be sent causes a considerable amount of overhead and delay. It is usually cheaper to send a few larger messages than many small ones that transfer a similar amount of data, simply because of the overhead required for each message. In addition, all the subsequent requests for the details cause activity on the server, which is likely to create a lot more overhead compared to collecting all the information for the 3 initial requests.

As a real-world example: when I started to look into this, I did not know which headers were available. I used the headers I had often used before, e.g. ‘application/rdf+xml’. In addition, the query that I created had an issue. Instead of only returning the queries for a specific user in a specific project area, it returned all queries for the whole server. The responses only contained the query URLs. To get the name and creator of a query, each URL had to be requested. It was also not obvious why the result set was so large, and I had no code to recognize that results were related to project areas other than the expected one. The server I ran against is on another continent and I was in my home office. I stopped the experiment after an hour or two.

In the end I was able to create a solution that performs well and keeps the overhead within limits. But it was by no means a simple process. I reached out to someone for hints; as a customer I would not have had that opportunity. This is why I want to finally share this information.

Stored Query API URL

I was not able to find a way to discover the URI to get the work item queries. The wiki page mentions finding the simpleQuery tag in the project area's service provider, but I seem to be unable to find it, so I need to construct the URL based on the wiki page. From the Resource Oriented API page I can deduce two possible forms for the URL. Assuming PublicURI is the public URI of the CCM server, e.g. ‘https://elm.example.com:9443/ccm’, the URL for the Stored Query Collection is either

PublicURI + '/oslc/queries'

or

PublicURI + '/oslc/queries.xml'

Both seem to work. Choose one you like and that works for you.

Query Stored Queries

The request below gets all the stored queries for a repository.

Get all stored queries for the repository

Note that the only Accept headers documented to be valid at the moment are:

Accept: text/xml
Accept: text/json

It is necessary to be authenticated to the CCM server to be able to perform the operation.
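As a minimal sketch, assuming a requests session that is already authenticated to the server (see the earlier posts on authentication), the request could look like this in Python:

import requests

# Base URL of the CCM application (adjust to your server)
ccm_base = 'https://elm.example.com:9443/ccm'

session = requests.Session()
# ... authenticate the session here, as shown in previous posts ...

# Get the first page of all stored queries in the repository
response = session.get(ccm_base + '/oslc/queries.xml',
                       headers={'Accept': 'text/xml'},
                       verify=False)  # only for test systems with self-signed certificates
print(response.status_code)
print(response.text)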

The image below shows a part of the response to the query above.

The response to the query for all stored queries

The first section shows the URL of the next page in the attribute oslc_cm:next and the total result count in oslc_cm:totalCount.

The rest of the information is one rtc_cm:Query element for each query descriptor. The element contains the query URI as rdf:resource, the identifier for the query, the query name/title and description, and the project area the query belongs to. The most important information is the rtc_cm:results element with the URL to execute the query, as in the example below:

https://elm.example.com:9443/ccm/oslc/queries/_87dqoFpmEeukW7cqqDjAuA/rtc_cm:results

Additional information is the date of the last modification and the user that created the query.

Note that the result is either XML or JSON. The RDF tools that were used in some of the previous posts, based on the Python rdflib, do not work in this context. Instead, the Python code shown in this post uses xml.dom.minidom to access the information in case of XML. The JSON code examples shown in previous posts using the json library can be used as a reference. This post will mostly use the Accept header text/xml. The code depends on the following libraries.

The main Libraries used to process the stored queries API.
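Based on the description above, the imports would look roughly like the following; using the requests library for the HTTP communication is my assumption:

import json                # process JSON responses (Accept: text/json)
import urllib.parse        # URL encode the query parameters
import xml.dom.minidom     # process XML responses (Accept: text/xml)

import requests            # HTTP communication with the server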

Some example code for processing the data in Python will be shown below.

Narrow Down Stored Queries

It is usually not desirable to query for all stored queries across all project areas. A real production server can have hundreds of project areas, team areas, and users that can have predefined, shared, or personal queries defined. The result could be thousands or tens of thousands of queries. Collecting and transferring all that data is a challenge and drains server resources as well.

It is possible to filter for all the properties that make up a query: identifier, name, creator, and owning project area. The code below computes the query parameters from the parameters passed and creates the query section for the query URL.

Create the query parameters to narrow down the query results.
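A sketch of what such a helper could look like is shown below; the function name and parameter names are illustrative, not the exact code from the image:

def build_query_params(project_area_uuid=None, title=None, creator=None):
    """Build the oslc_cm.query filter expression from the given parameters."""
    conditions = []
    if project_area_uuid:
        conditions.append('rtc_cm:projectArea="{}"'.format(project_area_uuid))
    if title:
        conditions.append('dc:title="{}"'.format(title))
    if creator:
        conditions.append('dc:creator="{}"'.format(creator))
    return ' and '.join(conditions)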

As an example, the query that is created could look like below. The condition can narrow down to the project area, the user, and the title, depending on how much information is provided. If enough information is given, this will usually return only one hit. Note that this is a string delimited by the single quote characters, and the double quote characters inside are important. The query narrows down to one project area, the query name ‘All’, and the creating user.

'rtc_cm:projectArea="_8e5qfFpmEeukW7cqqDjAuA" and dc:title="All" and dc:creator="https://elm.example.com:9443/jts/users/ralph"'

How to discover the project area UUID has been explained in previous posts e.g. about discovery. The code that was used here is the same used in the other posts.

The code below shows how the query URL is composed.

Python code to compose the query parameters and to create the query URL

The parameters provided are the same as above: the project area, the query name, and the query creator. The creator can be provided in different ways. It is possible to provide a user URI like

'https://elm.example.com:9443/jts/users/ralph'

It is also possible to provide the variable string below.

'{currentUser}'

The query mechanism replaces this variable with the URI of the user that runs the request. For example, the query below would return the query named All created by the current user.

'rtc_cm:projectArea="_8e5qfFpmEeukW7cqqDjAuA" and dc:title="All" and dc:creator="{currentUser}"'

The last three lines in the image above compose the query URL. First the base query URL is composed; this is the part of the query that returns every stored query. Then the query parameter section is appended by adding ‘?oslc_cm.query=’ followed by the URL encoded query parameters. The resulting query URL looks like below.

'https://elm.example.com:9443/ccm/oslc/queries.xml?oslc_cm.query=rtc_cm%3AprojectArea%3D%22_8e5qfFpmEeukW7cqqDjAuA%22+and+dc%3Atitle%3D%22All%22+and+dc%3Acreator%3D%22%7BcurrentUser%7D%22'
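A sketch of that composition, assuming the illustrative build_query_params() helper from above and Python's urllib.parse for the encoding:

import urllib.parse

query_params = build_query_params('_8e5qfFpmEeukW7cqqDjAuA', 'All', '{currentUser}')
# Base query URL that would return every stored query
base_query_url = ccm_base + '/oslc/queries.xml'
# Append the URL encoded filter expression; quote_plus() encodes spaces as '+'
query_url = base_query_url + '?oslc_cm.query=' + urllib.parse.quote_plus(query_params)
print(query_url)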

Execute the Query for Stored Queries

The code below executes the query to search for stored queries.

The first line runs the query that was just created using the function findQueryResources().

Execute the query to find stored queries

This function is shown in the image below. It uses the Accept header ‘text/xml’. As long as there is a valid query URL for a page, it executes that query.

The code that executes the query for the stored queries and collects the results
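Reconstructed as a sketch (the original code is shown as an image), the page loop could look like this:

def findQueryResources(session, query_url):
    """Page through the stored query results, collecting names and result URLs."""
    query_names = []
    query_urls = []
    next_url = query_url
    while next_url:
        response = session.get(next_url, headers={'Accept': 'text/xml'}, verify=False)
        next_url = qeryQueryInfoXML(response.text, query_names, query_urls)
    return query_names, query_urls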

It calls qeryQueryInfoXML() to process the page result data. This code is shown below. The function builds up two arrays, one containing the stored query URL and one the associated query name. It also gets the query URL for the next page, if there is any, and returns that.

Get the information about the queries from the response
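A sketch of that function, assuming the element and attribute names from the response described above:

import xml.dom.minidom

def qeryQueryInfoXML(page_xml, query_names, query_urls):
    """Collect query names and result URLs from one page; return the next page URL."""
    dom = xml.dom.minidom.parseString(page_xml)
    for query in dom.getElementsByTagName('rtc_cm:Query'):
        titles = query.getElementsByTagName('dc:title')
        results = query.getElementsByTagName('rtc_cm:results')
        if titles and results:
            query_names.append(titles[0].firstChild.data)
            query_urls.append(results[0].getAttribute('rdf:resource'))
    collection = dom.getElementsByTagName('oslc_cm:Collection')
    next_page = collection[0].getAttribute('oslc_cm:next') if collection else ''
    return next_page or None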

All query pages are iterated and the results added to the respective arrays.

Evaluate the Result and Run the Query

At the end of executing findQueryResources, there is a list of query names and an associated list of query URLs. Depending on the query parameters given, it could return many results or none. The code shown below focuses on using the query name, but it would be easy enough to extract more information per stored query and use different criteria to look into the result set. The code tries to find the first stored query returned with the given name.

Find the query to run and execute it

If a stored query has been found, that query is finally executed in the function execute_query_xml(). The code for this function is shown below. Again, this uses the Accept header ‘text/xml’ and the results are processed as XML.

Process each result page of the stored query

The code iterates over the query pages and is similar to the code already shown above. It passes each query result page to the function processQueryResultInlineXML(), shown below. That function also looks very similar to the code we have already seen; the difference is that it now works on work item resource information and not on query information.

Processing one page of the stored query with work item data inline.

The code analyzes the oslc_cm:Collection and gets the total result count and the next page (if there is one). Then it analyzes the inline XML data for all the work items. For each work item it gets some of the available information and prints that. It returns the work item URIs, the totals and the next page etc.
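A sketch of that processing, assuming the work items appear as oslc_cm:ChangeRequest elements inside the oslc_cm:Collection:

def processQueryResultInlineXML(page_xml):
    """Process one page of work item results with the data inline."""
    dom = xml.dom.minidom.parseString(page_xml)
    collection = dom.getElementsByTagName('oslc_cm:Collection')[0]
    total = collection.getAttribute('oslc_cm:totalCount')
    next_page = collection.getAttribute('oslc_cm:next') or None
    workitem_uris = []
    for item in dom.getElementsByTagName('oslc_cm:ChangeRequest'):
        uri = item.getAttribute('rdf:resource')
        workitem_uris.append(uri)
        # Print some of the inline data, e.g. the identifier and the title
        ids = item.getElementsByTagName('dc:identifier')
        titles = item.getElementsByTagName('dc:title')
        print(uri,
              ids[0].firstChild.data if ids else '',
              titles[0].firstChild.data if titles else '')
    return workitem_uris, total, next_page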

Executing Stored Query Using JSON

The code here shows a JSON based implementation of the functions to execute the stored query after finding it. The function execute_query_json() does the same as the function execute_query_xml() and looks very similar; it only uses JSON as the format to parse the data. The Accept header used below is ‘text/json’.

Process all query pages in JSON

The function getQueryResultDetailsInlineJson() is called to process each page. Like its XML twin it gets the total count and the next page (if available). Then it gets details about the embedded inline work item data and logs it. Like the XML based code, the function builds an array with the work item URIs for the page. It also returns additional information such as the link to the next page.

Process one query page with JSON work item data inline
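A sketch of the JSON twin; the key names mirror the XML element names and should be treated as assumptions:

import json

def getQueryResultDetailsInlineJson(page_json):
    """Process one page of work item results in JSON with the data inline."""
    data = json.loads(page_json)
    total = data.get('oslc_cm:totalCount')
    next_page = data.get('oslc_cm:next')
    workitem_uris = []
    for item in data.get('oslc_cm:results', []):
        uri = item.get('rdf:resource')
        workitem_uris.append(uri)
        print(uri, item.get('dc:identifier'), item.get('dc:title'))
    return workitem_uris, total, next_page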

More information

Please check the section Getting Resource Representations to understand additional ways to get work item data representations. Also check the section Getting Partial Representations to understand options to limit the data that needs to be transferred.

CSV Export

The query descriptor URL provided in the search results by the tag rtc_cm:Query can also be used in a browser to get the query displayed. As an example, see the information below.

<rtc_cm:Query rdf:resource="https://elm.example.com:9443/ccm/resource/itemOid/com.ibm.team.workitem.query.QueryDescriptor/_pdT5AATBEeyiDIlAjMF45Q">

Opening the URL in the rdf:resource attribute results in the following display in the browser.

The query descriptor URL is opened in the browser and executes the query.

It is also possible to use that URL to download the query result as a CSV file. By adding the media type query parameter, e.g.

?_mediaType=text/csv

to the query descriptor URL, the query can be changed to export the data as a CSV file. As an example, the URL below exports the query result to CSV.

https://elm.example.com:9443/ccm/resource/itemOid/com.ibm.team.workitem.query.QueryDescriptor/_pdT5AATBEeyiDIlAjMF45Q?_mediaType=text/csv

By pasting the URL into the browser, the CSV download is triggered. It is also possible to use the URL above in tools like cURL.
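As a sketch, the download could also be automated with the authenticated session from above:

# Download the query result as CSV and save it to a file
csv_url = ('https://elm.example.com:9443/ccm/resource/itemOid/'
           'com.ibm.team.workitem.query.QueryDescriptor/'
           '_pdT5AATBEeyiDIlAjMF45Q?_mediaType=text/csv')
response = session.get(csv_url, verify=False)
with open('query_result.csv', 'w', encoding='utf-8') as csv_file:
    csv_file.write(response.text)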

Summary

This blog post has described the steps that are necessary to find existing work item queries and how to run such queries to get the data. I hope that the examples here in my blog help users out there with their work.

Using Work Item Queries for Automation

A lot of desirable custom automation for Rational Team Concert is about managing work items, especially as the number of work items grows over time. I found it very useful to be able to run work item queries and use them to feed my automation. This post shows how to do this.

The Wiki Page QueryDevGuide shows more example code and also how to create queries using the API.

Queries are related to Expressions which are content of the posts Using Expressions for Automation and Using an Expression to Synchronize Attributes for Work Items of a Specific Type.

I am working with some fairly big IBM internal projects that have very mature processes and, due to the huge amount of items, desire a lot of automation. The code in this post originates from working with Susan Hanson who came up with the majority of the code. I found it so useful that I use it in several places.

License and how to get started with the RTC API's

As always, our lawyers reminded me to state that the code in this post is derived from examples from Jazz.net as well as the RTC SDK. The usage of code from those example sources is governed by this license. Therefore this code is governed by this license as well, which basically means you can use it for internal usage, but not sell it. Please also remember, as stated in the disclaimer, that this code comes with the usual lack of promise or guarantee. Enjoy!

If you just get started with extending Rational Team Concert, or create API based automation, start with the post Learning To Fly: Getting Started with the RTC Java API’s and follow the linked resources.

You should be able to use the following code in this environment and get your own automation or extension working.

To keep it simple, this example is, as many others in this blog, based on the Jazz Team Wiki entry on Programmatic Work Item Creation and the Plain Java Client Library Snippets. The example in this post uses the RTC client API.

Find A Query

When trying to use a work item query for automation, it is necessary to find the query first. There are basically two scenarios to consider: a query can be a personal query of the user that runs the automation, or it can be a shared query that was predefined and shared by a project area or team area. The first code block below shows how to find a personal query for a given query name provided as a string.

public static IQueryDescriptor findPersonalQuery(IProjectArea projectArea, 
		String queryName, IProgressMonitor monitor)
		throws TeamRepositoryException {
	// Get the required client libraries
	ITeamRepository teamRepository = (ITeamRepository)projectArea.getOrigin();
	IWorkItemClient workItemClient = (IWorkItemClient) teamRepository.getClientLibrary(
		IWorkItemClient.class);
	IQueryClient queryClient = workItemClient.getQueryClient();
	// Get the current user.
	IContributor loggedIn = teamRepository.loggedInContributor();
	IQueryDescriptor queryToRun = null;
	// Get all queries of the user in this project area.
	List queries = queryClient.findPersonalQueries(
		projectArea.getProjectArea(), loggedIn,
		QueryTypes.WORK_ITEM_QUERY,
		IQueryDescriptor.FULL_PROFILE, monitor);
	// Find a query with a matching name
	for (Iterator iterator = queries.iterator(); iterator.hasNext();) {
		IQueryDescriptor iQueryDescriptor = (IQueryDescriptor) iterator.next();
		if (iQueryDescriptor.getName().equals(queryName)) {
			queryToRun = iQueryDescriptor;
			break;
		}
	}
	return queryToRun;
}

The code to call the operation for a personal query “All Work Items” would look as below.

IQueryDescriptor query = findPersonalQuery(projectArea, "All Work Items", monitor);

It is also possible to access the shared queries. The code would look as follows.

public static IQueryDescriptor findSharedQuery(IProjectArea projectArea, 
		List sharingTargets, String queryName, IProgressMonitor monitor)
		throws TeamRepositoryException {
	// Get the required client libraries
	ITeamRepository teamRepository = (ITeamRepository)projectArea.getOrigin();
	IWorkItemClient workItemClient = (IWorkItemClient) teamRepository.getClientLibrary(IWorkItemClient.class);
	IQueryClient queryClient = workItemClient.getQueryClient();
	IQueryDescriptor queryToRun = null;
	List queries = queryClient.findSharedQueries(projectArea.getProjectArea(),
		sharingTargets, QueryTypes.WORK_ITEM_QUERY,
		IQueryDescriptor.FULL_PROFILE, monitor);
	// Find a query with a matching name
	for (Iterator iterator = queries.iterator(); iterator.hasNext();) {
		IQueryDescriptor iQueryDescriptor = (IQueryDescriptor) iterator.next();
		if (iQueryDescriptor.getName().equals(queryName)) {
			queryToRun = iQueryDescriptor;
			break;
		}
	}
	return queryToRun;
}

To call the operation it is necessary to provide a list of sharing targets. The sharing targets can be any project area or team areas in the project area. The code to call the operation above would look as below.

List sharingTargets = new ArrayList();
// Add desired sharing targets
sharingTargets.add(projectArea);
IQueryDescriptor sharedQuery = findSharedQuery(projectArea, sharingTargets, "Done Stories", monitor);

Run A Query

Once the query is found, you can run it and get the result set. There are basically two possible ways to get the query results.

  1. It is possible to get the results unresolved, which basically means you only get an IAuditableHandle in the result and have to resolve the work item from it to get the data.
  2. It is possible to get the results as work items, resolved with a specific load profile.

*Update*

WARNING!

To avoid calamities with many other users screaming at you, because the server is down, please be very careful if using resolved query results. This is especially true when running with a FULL_PROFILE and overriding the query result limit and running queries with huge amounts of work items. In this case it is smarter to use the unresolved results and pay the performance penalty of resolving the work items individually.

The code to get the unresolved results using the IQueryClient looks as follows.

IWorkItemClient workItemClient = (IWorkItemClient) teamRepository.getClientLibrary(IWorkItemClient.class);
IQueryClient queryClient = workItemClient.getQueryClient();
IQueryResult unresolvedResults = queryClient.getQueryResults(query);

The code to get the resolved results is very similar and presented below. The difference is that it is necessary to provide a load profile to get the results resolved to the load profile.

IWorkItemClient workItemClient = (IWorkItemClient) teamRepository.getClientLibrary(IWorkItemClient.class);
IQueryClient queryClient = workItemClient.getQueryClient();
// Set the load profile
ItemProfile<IWorkItem> loadProfile = IWorkItem.SMALL_PROFILE;
IQueryResult<IResolvedResult<IWorkItem>> resolvedResults = queryClient.getResolvedQueryResults(query, loadProfile);

The load profile basically defines the attributes that are loaded and available in the work item. com.ibm.team.workitem.common.model.IWorkItem defines several profiles.

ItemProfile SMALL_PROFILE
ItemProfile DEFAULT_PROFILE
ItemProfile MEDIUM_PROFILE
ItemProfile LARGE_PROFILE
ItemProfile FULL_PROFILE

I usually use SMALL_PROFILE or FULL_PROFILE. The SMALL_PROFILE provides PROJECT_AREA_PROPERTY, ID_PROPERTY, TYPE_PROPERTY, SUMMARY_PROPERTY, OWNER_PROPERTY. It is possible to create a custom load profile using createExtension and providing the attributes/properties required. For example:

IWorkItem.SMALL_PROFILE.createExtension(Arrays.asList(new String[] {
	STATE_PROPERTY, PRIORITY_PROPERTY, SEVERITY_PROPERTY,
	CREATOR_PROPERTY, WORKFLOW_SURROGATE_PROPERTY, DESCRIPTION_PROPERTY }));

To limit the data to read, use a profile with minimal data to load, especially if running on big queries.

Query Result Size Limit

When processing the query result, you need to be aware that the CCM application has a “Maximum Query Result Set Size”, which is typically set to 1000 items. No work item query run by a user through any UI will provide more items than defined by this limit. The reason basically is to prevent the server from being overloaded by massive amounts of database accesses caused by users.


The limit also applies to the API; however, in the API it is possible to get around it and retrieve the full result set. To do that, you have to change the limit of the result set. It is necessary to do that before you do anything else with the result set; the limit cannot be changed after you have accessed the results.

The code to change the limit for unresolved results looks like below:

((QueryResultIterator) unresolvedResults).setLimit(Integer.MAX_VALUE);

The code to change the limit for a resolved result set is simpler.

resolvedResults.setLimit(Integer.MAX_VALUE);

Processing the Query Result

The following method shows how to process the unresolved results.

public static void processUnresolvedResults(IProjectArea projectArea, IQueryResult results,
		ItemProfile<IWorkItem> profile, IProgressMonitor monitor)
		throws TeamRepositoryException {
	// Get the required client library
	ITeamRepository teamRepository = (ITeamRepository) projectArea.getOrigin();
	IAuditableCommon auditableCommon = (IAuditableCommon) teamRepository.getClientLibrary(IAuditableCommon.class);
	long processed = 0;
	while (results.hasNext(monitor)) {
		IResult result = (IResult) results.next(monitor);
		// Resolve the work item from the returned handle using the load profile
		IWorkItem workItem = auditableCommon.resolveAuditable(
			(IAuditableHandle) result.getItem(), profile, monitor);
		// Do something with the work item here
		processed++;
	}
	System.out.println("Processed results: " + processed);
}

After adding some logic to process the work item, the method can be called like below:

processUnresolvedResults(projectArea, unresolvedResults, loadProfile, monitor);

The following method can be used to process the resolved results.

public static void processResolvedResults(IProjectArea projectArea, IQueryResult<IResolvedResult<IWorkItem>> resolvedResults,
		IProgressMonitor monitor)
		throws TeamRepositoryException {
	long processed = 0;
	while (resolvedResults.hasNext(monitor)) {
		IResolvedResult<IWorkItem> result = resolvedResults.next(monitor);
		IWorkItem workItem = result.getItem();
		// do something with the work item
		processed++;
	}
	System.out.println("Processed results: " + processed);
}

After adding some logic to process the work item, the method can be called like this:

processResolvedResults(projectArea, resolvedResults, monitor);

Process Paged Results

It is possible to paginate the resolved results and operate on the returned pages. The code below runs some code on the pages in parallel.

resolvedPResults.setPageSize(100);
ArrayList<ResolvedPageJob> jobs = new ArrayList<ResolvedPageJob>();
while(resolvedPResults.hasNext(null)){
	ResolvedPageJob job = new ResolvedPageJob(teamRepository, workItemClient, resolvedPResults.nextPage(null));
	jobs.add(job);
	job.setUser(true);
	// run as System Job;
	//job.setSystem(true);
	job.schedule();
}
for (ResolvedPageJob resolvedPageJob : jobs) {
	try {
		resolvedPageJob.join();
	} catch (InterruptedException e) {
		// Log the interruption and keep waiting for the remaining jobs
		e.printStackTrace();
	}
}

The Job class would look like this.

public static class ResolvedPageJob extends Job {
	private List<IResolvedResult<IWorkItem>> fPage;
	private ITeamRepository fTeamRepository;
	private IWorkItemClient fWorkItemClient;

	public ResolvedPageJob(ITeamRepository teamRepository, IWorkItemClient workitemClient, List<IResolvedResult<IWorkItem>> results) {
		super("Run Page");
		fTeamRepository=teamRepository;
		fWorkItemClient=workitemClient;
		fPage=results;
	}

	@Override
	protected IStatus run(IProgressMonitor monitor) {
		try {
			processResolvedResultsPaged(fTeamRepository, fWorkItemClient,fPage);
		} catch (TeamRepositoryException e) {
			System.out.println("Exception: " + e.getLocalizedMessage());
			return new Status(Status.ERROR,"Me","TeamRepositoryException",e);
		}
		return Status.OK_STATUS;
	}
}

And the method to process the paged resolved results would look like below.

public static void processResolvedResultsPaged(ITeamRepository teamRepository,
		IWorkItemClient workItemClient,
		List<IResolvedResult<IWorkItem>> page)
		throws TeamRepositoryException {
	long processed = 0;
	for (IResolvedResult<IWorkItem> resolvedResult : page) {
		IWorkItem workItem = resolvedResult.getItem();
		// do something with the work item
		processed++;
	}
	System.out.println("Processed results: " + processed);
}

Some Observations

I optimized the code a bit by reducing the client libraries to be looked up. I basically got the client libraries in a main method and passed them to the methods.

Then I ran the methods against a test system that had 16000 work items, randomly alternating between unresolved and resolved results. Some observations I made:

  1. Running with resolved query results was typically quicker than running against unresolved result sets. The bigger the number of work items the smaller the difference.
  2. For running parallel jobs I got a mixed bag. Sometimes it was a lot slower and sometimes it was quicker.

My experiments ran against a test system running on the same machine and using Derby as DB.


Summary

Using the API as described above allows you to run automation against work item queries, for example using WorkItemOperation as presented in Uploading Attachments to Work Items. I will describe some scenarios in follow up posts, for example how to “Synchronize Attributes”. I hope that these code examples will help users out there that have a need for more automation.