SCM Utils – SCM data secure sharing, statistics and more


Last year, I was involved in a customer situation where we had issues with the EWM (RTC) Eclipse client. The customer reported client crashes as well as performance/load issues. All these issues were intermittent and very hard to approach. It took a while and a lot of work to get them ironed out.

The performance or client load issue was apparently related to the customer's data and its characteristics. I was able to reproduce it at the customer site, but we also needed to reproduce it in the development labs. The customer could not share their source code, so the only way to share the data was to randomize the information that needed to be protected. Some kind of automation was also required, because we had to handle thousands of SCM components.

So I took some existing source code and wrote a small utility that provides the data our support team needed to reproduce the issue in the lab.

Initial Scope

Initially the tool provided commands supporting the following use cases:

  1. Export the components and their current content from a repository workspace to disk, randomizing the file content but keeping the names and structure intact.
  2. Import the data into a repository workspace in a different repository, recreating the components, their hierarchy and their content from the exported data.
  3. Convert an existing load rule for the original repository so that it matches the component IDs in the new repository and can be used there.

This allowed us to share the data of the customer in a way that we were able to reproduce the issue in the lab.

To better understand the data, I began to use the tool's capability to iterate over the information in a repository workspace or stream to collect statistical information. This resulted in commands supporting the following use cases:

  1. Analyze a repository workspace or stream and provide statistics about its components and the contained folders and files.
  2. Analyze all streams in a repository to provide the statistics for each and the totals.
  3. Analyze a sandbox folder and its substructure for the same statistical information.

The information is printed on screen and written to CSV files, which allow drilling down into the data.

Download and License

The SCM Utils are provided as is, with no warranty or support, under the MIT license. The source code is provided here. The repository also contains comprehensive documentation of the commands and their parameters.

Working with the source code and building

Get the code either by downloading the zip file or by cloning the GIT repository locally. Import all the projects into Eclipse, e.g. in the GIT perspective.

Import all projects

There should be three projects as shown below in the Java perspective.

The three projects

The project com.ibm.js.team.supporttools.framework contains a small framework that is used by the tool. The project com.ibm.js.team.supporttools.scmutils contains the code for the SCM Utils. The project ewm-scm-utils contains the readme and all the other projects. I would suggest closing this project.

The projects show errors, because they are missing the Plain Java Client Libraries. In addition, the tool uses Maven, which requires a JDK and a matching execution environment. Make sure to have one installed and activated. The tool was developed with Java 8.

Configure a JDK and set it in the preferences.

JDK is installed and selected

Make sure to activate an execution environment.

Execution Environment configured

Download the Plain Java Client Libraries from the All Downloads page of the EWM download. The Doc download for the Plain Java Client Libraries is not needed. Unzip the Plain Java Client Libraries into a folder. Now create a user library for the Plain Java Client Libraries. In the preferences, navigate to Java>Build Path>User Libraries. Click the New… button. Enter the name PlainJavaApi and click OK. Click Add External JARs… and browse to the folder where you unpacked the Plain Java Client Libraries. Select all JAR files and click Open. Then click OK.

The PlainJavaApi user library.

There might still be errors, because Maven is missing dependencies.

Open the project com.ibm.js.team.supporttools.framework. Right click on the file pom.xml and select Run As > Maven clean. After that has finished, right click on the file pom.xml again and select Run As > Maven install. Do the same for com.ibm.js.team.supporttools.scmutils.

Run Maven Clean and Maven Install on both projects.

The code should now compile.

Working tool

The file README.md describes the commands and their parameters. The tool prints a help text when it is run without parameters or with wrong parameters.

The code ships with example launches you can explore. To do that, click on the triangle next to the bug symbol (or the triangle next to the run symbol) and open Debug Configurations…. Search for the Java node and unfold it. Select a configuration and check the program arguments.

The available Debug/Run Configuration launches.

You can change the configurations to fit your data or create new ones. When you create new ones, consider storing the launches in a new project. Please do not push your changes to the GIT repository.

Build a runnable jar

The project com.ibm.js.team.supporttools.scmutils contains the file ReadMe – HowToRelease.txt. Follow the description in this file to create a runnable JAR. Use the provided batch scripts to make starting the JAR easier.

Note, do not use the option to zip the libraries into the jar file. This results in terrible start times for the tool.

Exporting repository workspaces

The original use case that initiated writing this tool requires a short explanation. The general steps are:

  1. Run exportScmWorkspace to export the data and structure.
  2. Run importScmWorkspace to import the data where desired.
  3. If using load rules based on component UUIDs, use convertLoadrule to replace the old UUIDs with the new UUIDs.

The command exportScmWorkspace exports the data of a repository workspace. It exports each component into its own zip file. The folder and file structure of the component is kept, and the file and folder names are preserved. The file content is randomized by default: the file size is kept, but every character is replaced with a random number. The data is stored in a folder which can be set. In addition to the zip files, a JSON file is created that contains the component names, their IDs and information about the component hierarchy (subcomponents).

The optional parameter -exportmode can be used to change the behavior:

-exportmode preserve keeps the file content as is.

-exportmode obfuscate replaces the content with random content from a text document containing code lines.
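To make the difference between the default randomization and the obfuscate mode more tangible, here is a minimal sketch of both ideas. It is not the code from the SCM Utils; the class and method names are hypothetical, and it assumes that "random number" means a random digit per byte and that the obfuscated content is cut to the original size.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; names are hypothetical and not taken from the SCM Utils source.
class ContentScrambler {

    // Randomize: keep the length, replace every byte with a random digit character.
    static byte[] randomize(byte[] original, Random random) {
        byte[] result = new byte[original.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = (byte) ('0' + random.nextInt(10));
        }
        return result;
    }

    // Obfuscate: fill the target size with randomly chosen lines from a sample text.
    // sampleLines must not be empty. Keeping the size is an assumption of this sketch.
    static byte[] obfuscate(int targetSize, List<String> sampleLines, Random random) {
        StringBuilder content = new StringBuilder();
        while (content.length() < targetSize) {
            String line = sampleLines.get(random.nextInt(sampleLines.size()));
            content.append(line).append('\n');
        }
        // Cut off the overshoot; US-ASCII encoding yields exactly one byte per character.
        content.setLength(targetSize);
        return content.toString().getBytes(StandardCharsets.US_ASCII);
    }
}
```

Either way, the names and the structure stay untouched; only the bytes inside each file change, which is what makes the export safe to share.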

Example:

-command exportScmWorkspace -url "https://elm.example.com:9443/ccm/" -user myadmin -password myadmin -workspaceConnection "Hierarchical JKE Banking Integration Stream" -outputFolder "C:\aTemp\ScmExportRandomize"

This shows the following on the console

Executing Command exportScmWorkspace

The resulting export folder looks like

The JSON file contains information required for the import operation.

The structural and content information.

The information can now be used to import the data into any repository, including the source repository.

Importing repository workspaces

The information that was exported can be imported into any Jazz SCM system. This is done using the command importScmWorkspace. The command requires the data of a previous export in a folder. In addition, it requires a name for the repository workspace to import to and a project area name. The repository workspace is created during the import. The command has two special flags/switches:

  1. reuseExistingWorkspace If the workspace to import to already exists, reuse it for the import.
  2. skipUploadingExistingComponents If a component with the same name already exists, skip the component upload.

These flags have been created to allow recovery from import errors. We ran into out-of-memory situations that we were unable to resolve. If an import fails with a component, rename the component that failed and remove it from the repository workspace. Then restart the import with these two switches; the import will skip the components already imported and continue with the missing ones. This can be repeated until the import finally succeeds. Once the import of the components has succeeded, the component hierarchy is recreated.

The parameter componentNameModifier can be used to provide a prefix to be added to the name of each component that is created. This has several benefits. It makes the process repeatable with different prefixes. It also makes it easy to identify which components have been imported.
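The interplay of the name prefix and the skip switch can be sketched as a small planning step. This is only an illustration with hypothetical names, not the tool's code. It assumes that the existence check is done against the prefixed component name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names are hypothetical and not taken from the SCM Utils source.
class ImportPlanner {

    // Returns the target names of the components that still have to be uploaded.
    static List<String> componentsToUpload(List<String> exportedNames,
                                           Set<String> existingNames,
                                           String namePrefix,
                                           boolean skipExisting) {
        List<String> pending = new ArrayList<>();
        for (String name : exportedNames) {
            String targetName = namePrefix + name;
            // Assumption: a component counts as "existing" if the prefixed name is already present.
            if (skipExisting && existingNames.contains(targetName)) {
                continue;
            }
            pending.add(targetName);
        }
        return pending;
    }
}
```

With a plan like this, a restarted import only touches the components that are missing, which is why renaming the failed component before the restart matters.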

Example:

-command importScmWorkspace -url "https://elm.example.com:9443/ccm/" -user ralph -password ralph -inputFolder "C:\aTemp\ScmExportRandomize" -workspaceConnection "Imported Hierarchical Workspace" -projectarea "Formal" -componentNameModifier "IBMTestDefault_" -reuseExistingWorkspace -skipUploadingExistingComponents

This shows the following on the console

Importing components create and upload
Import components, recreate component structure

The resulting repository workspace contains all the components and the hierarchy:

Imported components in created repository workspace.

Note that the import stores mapping information between the UUIDs of the exported components and the UUIDs of the components that the data was imported into. The data is stored in a JSON file in the input folder.

The UUID mapping for the imported component.
The UUID mapping file

Load rule operations

The command convertLoadrule takes two inputs: the folder containing the information from an export and an import, and a load rule file. It uses the mapping information to replace the source UUIDs in the load rule with the target UUIDs and saves the result as a new load rule.
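The core of such a conversion is a plain text substitution driven by the UUID mapping. The following is a minimal sketch with hypothetical names; it assumes the mapping has already been read into a map and the load rule into a string, and it does not reflect how the tool parses its JSON file.

```java
import java.util.Map;

// Illustrative sketch only; names are hypothetical and not taken from the SCM Utils source.
class UuidRewriter {

    // Replaces every occurrence of a source UUID with its mapped target UUID.
    static String replaceUuids(String loadRuleText, Map<String, String> sourceToTarget) {
        String result = loadRuleText;
        for (Map.Entry<String, String> mapping : sourceToTarget.entrySet()) {
            result = result.replace(mapping.getKey(), mapping.getValue());
        }
        return result;
    }
}
```

Because component UUIDs are unique, a simple literal replacement like this does not risk rewriting one mapping with another.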

The command flattenLoadrule converts a load rule that has a deep hierarchy into one without hierarchy. It iterates over the load rule and modifies the pathPrefix entries for sandboxRelativePath. The modification replaces every / with _, except for the first /. This creates a flat load rule from a hierarchical one.

The command was created to try to understand if the hierarchy has impact on the load performance.
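The path modification used for flattening is easy to show in isolation. This is a minimal sketch with hypothetical names that only handles a single path string; reading and writing the load rule itself is left out.

```java
// Illustrative sketch only; names are hypothetical and not taken from the SCM Utils source.
class PathFlattener {

    // Replaces every '/' with '_', except for the first '/'.
    static String flatten(String pathPrefix) {
        int first = pathPrefix.indexOf('/');
        if (first < 0) {
            return pathPrefix; // nothing to flatten
        }
        String head = pathPrefix.substring(0, first + 1);
        String tail = pathPrefix.substring(first + 1).replace('/', '_');
        return head + tail;
    }
}
```

For example, PathFlattener.flatten("/src/com/example") returns "/src_com_example", so every component ends up directly below the sandbox root.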

Analysis of streams and sandboxes

Another set of commands performs an analysis of the structure of repository workspaces, streams and sandboxes. They iterate over the object, its content and substructure, and collect statistical information. The information is displayed and stored in CSV files, which allow further analysis later. The analysis currently collects obvious information such as file sizes and number of files per folder, and less obvious information such as hierarchy depth, encoding and file extensions.

The following commands are available at the moment:

  1. analyzeScmRepository Iterates over the whole repository and analyzes the streams it has access to. Can be limited to scopes such as project or team areas. Collects the totals for the scope and uses the capabilities of the commands below.
  2. analyzeScmWorkspace Analyzes one repository workspace or stream.
  3. analyzeSandbox Analyzes a folder on the local disk, e.g. a sandbox. Excludes special folders such as .jazz3 or .metadata. The list of excluded folders is currently hardcoded.
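To give an idea of what a sandbox analysis involves, here is a minimal sketch that walks a folder, skips the special folders and collects a few of the numbers mentioned above. It is not the tool's implementation; the class and field names are hypothetical and only a subset of the statistics is covered.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch only; names are hypothetical and not taken from the SCM Utils source.
class SandboxStats {
    static final Set<String> EXCLUDED = new HashSet<>(Arrays.asList(".jazz3", ".metadata"));

    long files;
    long folders;
    long totalBytes;
    int maxDepth;
    final Map<String, Long> extensions = new TreeMap<>();

    static SandboxStats analyze(final Path root) throws IOException {
        final SandboxStats stats = new SandboxStats();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                if (dir.equals(root)) {
                    return FileVisitResult.CONTINUE; // the root itself is not counted
                }
                if (EXCLUDED.contains(dir.getFileName().toString())) {
                    return FileVisitResult.SKIP_SUBTREE; // ignore special folders entirely
                }
                stats.folders++;
                stats.maxDepth = Math.max(stats.maxDepth, root.relativize(dir).getNameCount());
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                stats.files++;
                stats.totalBytes += attrs.size();
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String extension = dot > 0 ? name.substring(dot + 1) : "";
                stats.extensions.merge(extension, 1L, Long::sum);
                return FileVisitResult.CONTINUE;
            }
        });
        return stats;
    }
}
```

Calling SandboxStats.analyze(Paths.get("C:/aTemp/Sandbox")) with a folder of your choice returns the counters, which could then be printed or written to a CSV file.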

Example:

-command analyzeScmRepository -url "https://elm.example.com:9443/ccm/" -user ralph -password ralph  -outputFolder="C:/aTemp/ScmAnalyzeRepository" -connectionOwnerScope "JKE Banking (Change Management)"

The analysis prints some of the data, e.g. for each stream and each component in the stream.

Collected data for each stream and each component in the stream

The aggregated data is printed for each stream.

Aggregated data for a stream

During the process, the analysis command generates CSV files that are stored in the output folder.

CSV files as result

For each stream that is analyzed, a CSV file is created. The CSV file for the stream contains all the data for the components in the stream as well as the data aggregated at stream level. The CSV file can be opened with Excel.

The final CSV file, named _repository.csv, contains the statistics for each stream that was analyzed and the aggregated statistics across all analyzed streams. The individual CSV files are accessible using a link.

Repository statistics

All CSV files that are generated contain multiple sheets. The sheets to the right contain range statistics for the files and the extensions of these files for the analyzed context. The range statistics partition the file sizes into stripes with different top limits (and bottom limits). For each range, the analysis counts the number of files whose size fits into that range. It also collects the file extensions associated with the files in each range.
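The idea behind the range statistics can be sketched as a small bucketing structure. This is an illustration with hypothetical names, not the tool's code; the range limits are passed in by the caller, and whether a limit is inclusive is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; names are hypothetical and not taken from the SCM Utils source.
class SizeRangeStats {
    private final long[] upperLimits; // ascending, inclusive top limit of each range in bytes
    private final long[] counts;      // one extra slot for sizes above the last limit
    private final List<Set<String>> extensions = new ArrayList<>();

    SizeRangeStats(long... upperLimits) {
        this.upperLimits = upperLimits.clone();
        this.counts = new long[upperLimits.length + 1];
        for (int i = 0; i < counts.length; i++) {
            extensions.add(new TreeSet<String>());
        }
    }

    // Records one file: increments the matching range and remembers its extension.
    void add(long fileSize, String extension) {
        int range = rangeIndex(fileSize);
        counts[range]++;
        extensions.get(range).add(extension);
    }

    // Finds the first range whose top limit is not exceeded by the file size.
    int rangeIndex(long fileSize) {
        for (int i = 0; i < upperLimits.length; i++) {
            if (fileSize <= upperLimits[i]) {
                return i;
            }
        }
        return upperLimits.length; // larger than all limits
    }

    long count(int range) {
        return counts[range];
    }

    Set<String> extensions(int range) {
        return extensions.get(range);
    }
}
```

A caller would create it with limits such as new SizeRangeStats(1024, 1048576) and call add once per file; each range then corresponds to one row of a range sheet.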

Stream operations

An additional set of operations has recently been added:

  1. uploadToStream Uploads a folder and its content as a component to a stream and baselines the content. The folder name is used as the component name. The component is created if it does not yet exist. The component is owned by and visible to the project area. The component is added to the stream if it is not yet in it. All changes are contained in one change set. When a build result UUID is provided as an optional parameter, the command publishes the URIs for the stream, the baseline and the component as external links to the build result.
  2. downloadComponentBaseline Downloads the content of a component selected by a baseline into a local file system folder. A folder with the component name is created and the content of the component is loaded into that folder.

Creating custom commands

It is possible to add custom commands to the SCMUtils. The post Adding custom commands to the SCMUtils explains how to do this in detail.

Summary

The code is available, so it can be enhanced. The existing code covers several interesting areas in the RTC SCM API and can be used for inspiration. As always, I hope the SCM Utils will help someone out there.
