By Walter Lindsay, Director of Solution Architecture, Liaison Technologies
Someone looking to become a Liaison customer asked me to write about creating complex maps. As I often say when speaking with customers and prospects, “We always take votes.”
Here we will discuss two perspectives on what a map is and two perspectives on the nature of mapping; then we will touch on complexity in mapping.
A map defines how to construct a result given an input and a context. By context I mean the current date and time, the state of external lookup tables, or anything else the map depends on. This is a mathematician’s sense of the map – it is a function that, given a set of inputs, produces a set of predictable outputs.
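A minimal sketch of this functional view, in Python. The record fields, lookup table, and function name here are my own hypothetical illustration; the point is only that the output is fully determined by the input plus an explicit context:

```python
from datetime import date

def map_order(record, context):
    """A map in the mathematician's sense: a pure function of its
    input record and an explicit context. The same record and the
    same context always produce the same output."""
    return {
        # External lookup table, passed in as part of the context.
        "customer_name": context["customer_lookup"][record["cust_id"]],
        "order_date": record["date"],
        # "Current date" is also context, not hidden global state.
        "processed_on": context["today"].isoformat(),
    }

context = {
    "customer_lookup": {"C001": "Acme Corp"},
    "today": date(2024, 1, 15),
}
result = map_order({"cust_id": "C001", "date": "2024-01-10"}, context)
# result == {"customer_name": "Acme Corp",
#            "order_date": "2024-01-10",
#            "processed_on": "2024-01-15"}
```

Making the context an explicit argument is what gives the map its predictability: nothing the map depends on is hidden.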
In another sense, a map is a program that converts data.
The fact that it is easier to think of maps the second way has shaped the history of data mapping technology. Many products, especially older ones, are programmer tools that are, at their core, facades over code. In this sense, maps are code.
The problem is that, while I can read code, it is hard to discern the intent of the person who wrote it. The fact that code adds two values doesn’t reveal why the coder added them. Perhaps he or she was computing the average of a set of numbers. Perhaps he or she was just adding two values so the result could be written out.
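To illustrate with a hypothetical pair of functions: both contain the same addition, and only the surrounding context reveals which intent is at work.

```python
def average_of_pair(a, b):
    # Intent: compute an average; the addition is an intermediate step.
    return (a + b) / 2

def total_line_amount(price, tax):
    # Intent: produce a total to be written out; the addition IS the result.
    return price + tax

# Both functions add two values. Reading only "a + b", you cannot
# tell whether the coder meant an average, a total, or something else.
avg = average_of_pair(2, 4)        # 3.0
total = total_line_amount(10, 2)   # 12
```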
Liaison Delta was designed to build maps and to define process steps in the ECS lightweight middleware server, and it does its job superbly. Those process steps are often inherently code. Delta is oriented toward the second definition without being enslaved to it (e.g., Delta lets you rearrange “looping” behavior without rewriting maps; many older products cannot, because they represented maps as code, with looping written as for loops and the like in the mapping code the product manipulated).
Contivo was designed to model data and to generate maps based on those models. It of necessity involves code, but it mostly involves declarative constructs that expose intent. For instance, if a mapper handles qualifiers in an EDI (Electronic Data Interchange), IDoc, XML, or other document using a Contivo “virtual group,” the existence of that virtual group has semantic meaning that the product can use to reason about the nature of interfaces and maps.
Second, the process of mapping itself can be thought of in two ways. The first way is as converting data with semantic meaning from one form into another. Let’s call this the data architect’s view. Constructs in the data mean something; converting data from one form to another represents that meaning in the output format.
Alternatively, mapping is constructing the desired output given the input data. You might call this a pragmatic view or a test-driven development view. For instance, my wife and children sing “Happy Birthday” to me as a quintet only on my birthday. They might sing it to me in a larger group, such as with friends or extended family, on other days near my birthday. Given this, an application could always write out the “sings Happy Birthday as a quintet” date and the “birthdate” with the same value. Yes, only one actually means “birthdate,” but either is good enough: they will always have the same value, because they always sing Happy Birthday to me on my birthday. From a pragmatic perspective, the two date fields are equivalent because they always hold the same value.
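In map terms (field names entirely hypothetical), the pragmatic view simply writes both target fields from the same source value, without pausing to decide which one is the semantically correct carrier:

```python
def map_person(source):
    """Pragmatic mapping: both target dates always carry the same
    value, even though only one of them semantically means
    'birthdate'. Good enough, as long as the invariant holds."""
    return {
        "birthdate": source["birthdate"],
        "quintet_song_date": source["birthdate"],  # "good enough" mapping
    }

out = map_person({"birthdate": "1970-06-01"})
# out["birthdate"] == out["quintet_song_date"] == "1970-06-01"
```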
Often the cost of “good enough” is so much lower than the cost of “semantically correct” that figuring out the difference is not cost effective. That holds until someone changes the application; then the problem is readily exposed and fixed, often still at a much lower cost than trying to work out what is semantically equivalent at the start.
Each perspective on mapping has a valuable place. It all depends on what you are trying to achieve.
Let us now discuss complexity in mapping. Some mapping problems are much more complex than others. For instance, to translate into good French “We drove 500 miles for $40. 50 mpg is pretty good!” we need to convert miles to kilometers, dollars to euros, and miles per gallon into kilometers per liter – as well as translate the words. That is much more complex than translating “We drove to Paris.”
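The unit conversions in that sentence are small mappings in their own right. A sketch in Python; the mile and gallon factors are exact by definition, while the exchange rate is a hypothetical placeholder, since real rates change daily:

```python
MILES_TO_KM = 1.609344          # exact, by definition of the mile
GALLONS_TO_LITERS = 3.785411784  # exact, US liquid gallon
USD_TO_EUR = 0.92                # hypothetical rate; varies daily

def mpg_to_km_per_liter(mpg):
    """Convert miles-per-gallon fuel economy to kilometers per liter."""
    return mpg * MILES_TO_KM / GALLONS_TO_LITERS

distance_km = 500 * MILES_TO_KM        # ≈ 804.7 km
cost_eur = 40 * USD_TO_EUR             # ≈ 36.8 EUR (at the assumed rate)
efficiency = mpg_to_km_per_liter(50)   # ≈ 21.3 km/L
```

Even this toy example shows why the sentence is harder to translate: the map must carry numeric conversions alongside the linguistic ones.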
First, complex mapping is much easier if you have worked with both the input and output formats or have expertise in the domain (for instance, if you have worked with many invoices, working with another is easier). If so, you bring “semantic equivalence” knowledge to the table. Often, people with that subject matter expertise are asked to create “mapping specs,” a process which I believe is unnecessarily time-consuming. Cutting and pasting, or typing, path strings into spreadsheets is inherently inefficient. Sometimes an individual can create such a spec; sometimes a team needs to collaborate to create one. We prefer letting users register the specs in Contivo, so that the information flows transparently to the mappers. In fact, we have some customers who use Contivo to create maps, which are then coded by mappers into an ESB or some other tool. That way the business analysts, data architects, and others have an easy-to-use way to create and test specifications, and mappers get high-quality, error-free specifications.
Second, complex mapping is often easier if you have data samples. Providing a data sample or samples to someone is implicitly saying “this is what my data looks like.” This allows a human to realize that, for instance, a given source and a given target both contain social security numbers, first names, etc. Such patterns are recognizable to the human eye. If you can collect input-output data sets, such that a given input produces a given output, you are better off, and Contivo Map Intuition might be extremely helpful. But even representative samples that are not matched can be extremely valuable.
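A toy sketch of the kind of pattern recognition a human applies to samples, written as code. The patterns and field types below are my own illustration, not Contivo’s actual logic:

```python
import re

# Hypothetical value patterns for guessing a field's semantic type.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "zip_code": re.compile(r"^\d{5}(-\d{4})?$"),
    "date_iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

def guess_field_type(sample_values):
    """Return the first type whose pattern matches every sample value,
    or None if no pattern fits. Mimics a human eyeballing samples."""
    for name, pattern in PATTERNS.items():
        if all(pattern.match(v) for v in sample_values):
            return name
    return None

guess_field_type(["123-45-6789", "987-65-4321"])  # -> "ssn"
guess_field_type(["2024-01-01", "2023-12-31"])    # -> "date_iso"
guess_field_type(["hello", "world"])              # -> None
```

Recognizing that a source field and a target field are both, say, SSNs is the seed of a mapping between them.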
Third, data mappers have often commented that “looping is the hardest part of mapping.” For instance, suppose an XML Schema describing T-shirts has styles at the top; under each style it describes colors; under each color is an element which tells whether the shirt is a man’s or a woman’s shirt, with the size under that – and all of these can loop (have a maxOccurs greater than 1). However, applications might never generate data that groups men’s and women’s T-shirts apart, so the “man’s or woman’s” group will always have a single instance, while the others might have multiple instances. If this document is to be transformed into something with a very different hierarchy, figuring out the relationship between the source and target hierarchies can be difficult. Map Intuition, or having sample data values, can, in the Contivo world, help mappers identify the relationship quickly. And Contivo lets users rapidly reorganize the looping behavior, so they can experiment freely and create maps quickly.
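To make the looping problem concrete, here is a hypothetical version of that T-shirt hierarchy, flattened to rows and then regrouped under a different outer loop – the kind of reorganization a mapper has to reason about when source and target hierarchies differ:

```python
from collections import defaultdict

# Source hierarchy: style -> color -> gender -> sizes.
# Every level may repeat, though in practice "gender" often has
# only one instance per branch, as the article notes.
catalog = {
    "crew-neck": {"red": {"mens": ["S", "M", "L"]},
                  "blue": {"mens": ["M", "L"]}},
    "v-neck":    {"red": {"womens": ["S", "M"]}},
}

def flatten(cat):
    """Unroll the nested loops into flat (style, color, gender, size) rows."""
    for style, colors in cat.items():
        for color, genders in colors.items():
            for gender, sizes in genders.items():
                for size in sizes:
                    yield (style, color, gender, size)

def regroup_by_color(rows):
    """Target hierarchy loops on color first -- a different nesting order."""
    by_color = defaultdict(list)
    for style, color, gender, size in rows:
        by_color[color].append((style, gender, size))
    return dict(by_color)

rows = list(flatten(catalog))          # 7 flat rows
grouped = regroup_by_color(rows)       # keys: "red", "blue"
```

Flattening and regrouping is one simple strategy; the hard part in real mapping is deciding which keys relate the two hierarchies in the first place.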
Fourth, mappings which require manipulation or business logic tend to require code. I am not here describing merely reformatting the way a date value is represented. I am talking about essentially mini-programs embedded in or invoked from the map. Map Intuition can help identify the rules, if you have matched data sets. Some users have built up a library of special functions needed repeatedly in their business that can be used in maps. At other times, a business analyst may describe in the map what happens, but in the end a mapper may need to write some code.
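A sketch of such a reusable function library; the function names, codes, and discount rule are hypothetical, standing in for the kind of mini-program a mapper might invoke from a map:

```python
# Hypothetical library of business-logic helpers reused across maps.

def normalize_unit_code(code):
    """Translate assorted trading-partner unit codes to a house standard."""
    aliases = {"EA": "EACH", "PC": "EACH", "CS": "CASE", "BX": "CASE"}
    return aliases.get(code.upper(), code.upper())

def apply_discount(amount, partner_tier):
    """Business rule embedded in mapping: apply a tiered discount."""
    rates = {"gold": 0.10, "silver": 0.05}
    return round(amount * (1 - rates.get(partner_tier, 0.0)), 2)

normalize_unit_code("pc")       # -> "EACH"
apply_discount(100.0, "gold")   # -> 90.0
```

Collecting rules like these into a shared library keeps each map declarative at the top level while the business logic lives in one tested place.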
For parts of maps which involve either looping or business logic rules, making those parts readily visible so that QA can focus there can be very helpful. Generating a mapping report, or generating test data, can communicate where testing should focus.
Another kind of complexity worth mentioning is complexity in the interface, in the data format. For instance, messages created from a canonical model, either by a standards body or from within a company, tend to have large schemas even for small message sizes. Either paring down the schemas or communicating to users how to use the messages is thus needed to manage the complexity. Mappers are often programmers who don’t know the data, and so organizations need ways to share knowledge about the data. This opens up an entire additional discussion about complexity, which will have to happen another time (but some Liaison white papers and webinars have described this well).
We have discussed mapping complexity and communicating the complex nature of data, two topics central to data integration.