1. Problem
You are consuming an inbound
message in a flat file structure and must represent the data in an XML
schema. The inbound flat file contains records that are both positional
and delimited.
NOTE
Delimited files
contain characters (such as commas) that separate the data. Files that
are positional in nature contain data items that are a predefined length
within the file. The physical position of the data defines what the
data represents.
2. Solution
The solution outlined in this
recipe consumes an inbound flat file message that may have a
structure similar to the flat file shown in Listing 1.
Additionally, this recipe outlines the steps required to manually
create a schema for a flat file. BizTalk 2010 also includes a Flat File
Wizard for creating flat file schemas.
NOTE
The number bar at the top of Listing 1
is included for reference only and is not part of the file content. The
number bar is for counting the position of the characters.
Listing 1. CustomerSalesOrder.txt
123456789012345678901234567890123456789012345678912345678901234567890
ORDER2004-10-24
SoldTo Shauna Marie    1223 Buttercup Lane Seattle       WA 98155
ShipTo Jen Schwinn     3030 Moby Road      Kent          WA 98110
ITEMS,ITEM111-AA|Grape|1|2.00|Bottle of Pop,ITEM111-AB|Cola| 1|2.00|Bottle of Pop
Additionally, the outbound BizTalk schema may have a structure similar to the XML file shown in Listing 2.
Listing 2. CustomerSalesOrder.xml
<ns0:order xmlns:ns0="http://Schema_Chapter_Project.CustomerSalesOrder">
  <date>10/24/2004</date>
  <customerHeader>
    <customerType>SoldTo</customerType>
    <fullName>Shauna Marie</fullName>
    <street>1223 Buttercup Lane</street>
    <city>Seattle</city>
    <state>WA</state>
    <postal>98155</postal>
  </customerHeader>
  <customerHeader>
    <customerType>ShipTo</customerType>
    <fullName>Jen Schwinn</fullName>
    <street>3030 Moby Road</street>
    <city>Kent</city>
    <state>WA</state>
    <postal>98110</postal>
  </customerHeader>
  <items>
    <item>
      <productId>ITEM111-AA</productId>
      <productName>Grape</productName>
      <quantity>1</quantity>
      <unityPrice>2.00</unityPrice>
      <description>Bottle of Pop</description>
    </item>
    <item>
      <productId>ITEM111-AB</productId>
      <productName>Cola</productName>
      <quantity>1</quantity>
      <unityPrice>2.00</unityPrice>
      <description>Bottle of Pop</description>
    </item>
  </items>
</ns0:order>
Follow these steps to create the flat file schema:
Create a new BizTalk schema, and select the Flat File Schema template.
Determine
the structure and layout of your message schema. The structure and
layout of the message schema will largely determine how the inbound
document is parsed. In the XML sample in Listing 2, all data fields were defined as string elements. The customerHeader, items, and item nodes are defined as records.
Select the root node (the order
node in this example), and specify the child delimiter to be a carriage
return and a line feed (CRLF). The most straightforward way to set the
delimiter to a CRLF is by setting the child delimiter type to be Hexadecimal.
Specify the child delimiter property to be 0x0D 0x0A.
Set child order to be Infix.
Set tag identifier to read ORDER. The tag identifier property tells the schema where the data begins for the message. The child data for the order root node is delimited by CRLF sequences that appear in the middle of the data.
Because there are two instances of customerHeader information, the max cardinality property for the customerHeader record must be set to 2.
Set the structure for customerHeader to positional,
since all of the child elements that represent customer information are
stored in a positional format within the flat file. Each child node
that exists under customerHeader must
have its positional length and its offset (where the value begins)
defined. Each value is left-justified within its field. For the length
and offset of each element, see Table 1.
Set the tag identifier of the next node, items, to ITEMS, since ITEMS is the next heading in the flat file.
Identify the delimiter as a comma, and set the child-order to prefix, since each item will be prefixed with a comma to indicate the beginning of that item.
Make sure that child delimiter type is set to character. Select item, and make sure the child delimiter is set to the pipe character (|), since the attributes for the items are delimited by the pipe character in the flat file.
Set the child-order for item to infix, since the pipe characters appear in between the delimited values of each item.
Table 1. Customer Header Child Elements
Element | Positional Length | Offset |
---|---|---|
customerType | 7 | 0 |
fullName | 16 | 0 |
street | 20 | 0 |
city | 14 | 0 |
state | 3 | 0 |
postal | 5 | 0 |
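The lengths and offsets in Table 1 can be read as slicing instructions. The following is a minimal Python sketch of that idea, not BizTalk code: the function name `parse_positional` and the layout list are hypothetical, the offset is assumed to mean the number of characters skipped after the previous field ends, and the sample line is rebuilt by left-justifying each value to its Table 1 length.

```python
# Field layout from Table 1: (name, positional length, offset).
# Assumption: the offset counts characters to skip after the previous field.
CUSTOMER_HEADER_LAYOUT = [
    ("customerType", 7, 0),
    ("fullName", 16, 0),
    ("street", 20, 0),
    ("city", 14, 0),
    ("state", 3, 0),
    ("postal", 5, 0),
]


def parse_positional(line, layout):
    """Slice a fixed-width line into a dict of right-trimmed field values."""
    fields = {}
    cursor = 0
    for name, length, offset in layout:
        cursor += offset  # skip any gap before this field
        fields[name] = line[cursor:cursor + length].rstrip()
        cursor += length
    return fields


# Sample SoldTo line, padded so each value fills its Table 1 length.
sample = (
    "SoldTo".ljust(7) + "Shauna Marie".ljust(16)
    + "1223 Buttercup Lane".ljust(20) + "Seattle".ljust(14)
    + "WA".ljust(3) + "98155".ljust(5)
)
header = parse_positional(sample, CUSTOMER_HEADER_LAYOUT)
print(header)
```

Because every offset is 0, each field starts exactly where the previous one ends; a nonzero offset would simply move the cursor forward before the slice is taken.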
To test the flat file output of the schema, follow these steps:
Verify that Generate Instance Output Type on your schema reads Native. The Native property allows the schema to generate the native file format, which is a flat file, for that schema.
Right-click the schema, and select Generate Instance. You should see the default generated flat file.
To test the flat file schema to see the XML generated based on the provided flat file, follow these steps:
Set Input Instance Filename on your schema to the sample flat file, and verify that Validate Instance Input Type reads Native. The Native setting tells the schema to treat the input as a flat file, which is then parsed and translated to an XML representation.
Right-click the schema, and select Validate Instance. You should see the XML version of the processed flat file.
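To make the parsing rules from the steps above concrete, the following Python sketch applies them by hand to the sample data. It is only an illustration of the logic, not how BizTalk implements the flat file parser: `flat_file_to_xml` and `pad` are hypothetical helpers, the target namespace is omitted, the date is copied as-is rather than reformatted, and the positional lines are assumed to be padded to the Table 1 lengths.

```python
import xml.etree.ElementTree as ET

# Positional lengths from Table 1; all offsets are 0 in this example.
HEADER_WIDTHS = [("customerType", 7), ("fullName", 16), ("street", 20),
                 ("city", 14), ("state", 3), ("postal", 5)]
# Element names taken from Listing 2.
ITEM_FIELDS = ["productId", "productName", "quantity", "unityPrice", "description"]


def flat_file_to_xml(text):
    """Hypothetical helper: parse the sample layout into an Element tree."""
    lines = text.split("\r\n")  # root record: CRLF child delimiter, infix
    if not lines[0].startswith("ORDER"):
        raise ValueError("expected the ORDER tag identifier")
    order = ET.Element("order")  # target namespace omitted for brevity
    ET.SubElement(order, "date").text = lines[0][len("ORDER"):]

    for line in lines[1:3]:  # customerHeader: positional, max occurs 2
        header = ET.SubElement(order, "customerHeader")
        start = 0
        for name, width in HEADER_WIDTHS:
            ET.SubElement(header, name).text = line[start:start + width].strip()
            start += width

    if not lines[3].startswith("ITEMS"):
        raise ValueError("expected the ITEMS tag identifier")
    items = ET.SubElement(order, "items")
    # Prefix child order: every item is preceded by a comma, so the piece
    # before the first comma is empty and is skipped.
    for chunk in lines[3][len("ITEMS"):].split(",")[1:]:
        item = ET.SubElement(items, "item")
        for name, value in zip(ITEM_FIELDS, chunk.split("|")):  # pipe, infix
            ET.SubElement(item, name).text = value.strip()
    return order


def pad(values):
    """Left-justify each value to its Table 1 length."""
    return "".join(v.ljust(w) for v, (_, w) in zip(values, HEADER_WIDTHS))


sample = "\r\n".join([
    "ORDER2004-10-24",
    pad(["SoldTo", "Shauna Marie", "1223 Buttercup Lane", "Seattle", "WA", "98155"]),
    pad(["ShipTo", "Jen Schwinn", "3030 Moby Road", "Kent", "WA", "98110"]),
    "ITEMS,ITEM111-AA|Grape|1|2.00|Bottle of Pop,ITEM111-AB|Cola| 1|2.00|Bottle of Pop",
])
order = flat_file_to_xml(sample)
print(ET.tostring(order, encoding="unicode"))
```

Each level of the sketch mirrors one record in the schema: the root splits on CRLF, customerHeader slices by position, items splits on a prefixed comma, and item splits on an infix pipe.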
3. How It Works
BizTalk is capable of processing both positional and delimited data, either in individual files or in a single file. The child delimiter
is the key concept to keep in mind when creating a flat file schema.
Any parent-level record that contains child elements or attributes must
define whether the data in the flat file for those child records is
delimited or positional and how the data is delimited.
Based on the layout of the
destination message schema, you should consider the following when
dealing with records versus dealing with child elements and attributes:
Records:
If you use records to group child elements or attributes, consider how
the child records will be demarcated. Will the child data be delimited,
or is the child data positional? In the example in Listing 1,
each line of data is delimited by a CRLF, so the parent record of the
output schema must specify a CRLF as its child delimiter.
Tag identifiers:
Records may contain tag identifiers to distinguish one type of record
from another record. A tag value also allows you to identify where data
begins in the file.
Positional elements/attributes: In the flat file example in Listing 1, the customerHeader
data is stored in positional format. For each child node, you must
provide the offset (where to start reading the data) and the length for
that data item. Additionally, the parent record must specify that the
child data structure is Positional.
Delimited elements/attributes: The flat file example in Listing 1
shows multiple items occurring on the same line delimited by the comma
character. The attributes related to a single item are then further
delimited by the pipe (|) character. The item's parent record must specify
that the child data structure is Delimited. Additionally, the child delimiter for the item's parent record must specify that each item is delimited by a comma, and the child delimiter for the item record must specify the pipe character.
Cardinality for records: By default, BizTalk sets the cardinality for records and elements/attributes to 1. The value of 1
means that you expect to receive a maximum and minimum of one instance
of that record and associated child values. If you expect an inbound
flat file to contain more than a single record instance, you must change
the max occurs value to unbounded or to the number of instances you expect to receive.
Wrap characters:
If the incoming data contains characters that have been identified as
delimiting characters (for example, commas), those characters can be
ignored through the use of wrap characters. For example, if the record
contained the name Shauna, Marie and you wanted to have the comma included as part of the name, you could define a wrap character of " (double quote) and enclose the name within the wrap character: "Shauna, Marie". BizTalk will treat any special characters defined within a set of wrap characters as field-level data.
Escape characters:
The purpose of escape characters is very similar to that of wrap
characters. Escape characters specify the character to be used to escape
reserved characters, and then the reserved characters will be treated
as literal characters in a message.
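Wrap and escape characters have close analogues in most delimited-text parsers. As a rough illustration outside BizTalk, the sketch below uses Python's standard `csv` module, where `quotechar` plays the role of the wrap character and `escapechar` the role of the escape character. The sample records and the choice of a backslash as the escape character are assumptions made for the example.

```python
import csv

# Wrap character: the comma inside the double quotes is kept as field data.
wrapped_record = 'SoldTo,"Shauna, Marie",Seattle'
wrapped_fields = next(csv.reader([wrapped_record], delimiter=",", quotechar='"'))

# Escape character: the backslash marks the following comma as literal data.
escaped_record = "SoldTo,Shauna\\, Marie,Seattle"
escaped_fields = next(
    csv.reader(
        [escaped_record],
        delimiter=",",
        quoting=csv.QUOTE_NONE,  # no wrap character in this variant
        escapechar="\\",
    )
)

print(wrapped_fields)
print(escaped_fields)
```

Both variants yield the same three fields; the difference is only whether the whole value is enclosed or the single reserved character is flagged.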
If the records are delimited, you must determine how the records are delimited. For records separated by a CRLF, set child delimiter type to hexadecimal and the delimiter to 0x0D 0x0A. If the delimiter is a character value like a comma, set child delimiter type to character. The other key consideration for using delimiters is defining the child-order of the delimiter:
If the delimiter appears after the set of data, the child-order of the delimiter is postfix.
If the delimiter appears before the set of data, the delimiter is prefix.
If the delimiter appears in the middle of the set of data, the delimiter is infix.
The default child-order for a record is conditional default. The conditional default value means that if a tag value has been set for that record, then the child-order will be prefix. Otherwise, the child-order will be infix.
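The three child-order settings differ only in where the delimiter sits relative to the data. A small Python sketch of that distinction follows; `split_children` and `resolve_conditional_default` are hypothetical helpers written for this illustration and are not part of any BizTalk API.

```python
def split_children(data, delimiter, child_order):
    """Split a record's data into child values for the given child order."""
    if child_order == "prefix":
        # The delimiter appears before each child, including the first.
        if not data.startswith(delimiter):
            raise ValueError("prefix order expects a leading delimiter")
        data = data[len(delimiter):]
    elif child_order == "postfix":
        # The delimiter appears after each child, including the last.
        if not data.endswith(delimiter):
            raise ValueError("postfix order expects a trailing delimiter")
        data = data[:-len(delimiter)]
    elif child_order != "infix":
        raise ValueError(f"unknown child order: {child_order}")
    # Infix: the delimiter appears only between children.
    return data.split(delimiter)


def resolve_conditional_default(has_tag):
    """Conditional default: prefix when a tag value is set, otherwise infix."""
    return "prefix" if has_tag else "infix"


# Data following the ITEMS tag: each item is prefixed with a comma.
items = split_children(",ITEM111-AA|Grape,ITEM111-AB|Cola", ",", "prefix")
# Fields of a single item: pipes sit between the values.
fields = split_children("ITEM111-AA|Grape|1|2.00|Bottle of Pop", "|", "infix")
print(items)
print(fields)
print(resolve_conditional_default(has_tag=True))
```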
If the records are
positional, you must determine the spacing of the positional data. In the
example, it was assumed that there were no gaps between the data fields
(offsets set to 0) and that each value was left-justified within its
field.
Another key consideration to
keep in mind is the cardinality of the data groupings. When a BizTalk
schema is created, by default, the cardinality is set to 1
(even if no cardinality value is explicitly set for a field). In the
example, keeping the cardinality set to the default would
have caused some data to be lost when parsing both the customerHeader values and the item values. In the example, the max occurs value for customerHeader was changed to 2 to account for both the SoldTo and ShipTo values.
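The effect of the cardinality settings can be pictured as a simple range check on the number of record instances found. The Python sketch below is only an analogy: `check_occurs` is a hypothetical helper, `None` stands in for unbounded, and it reports a violation instead of reproducing how BizTalk itself handles the extra records.

```python
def check_occurs(name, count, min_occurs=1, max_occurs=1):
    """Return None when count is allowed, otherwise a short error message.

    max_occurs=None stands in for "unbounded".
    """
    if count < min_occurs:
        return f"{name}: found {count}, expected at least {min_occurs}"
    if max_occurs is not None and count > max_occurs:
        return f"{name}: found {count}, expected at most {max_occurs}"
    return None


# Two customerHeader lines against the default cardinality of 1.
default_result = check_occurs("customerHeader", 2)
# The same two lines once max occurs is raised to 2.
raised_result = check_occurs("customerHeader", 2, max_occurs=2)
# An unbounded record accepts any number of instances.
unbounded_result = check_occurs("item", 2, max_occurs=None)
print(default_result, raised_result, unbounded_result, sep="\n")
```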