A major problem that many have discovered is that
accommodating extremely large (200MB+) files can be a major performance
bottleneck. The shame is that in many cases the documents that are being
retrieved are simply going to be routed to another outbound source.
This is typical of the Enterprise Service Bus (ESB) type of architecture
scenario.
In short, an ESB is software that is used to link internal and partner
systems to each other, which is basically what BizTalk is designed to do
out of the box. For these types of architectures, large files are
generally routed through the ESB from an external party to an internal
party or from internal to internal systems. Most times, the only logic
that needs to be performed is routing logic. In many cases, this logic
can be expressed as a simple filter criterion based on the default
message context data, or by examining data elements within the message,
promoting them, and then implementing content-based routing. Also in
many cases, the actual message body's content is irrelevant beyond
extracting properties to promote. The performance bottleneck comes into
play when the entire file is received, parsed by the XMLReceive
pipeline, and then stored into the Messagebox. If you have ever had to
do this with a 200MB file, you know that although it works, there is a
nasty impact on the CPU utilization of your BizTalk and SQL Server
machines: CPU usage often goes to 100%, and system throughput
essentially goes down the drain. Now imagine having to process
10 or 20 of these per minute. The next problem is going to be sending
the file. The system will essentially take this entire performance hit
all over again when the large file needs to be read from SQL Server out
of BizTalk and sent to the EPM. You can quickly see how this type of
scenario, as common as it is, most often requires either significant
hardware to implement or a queuing mechanism whereby only a small number
of files can be processed at a time.
You'll find a simple solution in BizTalk Server's capability to natively understand and use streams.
The following examples show a decoding component that will receive the
incoming message, store the file to disk in a uniquely named file, and
store the path to the file in the IBaseMessagePart.Data
property. The end result will be a message that only contains the path
to the text file in its data, but will have a fully well-formed message
context so that it can be routed. The component will also promote a
property that stores the fact that this is a "large encoded message."
This property will allow you to route all messages encoded using this
pipeline component to a particular send port/pipeline that has the
corresponding encoding component. The encoding component will read the
data element for the path to the file, open up a file stream object that
is streaming the file stored to disk, set the stream to the 0 byte
position, and set the IBaseMessagePart.Data property to the FileStream.
The end result will be that the file is streamed by the BizTalk runtime
from the file stored on the disk and is not required to pass through
the Messagebox. Also, performance is greatly improved, and the CPU
overhead on both the BizTalk Server host instance that is sending the
file and the SQL Server hosting the BizTalk Messagebox is essentially
nil.
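The decode/encode pairing described above can be sketched outside BizTalk. The following is a minimal, language-neutral illustration in Python rather than the pipeline components themselves; the `Message` class, the function names, and the use of a random GUID as the file name are all assumptions made for the example.

```python
import io
import os
import shutil
import uuid
from dataclasses import dataclass, field
from typing import BinaryIO, Dict


@dataclass
class Message:
    """Hypothetical stand-in for a message: a body stream plus context properties."""
    body: BinaryIO
    context: Dict[str, object] = field(default_factory=dict)


def decode_to_disk(msg: Message, folder: str, chunk_size: int = 4096) -> Message:
    """Receive side: park the payload on disk and leave only its path in the body."""
    path = os.path.join(folder, f"{uuid.uuid4()}.msg")
    with open(path, "xb") as out:  # "x" fails if the file already exists
        shutil.copyfileobj(msg.body, out, chunk_size)
    # The body now carries just the path; the context carries the routing data.
    msg.body = io.BytesIO(path.encode("utf-8"))
    msg.context["LargeFileLocation"] = path
    msg.context["IsEncoded"] = True
    return msg


def encode_from_disk(msg: Message) -> Message:
    """Send side: read the path from the body and swap in a stream over the file."""
    path = msg.body.read().decode("utf-8")
    stream = open(path, "rb")  # the caller is responsible for closing this stream
    stream.seek(0)
    msg.body = stream
    return msg
```

The point of the sketch is that only the short path string travels with the message between the two steps; the payload itself is read from disk once, by whoever finally consumes the stream.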
The partner to this is the
sending component. In many scenarios, BizTalk is implemented as a
routing engine or an Enterprise Service Bus. This is a fancy way of
saying that BizTalk is responsible for moving data from one location
within an organization to another. In many cases, what does need to be
moved is large amounts of data, either in binary format or in text
files. This is often the case with payment or EDI-based systems in which
BizTalk is responsible for moving the files to the legacy system where
it can process them. In this scenario, the same performance problem (or
lack of performance) will occur on the send side as on the receive side.
To account for this, the examples also include a send-side pipeline
component that is used to actually send the large file to the outbound
destination adapter.
1. Caveats and Gotchas
The solution outlined
previously works very well so long as the issues described in the
following sections are taken into account. Do not simply copy and paste
the code into your project and leave it at that. The solution provided
in this section fundamentally alters some of the design principles of
the BizTalk Server product. The most important one of these is that the
data for the message is no longer stored in the Messagebox. A quick list
of the pros and cons of the proposed solution is provided here:
Pros:
Provides extremely fast access for moving large messages
Simple to add new features
Reusable across multiple receive locations
Message containing context can be routed to orchestration, and data can be accessed from the disk
Cons:
No ability to apply BizTalk Map
No failover via Messagebox
Custom solution requiring support by developer
Need a scheduled task to clean up old data
1.1. Redundancy, Failover, and High Availability
As was stated earlier, the
data for the large message will no longer be stored in SQL Server. This
is fundamentally different from how Microsoft designed the product. If
the data within the message is important and the system is a
mission-critical one that must properly deal with failovers and errors,
you need to make sure that the storage location for the external file is
also as robust as your SQL Server environment. Most architects in this
situation will simply create a share on the clustered SQL Server shared
disk array. This share is available to all BizTalk machines in the
BizTalk Server Group, and since it is stored on the shared array or the
storage area network (SAN), it should be as reliable as the data files
for SQL Server.
1.2. Dealing with Message Content and Metadata
A good rule of thumb for this
type of solution is to avoid looking at the message data at all costs
once the file has been received. Consider the following: assume that you
have received your large file into BizTalk and you need to process it
through an orchestration for some additional logic. What happens? You
will need to write .NET components to read the file and manually parse
it to get the data you need. The worst-case scenario is that you need to
load the data into an XMLDom or something similar. This will have
performance implications and can negate the entire reason for the
special large-file handling you are implementing.
If you know you are going to
need data either within an orchestration or for CBR, make sure you write
the code to gather this data within either the receiving or sending
pipeline components. Only open the large data file at the time when it
is being processed within the pipeline if you can. The best approach is
to promote properties or create custom distinguished fields using code
from within the component itself, which you can access from within
BizTalk with little performance overhead.
1.3. Cleaning Up Old Data
If you read through the code in the section "Large Message Encoding Component (Send Side),"
you will notice that there is no code that actually deletes the message
from the server. There is a good reason for this. Normally you would
think that once the message has flowed through the send pipeline it
would be okay to delete it, but this is not true. What about a send-side
adapter error? Imagine if you were sending the file to an FTP server
and it was down; BizTalk will attempt to resend the message after the
retry period has been reached. Because of this, you can't simply delete
the file at random. You must employ a managed approach.
The only real solution to this
would be to have a scheduled task that executes every few minutes that
is responsible for cleaning up the data directory. You will notice that
the name of the file is actually the InterchangeID GUID for the message flow. The InterchangeID
provides you with a common key that you can use to query each of the
messages that have been created throughout the execution path. The
script that executes needs to read the name of the file and use WMI to
query the Messagebox and determine whether there are any suspended or
active messages for that Interchange. If there are, it doesn't delete
the file; otherwise, it will delete the data file.
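The decision logic of such a cleanup task might look like the following Python sketch. It assumes the file name (minus the `.msg` extension) is the GUID key described above. The `has_live_messages` callback is hypothetical: it stands in for the WMI query against the Messagebox, which is not shown here. The `min_age_seconds` grace period is also an assumption of this sketch, added so that a file is not removed before its message has been published.

```python
import os
import time
import uuid
from typing import Callable, List


def sweep_data_directory(
    folder: str,
    has_live_messages: Callable[[str], bool],
    min_age_seconds: float = 300.0,
) -> List[str]:
    """Delete parked .msg files whose interchange has no active or suspended messages.

    has_live_messages is a hypothetical callback that would wrap the
    Messagebox query; it receives the GUID taken from the file name.
    """
    deleted = []
    now = time.time()
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".msg":
            continue  # not a parked message file
        try:
            interchange_id = str(uuid.UUID(stem))
        except ValueError:
            continue  # file name is not a GUID, so leave it alone
        path = os.path.join(folder, name)
        if now - os.path.getmtime(path) < min_age_seconds:
            continue  # too new; the message may not be published yet
        if has_live_messages(interchange_id):
            continue  # still active or suspended, so a retry may need the file
        os.remove(path)
        deleted.append(path)
    return deleted
```

Keeping the Messagebox lookup behind a callback means the sweep itself can be tested without a BizTalk installation, and the query can be swapped for whatever mechanism the environment provides.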
1.4. Looping Through the Message
As stated previously, if you do
know you will need the data within the message at runtime, and this
data is of an aggregate nature (sums, averages, counts, etc.), only loop
through the file once. This seems like a commonsense thing, but it is
often overlooked. If you need to loop through the file, try to get all
the data you need in one pass rather than several. This can have
dramatic effects on how your component will perform.
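As a small illustration of the single-pass idea, the Python sketch below gathers a count, a sum, and an average while reading a stream exactly once. The record layout (one delimited record per line, with the amount in the second field) and the function name are assumptions made for the example, not a format used by the component.

```python
import io
from typing import BinaryIO, Dict


def aggregate_in_one_pass(
    stream: BinaryIO, delimiter: bytes = b",", amount_field: int = 1
) -> Dict[str, float]:
    """Compute count, sum, and average of one field in a single read of the stream."""
    count = 0
    total = 0.0
    for line in stream:  # the stream is consumed once, line by line
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        total += float(fields[amount_field].decode("ascii"))
        count += 1
    average = total / count if count else 0.0
    return {"count": count, "sum": total, "average": average}
```

All three values come out of the same loop, so adding another aggregate later costs one more accumulator rather than another pass over a 200MB file.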
2. Large Message Decoding Component (Receive Side)
This component is to be used on
the receive side when the large message is first processed by BizTalk.
You will need to create a custom receive pipeline and add this pipeline
component to the Decode stage. From there, use the SchemaWithNone
property to select the desired inbound schema type if needed. If the
file is a flat file or a binary file, then this step is not necessary,
because the message will not contain any namespace or type information.
This component relies on a property schema being deployed that will be
used to store the location to the file within the message context. This
schema can also be used to define any custom information such as counts,
sums, and averages that is needed to route the document or may be
required later on at runtime.
Imports System
Imports System.IO
Imports System.Text
Imports System.Drawing
Imports System.Resources
Imports System.Reflection
Imports System.Diagnostics
Imports System.Collections
Imports System.ComponentModel
Imports Microsoft.BizTalk.Message.Interop
Imports Microsoft.BizTalk.Component.Interop
Imports Microsoft.BizTalk.Component
Imports Microsoft.BizTalk.Messaging
Imports Microsoft.BizTalk.Component.Utilities
Namespace Probiztalk.Samples.PipelineComponents
<ComponentCategory(CategoryTypes.CATID_PipelineComponent), _
System.Runtime.InteropServices.Guid("89dedce4-0525-472f-899c-64dc66f60727"), _
ComponentCategory(CategoryTypes.CATID_Decoder)> _
Public Class LargeFileDecodingComponent
Implements IBaseComponent, IPersistPropertyBag, IComponentUI, _
Global.Microsoft.BizTalk.Component.Interop.IComponent, IProbeMessage
Private _OutBoundFileDocumentSpecification As SchemaWithNone = _
New Global.Microsoft.BizTalk.Component.Utilities.SchemaWithNone("")
Private _InboundFileDocumentSpecification As SchemaWithNone = _
New Global.Microsoft.BizTalk.Component.Utilities.SchemaWithNone("")
Private _ThresholdSize As Integer = 4096
Private resourceManager As System.Resources.ResourceManager = _
New System.Resources.ResourceManager( _
"Probiztalk.Samples.PipelineComponents.LargeFileDecodingComponent", _
[Assembly].GetExecutingAssembly)
Private Const PROPERTY_SCHEMA_NAMESPACE As String = _
"http://LargeFileHandler.Schemas.LargeFilePropertySchema"
Private _FileLocation As String
'<summary>
'this property will contain a single schema
'</summary>
<Description("The inbound request document specification. " & _
"Only messages of this type will be accepted by the component.")> _
<DisplayName("Inbound Specification")> _
Public Property InboundFileDocumentSpecification() As _
Global.Microsoft.BizTalk.Component.Utilities.SchemaWithNone
Get
Return _InboundFileDocumentSpecification
End Get
Set(ByVal Value As _
Global.Microsoft.BizTalk.Component.Utilities.SchemaWithNone)
_InboundFileDocumentSpecification = Value
End Set
End Property
'<summary>
'this property will contain a single schema
'</summary>
<Description("The Large File Message specification. " & _
"The component will create messages of this type.")> _
<DisplayName("Outbound Specification")> _
Public Property OutBoundFileDocumentSpecification() As _
Global.Microsoft.BizTalk.Component.Utilities.SchemaWithNone
Get
Return _OutBoundFileDocumentSpecification
End Get
Set(ByVal Value As _
Global.Microsoft.BizTalk.Component.Utilities.SchemaWithNone)
_OutBoundFileDocumentSpecification = Value
End Set
End Property
<Description("Threshold value in bytes for incoming file to determine " & _
"whether or not to treat the message as large. Default is 4096 bytes.")> _
<DisplayName("Threshold file size")> <DefaultValue(4096)> _
Public Property ThresholdSize() As Integer
Get
Return Me._ThresholdSize
End Get
Set(ByVal value As Integer)
Me._ThresholdSize = value
End Set
End Property
<Description("Directory for storing decoded large messages. " & _
"Defaults to C:\Temp.")> _
<DisplayName("Large File Folder Location")> _
Public Property LargeFileFolder() As String
Get
Return Me._FileLocation
End Get
Set(ByVal value As String)
Me._FileLocation = value
End Set
End Property
'<summary>
'Name of the component
'</summary>
<Browsable(False)> _
Public ReadOnly Property Name() As String Implements _
Global.Microsoft.BizTalk.Component.Interop.IBaseComponent.Name
Get
Return resourceManager.GetString("COMPONENTNAME", _
System.Globalization.CultureInfo.InvariantCulture)
End Get
End Property
'<summary>
'Version of the component
'</summary>
<Browsable(False)> _
Public ReadOnly Property Version() As String Implements _
Global.Microsoft.BizTalk.Component.Interop.IBaseComponent.Version
Get
Return resourceManager.GetString("COMPONENTVERSION", _
System.Globalization.CultureInfo.InvariantCulture)
End Get
End Property
'<summary>
'Description of the component
'</summary>
<Browsable(False)> _
Public ReadOnly Property Description() As String Implements _
Global.Microsoft.BizTalk.Component.Interop.IBaseComponent.Description
Get
Return resourceManager.GetString("COMPONENTDESCRIPTION", _
System.Globalization.CultureInfo.InvariantCulture)
End Get
End Property
'<summary>
'Component icon to use in BizTalk Editor
'</summary>
<Browsable(False)> _
Public ReadOnly Property Icon() As IntPtr Implements _
Global.Microsoft.BizTalk.Component.Interop.IComponentUI.Icon
Get
Return CType(Me.resourceManager.GetObject("COMPONENTICON", _
System.Globalization.CultureInfo.InvariantCulture), _
System.Drawing.Bitmap).GetHicon
End Get
End Property
'<summary>
'Gets class ID of component for usage from unmanaged code.
'</summary>
'<param name="classid">
'Class ID of the component
'</param>
Public Sub GetClassID(ByRef classid As System.Guid) _
Implements _
Global.Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.GetClassID
classid = New System.Guid("89dedce4-0525-472f-899c-64dc66f60727")
End Sub
'<summary>
'not implemented
'</summary>
Public Sub InitNew() _
Implements _
Global.Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.InitNew
End Sub
'<summary>
'Loads configuration properties for the component
'</summary>
'<param name="pb">Configuration property bag</param>
'<param name="errlog">Error status</param>
Public Overridable Sub Load( _
ByVal pb As Global.Microsoft.BizTalk.Component.Interop.IPropertyBag, _
ByVal errlog As Integer) _
Implements _
Global.Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.Load
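'ReadPropertyBag returns Nothing when a property has not been saved yet,
'so apply the defaults explicitly instead of relying on Catch alone.
Dim val As Object
Try
val = ReadPropertyBag(pb, "ThresholdSize")
If val Is Nothing Then val = 4096
Me._ThresholdSize = CInt(val)
Catch
Me._ThresholdSize = 4096
End Try
Try
val = ReadPropertyBag(pb, "FileLocation")
If val Is Nothing Then val = "C:\Temp"
Me._FileLocation = CStr(val)
Catch
Me._FileLocation = "C:\Temp"
End Try
Try
val = ReadPropertyBag(pb, "InboundFileDocumentSpecification")
If val Is Nothing Then val = ""
Me.InboundFileDocumentSpecification = New SchemaWithNone(CStr(val))
Catch
Me.InboundFileDocumentSpecification = New SchemaWithNone("")
End Try
Try
val = ReadPropertyBag(pb, "OutboundFileDocumentSpecification")
If val Is Nothing Then val = ""
Me.OutBoundFileDocumentSpecification = New SchemaWithNone(CStr(val))
Catch
Me.OutBoundFileDocumentSpecification = New SchemaWithNone("")
End Try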
End Sub
'<summary>
'Saves the current component configuration into the property bag
'</summary>
'<param name="pb">Configuration property bag</param>
'<param name="fClearDirty">not used</param>
'<param name="fSaveAllProperties">not used</param>
Public Overridable Sub Save( _
ByVal pb As Global.Microsoft.BizTalk.Component.Interop.IPropertyBag, _
ByVal fClearDirty As Boolean, ByVal fSaveAllProperties As Boolean) _
Implements Global.Microsoft.BizTalk.Component.Interop. _
IPersistPropertyBag.Save
WritePropertyBag(pb, "ThresholdSize", Me._ThresholdSize)
WritePropertyBag(pb, "FileLocation", Me._FileLocation)
WritePropertyBag(pb, "InboundFileDocumentSpecification", _
_InboundFileDocumentSpecification.SchemaName)
WritePropertyBag(pb, "OutboundFileDocumentSpecification", _
_OutBoundFileDocumentSpecification.SchemaName)
End Sub
'<summary>
'Reads property value from property bag
'</summary>
'<param name="pb">Property bag</param>
'<param name="propName">Name of property</param>
'<returns>Value of the property</returns>
Private Function ReadPropertyBag( _
ByVal pb As Global.Microsoft.BizTalk.Component.Interop.IPropertyBag, _
ByVal propName As String) As Object
Dim val As Object = Nothing
Try
pb.Read(propName, val, 0)
Catch e As System.ArgumentException
Return val
Catch e As System.Exception
Throw New System.ApplicationException(e.Message)
End Try
Return val
End Function
'<summary>
'Writes property values into a property bag.
'</summary>
'<param name="pb">Property bag.</param>
'<param name="propName">Name of property.</param>
'<param name="val">Value of property.</param>
Private Sub WritePropertyBag( _
ByVal pb As Global.Microsoft.BizTalk.Component.Interop.IPropertyBag, _
ByVal propName As String, ByVal val As Object)
Try
pb.Write(propName, val)
Catch e As System.Exception
Throw New System.ApplicationException(e.Message)
End Try
End Sub
'<summary>
'The Validate method is called by the BizTalk Editor during the build
'of a BizTalk project.
'</summary>
'<param name="obj">An Object containing the
'configuration properties.</param>
'<returns>The IEnumerator enables the caller to enumerate through a
'collection of strings containing error messages. These error messages
'appear as compiler error messages. To report successful property
'validation, the method should return an empty enumerator.</returns>
Public Function Validate(ByVal obj As Object) As _
System.Collections.IEnumerator Implements _
Global.Microsoft.BizTalk.Component.Interop.IComponentUI.Validate
'example implementation:
'ArrayList errorList = new ArrayList();
'errorList.Add("This is a compiler error");
'return errorList.GetEnumerator();
Return Nothing
End Function
'<summary>
'called by the messaging engine when a new message arrives
'checks if the incoming message is in a recognizable format
'if the message is in a recognizable format, only this component
'within this stage will be executed (FirstMatch equals true)
'</summary>
'<param name="pc">the pipeline context</param>
'<param name="inmsg">the actual message</param>
Public Function Probe(ByVal pc As _
Global.Microsoft.BizTalk.Component.Interop.IPipelineContext, _
ByVal inmsg As Global.Microsoft.BizTalk.Message.Interop.IBaseMessage) _
As Boolean Implements Global.Microsoft.BizTalk.Component. _
Interop.IProbeMessage.Probe
Dim xmlreader As New Xml.XmlTextReader(inmsg.BodyPart.Data)
xmlreader.MoveToContent()
If (_InboundFileDocumentSpecification.DocSpecName = _
xmlreader.NamespaceURI.Replace("http://", "")) Then
Return True
Else
Return False
End If
End Function
'<summary>
'Implements IComponent.Execute method.
'</summary>
'<param name="pContext">Pipeline context</param>
'<param name="inmsg">Input message</param>
'<returns>Original input message</returns>
'<remarks>
'IComponent.Execute method is used to initiate
'the processing of the message in this pipeline component.
'</remarks>
Public Function Execute(ByVal pContext As IPipelineContext, _
ByVal inmsg As IBaseMessage) _
As Global.Microsoft.BizTalk.Message.Interop.IBaseMessage _
Implements Global.Microsoft.BizTalk.Component.Interop.IComponent.Execute
'Build the message that is to be sent out but only if it is greater
'than the threshold
If inmsg.BodyPart.GetOriginalDataStream.Length > Me._ThresholdSize Then
StoreMessageData(pContext, inmsg)
End If
Return inmsg
End Function
'<summary>
'Method used to write the message data to a file and promote the
'location to the MessageContext.
'</summary>
'<param name="pContext">Pipeline context</param>
'<param name="inMsg">Input message to be assigned</param>
'<remarks>
'Receives the input message ByRef, writes its body to disk, and then
'assigns a stream containing the file path to the body part's Data property
'</remarks>
Private Sub StoreMessageData(ByVal pContext As IPipelineContext, _
ByRef inMsg As IBaseMessage)
'Path.Combine inserts the directory separator if the folder lacks one
Dim FullFileName As String = Path.Combine(_FileLocation, _
inMsg.MessageID.ToString() + ".msg")
Dim dataFile As New FileStream(FullFileName, FileMode.CreateNew, _
FileAccess.ReadWrite, FileShare.ReadWrite, 4096)
Dim myMemoryStream As Stream = inMsg.BodyPart.GetOriginalDataStream
Dim Buffer(4095) As Byte
Dim byteCount As Integer
'Not really needed, just want to initialize the data within
'the message part to something.
'Proper way to do this would be to create a separate XML
'schema for messages which have been encoded using the
'encoder, create a new empty document which has an element
'named "FilePath" and set the value of the element
'to FullFileName. But at least this way we can see the value in
'the document should we need to write it out
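'Note: UTF8Encoding.Default resolves to Encoding.Default (the system
'default code page), not UTF-8; the send side must decode the same way
Dim myStream As New MemoryStream(UTF8Encoding.Default. _
GetBytes(FullFileName))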
If myMemoryStream.CanSeek Then
myMemoryStream.Position = 0
Else
'Impossible to occur, but added it anyway
Throw New Exception("The stream is not seekable")
End If
'Copy the body to disk in 4KB chunks, writing only the bytes actually read
byteCount = myMemoryStream.Read(Buffer, 0, Buffer.Length)
While byteCount > 0
dataFile.Write(Buffer, 0, byteCount)
byteCount = myMemoryStream.Read(Buffer, 0, Buffer.Length)
End While
dataFile.Flush()
dataFile.Close()
inMsg.BodyPart.Data = myStream
inMsg.Context.Promote("LargeFileLocation", _
PROPERTY_SCHEMA_NAMESPACE, FullFileName)
'Useful for CBR operations, i.e., route all messages that are
'large to a specific send port.
inMsg.Context.Promote("IsEncoded", PROPERTY_SCHEMA_NAMESPACE, True)
End Sub
End Class
End Namespace