3. File Tuning
Several issues can occur with the File adapter; they are usually the result of NetBIOS limitations or the polling agent. Microsoft's support articles recommend increasing the MaxCmds registry value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanworkstation\parameters and the MaxMpxCt registry value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters to 2048, on the BizTalk Server as well as the file server holding the file share.
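These are plain DWORD values, so they can be applied with a short script; the following is a minimal sketch using Python's standard winreg module (run it elevated on each server; a service restart or reboot is generally required for the new values to take effect):

```python
import winreg

def set_dword(key_path, name, value):
    """Create/open key_path under HKLM and set a REG_DWORD value."""
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, value)

# MaxCmds governs the SMB client (workstation) side; MaxMpxCt the server side.
set_dword(r"SYSTEM\CurrentControlSet\Services\lanmanworkstation\parameters",
          "MaxCmds", 2048)
set_dword(r"SYSTEM\CurrentControlSet\Services\lanmanserver\parameters",
          "MaxMpxCt", 2048)
```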
BizTalk Server 2009 can encounter problems when a polling notification and a file-change notification occur at the same time. You can avoid this problem by disabling File Receive Functions (FRF) polling through the File Receive Location property pages.
3.1. File Tuning: Batch Files
When dealing with large flat files that generate thousands of subdocuments in an envelope, isolate the File Receive adapter in a separate host. Set the batch size to 1 and the thread-pool size for that host to 1. This reduces the number of files processed in one transaction from 20 to 1 and single-threads the host. The batch size property can be set on the receive location property page. To set the thread-pool size to 1, set the MessagingThreadsPerCpu property on the host's property pages and create the MessagingThreadPoolSize registry value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\BTSSvc$<HostName>; both define the number of threads per CPU in the host's messaging thread pool, and their respective default values are 2 and 10. Setting both values to 1 dedicates a single thread in the host's thread pool to message processing, ensuring that multiple large flat files will not compete for system resources within that host and that all the host's memory resources will be dedicated to processing the large message.
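The registry half of this tuning lends itself to a short script; here is a minimal sketch using Python's standard winreg module (the host name is a hypothetical placeholder, and the BTSSvc$<HostName> service key must match your actual host instance service):

```python
import winreg

host = "LargeFileReceiveHost"  # hypothetical host name; substitute your own
key_path = rf"SYSTEM\CurrentControlSet\Services\BTSSvc${host}"

# Dedicate a single messaging thread per CPU to this host (default is 10).
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "MessagingThreadPoolSize", 0, winreg.REG_DWORD, 1)
```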
NOTE
If you have
multiple receive locations receiving large flat files as well as smaller
ones, group them under different receive handlers running on different
hosts. This ensures that the tuning performed on the host instances
running the File Receive handler for large-flat-file processing does not
affect the rest of the file receive functions processing smaller files.
It is also recommended to partition those receive handlers across different servers within the BizTalk Server Group, interleaving host instances on the different servers so that they are not competing for the same system resources.
When supporting large interchanges in BizTalk Server 2009, multiple smaller interchanges utilize the CPU more efficiently than fewer large interchanges. As a general guideline, use the following formula to determine the maximum size of an interchange for any given deployment:

Maximum number of messages per interchange <= 200,000 / (Number of CPUs × Batch Size × MessagingThreadPoolSize)
So, for example, a BizTalk host running on a four-CPU server tuned for large-flat-file processing, with a batch size of 1 and a single thread in its messaging thread pool (MessagingThreadPoolSize = 1), would be able to process any number of interchanges as long as each interchange contains at most 50,000 messages: 200,000 / (4 × 1 × 1) = 50,000.
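The guideline is simple enough to encode; the following minimal Python sketch just restates the formula above (the 200,000 constant and the inputs are exactly those named in the formula):

```python
def max_messages_per_interchange(cpus: int,
                                 batch_size: int,
                                 messaging_thread_pool_size: int) -> int:
    """Guideline ceiling on interchange size for a given host configuration."""
    return 200_000 // (cpus * batch_size * messaging_thread_pool_size)

# Four-CPU server tuned for large flat files: batch size 1, one messaging thread.
print(max_messages_per_interchange(4, 1, 1))  # 50000
```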
3.2. Parsing and Persistence
Persistence affects overall system performance, and message parsing affects performance because of the persistence points it incurs. To tune the BizTalk solution and minimize the number of persistence points, change the Large
Message Threshold and Fragment Size property of the BizTalk Server
Group. The default value for this property is 1MB, meaning that each 1MB
read from the message will result in a fragment being persisted to the
Messagebox. To further elaborate, as stated in the white paper "BizTalk
Server 2006 Runtime Improvements" (Microsoft, 2005):
In previous
releases of BizTalk Server, mapping of documents always occurred
in-memory. While in-memory mapping provides the best performance, it can
quickly eat up resources when large documents are mapped. For this
reason, BizTalk Server 2006 introduced support for large message
transformations. A different transformation engine is used when
transforming large messages so that memory is utilized in an efficient
manner. When dealing with large messages, the message data is buffered
to the file system instead of being loaded into memory using the DOM
(Document Object Model). This way the memory consumption remains flat as
memory is used only to store the cached data and indexes for the
buffer. However, as the file system is used, there is expected
performance degradation when comparing with in-memory transformation.
Because of the potential performance impact, the two transformation
engines will coexist in BizTalk Server 2006.
When message size is smaller
than a specified threshold, the in-memory transformation will be used.
If message size exceeds the threshold then the large message
transformation engine is used. The threshold is configurable through the registry:
* Value: TransformThreshold (DWORD)
* Key: HKLM\Software\Microsoft\BizTalk Server\3.0\Administration
If the solution handles a low
number of large messages, increase this value to a large value like
5MB. If the solution handles a high number of small/medium messages, set
this value to 250K. You will need to experiment with this setting to
find the optimum value for your solution and messages. Increasing the
Large Message Threshold and Fragment Size property for the BizTalk
Server Group results in fewer persistence points, in turn causing fewer
round-trips to the database and faster message processing. The drawback
of this approach is higher memory utilization, as fragments kept in
memory now are much larger in size. To compensate for the expected
higher memory utilization by the large message fragments, control the
number of large message buffers that are created by the BizTalk host.
You can do so by creating a MessagingLMBufferCacheSize (DWORD) registry value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\BTSSvc$<HostName> and setting it to 5.
By controlling the number of
large message buffers, you are hedging the risk of having the host run
into low memory situations due to large message processing without
incurring the penalty of constant round-trips to the Messagebox
(Wasznicky, 2006).
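Both of these are plain DWORD registry values and can be scripted the same way as the earlier tuning; the following minimal sketch assumes TransformThreshold is expressed in bytes and uses a hypothetical host name for the buffer cache value:

```python
import winreg

def set_dword(key_path, name, value):
    """Create/open key_path under HKLM and set a REG_DWORD value."""
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, value)

# Raise the transform threshold to 5MB (assuming the value is in bytes) for a
# solution that handles a low number of large messages.
set_dword(r"Software\Microsoft\BizTalk Server\3.0\Administration",
          "TransformThreshold", 5 * 1024 * 1024)

# Cap the number of large message buffers for a specific host instance.
host = "LargeFileReceiveHost"  # hypothetical host name; substitute your own
set_dword(rf"SYSTEM\CurrentControlSet\Services\BTSSvc${host}",
          "MessagingLMBufferCacheSize", 5)
```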
4. Latency
The time taken to process a
message is dependent on how often the different BizTalk Server
components pick up work items from the Messagebox. This interval affects
the rate at which received messages are being published to the
Messagebox as well as the rate at which they are being picked up from
the Messagebox for processing or delivery. To deliver enterprise
capabilities such as fault tolerance and scalability, the distributed
BizTalk Server agents have to communicate asynchronously through the
Messagebox. This asynchronous communication scheme means that the agents
have to check the Messagebox for state updates to pick up new items for
processing and update the Messagebox at appropriate points in the
process. This polling process contributes to the inherent latency of
BizTalk solutions. If the end-to-end processing time per business
transaction under low loads is unacceptable, you might want to look into
tuning the interval at which the different agents check the Messagebox.
By default, the MaxReceiveInterval
is set to 500 msecs. You can reset this interval to a value as low as
100 msecs by modifying it in the adm_ServiceClass table for the XLANG/s,
Messaging Isolated, and Messaging In-Process hosts. If the overall
environment is experiencing high loads on the database while the overall
end-to-end business transaction processing speed is acceptable, you can
increase the MaxReceiveInterval and check whether that improves the overall environment's stability.
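As a sketch, the change can be applied directly to the Management database; this assumes the adm_ServiceClass table lives in BizTalkMgmtDb (the default Management database name) and uses the third-party pyodbc package. The affected host instances must be restarted for the new interval to take effect:

```python
import pyodbc

# Connect to the BizTalk Management database (server and database names here
# are assumptions; substitute your own).
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=.;DATABASE=BizTalkMgmtDb;Trusted_Connection=yes")

# Lower the polling interval from its 500 ms default to 100 ms for all
# service classes (XLANG/s, Messaging Isolated, and Messaging In-Process).
with conn:
    conn.execute("UPDATE adm_ServiceClass SET MaxReceiveInterval = 100")
```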
5. Throttling
Throttling
is the mechanism by which the runtime engine prevents itself from
thrashing and dropping dead when exposed to a high load. A properly
throttled engine takes up only the amount of load that it can handle,
and detects a stressed situation quickly and mitigates the situation
accordingly.
5.1. Before BizTalk Server 2009
In BizTalk Server 2004, throttling BizTalk Server involved manipulating entries in the adm_ServiceClass table. Manipulating that table manually is now deprecated, as it was troublesome and originally undocumented. Manually throttling BizTalk Server 2004 usually led to more problems when inexperienced administrators started manipulating the table, as it is mostly a trial-and-error exercise.
Manipulating the
adm_ServiceClass table affects the entire BizTalk Server Group, not just
a specific host instance. The configuration settings are not host
specific and hence are not useful in any configuration with multiple
hosts. If different servers in the BizTalk Server Group have different
hardware configurations, having the same settings across different
hardware is not the best approach. Other problem areas in BizTalk Server
2004 are as follows:
* The stress detection mechanism in BizTalk 2004 is grossly dependent on user input, namely the low and high watermark numbers in the adm_ServiceClass table, and the configuration parameters are not exposed to the user through the UI.
* The inbound throttling heuristic (session-count based) is not very effective, because XLANG does not factor it in at all, and all the sessions are shared across all the service classes.
* The agent's memory-based throttling policy has two major drawbacks. First, it looks at global memory and does not take into account local memory usage. The maximum amount of memory that a host instance can consume is 2GB on a Windows 32-bit platform, so if the server has more than 2GB of memory, the agent might not throttle properly: while the server still has free memory that is not being consumed by other services, a particular host might be running out of the memory that it could consume without throttling. Second, while enforcing throttling due to a low memory condition, the agent does nothing to improve the memory usage situation other than elevating the stress level. Once it enters a stress mode due to a high memory condition, no measure is taken to come out of this stage, so it remains in this state for a long time. As the system starts up again, it reloads all dehydrated orchestrations, resulting in an elevated rate of resource consumption that leads to the same situation that caused the throttling in the first place (Wasznicky, 2006).
5.2. Throttling Goals for BizTalk Server
One of the Microsoft
development team's objectives for BizTalk Server was to get around the
nuances of throttling configuration. The target was a system that avoids
using user-input parameters for detecting stress condition—a system
with heuristics that include monitoring of resources (e.g., memory,
threads, database sessions), utilization, and progress of work items
against submitted work items. This would allow the system to function
automatically without the administrator having to deal with the unknowns
surrounding the various control knobs for the watermark numbers and
other settings in the adm_ServiceClass table. Some parameters still have
to be configured manually. The bright side is that they can be set and
manipulated through the administration UI, and they have out-of-box
valid settings. Those parameters are now at host level rather than group
level.
The aim is to eventually
communicate throttling actions to the system administrator through Event
Logs and performance counters. Currently only the inbound throttling is
communicated through the Event Log.
If the system is throttled due
to lack of a particular resource, the engine proactively tries to
mitigate the situation by releasing that particular resource so that it
comes out of the stress situation. For example, under low memory, caches should be shrunk and orchestration instances should be dehydrated.
Unlike BizTalk Server
2004, BizTalk Server 2006 and BizTalk 2009 throttling takes into account
process memory, in addition to global memory. All components follow
uniform throttling policies to ensure a fair distribution of resources.
5.3. Auto-Throttling in 2009
BizTalk Server 2009 auto-throttling consists of a set of load detection algorithms and mitigation plans. Table 3 highlights those algorithms.
Table 3. BizTalk Server 2006 Auto-Throttling Mechanisms (Wasznicky, 2006)
| Detection | Mitigation | Affected Components | Monitors |
|---|---|---|---|
| Compare the Message Delivery Rate with the Message Completion Rate. When the latter falls short, messages are being pushed at a higher rate than the service can handle. | Throttle message delivery so that the delivery rate comes down to par with the completion rate. | XLANG; all outbound transports | Message Delivery Rate and Message Completion Rate. |
| Compare the Publishing Request Rate with the Publishing Completion Rate. When the latter falls short, the Messagebox is unable to cope with the load. | Block the publishing threads to slow down the publishing rate, and/or indicate to the service class to slow down publishing. | XLANG; all inbound transports | Entry and exit of the Commit Batch call. |
| Process memory exceeds a threshold. | Throttle publishing if the batch has a steep memory requirement. Throttle delivery. Indicate to the service to dehydrate/shrink its cache. | XLANG; all transports | Private Bytes. |
| System memory exceeds a threshold. | Throttle publishing if the batch has a steep memory requirement. Throttle delivery. | XLANG; all transports | System memory usage. |
| Database sessions being used by the process exceed a threshold count. | Throttle publishing. | XLANG; all inbound transports | Average session usage per Messagebox. |
| Any host message queue size, the spool size, or the tracking data size exceeds a host-specific threshold in the database. | Throttle publishing if the batch is going to create more records in the database than it deletes. | XLANG; all inbound transports | Queue size against the respective threshold. |
| Process thread count exceeds a threshold. | Throttle publishing. Throttle delivery. Indicate to the service to reduce its thread-pool size. | XLANG; all transports | Threads per CPU. |
| Number of messages delivered to a service class exceeds a threshold count. | Throttle delivery. | XLANG; all outbound transports | Needed for send port throttling, where the EPM expects only a limited number of messages at a time. |
To perform this auto-throttling, the server uses the configurable parameters detailed in Table 4.
Table 4. BizTalk Server 2006 Auto-Throttling Parameters (Wasznicky, 2006)
| Name | Type | Description | Default Value | Min Value | Max Value |
|---|---|---|---|---|---|
| Message Delivery Throttling Configuration | | | | | |
| Sample-space size | Long | Number of samples used for determining the rate of message delivery to all service classes of the host. This parameter determines whether the samples collected for applying rate-based throttling are valid. If the number of samples collected is lower than the sample size, the samples are discarded, because the system is running under a low load and hence no throttling may be required. Thus this value should be at par with a reasonable rate at which messages can be consumed under a medium load. For example, if the system is expected to process 100 docs per second under a medium load, this parameter should be set to (100 × sample window duration in seconds). If the value is set too low, the system may overthrottle under low load; if it is too high, there may not be enough samples for this technique to be effective. Zero indicates rate-based message delivery throttling is disabled. | 100 | 0 | N/A |
| Sample-space window | Long | Duration of the sliding time window (in milliseconds) within which samples are considered for calculating the rate. Zero indicates rate-based message delivery throttling is disabled. | 15,000 | 1,000 | N/A |
| Overdrive factor | Long | Percent factor by which the system will try to overdrive the input. That is, if the output rate is 200 per second and the overdrive factor is 125%, the system will allow up to 250 (200 × 125%) per second to be passed as input before applying rate-based throttling. A smaller value causes very conservative throttling and may lead to overthrottling when load increases, whereas a higher value adapts to increases in load quickly, at the expense of slight underthrottling. | 125 | 100 | N/A |
| Maximum delay | Long | Maximum delay (in milliseconds) imposed for message delivery throttling. The actual delay imposed is a factor of how long the throttling condition persists and the severity of the particular throttling trigger. Zero indicates message delivery throttling is completely disabled. | 300,000 | 0 | N/A |
| Message Publishing Throttling Configuration | | | | | |
| Sample-space size | Long | Number of samples used for determining the rate of message publishing by the service classes. This parameter determines whether the samples collected for applying rate-based throttling are valid. If the number of samples collected is lower than the sample size, the samples are discarded, because the system is running under a low load and hence no throttling may be required. Thus this value should be at par with a reasonable rate at which messages can be published under a medium load. For example, if the system is expected to publish 100 docs per second under a medium load, this parameter should be set to (100 × sample window duration in seconds). If the value is set too low, the system may overthrottle under low load; if it is too high, there may not be enough samples for this technique to be effective. Zero indicates rate-based message publishing throttling is disabled. | 100 | 0 | N/A |
| Sample-space window | Long | Duration of the sliding time window (in milliseconds) within which samples are considered for calculating the rate. Zero indicates rate-based message publishing throttling is disabled. | 15,000 | 1,000 | N/A |
| Overdrive factor | Long | Percent factor by which the system will try to overdrive the input. That is, if the output rate is 200 per second and the overdrive factor is 125%, the system will allow up to 250 (200 × 125%) per second to be passed as input before applying rate-based throttling. A smaller value causes very conservative throttling and may lead to overthrottling when load increases, whereas a higher value adapts to increases in load quickly, at the expense of slight underthrottling. | 125 | 100 | N/A |
| Maximum delay | Long | Maximum delay (in milliseconds) imposed for message publishing throttling. The actual delay imposed is a factor of how long the throttling condition persists and the severity of the particular throttling trigger. Zero indicates message publishing throttling is completely disabled. | 300,000 | 0 | N/A |
| Other Configuration and Thresholds | | | | | |
| Delivery queue size | Long | Size of the in-memory queue that the host maintains as a temporary placeholder for delivering messages. Messages for the host are dequeued and placed in this in-memory queue before finally being delivered to the service classes. Setting a large value can improve low-latency scenarios, since more messages will be proactively dequeued. However, if the messages are large, the messages in the delivery queue consume memory, so a low queue size is desirable for large-message scenarios to avoid excessive memory consumption. The host needs to be restarted for this change to take effect. | 100 | 1 | N/A |
| Database session threshold | Long | Maximum number of concurrent database sessions (per CPU) allowed before throttling begins. Idle database sessions in the common per-host session pool do not add to this count; the check is made strictly on the number of sessions actually being used by the host. Disabled by default; may be enabled if the database server is low-end compared to the host servers. Zero indicates session-based throttling is disabled. | 0 | 0 | N/A |
| System memory threshold | Long | Maximum system-wide physical memory usage allowed before throttling begins. The threshold can be expressed either as an absolute value in MB or in percent-available format; a value of less than 100 indicates a percent value. Throttling based on this factor is equivalent to yielding to other processes in the system that consume physical memory. Zero indicates system-memory-based throttling is disabled. | 0 | 0 | N/A |
| Process memory threshold | Long | Maximum process memory (in MB) allowed before throttling begins. The threshold can be expressed either as an absolute value in MB or in percent-available format. A value of less than 100 indicates a percent value; when a percent value is specified, the actual MB limit is dynamically computed based on the total virtual memory that the host can grow to (limited by the amount of free physical memory and page file, and on 32-bit systems further limited by the 2GB address space). The user-specified value is used as a guideline, and the host may dynamically self-tune this threshold based on the memory usage pattern of the process. Set this value low for scenarios with a large memory requirement per message; a low value kicks in throttling early and prevents a memory explosion within the process. Zero indicates process-memory-based throttling is disabled. | 25% | 0 | N/A |
| Thread threshold | Long | Maximum number of threads in the process (per CPU) allowed before throttling begins. The user-specified value is used as a guideline, and the host may dynamically self-tune this threshold based on the memory usage pattern of the process. Thread-based throttling is disabled by default; it should be enabled in scenarios where excessive load can cause unbounded thread growth (e.g., a custom adapter creates a thread for each message). Zero indicates thread-count-based throttling is disabled. | 0 | 0 | N/A |
| Message count in database threshold | Long | Maximum number of unprocessed messages in the database (aggregated over all Messageboxes). This factor essentially controls how many records are allowed in the destination queue(s) before throttling begins. In addition to watching the destination queues, the host also checks the size of the spool table and the tracking-data tables and ensures they do not exceed a certain record count (by default, 10 times the message-count threshold). Zero indicates database-size-based throttling is disabled. | 50,000 | 0 | N/A |
| In-process message threshold | Long | Maximum number of in-memory in-flight messages (per CPU) allowed before message delivery is throttled. In-process messages are those that have been handed off to the transport manager/XLANG engine but not yet processed. The user-specified value is used as a guideline, and the host may dynamically self-tune this threshold based on the memory usage pattern of the process. In scenarios where the transport works more efficiently with fewer messages at a time, set this value low. Zero indicates in-process message-count-based throttling is disabled. | 1,000 | 0 | N/A |
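To make the rate-based parameters concrete, here is a minimal, illustrative sketch of the decision they describe. The function and its structure are hypothetical, not BizTalk's actual implementation, but the logic follows the tables: windows with fewer samples than the sample-space size are treated as low load, and input may overdrive output by the configured percent factor before throttling triggers:

```python
def should_throttle(input_count: int, output_count: int,
                    sample_space_size: int = 100,
                    window_ms: int = 15_000,
                    overdrive_factor: int = 125) -> bool:
    """Rate-based throttling decision over one sliding sample window.

    input_count is the number of messages delivered (or published) in the
    window; output_count is the number completed in the same window.
    """
    if input_count < sample_space_size:
        return False  # too few samples: low load, no throttling required
    window_seconds = window_ms / 1000.0
    input_rate = input_count / window_seconds
    output_rate = output_count / window_seconds
    # Allow the input to exceed the output rate by the overdrive factor.
    return input_rate > output_rate * (overdrive_factor / 100.0)

# With an output rate of 200/sec and a 125% overdrive factor, up to 250/sec
# of input is tolerated (the 15-second window holds 3,750 messages).
print(should_throttle(input_count=3_700, output_count=3_000))  # False
print(should_throttle(input_count=3_900, output_count=3_000))  # True
```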