Distributed file system (DFS) and SharePoint
facilitate data sharing in large organizations, but in very different
ways. DFS is a feature whose main benefit is to replicate file shares to
remote offices and to provide a consistent Universal Naming Convention
(UNC) pathname to file shares regardless of location in a network.
SharePoint, on the
other hand, provides access to data through team Web sites. SharePoint
sites can store files and documents, but they also provide version
control, bulletin boards, calendaring, and many other features.
Planning a DFS Deployment
DFS is a feature in
Windows Server 2008 that facilitates access to shared files in a large
network. As part of your overall network planning for data sharing and
collaboration, you should consider your network needs for file sharing,
review the features offered by DFS, and then determine whether this
feature can meet those needs.
Reviewing DFS Concepts and Features
DFS enables an
organization to build a single hierarchical view of file shares that
remains consistent across sites in a large network. Users access DFS
shares by specifying an alias pathname that remains identical regardless
of location. With DFS, shared files are replicated among multiple
servers so that by specifying the same pathname, users throughout the
network access a local copy of the hosted files. When permissions allow
changes to a file or folder, changes made to the local copy are also
replicated to other DFS servers.
Important: DFS fundamentals
If
you are not familiar with basic concepts related to DFS, be sure to
view the introductory Flash demonstration named Dfs.swf, which you can
access by visiting http://www.microsoft.com/windowsserver2003/evaluation/demos/dfs.html. Although this demonstration was created for Windows Server 2003, the fundamental concepts about DFS have not changed.
DFS is made up of the following network elements:
Namespace
The virtual view of shared folders in an organization. A namespace is made up of the remaining elements on this list.
Namespace server
A namespace server hosts a namespace. A namespace server can be a
standalone server, a domain member server, or a domain controller.
Namespace root
The namespace root is the starting point of the namespace. A
domain-based namespace can be hosted on multiple namespace servers to
increase the availability of the namespace.
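Folder
A container in a namespace that redirects clients to one or more folder targets.
Folder target
The UNC path of a shared folder (or of another namespace) that is associated with a folder in a namespace. The folder target is where data and content are actually stored.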
The elements that make up a DFS namespace are illustrated in Figure 1.
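To make the relationships among these elements concrete, here is a minimal Python sketch of a namespace as a data structure. It assumes a single level of folders, and the root, server, and share names (contoso.com, NS1, FS-NY, and so on) are hypothetical. Real clients do not resolve paths this way; they obtain referrals from a namespace server.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """A namespace folder that redirects clients to one or more folder targets."""
    name: str
    targets: list = field(default_factory=list)  # UNC paths of shared folders


@dataclass
class Namespace:
    """A namespace root plus the namespace servers that host it (simplified model)."""
    root: str
    servers: list
    folders: dict = field(default_factory=dict)

    def add_folder(self, name, *targets):
        # Folder names are compared case-insensitively, as UNC paths are on Windows.
        self.folders[name.lower()] = Folder(name, list(targets))

    def resolve(self, path):
        """Map a path under the namespace root to the same path on each folder target."""
        prefix = self.root + "\\"
        if not path.lower().startswith(prefix.lower()):
            raise ValueError(f"{path!r} is not under {self.root!r}")
        first, _, remainder = path[len(prefix):].partition("\\")
        folder = self.folders.get(first.lower())
        if folder is None:
            raise KeyError(f"no folder named {first!r} in the namespace")
        suffix = "\\" + remainder if remainder else ""
        return [target + suffix for target in folder.targets]


ns = Namespace(r"\\contoso.com\Public", servers=["NS1", "NS2"])
ns.add_folder("Software", r"\\FS-NY\Software", r"\\FS-LON\Software")
print(ns.resolve(r"\\contoso.com\Public\Software\Tools\setup.exe"))
```

The user only ever sees the namespace path; which folder target actually serves the request is decided by the referral.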
When you create a new namespace, you can create it as either a
domain-based namespace or a standalone namespace. A domain-based
namespace is published to Active Directory Domain Services (AD DS) and
supports file replication and built-in fault tolerance. A standalone
namespace stores its configuration information in the registry of the
namespace server that hosts it. Standalone namespaces do not integrate
with AD DS and are stored on a single namespace server. Standalone
namespaces do not support file replication.
When you create a
namespace in Windows Server 2008 mode, two enhancements are added.
First, Windows Server 2008 domain-based namespaces support increased
scalability (more than 5000 folders). In addition, Windows Server 2008
namespaces support access-based enumeration. (With access-based
enumeration, users can see on a file server only the files and folders
for which the users have proper permissions.)
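Conceptually, access-based enumeration filters a directory listing by the caller's permissions. The sketch below is a simplified model: the `Entry` type, the principal names, and the idea that read access alone decides visibility are illustrative assumptions, not how Windows actually evaluates ACLs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    readers: frozenset  # principals (users or groups) granted read access; a simplified ACL


def enumerate_folder(entries, user, groups=()):
    """Return only the entry names that the user, or one of the user's groups, may read."""
    principals = {user, *groups}
    return [entry.name for entry in entries if entry.readers & principals]


share = [
    Entry("Payroll", frozenset({"HR"})),
    Entry("Public", frozenset({"Everyone"})),
    Entry("alice-notes", frozenset({"alice"})),
]
print(enumerate_folder(share, "bob", groups={"Everyone"}))  # prints ['Public']
```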
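To create a domain-based namespace in Windows Server 2008 mode, your
servers and domain will need to meet the following requirements:
The forest uses the Windows Server 2003 or higher forest functional level.
The domain uses the Windows Server 2008 domain functional level.
All namespace servers are running Windows Server 2008.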
DFS Component Technologies
In Windows Server 2008, DFS is based on two underlying technologies: DFS Namespaces and DFS Replication.
DFS Namespaces
allow administrators to group shared folders located on different
servers and present them to users as a virtual tree of folders known as a
namespace. A namespace provides numerous benefits, including increased
availability of data, load sharing, and simplified data migration.
DFS
Replication is a multimaster replication engine that supports
replication scheduling and bandwidth throttling. DFS Replication uses a
compression protocol called Remote Differential Compression (RDC), which
can be used to efficiently update files over a limited-bandwidth
network. RDC detects insertions, removals, and rearrangements of data in
files, thereby enabling DFS Replication to replicate only the changes
when files are updated. Another important feature of DFS Replication is
that in choosing replication paths, it leverages the Active Directory
site links configured in Active Directory Sites and Services.
Figure 2
illustrates how DFS Namespaces and DFS Replication work together. In
step 1, client computers contact a namespace server and receive a
referral. In step 2, client computers access the first server listed
in their referrals. The folder targets on the hosting servers are kept
synchronized by DFS Replication, so clients can be referred to a copy in
their own site.
More Info: DFS
For a full introduction to DFS, read “Overview of the Distributed File System Solution in Windows Server 2003 R2,” available at http://go.microsoft.com/fwlink/?LinkId=55315.
Although this paper deals with the version of the distributed file
system in Windows Server 2003 R2, the underlying concepts are the same
as those in Windows Server 2008.
DFS Namespaces Advanced Settings and Features
You can customize or
enable the following settings and features in DFS Namespaces as
necessary to design a DFS Namespaces solution for your organization.
Referral Ordering
A
referral is an ordered list of targets, transparent to the user, that a
client receives from a domain controller or namespace server when the
user accesses the namespace root or a folder with targets in the
namespace. The client caches the referral for a configurable period of
time.
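Because the client caches each referral for a configurable period, repeated access to the same path does not contact a namespace server every time. This sketch models that client-side cache; the class, the injectable clock, and the case-insensitive key are assumptions for illustration, and the actual default cache durations are not modeled.

```python
import time


class ReferralCache:
    """Client-side referral cache keyed by namespace path (simplified model)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds  # the configurable cache duration
        self.clock = clock
        self._entries = {}  # lowercased path -> (expiry time, ordered list of targets)

    def get(self, path, fetch_referral):
        """Return a cached referral, calling fetch_referral(path) only on a miss or expiry."""
        key = path.lower()
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now < cached[0]:
            return cached[1]
        referral = fetch_referral(path)
        self._entries[key] = (now + self.ttl, referral)
        return referral
```

Injecting the clock keeps the expiry logic testable without waiting in real time.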
Targets in the
client’s Active Directory site are listed first in a referral. (Targets
given the target priority “first among all targets” will be listed
before targets in the client’s site.) The order in which targets outside
of the client’s site appear in a referral is determined by one of the
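following referral ordering methods:
Lowest cost
Targets in the client’s site are listed first in random order, followed by targets outside the client’s site in order from lowest to highest site link cost. Targets with the same cost are grouped together and listed in random order within each group.
Random order
Targets in the client’s site are listed first in random order, followed by targets outside the client’s site, also in random order.
Exclude targets outside of the client’s site
Only targets in the client’s site are listed. If no target in the client’s site is available, the client does not receive a referral for that portion of the namespace and cannot access it.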
You can set referral
ordering on the namespace root, and the ordering method applies to all
folders with targets in the namespace. You can also override the
namespace root’s ordering method for individual folders with targets.
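The sketch below models the three ordering methods DFS Namespaces offers (lowest cost, random order, and excluding targets outside the client's site). The `Target` type, the `site_cost` mapping (a stand-in for link costs configured in Active Directory Sites and Services), and the method strings are hypothetical; target priority is deliberately left out.

```python
import random
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Target:
    unc: str   # UNC path of the folder target
    site: str  # Active Directory site of the server hosting it


def order_referral(targets, client_site, site_cost, method="lowest_cost", rng=random):
    """Return targets in referral order for the given ordering method (simplified model).

    site_cost maps each remote site to its link cost from the client's site.
    """
    local = [t for t in targets if t.site == client_site]
    remote = [t for t in targets if t.site != client_site]
    rng.shuffle(local)  # same-site targets come first, in random order
    if method == "exclude_outside_site":
        return local  # may be empty: then the client gets no usable referral
    if method == "random":
        rng.shuffle(remote)
        return local + remote
    if method == "lowest_cost":
        remote.sort(key=lambda t: site_cost[t.site])
        ordered = []
        for _, group in groupby(remote, key=lambda t: site_cost[t.site]):
            group = list(group)
            rng.shuffle(group)  # equal-cost targets are randomized within their group
            ordered.extend(group)
        return local + ordered
    raise ValueError(f"unknown ordering method: {method!r}")
```

Passing a seeded `random.Random` as `rng` makes the otherwise random portions of the order reproducible.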
Failover and Failback
Client failover in DFS
Namespaces is the process in which clients attempt to access another
target server in a referral after one of the servers fails or is removed
from the namespace. Client failback is an optional feature that enables
a client to fail back to a preferred, local server after it is
restored.
Failback occurs only when a
client has failed over to a more expensive server (in terms of site
link cost) than the server that is restored. If the restored server has
the same cost as the server that the client is currently connected to,
failback does not occur to the restored server. For example, if there
are two servers (Server 1 and Server 2) in the client’s site and Server 1
fails while the client is connected to it, the client will fail over to
Server 2. However, the client will not fail back to Server 1 when it is
restored because both servers are located in the same site and
therefore are associated with the same site link cost.
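The failback rule reduces to a strict cost comparison. In this minimal sketch, site link costs are plain numbers and the server names echo the Server 1 and Server 2 example; `fail_over` and `should_fail_back` are hypothetical helpers, not part of any Windows API.

```python
def fail_over(referral, unavailable):
    """Return the first target, in referral order, that is still reachable (or None)."""
    for target in referral:
        if target not in unavailable:
            return target
    return None


def should_fail_back(current_cost, restored_cost, failback_enabled):
    """Fail back only when failback is enabled and the restored target is strictly cheaper."""
    return failback_enabled and restored_cost < current_cost


# Hypothetical link costs from the client's site: two local servers and one remote server.
cost = {"Server1": 0, "Server2": 0, "Remote": 100}
referral = ["Server1", "Server2", "Remote"]
current = fail_over(referral, unavailable={"Server1"})
print(current, should_fail_back(cost[current], cost["Server1"], failback_enabled=True))
# prints: Server2 False
```

Had the client failed over to `Remote` instead, the same check with cost 100 versus 0 would return True.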
Note: Site link costs
You can view site link costs by using the Active Directory Sites and Services snap-in.
Target Priority
You can assign a
priority to individual targets for a given namespace root or folder.
This priority determines how the target is ordered in a referral. The
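options are:
First among all targets
The target is always listed first in the referral.
Last among all targets
The target is always listed last in the referral.
First among targets of equal cost
The target is listed first among targets of equal cost, which usually means targets in the same site.
Last among targets of equal cost
The target is listed last among targets of equal cost.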
Note that a target with a priority setting is always included in
referrals, even if the folder associated with the target uses the
Exclude Targets Outside Of The Client’s Site option.
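The sketch below layers target priority on top of a referral that has already been grouped by cost, with the client's site first. The grouped input, the `Priority` enum, and `apply_priority` are hypothetical; the code shows the four priority classes and the rule that a target with a priority stays in the referral even when the folder excludes targets outside the client's site.

```python
from enum import Enum


class Priority(Enum):
    NONE = "no priority"
    FIRST_AMONG_ALL = "first among all targets"
    LAST_AMONG_ALL = "last among all targets"
    FIRST_AMONG_EQUAL_COST = "first among targets of equal cost"
    LAST_AMONG_EQUAL_COST = "last among targets of equal cost"


def apply_priority(cost_groups, priority, exclude_outside_site=False):
    """Reorder a referral by target priority.

    cost_groups: lists of target names, cheapest first; group 0 is the client's site.
    priority: maps a target name to a Priority (missing means Priority.NONE).
    """
    def prio(target):
        return priority.get(target, Priority.NONE)

    everything = [t for group in cost_groups for t in group]
    first = [t for t in everything if prio(t) is Priority.FIRST_AMONG_ALL]
    last = [t for t in everything if prio(t) is Priority.LAST_AMONG_ALL]
    middle = []
    for index, group in enumerate(cost_groups):
        rest = [t for t in group
                if prio(t) not in (Priority.FIRST_AMONG_ALL, Priority.LAST_AMONG_ALL)]
        if exclude_outside_site and index > 0:
            # A target with a priority stays in the referral even when other sites are excluded.
            rest = [t for t in rest if prio(t) is not Priority.NONE]
        middle += [t for t in rest if prio(t) is Priority.FIRST_AMONG_EQUAL_COST]
        middle += [t for t in rest if prio(t) is Priority.NONE]
        middle += [t for t in rest if prio(t) is Priority.LAST_AMONG_EQUAL_COST]
    return first + middle + last
```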
Redundant Domain-Based Namespace Servers
Multiple namespace servers
can host a domain-based namespace to increase the availability of the
namespace. Putting a namespace server in remote or branch offices also
allows clients to contact a namespace server and receive referrals
without having to cross expensive wide area network (WAN) connections.
Namespace Scalability Mode
To maintain a
consistent domain-based namespace across namespace servers, it is
necessary for namespace servers to periodically poll AD DS to obtain the
most current namespace data. If your organization will use more than 16
namespace servers to host a single namespace, it is recommended that
you enable namespace scalability mode. When this mode is enabled,
namespace servers running Windows Server 2003 and Windows Server 2008 do
not send change notification messages to other namespace servers when
the namespace changes nor do they poll the PDC emulator every hour.
Instead, they poll their closest domain controller every hour to
discover updates to the namespace. (Regardless of whether namespace
scalability mode is enabled, changes to the namespace are always made on
the PDC emulator.)
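The trade-off between the two modes is mostly about load on the PDC emulator. This sketch restates the behavior described above as plain data; the function names are hypothetical, and the one-poll-per-server-per-hour figure simply follows the text.

```python
WRITES_GO_TO = "PDC emulator"  # namespace changes are always made here, in either mode


def polling_behavior(scalability_mode):
    """Summarize how a domain-based namespace server learns about namespace changes."""
    if scalability_mode:
        return {"polls": "closest domain controller", "every_minutes": 60,
                "sends_change_notifications": False}
    return {"polls": "PDC emulator", "every_minutes": 60,
            "sends_change_notifications": True}


def pdc_polls_per_hour(namespace_servers, scalability_mode):
    """Hourly polls that reach the PDC emulator for one namespace."""
    return 0 if scalability_mode else namespace_servers


def recommend_scalability_mode(namespace_servers):
    """Guideline from the text: enable the mode above 16 servers per namespace."""
    return namespace_servers > 16


print(pdc_polls_per_hour(40, scalability_mode=False), recommend_scalability_mode(40))
# prints: 40 True
```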
Note: Root scalability mode
Namespace scalability mode was known as root scalability mode in Windows Server 2003.
DFS Replication Advanced Settings and Features
You can customize or
enable the following settings and features in DFS Replication as
necessary to design a DFS Replication solution for your organization.
RDC
RDC, which is the basis
for DFS replication, is a protocol that can be used to efficiently
update files 64 KB or larger over a limited-bandwidth network. RDC
detects insertions, removals, and rearrangements of data in files regardless
of file type, enabling DFS Replication to replicate only the
changes when files are updated. To compute the changes to replicate,
RDC typically works on an older version of the file with the same name
that exists at the appropriate location in the replicated folder tree on
the receiving member.
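The text does not describe RDC's actual algorithms (its hash functions and recursive signatures), so the following Python sketch substitutes a generic content-defined chunking scheme to illustrate the idea. It splits both versions of a file at boundaries chosen by a rolling hash, compares chunk signatures, and transfers only the chunks the receiver does not already hold. The window size, mask, and hash choices are illustrative assumptions; only the 64 KB threshold comes from the text.

```python
import hashlib

RDC_MIN_SIZE = 64 * 1024  # per the text, smaller files are not processed with RDC
WINDOW = 16               # rolling-hash window in bytes (illustrative)
MASK = 0x3FF              # cut where hash & MASK == 0, about 1 KB average chunks (illustrative)
BASE, MOD = 257, (1 << 31) - 1


def chunk(data):
    """Split data at content-defined boundaries found with a polynomial rolling hash."""
    top = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def digest(block):
    return hashlib.sha256(block).hexdigest()


def make_delta(old, new):
    """Return (recipe, missing): the digest sequence of new and the chunks the receiver lacks."""
    if len(new) < RDC_MIN_SIZE:
        return [digest(new)], {digest(new): new}  # small file: send it whole
    have = {digest(c) for c in chunk(old)}
    recipe, missing = [], {}
    for c in chunk(new):
        d = digest(c)
        recipe.append(d)
        if d not in have:
            missing[d] = c
    return recipe, missing


def apply_delta(old, recipe, missing):
    """Rebuild new on the receiving member from local chunks plus the transferred ones."""
    local = {digest(c): c for c in chunk(old)}
    return b"".join(missing[d] if d in missing else local[d] for d in recipe)
```

Because boundaries depend only on the bytes inside the window, an insertion disturbs the chunks near it and later boundaries fall in the same places, so most chunks still match.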
In versions of Windows Server earlier than Windows Server 2003 R2, the
service used to replicate files among folders in a DFS namespace was
File Replication Service (FRS). Unlike DFS Replication, FRS copies entire
files rather than only the changed portions of files. As a result,
FRS-based DFS is much more bandwidth-intensive than DFS Replication in
Windows Server 2008 networks, and the move to RDC-based replication
greatly improves replication performance, especially across WAN links.
Therefore, when planning for DFS, you should plan to upgrade any DFS
servers that still rely on FRS if replication will occur across WAN
links.
Note: RDC and small files
RDC is not used on files
smaller than 64 KB; in this case the file is compressed before it is
replicated. You can also disable RDC on connections that are in a LAN
where network bandwidth is not contended.
Cross-File RDC
An additional function of
RDC, known as cross-file RDC, can be used to further reduce bandwidth
usage. Cross-file RDC is useful when a file exists on the sending member
and not the receiving member but similar files exist on the receiving
member. Instead of replicating the entire file, DFS Replication can use
portions of files that are similar to the replicating file to minimize
the amount of data transferred over the WAN. Cross-file RDC can use
multiple files as candidate files for RDC seed data.
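The text does not describe how cross-file RDC detects similar files. The sketch below approximates the idea by ranking files already on the receiving member by how many of the new file's chunk signatures they contain, then counting the bytes that would still have to cross the WAN. Fixed-size 4 KB chunks, the limit of five seed files, and all function names are assumptions made to keep the example short.

```python
import hashlib

BLOCK = 4096  # fixed-size chunks keep the sketch short; RDC itself finds content-defined chunks


def signatures(data):
    """Signature of each BLOCK-sized chunk, in file order."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest() for i in range(0, len(data), BLOCK)]


def pick_seeds(new, candidates, max_seeds=5):
    """Rank files already on the receiver by how many of new's chunks they contain."""
    wanted = set(signatures(new))
    scored = sorted(
        ((len(wanted & set(signatures(data))), name) for name, data in candidates.items()),
        reverse=True,
    )
    return [name for score, name in scored[:max_seeds] if score > 0]


def bytes_to_transfer(new, candidates, seeds):
    """Bytes of new that none of the seed files can supply."""
    have = set()
    for name in seeds:
        have.update(signatures(candidates[name]))
    offsets = range(0, len(new), BLOCK)
    return sum(len(new[o:o + BLOCK]) for o, sig in zip(offsets, signatures(new)) if sig not in have)
```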
Replication Schedule and Bandwidth Throttling
DFS Replication
supports replication scheduling and bandwidth throttling in 15-minute
increments during a seven-day period. When specifying a replication
window, you choose the replication start and stop times as well as the
bandwidth to use during that window. The settings for bandwidth usage
range from 16 kilobits per second (Kbps) to 256 megabits per second
(Mbps) as well as full (unlimited) bandwidth. You can configure a
default schedule and bandwidth that applies to all connections between
members and optionally create a custom schedule and bandwidth for
individual connections.
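A seven-day schedule in 15-minute increments works out to 7 × 24 × 4 = 672 slots, so one way to model it is a 672-entry table. In this sketch, None stands for full bandwidth and 0 for no replication outside a window (both representation choices are ours). It also takes 1 Mbps as 1,000 Kbps when checking the 256 Mbps upper bound. `WeeklySchedule` is a hypothetical name.

```python
from datetime import datetime

SLOTS_PER_DAY = 24 * 4                # 15-minute increments
SLOTS_PER_WEEK = 7 * SLOTS_PER_DAY    # 672 slots in the seven-day schedule
FULL = None                           # stands for full (unlimited) bandwidth
NO_REPLICATION = 0                    # outside any replication window
MIN_KBPS, MAX_KBPS = 16, 256_000      # 16 Kbps to 256 Mbps, taking 1 Mbps = 1,000 Kbps


def _check(kbps):
    if kbps not in (FULL, NO_REPLICATION) and not MIN_KBPS <= kbps <= MAX_KBPS:
        raise ValueError(f"bandwidth must be {MIN_KBPS}-{MAX_KBPS} Kbps, full, or none")


def _slot(hour, minute):
    if minute % 15:
        raise ValueError("schedules use 15-minute increments")
    return hour * 4 + minute // 15


class WeeklySchedule:
    """One bandwidth setting per 15-minute slot; day 0 is Monday, as in datetime.weekday()."""

    def __init__(self, default_kbps=FULL):
        _check(default_kbps)
        self.slots = [default_kbps] * SLOTS_PER_WEEK

    def set_window(self, day, start, stop, kbps):
        """Apply kbps from start up to (not including) stop; times are (hour, minute) tuples."""
        _check(kbps)
        base = day * SLOTS_PER_DAY
        for i in range(base + _slot(*start), base + _slot(*stop)):
            self.slots[i] = kbps

    def bandwidth_at(self, when):
        slot = when.weekday() * SLOTS_PER_DAY + _slot(when.hour, when.minute - when.minute % 15)
        return self.slots[slot]


schedule = WeeklySchedule()  # full bandwidth by default
schedule.set_window(0, (8, 0), (18, 0), 256)  # hypothetical: 256 Kbps during Monday business hours
print(schedule.bandwidth_at(datetime(2024, 1, 1, 9, 7)))  # 2024-01-01 was a Monday; prints 256
```

A custom schedule for an individual connection would simply be another `WeeklySchedule` instance that overrides the default one.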
Because members of a
replication group are often located in different time zones, it is
important to consider the time zones of the sending and receiving
members when you set the schedule. The receiving member initiates
replication by interpreting the schedule either in Coordinated Universal
Time (UTC) or in the receiving member’s local time, depending on which
setting you choose. You can choose this setting for the replication
group schedule and for custom schedules on individual connections.
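The practical effect of the UTC or local-time setting is which clock the receiving member reads when it checks the schedule. This sketch uses a fixed UTC offset in place of a real time zone, so it ignores daylight saving time. It also assumes a hypothetical 00:00-04:00 replication window; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def schedule_clock(now, receiver_utc_offset, interpret_in_utc):
    """Wall-clock time the receiving member uses to look up its schedule."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    if interpret_in_utc:
        return now.astimezone(timezone.utc)
    return now.astimezone(timezone(receiver_utc_offset))


def in_window(clock, start_hour, stop_hour):
    """True if the clock time falls in a same-day window [start_hour, stop_hour)."""
    return start_hour <= clock.hour < stop_hour


# Hypothetical nightly window of 00:00-04:00; receiving member at UTC+10.
now = datetime(2024, 1, 1, 15, 30, tzinfo=timezone.utc)
local = schedule_clock(now, timedelta(hours=10), interpret_in_utc=False)
utc = schedule_clock(now, timedelta(hours=10), interpret_in_utc=True)
print(local.isoformat(), in_window(local, 0, 4))  # 01:30 local time: inside the window
print(utc.isoformat(), in_window(utc, 0, 4))      # 15:30 UTC: outside the window
```

The same instant falls inside the window under one interpretation and outside it under the other, which is why the setting matters for members in distant time zones.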
Replication Filters
You
can configure file and subfolder filters to prevent files and
subfolders from replicating. Both types of filters are set on a
per-replicated folder basis. You exclude subfolders by specifying their
name or by using the asterisk (*) wildcard character. You exclude files
by specifying their name or by using the asterisk (*) wildcard character
to specify file names and extensions.
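Filter matching behaves much like shell-style wildcard matching. The sketch below uses Python's `fnmatch` as a stand-in and lowercases names to mimic case-insensitive Windows file names. The file filter values (~*, *.bak, *.tmp) reflect what we believe are DFS Replication's defaults and should be treated as an assumption. Note that `fnmatch` also interprets `?`, `[`, and `]` as pattern syntax, which goes beyond the asterisk wildcard the text describes.

```python
from fnmatch import fnmatchcase

# Assumed defaults: temporary and backup files are excluded, no subfolders are excluded.
DEFAULT_FILE_FILTER = ("~*", "*.bak", "*.tmp")
DEFAULT_SUBFOLDER_FILTER = ()


def is_replicated(relative_path, file_filter=DEFAULT_FILE_FILTER,
                  subfolder_filter=DEFAULT_SUBFOLDER_FILTER):
    """False if the file name, or any folder above it, matches an exclusion filter."""
    *folders, name = relative_path.lower().split("\\")
    for folder in folders:
        if any(fnmatchcase(folder, pattern.lower()) for pattern in subfolder_filter):
            return False
    return not any(fnmatchcase(name, pattern.lower()) for pattern in file_filter)


print(is_replicated(r"Reports\~budget.xlsx"))  # prints False: matches ~*
```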
Staging Folder
DFS Replication
uses staging folders to act as caches for new and changed files to be
replicated from sending members to receiving members. Each replicated
folder uses its own staging folder, and each staging folder has a
configurable quota. The quota, which governs when files are purged based
on high and low watermarks, must be carefully set based on each
replicated folder’s replication activity and the disk space available on
the server.
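The watermark behavior can be sketched as follows. We believe the 90 percent high and 60 percent low watermarks are DFS Replication's defaults, but treat them as assumptions. Purging strictly oldest-first by last use is likewise a simplification, and `StagedFile` and `purge_staging` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StagedFile:
    name: str
    size_mb: int
    last_used: float  # older files are purged first


def purge_staging(files, quota_mb, high_pct=90, low_pct=60):
    """Return (kept, purged) after high/low watermark cleanup of a staging folder."""
    used = sum(f.size_mb for f in files)
    if used * 100 < high_pct * quota_mb:
        return list(files), []  # below the high watermark: nothing to do
    kept = sorted(files, key=lambda f: f.last_used)  # oldest first
    purged = []
    while kept and used * 100 > low_pct * quota_mb:
        oldest = kept.pop(0)
        purged.append(oldest)
        used -= oldest.size_mb
    return kept, purged
```

The gap between the two watermarks is the point: one purge frees a block of space, so cleanup does not run again after every new staged file.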
Conflict And Deleted Folder
DFS Replication uses a
last writer wins method for determining which version of a file to keep
when a file is modified on two or more members and each member has not
seen the other’s version. The losing file is stored in the Conflict And
Deleted folder on the member that resolves the conflict. The Conflict
And Deleted folder can also be used to store files that are deleted from
replicated folders. Each Conflict And Deleted folder has a quota that
governs when files are purged for cleanup purposes.
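A minimal sketch of last-writer-wins resolution follows. It assumes each version carries a comparable last-write timestamp and breaks ties arbitrarily, since the real tie-breaking rule is not modeled. It also uses a simple oldest-first purge for the Conflict And Deleted quota. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    member: str
    modified: float  # last-write time used to pick the winner
    content: bytes


class ConflictAndDeleted:
    """Holds losing versions; purges the oldest entries once the quota is exceeded (simplified)."""

    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.items = []  # oldest first

    def store(self, version):
        self.items.append(version)
        while sum(len(v.content) for v in self.items) > self.quota:
            self.items.pop(0)


def resolve_conflict(a, b, conflict_store):
    """Last writer wins: keep the later write and move the other version aside.

    Ties go to a here; the real tie-breaking rule is not modeled.
    """
    winner, loser = (a, b) if a.modified >= b.modified else (b, a)
    conflict_store.store(loser)
    return winner
```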
Disabled Memberships
A membership
defines the relationship between each replicated folder/member pair.
Each membership has a status, either enabled or disabled. If you do not
want a replicated folder to be replicated to certain members, you can
disable the memberships for those members. Doing so allows you to
replicate folders to only a subset of replication group members.
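Disabled memberships can be thought of as a per-folder mask over the replication group. In this sketch, which uses hypothetical names throughout, a membership is enabled unless it is listed as disabled. A connection carries a folder only when both of its members have enabled memberships for that folder, which is an assumption of the model.

```python
def membership_enabled(memberships, folder, member):
    """Memberships are enabled unless explicitly disabled (an assumption of this sketch)."""
    return memberships.get((folder, member), True)


def replicating_members(members, memberships, folder):
    """Members that hold a replica of folder."""
    return [m for m in members if membership_enabled(memberships, folder, m)]


def active_connections(connections, memberships, folder):
    """Connections that carry folder: both ends must have an enabled membership for it."""
    return [(send, recv) for send, recv in connections
            if membership_enabled(memberships, folder, send)
            and membership_enabled(memberships, folder, recv)]


members = ["HQ", "BR1", "BR2"]
connections = [("HQ", "BR1"), ("BR1", "HQ"), ("HQ", "BR2"), ("BR2", "HQ")]
memberships = {("Projects", "BR2"): False}  # hypothetical: keep Projects off branch office 2
print(replicating_members(members, memberships, "Projects"))  # prints ['HQ', 'BR1']
```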
Overview of the DFS Design Process
If you decide to implement DFS, you can use the following general outline to plan your DFS design:
Identify data to replicate.
Make initial namespace decisions.
Design the replication topology.
Plan for high availability and business continuity.
Design the namespace hierarchy and functionality.
Design replication schedules and bandwidth throttling.
Review performance and optimization guidelines.
Plan for DFS Replication deployment.
More Info: Designing DFS
For a detailed description of DFS planning and design, visit http://technet.microsoft.com and search for an article entitled “Designing Distributed File Systems.”