Business requirements drive your replication configuration and method.
Nailing down all the details of those requirements is the hardest part
of a data replication design process. After you have completed the
requirements gathering, the replication design usually falls into place
fairly easily. It is also highly recommended that you get a prototype up
and running as quickly as possible so that you can measure the
effectiveness of one approach over another. You must understand several
key aspects to make the right design decisions, including the following:
How many sites are in scope, and how autonomous is each site (by location)?
Which sites have the master data (data ownership)?
What is the data latency requirement (by site)?
What types of data accesses are being made (by site)?
Reads
Writes
Updates
Deletes
This information needs to include exactly which data, and which data subsets that drive filtering, are needed for the data accesses (by site).
What is the volume of activity/transactions, including the number of users (by site)?
How many machines do you have to work with (by site)?
What are the available processing power (CPU and memory) and disk space on each of these machines (by site)?
What are the stability, speed, and saturation level of the network connections between machines (by site)?
What is the dial-in, Internet, or other access mechanism requirement for the data?
What potential subscriber or publisher database engines are involved?
Figure 1 shows the factors that contribute to replication designs and
the data replication configuration that would best suit each case. It is
only a partial table because of the numerous factors and many
replication configuration options available. However, it gives a good
idea of the general design approach described here. Perhaps 95% of user
requirements can be classified fairly easily. The other 5% might take
some imagination in determining the best overall solution. Depending on
the requirements that need to be supported, you might even end up with a solution using something like database mirroring or other distribution techniques.
Data Characteristics
You need to analyze the
underlying data types and characteristics thoroughly. Issues such as
collation or character set and data sorting come into play. You must be
aware of what they are set to on all nodes of your replication
configuration. SQL Server 2008 does not convert the replicated data and
might even mistranslate the data as it is replicated because it is
impossible to map all characters between character sets. It is best to
look up the character set “mapping chart” for SQL Server replication to
all other data target environments. Most are covered well, but problems
arise with certain data types and column properties, such as image, timestamp, and identity.
Sometimes, using the Unicode data types at all sites is best for
consistency. Following is a general list of issues to watch out for in
this regard:
Collation consistency across all nodes of replication.
timestamp column data in replication. It might not be what you think: despite its name, timestamp (rowversion) holds a binary row-version value, not a date or time.
identity and uniqueidentifier (GUID) column behavior with data replication.
text or image data types to heterogeneous subscribers.
Missing or unsupported data types because of prior versions of SQL Server or heterogeneous subscribers as part of the replication configuration.
Differences in maximum row size limitations between merge replication and transactional replication.
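As a minimal sketch of the collation check in the list above, the following T-SQL could be run on each SQL Server node (publisher, distributor, and subscribers) and the results compared by hand. It assumes you run it in the context of the published or subscription database; the column aliases are illustrative only.

```sql
-- Run on every SQL Server node in the replication topology,
-- then compare the output across nodes.
SELECT
    SERVERPROPERTY('Collation')                AS server_collation,
    DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation;

-- List user-table columns whose collation differs from the database default.
-- These are the columns most likely to sort or compare differently
-- on another node.
SELECT
    OBJECT_SCHEMA_NAME(c.object_id) AS schema_name,
    OBJECT_NAME(c.object_id)        AS table_name,
    c.name                          AS column_name,
    c.collation_name
FROM sys.columns AS c
JOIN sys.tables  AS t
    ON t.object_id = c.object_id
WHERE c.collation_name IS NOT NULL
  AND c.collation_name <> CAST(DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS sysname);
```

Heterogeneous (non-SQL Server) subscribers need their own equivalent check; this sketch covers SQL Server nodes only.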
Figure 2 lists further SQL Server 2008 replication object limitations.

Note
If you have triggers on your tables and you want them to be replicated
along with your tables, you might want to add the NOT FOR REPLICATION
option to each trigger definition. With this option, the trigger does not
fire when a replication agent applies changes, so the trigger code isn't
executed redundantly on the subscriber side.
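A hedged sketch of where the NOT FOR REPLICATION clause sits in a trigger definition follows. The table dbo.Orders, the audit table dbo.OrdersAudit, their columns, and the trigger name are all hypothetical and exist only for illustration.

```sql
-- Hypothetical objects: dbo.Orders, dbo.OrdersAudit, trg_Orders_Audit.
-- NOT FOR REPLICATION goes after the trigger event list and before AS.
CREATE TRIGGER dbo.trg_Orders_Audit
ON dbo.Orders
AFTER INSERT, UPDATE
NOT FOR REPLICATION
AS
BEGIN
    SET NOCOUNT ON;

    -- Record each row changed by a user transaction.
    -- This body is skipped when a replication agent applies the change.
    INSERT INTO dbo.OrdersAudit (OrderID, ChangedAt)
    SELECT i.OrderID, GETDATE()
    FROM inserted AS i;
END;
```

Whether you want this behavior depends on the design: if the audit rows are themselves replicated, suppressing the trigger on the subscriber avoids duplicates; if they are not, you may prefer to let the trigger fire.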