I'm wondering if the technology exists to use SAN replication for sql server 2005 disaster recovery.
I have a bunch of prod servers I want to add to a san, I then want to have another bunch of servers at a dr site connected to another san.
Is there a technolgy ( non sql ) to enable full ( real time ) san replication of the data ( for SQL Server databases ).
I don't need alternate suggestions, this solution has been proposed to my clients, I don't think it's available, any confirmation one way or another would be very helpful, thanks.
Yes,
I am working with the HP tecnology Continuos Access, you can replicate the data on asyncrono or syncrono way. The answer is: If you have one hardware solution with high availibility ( duplicate cache, duplicate fibra,etc.) , you can use this solution as disaster solution to SQL, Exchange, etc. and it is a solution for the future ( it is independent of the software and version).
|||Hmm I'll have word with HP, I tried to speak with EMC without much success, the one person I did get to speak with didn't think it was possible.
SAN replication only really gives availability I still need to be able to protect against data damage or loss, still it's a start thanks.
|||EMC does have a long-distance SAN-based replication technology, but it does not come cheap. It is based on their Symmetrix line of storage arrays.
So, the answer is "yes, for a price".
Kevin Farlee
SQL Storage PM
|||You should take a look at this whitepaper SQL Server 2000 I/O Basics http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx#EADAE to get an understanding of some of the considerations. Be aware that simply replicating the changes on the disks might leave you with a version of the data on the second site that never really existed at the primary site, meaning that there's a possibility that SQL Server would not be able to recover the database once the files were attached to SQL Server. Since SQL Server relies on writes to the disks occuring in a certain order, you need some form of 'consistency group' mechanism that allows you to group the data and log files together so that the writes are properly ordered on the secondary as well.
There are geographically-dispersed failover clusters that have some storage-level replication for the SQL Server disks. You can check http://www.microsoft.com/windows/catalog/server/default.aspx for potential solutions (click on the Hardware tab, then on 'Cluster Solutions > Geographically Dispersed Cluster Solutions).
|||Hi Don,
I talked several times with you on IT Forum and PASS (Barcelona) above this topic. We are using :
StorageWorks EVA5000 / CLX EVA for Windows / Secure Path / ProLiant BL45p with HPBALCF 105 and StorageWorks
But the name of the technology that uses CLX EVA to replicate between EVAs is Continuos Acess.
With this solution, do you see the problems that you comment in your message with SQL?
|||Yes the solution is available today - we are currently developing a prototype for SAN based replication for Disaster Recovery.
We have a EMC clariion CX500 and are using RepliStore...
Real time replication is limited by the money you wish to spend for the "Pipe" / bandwidth and the # of transactions...
|||I'm beginning to think this is overkill for what the business actually need. As a DBA I'm pretty well up on HA solutions but this is going beyond my knowledge. I hadn't actually thought about the sequence of i/o , with applications that work with mutliple databases that could be a nightmare. The proposal was that the DR SQL Servers would be on-line, but what you're saying is that SAN replication would only replicate the datafiles, in DR I'd actually then have to attach all the databases to the replicated files so my DR site wouldn't even be warm.
I should say that I'm interpreting a proposal, not to necesscarily dismiss it, but I have to understand the proposal ( and if it's actually possible/practical ) if I'm going to recommend we move forward with it.
Thanks for starting this form btw .. just what I was looking for!
|||As you can see from just these replies there are several different products and configurations from several vendors, and the options are continually changing. So within SQL Server, rather than try to test every combination from many different vendors, we make sure we document how our product works and what its requirements are so that customers and vendors can determine which of the various products and configurations are appropriate for use with SQL Server. And of course, we have our own built-in features for much of this space (such as Replication and Database Mirroring) which are designed with our requirements in mind and tested to extreme levels.
Hopefully the various vendors strive to understand how SQL Server's needs differ from mere file-level replication. If they really are concerned with ensuring your data is reliably replicated while addressing our requirements, they likely have customer SQL Server-specific whitepapers and well-tested configurations that can provide the appropriate guarantees. However it is also up to the buying customer to be aware of the requirements (as described in our SQL Server IO Basics whitepaper) and do their own questioning and testing so they know they can trust the solution they ultimately choose.
Regarding 'continuous' replication forms, realize that SQL Server has requirements regarding write-ordering across devices. Any solution that does not take the relationship of devices into account can have the data in the files appear slightly differently at the secondary than at the primary. This might lead to an unrecoverable database at the secondary. 'Continuous' may also mean either synchronous or asynchronous, depending on the solution/vendor. Synchronous is great, but it probably will slow down operations on the primary since write have to occur locally as well as remotely before they can be acknowledged and the transaction committed locally; so for larger distances synchronous solutions cause unacceptable response times. Async solutions write locally and then write remotely at some 'later' time (perhaps sub-second, but still at a later time); this means there's information recorded at the primary that might not yet be recorded remotely when the primary fails. This means that committed transactions can be lost (in addition to the possible write-ordering consideration from above).
There's also considerations with block-level or track-level replication: Does the write at the secondary cause the data/log data to be rewritten, resulting in a possible loss of previously committed changes.
I've also heard vendors say that there's no data loss in the general case. But of course it's precisely in the disaster case when you need this to have no data loss, not the general case when everything is going well.
Anyway, strive to understand the vendor's solution, compare it to SQL Server's requirements, test it 'on paper' to find any theoretical issues, and test it hands-on to see if it can survive the scenarios you need it to. And of course, train your people, document your processes, and periodically test your failure processes so you know they work and so that there aren't surprises when a true disaster occurs and your most-junior person is trying to get your billion-dollar system back online at your secondary.
|||I'm testing san 2 san replication on 2 different sites (300+ miles between them). Xiotech, product to replicate san disks called kashya. It is a driver, the primary site sees SAN, to set up the DR side you'll need to disable the driver, then bring ms sql exactly as it is set up on the primary side exept of all the databases, just your system databases need to be in the right place, so when the replication starts anf the primaru site fails (simulated by posix library kill in our case), then the sql on DR side actually can start and don't complain about things.
Kashya claims as many other venders to support consistency groups, have 0 data loss and work as fast as possible etc. Will see very soon....
|||I'm in a sensitive position both politically and professionally, as a SQL DBA contractor I'm seen as something of a sql server "expert" and politically I have to be aware of the people around me involved in the proposals and business plans. The whole project revolves around server consolidation and the introduction of a SAN, although I'm not totally sure it will all happen at once. The proposed soution shows database mirroring for 100 databases, which I understand is not really practical on top of san replication .. your new forum was just waht I needed - thanks. I appreciate the advice.|||The project is similar, but most of the production is sql 2000 and I deal with VLDB, some of them produce their own size in transaction logs under 8h so it’s a nice one.
The task would be sensitive to amount of data to replicate and how busy your databases are and how thick is your channel between 2 data centers. How many databases 1 or 100 – doesn’t matter, att the business is happening under ms sql server any ways. And of course choose your SAN is, LUN, NICS, routers, OS on servers carefully and make your self nice 100 paged tech spec, it will save you head one day if thing will turn not the way you’d like them to be J. If it’s EMC even do not think about getting any way from their supported configurations, they will drop the support.
With sql 2005 I’m looking at their database mirroring. On other side I will not have to support plenty legacy applications in sql 2005 that have no clue about living somewhere else (not fault tolerant, do not reconnect automatically, use getdate() for timestamps so will go crazy when all that mumbo-jumbo will fail over on a server in a different time zone and the list goes on).
Honestly, when we are talking DR I’d worry about your applications first. Can they handle a simple fail over like cluster failover, can they handle DR comes next, are you going to have the same domain, the same or different server names, what are you running on that ms sql instance can have nice binary attributes that contain plenty configuration information in every reports meta data for example atc. So how to replicate a ms sql database is not a question, the question is if that thing that uses it is going to be in peace with DR environment. And if the answer is no, you are not there yet, your client is not there yet to be exact.
When you will talk san 2 san replication in many cases you will get warm stand by, and will have to deploy your applications on both sides at the same time, so if there is no good configuration management, I’d say solve that problem first or the maintenance will bite you good… at least. And warm stand by is not so easy for a business to swallow because they have to keep the clone of your production in some other place in some other state with all the w2 admins in place and you are telling them that all that stuff will get busy if and only if your main site will get some nasty X days power outage when your generators are out of juice or earth quake or hurricane or something and if not there is no way to load balance 2 sites because it’s whole different song J.
With sql 2000 one does not have plenty options when it comes to DR: replication, log shipping, VDI like solution (same log shipping in profile) san 2 san replication 9depends what are they using inside) , with sql 2005 there is one more option – mirroring, didn’t try it yet, don’t know how good is it, it’s in my plan soon enough.
If your business can tolerate 5-15 min data loss, maybe you are better of with log shipping to a hot standby or some similar implementation of ‘pure man’s cluster’ J, of course you will not flood your channel in process. Many san 2 san replication venders claim they can do stuff faster and transfer less then any software on top of them and it as sync, not as async (log shipping, sql replication etc. custom stuff sitting on VSS, VDI). Be aware of commercials… esp white paper stuff. This is geared not for you to buy it’s for them to sell it to you. To ‘flush’ ms sql cache they will have to call one of 2 things: VDI (ms backup api like VERITAS and plenty 3-rd party venders do) or VSS (same 3-rd party venders but applicable to server 2003), in other words they will have to flush cache, and mo often they take a snapshot more often they will have to do it, at least transaction log they will have to flush and system databases or when your files grow and your master is not aware of it yet the restore on DR side if not going to look pretty.
san 2 san replication can translate into some form replicate disk blocks replication, ask for a nice trip to the venders site, have your scripts and databases to test ready and sit there and start breaking their system. This will show you what is for real and what a commercial indeed is. This can save you and your client some good money. As a contractor once I was involved into a database project on san when they even didn’t go as far as replication even, they had trouble setting up their clusters on the hardware purchased because EMC specs were not checked to begin with, it was not very nice, they ended up paying for 64bit clusters and were forced to run on the monsters 32bit os with all the consequences and deal with plenty other troubles... if the hardware is already on site and the vender was not ‘handled’ properly there is not so much you as a contractor can do, some things just weren’t meant to work together and you are the one who has to say it and they are not going to like it… I do not like cases like that one, so the homework comes first.
|||
Apps are something else, I don't even want to go there. As my client is a bank any loss of data would be unacceptable. I have three issues really, HA for the databases for the business, DR to cover site loss and business continuity, which is HA and DR rolled into one. I have a number of proposed solutions and requirements to cover these. It's the replication of data between SAN's which interests me as this is the proposed solution for almoat all scenarios. I'm unconvinced ( as you might guess ! ) as I don't think it actually meets any of my requirements.
I understand where Don is coming from and I'm grateful for all assistance and input - but has anyone implemented SAN replication with sql server and tested it in failover ? It doesn't seem to me to be the case.
My experiences with SAN's to date have not been good and a colleague had absolute nightmares where the Vendor would not agree with his requests for configuration and dismissed the Siebel SAN doc as being largely incorrect. In the end he was right, they were wrong, but his employer lost a bundle of money and repuatation, still that's an aside.
|||0 data loss can be guatanteed in 2 cases:
distributed database and your apps are running distributed transactions. consequences you know, they have to be developed that way.
sql 2005 mirroring when you tell it to commit if and only if the data were delivered on the otehr side. will slow down your oltp.
Replication assumes data loass, it's async no matter what white papper sais. All the solutions I know are taking snapshots. so get yourself a simple test, kill your server, and see what does restore do for you, how many transactions were rolled forward, how many were rolled back. Document that one. Then repeat the processs and start your replication afte the server is killed. On the other side you will see who much data did you loose if any in comparison with the restore on the same side. identity insert is not a bad thing or sequential int (works faster :) ) to test your data consistency. In your case I'd test very short transactions like under 5 statenets (one statement works one row).
In my case on xiotech san I do expect data loss. How much - tests will show.
|||Take a look at http://www.microsoft.com/sql/alwayson.mspx .. it's a new site (just announced here at TechEd in Boston) where we list solutions from various vendors that we've reviewed. We point to a whitepaper that they provide that describes how their solution addresses the requirements that SQL Server has on storage systems, including those that use some sort of replication to provide HA or DR capabilities.
There are synchronous solutions that do provide zero data loss, not acknowledging the disk write until it's recorded both locally and remotely. Of course there will likely be some amount of hit on the response time, but if the remote site isn't too remote the app can likely tolerate it.
There are some asynchronous systems, but of course there can be some amount of data loss when the local write is acknowledged before it's sent to the remote site, if a crash occurs at just the wrong time, but with appropriate consistency mechanisms you at least have a guarantee that the remote site has a version of the data/log that actually existed at some point at the local site, and is therefore valid and recoverable.
Don
No comments:
Post a Comment