A while ago I took part in several discussions on cross-datacenter system reliability. In light of recent cloud computing outages, particularly the Amazon EBS incident, designing and implementing cross-datacenter reliability is of real significance to the development of cloud computing technology and services. System reliability comes in several grades, including real-time replication and asynchronous disaster recovery: the former requires special design in the system itself, to guarantee that every update is synchronized between sites; the latter is much simpler, needing only a nightly asynchronous sync of the day's business data.
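The difference between the two grades can be made concrete with a minimal sketch (toy classes of my own, not any vendor's API): a synchronous write touches every replica before acknowledging, while the disaster-recovery copy only catches up in a nightly batch.

```python
class Replica:
    """One copy of the data, e.g. hosted in a separate data center."""
    def __init__(self):
        self.store = {}

def synchronous_write(key, value, replicas):
    # Real-time replication: every update is pushed to all replicas
    # before the write is acknowledged to the client.
    for r in replicas:
        r.store[key] = value
    return "ack"

def nightly_sync(primary, standby):
    # Asynchronous disaster recovery: the standby is only brought up
    # to date in a nightly batch, so it may lag behind the primary.
    standby.store.update(primary.store)

primary, dr_site = Replica(), Replica()
synchronous_write("order:1", "paid", [primary, dr_site])  # both copies updated
primary.store["order:2"] = "new"                          # local write only; DR lags
nightly_sync(primary, dr_site)                            # batch catch-up at night
```

The price of the first grade is that every write pays the inter-DC round trip; the price of the second is a window of data loss up to one day.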
Let us survey the current state of cross-DC reliability.
1. Amazon AWS: reliability is achieved by the application and the platform working together, and asynchronous replication is available. AWS has the concept of a Zone, which can be viewed as a cluster within a data center, i.e. a nearby DC; above Zones sits the Region, which can be viewed as a remote DC.
Data stored in Amazon S3, Amazon SimpleDB, or Amazon Elastic Block Store is redundantly stored in multiple physical locations as part of normal operation of those services and at no additional charge. Amazon S3 and Amazon SimpleDB ensure object durability by storing objects multiple times across multiple datacenters on the initial write and then actively doing further replication in the event of device unavailability or detected bit-rot.
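The S3/SimpleDB behavior quoted above, replicating on the initial write and then actively re-replicating when a device fails or bit-rot is detected, can be sketched with a toy model (this is not the actual AWS implementation; the zone names and checksum scheme are illustrative):

```python
import hashlib

class RedundantStore:
    """Toy model of a store that keeps every object in several zones
    and re-replicates when a copy is lost or corrupted."""

    def __init__(self, zones, copies=3):
        self.zones = {z: {} for z in zones}  # zone name -> {key: (data, checksum)}
        self.copies = copies

    def put(self, key, data):
        # The initial write already lands in multiple physical locations.
        digest = hashlib.sha256(data).hexdigest()
        for zone in list(self.zones)[: self.copies]:
            self.zones[zone][key] = (data, digest)

    def repair(self, key):
        # Active replication: find a healthy copy and restore any zone
        # whose copy is missing or fails its checksum (bit-rot).
        good = None
        for objs in self.zones.values():
            if key in objs:
                data, digest = objs[key]
                if hashlib.sha256(data).hexdigest() == digest:
                    good = (data, digest)
                    break
        if good is None:
            raise IOError("object lost in all zones")
        for zone in list(self.zones)[: self.copies]:
            self.zones[zone][key] = good

store = RedundantStore(["us-east-1a", "us-east-1b", "us-east-1c"])
store.put("photo.jpg", b"...bytes...")
del store.zones["us-east-1b"]["photo.jpg"]  # simulate a device failure
store.repair("photo.jpg")                   # copy restored from a healthy zone
```

The key point for the application is that durability is handled below the API: the client sees one `put`, while the platform decides where the copies live and when to repair them.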
2. OpenStack: a demonstration platform spanning three DCs, built on open-source software, is currently under development.
Dell Inc. (Nasdaq: DELL), Equinix, Inc.(NASDAQ: EQIX) and Rackspace Hosting, Inc. (NYSE: RAX) today announced that they have collaborated to develop an OpenStack™ cloud demonstration and test environment. The demo environment will be available in three data centers: the Equinix International Business Exchange™ (IBX®) in Silicon Valley, Calif., and Ashburn, Va., and the Rackspace data center in Chicago, Ill. The companies also plan to collaborate on additional demo environments in Equinix data centers in Europe and Asia in the second quarter of 2011. The demonstration environment allows organizations to easily and rapidly assess applications on the open-source cloud platform, simplifying the evaluation process and speeding the deployment of OpenStack proof of concepts (POCs). It is the first demonstration of an open standard platform where geographically dispersed OpenStack clouds can offer customers the ability to move applications and workloads between them.
3. Rackspace: does not currently provide cross-DC reliability.
4. HBase: does not currently support cross-DC deployment, but it provides a Bulk Load tool, which could be used for off-site disaster recovery.
Bulk uploader needs to be able to tolerate myriad data input types. Data will likely need massaging and ultimately, if writing HRegion content directly into HDFS rather than going against hbase API – preferred since it’ll be dog slow doing bulk uploads going against hbase API – then it has to be sorted. Using mapreduce would make sense.
Look too at using PIG because it has a few LOAD implementations – from files on local or HDFS – and some facility for doing transforms on data moving tuples around. Would need to write a special STORE operator that wrote the data sorted out as HRegions direct into HDFS (This would be different than PIG-6 which is about writing into hbase via API).
Also, chatting with Jim, this is a pretty important issue. This is the first folks run into when they start to get serious about hbase.
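The core of the approach in the quote, sorting the input and writing region content directly rather than issuing slow per-row writes through the HBase API, can be sketched as a sort-and-partition pass (a pure-Python stand-in for the MapReduce job; the split keys and the in-memory "files" are illustrative):

```python
import bisect

def bulk_load_partition(records, split_keys):
    """Toy stand-in for the MapReduce bulk-load job: sort key/value
    records and bucket them into per-region output 'files', the way a
    real job would lay out HRegion content directly on HDFS.
    Region i holds keys < split_keys[i]; the last region is unbounded."""
    regions = [[] for _ in range(len(split_keys) + 1)]
    for key, value in sorted(records):              # total order, like the shuffle/sort phase
        idx = bisect.bisect_right(split_keys, key)  # which region's key range this row falls in
        regions[idx].append((key, value))
    return regions

rows = [("row9", "v9"), ("row1", "v1"), ("row5", "v5"), ("row3", "v3")]
files = bulk_load_partition(rows, split_keys=["row4", "row8"])
```

For disaster recovery, the same idea applies in reverse: the sorted region files produced at one site can be shipped to a remote cluster and loaded there in bulk, without replaying individual writes.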
5. MongoDB: reportedly able to synchronize data across multiple DCs; since MongoDB's reliability guarantees are driven from the client side, the application itself must ensure reliability.
In versions 1.6 and 1.8, and when more sophisticated setups are not necessary, basic configuration choices can be sufficient to ensure good performance across multiple data centers. Primary plus DR site: use one site, with one or more set members, as primary, and have a member at a remote site with priority=0. Multi-site with local reads: another configuration would be to have one member in each of three data centers. One node arbitrarily becomes primary; the others are secondaries and can process reads locally.
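The two layouts from the quote can be written down as replica-set configuration documents. Below is a sketch as Python dicts in the shape the replSetInitiate command expects (the host names are made up for illustration):

```python
# "Primary plus DR site": members at the primary site, plus one remote
# member with priority 0 so it can never be elected primary.
primary_plus_dr = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "dc1-a.example.com:27017"},
        {"_id": 1, "host": "dc1-b.example.com:27017"},
        {"_id": 2, "host": "dr-site.example.com:27017", "priority": 0},
    ],
}

# "Multi-site with local reads": one member in each of three data centers;
# whichever becomes primary, the other two serve reads locally as secondaries.
multi_site = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "dc1.example.com:27017"},
        {"_id": 1, "host": "dc2.example.com:27017"},
        {"_id": 2, "host": "dc3.example.com:27017"},
    ],
}
# Either document would be passed to the replSetInitiate command,
# e.g. client.admin.command("replSetInitiate", primary_plus_dr) in pymongo.
```

Note that priority=0 is what turns the remote member into a pure DR copy: it replicates everything but never takes over as primary across the WAN.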
6. Linode: the underlying platform provides no reliability guarantees; the application must provide them itself, whether by synchronous or asynchronous replication.
That sounds like a lot of work, doing it all manually. You’re probably better off coming up with something simple but automated. For example, nightly syncing via rsync. Run a cron job on your primary that, through SSH, initiates a blocking (consistent, if your database access is transactional) database dump, rsyncs it and any website changes to the secondary box, and initiates a database import on the secondary box. In terms of more timely updates for when you post an article, if you’ve got some sort of article posting script, you can just have it execute the script that the nightly cron job executes (or if you have no article posting script, initiate it yourself). The sync should be pretty fast since little will have changed, and while the import on the secondary box might take a while, that shouldn’t matter since your primary box doesn’t need to wait on that.
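The nightly cron job described above boils down to three commands. As a hedged sketch, the snippet below only composes them (hosts, paths, and database names are placeholders); a real deployment would run each via subprocess and schedule the script from cron on the primary:

```python
def nightly_sync_commands(db, remote, site_dir="/var/www", dump="/tmp/db.sql"):
    """Compose the three steps of the nightly sync: a consistent dump on
    the primary, an rsync of the dump plus site files to the secondary,
    and an import on the secondary. Hosts and paths are placeholders."""
    return [
        # 1. Blocking, consistent dump (works for transactional engines).
        ["mysqldump", "--single-transaction", db, "--result-file", dump],
        # 2. Push the dump and any changed site files to the secondary box.
        ["rsync", "-az", dump, site_dir, f"{remote}:/"],
        # 3. Import on the secondary; the primary does not wait on this.
        ["ssh", remote, f"mysql {db} < {dump}"],
    ]

for cmd in nightly_sync_commands("blogdb", "secondary.example.com"):
    print(" ".join(cmd))
```

As the quote notes, the same function can be invoked from an article-posting hook for more timely updates, since rsync transfers only what changed.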
Related information: