The contents of /etc/hosts are:
172.2.9.220 rhcs1
172.2.9.221 rhcs2
172.2.9.222 rhcs_vip
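The contents of cluster.conf are:
The fence device is managed manually, following Shi Yingsheng's 《红帽集群 (高可用性) 配置,管理和维护-最强版》.
The problem is this: when I boot the first machine the cluster comes up, but as soon as I boot the second machine the cluster breaks apart. Here is the log: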
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] entering GATHER state from 11.
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] Creating commit token because I am the rep.
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] Saving state aru 1d high seq received 1d
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] Storing new sequence id for ring 80
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] entering COMMIT state.
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] entering RECOVERY state.
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] position [0] member 172.2.9.220:
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] previous ring seq 124 rep 172.2.9.220
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] aru 1d high delivered 1d received flag 1
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] position [1] member 172.2.9.221:
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] previous ring seq 120 rep 172.2.9.221
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] aru a high delivered a received flag 1
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] Did not need to originate any messages in recovery.
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] Sending initial ORF token
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] CLM CONFIGURATION CHANGE
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] New Configuration:
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] r(0) ip(172.2.9.220)
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] Members Left:
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] Members Joined:
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] CLM CONFIGURATION CHANGE
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] New Configuration:
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] r(0) ip(172.2.9.220)
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] r(0) ip(172.2.9.221)
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] Members Left:
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] Members Joined:
Aug 28 02:18:08 rhcs1 openais[2788]: [CLM ] r(0) ip(172.2.9.221)
Aug 28 02:18:08 rhcs1 openais[2788]: [SYNC ] This node is within the primary component and will provide service.
Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] entering OPERATIONAL state.
Aug 28 02:18:08 rhcs1 openais[2788]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart
Aug 28 02:18:08 rhcs1 gfs_controld[2820]: groupd_dispatch error -1 errno 11
Aug 28 02:18:08 rhcs1 gfs_controld[2820]: groupd connection died
Aug 28 02:18:08 rhcs1 gfs_controld[2820]: cluster is down, exiting
Aug 28 02:18:33 rhcs1 ccsd[2779]: Unable to connect to cluster infrastructure after 30 seconds.
Aug 28 02:19:03 rhcs1 ccsd[2779]: Unable to connect to cluster infrastructure after 60 seconds.
Aug 28 02:19:33 rhcs1 ccsd[2779]: Unable to connect to cluster infrastructure after 90 seconds.
Aug 28 02:20:03 rhcs1 ccsd[2779]: Unable to connect to cluster infrastructure after 120 seconds.
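For anyone reading a longer log, the line that states why cman exited is easy to miss among the TOTEM and CLM messages. Below is a minimal Python sketch that picks it out. The function name find_cman_kills and the regular expression are my own illustration, written only against the line format shown in the log above; other openais versions may format the line differently.

```python
import re

# Pattern for the "[CMAN ] cman killed by node N because ..." syslog line.
# Written against the format seen in this post; treat it as an assumption.
CMAN_KILL = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+openais\[\d+\]:\s+"
    r"\[CMAN\s*\]\s+cman killed by node (?P<node>\d+) because (?P<reason>.+)$"
)


def find_cman_kills(lines):
    """Return (time, host, killer_node_id, reason) for each matching line."""
    hits = []
    for line in lines:
        m = CMAN_KILL.match(line.strip())
        if m:
            hits.append(
                (m.group("time"), m.group("host"), int(m.group("node")), m.group("reason"))
            )
    return hits


# Three lines copied from the log in this post.
SAMPLE_LOG = [
    "Aug 28 02:18:08 rhcs1 openais[2788]: [TOTEM] entering OPERATIONAL state.",
    "Aug 28 02:18:08 rhcs1 openais[2788]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart",
    "Aug 28 02:18:08 rhcs1 gfs_controld[2820]: groupd connection died",
]

if __name__ == "__main__":
    for hit in find_cman_kills(SAMPLE_LOG):
        print(hit)
```

To scan a real file, pass an open file object instead of the list, for example find_cman_kills(open("/var/log/messages")). The path is the usual syslog location and may differ on your system.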
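Sometimes the cluster does come up normally, but when I run clusvcadm -r rhcs1 e -m rhcs2 to move the service over to the second machine, it fails with an error saying that the service rhcs2 does not exist.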
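The other issue is the fence device configuration. In system-config-cluster I added a fence device, set it to manual fence, and then added it to both nodes. Is there anything else I need to do after these steps?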
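One way to check that the GUI steps actually landed in the config file is to confirm that every node references a fence device that is also defined. The Python sketch below does that against an embedded sample. The sample is NOT my real cluster.conf: the element and attribute names assume the layout commonly used with the fence_manual agent, and the cluster name rhcs_demo, the device name manual_fence, and the function name nodes_missing_fence are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf fragment (assumed layout, not the poster's file).
SAMPLE_CONF = """<cluster name="rhcs_demo" config_version="1">
  <clusternodes>
    <clusternode name="rhcs1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="manual_fence" nodename="rhcs1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="rhcs2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="manual_fence" nodename="rhcs2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="manual_fence"/>
  </fencedevices>
</cluster>"""


def nodes_missing_fence(xml_text):
    """Return names of nodes with no fence device that is also defined."""
    root = ET.fromstring(xml_text)
    # Device names declared under <fencedevices>.
    defined = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    missing = []
    for node in root.findall("./clusternodes/clusternode"):
        # Device names this node refers to in its fence methods.
        used = {d.get("name") for d in node.findall("./fence/method/device")}
        if not (used & defined):
            missing.append(node.get("name"))
    return missing


if __name__ == "__main__":
    print(nodes_missing_fence(SAMPLE_CONF))
```

An empty list means every node in the sample points at a defined device. To check a real file, read it and pass its text to the function.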
Any guidance from the experts would be much appreciated.