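According to the official documentation, HBase 0.90.x runs only on Hadoop 0.20.x. It does not run on Hadoop 0.21.x, and not on 0.22.x either. HBase loses data when it runs on an HDFS without durable sync, and Hadoop 0.20.2 and Hadoop 0.20.203.0 both lack that feature. At present only the branch-0.20-append branch provides it [1]. No official release includes it, so you have to build a patched Hadoop yourself. Michael Noll's detailed guide, "Building an Hadoop 0.20.x version for HBase 0.90.2", is recommended reading.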
You can also use Cloudera's CDH3, which already includes this patch. Any of the CDH3 betas is sufficient: b2, b3, or b4.
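Because HBase is built on top of Hadoop, it uses hadoop.jar, which ships in HBase's lib directory. That jar is a hadoop.jar the HBase project built from the branch-0.20-append branch. The hadoop.jar used by Hadoop and the one used by HBase must be identical, so replace the hadoop.jar in HBase's lib directory with the one from your Hadoop installation to avoid version conflicts. For example, CDH does not include HDFS-724 while branch-0.20-append does, and HDFS-724 changes the RPC protocol. If you do not replace the jar, the version mismatch causes serious errors and Hadoop will appear to hang.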
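The jar swap can be scripted. The sketch below is a hypothetical bash helper, not part of HBase or Hadoop. It assumes HBase bundles its jar as lib/hadoop-core-*.jar and that the Hadoop 0.20.x installation keeps its core jar at the top level as hadoop-*core*.jar (for example hadoop-0.20.2-core.jar); check the real file names in your releases first. The bundled jar is moved to lib.bak instead of being deleted, so the change is easy to undo.

```shell
# Hypothetical helper: replace the Hadoop jar bundled with HBase by the jar
# that the cluster actually runs.
# ASSUMPTIONS: HBase bundles lib/hadoop-core-*.jar; Hadoop keeps its core jar
# at $HADOOP_HOME/hadoop-*core*.jar. Adjust both globs to your releases.
# Usage: swap_hadoop_jar HBASE_HOME HADOOP_HOME
swap_hadoop_jar() {
  local hbase_home="$1" hadoop_home="$2"
  local src="" f jar

  # Use the first core jar found in the Hadoop installation.
  for f in "$hadoop_home"/hadoop-*core*.jar; do
    if [ -e "$f" ]; then src="$f"; break; fi
  done
  if [ -z "$src" ]; then
    echo "no hadoop core jar found in $hadoop_home" >&2
    return 1
  fi

  # Move the bundled jar(s) aside instead of deleting them.
  mkdir -p "$hbase_home/lib.bak" || return 1
  for jar in "$hbase_home"/lib/hadoop-core-*.jar; do
    if [ -e "$jar" ]; then mv "$jar" "$hbase_home/lib.bak/" || return 1; fi
  done

  # Install the cluster's jar so HBase and HDFS use the same RPC protocol.
  cp "$src" "$hbase_home/lib/"
}
```

For example, `swap_hadoop_jar /usr/hbase-0.92.0 /usr/local/hadoop-0.20.2` on every node, using the paths from this setup.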
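Download hbase-0.92.0.tar.gz and extract it under /usr. On every node, add the HBase environment variables to /etc/profile and change the owner of the HBase directory to the user that runs HBase.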
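A minimal sketch of these three steps as a hypothetical bash function. The exact variables written (HBASE_HOME plus $HBASE_HOME/bin on PATH) are an assumption, since the text only says the settings go into /etc/profile. Pointed at /usr and /etc/profile it has to run as root, once per node.

```shell
# Hypothetical helper: unpack HBase, export HBASE_HOME in a profile file and
# hand the tree to the user that runs HBase.
# ASSUMPTION: the tarball unpacks into a directory named after it
# (hbase-0.92.0.tar.gz -> hbase-0.92.0).
# Usage: install_hbase TARBALL PREFIX OWNER PROFILE
install_hbase() {
  local tarball="$1" prefix="$2" owner="$3" profile="$4"
  local home
  home="$prefix/$(basename "$tarball" .tar.gz)"

  tar -xzf "$tarball" -C "$prefix" || return 1

  # Append the variables only once, so re-running the function is harmless.
  if ! grep -qxF "export HBASE_HOME=$home" "$profile" 2>/dev/null; then
    {
      echo "export HBASE_HOME=$home"
      echo 'export PATH=$PATH:$HBASE_HOME/bin'
    } >> "$profile" || return 1
  fi

  chown -R "$owner" "$home"
}
```

For example, `install_hbase /usr/hbase-0.92.0.tar.gz /usr hadoop /etc/profile`, assuming the user is called hadoop.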
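Edit conf/hbase-env.sh to set the Java and Hadoop directories and the PID and log directories. HBASE_MANAGES_ZK=false tells HBase to use an existing ZooKeeper ensemble instead of starting its own: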
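# Use the separately run ZooKeeper ensemble; HBase will not start/stop ZooKeeper itself
export HBASE_MANAGES_ZK=false
# Directory for the PID files of the HBase daemons
export HBASE_PID_DIR=/usr/local/hadoop-0.20.2/pids
# JDK used to run HBase
export JAVA_HOME=/usr/java/jdk1.7.0_05/
# Hadoop installation directory
export HADOOP_INSTALL=/usr/local/hadoop-0.20.2
# Put Hadoop's conf directory on HBase's classpath so HBase sees the HDFS client settings
export HBASE_CLASSPATH=/usr/local/hadoop-0.20.2/conf
# HBase log directory
export HBASE_LOG_DIR=/home/hadoop/log/hbase-log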
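Before the first start it can help to confirm that the paths in hbase-env.sh exist on every node. The hypothetical bash function below sources the file in a subshell and tests JAVA_HOME, HADOOP_INSTALL and HBASE_CLASSPATH. Expecting hdfs-site.xml inside HBASE_CLASSPATH is an assumption that holds here only because that variable points at Hadoop's conf directory.

```shell
# Hypothetical helper: check the paths exported by hbase-env.sh.
# ASSUMPTION: HBASE_CLASSPATH is Hadoop's conf dir, so it holds hdfs-site.xml.
# Usage: check_hbase_env /path/to/hbase/conf/hbase-env.sh
check_hbase_env() {
  (
    # Source in a subshell so the variables do not leak into the caller.
    . "$1" || exit 1
    status=0
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
      echo "JAVA_HOME has no executable bin/java: $JAVA_HOME" >&2
      status=1
    fi
    if [ ! -d "$HADOOP_INSTALL" ]; then
      echo "HADOOP_INSTALL is not a directory: $HADOOP_INSTALL" >&2
      status=1
    fi
    if [ ! -f "$HBASE_CLASSPATH/hdfs-site.xml" ]; then
      echo "no hdfs-site.xml under HBASE_CLASSPATH: $HBASE_CLASSPATH" >&2
      status=1
    fi
    exit "$status"
  )
}
```

For example, `check_hbase_env /usr/hbase-0.92.0/conf/hbase-env.sh && echo ok` on each node.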
Edit conf/hbase-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://namenode/hbase</value>
</property>
<!-- DataNode-side setting: it only takes effect in hdfs-site.xml on every DataNode -->
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed ZooKeeper
true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>namenode,datanode1,datanode2,datanode3</value>
<description>Comma separated list of servers in the ZooKeeper Quorum.
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
By default this is set to localhost for local and pseudo-distributed modes
of operation. For a fully-distributed setup, this should be set to a full
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop ZooKeeper on.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
<description>The port the master should bind to.</description>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
</configuration>
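hbase.rootdir is an HDFS URI, so its scheme, host and port should name the same NameNode as fs.default.name in Hadoop's core-site.xml. The hypothetical bash helpers below read a property back out of these files and compare the two values. They are a line-oriented sketch, not an XML parser, and assume each name and value element sits on one line as in the file above. The comparison is a plain string match, so hdfs://namenode/hbase is reported as a mismatch against hdfs://namenode:8020 even though an omitted port means the default 8020.

```shell
# Hypothetical helpers. ASSUMPTION: every <name> and <value> element sits on
# a single line, as in the hbase-site.xml above.

# get_prop FILE NAME: print the <value> of property NAME (first match).
get_prop() {
  awk -v key="$2" '
    /<name>/ { n = $0; sub(/.*<name>[ \t]*/, "", n); sub(/[ \t]*<\/name>.*/, "", n) }
    /<value>/ && n == key {
      v = $0; sub(/.*<value>[ \t]*/, "", v); sub(/[ \t]*<\/value>.*/, "", v)
      print v; exit
    }
  ' "$1"
}

# check_rootdir HBASE_SITE CORE_SITE: hbase.rootdir should live under the
# NameNode URI in fs.default.name (plain string comparison, no port defaults).
check_rootdir() {
  local rootdir fsname
  rootdir=$(get_prop "$1" hbase.rootdir)
  fsname=$(get_prop "$2" fs.default.name)
  fsname=${fsname%/}   # tolerate a trailing slash
  case "$rootdir" in
    "$fsname"/*) return 0 ;;
  esac
  echo "hbase.rootdir ($rootdir) is not under fs.default.name ($fsname)" >&2
  return 1
}
```

For example, `check_rootdir /usr/hbase-0.92.0/conf/hbase-site.xml /usr/local/hadoop-0.20.2/conf/core-site.xml`.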
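Set ulimit
Add a line like the following to /etc/security/limits.conf (the "-" sets both the soft and the hard limit):
hadoop  -  nofile  32768
Replace hadoop with the user that runs HBase and Hadoop; if you run them as two different users, add an entry for each. Also set the soft and hard nproc limits, for example:
hadoop  soft  nproc  32000
hadoop  hard  nproc  32000
On Debian/Ubuntu, add this line to /etc/pam.d/common-session: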
session required pam_limits.so
Otherwise the settings in /etc/security/limits.conf do not take effect.
You also have to log out and log back in before these settings apply.
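The entries above can be added and verified with two hypothetical bash helpers. add_limits appends the three lines for one user only if they are missing (run it as root against /etc/security/limits.conf). check_limits compares the current shell's soft limits, as reported by ulimit -n and ulimit -u, with the given minimums, so run it as the HBase user after logging in again.

```shell
# Hypothetical helper: add_limits FILE USER appends the nofile/nproc entries
# for USER, skipping lines that are already present.
add_limits() {
  local file="$1" user="$2" line
  for line in "$user  -  nofile  32768" \
              "$user  soft  nproc  32000" \
              "$user  hard  nproc  32000"; do
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file" || return 1
  done
}

# Hypothetical helper: check_limits MIN_NOFILE MIN_NPROC fails if the soft
# limits of the current shell are below the minimums.
check_limits() {
  local nofile nproc status=0
  nofile=$(ulimit -n)
  nproc=$(ulimit -u)
  if [ "$nofile" != unlimited ] && [ "$nofile" -lt "$1" ]; then
    echo "open files limit is $nofile, expected at least $1" >&2
    status=1
  fi
  if [ "$nproc" != unlimited ] && [ "$nproc" -lt "$2" ]; then
    echo "max user processes is $nproc, expected at least $2" >&2
    status=1
  fi
  return "$status"
}
```

For example, `add_limits /etc/security/limits.conf hadoop` as root, then `check_limits 32768 32000` as hadoop in a new login shell.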
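Edit conf/regionservers
List one RegionServer host per line. The first node, namenode, runs the HBase Master and the HDFS NameNode, so RegionServers are meant to run on all the other nodes. The example below also lists namenode, which starts a RegionServer there as well; drop that line to keep namenode free of a RegionServer: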
namenode
datanode1
datanode2
datanode3
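Every node needs the same configuration, including this file. A hypothetical way to push conf/ from the master to each listed host is sketched below. It assumes passwordless ssh, rsync on all nodes and the same HBase path everywhere; the "echo" dry-run argument is only a convenience of this sketch.

```shell
# Hypothetical helper: copy $HBASE_HOME/conf to every host in
# conf/regionservers. ASSUMPTIONS: passwordless ssh, rsync on every node,
# identical HBASE_HOME path everywhere. Pass "echo" as $2 for a dry run.
sync_conf() {
  local hbase_home="$1" run="${2:-}" host
  # Read the host list on fd 3: the ssh started by rsync would otherwise
  # swallow the rest of the list from stdin. The -n test keeps a last line
  # that has no trailing newline.
  while read -r host <&3 || [ -n "$host" ]; do
    case "$host" in ''|'#'*) continue ;; esac   # skip blank lines and comments
    $run rsync -az "$hbase_home/conf/" "$host:$hbase_home/conf/" || return 1
  done 3< "$hbase_home/conf/regionservers"
}
```

For example, `sync_conf /usr/hbase-0.92.0 echo` to preview the commands, then `sync_conf /usr/hbase-0.92.0` to copy.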
Start HBase, from the HBase installation directory: bin/start-hbase.sh
Stop HBase: bin/stop-hbase.sh
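After start-hbase.sh returns, jps (shipped with the JDK) lists the running Java processes by main-class name; the master appears as HMaster and each region server as HRegionServer. The hypothetical helper below reads jps output on stdin so the same check works on any node.

```shell
# Hypothetical helper: expect_daemons NAME... reads `jps` output on stdin
# ("<pid> <ClassName>" per line) and fails if any NAME is not listed.
expect_daemons() {
  local listing name status=0
  listing=$(awk '{print $2}')   # keep only the class-name column
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qxF "$name"; then
      echo "not running: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `jps | expect_daemons HMaster` on namenode and `jps | expect_daemons HRegionServer` on each host in conf/regionservers.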
If the HBase status web page shows this message:
You are currently running the HMaster without HDFS append support enabled. This may result in data loss.
the fix is described at http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport:
HDFS Sync Support
Overview
In order to provide durability of edits, HBase requires that your HDFS installation supports the sync call. This call pushes pending data through the HDFS write pipeline and blocks until it has received an acknowledgement from all three nodes in the pipeline. HBase uses this feature when writing edits to its write-ahead log (WAL) so that, if a region server should die, the data may be recovered and replayed on other region servers.
What versions of HDFS support sync?
The necessary feature is available in the 0.20-append branch of HDFS, the unreleased 0.21 branch, and Cloudera's CDH3 release [https://docs.cloudera.com/display/DOC/HBase+Installation].
*NOTE:* Apache HDFS 0.20 does not support a working sync, even if the dfs.support.append flag is enabled. You *must* use one of the above versions of Hadoop to have durable edits in HBase.
How can I enable sync?
To enable sync, first ensure that you have either compiled the 0.20-append branch from Apache or installed Cloudera's CDH3. Then ensure that the dfs.support.append flag is set to true both in HDFS's configuration (hdfs-site.xml) and in HBase's (hbase-site.xml):
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
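To confirm the flag is on in both places, a hypothetical check like the one below can help. It assumes Hadoop's conf directory holds hdfs-site.xml, HBase's holds hbase-site.xml, and the value line directly follows the name line as in the snippet above. It only inspects the flag; as the note above says, a plain Apache 0.20 build still has no working sync even when the flag is true.

```shell
# Hypothetical helper: append_enabled FILE succeeds if FILE sets
# dfs.support.append to true. ASSUMPTION: <value> is on the line after <name>.
append_enabled() {
  grep -F -A1 '<name>dfs.support.append</name>' "$1" 2>/dev/null |
    grep -qF '<value>true</value>'
}

# Hypothetical helper: check_append HADOOP_CONF_DIR HBASE_CONF_DIR requires
# the flag in both hdfs-site.xml and hbase-site.xml.
check_append() {
  local f status=0
  for f in "$1/hdfs-site.xml" "$2/hbase-site.xml"; do
    if ! append_enabled "$f"; then
      echo "dfs.support.append is not true in $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_append /usr/local/hadoop-0.20.2/conf /usr/hbase-0.92.0/conf`.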