Accessing Hive with a Python Client

Published: 2016-09-24 08:31:30 | Source: Linux website | Author: faith默默
The approach below works on both Linux and Windows.
 
1. Interacting with HiveServer from Python
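#!/usr/bin/python2.7
# Start the server first: hive --service hiveserver >/dev/null 2>/dev/null &
# On the server, Hive's Python libraries are in /opt/cloudera/parcels/CDH/lib/hive/lib/py
import sys
sys.path.append('C:/hadoop_jar/py')  # local copy of Hive's bundled py directory

from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

if __name__ == '__main__':
    try:
        socket = TSocket.TSocket('10.70.50.111', 10000)
        transport = TTransport.TBufferedTransport(socket)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        sql = 'select * from test'
        transport.open()
        try:
            client.execute(sql)
            with open('C:/Users/DWJ/Desktop/python2hive.txt', 'w') as out_file:
                # Fetch each row exactly once. Calling fetchOne() in both the
                # loop condition and the body would drop every other row.
                row = client.fetchOne()
                while row:
                    # Assumes fetchOne() returns the row without a trailing newline.
                    out_file.write(row + '\n')
                    row = client.fetchOne()
        finally:
            # Close the connection even if the query or the file write fails.
            transport.close()
    except (Thrift.TException, HiveServerException) as tx:
        print('%s' % tx.message)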
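The packages in C:/hadoop_jar/py come from the py directory bundled with the Hive installation, for example /opt/cloudera/parcels/CDH/lib/hive/lib/py. Copy that directory to the client machine and add it to Python's module search path.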
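
A minimal sketch of how the path setup described above could be made a little safer. The helper name add_hive_py_dir is hypothetical (it is not part of Hive or Thrift); it only checks that the directory really contains the hive_service package before appending it to sys.path, so that a wrong path fails early rather than at import time.

```python
import os
import sys


def add_hive_py_dir(py_dir):
    """Append py_dir to sys.path if it contains the hive_service package.

    Hypothetical helper for illustration; raises IOError when the
    directory does not look like Hive's bundled py directory.
    """
    if not os.path.isdir(os.path.join(py_dir, 'hive_service')):
        raise IOError('hive_service package not found under %s' % py_dir)
    if py_dir not in sys.path:
        sys.path.append(py_dir)
    return py_dir


# Example call, using the path from the listing above:
# add_hive_py_dir('C:/hadoop_jar/py')
```

After a successful call, `from hive_service import ThriftHive` should resolve against the copied directory.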
 
2. Interacting with HiveServer2 from Python
#!/usr/bin/python2.7
# Start the server first: hive --service hiveserver2 >/dev/null 2>/dev/null &
# Install pyhs2; first install cyrus-sasl-devel, gcc, libxml2-devel, libxslt-devel
# HiveServer2 differs from HiveServer in how it handles authentication
import pyhs2

with pyhs2.connect(host='xx.xx.xx.xxx', port=10000, authMechanism="NOSASL",
                   user='test', password='testdvlp', database='default') as conn:
    with conn.cursor() as cur:
        # Show databases
        print(cur.getDatabases())
        # Execute query
        cur.execute("select * from test")
        # Return column info from query
        print(cur.getSchema())
        # Fetch table results
        for i in cur.fetch():
            print(i)
The value of authMechanism depends on the following setting in hive-site.xml:
<name>hive.server2.authentication</name>
<value>NOSASL</value>
The default is NONE; other possible values are 'NOSASL', 'PLAIN', 'KERBEROS', and 'LDAP'.
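
To see which value the server is configured with, the setting can be read from hive-site.xml programmatically. The following is a sketch using only the standard library. The function name read_hive_property and the SAMPLE document are illustrative assumptions; in practice you would read the text of your own hive-site.xml.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_text, name, default=None):
    """Return the <value> of the <property> whose <name> matches, else default."""
    root = ET.fromstring(xml_text)
    for prop in root.iter('property'):
        if prop.findtext('name', '').strip() == name:
            value = prop.findtext('value')
            return value.strip() if value is not None else default
    return default


# Hypothetical sample content standing in for a real hive-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.authentication</name>
    <value>NOSASL</value>
  </property>
</configuration>"""

# Fall back to NONE, the default noted above, when the property is absent.
auth = read_hive_property(SAMPLE, 'hive.server2.authentication', default='NONE')
print(auth)
```

The result can then be compared with the authMechanism argument passed to pyhs2.connect.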
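Also, on Windows, installing pyhs2 fails because its dependency sasl cannot be downloaded. Download the matching Windows .whl package from http://www.lfd.uci.edu/~gohlke/pythonlibs/ and install it, after which the pyhs2 installation succeeds.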
 
3. Both approaches share one requirement: the Hive server must be running.
hive --service hiveserver
or:
hive --service hiveserver2
If an error appears (the original screenshot of the error message is not available), check port usage with the following command:
netstat -apn|grep 10000
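Output from this command means that port 10000 is already open. If the port is occupied by another process, start the server on a different port: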
hive --service hiveserver -p 10008
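
The same port check can be done from the client side in Python, which is handy on Windows where the netstat and grep pipeline above is not available. This is a minimal sketch using the standard socket module; port_is_open is a hypothetical helper name, and a True result only shows that something accepts TCP connections on that port, not that it is a Hive server.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except socket.error:
        # Connection refused, timed out, or host could not be resolved.
        return False
    sock.close()
    return True


# Example call, using the server address from the first listing:
# port_is_open('10.70.50.111', 10000)
```

Running this before transport.open() or pyhs2.connect() separates "the server is not reachable" from problems that occur after the connection is established.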
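Also, sometimes after a successful connection, client.execute(sql) hangs indefinitely, producing neither an error nor a result. The cause has not yet been found.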
 
Permanent link to this article: http://www.linuxdiyf.com/linux/24422.html