1. Put Hive's configuration file into Spark's configuration directory. Either a symbolic link or a real copy works.
ln -s /opt/software/hadoop/hive110/conf/hive-site.xml /opt/software/hadoop/spark244/conf/hive-site.xml
2. Copy the MySQL JDBC driver jar into Spark's jars directory
cp /opt/software/hadoop/hive110/lib/mysql-connector-java-5.1.32.jar /opt/software/hadoop/spark244/jars/
3. Start spark-shell
spark-shell --jars /opt/software/hadoop/spark244/jars/mysql-connector-java-5.1.32.jar
4. Create a table in Hive (omitted)
5. Insert data from Spark SQL (omitted); here we simply list the databases as a demonstration
scala> spark.sql("show databases").show()
6. Query the data in Hive to see the changes made from Spark
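Steps 4 to 6 are skipped above, so here is a minimal sketch of what they might look like when driven entirely from Spark SQL. The database name demo_db and the table users are hypothetical and do not come from the original setup. The snippet can be pasted into spark-shell, where getOrCreate() simply returns the session that is already running.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing Hive-enabled session.
val session = SparkSession.builder().enableHiveSupport().getOrCreate()

// Hypothetical database and table, used only for illustration.
session.sql("CREATE DATABASE IF NOT EXISTS demo_db")
session.sql("CREATE TABLE IF NOT EXISTS demo_db.users (id INT, name STRING)")

// Insert two rows from the Spark side.
session.sql("INSERT INTO demo_db.users VALUES (1, 'alice'), (2, 'bob')")

// Read the rows back through Spark SQL.
session.sql("SELECT * FROM demo_db.users").show()
```

If the metastore is shared correctly, running select * from demo_db.users; in the Hive CLI afterwards should return the same two rows.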
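7. Integrate with IDEA
Search Maven for Spark-Hive, pick the first result » [2.4.4], and choose the artifact that matches your Scala version. Add it together with the MySQL driver as dependencies:
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.4.4</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.31</version>
    </dependency>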
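8. Copy hive110/conf/hive-site.xml into the project's resources directory
In the first property, prefix the Hive warehouse path with the HDFS address hdfs://192.168.221.140:9000:
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>hdfs://192.168.221.140:9000/opt/software/hadoop/hive110/warehouse</value>
    </property>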
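9. In MySQL, create the account Hive uses and grant it privileges
Run the following statements in the mysql client (the GRANT ... IDENTIFIED BY form works on MySQL 5.x; it was removed in MySQL 8.0):
grant all on *.* to 'root'@'%' identified by 'kb10';
grant all on *.* to 'root'@'localhost' identified by 'kb10';
flush privileges;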
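10. With the following code in IDEA, the connection succeeds
import org.apache.spark.sql.SparkSession

object HiveSpark {
  def main(args: Array[String]): Unit = {
    // Hive support makes Spark read hive-site.xml from the classpath (resources)
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getSimpleName)
      .enableHiveSupport()
      .getOrCreate()

    // List the databases registered in the Hive metastore
    spark.sql("show databases").show()
  }
}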
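The warehouse location from step 8 can also be passed in code instead of relying only on hive-site.xml. The sketch below is a variant of the program above, not the original author's code; the object name HiveSparkWarehouse is hypothetical. It uses spark.sql.warehouse.dir, which the Spark documentation recommends over hive.metastore.warehouse.dir since Spark 2.0.0. The hive-site.xml in resources is still assumed to be present, because it carries the metastore connection settings.

```scala
import org.apache.spark.sql.SparkSession

object HiveSparkWarehouse {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("HiveSparkWarehouse")
      // Same location as the hive.metastore.warehouse.dir value in step 8.
      .config("spark.sql.warehouse.dir",
        "hdfs://192.168.221.140:9000/opt/software/hadoop/hive110/warehouse")
      .enableHiveSupport()
      .getOrCreate()

    // Print the effective warehouse setting, then list the Hive databases.
    println(spark.conf.get("spark.sql.warehouse.dir"))
    spark.sql("show databases").show()

    spark.stop()
  }
}
```

Setting the path in code keeps the IDE project independent of edits to the copied hive-site.xml, at the cost of hard-coding the HDFS address.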
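After completing the steps above, running beeline -u jdbc:hive2://192.168.221.140:10000 back on the virtual machine launches the beeline bundled with Spark, which therefore fails. In that case, go into the hive/bin directory and launch beeline from there with bash.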
import org.apache.spark.sql.SparkSession

object ConnectSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getSimpleName)
      .enableHiveSupport()
      .getOrCreate()

    // The last path segment of the URL is the database name
    val url = "jdbc:mysql://192.168.221.140:3306/exam"
    val tableName = "cron_test" // table name

    // Set the connection user, password and JDBC driver class
    val prop = new java.util.Properties
    prop.setProperty("user", "root")
    prop.setProperty("password", "kb10")
    prop.setProperty("driver", "com.mysql.jdbc.Driver")

    // Load the table's data
    val jdbcDF = spark.read.jdbc(url, tableName, prop)
    jdbcDF.show()

    // Save the DataFrame into another table, appending if it already exists
    jdbcDF.write.mode("append").jdbc(url, "t2", prop)
  }
}
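Because the same session has Hive support enabled, the rows read over JDBC can also be stored as a table in the Hive metastore. The following is a sketch under assumptions rather than part of the original walkthrough: the object name MysqlToHive and the target table default.cron_test_copy are hypothetical, and the connection details are reused from the code above. It uses the option-based form of the JDBC reader, which is equivalent to passing a Properties object.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object MysqlToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("MysqlToHive")
      .enableHiveSupport()
      .getOrCreate()

    // Read the MySQL table through the generic "jdbc" data source.
    val jdbcDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://192.168.221.140:3306/exam")
      .option("dbtable", "cron_test")
      .option("user", "root")
      .option("password", "kb10")
      .option("driver", "com.mysql.jdbc.Driver")
      .load()

    // Hypothetical target table; Overwrite replaces it if it already exists.
    jdbcDF.write.mode(SaveMode.Overwrite).saveAsTable("default.cron_test_copy")

    // Confirm the copy by counting rows through Spark SQL.
    spark.sql("SELECT COUNT(*) FROM default.cron_test_copy").show()

    spark.stop()
  }
}
```

Note that saveAsTable without an explicit format writes in Spark's default data source format (Parquet unless configured otherwise), so how the table appears from the Hive side depends on that format.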
Reposted from: http://jwqb.baihongyu.com/