Download the Spark package bundled with Hadoop from the Spark download page: spark-3.5.1-bin-hadoop3....
After unpacking it, set the SPARK_HOME environment variable.
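For reference, a minimal sketch of this step (the /opt install path is just an assumption, use whatever directory you prefer):

tar -xzf spark-3.5.1-bin-hadoop3.tgz -C /opt
export SPARK_HOME=/opt/spark-3.5.1-bin-hadoop3
export PATH="$SPARK_HOME/bin:$PATH"
# add the two export lines to ~/.bashrc (or your shell profile) to make them permanent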
Add one line to spark-defaults.conf (this avoids a Hive-metastore connection error when starting spark-sql): spark.sql.catalogImplementation=hive
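If spark-defaults.conf does not exist yet, it can be created from the bundled template; a quick sketch of the edit:

cd "$SPARK_HOME"/conf
cp -n spark-defaults.conf.template spark-defaults.conf   # create the file from the template if missing
echo "spark.sql.catalogImplementation=hive" >> spark-defaults.conf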
Open the Paimon Spark quick-start page: https://paimon.apache.org/docs/master/spark/quick-start/
Download paimon-spark-3.5-0.9-SNAPSHOT.jar and put it into the spark/jars directory.
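Putting the connector jar on Spark's classpath is just a copy (this assumes the jar was downloaded into the current directory; a released paimon-spark-3.5 jar should work the same way as the snapshot build):

cp paimon-spark-3.5-0.9-SNAPSHOT.jar "$SPARK_HOME"/jars/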
Create spark-sql-paimon.sh under the Spark bin directory (e.g. vi bin/spark-sql-paimon.sh) with the following content; the /tmp path can be replaced with a directory under your own home directory:

spark-sql ... \
  --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
  --conf spark.sql.catalog.paimon.warehouse=file:/tmp/paimon \
  --conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions

Start Spark with ./sbin/start-all.sh.
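Two small extras, both my additions rather than part of the quick-start snippet: make the script executable, and optionally set spark.sql.defaultCatalog=paimon so that unqualified table names resolve to the Paimon catalog instead of the built-in spark_catalog (otherwise run USE paimon; inside spark-sql before creating the table):

chmod +x bin/spark-sql-paimon.sh
# optional: append one more line to the spark-sql invocation in the script
#   --conf spark.sql.defaultCatalog=paimon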
Run ./bin/spark-sql-paimon.sh to start a spark-sql session with the Paimon catalog enabled. Paimon table-creation test:
CREATE TABLE my_table (
    k INT,
    v STRING
) TBLPROPERTIES (
    'primary-key' = 'k'
);
INSERT INTO my_table VALUES (1, 'Hi'), (2, 'Hello');
INSERT INTO my_table VALUES (1, 'Hi'), (3, 'tom');
SELECT * FROM my_table;
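Because my_table has a primary key, the second INSERT upserts key 1, so the SELECT should return three rows: (1, 'Hi'), (2, 'Hello'), (3, 'tom'). To double-check that the data really landed in the Paimon warehouse, peek at the warehouse directory (the default.db path is an assumption based on Paimon's usual warehouse layout, and it presumes the table was created through the paimon catalog):

ls -R /tmp/paimon/default.db/my_table
# a primary-key table directory typically contains schema/, snapshot/, manifest/ and bucket-* subdirectories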
This setup can replace Spark on Hive: the configuration is simple, and tables are not lost when the Spark service is restarted.