Introduction
This post briefly shows how to read Parquet-format data with a Spark program and save it as plain text.
Code
The code is as follows:
val hdfsPath: scala.Predef.String = args(0)   // HDFS path of the Parquet input
val savePath = args(1)                        // output path for the text files
val sparkConf = new SparkConf().setAppName("Extract")
val sqlContext = new SQLContext(new SparkContext(sparkConf))
val parquet = sqlContext.read.parquet(hdfsPath)
parquet.select(parquet("someField")).rdd.saveAsTextFile(savePath)
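Note that saveAsTextFile on the RDD[Row] obtained from a DataFrame writes each Row's toString, so the output lines look like [value]. If you want the bare field value on each line, map the rows to strings first. A minimal sketch, reusing parquet and savePath from the snippet above (index 0 refers to the single selected column, and non-null values are assumed):

// Assumption: exactly one column was selected, so row(0) is that field's value
parquet.select(parquet("someField"))
  .rdd
  .map(row => row(0).toString)   // unwrap the Row so the text file contains the raw value
  .saveAsTextFile(savePath)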
Problem
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.load(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
Solution
Declare the path argument explicitly as scala.Predef.String:
val hdfsPath: scala.Predef.String = args(0)
Complete code
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object ParseParquet {
  def main(args: Array[String]): Unit = {
    // args(0): HDFS path of the Parquet input; args(1): output path for the text files
    val hdfsPath: scala.Predef.String = args(0)
    val savePath = args(1)
    val sparkConf = new SparkConf().setAppName("Extract")
    val sqlContext = new SQLContext(new SparkContext(sparkConf))
    // read.load falls back to the default data source, which is Parquet
    val parquet = sqlContext.read.load(hdfsPath)
    // select one column, reduce the number of output files, and save as text
    parquet.select(parquet("field")).rdd.coalesce(10).saveAsTextFile(savePath)
  }
}
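For reference, on newer Spark releases (2.x and later) the same read-Parquet-and-write-text flow is usually expressed with SparkSession and the DataFrameWriter text sink rather than SQLContext and RDD.saveAsTextFile. This is only a sketch of that variant, not the code from the original post; the object name, the "field" column, and the cast to string are illustrative assumptions.

import org.apache.spark.sql.SparkSession

object ParseParquetV2 {
  def main(args: Array[String]): Unit = {
    val hdfsPath = args(0)   // Parquet input path
    val savePath = args(1)   // text output path
    val spark = SparkSession.builder().appName("Extract").getOrCreate()
    val parquet = spark.read.parquet(hdfsPath)
    // The text sink requires a single string column, hence the cast;
    // coalesce(10) keeps the number of output files small, as in the original code.
    parquet.select(parquet("field").cast("string"))
      .coalesce(10)
      .write
      .text(savePath)
    spark.stop()
  }
}

In either version the two command-line arguments are the HDFS input path and the output directory, supplied after the application jar on spark-submit.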