Class DataFrameUtil


  • public final class DataFrameUtil
    extends Object
    Utility class for manipulating DataFrames.
    • Method Summary

      Modifier and Type Method Description
      static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> emptyEdges(org.apache.spark.sql.SparkSession sparkSession)
      Create an empty Dataset of Rows for use as edges in a GraphFrame.
      static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> union(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds1, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds2)
      Carry out a union of two Datasets where the input Datasets may contain a different number of columns.
    • Method Detail

      • union

        public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> union(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds1,
                                                                                   org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds2)
        Carry out a union of two Datasets where the input Datasets may contain a different number of columns. The resulting Dataset contains every column found in either input Dataset; where a row's source Dataset lacks a column, that entry is null.
        Parameters:
        ds1 - the first Dataset
        ds2 - the second Dataset
        Returns:
        the combined Dataset
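
The null-padding behaviour described above can be illustrated without Spark. The following is a minimal sketch in plain Java, not the DataFrameUtil implementation: the class UnionSketch and its helper row are hypothetical names, and each row is modelled as a column-name to value map standing in for a Spark Row. It assumes the column set is the union of the column names seen in the rows, in first-seen order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the documented semantics; not the library's code.
public class UnionSketch {

    // Build one row from alternating column-name / value arguments.
    public static Map<String, Object> row(Object... keysAndValues) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            r.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return r;
    }

    // Combine two row lists so every output row has every column.
    public static List<Map<String, Object>> union(List<Map<String, Object>> ds1,
                                                  List<Map<String, Object>> ds2) {
        // Collect every column name seen in either input, in first-seen order.
        Set<String> columns = new LinkedHashSet<>();
        for (Map<String, Object> r : ds1) columns.addAll(r.keySet());
        for (Map<String, Object> r : ds2) columns.addAll(r.keySet());

        List<Map<String, Object>> combined = new ArrayList<>();
        for (List<Map<String, Object>> ds : List.of(ds1, ds2)) {
            for (Map<String, Object> r : ds) {
                Map<String, Object> padded = new LinkedHashMap<>();
                for (String column : columns) {
                    // Map.get returns null for a missing column: the placeholder.
                    padded.put(column, r.get(column));
                }
                combined.add(padded);
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = List.of(row("id", "a", "name", "Alice"));
        List<Map<String, Object>> ages = List.of(row("id", "b", "age", 30));
        for (Map<String, Object> r : union(people, ages)) {
            System.out.println(r);
        }
        // prints:
        // {id=a, name=Alice, age=null}
        // {id=b, name=null, age=30}
    }
}
```

In this model both output rows carry all three columns, and the column a row's source lacked is null, which is the placeholder behaviour the method description refers to.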
      • emptyEdges

        public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> emptyEdges(org.apache.spark.sql.SparkSession sparkSession)
        Create an empty Dataset of Rows for use as edges in a GraphFrame.
        Parameters:
        sparkSession - the Spark session
        Returns:
        an empty Dataset of Rows with src and dst columns.