Class AccumuloStoreRelation

  • All Implemented Interfaces:
    org.apache.spark.sql.sources.PrunedFilteredScan, org.apache.spark.sql.sources.PrunedScan, org.apache.spark.sql.sources.TableScan

    public class AccumuloStoreRelation
    extends org.apache.spark.sql.sources.BaseRelation
    implements org.apache.spark.sql.sources.TableScan, org.apache.spark.sql.sources.PrunedScan, org.apache.spark.sql.sources.PrunedFilteredScan
    Allows Apache Spark to retrieve data from an AccumuloStore as a DataFrame. Spark's Java API does not expose the DataFrame class, but it is just a type alias for a Dataset of Rows. The schema of the DataFrame is formed from the schemas of the groups specified in the view.

    If two of the specified groups have properties with the same name, then the types of those properties must be the same.

    AccumuloStoreRelation implements the TableScan interface, which allows all Elements of the specified groups to be returned to the DataFrame.

    AccumuloStoreRelation implements the PrunedScan interface, which allows Elements of the specified groups to be returned to the DataFrame with only the specified columns included. Currently, AccumuloStore does not support projection of properties in the tablet servers, so this projection is performed within the Spark executors rather than in Accumulo. Once AccumuloStore supports projection in the tablet servers, this will become more efficient.

    AccumuloStoreRelation implements the PrunedFilteredScan interface, which allows only Elements that match the provided Filters to be returned. The majority of these Filters are implemented by adding them to the View, which causes them to be applied on Accumulo's tablet servers (i.e. before the data is sent to a Spark executor). A Filter on either the vertex of an Entity or the source or destination vertex of an Edge is applied using the appropriate range scan on Accumulo, so queries against this DataFrame that use such a Filter should be very quick.
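    As an illustration of how a relation such as this is typically exposed to users as a DataFrame, the sketch below uses Spark's standard SQLContext.baseRelationToDataFrame method. This is a minimal sketch: constructing the AccumuloStoreRelation itself is not covered on this page, so the relation is taken as a parameter rather than built here.

        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SQLContext;
        import org.apache.spark.sql.sources.BaseRelation;

        public final class RelationToDataFrameSketch {
            // Wraps a BaseRelation (such as an AccumuloStoreRelation built elsewhere)
            // in a DataFrame, i.e. a Dataset of Rows, using Spark's standard API.
            public static Dataset<Row> toDataFrame(final SQLContext sqlContext,
                                                   final BaseRelation relation) {
                return sqlContext.baseRelationToDataFrame(relation);
            }
        }

    The returned DataFrame can then be queried with select and filter calls; Spark routes such queries to the buildScan methods described below.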

    • Method Summary

      org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan()
        Creates a DataFrame of all Elements from the specified groups.
      org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns)
        Creates a DataFrame of all Elements from the specified groups, with columns that are not required filtered out.
      org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns, org.apache.spark.sql.sources.Filter[] filters)
        Creates a DataFrame of all Elements from the specified groups, with columns that are not required filtered out and with (some of) the supplied Filters applied.
      org.apache.spark.sql.types.StructType schema()
      org.apache.spark.sql.SQLContext sqlContext()
      • Methods inherited from class org.apache.spark.sql.sources.BaseRelation

        needConversion, sizeInBytes, unhandledFilters
    • Method Detail

      • sqlContext

        public org.apache.spark.sql.SQLContext sqlContext()
        Specified by:
        sqlContext in class org.apache.spark.sql.sources.BaseRelation
      • schema

        public org.apache.spark.sql.types.StructType schema()
        Specified by:
        schema in class org.apache.spark.sql.sources.BaseRelation
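        As a small illustrative sketch (not part of this class), the schema of any BaseRelation can be printed to see which columns are available for pruning and filtering; for AccumuloStoreRelation it is built from the schemas of the groups specified in the view.

          import org.apache.spark.sql.sources.BaseRelation;
          import org.apache.spark.sql.types.StructType;

          public final class SchemaSketch {
              // Prints the relation's schema as a tree of column names and types.
              public static void printRelationSchema(final BaseRelation relation) {
                  final StructType schema = relation.schema();
                  System.out.println(schema.treeString());
              }
          }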
      • buildScan

        public org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan()
        Creates a DataFrame of all Elements from the specified groups.
        Specified by:
        buildScan in interface org.apache.spark.sql.sources.TableScan
        Returns:
        An RDD of Rows containing Elements whose group is in groups.
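        As a hedged sketch of this method's contract, any TableScan can be asked directly for an RDD of all its rows; in normal use Spark calls buildScan() itself during query planning rather than user code invoking it.

          import org.apache.spark.rdd.RDD;
          import org.apache.spark.sql.Row;
          import org.apache.spark.sql.sources.TableScan;

          public final class TableScanSketch {
              // Asks a TableScan (such as an AccumuloStoreRelation) for an RDD
              // containing every Element of the specified groups as a Row.
              public static RDD<Row> allRows(final TableScan relation) {
                  return relation.buildScan();
              }
          }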
      • buildScan

        public org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns)
        Creates a DataFrame of all Elements from the specified groups with columns that are not required filtered out.

        Currently this does not push the projection down to the store: it is performed in the Spark transform, whereas ideally it would be implemented in an Accumulo iterator. Issue 320 refers to this.

        Specified by:
        buildScan in interface org.apache.spark.sql.sources.PrunedScan
        Parameters:
        requiredColumns - The columns to return.
        Returns:
        An RDD of Rows containing the requested columns.
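        To illustrate how column pruning reaches this method from the DataFrame API, the sketch below selects a subset of columns; Spark then passes the selected names to the relation as requiredColumns. The column names "group" and "count" are assumptions for illustration only, as the real columns depend on the schemas of the groups in the view.

          import org.apache.spark.sql.Dataset;
          import org.apache.spark.sql.Row;

          public final class ColumnPruningSketch {
              // Selecting a subset of columns lets Spark prune the scan to just
              // those names.  "group" and "count" are example column names; the
              // real names come from the schemas of the view's groups.
              public static Dataset<Row> projectColumns(final Dataset<Row> elements) {
                  return elements.select("group", "count");
              }
          }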
      • buildScan

        public org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns,
                                                                            org.apache.spark.sql.sources.Filter[] filters)
        Creates a DataFrame of all Elements from the specified groups with columns that are not required filtered out and with (some of) the supplied Filters applied.

        Note that Spark also applies the provided Filters - applying them here is an optimisation to reduce the amount of data transferred from the store to Spark's executors (this is known as "predicate pushdown").

        Currently this does not push the projection down to the store: it is performed in the Spark transform, whereas ideally it would be implemented in an Accumulo iterator. Issue 320 refers to this.

        Specified by:
        buildScan in interface org.apache.spark.sql.sources.PrunedFilteredScan
        Parameters:
        requiredColumns - The columns to return.
        filters - The Filters to apply (these are applied before aggregation).
        Returns:
        An RDD of Rows containing the requested columns.
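        To illustrate the predicate pushdown described above, the sketch below applies an equality filter to an assumed "vertex" column; such a Filter can be served by a range scan on Accumulo, as noted in the class description. The column name is illustrative only and depends on the schema built from the view's groups.

          import static org.apache.spark.sql.functions.col;

          import org.apache.spark.sql.Dataset;
          import org.apache.spark.sql.Row;

          public final class FilterPushdownSketch {
              // An equality filter on the vertex column can be pushed down to the
              // relation as a Filter and served by an Accumulo range scan.
              public static Dataset<Row> rowsForVertex(final Dataset<Row> elements,
                                                       final String vertexValue) {
                  return elements.filter(col("vertex").equalTo(vertexValue));
              }
          }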