Aerospike Connect for Spark Release Notes

  • 3.2.0
    Release Date: November 15, 2021
    • Supported until February 14, 2023.
    • Tested with Apache Spark 3.0.3, Scala 2.12.11 & Python 3.7.
    • Minimum supported Aerospike Server version 5.0.

    New Features

    • [CONNECTOR-131] - Expression pushdown support in the Spark Connector.
    • [CONNECTOR-210] - Limit the write rate from Spark to Aerospike.
    • [CONNECTOR-260] - Create a DataFrame API, aerolookup, for AeroJoin functionality.

    Improvements

    • This library is an uber shaded jar.
    • Update Client version to 5.1.8.

    Bug Fixes

    • [CONNECTOR-305] - Create one client instance per spark partition.

    Known Issues

    • This connector release shades all internal libraries. Please update application build files accordingly.
    • The Spark connector stores Spark DateType and TimestampType values as long. In aeroJoin API calls, convert these types to LongType.
    • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
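
The DataFrame-syntax workaround for the INSERT INTO limitation might look like the sketch below. It assumes a PySpark DataFrame, the short data source name "aerospike", and a hypothetical helper name append_to_aerospike; the option keys and values in the usage line are illustrative only. It uses only the standard DataFrameWriter calls format, mode, options, and save.

```python
def append_to_aerospike(df, options):
    """Append the rows of `df` to Aerospike using DataFrame syntax.

    `df` is expected to be a PySpark DataFrame; `options` maps connector
    flag names to values. The helper name is hypothetical.
    """
    # Spark passes data source options as strings, so normalize values first.
    string_options = {key: str(value) for key, value in options.items()}
    (
        df.write
        .format("aerospike")        # assumed short format name
        .mode("append")             # standard Spark save mode
        .options(**string_options)
        .save()
    )
    return string_options


# Usage, in place of "INSERT INTO <temp view> ..." (values are illustrative):
# append_to_aerospike(df, {"aerospike.writeset": "users",
#                          "aerospike.write.batchsize": 1000})
```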

  • 3.1.0
    Release Date: July 21, 2021
    • Supported until October 21, 2022.
    • Tested with Apache Spark 3.0.3, Scala 2.12.11 & Python 3.7.
    • Minimum supported Aerospike Server version 5.0.

    New Features

    • [CONNECTOR-247] - Spark connector should persist Map bins as K-Ordered.
    • [CONNECTOR-166] - Support batchget queries with digests in Spark Connector.
    • [CONNECTOR-142] - Data sampling in the Spark Connector using the aerospike.sample.size flag.
    • [CONNECTOR-142] - Support boolean bins in the Spark Connector (refer to aerospike.booleanbin in the documentation).

    Improvements

    • This library is an uber shaded jar.
    • Migrated from the queryPartition() call to ScanPartitions().
    • Update Client version to 5.1.5.
    • Migrated to Expressions for scans.
    • Pushdown support for Float & Double datatypes.

    Bug Fixes

    • [CONNECTOR-215] - Writes are slower in the Spark Connector v2 version. Introduced a new flag aerospike.write.batchsize to control write throughput.

    Known Issues

    • This connector release shades all internal libraries. Please update application build files accordingly.
    • The Spark connector stores Spark DateType and TimestampType values as long. In aeroJoin API calls, convert these types to LongType.
    • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.

  • 3.0.2
    Release Date: June 2, 2021
    • Supported until September 2, 2022.
    • Tested with Apache Spark 3.0.0, Scala 2.12.11, & Python 3.7.
    • Minimum supported Aerospike Server version 5.0.

    Improvements

    • This library is an uber shaded jar.

    Bug Fixes

    • [CONNECTOR-208] - Spark connector with default timeout settings is timing out after 1 second.
    • [CONNECTOR-205] - Filter out records that breach write block size in Aerospike via Spark Connector.
    • [CONNECTOR-212] - Handle nulls in full record writes (REPLACE, REPLACE_ONLY, and CREATE_ONLY).

    Known Issues

    • This release does not support Aerospike 5.6 boolean bin and quota features.
    • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
    • Streaming write does not work with Apache Spark 3.1.0.
    • Streaming update trait SupportsStreamingUpdate from Spark 3.0.0 has been renamed to SupportsStreamingUpdateAsAppend in Spark 3.1.0.

    Updates

    • The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.

  • 3.0.1
    Release Date: February 24, 2021
    • Supported until May 24, 2022.
    • [CONNECTOR-110] - Spark 3.x branch - Aerospike configuration passed from spark configuration should be accessible downstream.

    Known Issues

    • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
    • Streaming write does not work with Apache Spark 3.1.0.
    • Streaming update trait SupportsStreamingUpdate from Spark 3.0.0 has been renamed to SupportsStreamingUpdateAsAppend in Spark 3.1.0.

    Updates

    • The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.

  • 3.0.0
    Release Date: February 18, 2021
    • Supported until May 18, 2022.
    • [CONNECTOR-103] - Extend support for Apache Spark 3.0.0 Data Source V2.

    New Features

    • Data Source V2 implementation for Apache Spark 3.0.0.

    Known Issues

    • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
    • Streaming write does not work with Apache Spark 3.1.0.
    • Streaming update trait SupportsStreamingUpdate from Spark 3.0.0 has been renamed to SupportsStreamingUpdateAsAppend in Spark 3.1.0.
    • We have observed that a configuration set using "spark.conf.set()" is not passed along to the connector, hence defaults are used by the connector, which may produce unintended results. Consider using .option() or .options() along with the read and write statements for the configuration to take effect. Fixed in version 3.0.1.
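
Passing the configuration with the read statement, as the last known issue suggests, could be sketched as follows. The helper name read_from_aerospike is hypothetical, the short data source name "aerospike" is an assumption, and the flag value in the usage line is only an example. The chain uses the standard DataFrameReader calls format, options, and load.

```python
def read_from_aerospike(spark, options):
    """Load data through the connector, configuring it on the read itself.

    `spark` is expected to be a SparkSession. Options set here travel with
    this read statement rather than relying on spark.conf.set().
    """
    # Data source options are strings in Spark, so normalize the values.
    string_options = {key: str(value) for key, value in options.items()}
    return (
        spark.read
        .format("aerospike")         # assumed short format name
        .options(**string_options)   # per-statement configuration
        .load()
    )


# Usage (value is illustrative):
# df = read_from_aerospike(spark, {"aerospike.partition.factor": 8})
```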

    Updates

    • The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.

  • 2.8.0
    Release Date: July 14, 2021
    • Apache Spark 2.4.8 is the last release in Spark’s 2.x.y branch. No more 2.x.y releases of Spark are expected, even for bug fixes. Therefore, 2.8.0 is the last version of Aerospike Connect for Spark that will be compatible with that Spark branch. Aerospike has ceased developing new features to support Spark 2.x.y. However, bug fixes will be available until October 14, 2022. If you are using Apache Spark 2.4.x with Aerospike Connect for Spark 2.8.0 or earlier, please plan to move to Apache Spark 3.0.x and Aerospike Connect for Spark version 3.x.y.
    • Supported until October 14, 2022.
    • Tested with Apache Spark 2.4.7, Scala 2.11.12, & Python 3.7.
    • Minimum supported Aerospike Server version 5.0.

    New Features

    • [CONNECTOR-166] - Support batchget queries with digests in Spark Connector.
    • [CONNECTOR-142] - Data sampling in the Spark Connector using the aerospike.sample.size flag.
    • [CONNECTOR-142] - Support boolean bins in the Spark Connector (refer to aerospike.booleanbin in the documentation).
    • [CONNECTOR-211] - Support partial updates of records using the aerospike.update.partial flag.

    Improvements

    • Migrated from the queryPartition() call to ScanPartitions().
    • Updated Spark version to 2.4.7.
    • Update Client version to 5.1.5.
    • Migrated to Expressions for scans.
    • Pushdown support for Float & Double datatypes.

    Bug Fixes

    • [CONNECTOR-205] - Filter out records that breach write block size in Aerospike via Spark Connector.
    • [CONNECTOR-212] - Handle nulls in full record writes (REPLACE, REPLACE_ONLY, and CREATE_ONLY).
    • [CONNECTOR-215] - Writes are slower in the Spark Connector v2 version. Introduced a new flag aerospike.write.batchsize to control write throughput.

    Known Issues

    • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
    • aerospike.write.mode flag overrides Apache Spark write mode.
    • The Spark connector stores Spark DateType and TimestampType values as long. In aeroJoin API calls, convert these types to LongType.

  • 2.7.2
    Release Date: February 24, 2021
    • Supported until May 24, 2022.
    • [CONNECTOR-111] - Spark 2.x branch - Aerospike configuration passed from spark configuration should be accessible downstream.

    Known Issues

    • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.

    Updates

    • The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.

  • 2.7.1
    Release Date: January 25, 2021
    • Supported until April 25, 2022.
    • [CONNECTOR-105] - Fixed a TLS issue in the Aerospike Spark 2.7.0 release.

    Known Issues

    • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
    • We have observed that a configuration set using "spark.conf.set()" is not passed along to the connector, hence defaults are used by the connector, which may produce unintended results. Consider using .option() or .options() along with the read and write statements for the configuration to take effect. Fixed in version 2.7.2.

    Updates

    • The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.

  • 2.7.0
    Release Date: January 19, 2021
    • Supported until April 19, 2022.
    • Datasource V2 implementation.
    • Tested with Aerospike Enterprise Edition Database version 5.2.0 & Apache Spark version 2.4.0.

    New Features

    • [CONNECTOR-96] - Upgrade DataSource APIs used in the Spark Connector to v2.
    • [CONNECTOR-101] - Spark Feature file verification expires one day early.

    Improvements

    • Aerospike datasource format can be specified with brevity.

    Known Issues

    • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
    • We have observed that a configuration set using "spark.conf.set()" is not passed along to the connector, hence defaults are used by the connector, which may produce unintended results. Consider using .option() or .options() along with the read and write statements for the configuration to take effect.

    Updates

    • The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.

  • 2.6.0
    Release Date: October 29, 2020
    • Supported until January 29, 2022.
    • Support Writes in Spark SQL Format.
    • Tested with Aerospike Enterprise Edition Server version 5.2.0 & Apache Spark version 2.4.0.

    New Features

    • [CONNECTOR-94] - Support Writes in Spark SQL Format.

    Improvements

    • Aerospike datasource format can be specified with brevity.

  • 2.5.0
    Release Date: October 14, 2020
    • Supported until January 14, 2022.
    • Flexible schema support in Spark, to read mixed data types from an Aerospike bin.
    • Tested with Aerospike Enterprise Edition Server version 5.2.0 & Apache Spark version 2.4.0.

    New Features

    • [CONNECTOR-85] - Support records with a different number of bins and types in a set.
    • [CONNECTOR-82] - Support pushdown of Spark DateType and TimestampType.

    Improvements

    • Additional error handling to address underflow and overflow in Short, Int, and Float types.
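
The kind of range check this error handling implies can be illustrated with a short sketch. It is not the connector's implementation: the helper name narrow_integer and the choice of OverflowError are assumptions, and only the Short and Int cases are shown. It relies on Aerospike integers being 64-bit, so a stored value may not fit a narrower Spark type.

```python
# Inclusive value ranges of the narrower Spark integral types.
INTEGRAL_RANGES = {
    "ShortType": (-2**15, 2**15 - 1),
    "IntegerType": (-2**31, 2**31 - 1),
}


def narrow_integer(value, target_type):
    """Return `value` unchanged if it fits `target_type`, else raise.

    Raising avoids silently wrapping a 64-bit value into a narrower type.
    """
    low, high = INTEGRAL_RANGES[target_type]
    if not low <= value <= high:
        raise OverflowError(
            f"{value} does not fit in {target_type} [{low}, {high}]"
        )
    return value


# Usage:
# narrow_integer(32767, "ShortType")   returns 32767
# narrow_integer(32768, "ShortType")   raises OverflowError
```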

  • 2.4.0
    Release Date: September 3, 2020
    • Supported until December 3, 2021.
    • Extended primary key types support.

    New Features

    • Introduced the aerospike.keyType flag to hint the primary key type during schema inference.

  • 2.3.1
    Release Date: July 16, 2020
    • Supported until October 16, 2021.
    • Fixed a broken API to create AerospikeConfig instance.

  • 2.3.0
    Release Date: June 19, 2020
    • Supported until September 19, 2021.
    • Nested updateByKey support and prioritizing __digest, __ttl, __generation filters.

    New Features

    • Record insertion can be done by nested updateByKey.
    • Spark filters are rearranged so that __digest, __ttl, and __generation filters, if present, always come first.
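
The filter rearrangement can be pictured with a minimal sketch. This illustrates the ordering rule only, not the connector's code: filters are modeled as hypothetical (column, operator, value) tuples, and the helper name prioritize_meta_filters is an assumption.

```python
# Metadata columns whose filters should be evaluated first.
META_COLUMNS = ("__digest", "__ttl", "__generation")


def prioritize_meta_filters(filters):
    """Move filters on metadata columns to the front of the list.

    sorted() is stable, so filters keep their original relative order
    within the metadata group and within the remaining group.
    """
    # The key is False for metadata columns, and False sorts before True.
    return sorted(filters, key=lambda f: f[0] not in META_COLUMNS)


filters = [
    ("age", ">", 30),
    ("__ttl", ">", 0),
    ("name", "=", "a"),
    ("__digest", "=", "abc"),
]
reordered = prioritize_meta_filters(filters)
# reordered starts with the __ttl and __digest filters, followed by age and name
```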

    Known Issues

    • Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike server version 4.9 and above.
    • updateByKey only supports keys which are accepted by the Java client.

  • 2.2.0
    Release Date: May 12, 2020
    • Supported until August 12, 2021.

    New Features

    • Ability to extend aerospike partitions up to 32768 (2^15).
    • Ability to specify the target set for spark write operations through the aerospike.writeset flag.

    Known Issues

    • Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike server version 4.9 and above.
    • The default value of aerospike.partition.factor has changed to 12 from 0.
      • Prior to version 2.2, the number of aerospike partitions was computed by 4096 >> f, where f is the aerospike.partition.factor.
      • From version 2.2 onwards, the number of aerospike partitions will be computed by 2^f, where f is the aerospike.partition.factor.
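
The two formulas above can be compared with a few lines of arithmetic. The function names are hypothetical; the upper bound of 15 on the factor is inferred from the 32768 (2^15) limit listed under New Features.

```python
def partitions_before_2_2(factor):
    """Number of partitions before version 2.2: 4096 >> f."""
    return 4096 >> factor


def partitions_from_2_2(factor):
    """Number of partitions from version 2.2 onwards: 2^f."""
    if not 0 <= factor <= 15:
        # Assumed bound, derived from the 32768 (2^15) maximum.
        raise ValueError("factor must be between 0 and 15")
    return 2 ** factor


old_default = partitions_before_2_2(0)    # 4096
new_default = partitions_from_2_2(12)     # 4096
maximum = partitions_from_2_2(15)         # 32768
```

Both defaults yield 4096 partitions, so the change of the default factor from 0 to 12 keeps the default partition count the same under the new formula.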

  • 2.1.0
    Release Date: April 28, 2020
    • Supported until July 28, 2021.

    New Features

    • Added capability of streaming writes to Aerospike.

    Known Issues

    • Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike server version 4.9 and above.

  • 2.0.0
    Release Date: April 15, 2020
    • Supported until July 15, 2021.

    New Features

    • Ability to fine tune up to 4096 scan partitions concurrently.
      • This can be further tuned by setting the aerospike.partition.factor value appropriately.
    • TLS and LDAP support.
    • Ability to query multiple primary keys through connector.

    Improvements

    • Query engine improvements.
    • Ability to specify seed nodes through Aerospike configuration.
    • Ability to specify feature file from configuration or HDFS.
    • Improved error handling in case of write/save failure.
    • Ability to enable client-server compression in spark connector.
    • Ability to set records per second for scans.
    • Fixed issue of duplicate data accumulation in primary key call.

    Known Issues

    • Aerospike Connect for Spark version 2.0 and above is only compatible with Aerospike server version 4.9 and above.

  • 1.1.2
    Release Date: October 21, 2019

    New Features

    • Added explicit schema for saves.

    Known Issues

    • A primary key call fetches multiple copies of a record, accumulating duplicate data.

  • 1.1.0
    Release Date: March 26, 2019
    • Initial Standalone Connector General Availability release.
    • Embedded Spark update.

    New Features

    • Spark 2.4.0 support.
    • Added a Dataset aeroIncrease function, which enables a Dataset to send add/increment operations to the Aerospike server.

  • 1.0.0
    Release Date: March 12, 2019
    • Initial Embedded Spark General Availability release.

    New Features

    • Reading from Aerospike to a DataFrame/Dataset.
    • Saving a DataFrame/Dataset to Aerospike.
    • Spark SQL multiple filters pushed down to the Aerospike cluster.
    • Support for Geo points-within-region query using Aerospike.
    • Join a Spark Dataset that contains record keys to record data stored in Aerospike.