SPARK (Application)

Introduction

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
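As a rough illustration of the Python API mentioned above, the following is a minimal sketch of a word count run in Spark's local (single-node, multi-threaded) mode. It assumes PySpark is importable, for example after loading one of the spark modules. The helper names local_master and word_count, the application name, and the default of 4 threads are hypothetical choices made for this example and are not part of any installed module.

```python
def local_master(threads):
    """Return a Spark master URL for local mode using `threads` worker threads."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return "local[{}]".format(threads)


def word_count(lines, threads=4):
    """Count whitespace-separated words in `lines`; returns a dict of word -> count.

    Assumption: PySpark 2.x or later is available on the Python path.
    """
    # Imported inside the function so the file still loads where PySpark is absent.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master(local_master(threads))   # single node, `threads` threads
             .appName("word-count-sketch")    # hypothetical application name
             .getOrCreate())
    try:
        pairs = (spark.sparkContext.parallelize(lines)
                 .flatMap(lambda line: line.split())   # split each line into words
                 .map(lambda word: (word, 1))          # pair each word with a count of 1
                 .reduceByKey(lambda a, b: a + b)      # sum the counts per word
                 .collect())
        return dict(pairs)
    finally:
        spark.stop()   # release the local Spark context
```

Calling word_count(["a b a", "b c"]) from a Python session would start a local Spark context, run the job and stop the context again. The thread count should probably match the number of cores requested for the job.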

Licensing

Apache 2.0 license

Availability Table

System    Versions
dusky     2.3.0, 2.2.1, 2.1.1

Announcements

Mar 6, 2018: Redfin has been shut down for reinstallation with the Compute Canada software stack, so the sharcnet spark modules are no longer available on it. To compensate, the new sharcnet spark 2.2.1 and 2.3.0 modules have been installed on wobbie, which has four 24-core nodes with 128 GB of memory and two 24-core nodes with 512 GB of memory.

[roberpj@wob97:~] module avail |& egrep '2714|364'
spark/python2714/2.2.1
spark/python2714/2.3.0
spark/python364/2.2.1
spark/python364/2.3.0

Change Log

Mar 6, 2018: Installed spark/python2714/2.3.0 and spark/python364/2.3.0 modules on dusky, iqaluk, mosaic, orca, vdi-centos6 and wobbie.
Feb 23, 2018: Installed spark/python2714/2.2.1 and spark/python364/2.2.1 modules on dusky, iqaluk, mosaic, orca, vdi-centos6 and wobbie.
Jun 01, 2017: Removed the legacy (flavorless) spark/1.6.2 and spark/2.0.0 modules.
May 29, 2017: Installed spark/python2713/2.1.1 on dusky, iqaluk, mosaic, orca, redfin and vdi-centos6 for single-node parallel use in the threaded queue.
Aug 03, 2016: Installed spark/1.6.2 and spark/2.0.0 on iqaluk, mosaic, orca, redfin and vdi-centos6 for single-node parallel use in the threaded queue.
Mar 16, 2016: Installed spark/1.6.1 on mosaic using the spark-1.6.1-bin-hadoop2.6.tgz binary package for use on a single compute node in threaded or exclusive mode.
Sep 01, 2015: Installed spark/1.4.1 on mosaic using the spark-1.4.1-bin-hadoop2.6.tgz binary package for use on a single compute node in threaded or exclusive mode.