This page explains how to install Histogrammar in different ways. Use only the instructions relevant to your situation.

Install from a public repository

Python

Histogrammar is available on PyPI, a publicly accessible Python repository with dependency management.

If you have superuser (root) access

To install the latest version of Histogrammar, use

sudo easy_install histogrammar

or

sudo pip install histogrammar

depending on which tool you have. pip is recommended, and easy_install is deprecated. Some systems with both Python 2 and 3 use easy_install3 and pip3 to distinguish the Python 3 version.
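When pip and pip3 might belong to different interpreters, you can remove the guesswork by running pip as a module of a specific interpreter, for example python3 -m pip install histogrammar. The minimal sketch below builds that command from sys.executable, which is the interpreter currently running. The helper pip_install_command is hypothetical and is not part of Histogrammar. It only prints the command and does not run it.

```python
import shlex
import sys


def pip_install_command(package: str = "histogrammar", user: bool = False) -> list:
    """Return an argv list that runs pip through the current interpreter.

    Hypothetical helper for illustration; not part of Histogrammar.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if user:
        # Per-user install into ~/.local; no superuser access required.
        cmd.append("--user")
    cmd.append(package)
    return cmd


# Print a shell-ready command. For a system-wide install, run it with sudo.
print(shlex.join(pip_install_command()))
```

Passing user=True gives the per-user variant described under "If you do not have superuser access".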

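On freshly minted Ubuntu machines, you can install pip with

sudo apt-get install python-setuptools
sudo easy_install pip

On newer Ubuntu releases, sudo apt-get install python3-pip installs pip for Python 3 directly.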

If you do not have superuser access

pip install --user histogrammar

This installs Histogrammar under ~/.local. Python searches that location automatically, so you do not need any extra path configuration.
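To confirm that the install worked, and to see which interpreter it went into, you can query the installed distribution metadata. The minimal sketch below uses the standard-library importlib.metadata module, which requires Python 3.8 or later. The helper name installed_version is hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist: str = "histogrammar") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print(f"histogrammar is not installed for {sys.executable}")
else:
    print(f"histogrammar {version} is installed for {sys.executable}")
```

If this reports that the package is missing after a successful pip run, pip most likely installed it for a different interpreter. See the note on pip and pip3 above.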

For use in PySpark

In PySpark, Histogrammar uses both Histogrammar-Python (as the interface) and Histogrammar-Scala (for faster calculations). To use it, install Histogrammar-Python as described above, then launch PySpark with a request for the Histogrammar-Scala packages:

pyspark --packages "io.github.histogrammar:histogrammar_2.12:1.0.11,io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11"

Use _2.11 in both package names for compatibility with Spark 2.x (Scala 2.11).
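The two coordinates differ only in the artifact name, and both carry the same Scala suffix. It can therefore help to build the --packages value in one place. The small sketch below assumes you want the core and sparksql artifacts at the same version. spark_packages is a hypothetical helper and is not part of Histogrammar or Spark. The same value also works for spark-shell.

```python
def spark_packages(scala_binary: str = "2.12", version: str = "1.0.11") -> str:
    """Build the comma-separated --packages value for pyspark or spark-shell."""
    artifacts = ("histogrammar", "histogrammar-sparksql")
    # Maven coordinates have the form groupId:artifactId:version, and the
    # artifactId carries the Scala binary version as a suffix.
    return ",".join(
        f"io.github.histogrammar:{name}_{scala_binary}:{version}"
        for name in artifacts
    )


# Argument list for launching PySpark against Spark 2.x (Scala 2.11).
argv = ["pyspark", "--packages", spark_packages(scala_binary="2.11")]
print(" ".join(argv))
```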

In PySpark, first run

import histogrammar

Then, if df is a DataFrame that you would like to use with Histogrammar, you can call

h = df.hg_Bin(100, -5.0, 5.0, df["plotme"] + df["andme"])

This gives you a histogram h of the Column expression df["plotme"] + df["andme"], with 100 bins between -5.0 and 5.0. All of the processing runs on the JVM in Histogrammar-Scala and benefits from Spark’s DataFrame optimizations.
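Conceptually, hg_Bin(100, -5.0, 5.0, ...) evaluates the expression for every row and counts the result into one of 100 equal-width bins covering [-5.0, 5.0). It keeps separate counters for values below the range, values at or above it, and NaN. The pure-Python sketch below illustrates that counting logic on a small list of rows. It is not Histogrammar's implementation, which in PySpark runs on the JVM, and bin_counts is a hypothetical name.

```python
import math
from typing import Callable, Iterable


def bin_counts(rows: Iterable, num: int, low: float, high: float,
               quantity: Callable) -> dict:
    """Count quantity(row) into num equal-width bins over [low, high)."""
    values = [0] * num
    underflow = overflow = nanflow = 0
    width = (high - low) / num
    for row in rows:
        q = quantity(row)
        if math.isnan(q):
            nanflow += 1
        elif q < low:
            underflow += 1
        elif q >= high:
            overflow += 1
        else:
            # min() guards against rounding pushing a value just below high into bin num.
            values[min(int((q - low) / width), num - 1)] += 1
    return {"values": values, "underflow": underflow,
            "overflow": overflow, "nanflow": nanflow}


# Each row stands in for (plotme, andme); the quantity mirrors df["plotme"] + df["andme"].
rows = [(0.0, 0.55), (-6.0, 0.0), (4.9, 0.2), (float("nan"), 1.0), (-5.0, 0.0)]
h = bin_counts(rows, 100, -5.0, 5.0, lambda r: r[0] + r[1])
print(h["underflow"], h["overflow"], h["nanflow"], sum(h["values"]))
```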

Java/Scala or Apache Spark

Histogrammar is available on Maven Central, a publicly accessible Java/Scala repository with dependency management.

Apache Spark

To use Histogrammar in the Spark shell, you don’t have to download anything manually, because Spark fetches the packages from Maven Central. Just start Spark with

spark-shell --packages "io.github.histogrammar:histogrammar_2.12:1.0.11,io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11"

and run

import org.dianahep.histogrammar._

at the Spark prompt.

Use _2.11 in both package names for compatibility with Spark 2.x (Scala 2.11).

Java/Scala with Maven

To compile Histogrammar into a project with the Maven build tool, add

<dependency>
  <groupId>io.github.histogrammar</groupId>
  <artifactId>histogrammar_2.12</artifactId>
  <version>1.0.11</version>
</dependency>

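to the <dependencies> section of your pom.xml. Maven does not append the Scala version automatically, so the artifactId must include it. Use histogrammar_2.11 for compatibility with Scala 2.11.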

Scala with sbt

To use Histogrammar in the sbt console, or to compile it into a project with the sbt build tool, add

libraryDependencies += "io.github.histogrammar" %% "histogrammar" % "1.0.11"

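to your build.sbt file. The double percent sign (%%) tells sbt to append your project’s Scala binary version to the artifact name. It therefore resolves histogrammar_2.12 for Scala 2.12, histogrammar_2.11 for Scala 2.11, and so on.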
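The rule behind %% is mechanical. sbt takes the Scala binary version, which is major.minor for Scala 2 and just 3 for Scala 3, and appends it to the artifact name with an underscore. The Maven dependency above spells out the same suffix by hand. The simplified sketch below implements that naming rule and ignores pre-release and milestone versions. The function names are hypothetical, and this is not sbt's own code.

```python
def scala_binary_version(scala_version: str) -> str:
    """'2.12.13' -> '2.12'; any Scala 3 release -> '3' (final releases only)."""
    major, minor = scala_version.split(".")[:2]
    return "3" if major == "3" else f"{major}.{minor}"


def cross_versioned(artifact: str, scala_version: str) -> str:
    """Mimic the artifact name that %% resolves to."""
    return f"{artifact}_{scala_binary_version(scala_version)}"


print(cross_versioned("histogrammar", "2.12.13"))  # prints histogrammar_2.12
```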

Quick start

The easiest way to start an interactive Scala session with Histogrammar is to create the following build.sbt:

scalaVersion := "2.12.13"
libraryDependencies += "io.github.histogrammar" %% "histogrammar" % "1.0.11"

and run sbt console in the same directory. sbt downloads the requested Scala version itself, so you don’t need to install Scala or anything other than sbt.