This page explains how to install Histogrammar in different ways. Use only the instructions relevant to your situation.
Install from a public repository
Python
Histogrammar is available on PyPI, a publicly accessible Python repository with dependency management.
If you have superuser (root) access
To install the latest version of Histogrammar, use
sudo easy_install histogrammar
or
sudo pip install histogrammar
depending on whether you have pip installed (recommended). Some systems with both Python 2 and 3 use easy_install3 and pip3 to distinguish the Python 3 version.
On freshly minted Ubuntu machines, you can install pip with
sudo apt-get install python-setuptools
sudo easy_install pip
If you do not have superuser access
To install Histogrammar for your user account only, use
pip install --user histogrammar
which installs it in ~/.local (Python knows where to find it).
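To confirm that a per-user install is visible to Python, a short check like the following may help. It is a minimal sketch using only the standard library; the helper name is_installed is hypothetical, and the exact user site directory varies by platform and Python version.

```python
import importlib.util
import site


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Directory that `pip install --user` targets (under ~/.local on Linux).
print("user site-packages:", site.getusersitepackages())
print("histogrammar importable:", is_installed("histogrammar"))
```

If the second line prints False after installing, the pip that ran probably belongs to a different Python interpreter than the one running this check.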
For use in PySpark
PySpark uses both Histogrammar-Python (as an interface) and Histogrammar-Scala (for faster calculations). To use it, you need to install Histogrammar-Python as described above, and launch PySpark with a request for Histogrammar-Scala:
pyspark --packages "io.github.histogrammar:histogrammar_2.12:1.0.11,io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11"
Use _2.11 in place of _2.12 in both package names for compatibility with Spark 2.x (Scala 2.11).
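Because the Scala suffix and the version number each appear twice in the --packages argument, it can be convenient to generate the string rather than edit it by hand. The following is a small sketch under that assumption; the function name spark_packages and the constants are hypothetical helpers, not part of Histogrammar.

```python
GROUP_ID = "io.github.histogrammar"
ARTIFACTS = ("histogrammar", "histogrammar-sparksql")


def spark_packages(scala_binary="2.12", version="1.0.11"):
    """Build the comma-separated Maven coordinates for --packages."""
    return ",".join(
        "{}:{}_{}:{}".format(GROUP_ID, name, scala_binary, version)
        for name in ARTIFACTS
    )


print(spark_packages())        # coordinates for Scala 2.12
print(spark_packages("2.11"))  # coordinates for Spark 2.x (Scala 2.11)
```

The resulting string can be pasted after pyspark --packages, or supplied through Spark's spark.jars.packages configuration property when the session is created from a script.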
In PySpark, you should be able to call
import histogrammar
If df is a DataFrame that you would like to use with Histogrammar, you can now call
h = df.hg_Bin(100, -5.0, 5.0, df["plotme"] + df["andme"])
to get a histogram h of the Column expression df["plotme"] + df["andme"]. All of the processing is performed in Java with Spark’s DataFrame optimizations.
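For intuition about what the arguments 100, -5.0, 5.0 mean, the following plain-Python sketch fills 100 equal-width bins covering the range from -5.0 to 5.0 and counts out-of-range values separately. It is an illustration only, not Histogrammar's implementation: the function name fill_bins is hypothetical, the low-inclusive, high-exclusive edge convention is an assumption, NaN handling is omitted, and nothing here runs in Spark.

```python
import math


def fill_bins(values, num=100, low=-5.0, high=5.0):
    """Count values into `num` equal-width bins on [low, high)."""
    counts = [0] * num
    underflow = overflow = 0
    width = (high - low) / num
    for x in values:
        if x < low:
            underflow += 1
        elif x >= high:
            overflow += 1
        else:
            # Clamp guards against floating-point rounding at the top edge.
            index = min(int(math.floor((x - low) / width)), num - 1)
            counts[index] += 1
    return counts, underflow, overflow


counts, underflow, overflow = fill_bins([-7.5, -5.0, 0.03, 0.05, 4.99, 5.0])
print(sum(counts), "in range,", underflow, "below,", overflow, "above")
```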
Java/Scala or Apache Spark
Histogrammar is available on Maven Central, a publicly accessible Java/Scala repository with dependency management.
Apache Spark
To use Histogrammar in the Spark shell, you don’t have to download anything. Just start Spark with
spark-shell --packages "io.github.histogrammar:histogrammar_2.12:1.0.11,io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11"
and call
import org.dianahep.histogrammar._
at the Spark prompt.
Use _2.11 in place of _2.12 in both jar files for compatibility with Spark 2.x (Scala 2.11).
Java/Scala with Maven
To compile Histogrammar into a project with the Maven build tool, add
<dependency>
<groupId>io.github.histogrammar</groupId>
<artifactId>histogrammar_2.12</artifactId>
<version>1.0.11</version>
</dependency>
to your <dependencies> section. Use _2.11 in place of _2.12 for compatibility with Scala 2.11.
Scala with sbt
To use Histogrammar in sbt console or to compile it into a project with the sbt build tool, add
libraryDependencies += "io.github.histogrammar" %% "histogrammar" % "1.0.11"
to your build.sbt file. The double-percent (%%) appends your project's Scala binary version to the artifact name, so sbt fetches the build of Histogrammar that matches your version of Scala.
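The effect of the double-percent on the artifact name can be sketched in a few lines. This is only an illustration of the naming rule for Scala 2.x, where the binary version is the first two components of the full version; the function names are hypothetical, and sbt itself performs this step internally.

```python
def scala_binary_version(scala_version):
    """Reduce a full Scala 2.x version such as "2.12.13" to "2.12"."""
    major, minor = scala_version.split(".")[:2]
    return "{}.{}".format(major, minor)


def cross_versioned_artifact(name, scala_version):
    """Mimic what %% does to an artifact name for Scala 2.x."""
    return "{}_{}".format(name, scala_binary_version(scala_version))


# The name that a single-percent dependency would have to spell out.
print(cross_versioned_artifact("histogrammar", "2.12.13"))
```

The printed name is the same artifactId that the Maven instructions require you to write explicitly.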
Quick start
In fact, the easiest way to start an interactive Scala session with Histogrammar is simply to create a build.sbt file containing:
scalaVersion := "2.12.13"
libraryDependencies += "io.github.histogrammar" %% "histogrammar" % "1.0.11"
and run sbt console. You don’t need to install Scala or anything other than sbt.