If you're working with Apache Spark in Scala and want to use Bokeh to draw plots, read this page.

Author: Alexey Svyatkovskiy

Setting up

The examples on this page have been tested with Histogrammar 1.0.3. Any subsequent version should work. See the Installation instructions if you need to install it.

The examples also use Apache Spark. You might already have access to a Spark cluster (and that’s why you’re here, after all), but if you don’t, you can install Spark yourself from Spark’s website. Spark can run on a single computer for testing, though its performance advantage comes from parallelizing across a network. The interface on a single computer is identical to the distributed version. For a single-computer installation, choose “pre-built for Hadoop 1.X” (you don’t need Hadoop to be installed, only Java).

If your Spark cluster is version 2.0 or later (built against Scala 2.11 by default), start the shell with

spark-shell --packages "org.diana-hep:histogrammar-bokeh_2.11:1.0.3"

Otherwise, start it with

spark-shell --packages "org.diana-hep:histogrammar-bokeh_2.10:1.0.3"

Plotting a Histogram in Scala

The first example plots a histogram with scala-bokeh using plain Scala and artificial data, for the sake of simplicity.

Start by importing the Histogrammar package and the plotting library:

import org.dianahep.histogrammar._
import org.dianahep.histogrammar.bokeh._

Generate artificial data:

val simple = List(3.4, 2.2, -1.8, 0.0, 7.3, -4.7, 1.6, 0.0, -3.0, -1.7)

Book two histograms:

val one = Histogram(5, -5, 8, {x: Double => x})
val two = Histogram(5, -3, 7, {x: Double => x})

Fill both histograms in one line of code using the Label class:

val labeling = Label("one" -> one, "two" -> two)
simple.foreach(labeling.fill(_))
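
To make the bin assignment concrete, here is a minimal sketch in plain Scala (it does not use Histogrammar) of how the ten values could be sorted into five regular bins. It assumes bins are inclusive at the low edge and exclusive at the high edge, and it simply skips values outside the range. The names BinningSketch, binIndex and count are hypothetical and introduced only for illustration.

```scala
// Illustration only: plain Scala, not Histogrammar's implementation.
object BinningSketch {
  // Index of the regular bin containing x, or None if x lies outside [low, high).
  // Assumes low-inclusive, high-exclusive bin edges.
  def binIndex(num: Int, low: Double, high: Double)(x: Double): Option[Int] =
    if (x < low || x >= high) None
    else Some(math.min(num - 1, math.floor((x - low) / (high - low) * num).toInt))

  // Number of entries per bin; out-of-range values are skipped in this sketch.
  def count(num: Int, low: Double, high: Double)(data: Seq[Double]): Vector[Int] = {
    val indices = data.flatMap(binIndex(num, low, high)(_))
    Vector.tabulate(num)(i => indices.count(_ == i))
  }

  val simple = List(3.4, 2.2, -1.8, 0.0, 7.3, -4.7, 1.6, 0.0, -3.0, -1.7)

  val one = count(5, -5, 8)(simple) // same binning as histogram one: all values in range
  val two = count(5, -3, 7)(simple) // same binning as histogram two: -4.7 and 7.3 are out of range
}
```

Under these assumptions, BinningSketch.one holds the counts 2, 4, 2, 1, 1 and BinningSketch.two holds 3, 2, 2, 1, 0, which is why the two histograms look different even though they are filled from the same data.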

Start by plotting histogram one:

val plot_one = one.bokeh().plot()
save(plot_one,"scala_plot_one.html")

Configuring Bokeh Glyph attributes

By default, a black line glyph is plotted. To turn it into a bar plot filled with red, pass arguments to the bokeh() method as follows:

import io.continuum.bokeh._
val plot_one = one.bokeh(glyphType="histogram",fillColor=Color.Red).plot()
save(plot_one,"scala_plot_one.html")

Superimposing multiple glyphs on one plot

To superimpose the two histograms booked and filled above on one plot, create and configure a glyph for each histogram, then call the plot() method, which accepts a variable-length argument list and can therefore take any number of glyphs.

val glyph_one = one.bokeh() //use default
val glyph_two = two.bokeh(glyphType="histogram",fillColor=Color.Red) //customize
val plot_both = plot(glyph_one,glyph_two)
save(plot_both,"scala_plot_both.html")

Plotting a stack of Histograms

Here is an example of how to make a stacked plot of histograms. Let us generate more artificial data, different from the data used to fill one and two:

val extra = List(3.2, 3.2, -2.1, 1.0, 1.3, -3.4, 0.6, 0.0, -1.0, 1.7)

and book a third histogram:

val three = Histogram(5, -3, 7, {x: Double => x})

Note: only histograms with the same binning can be stacked!

Now, fill it:

extra.foreach(three.fill(_))

Prepare a stacked histogram using the dedicated Stack.build() method, and plot it:

val s = Stack.build(two,three)
val glyph_stack = s.bokeh() //use defaults
val plot_stack = plot(glyph_stack)
save(plot_stack,"scala_plot_stack.html")
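
The requirement that stacked histograms share the same binning follows from how a stacked plot is drawn: each layer is a bin-by-bin sum of the histograms beneath it. The sketch below shows that idea in plain Scala. It is not Stack.build's implementation, the ordering of layers used by Histogrammar may differ, and StackSketch, stack, and the bin counts are made up for illustration.

```scala
// Illustration only: the general idea of stacking, not Histogrammar's Stack.
object StackSketch {
  // Running bin-by-bin sums: layer i is the sum of histograms 0 through i.
  def stack(histograms: Seq[Vector[Int]]): Seq[Vector[Int]] = {
    require(histograms.nonEmpty, "need at least one histogram")
    require(histograms.forall(_.length == histograms.head.length),
      "all histograms must have the same number of bins")
    histograms.tail.scanLeft(histograms.head) { (below, next) =>
      below.zip(next).map { case (a, b) => a + b }
    }
  }

  // Made-up bin counts for two histograms with five bins each.
  val lower = Vector(3, 2, 2, 1, 0)
  val upper = Vector(1, 3, 3, 2, 0)

  // First layer is `lower` itself, second layer is `lower + upper`.
  val layers = stack(Seq(lower, upper))
}
```

Passing histograms with different numbers of bins makes the require check fail, which mirrors the rule that only identically binned histograms can be stacked.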

Plotting a Histogram in spark-shell

The next examples use the CMS public dataset as sample data. Load that into a Spark RDD with

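import org.dianahep.histogrammar.tutorial.cmsdata
import org.dianahep.histogrammar.tutorial.cmsdata._ // assumption: Muon and Jet are defined in cmsdata alongside EventIterator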
val events = cmsdata.EventIterator()
val dataset_rdd = sc.parallelize(events.toSeq)

It may take about 20-30 seconds to transfer all the data to your Spark cluster.

The following example plots a simple histogram with scala-bokeh in the interactive spark-shell, where the Spark context and SQL context are available as sc and sqlContext. It assumes that the Bokeh and Histogrammar jars are on the classpath:

import org.dianahep.histogrammar._
import org.dianahep.histogrammar.bokeh._

In this example, we plot muon quantities, so extract the muons to their own RDD, keeping only those with pz > 2.0:

val muons_rdd = dataset_rdd.flatMap(_.muons).filter(_.pz > 2.0)

Once the data have been extracted and transformed, book and fill the histogram:

val p_histogram = Histogram(100, 0, 200, {mu: Muon => math.sqrt(mu.px*mu.px + mu.py*mu.py + mu.pz*mu.pz)})
val final_histogram = muons_rdd.aggregate(p_histogram)(new Increment, new Combine)
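
Spark's aggregate takes a starting value, a function that folds one record into a partial result within each partition, and a function that merges the partial results of different partitions. The sketch below emulates this on a local collection without Spark or Histogrammar, so it can be run anywhere. Particle, increment, combine and aggregate are hypothetical stand-ins (the real Increment and Combine classes presumably play the corresponding roles), and out-of-range values are simply dropped.

```scala
// Illustration only: emulates the aggregate pattern with plain Scala collections.
object AggregateSketch {
  // Hypothetical stand-in for a muon record; only the momentum components are modelled.
  case class Particle(px: Double, py: Double, pz: Double) {
    def p: Double = math.sqrt(px * px + py * py + pz * pz)
  }

  val numBins = 100
  val low = 0.0
  val high = 200.0

  // "Increment" step: add one particle to a partial histogram.
  def increment(counts: Vector[Long], particle: Particle): Vector[Long] = {
    val x = particle.p
    if (x < low || x >= high) counts
    else {
      val i = math.min(numBins - 1, math.floor((x - low) / (high - low) * numBins).toInt)
      counts.updated(i, counts(i) + 1L)
    }
  }

  // "Combine" step: merge two partial histograms bin by bin.
  def combine(a: Vector[Long], b: Vector[Long]): Vector[Long] =
    a.zip(b).map { case (x, y) => x + y }

  // Fold each chunk (a pretend partition) separately, then merge the partial results.
  def aggregate(data: Seq[Particle], partitionSize: Int): Vector[Long] = {
    val zero = Vector.fill(numBins)(0L)
    data.grouped(partitionSize)
      .map(_.foldLeft(zero)(increment))
      .foldLeft(zero)(combine)
  }

  // Momenta: 13.0, 2.5, 3.0 and 500.0 (the last one is out of range).
  val particles = Seq(Particle(3, 4, 12), Particle(0, 0, 2.5), Particle(1, 2, 2), Particle(0, 0, 500))
  val result = aggregate(particles, partitionSize = 2)
}
```

Because combine is a plain bin-by-bin sum, the result does not depend on how the data are split into partitions, which is what makes this pattern safe to run in parallel.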

Users are strongly encouraged to learn the syntax of the Bokeh package, especially the Glyph and Plot abstractions. Plotting a one-dimensional histogram takes two lines of code:

val myfirstplot = final_histogram.bokeh().plot()
save(myfirstplot,"myfirstplot.html")

The resulting plot is saved to an HTML file and can be viewed and interactively edited in a browser.

Configuring Bokeh Glyph attributes

The example above uses default parameters and styles for the plotted histogram. Several attributes can be configured, including the glyph type (line or marker), the marker style (for example circle or diamond), and the size and color of the glyphs. Import the Bokeh library to be able to configure glyph colors:

import io.continuum.bokeh._

val mysecondplot = final_histogram.bokeh(glyphType="circle",glyphSize=3,fillColor=Color.Blue).plot()
save(mysecondplot,"mysecondplot.html")

Example: superimposing multiple histograms on one plot

To superimpose two or more histograms on a single Bokeh plot, create and customize a glyph for each histogram, then pass all of the glyphs to the plot() method:

import io.continuum.bokeh._

val p_histogram1 = muons_rdd.aggregate(Histogram(100, 0, 200, {mu: Muon => math.sqrt(mu.px*mu.px + mu.py*mu.py + mu.pz*mu.pz)}, {mu: Muon => mu.pz > 2.0}))(new Increment, new Combine)

val p_histogram2 = muons_rdd.aggregate(Histogram(100, 0, 200, {mu: Muon => math.sqrt(mu.px*mu.px + mu.py*mu.py + mu.pz*mu.pz)}, {mu: Muon => mu.pz > 20.0}))(new Increment, new Combine)

val G1 = p_histogram1.bokeh()
val G2 = p_histogram2.bokeh(glyphType="circle",glyphSize=3,fillColor=Color.Blue)

val mythirdplot = plot(G1,G2)
save(mythirdplot,"mythirdplot.html")

Here, the plot() method accepts a variable-length argument list and can therefore take any number of glyphs. An alternative signature:

def plot(xLabel:String, yLabel: String, glyphs: GlyphRenderer*)

also lets you set the axis titles.
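
For readers new to Scala's repeated parameters, the sketch below shows how a pair of overloads shaped like the two plot() signatures can be called, including how an existing collection is expanded into separate arguments with `: _*`. Glyph and plotSketch are hypothetical stand-ins that only build a description string. They are not part of Histogrammar or Bokeh.

```scala
// Illustration only: Scala varargs with stand-in types, no plotting involved.
object VarargsSketch {
  // Hypothetical stand-in for a glyph; the real type would be a Bokeh GlyphRenderer.
  case class Glyph(name: String)

  // Accepts any number of glyphs.
  def plotSketch(glyphs: Glyph*): String =
    glyphs.map(_.name).mkString("plot(", ", ", ")")

  // Overload that takes axis titles before the glyphs.
  def plotSketch(xLabel: String, yLabel: String, glyphs: Glyph*): String =
    s"$xLabel vs $yLabel: " + plotSketch(glyphs: _*)

  val g1 = Glyph("G1")
  val g2 = Glyph("G2")

  val direct = plotSketch(g1, g2)                   // glyphs passed one by one
  val labelled = plotSketch("p", "entries", g1, g2) // axis titles first, then glyphs
  val splatted = plotSketch(Seq(g1, g2): _*)        // a collection expanded into varargs
}
```

The `: _*` form is useful whenever the glyphs are already held in a collection rather than in separate variables.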

Specifying a legend

Given a GlyphRenderer (the type of object returned by the bokeh() method) and a Plot (the type of object returned by the plot() method), you can add a Legend to the plot using built-in Bokeh tools. For instance, given the histograms from the previous example:

val G1 = p_histogram1.bokeh()
val G2 = p_histogram2.bokeh(glyphType="circle",glyphSize=3,fillColor=Color.Blue)
val legend = List("curve1" -> List(G1),"curve2" -> List(G2))

val plots = plot(G1,G2)
val leg = new Legend().plot(plots).legends(legend)
plots.renderers <<= (leg :: _)
save(plots,"mythirdplot_legend.html")

Example: plotting a SparselyHistogram

The same API can be used to plot sparsely binned histograms.

Stack

Example: plotting a stack of histograms

Here is an example of how to make a stacked plot of two histograms. The most common use case in particle physics is to plot various simulated samples for the same final state. Here we consider a somewhat artificial case instead: muon and jet momenta taken from the same sample.

val jets_rdd = dataset_rdd.flatMap(_.jets).filter(_.pz > 3.0)

When booking the histograms, make sure that all histograms to be stacked have the same binning; otherwise an exception will be thrown:

val p_histogram1 = muons_rdd.aggregate(Histogram(100, 0, 200, {mu: Muon => math.sqrt(mu.px*mu.px + mu.py*mu.py + mu.pz*mu.pz)}))(new Increment, new Combine)

val p_histogram2 = jets_rdd.aggregate(Histogram(100, 0, 200, {jet: Jet => math.sqrt(jet.px*jet.px + jet.py*jet.py + jet.pz*jet.pz)}))(new Increment, new Combine)

val s = Stack.build(p_histogram1,p_histogram2)
val mystackplot = plot(s.bokeh(): _*)
save(mystackplot,"stackplot.html")