This quickstart provides a single-node standalone DataWave instance that you may use to follow along with the guided tour. It is also generally useful as a development tool, as it provides a consistent, repeatable process for deploying and testing your local DataWave build.

The quickstart provides setup and tear-down automation for DataWave, Hadoop, Accumulo, ZooKeeper, and Wildfly, and it includes many utility functions that will streamline most interactions with these services.

Before You Start

Prerequisites

  • Linux, Bash and an Internet connection to download tarballs
  • JDK 11 must be installed with JAVA_HOME and PATH configured accordingly
  • You should be able to ssh to localhost without a passphrase
    • Note that the quickstart’s Hadoop install will set up passphrase-less ssh for you automatically, unless it detects that you already have a private/public key pair generated

System Recommendations

  • OS: The quickstart has been tested on CentOS 7, RHEL 8 and Ubuntu 20.04
  • 8GB RAM + single quad-core processor (minimum)
  • Storage: To install everything, you’ll need at least a few GB free
  • ulimit -u (max user processes): 32768
  • ulimit -n (open files): 32768
  • vm.swappiness: 0

Get the Source Code

Clone the Repo

Docker Alternative

If you would prefer to run the DataWave Quickstart environment as a Docker container, skip the 4 steps described below. Go here instead and view the README file. Note that the Override Default Binaries section below is also relevant to the Docker image build.


Quickstart Setup

# Step 1
$ echo -e "activateDW() {\n source DW_SOURCE/contrib/datawave-quickstart/bin/env.sh\n}" >> ~/.bashrc
# Step 2a
$ source ~/.bashrc
# Step 2b
$ activateDW
# Step 3
$ allInstall
# Step 4
$ datawaveWebStart && datawaveWebTest
# Setup is now complete

The commands above will complete the entire quickstart installation. However, it’s a good idea to at least skim over the sections below to get an idea of how the setup works and how to customize it for your own preferences.

To keep things simple, DataWave, Hadoop, Accumulo, ZooKeeper, and Wildfly will be installed under your DW_SOURCE/contrib/datawave-quickstart directory, and all will be owned by / executed as the current user.

Step 1: Update ~/.bashrc

1.1 Add the Quickstart Environment

This step ensures that your DataWave environment and all its services can be configured on-demand across your various bash sessions, as needed.

  # Step 1
  $ echo -e "activateDW() {\n source DW_SOURCE/contrib/datawave-quickstart/bin/env.sh\n}" >> ~/.bashrc

The activateDW bash function when invoked in your bash session will source env.sh. That script, in turn, bootstraps each DataWave service via its respective {servicename}/bootstrap.sh script. The bootstrap scripts define supporting bash variables and functions, encapsulating configuration and basic start/stop functionality consistently for all services.

1.2 Override Default Binaries (Optional)

To override the quickstart’s default version of a particular installation binary, you may override the desired DW_*_DIST_URI value as shown below. URIs may be local or remote. Local file URI values must be prefixed with file://

  $ vi ~/.bashrc
     ...
     export DW_HADOOP_DIST_URI=file:///my/local/binaries/hadoop-x.y.z.tar.gz
     export DW_ACCUMULO_DIST_URI=http://some.apache.mirror/accumulo/2.1.x/accumulo-2.1.x-bin.tar.gz
     export DW_ZOOKEEPER_DIST_URI=http://some.apache.mirror/zookeeper/3.x/apache-zookeeper-3.x.y.tar.gz
     export DW_WILDFLY_DIST_URI=file:///my/local/binaries/wildfly-17.x.tar.gz
     export DW_MAVEN_DIST_URI=file:///my/local/binaries/apache-maven-x.y.z.tar.gz

     activateDW() {                                                # Added by Step 1
       source DW_SOURCE/contrib/datawave-quickstart/bin/env.sh
     }
     ...

Step 2: Bootstrap the Environment

  $ source ~/.bashrc                                                    # Step 2a
  $ activateDW                                                          # Step 2b

When the activateDW function is invoked for the first time, tarballs for all services will be automatically copied/downloaded from their configured locations into their respective service directories, i.e., under datawave-quickstart/bin/services/{servicename}/.

Additionally, a DataWave build will be kicked off automatically. DataWave’s ingest and web service binaries will be copied into place upon conclusion of the build. This step can take several minutes to complete, so now is a good time step away for a break.

Step 3: Install Services

  $ allInstall                                                                 # Step 3

The allInstall bash function will initialize and configure all services in the correct sequence. Alternatively, individual services may be installed one at a time, if desired, using their respective {servicename}Install bash functions.

Step 4: Start Wildfly and Test Web Services

  $ datawaveWebStart && datawaveWebTest                                        # Step 4

This will start up DataWave Web and run preconfigured tests against DataWave’s REST API. Note any test failures, if present, and check logs for error messages


What’s Next?

  • If all web service tests passed in Step 4, then you’re ready for the guided tour.
  • If you encountered any issues along the way, please read through the troubleshooting guide.
  • To learn more about the quickstart environment and its available features, check out the quickstart reference.