DataWave 6.x Docs - Quickstart Installation

This quickstart provides a single-node standalone DataWave instance that you may use to follow along with the guided tour. It is also generally useful as a development tool, as it provides a consistent, repeatable process for deploying and testing your local DataWave build.

The quickstart provides setup and tear-down automation for DataWave, Hadoop, Accumulo, ZooKeeper, and Wildfly, and it includes many utility functions that will streamline most interactions with these services.

Before You Start

Prerequisites

Linux, Bash and an Internet connection to download tarballs
JDK 11 must be installed with JAVA_HOME and PATH configured accordingly
You should be able to ssh to localhost without a passphrase
- Note that the quickstart’s Hadoop install will set up passphrase-less ssh for you automatically, unless it detects that you already have a private/public key pair generated

System Recommendations

OS: The quickstart has been tested on CentOS 7, RHEL 8 and Ubuntu 20.04
8GB RAM + single quad-core processor (minimum)
Storage: To install everything, you’ll need at least a few GB free
ulimit -u (max user processes): 32768
ulimit -n (open files): 32768
vm.swappiness: 0

Get the Source Code

Clone the Repo

Note: The local path that you select for DataWave’s source code will be refererred to as DW_SOURCE from this point forward. Wherever it appears in a bash command below, be sure to substitute it with the actual path

Docker Alternative

If you would prefer to run the DataWave Quickstart environment as a Docker container, skip the 4 steps described below. Go here instead and view the README file. Note that the Override Default Binaries section below is also relevant to the Docker image build.

Quickstart Setup

# Step 1
$ echo -e "activateDW() {\n source DW_SOURCE/contrib/datawave-quickstart/bin/env.sh\n}" >> ~/.bashrc
# Step 2a
$ source ~/.bashrc
# Step 2b
$ activateDW
# Step 3
$ allInstall
# Step 4
$ datawaveWebStart && datawaveWebTest
# Setup is now complete

The commands above will complete the entire quickstart installation. However, it’s a good idea to at least skim over the sections below to get an idea of how the setup works and how to customize it for your own preferences.

To keep things simple, DataWave, Hadoop, Accumulo, ZooKeeper, and Wildfly will be installed under your DW_SOURCE/contrib/datawave-quickstart directory, and all will be owned by / executed as the current user.

Important: If you currently have any of the above installed locally under any user account, you should ensure that all are stopped/disabled before proceeding

Step 1: Update ~/.bashrc

1.1 Add the Quickstart Environment

This step ensures that your DataWave environment and all its services can be configured on-demand across your various bash sessions, as needed.

  # Step 1
  $ echo -e "activateDW() {\n source DW_SOURCE/contrib/datawave-quickstart/bin/env.sh\n}" >> ~/.bashrc

The activateDW bash function when invoked in your bash session will source env.sh. That script, in turn, bootstraps each DataWave service via its respective {servicename}/bootstrap.sh script. The bootstrap scripts define supporting bash variables and functions, encapsulating configuration and basic start/stop functionality consistently for all services.

1.2 Override Default Binaries (Optional)

To override the quickstart’s default version of a particular installation binary, you may override the desired DW_*_DIST_URI value as shown below. URIs may be local or remote. Local file URI values must be prefixed with file://

  $ vi ~/.bashrc
     ...
     export DW_HADOOP_DIST_URI=file:///my/local/binaries/hadoop-x.y.z.tar.gz
     export DW_ACCUMULO_DIST_URI=http://some.apache.mirror/accumulo/2.1.x/accumulo-2.1.x-bin.tar.gz
     export DW_ZOOKEEPER_DIST_URI=http://some.apache.mirror/zookeeper/3.x/apache-zookeeper-3.x.y.tar.gz
     export DW_WILDFLY_DIST_URI=file:///my/local/binaries/wildfly-17.x.tar.gz
     export DW_MAVEN_DIST_URI=file:///my/local/binaries/apache-maven-x.y.z.tar.gz

     activateDW() {                                                # Added by Step 1
       source DW_SOURCE/contrib/datawave-quickstart/bin/env.sh
     }
     ...

Step 2: Bootstrap the Environment

  $ source ~/.bashrc                                                    # Step 2a
  $ activateDW                                                          # Step 2b

When the activateDW function is invoked for the first time, tarballs for all services will be automatically copied/downloaded from their configured locations into their respective service directories, i.e., under datawave-quickstart/bin/services/{servicename}/.

Additionally, a DataWave build will be kicked off automatically. DataWave’s ingest and web service binaries will be copied into place upon conclusion of the build. This step can take several minutes to complete, so now is a good time step away for a break.

Note: The DW_DATAWAVE_BUILD_COMMAND variable in datawave/bootstrap.sh defines the Maven command used for the build. It may be overridden in ~/.bashrc or from the command line as needed

Step 3: Install Services

  $ allInstall                                                                 # Step 3

The allInstall bash function will initialize and configure all services in the correct sequence. Alternatively, individual services may be installed one at a time, if desired, using their respective {servicename}Install bash functions.

Step 4: Start Wildfly and Test Web Services

  $ datawaveWebStart && datawaveWebTest                                        # Step 4

This will start up DataWave Web and run preconfigured tests against DataWave’s REST API. Note any test failures, if present, and check logs for error messages

What’s Next?

If all web service tests passed in Step 4, then you’re ready for the guided tour.
If you encountered any issues along the way, please read through the troubleshooting guide.
To learn more about the quickstart environment and its available features, check out the quickstart reference.

Tags: