OpenVDB Visual Studio Solution

A couple of people have been asking for some help with compiling OpenVDB on Windows, so here is a quick solution I’ve put together. It contains 4 projects in the openvdb_build directory to be compiled under /MD in VS2010 (pretty sure it’ll work for 2012 as well). I’ve included the binaries for the dll and viewer application, though you will require the VS2010 CRT libraries. The few changes made to the source (mostly for the viewer components) are explained here, where I cover compiling with /MT.

Download

- A dynamic library of the main OpenVDB toolkit excluding the viewer components
- A static library of the main OpenVDB toolkit excluding the viewer components
- A static library of the main OpenVDB toolkit with the viewer components (I was unable to build the viewer by linking to a dll)
- The OpenVDB viewer application

The prerequisites are still required, so download and build the following; all you then need to do is add the additional include & lib directories to the projects:

- OpenEXR’s Half.h and the corresponding lib/dll
- TBB
- Zlib

And for the static viewer library:

- glew
- GLFW legacy version (anything below version 3)

Nothing fancy here, just download them and compile under /MD for your required build as per usual. If you want to build the OpenVDB viewer application you’ll need to build the openvdb_2.3.0_viewer_static project and link to that. Everything is built in /bin/<project name>/<arch>/<config>/


Posted in Compiling OpenVDB & the OpenVDB Viewer on Windows 7, OpenVDB

GPU implementation of Stam’s implicit fluid solver using CUDA

Updated 20/04/2014: Video capture

In this project I wanted to investigate and demonstrate the effectiveness of using CUDA to implement graphical simulations and update their visual representations, with effective optimization and an understanding of the graphics memory architecture. The focus is on fluid simulations, specifically the implementation of a simple implicit fluid solver for the Navier-Stokes equations for incompressible flow, based on the semi-Lagrangian simulation method introduced by Jos Stam’s Stable Fluids. The project code uses a sparse linear solver to support arbitrary user-defined collision boundaries, with planned extensions into more detailed PIC and FLIP solvers.
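
To give a flavour of the core advection step, below is a rough sketch of a semi-Lagrangian density advection kernel on a 2D grid. This is not the project’s actual code; the row-major layout, clamped boundaries and bilinear sampling are simplified assumptions.

//sketch of semi-Lagrangian advection: for each cell, trace backwards along the
//velocity field and bilinearly sample the previous density field at that point
__global__ void advectDensity(float* dstDensity, const float* srcDensity,
                              const float* velX, const float* velY,
                              int width, int height, float dt)
{
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y*width + x;

    //back-trace the particle that ends up in this cell
    float px = x - dt*velX[idx];
    float py = y - dt*velY[idx];

    //clamp so we never sample outside the grid
    px = fminf(fmaxf(px, 0.5f), width  - 1.5f);
    py = fminf(fmaxf(py, 0.5f), height - 1.5f);

    int x0 = (int)px, y0 = (int)py;
    float fx = px - x0, fy = py - y0;

    //bilinear interpolation of the source density
    float d00 = srcDensity[y0*width + x0];
    float d10 = srcDensity[y0*width + x0 + 1];
    float d01 = srcDensity[(y0 + 1)*width + x0];
    float d11 = srcDensity[(y0 + 1)*width + x0 + 1];

    dstDensity[idx] = (1.0f - fy)*((1.0f - fx)*d00 + fx*d10)
                    + fy*((1.0f - fx)*d01 + fx*d11);
}

The pressure projection (the sparse linear solve mentioned above) is then handled in a separate pass.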

The video above was simulated and filmed using an NVIDIA GeForce GTX 570 with a compute capability of 2.0. Frame playback rates for the following solver grid sizes are listed below:

128×128 solver grid @ 60fps synced, x1 density field
256×256 solver grid @ 60fps synced, x1 density field
512×512 solver grid @ 60fps synced, x1 density field - up to a maximum of 7 density fields simulated and displayed while retaining 60fps
1024×1024 solver grid @ 25fps, x1 density field - 9 fields result in 15fps
2048×2048 solver grid @ 8fps, x1 density field

I’m planning on going through the fundamentals of the implementation and the core principles behind the solver soon. Below are some quick screen caps showing some of the features I have working in the context of the video above.

Fluid Example

A single density field with multiple input values on a 2D grid of size 1024×1024 at 25fps

Multiple Fluid Densities

5 different fluid densities symbolised by different colours mixing. The colours and density values can be assigned by the user. A 2D grid of size 1024×1024 at 20fps

Fluid Velocity in 2D

A velocity visualisation of a 2D grid of size 1024×1024 at 25fps

Fluid Collision

Fluid colliding with an arbitrary barrier. The collision barriers can be of any shape and can be drawn on interactively whilst the application is running. 128×128 grid at 60fps sync’d.

Fluid Velocity in 3D

A velocity visualisation of a voxelized 3D grid of size 64x64x64 at 17fps

More info coming asap, thanks for reading!

Posted in C++, CUDA, CUDA Fluids, OpenGL, Project, Project: CUDA Fluids

Setting up NVCC for CUDA in Qt

When using the CUDA libraries, a separate compilation and linking process is required for the device-specific portions of code. The easiest way to do this is to use nvcc (the NVIDIA CUDA Compiler). CUDA sources can contain both C/C++ host code and device functions, as nvcc will separate these components and handle the host side of things with a general purpose compiler dictated by the platform. All device methods and attributes must be compiled using the NVIDIA compilers/assemblers, which embed the compiled GPU functions as load images in the host object file.
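
As a trivial, purely illustrative example of what nvcc handles, a single .cu file can mix both sides; the kernel is compiled by nvcc into the GPU image while the host code is passed on to the platform compiler:

//example.cu - a minimal mixed host/device source (illustrative only)
#include <cstdio>
#include <cuda_runtime.h>

//device code: compiled by nvcc and embedded as a GPU load image
__global__ void addOne(int* data, int n)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    if (idx < n) data[idx] += 1;
}

//host code: handed off to the platform's C++ compiler by nvcc
int main()
{
    const int n = 16;
    int h_data[16] = {0};
    int* d_data = NULL;
    cudaMalloc((void**)&d_data, n*sizeof(int));
    cudaMemcpy(d_data, h_data, n*sizeof(int), cudaMemcpyHostToDevice);
    addOne<<<1, n>>>(d_data, n);
    cudaMemcpy(h_data, d_data, n*sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    printf("%d\n", h_data[0]); //prints 1
    return 0;
}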

Setting it up in VS is pretty easy thanks to the template project support from the CUDA samples; however, to allow me to work on Unix-based systems I tend to work in Qt. It wasn’t too difficult to find out how to add extra compiler commands to qmake, but it took a while to set up nicely. It’s also nice to have a .pro which can deal with both Linux and Windows. Below are the essential extracts from my .pro just for configuring nvcc. Flags such as SYSTEM_TYPE and NVCC_OPTIONS I set under their respective architecture/platform identities using qmake’s parsing and inbuilt flags when setting the main variables for the general host compilation (such as checking QMAKE_TARGET.arch with :contains and generally sub-dividing platforms and compilers with :{} ). I’ve provided an example of this for use on Windows with MSVC, as it requires some extra settings.


## SYSTEM_TYPE - compiling for 32 or 64 bit architecture
SYSTEM_TYPE = 64

## CUDA_COMPUTE_ARCH - This will enable nvcc to compile appropriate architecture specific code for different compute versions.
## Multiple architectures can be requested by separating them with a space. example:
## CUDA_COMPUTE_ARCH = 10 20 30 35
CUDA_COMPUTE_ARCH = 20

## CUDA_DEFINES - The separate defines needed for the cuda device and host methods
CUDA_DEFINES +=

## CUDA_DIR - the directory of cuda such that CUDA\<version-number>\ contains the bin, lib, src and include folders
CUDA_DIR= 

## CUDA_SOURCES - the source (generally .cu) files for nvcc. No spaces in path names
CUDA_SOURCES+=

## CUDA_LIBS - the libraries to link
CUDA_LIBS= -lcuda -lcudart 

## CUDA_INC - all includes needed by the cuda files (such as CUDA\<version-number>\include)
## note each path is passed straight to nvcc below, so prefix it with -I
CUDA_INC+= -I"$$CUDA_DIR/include"

## NVCC_OPTIONS - any further options for the compiler
NVCC_OPTIONS += --use_fast_math --ptxas-options=-v

win32:{

  #Debug and Release flags necessary for compilation and linking
  QMAKE_CFLAGS_DEBUG += /MTd
  QMAKE_CXXFLAGS_DEBUG += /MTd
  QMAKE_CFLAGS_RELEASE += /MT
  QMAKE_CXXFLAGS_RELEASE += /MT
  # The following library conflicts with something in Cuda
  QMAKE_LFLAGS_RELEASE = /NODEFAULTLIB:msvcrt.lib
  QMAKE_LFLAGS_DEBUG = /NODEFAULTLIB:msvcrtd.lib

  win32-msvc2012:contains(QMAKE_TARGET.arch, x86_64):{
       #Can also set SYSTEM_TYPE here
       CONFIG(debug, debug|release) {
            #Debug settings
            message("Using x64 Debug arch config MSVC2012 for build")
            #read as: --compiler-options options,... + ISO-standard C++ exception handling
            # + speed over size, + create debug symbols, + code generation multi-threaded debug
            NVCC_OPTIONS += -Xcompiler /EHsc,/O2,/Zi,/MTd -g
        }
        else {
            #Release settings
            message("Using x64 Release arch config MSVC2012 for build")
            #read as: --compiler-options options,... + ISO-standard C++ exception handling
            # + speed over size, + code generation multi-threaded
            NVCC_OPTIONS += -Xcompiler /EHsc,/O2,/MT
        }
}
}

## correctly formats CUDA_COMPUTE_ARCH to CUDA_ARCH with code gen flags
## resulting format example: -gencode arch=compute_20,code=sm_20
for(_a, CUDA_COMPUTE_ARCH):{
    formatted_arch =$$join(_a,'',' -gencode arch=compute_',',code=sm_$$_a')
    CUDA_ARCH += $$formatted_arch
}

## correctly formats CUDA_DEFINES for nvcc
for(_defines, CUDA_DEFINES):{
    formatted_defines += -D$$_defines
}
CUDA_DEFINES = $$formatted_defines

#nvcc config
CONFIG(debug, debug|release) {
        #Debug settings
        CUDA_OBJECTS_DIR = cudaobj/$$SYSTEM_NAME/Debug
        cuda_d.input = CUDA_SOURCES
        cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
        cuda_d.commands = $$CUDA_DIR/bin/nvcc -D_DEBUG $$CUDA_DEFINES $$NVCC_OPTIONS $$CUDA_INC $$CUDA_LIBS --machine $$SYSTEM_TYPE $$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
        cuda_d.dependency_type = TYPE_C
        QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
        # Release settings
        CUDA_OBJECTS_DIR = cudaobj/$$SYSTEM_NAME/Release
        cuda.input = CUDA_SOURCES
        cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
        cuda.commands = $$CUDA_DIR/bin/nvcc $$CUDA_DEFINES $$NVCC_OPTIONS $$CUDA_INC $$CUDA_LIBS --machine $$SYSTEM_TYPE $$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
        cuda.dependency_type = TYPE_C
        QMAKE_EXTRA_COMPILERS += cuda
}

Note that as nvcc practically mimics gcc, it requires its own set of defines, includes and library paths for the compilation and linking processes in the .pro, and for the most part cannot deal with spaces in file paths and names!

Posted in C++, CUDA, NVCC for CUDA, Qt

CUDA random number generation: Host vs. Device

There are plenty of different methods that allow a programmer to grab a bunch of pseudo-random numbers, each trading off the quality of the randomness against speed. CUDA and the thrust parallel primitives offer a variety of host and device API methods to generate random numbers, and they also give a good insight into how the processing speed compares with the CPU.

So, as more of a technical experiment to start observing the real performance differences, I started running some basic tests on arbitrarily sized data containers using four different methods (the first three use host API calls). I’ll go through each one and show their respective performance differences:

  1. thrust::generate on the host
  2. thrust primitives with counting_iterators and transform on the device
  3. cuRand’s Pseudo Random Generator on the device
  4. Using cuRand’s device API with curandStates to initialize separate CUDA kernels

thrust::generate is a very simple example of using the CPU to quickly fill a container with random numbers and can practically be summarised in a few lines, where _rand is a host function which simply sets the limits on the calculation:


#include <cstdlib>  //rand, RAND_MAX
#include <thrust/host_vector.h>
#include <thrust/generate.h>

__host__ static __inline__ float _rand()
{
    return (0.0 + (float)(rand())/((float)(RAND_MAX/(1.0f - 0.0f))));
}

thrust::host_vector<float> h_vec(100, 0); //creating 100 random numbers
thrust::generate(h_vec.begin(), h_vec.end(), _rand);
//force swap to clear
h_vec.clear();
thrust::host_vector<float>().swap(h_vec);

Pretty simple, the point of showing this is to be able to compare the speed of this method to the other three GPU based implementations.

The second is slightly more complicated, but is, from what I can see, the easiest way of creating a device vector of random numbers on the GPU without using curand. Here, we use a transformation of counting iterators to calculate a value per element of the RNG sequence. Basically each counting iterator is fed into an op which discards the iterator amount from a freshly created thrust::default_random_engine to avoid correlation, and applies a distribution (if needed). I’ve found I need to explicitly set the execution policy with the first parameter thrust::device for the transform, else you tend to get some crazy results.


#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/execution_policy.h>

int _seed = rand();
thrust::device_vector<float> d_data(100); //100 floats
thrust::counting_iterator<unsigned int> index_sequence_begin(_seed);
thrust::transform(thrust::device, index_sequence_begin, index_sequence_begin + (100), d_data.begin(), psrngen(0.0f, 1.0f));

//force swap to clear
d_data.clear();
thrust::device_vector<float>().swap(d_data);

The operator in this case can be defined with our own structure, hence the choice of an applied uniform distribution in this implementation.


#include <thrust/random.h>

struct psrngen
{
    __host__ __device__ psrngen(float _a, float _b) : a(_a), b(_b) {;}

    __host__ __device__ float operator()(const unsigned int n) const
    {
        thrust::default_random_engine rng;
        thrust::uniform_real_distribution<float> dist(a, b);
        rng.discard(n);
        return dist(rng);
    }
    float a, b;

};

This method gives you the ability to easily change the limits and the way the random numbers are created; however, as you will see below, a simple curand implementation will do the same.
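
Swapping the distribution is then just a matter of changing the functor. For instance, a hypothetical normally distributed variant (the struct and member names here are my own, not from the project) could look like this:

#include <thrust/random.h>

//hypothetical variant of psrngen producing normally distributed values
struct nrngen
{
    __host__ __device__ nrngen(float _mean, float _stddev) : mean(_mean), stddev(_stddev) {;}

    __host__ __device__ float operator()(const unsigned int n) const
    {
        thrust::default_random_engine rng;
        thrust::random::normal_distribution<float> dist(mean, stddev);
        rng.discard(n);
        return dist(rng);
    }
    float mean, stddev;
};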

With the third method, just as before, we create a pseudo-random generator (this time using curand), pass it a randomly generated seed and leave it down to the basic wrapped API methods to handle the generation. The function names are pretty self-explanatory, with curandGenerateUniform’s default limits being between 0.0f and 1.0f. checkCudaErrors & CURAND_CALL are basic checks on the status returns.


#include <ctime>
#include <cstdlib>
#include <cuda_runtime.h>
#include <curand.h>

float *deviceData;
curandGenerator_t gen;
srand(time(NULL));
int _seed = rand();
//allocate space for 100 floats on the GPU
//could also do this with thrust vectors and pass a raw pointer
checkCudaErrors(cudaMalloc((void **)&deviceData, sizeof(float) * 100));
CURAND_CALL(curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT));
CURAND_CALL(curandSetPseudoRandomGeneratorSeed(gen, _seed));
CURAND_CALL(curandGenerateUniform(gen, deviceData, 100)); //generate the random numbers
CURAND_CALL(curandDestroyGenerator(gen));
checkCudaErrors(cudaFree(deviceData));

deviceData = NULL;

The last way I’m going to show uses the curand device API inside kernels. Here, we need to allocate space for a curandState per thread as well as the device container for the random results. We can then initialize each thread’s state (the state of the random number generator) by calling curand_init with a given seed and a different subsequence value, creating a different starting position per thread. You also have the ability to generate the same sequence on every thread by giving curand_init the same subsequence and offset values!

It’s then a simple case of launching kernels on the initialized states again (to avoid calling curand_init) and using the wrapper functions to generate either pseudo-random or quasi-random numbers. I’m using curand_uniform to generate floats between 0.0f and 1.0f as before. You can copy the generator state into local memory for fast generation, and store it back to global memory between kernel launches.

srand(time(NULL));

//naively setting the threads per block and block per grid sizes, where 100 is the amount of rngs
int threadsPerBlock = 1024;
int nBlocks = 100/threadsPerBlock + 1;

//allocate space for a curandState per launched thread
//(the init kernel below writes a state for every thread it launches,
// so allocating only 100 here would write out of bounds)
curandState* deviceStates;
checkCudaErrors(cudaMalloc(&deviceStates, nBlocks*threadsPerBlock*sizeof(curandState)));

//call curand_init on each kernel with the same random seed
//and init the rng states
initialise_curand_on_kernels<<<nBlocks, threadsPerBlock>>>(deviceStates, unsigned(time(NULL)));
getLastCudaError("initialise_curand_on_kernels failed");
cudaSyncAndCheck();

//allocate space for the device container of rns
float* d_random_floats;
checkCudaErrors(cudaMalloc((void**) &d_random_floats, sizeof(float)* 100));

//calculate per element of the container a rn
set_random_number_from_kernels<<<nBlocks, threadsPerBlock>>>(d_random_floats, deviceStates, 100);
getLastCudaError("set_random_number_from_kernels failed");
cudaSyncAndCheck();

//cleanup
checkCudaErrors(cudaFree(d_random_floats));
checkCudaErrors(cudaFree(deviceStates));

The init kernel only requires a call to curand_init:


#include <curand_kernel.h>

__global__ void initialise_curand_on_kernels(curandState * state, unsigned long seed)
{
    int idx = blockIdx.x*blockDim.x+threadIdx.x;
    curand_init(seed, idx, 0, &state[idx]);
}

And the generator essentially sets up and calls one of curand’s distributions:


__device__ float generate(curandState* globalState, int ind)
{
    //copy state to local mem
    curandState localState = globalState[ind];
    //apply uniform distribution with calculated random
    float rndval = curand_uniform( &localState );
    //update state
    globalState[ind] = localState;
    //return value
    return rndval;
}

__global__ void set_random_number_from_kernels(float* _ptr, curandState* globalState, const unsigned int _points)
{
    int idx = blockIdx.x*blockDim.x+threadIdx.x;
    //only call gen on the kernels we have inited
    //(one per device container element)
    if (idx < _points)
    {
        _ptr[idx] = generate(globalState, idx);
    }
}

Now the interesting part: how long does each of these methods take for varying amounts of random numbers? The test cases I have run always apply a uniform distribution and generate numbers between 0.0f and 1.0f using the corresponding API methods, for different container sizes (10, 10000, 10000000), and take the average time of 100 calls for each case. Bear in mind that the data in all these GPU examples is being left on the device, and the timings do not account for device->host overhead if it is needed. There is an issue I’m facing for larger amounts using the device API, I think due to the naive way I’m launching the kernels, so I’ve only managed to provide results for n = 10 and 10000 for this method. I’m using the timer code I previously uploaded to monitor these.
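
For reference, each figure was gathered roughly along these lines, reusing the stopclock class from my timing post. This is a sketch only; gen and deviceData are the objects from the curand host API snippet above, and the exact harness differs:

//rough sketch of timing one test case as an average of 100 calls
utils::stopclock *timer = NULL;
utils::createTimer(&timer);

double total = 0.0;
const int runs = 100;
for (int i = 0; i < runs; ++i)
{
    timer->startTimer();
    CURAND_CALL(curandGenerateUniform(gen, deviceData, 10000)); //the case being measured
    checkCudaErrors(cudaDeviceSynchronize()); //make sure the GPU has actually finished
    timer->stopTimer();
    total += timer->getTimeInMilliseconds();
}
std::cerr<<"average ms: "<<total/runs<<std::endl;

utils::removeTimer(&timer);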

Random Number Speed Tests (results chart)

The allocation and processing speed of the CPU will pretty much always be faster with smaller container sizes, bearing in mind this could be parallelised even further (by using OpenMP for example), but the main limitation is that it requires halting the host’s execution. I’m uncertain how thrust deals with the device generation, however for small and decently sized containers it seems to perform really well. cuRand’s host API will almost always give consistent performance dependent on your hardware and generator type; the container size does not change the processing time much due to the way it optimizes the generation on the device, however you can get better performance even then by using the device API. In general, you will get the best performance from cuRand by generating containers of random numbers that are as large as possible. The docs on the cuRand library are really good and the examples are worth looking at for more info.
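
To illustrate the OpenMP point, a host-side fill could be parallelised along these lines. This is a sketch only, using one C++11 engine per thread rather than rand(), which isn’t thread-safe:

#include <omp.h>
#include <random>
#include <vector>

//sketch of a parallel host-side fill: one engine per thread, each seeded differently
void parallelRandomFill(std::vector<float>& values)
{
    #pragma omp parallel
    {
        std::mt19937 engine(1234u + omp_get_thread_num());
        std::uniform_real_distribution<float> dist(0.0f, 1.0f);

        #pragma omp for
        for (int i = 0; i < (int)values.size(); ++i)
        {
            values[i] = dist(engine);
        }
    }
}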

source code

Posted in C++, CUDA, CUDA Random Number Generation

Timing portions of code

Recently I’ve been getting deeper into OpenMP & CUDA parallelism, and it’s been interesting to see the running time of different optimized functions. There have been times in the past where I’ve glossed over speed optimizations for bits of code (mainly due to time constraints – how ironic), going on the basis of “this looks fast, it’s fast enough”. Obviously this is bad. And there are plenty of profilers out there to test the timings of different methods! However for convenience, I use the below to give me a somewhat accurate representation of the speed of different functions.

There doesn’t seem to be a particularly reliable timer on Windows, as QueryPerformanceCounter can return inconsistent results from a multithreaded program running on a multicore computer, and timeGetTime() will only give you millisecond accuracy. However, for single executing threads QueryPerformanceCounter should be fine, and will return double precision milliseconds. This just makes it quick and easy to see what different implementations yield!

#ifndef NULL
#define NULL ((void *)0)
#endif

#ifdef _WIN32
#  define WIN32_LEAN_AND_MEAN
#  define NOMINMAX
#  include <windows.h>
#else
#  include <ctime>
#  include <sys/time.h>
#endif

namespace utils {

class stopclock
{
public:
    stopclock() {}
    virtual ~stopclock() {;}

public:
    virtual void startTimer() = 0;
    virtual void stopTimer() = 0;
    virtual void resetTimer() = 0;
    virtual const double& getTimeInMilliseconds() = 0;
};

#ifdef _WIN32
class winStopClock : public stopclock
{
public:

    winStopClock() {
        unsigned __int64 freq;
        QueryPerformanceFrequency((LARGE_INTEGER*)&freq);
        timerFrequency = (1000.0/freq); //ticks to milliseconds
        active = false;
        endTime = startTime = 0;
    }
    ~winStopClock() {;}

    inline void startTimer() { QueryPerformanceCounter((LARGE_INTEGER *)&startTime);
                               endTime = 0;
                               active = true; }
    inline void stopTimer() {
                                if (active)
                                {
                                    QueryPerformanceCounter((LARGE_INTEGER *)&endTime);
                                    timeDifferenceInMilliseconds = ((endTime-startTime) * timerFrequency);
                                    active = false;
                                }
                            }
    inline void resetTimer() { QueryPerformanceCounter((LARGE_INTEGER *) &startTime);
                               endTime = 0;
                               active = true; }
    inline const double& getTimeInMilliseconds() { if (active) stopTimer(); active = true;
                                                   return timeDifferenceInMilliseconds; }

private:
    unsigned __int64 startTime, endTime;
    double timerFrequency, timeDifferenceInMilliseconds;
    bool active;
};

#else

class linuxStopClock : public stopclock
{
public:

    linuxStopClock() {
        active = false;
        endTime = startTime = 0;
    }
    ~linuxStopClock() {;}

    inline void startTimer() { gettimeofday(&tim, NULL);
                               startTime = tim.tv_sec*1000.0 + tim.tv_usec/1000.0; //milliseconds
                               endTime = 0;
                               active = true; }
    inline void stopTimer() {
                                if (active)
                                {
                                    gettimeofday(&tim, NULL);
                                    endTime = tim.tv_sec*1000.0 + tim.tv_usec/1000.0; //milliseconds
                                    active = false;
                                }
                            }
    inline void resetTimer() { gettimeofday(&tim, NULL);
                               startTime = tim.tv_sec*1000.0 + tim.tv_usec/1000.0;
                               endTime = 0;
                               active = true; }
    inline const double& getTimeInMilliseconds() { if (active) stopTimer(); active = true;
                                                   tref = endTime-startTime;
                                                   return tref; }

private:
    struct timeval tim;
    double startTime, endTime ,tref;
    bool active;
};

#endif

inline bool createTimer(stopclock **_timer)
{
#ifdef _WIN32
    *_timer = (stopclock *)new winStopClock();
#else
    *_timer = (stopclock *)new linuxStopClock();
#endif
    return (*_timer != NULL) ? true : false;
}

inline bool removeTimer(stopclock **_timer)
{
    if (*_timer)
    {
        delete *_timer;
        *_timer = NULL;
    }
    return true;
}
}

And use like so:

#include <iostream>

utils::stopclock *s;
utils::createTimer(&s);
s->startTimer();

Sleep(1000); //Windows-only sleep, just something to time
//do_something();
s->stopTimer();

std::cerr<<s->getTimeInMilliseconds()<<std::endl;
utils::removeTimer(&s);
Posted in C++, Timing portions of code, Uncategorized Code

OpenVDB standalone GUI with OpenGL

This page is dedicated to a project I undertook, set by MPC, which focuses on integrating OpenVDB support into my own Windows-based application capable of drawing OpenVDB data in OpenGL. The app also presents a variety of memory preferences and data controls to help understand the information in a VDB file. The result is a fairly concise UI allowing for different representations of the data via voxel trees. Check out the video below which demos some of the key features I have working. I plan to keep improving and updating the app when I can!

Key Feature Set

  • Multiple data representations for scalar and vector grids, including Voxel Tree breakdowns, Mesh topology creation and scalar/vector attribute visualization
  • Cycling through independent and time varying attributes per file, known as grids or channels
  • Fast interactive clipbox selection of point regions with large data sets
  • Grid & channel information feedback via UI and document generation
  • Scalar and vector colour ramping using GLSL HSV interpolation
  • GLSL/CUDA scalar and element or magnitude vector filtering. Ability to cumulatively apply scalar filters.
  • Fast user-input expression evaluation on large data sets using muParser (http://muparser.beltoforion.de/), built with OpenMP support (see the sketch after this list)
  • GL buffer management, updates and data feedback with CUDA enabled, such as specific numbers of selected or filtered points
  • Multiple data management preferences in terms of grid storage and visualization
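
To give a rough idea of what that expression evaluation looks like under the hood, here is a sketch only; the function, variable names and per-thread parser instances are my own simplifications rather than the app’s actual code:

#include <string>
#include <vector>
#include <omp.h>
#include "muParser.h"

//sketch: evaluate a user expression over a large array of point values,
//giving each OpenMP thread its own parser (mu::Parser shouldn't be shared
//across threads while variables are bound to it)
void evaluateExpression(const std::string& expression, std::vector<double>& values)
{
    #pragma omp parallel
    {
        double v = 0.0;
        mu::Parser parser;
        parser.DefineVar("v", &v);  //bind the per-point variable
        parser.SetExpr(expression); //e.g. "v * 2 + 1"

        #pragma omp for
        for (int i = 0; i < (int)values.size(); ++i)
        {
            v = values[i];
            values[i] = parser.Eval();
        }
    }
}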

The project can currently be compiled on Windows using VC++10/11 for MSVC2010/2012 respectively (check out my blog post here for some fun OpenVDB building for Windows), or on Linux with gcc and clang, with the CUDA and muParser functionality being optional. I plan to make the code available as soon as I’ve tidied it up a bit and perhaps built a VS project for support, as it has been created within the Qt IDE using Qt’s windowing API.

[Current Version V0.46]

Changes:

  • Optional CUDA implementation of kernel filtering with thrust parallel primitives. This allows for cumulative total feedback from clipping, filtering and expressions
  • Built in muParser (http://muparser.beltoforion.de/) expression evaluation. Although evaluation is processed on the Host, fast memory and visualization updates can be achieved with CUDA
  • Improvements to clipbox interactive selection
  • Updates to preferences and menu
  • 64bit improvements
  • Bug fixes
Posted in C++, OpenGL, OpenVDB, OpenVDB GUI, Project, Project: OpenVDB GUI

Compiling OpenVDB & the OpenVDB Viewer on Windows 7

Hello all! I thought I’d post the notes I made a while back which helped me set up OpenVDB on Windows-based systems using VS, in case anyone finds them useful.

Step by step, compiling openVDB and the openVDB Viewer v2.1 on Windows 7 under x86/x64 for MT/MTd using VC++10/11 for VS2010/2012. There’s not much that differs from Joe Kider’s post for openVDB 0.96.0 here https://groups.google.com/forum/#!topic/openvdb-forum/ZTEbIAk9kdc. However, I’ve quickly expanded below on the setup of the prerequisites for anyone that needs it.

Prerequisites for core OpenVDB

We’ll be building all these libraries from source using the pre-set project folders for VS included with the downloads. You’ll need the following for the main source. Keep in mind we’re building for static linking:

  • Boost, Download latest libs http://www.boost.org/users/download/
  • zlib (and the zlib obj files)
  • OpenEXR’s Half class from IlmBase V1.0.3. You can opt to only download this or build the complete OpenEXR lib if you don’t already have it
  • TBB

- 1a) Download the zlib source (tested with v1.2.5) and the zlib assembly code for 32 or 64bit http://www.winimage.com/zLibDll/index.html if you don’t have MASM.

- 1b) Copy the folder called contrib extracted from the assembly source. Paste this into the extracted zlib source zlib-1.2.5 and confirm all merge prompts with the existing folders. Alternatively, manually copy the obj files: navigate to the extracted folder zlib124_masm_obj\contrib\masmx86 for 32 bit or masmx64 for 64 bit, copy both obj files, and paste them into the zlib source under zlib-1.2.5\contrib\masmx86 or masmx64.

- 1c) Go to the VS project folder zlib-1.2.5\contrib\vstudio\vc10. Open up the solution. The project you are after is the static lib (zlibstat); build it with your required architecture and config (release/debug).

- 2a) If you only wish to build the Half class which is needed as a minimum, download IlmBase from OpenEXRs source (Tested V1.0.3) here http://www.openexr.com/downloads.html

- 2b) Navigate to the extracted folder\vc\vc9 and open the solution. Follow the prompts to convert the project (select no when asked to create a backup)

- 2c) The post-build events for these projects assume you’re building the complete OpenEXR package and will create the deploy folder two directories up from your extracted location. You could just remove this step, though the folder structure it creates makes the include/lib paths much easier later on. If you do wish to remove it, right-click on each project in turn, go to Properties > Build Events > Post-Build Event and remove the install command (install<projName>.cmd $(IntDir)).

- 2d) Build the Half project with your required architecture and config. If building in debug you may have to init the build twice for symbols.

- 3a) Download TBB source (Tested with V4.2 u2 – tbb42_20131118oss) https://www.threadingbuildingblocks.org/download

- 3b) Navigate to the extracted folder\build\vs2010, open the solution and build with your required arch. NOTE: I seem to get some redefinition errors for malloc and free later on, even after switching the code generation to /MT and removing the command line override, so for now I just force multiple definitions at link time or remove the default lib msvcrt.

Prerequisites for OpenVDB Viewer

For the viewer you’ll need glew and GLFW (glu as well, but this should be standard). If you’re not actually looking to include the viewer in the static lib, just skip these.

  • glew
  • GLFW

- 4a) If somehow you don’t already have glew, download the source from http://glew.sourceforge.net/

- 4b) Navigate to build\vc10, open the solution and build the static and shared lib projects.

- 5a) OpenVDB uses an older version of GLFW; as far as I know any version 2.x will work, and it is much easier to include than to try and update the code to 3.x. Download from https://github.com/glfw/glfw-legacy

- 5b) Navigate to the extracted folder\support\msvc100, open the solution and build with your required architecture and config

Building

Now for building the libs. You can choose to build the viewer and main source separately if you really want and just force the linker to link to both (as we are going to have to do this anyway) but it’s nicer not to. For JUST openVDB it follows the same steps as posted here https://groups.google.com/forum/#!topic/openvdb-forum/ZTEbIAk9kdc so I’ll only touch on this briefly.

- 1a) Open a new instance of VS2010/12 and select a new project. Select a Win32 Console App (as it becomes easier to tweak), deselect ‘Create directory for solution’ and give it a name. Click Ok, then Next.

- 1b) For the settings, deselect precompiled header and select Static Library. Finish

- 2a) You’ll want to add the source and header files from io, math, metadata, tools, tree, util and the root openvdb folder

- 2b) add #define NOMINMAX in Coord.h and add #define ZLIB_WINAPI in Compression.cc

- 2c) If you have only compiled the required Half class from IlmBase, change the include path in Types.h from <OpenEXR/half.h> to <half.h> and include location of the file in step 3b.

- 3a) Go to the properties of the project and navigate to C/C++ > Code Generation and select MTd if building in Debug or MT for release.

- 3b) Navigate to C/C++ in the properties window. In the Additional Include Directories add paths to your downloaded prerequisite include locations; boost, zlib.h, half.h, TBB, openVDB

- 4) Build the solution with your required architecture and config, this should create the static library for OpenVDB.

You can test your build with the code examples here http://www.openvdb.org/documentation/codeExamples.html.
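
For a quick sanity check that the lib links correctly, something along the lines of the snippet below should build against it (a minimal sketch based on those examples):

#include <iostream>
#include <openvdb/openvdb.h>

int main()
{
    openvdb::initialize();

    //create an empty float grid and write a single voxel
    openvdb::FloatGrid::Ptr grid = openvdb::FloatGrid::create(/*background=*/0.0f);
    grid->setName("test");
    openvdb::FloatGrid::Accessor accessor = grid->getAccessor();
    accessor.setValue(openvdb::Coord(0, 0, 0), 1.0f);

    std::cout << "voxel value: " << accessor.getValue(openvdb::Coord(0, 0, 0)) << std::endl;
    return 0;
}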

Now for the viewer. I spent a while trying to get it compiled on both 32 and 64 bit systems, but it seems all the type problems related to RenderModules have been fixed in the latest release (v2.1), making things much easier. Please post if there are any errors for either; I’ve tested a fair amount with the new version and everything seems to work nicely.

For a joint library follow all above steps for the main source up to 4 and add the viewer files into the project. If you want a separate library, open a new instance of VS2010/12, follow the steps above but only add the files in the viewer folder (skip above 2a and 4).

-1a) Remove all #include <GL/gl.h> and <GL/glu.h> found in ClipBox.h, Font.h, RenderModules.h, Camera.cc and Viewer.cc and replace with #include <GL/glew.h>

- 1b) There are two places glew needs to be initialized (due to the threading?): one is when the viewer gets initialized in Viewer.cc, the other is when the shaders are first compiled in RenderModules’ setShader(), although I just opted to add it to the constructor of the shader program. Add glewInit(); to line 180 in RenderModules.cc and to line 165 in Viewer.cc where init is called.

Viewer
init(const std::string& progName, bool verbose)
{
   glewInit();
   …
}

 

ShaderProgram::ShaderProgram():
   mProgram(0),
   mVertShader(0),
   mFragShader(0)
{
   glewInit();
}

- 1c) In the Additional Include Directories add paths to your downloaded prerequisite include locations; boost, zlib.h, half.h, TBB, OpenVDB, as well as glew and GLFW.

-1d) Build for your required arch. If you get an error about obj file size (C1128), try adding /bigobj to the cmd line options under C/C++

Testing

- 1a) Open a new instance of VS2010/12 and new project. Select a Win32 Console App. For the settings, deselect precompiled header and select Empty Project. Keep it selected on a console application.

- 1b) add the main.cc file for initializing the viewer found in openvdb\cmd\openvdb_view.

- 1c) C/C++ compiler options: Add include paths to a minimum of TBB, Half, openVDB and boost. Change code generation to /MTd for debug, /MT for release

- 1d) Linker options: Add the following libraries’ locations under Linker > General > Additional Library Directories. Also add the libraries themselves under Linker > Input > Additional Dependencies in this order (this is an example of linking to the debug libs):

glew32d.lib
glu32.lib
opengl32.lib
GLFW.lib
zlibstat.lib
Half.lib
At the end include the viewer library and, if you compiled separately, then the main lib.

-1e) Add /FORCE:MULTIPLE to the command line linker options under Linker > Command Line.

-1f) Build the solution with your required architecture and config creating the viewer executable.

-1g) If they aren’t in your environment path, you’ll want to copy the glew, TBB and Half shared dlls (found in their respective bin directories) to the executable directory.

- 1h) Download one of the example .vdb files from the openVDB site (one of the smaller ones to test) and add the path location to Properties > Debugging > Command Arguments of the solution. You should be able to run and use the viewer! For 32bit compilers you will get an assertion failure for a subscript range but you can ignore this.

Posted in Compiling OpenVDB & the OpenVDB Viewer on Windows 7, OpenVDB
whoami

I'm a 3rd year University student studying Graphics Software Development for animation, games and effects in the UK. This blog is a combination of bits and pieces I've found useful, interesting or just generally cool whilst programming (and wanted to share!) as well as covering the main projects I've undertaken. As I've only just started this blog, I'm currently trying to post and update projects I've already begun on here whilst working over the next few months. But in the meantime feel free to leave a comment on any of my posts, or if you have any questions send me an e-mail - Nick

nick.avramoussis@gmail.com
