trinity

A library for anisotropic mesh adaptation on manycore machines.


Project maintained by hobywan Hosted on GitHub Pages — Theme by mattgraham

principle

trinity is a C++ library and command-line tool for anisotropic mesh adaptation.
It targets non-uniform memory access (NUMA) multicore and manycore processors.
It was primarily designed for performance, and hence for HPC applications.
It is intended to be used within a numerical simulation loop.

adaptive-loop




Build and use

Building the library

trinity is completely standalone.
It can be built on Linux or macOS using CMake.
It only requires a C++14 compiler with OpenMP support.
It can optionally build medit to render meshes.
It supports hwloc to retrieve and print more information on the host machine.

mkdir build                                        # out-of-source build
cd build                                           #
cmake ..                                           # see build options
make -j4                                           # use multiple jobs
make install                                       # optional

| Option | Description | Default |
|---|---|---|
| Build_Medit | Build medit mesh renderer | ON |
| Build_GTest | Build googletest for future unit tests | OFF |
| Build_Main | Build the command-line tool | ON |
| Build_Examples | Build built-in examples | ON |
| Use_Deferred | Use deferred topology updates scheme in pragmatic | OFF |

Linking to your project

trinity is exported as a package.
To use it in your project, update your CMakeLists.txt with:

find_package(trinity)                           # for build|install trees
target_link_libraries(target PRIVATE trinity)   # replace 'target'

Then include trinity.h in your application.
Please take a look at the examples folder for basic usage.

Use the tool

The list of command arguments is given by the -h option.

host:~$ bin/trinity -h
Usage: trinity [options]

Options:
  -h, --help            show this help message and exit
  -m CHOICE             select mode [release|benchmark|debug]
  -a CHOICE             cpu architecture [skl|knl|kbl]
  -i STRING             initial mesh file
  -o STRING             result mesh file
  -s STRING             solution field .bb file
  -c INT                number of threads
  -b INT                vertex bucket capacity [64-256]
  -t FLOAT              target resolution factor [0.5-1.0]
  -p INT                metric field L^p norm [0-4]
  -r INT                remeshing rounds [1-5]
  -d INT                max refinement/smoothing depth [1-3]
  -v INT                verbosity level [0-2]
  -P CHOICE             enable papi [cache|cycles|tlb|branch]

For now, only the .mesh file format used by medit is supported.

Setting thread-core affinity

For performance reasons, I recommend explicitly setting thread-core affinity before any run.
Indeed, threads should be statically bound to cores to prevent the OS from migrating them.
Besides, simultaneous multithreading (or hyperthreading on Intel) should be:

Affinity can be set through environment variables:

export OMP_PLACES=[cores|threads] OMP_PROC_BIND=close  # with GNU or clang/LLVM
export KMP_AFFINITY=granularity=[core|fine],compact    # with Intel compiler  
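
To check that the settings above were picked up, you can query the OpenMP runtime from a small test program. The sketch below is not part of trinity: `binding_policy` and `threads_are_bound` are hypothetical helper names, and only the standard `omp_get_proc_bind` call is used. It reflects `OMP_PROC_BIND`, and a `KMP_AFFINITY` setting may not be reported through this call.

```cpp
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Name of the binding policy the OpenMP runtime will apply to the next
// parallel region, as set by OMP_PROC_BIND (hypothetical helper).
std::string binding_policy() {
#ifdef _OPENMP
  switch (omp_get_proc_bind()) {
    case omp_proc_bind_false:  return "false";   // threads may migrate
    case omp_proc_bind_true:   return "true";
    case omp_proc_bind_close:  return "close";
    case omp_proc_bind_spread: return "spread";
    default:                   return "other";
  }
#else
  return "openmp-disabled";  // compiled without OpenMP support
#endif
}

// True when the runtime reports that threads are pinned (hypothetical helper).
bool threads_are_bound() {
  const std::string policy = binding_policy();
  return policy != "false" && policy != "openmp-disabled";
}
```

With `OMP_PROC_BIND=close` exported and OpenMP enabled at compile time, `binding_policy()` should return "close".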

Features

Overview

trinity aims to reduce and equidistribute the interpolation error of a computed physical field u on a triangulated
planar domain M by adapting its discretization with respect to a target number of points n.
Basically, it takes (u, M, n) and outputs a mesh adapted to the variation of the gradient of u on M using n points.
It uses metric tensors to encode the desired point distribution with respect to the estimated error.
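
To make the role of a metric tensor concrete, here is a minimal sketch of how an edge is measured in a metric, using the usual definition l_M(e) = sqrt(e^T M e). The types `Metric2` and `Point2` and the function `metric_length` are illustrative names, not trinity's API, and the metric is assumed constant along the edge.

```cpp
#include <cassert>
#include <cmath>

// Symmetric positive-definite 2x2 metric tensor [[a, b], [b, c]].
struct Metric2 {
  double a, b, c;
};

struct Point2 {
  double x, y;
};

// Length of the edge (p, q) measured in a constant metric m:
// l_M(e) = sqrt(e^T M e), with e = q - p.
double metric_length(const Point2& p, const Point2& q, const Metric2& m) {
  assert(m.a > 0.0 && m.a * m.c - m.b * m.b > 0.0);  // m must be SPD
  const double ex = q.x - p.x;
  const double ey = q.y - p.y;
  return std::sqrt(m.a * ex * ex + 2.0 * m.b * ex * ey + m.c * ey * ey);
}
```

An edge has the desired size when its metric length is close to 1. For instance, the diagonal metric (100, 0, 4) asks for edges of length 0.1 along x and 0.5 along y, which is how a single tensor expresses a direction-dependent size.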

principle

It resamples and regularizes a planar triangular mesh M.
It aims to reduce and equidistribute the error of a solution field u on M using n points.
For that, it uses five kernels:

Error estimate

trinity uses metric tensors to link the error of u to the distribution of mesh points.
A tensor encodes the desired length of the edges incident to a point, which may be direction-dependent.
trinity lets you tune the sensitivity of the error estimate to the needs of the simulation.
For that, it provides a multi-scale estimate in L^p norm:

It actually implements the continuous metric defined in:

📄 Frédéric Alauzet, Adrien Loseille, Alain Dervieux and Pascal Frey (2006).
“Multi-Dimensional Continuous Metric for Mesh Adaptation”.
In Proceedings of the 15th International Meshing Roundtable, pp. 191-214, Springer Berlin.
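
For reference, the L^p continuous metric is commonly written as below in the metric-based adaptation literature. This is the standard form as I recall it, not a transcription of trinity's source, so the normalization may differ. Here d is the dimension (d = 2 for planar meshes), H_u is the Hessian of u, |H_u| is its absolute value, and N is the target complexity, which plays the role of the target number of points n.

```latex
\mathcal{M}_{L^p}(x) = D_{L^p}\,\bigl(\det|H_u(x)|\bigr)^{-\frac{1}{2p+d}}\,|H_u(x)|,
\qquad
D_{L^p} = N^{\frac{2}{d}}
\left( \int_{\Omega} \bigl(\det|H_u(x)|\bigr)^{\frac{p}{2p+d}} \,\mathrm{d}x \right)^{-\frac{2}{d}}
```

The local factor rescales the Hessian according to the chosen norm, and the global factor D_{L^p} sets the overall mesh density so that the target complexity is met.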

To obtain a good mesh, it needs an accurate metric tensor field.
The latter relies on the computation of the variation of the gradient of u,
which is given by the local Hessian matrices of u.
These are computed in trinity through an L^2 projection.
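
A recovered Hessian is symmetric but not necessarily positive definite, so metric-based methods usually take its absolute value |H| = R |Λ| R^T before using it as a metric. The sketch below shows that step for a 2x2 matrix. It is an illustration under my own naming (`Sym2`, `abs_hessian`, and the `eps` floor are assumptions), not trinity's implementation, and it does not cover the L^2 projection itself.

```cpp
#include <cassert>
#include <cmath>

// Symmetric 2x2 matrix [[a, b], [b, c]].
struct Sym2 {
  double a, b, c;
};

// Absolute value of a symmetric 2x2 matrix: |H| = R |L| R^T, where L holds
// the eigenvalues of H and R its eigenvectors. Eigenvalues are floored at
// eps so that the result stays positive definite.
Sym2 abs_hessian(const Sym2& h, double eps = 1e-12) {
  assert(eps > 0.0);
  // Eigenvalues of a symmetric 2x2 matrix: mean +/- radius.
  const double mean = 0.5 * (h.a + h.c);
  const double diff = 0.5 * (h.a - h.c);
  const double radius = std::hypot(diff, h.b);
  const double l1 = std::fmax(std::fabs(mean + radius), eps);
  const double l2 = std::fmax(std::fabs(mean - radius), eps);
  // Angle of the eigenvector associated with (mean + radius).
  const double theta = 0.5 * std::atan2(h.b, diff);
  const double cs = std::cos(theta);
  const double sn = std::sin(theta);
  // |H| = l1 * v1 v1^T + l2 * v2 v2^T, with v1 = (cs, sn), v2 = (-sn, cs).
  return {l1 * cs * cs + l2 * sn * sn,
          (l1 - l2) * cs * sn,
          l1 * sn * sn + l2 * cs * cs};
}
```

For example, the indefinite matrix (2, 0, -3) becomes (2, 0, 3): the principal directions are kept and only the signs of the curvatures are dropped.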

multiscale_meshes.png

Fine-grained parallelism

trinity enables intra-node parallelism by multithreading.
It relies on a fork-join model through OpenMP.
All kernels are structured into synchronous stages.
A stage consists of local computation, a reduction in shared memory, and a barrier.
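
The shape of such a stage can be sketched with plain OpenMP. The example below is a generic illustration, not code taken from trinity: `run_stage`, its arguments and the flagging criterion are made up for the purpose. Each thread works on its own chunk, partial counts are combined by a reduction, and the implicit barrier closing the loop guarantees that every thread sees the complete result before the next stage starts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One synchronous stage: local computation, shared-memory reduction, barrier.
// Flags every item whose quality is below the threshold and returns how many
// items were flagged (hypothetical kernel, for illustration only).
std::size_t run_stage(const std::vector<double>& quality, double threshold,
                      std::vector<char>& flagged) {
  assert(flagged.size() == quality.size());
  const long n = static_cast<long>(quality.size());
  long count = 0;

  #pragma omp parallel
  {
    // Local computation on a static chunk per thread; the per-thread partial
    // counts are combined into 'count' by the reduction clause.
    #pragma omp for schedule(static) reduction(+ : count)
    for (long i = 0; i < n; ++i) {
      flagged[i] = quality[i] < threshold ? 1 : 0;
      count += flagged[i];
    }
    // Implicit barrier here: 'flagged' and 'count' are complete for all
    // threads before any of them moves on to the next stage.
  }
  return static_cast<std::size_t>(count);
}
```

The flags are stored in a `std::vector<char>` rather than a `std::vector<bool>` so that neighbouring entries can be written by different threads without a data race.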

algo_structure

Unlike coarse-grained parallel remeshers, it does not rely on domain partitioning.
Nor does it rely on task parallelism or on runtime systems such as Cilk, TBB or StarPU.

In fact, manycore machines have plenty of slow cores with small caches.
To scale up, one needs many fine-grained, local tasks to keep them busy.
In trinity, remeshing kernels are expressed as a graph, except for refinement.
Runnable tasks are then extracted using multithreaded heuristics:

graph_matching_inverted.png
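
To give an idea of what extracting runnable tasks from a graph means, here is a serial greedy sketch that picks a maximal set of mutually non-conflicting tasks from a conflict graph. It is a textbook baseline under my own assumptions (a symmetric adjacency list, the name `greedy_independent_set`), not the multithreaded heuristics used in trinity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// adjacency[i] lists the tasks that conflict with task i (assumed symmetric).
// Returns a maximal set of tasks with no conflict between any two of them,
// so that they can all be processed concurrently.
std::vector<int> greedy_independent_set(
    const std::vector<std::vector<int>>& adjacency) {
  const std::size_t n = adjacency.size();
  std::vector<char> blocked(n, 0);
  std::vector<int> runnable;
  for (std::size_t i = 0; i < n; ++i) {
    if (blocked[i]) continue;  // conflicts with a task already selected
    runnable.push_back(static_cast<int>(i));
    for (int j : adjacency[i]) {
      assert(j >= 0 && static_cast<std::size_t>(j) < n);
      blocked[j] = 1;          // neighbours must wait for a later round
    }
  }
  return runnable;
}
```

On a chain of four tasks where each one conflicts with the next, tasks 0 and 2 are selected for the current round and tasks 1 and 3 are left for a later one.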

trinity fixes incidence data only at the end of a round of any kernel.
It uses an explicit synchronization scheme to do so.
This scheme relies on low-level atomic primitives.
It was designed to minimize data movement penalties, especially on NUMA machines.
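
As an illustration of how atomic primitives can replace locks when incidence lists are rebuilt, the sketch below lets threads reserve slots in fixed-capacity per-vertex buckets with an atomic counter. The layout and the names (`IncidenceBuckets`, `collect_incidences`) are assumptions made for the example and may differ from what trinity actually does.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity list of incident elements per vertex (hypothetical layout).
struct IncidenceBuckets {
  int capacity;
  std::vector<int> slots;                 // nb_vertices * capacity entries
  std::vector<std::atomic<int>> cursor;   // next free slot of each vertex

  IncidenceBuckets(int nb_vertices, int cap)
      : capacity(cap),
        slots(static_cast<std::size_t>(nb_vertices) * cap, -1),
        cursor(nb_vertices) {
    for (auto& c : cursor) c.store(0, std::memory_order_relaxed);
  }

  // Thread-safe: the atomic increment gives each caller a distinct slot.
  bool add(int vertex, int elem) {
    const int k = cursor[vertex].fetch_add(1, std::memory_order_relaxed);
    if (k >= capacity) return false;      // bucket full, element dropped
    slots[static_cast<std::size_t>(vertex) * capacity + k] = elem;
    return true;
  }

  int size(int vertex) const {
    return std::min(cursor[vertex].load(std::memory_order_relaxed), capacity);
  }
};

// Register every triangle in the buckets of its three vertices.
// The buckets must only be read after the parallel loop has completed.
void collect_incidences(const std::vector<int>& triangles,
                        IncidenceBuckets& buckets) {
  assert(triangles.size() % 3 == 0);
  const long nb_elems = static_cast<long>(triangles.size() / 3);
  #pragma omp parallel for schedule(static)
  for (long e = 0; e < nb_elems; ++e)
    for (int j = 0; j < 3; ++j)
      buckets.add(triangles[3 * e + j], static_cast<int>(e));
}
```

The order of the entries inside a bucket depends on thread scheduling, so only the content of a bucket should be relied on, not its order.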

For further details, please take a look at:

📄 Hoby Rakotoarivelo, Franck Ledoux, Franck Pommereau and Nicolas Le-Goff (2017).
“Scalable fine-grained metric-based remeshing algorithm for manycore/NUMA architectures”.
In Proceedings of the 23rd International European Conference on Parallel and Distributed Computing, Springer.


Benchmark

Profiling

trinity is natively instrumented.
It prints runtime stats at one of three verbosity levels.
Here is an output example with the medium level.

screenshot

Stats are exported as tab-separated values and can be easily plotted with gnuplot or matplotlib.
You can use wrappi to profile on-core events such as cycles, cache misses and branch predictions.

Deployment on a cluster

Preparing a benchmark campaign can be tedious 😩.
I included some Python scripts to help set it up on a node. They let you:

They are somewhat outdated, so adapt them to your needs.


license

trinity is free and intended for research purposes.
It was written during my doctorate, so improvements are welcome.
To get involved, you can:

Enjoy! 😊
