I am a post-doctoral scholar in the Department of Statistics at the University of California, Berkeley, and in the Big Data group at ICSI, led by Michael Mahoney. I studied Mathematics and Mechanical Engineering, and I was fortunate to work with Shawn Shadden as my advisor.

My research includes

  • Riemannian and symplectic geometry and their applications to the mechanics of continua
  • Dynamical systems, Lie group integrators, and applications to fluid mechanics
  • Uncertainty quantification and stochastic modeling
  • Linear algebra, including theoretical and computational developments with applications to machine learning

My research in Riemannian geometry and dynamical systems studies the nonlinear deformation of continua (such as fluids) in differential geometric settings (see my dissertation), which has applications in the Lagrangian coherent structures of flows on manifolds (see Gallery and Videos).

I have also worked on the spectral representation of solenoidal vector fields from a functional analysis point of view (see my thesis and Gallery), which has applications in physics-based supervised learning and the reconstruction of incompressible flows. My current research focuses on mathematical aspects of machine learning (Gaussian processes in particular), linear algebra for massive data, and their high-performance implementation.

Alongside my research, I have developed several industry-level software packages on a variety of platforms. I am the author of several Python packages that can be found on PyPI and Anaconda Cloud (read more at Software), Node.js apps on npm, and Docker images on Docker Hub. To bring the applications of my dissertation to the community, I developed high-performance computing web servers that I host at the Berkeley Data Center. These online computational gateways are actively used by several research institutions worldwide (read more at Web Servers).

Education

Publications

Invited Talks

Conferences






Visit imate Homepage.

imate, short for Implicit Matrix Trace Estimator, is a modular and high-performance C++/CUDA library distributed as a Python package that provides scalable randomized algorithms for the computationally expensive matrix functions in machine learning.

imate is scalable to very large matrices and is capable of petascale computations on GPU farms. Its core library for basic linear algebraic operations is faster than OpenBLAS, and its pseudo-random generator is a hundred-fold faster than the implementation in the standard C++ library.

The core of imate, which is implemented in C++ and the NVIDIA CUDA framework, is a standalone modular library for high-performance, low-level algebraic operations on linear operators (including matrices and affine matrix functions). This library provides a unified interface for computations on both CPU and GPU, a unified interface for dense and sparse matrices, a unified container for various data types, and fully automatic, on-demand memory management and data transfer between CPU and GPU devices.
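As a rough illustration of the kind of randomized algorithm involved in implicit trace estimation, the sketch below uses the classical Hutchinson estimator to approximate the trace of a matrix inverse using only matrix-vector products. It is written in plain NumPy and is not imate's implementation or API; the function name `hutchinson_trace`, the test matrix, and the sample count are assumptions made for this example.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_samples=500, seed=0):
    """Estimate trace(M) using only products z -> M @ z (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        # Rademacher probe vector with independent +1/-1 entries
        z = rng.choice([-1.0, 1.0], size=n)
        # E[z^T M z] = trace(M), so the sample average estimates the trace
        total += z @ matvec(z)
    return total / num_samples

# Assumed test problem: a symmetric positive-definite matrix A
n = 200
rng = np.random.default_rng(1)
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)

# Apply A^{-1} implicitly through linear solves; the inverse is never formed
estimate = hutchinson_trace(lambda z: np.linalg.solve(A, z), n)

# Dense reference value, affordable only because this example is small
exact = np.trace(np.linalg.inv(A))
rel_error = abs(estimate - exact) / exact
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}, relative error = {rel_error:.2e}")
```

The point of the design is that the matrix function is touched only through `matvec`, so the same idea applies when the operator is too large to invert or even to store densely.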

imate is distributed on PyPI, Anaconda Cloud, and Docker Hub for multiple platforms and for several Python and PyPy versions.


Visit G-learn Homepage.

G-learn is a modular and high-performance Python package for machine learning using Gaussian process regression.

This package uses imate as its computational engine, so it is scalable to very large matrices and is capable of petascale computation on multi-GPU clusters.

Features of this package include support for mixed covariance models, automatic relevance determination (ARD), Jacobian- and Hessian-based optimization, learning hyperparameters in a reduced space (profile likelihood) as described in this paper, and successive evaluations of the prediction with \(\mathcal{O}(n)\) complexity.
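To make the \(\mathcal{O}(n)\) prediction cost concrete, here is a minimal sketch of standard Gaussian process regression in NumPy. It is a generic textbook construction, not G-learn's API: the squared-exponential kernel, the names `rbf_kernel` and `predict_mean`, and all parameter values are assumptions chosen for illustration. After a one-time solve for the weight vector `alpha`, each new posterior-mean evaluation reduces to a single dot product over the \(n\) training points.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.2):
    """Squared-exponential covariance between two sets of 1D points (assumed kernel)."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

# Assumed synthetic training data: a noisy sine curve
rng = np.random.default_rng(0)
n = 50
x_train = np.sort(rng.uniform(0.0, 1.0, n))
y_train = np.sin(2.0 * np.pi * x_train) + 0.05 * rng.standard_normal(n)

# Covariance of the noisy observations
noise_var = 0.05**2
K = rbf_kernel(x_train, x_train) + noise_var * np.eye(n)

# One-time cost: solve K alpha = y (cubic in n for a dense solve)
alpha = np.linalg.solve(K, y_train)

def predict_mean(x_new):
    """Posterior mean at new points; O(n) work per point once alpha is known."""
    k_star = rbf_kernel(np.atleast_1d(x_new), x_train)  # shape (m, n)
    return k_star @ alpha

x_test = np.array([0.25, 0.5, 0.75])
mean_fast = predict_mean(x_test)

# Reference that recomputes everything from scratch, for comparison only
mean_ref = rbf_kernel(x_test, x_train) @ np.linalg.inv(K) @ y_train
print(mean_fast)
```

Caching `alpha` covers only the posterior mean; evaluating the predictive variance requires additional work per test point.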

G-learn is distributed on PyPI, Anaconda Cloud, and Docker Hub for multiple platforms and Python versions.


Visit Trace Homepage.

Trace (Trajectory Reconstruction and Analysis for Coherent Structure Evaluation) is a high-performance online gateway for Lagrangian analysis of oceanographic data.

Trace has a server-client architecture. On the front end, it handles user requests to receive input data, configure computation settings, and visualize the results on the fly on an interactive globe map. On the back end, it connects to remote or local data servers and performs parallel computation on massive data on an HPC cluster.

Trace employs several novel algorithms, for instance, the Lagrangian analysis of flows on manifolds (here, the curved surface of the Earth) using a Riemannian geometric framework and Lie group integrators (see my Dissertation).
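One common diagnostic in this kind of Lagrangian analysis is the finite-time Lyapunov exponent (FTLE). The sketch below computes it on a flat plane for a simple linear saddle flow, where the exact value is 1 everywhere. It is only an illustration under stated assumptions: it uses a fixed-step RK4 integrator and finite differences in Euclidean coordinates, not the Riemannian framework or Lie group integrators that Trace uses, and the names `velocity`, `flow_map`, and `ftle_field` are hypothetical.

```python
import numpy as np

def velocity(pos):
    """Assumed test field, a linear saddle: dx/dt = x, dy/dt = -y."""
    x, y = pos
    return np.array([x, -y])

def flow_map(pos0, T, steps=200):
    """Advect initial positions over time T with a fixed-step RK4 scheme."""
    pos = pos0.copy()
    dt = T / steps
    for _ in range(steps):
        k1 = velocity(pos)
        k2 = velocity(pos + 0.5 * dt * k1)
        k3 = velocity(pos + 0.5 * dt * k2)
        k4 = velocity(pos + dt * k3)
        pos = pos + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return pos

def ftle_field(xs, ys, T):
    """FTLE on a regular grid from the Cauchy-Green tensor of the flow map."""
    X, Y = np.meshgrid(xs, ys)  # arrays of shape (ny, nx)
    final = flow_map(np.array([X, Y]), T)
    # Flow-map gradient by finite differences (axis 0 is y, axis 1 is x)
    dfx_dy, dfx_dx = np.gradient(final[0], ys, xs)
    dfy_dy, dfy_dx = np.gradient(final[1], ys, xs)
    F = np.stack([np.stack([dfx_dx, dfx_dy], axis=-1),
                  np.stack([dfy_dx, dfy_dy], axis=-1)], axis=-2)
    # Cauchy-Green tensor C = F^T F and its largest eigenvalue at each grid point
    C = np.swapaxes(F, -1, -2) @ F
    lam_max = np.linalg.eigvalsh(C)[..., -1]
    return np.log(np.sqrt(lam_max)) / abs(T)

xs = np.linspace(-1.0, 1.0, 21)
ys = np.linspace(-1.0, 1.0, 21)
ftle = ftle_field(xs, ys, T=2.0)
print(ftle.min(), ftle.max())
```

For this saddle the flow map stretches by a factor of exp(T) along x, so the largest eigenvalue of the Cauchy-Green tensor is exp(2T) and the FTLE equals 1 regardless of T, which makes it a convenient sanity check.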

This online gateway, together with its companion tool Restore, is used by several institutions as a real-time service for both research and field experiments.

A few videos of the usage of Trace and visualization of computed data can be found below.


Visit Restore Homepage.

Restore is an online computational gateway to reconstruct incomplete oceanographic datasets.

Restore was developed as a companion tool to Trace: it reconstructs the ocean's surface velocity data from incomplete radar or satellite measurements. The processed velocity data are then used as input to Trace to perform further Lagrangian analysis.

Similar to Trace, this tool has a server-client architecture with an interactive user interface at the front end, while at the back end, it connects to remote or local data servers to stream input data and performs parallel computation on an HPC cluster. It then visualizes the results in the user's browser on a globe map.

The data restoration algorithm employs a feature-preserving image processing technique that originates from fluid dynamics (a transport partial differential equation), in which the missing data are filled by advecting information from regions where the data field is known. More on the algorithm can be found in this paper.

A Demo Video of Restore briefly demonstrates its use.

The above video shows the finite-time Lyapunov exponent (FTLE) field over the entire globe, computed by TRACE on a massive grid of 576 million points (almost ten grid points per square mile). During the animation, the FTLE is superimposed over the sea surface temperature field (data from JPL/NASA) as an underlying advection-dominant field to be compared with the Lagrangian coherent structures on ocean currents.

The above video is a demonstration of TRACE.

The above video is a demonstration of RESTORE.