Free Downloads

MERL software freely available for non-commercial use.

MERL is making some software available to the research community. Simply click the 'Download Now' button below to gain access to the software.

  • QNTRPO — Quasi-Newton Trust Region Policy Optimization

    We propose a trust region method for policy optimization that employs a Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent has become the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance on a wide variety of tasks and has improved the performance of reinforcement learning algorithms across a wide range of systems. However, it suffers from a number of drawbacks, including the lack of a stepsize selection criterion, slow convergence, and dependence on problem scaling. We investigate the use of a dogleg method with a Quasi-Newton approximation for the Hessian to perform trust region policy optimization. We show that this particular choice addresses the listed drawbacks without sacrificing computational efficiency.

    We provide an algorithm, which we call Quasi-Newton Trust Region Policy Optimization, that uses a dogleg method to compute the step (i.e., its size and direction) during policy optimization. The code has been tested on several difficult continuous control environments in MuJoCo and learns faster than the TRPO algorithm. The code is compatible with OpenAI Gym and can therefore be used with any Gym-compatible environment. The approach appeared in a paper titled "Quasi-Newton Trust Region Policy Optimization" at the Conference on Robot Learning (CoRL), 2019 in Osaka, Japan.
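
The dogleg step mentioned above can be illustrated with a minimal sketch of the classical dogleg rule for a quadratic model with a Euclidean trust region. This is not the released QNTRPO code: the function names, the dense Gaussian-elimination solver, and the Euclidean norm are assumptions made for illustration, and the matrix B stands in for whatever Quasi-Newton approximation is used.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def solve(B, rhs):
    """Solve B x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [list(B[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def dogleg_step(g, B, delta):
    """Approximately minimize g.p + 0.5 p.B.p subject to ||p|| <= delta.

    Assumes g is nonzero and B is symmetric positive definite
    (hypothetical stand-in for a Quasi-Newton Hessian approximation).
    """
    p_newton = [-x for x in solve(B, g)]
    if math.sqrt(dot(p_newton, p_newton)) <= delta:
        return p_newton  # the full Quasi-Newton step fits inside the region

    # Cauchy point: minimizer of the model along the steepest-descent direction.
    alpha = dot(g, g) / dot(g, matvec(B, g))
    p_cauchy = [-alpha * gi for gi in g]
    cauchy_norm = math.sqrt(dot(p_cauchy, p_cauchy))
    if cauchy_norm >= delta:
        return [delta / cauchy_norm * pi for pi in p_cauchy]

    # Otherwise move from the Cauchy point toward the Newton point until the
    # boundary is reached: solve ||p_cauchy + tau * d||^2 = delta^2 for tau.
    d = [pn - pc for pn, pc in zip(p_newton, p_cauchy)]
    a = dot(d, d)
    b = 2.0 * dot(p_cauchy, d)
    c = dot(p_cauchy, p_cauchy) - delta ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [pc + tau * di for pc, di in zip(p_cauchy, d)]


# A badly scaled quadratic: the step bends from steepest descent toward Newton.
print(dogleg_step([1.0, 1.0], [[1.0, 0.0], [0.0, 10.0]], delta=0.5))
```

The trust radius picks both the length and the direction of the step: a large radius yields the Quasi-Newton step, a small one yields a scaled steepest-descent step, and intermediate radii interpolate between the two.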

  • RIDE — Robust Iterative Data Estimation

    Recent studies have demonstrated that as classifiers, deep neural networks (e.g., CNNs) are quite vulnerable to adversarial attacks that only add quasi-imperceptible perturbations to the input data but completely change the predictions of the classifiers. To defend classifiers against such adversarial attacks, here we focus on the white-box adversarial defense, where the attackers are granted full access to not only the classifiers but also the defenders so that they can produce as strong an attack as possible. We argue that a successful white-box defender should prevent the attacker from not only direct gradient calculation but also gradient approximation. Therefore we propose viewing the defense from the perspective of a functional, a higher-order function that takes other functions as input and returns a new function as the defender. Such a design makes the defender a hidden function whose gradients are hard to estimate without knowing the prior. To this end, we propose a novel Robust Iterative Data Estimation (RIDE) algorithm that works as a defender by estimating the true underlying data from each individual adversarial observation. Specifically, the RIDE algorithm takes a randomly initialized neural network as input and returns a parameterized defense model through self-supervised optimization. To the best of our knowledge, we are the first to propose self-supervised data estimation for white-box adversarial defense by viewing defenders as functionals.

    This code implements our RIDE algorithm for adversarial defense. As a demonstration, we show some qualitative results of the defense against a 10-iteration white-box attack (PGD attack with BPDA) on the MNIST dataset using (a) median filtering, (b) total-variance minimization and (c) the proposed RIDE algorithm. This code accompanies our arXiv submission “White-Box Adversarial Defense via Self-Supervised Data Estimation”.

  • DSP — Discriminative Subspace Pooling

    Human action recognition from video sequences is one of the fundamental problems in computer vision. In this research, we investigate and propose representation learning approaches towards solving this problem, which we call discriminative subspace pooling. Specifically, we combine recent deep learning approaches with techniques for generating adversarial perturbations to learn novel representations that can summarize long video sequences into compact descriptors. These descriptors capture essential properties of the input videos that are sufficient to achieve good recognition rates. We make two contributions. First, we propose a subspace-based discriminative classifier, similar to a non-linear SVM, but having piecewise-linear decision boundaries, where these boundaries are along orthogonal directions (as a subspace). Computing such decision boundaries does not require kernel space embeddings, but can be achieved using Riemannian optimization techniques. However, for classification, we need a negative set to be classified against. To this end, our second contribution is to apply universal adversarial perturbations on deep features computed from the input videos to generate the negative set. These perturbations are such that they are highly likely to result in the misclassification of the deep features (by their originally trained classifier). Our learned subspace thus picks up those dimensions in the data that are vulnerable to misclassification, implicitly capturing deep features that are action related.

    This software implements Discriminative Subspace Pooling (DSP) for video-based action recognition. The software has two modules: (i) one that generates adversarial perturbations using a fully-connected neural network, and (ii) one that computes the DSP descriptors using these perturbations. We also provide sample feature data (from a subset of the popular HMDB51 dataset) to demonstrate the working of our scheme. The approach presented in this code was published in the 2018 European Conference on Computer Vision (ECCV) in a paper titled "Discriminative Subspace Pooling Using Adversarial Perturbations".

  • GNI — Gradient-based Nikaido-Isoda

    Computing Nash equilibrium (NE) of multiplayer games has witnessed renewed interest due to recent advances in generative adversarial networks (GAN). However, computing equilibrium efficiently is challenging. To this end, we introduce the Gradient-based Nikaido-Isoda (GNI) function which serves as a merit function, vanishing only at the first-order stationary points of each player’s optimization problem. Gradient descent is shown to converge sublinearly to a first-order stationary point of the GNI function. For the particular case of bilinear min-max games and multi-player quadratic games, the GNI function is convex. Hence, the application of gradient descent in this case yields linear convergence to an NE (when one exists).

    This code takes as input the players' payoff functions and reformulates the game objective using the GNI formulation, which is then solved via gradient descent. We provide code to simulate four standard games: (i) bilinear min-max games, (ii) convex quadratic programs, (iii) non-convex quadratic programs, and (iv) strictly-convex quadratic programs. We also provide code to simulate a linear generative adversarial network and solve it using the GNI reformulation. Our software can work with any number of players. The approach presented in this code was published in the 2019 International Conference on Machine Learning (ICML) in a paper titled "Game Theoretic Optimization via Gradient-based Nikaido Isoda function".
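
To make the reformulation concrete, here is a minimal sketch for a two-player bilinear min-max game with scalar strategies. It assumes the GNI function sums, over players, the decrease in a player's own cost after that player takes a single gradient step of size eta; this form is our reading and should be checked against the paper. All function names and the finite-difference gradient are hypothetical choices for illustration, not the released code.

```python
def gni(x, y, f1, f2, grad1, grad2, eta):
    """Assumed GNI value for a two-player game with scalar strategies.

    Player 1 minimizes f1 over x; player 2 minimizes f2 over y.
    """
    x_step = x - eta * grad1(x, y)   # player 1's own gradient step
    y_step = y - eta * grad2(x, y)   # player 2's own gradient step
    return (f1(x, y) - f1(x_step, y)) + (f2(x, y) - f2(x, y_step))


def minimize_gni(x, y, f1, f2, grad1, grad2, eta=0.5, rho=0.5, iters=60, h=1e-6):
    """Plain gradient descent on the GNI value (central-difference gradient)."""
    def value(a, b):
        return gni(a, b, f1, f2, grad1, grad2, eta)

    for _ in range(iters):
        dx = (value(x + h, y) - value(x - h, y)) / (2 * h)
        dy = (value(x, y + h) - value(x, y - h)) / (2 * h)
        x, y = x - rho * dx, y - rho * dy
    return x, y


def f1(x, y):
    return x * y        # player 1 minimizes x*y over x


def f2(x, y):
    return -x * y       # player 2 minimizes -x*y over y (maximizes x*y)


def grad1(x, y):
    return y            # d f1 / dx


def grad2(x, y):
    return -x           # d f2 / dy


print(gni(1.0, 2.0, f1, f2, grad1, grad2, eta=0.5))
print(minimize_gni(1.0, 2.0, f1, f2, grad1, grad2))
```

For this game the assumed GNI value works out to eta * (x^2 + y^2), which is convex and vanishes only at the equilibrium (0, 0), consistent with the convexity claim for bilinear min-max games.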

  • SSTL — Semi-Supervised Transfer Learning

    Successful state-of-the-art machine learning techniques rely on the existence of large, well-sampled, and labeled datasets. Today it is easy to obtain a finely sampled dataset because of the decreasing cost of connected low-energy devices. However, it is often difficult to obtain a large number of labels. The reason for this is two-fold. First, labels are often provided by people whose attention span is limited. Second, even if a person were able to label perpetually, this person would need to be shown data in a large variety of conditions. One approach to addressing these problems is to combine labeled data collected in different sessions through transfer learning. Still, even this approach suffers from dataset limitations.

    This code allows the use of unlabeled data to improve transfer learning in the case where the training and testing datasets are drawn from similar probability distributions and the unlabeled data in each dataset can be described by similar underlying manifolds. The code implements a distribution-free, kernel- and graph-Laplacian-based approach which optimizes empirical risk in the appropriate reproducing kernel Hilbert space. The approach presented in this code was published in the 2018 IEEE Data Science Workshop in a paper titled "Semi-Supervised Transfer Learning Using Marginal Predictors".

  • 1bCRB — One-Bit CRB

    Massive multiple-input multiple-output (MIMO) systems can significantly increase the spectral efficiency, mitigate propagation loss by exploiting large array gain, and reduce inter-user interference with high-resolution spatial beamforming. To reduce complexity and power consumption, several transceiver architectures have been proposed for mmWave massive MIMO systems: 1) an analog architecture, 2) a hybrid analog/digital architecture, and 3) a fully digital architecture with low-resolution ADCs.

    To this end, we derive the Cramer-Rao bound (CRB) on estimating angular-domain channel parameters including angles-of-departure (AoDs), angles-of-arrival (AoAs), and associated channel path gains. Our analysis provides a simple tool to compare channel estimation performance among different one-bit quantization schemes. We also introduce a time-varying threshold scheme to one-bit ADCs to remove an ambiguity between the channel path gain and noise variance for the popular fixed zero-threshold scheme.

  • FoldingNet — FoldingNet

    Recent deep networks that directly handle points in a point set, e.g., PointNet, have been state-of-the-art for supervised learning tasks on point clouds such as classification and segmentation. In this work, a novel end-to-end deep auto-encoder is proposed to address unsupervised learning challenges on point clouds. On the encoder side, a graph-based enhancement is enforced to promote local structures on top of PointNet. Then, a novel folding-based decoder deforms a canonical 2D grid onto the underlying 3D object surface of a point cloud, achieving low reconstruction errors even for objects with delicate structures. The proposed decoder uses only about 7% of the parameters of a decoder with fully-connected neural networks, yet leads to a more discriminative representation that achieves higher linear SVM classification accuracy than the benchmark. In addition, the proposed decoder structure is shown, in theory, to be a generic architecture that is able to reconstruct an arbitrary point cloud from a 2D grid.

  • KCNet — KCNet

    Unlike on images, semantic learning on 3D point clouds using a deep network is challenging due to the naturally unordered data structure. Among existing works, PointNet has achieved promising results by directly learning on point sets. However, it does not take full advantage of a point's local neighborhood, which contains fine-grained structural information that turns out to be helpful for better semantic learning. In this regard, we present two new operations to improve PointNet with a more efficient exploitation of local structures. The first one focuses on local 3D geometric structures. In analogy to a convolution kernel for images, we define a point-set kernel as a set of learnable 3D points that jointly respond to a set of neighboring data points according to their geometric affinities measured by kernel correlation, adapted from a similar technique for point cloud registration. The second one exploits local high-dimensional feature structures by recursive feature aggregation on a nearest-neighbor graph computed from 3D positions. Experiments show that our network can efficiently capture local information and robustly achieve better performance on major datasets.

  • FRPC — Fast Resampling on Point Clouds via Graphs

    We propose a randomized resampling strategy to reduce the cost of storing, processing and visualizing a large-scale point cloud, that selects a representative subset of points while preserving application-dependent features. The strategy is based on graphs, which can represent underlying surfaces and lend themselves well to efficient computation. We use a general feature-extraction operator to represent application-dependent features and propose a general reconstruction error to evaluate the quality of resampling; by minimizing the error, we obtain a general form of optimal resampling distribution. The proposed resampling distribution is guaranteed to be shift-, rotation- and scale-invariant in the 3D space.

  • PCQM — Point cloud quality metric software

    It is challenging to measure the geometric distortion that point cloud compression introduces into a point cloud. Conventionally, the errors between point clouds are measured in terms of point-to-point or point-to-surface distances, which either ignore the surface structure or rely heavily on a specific surface reconstruction. To overcome these drawbacks, we propose using point-to-plane distances as a measure of geometric distortion in point cloud compression. The intrinsic resolution of the point clouds is proposed as a normalizer to convert the mean squared errors to PSNR numbers. In addition, the perceived local planes are investigated at different scales of the point cloud. Finally, the proposed metric is independent of the size of the point cloud and instead reveals the geometric fidelity of the point cloud. In experiments, we demonstrate that our method tracks perceived quality better than the point-to-point approach while requiring limited computation.
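
The difference between the point-to-point and point-to-plane errors can be seen in a minimal sketch. It is not the released metric software: the brute-force nearest-neighbor search, the function names, and the normalization PSNR = 10 log10(peak^2 / MSE), with the peak standing in for the intrinsic resolution, are assumptions for illustration, and the actual tool may use a different constant.

```python
import math


def nearest(p, cloud):
    """Index of the point in `cloud` closest to p (brute force)."""
    return min(range(len(cloud)), key=lambda i: math.dist(p, cloud[i]))


def point_to_point_mse(test, ref):
    """Mean squared distance from each test point to its nearest reference point."""
    return sum(math.dist(p, ref[nearest(p, ref)]) ** 2 for p in test) / len(test)


def point_to_plane_mse(test, ref, ref_normals):
    """Mean squared distance from each test point to the plane through its
    nearest reference point, oriented by that point's unit normal."""
    total = 0.0
    for p in test:
        j = nearest(p, ref)
        diff = [a - b for a, b in zip(p, ref[j])]
        total += sum(d * n for d, n in zip(diff, ref_normals[j])) ** 2
    return total / len(test)


def psnr(mse, peak):
    """PSNR in dB; `peak` plays the role of the intrinsic resolution (assumed form)."""
    return 10.0 * math.log10(peak ** 2 / mse)


# Two reference points on the plane z = 0, with normals along z.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
# Decoded points: both lifted by 0.1; the second also slides along the surface.
test = [(0.0, 0.0, 0.1), (1.0, 0.05, 0.1)]

print(psnr(point_to_point_mse(test, ref), peak=1.0))
print(psnr(point_to_plane_mse(test, ref, normals), peak=1.0))
```

The second test point slides along the surface without leaving it any further; the point-to-point error penalizes that slide, while the point-to-plane error counts only the offset along the normal.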

  • ROSETA — Robust Online Subspace Estimation and Tracking Algorithm

    This script implements a revised version of the robust online subspace estimation and tracking algorithm (ROSETA) that is capable of identifying and tracking a time-varying low dimensional subspace from incomplete measurements and in the presence of sparse outliers. The algorithm minimizes a robust l1 norm cost function between the observed measurements and their projection onto the estimated subspace. The projection coefficients and sparse outliers are computed using a LASSO solver and the subspace estimate is updated using a proximal point iteration with adaptive parameter selection.

  • CASENet — Deep Category-Aware Semantic Edge Detection

    Boundary and edge cues are highly beneficial in improving a wide variety of vision tasks such as semantic segmentation, object recognition, stereo, and object proposal generation. Recently, the problem of edge detection has been revisited and significant progress has been made with deep learning. While classical edge detection is a challenging binary problem in itself, the category-aware semantic edge detection by nature is an even more challenging multi-label problem. We model the problem such that each edge pixel can be associated with more than one class as they appear in contours or junctions belonging to two or more semantic classes. To this end, we propose a novel end-to-end deep semantic edge learning architecture based on ResNet and a new skip-layer architecture where category-wise edge activations at the top convolution layer share and are fused with the same set of bottom layer features. We then propose a multi-label loss function to supervise the fused activations. We show that our proposed architecture benefits this problem with better performance, and we outperform the current state-of-the-art semantic edge detection methods by a large margin on standard datasets such as SBD and Cityscapes.

  • NDS — Non-negative dynamical system model

    Non-negative data arise in a variety of important signal processing domains, such as power spectra of signals, pixels in images, and count data. We introduce a novel non-negative dynamical system model for sequences of such data. The model we propose is called non-negative dynamical system (NDS), and bridges two active fields, dynamical systems and nonnegative matrix factorization (NMF). Its formulation follows that of linear dynamical systems, but the observation and the latent variables are assumed non-negative, the linear transforms are assumed to involve non-negative coefficients, and the additive random innovations both for the observation and the latent variables are replaced by multiplicative random innovations. The software includes code for training and testing, as well as a simple framework for applying this model to the task of speech enhancement.

  • JGU — Joint Geodesic Upsampling

    We develop an algorithm utilizing geodesic distances to upsample a low resolution depth image using a registered high resolution color image. Specifically, it computes depth for each pixel in the high resolution image using geodesic paths to the pixels whose depths are known from the low resolution one. Though this is closely related to the all-pairs shortest-path problem, which has O(n^2 log n) complexity, we develop a novel approximation algorithm whose complexity grows linearly with the image size and which achieves real-time performance. We compare our algorithm with the state of the art on the benchmark dataset and show that our approach provides more accurate depth upsampling with fewer artifacts. In addition, we show that the proposed algorithm is well suited for upsampling depth images using binary edge maps, an important sensor fusion application.

  • EBAD — Exemplar-Based Anomaly Detection

    Anomaly detection in real-valued time series has important applications in many diverse areas. We have developed a general algorithm for detecting anomalies in real-valued time series that is computationally very efficient. Our algorithm is exemplar-based, which means that a set of exemplars is first learned from a normal time series (i.e., one not containing any anomalies) to effectively summarize all normal windows in the training time series. Anomalous windows of a testing time series can then be efficiently detected using the exemplar-based model.

    The provided code implements our hierarchical exemplar learning algorithm, our exemplar-based anomaly detection algorithm, and a baseline brute-force Euclidean distance anomaly detection algorithm. Two simple time series are also provided to test the code.
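
For orientation, the brute-force Euclidean baseline can be sketched in a few lines. This illustrates only the baseline idea (score each test window by its distance to the nearest training window), not the hierarchical exemplar learning, and it is not the provided code; the function names and the toy series are hypothetical.

```python
import math


def windows(series, w):
    """All contiguous length-w windows of a sequence."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


def anomaly_scores(train, test, w):
    """Score each test window by its Euclidean distance to the nearest
    window of the anomaly-free training series."""
    train_windows = windows(train, w)
    return [min(math.dist(tw, nw) for nw in train_windows)
            for tw in windows(test, w)]


# Toy data: a period-4 pattern, with one corrupted sample in the test series.
train = [0.0, 1.0, 0.0, -1.0] * 10
test = [0.0, 1.0, 0.0, -1.0] * 5
test[9] = 5.0  # inject an anomaly

scores = anomaly_scores(train, test, w=4)
flagged = [i for i, s in enumerate(scores) if s > 1.0]
print(flagged)  # start indices of the windows that overlap the corrupted sample
```

The cost of this baseline grows with the product of the number of training and test windows, which is the expense that summarizing the training windows with a small set of exemplars is meant to avoid.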

  • PEAC — Plane Extraction using Agglomerative Clustering

    Real-time plane extraction in 3D point clouds is crucial to many robotics applications. We present a novel algorithm for reliably detecting multiple planes in real time in organized point clouds obtained from devices such as Kinect sensors. By uniformly dividing such a point cloud into non-overlapping groups of points in the image space, we first construct a graph whose nodes and edges represent groups of points and their neighborhoods, respectively. We then perform an agglomerative hierarchical clustering on this graph to systematically merge nodes belonging to the same plane until the plane fitting mean squared error exceeds a threshold. Finally, we refine the extracted planes using pixel-wise region growing. Our experiments demonstrate that the proposed algorithm can reliably detect all major planes in the scene at a frame rate of more than 35 Hz for 640x480 point clouds, which to the best of our knowledge is much faster than state-of-the-art algorithms.

  • PQP — Parallel Quadratic Programming

    An iterative multiplicative algorithm is proposed for the fast solution of quadratic programming (QP) problems that arise in the real-time implementation of Model Predictive Control (MPC). The proposed algorithm, Parallel Quadratic Programming (PQP), is amenable to fine-grained parallelization. Conditions on the convergence of the PQP algorithm are given and proved. Due to its extreme simplicity, even serial implementations offer considerable speed advantages. To demonstrate, PQP is applied to several simulation examples, including a stand-alone QP problem and two MPC examples. When implemented in MATLAB using single-thread computations, numerical simulations of PQP demonstrate a 5-10x speed-up compared to the MATLAB active-set based QP solver quadprog. A parallel implementation would offer a further speed-up, linear in the number of parallel processors.
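
A multiplicative update of this kind can be sketched for a non-negatively constrained QP. The splitting of Q and h into non-negative parts and the exact update ratio below are assumptions made for illustration (they should be checked against the paper), as are the function name and the optional diagonal shift r; this is not the released MATLAB code.

```python
def pqp(Q, h, y0, r=0.0, iters=200):
    """Multiplicative updates for  min 0.5*y'Qy + h'y  subject to y >= 0.

    Assumes Q is symmetric positive definite and y0 is strictly positive.
    `r` is a hypothetical non-negative shift added to both diagonal parts.
    """
    n = len(h)
    # Split into non-negative parts so that Q = Qp - Qm and h = hp - hm.
    Qp = [[max(Q[i][j], 0.0) + (r if i == j else 0.0) for j in range(n)]
          for i in range(n)]
    Qm = [[max(-Q[i][j], 0.0) + (r if i == j else 0.0) for j in range(n)]
          for i in range(n)]
    hp = [max(v, 0.0) for v in h]
    hm = [max(-v, 0.0) for v in h]

    y = list(y0)
    for _ in range(iters):
        # Each coordinate reads only the previous iterate, so the n updates
        # are independent of one another and could run in parallel.
        y = [y[i]
             * (hm[i] + sum(Qm[i][j] * y[j] for j in range(n)))
             / (hp[i] + sum(Qp[i][j] * y[j] for j in range(n)))
             for i in range(n)]
    return y


# Unconstrained minimizer (1, 1) is already non-negative.
print(pqp([[2.0, -1.0], [-1.0, 2.0]], [-1.0, -1.0], [0.5, 0.5]))
# Second coordinate is pushed onto the constraint y >= 0.
print(pqp([[2.0, 0.0], [0.0, 2.0]], [-2.0, 1.0], [0.5, 0.5], r=1.0))
```

At a fixed point of this update, each coordinate is either zero or satisfies (Qy + h)_i = 0, which matches the optimality conditions of the constrained problem; because the iterate is only ever rescaled by non-negative ratios, it stays feasible without any projection step.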