We take machine learning models from experiments in a Jupyter notebook to production in the cloud. We follow a structured MLOps process across the entire modeling lifecycle, including Dockerized ML development, parallelized backtesting, ML API patterns, model explainability, model performance monitoring, and infrastructure-as-code modules for rapid deployment.
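To make the ML API pattern concrete, here is a minimal sketch of a model-serving endpoint, assuming FastAPI and a pickled scikit-learn model; the artifact path, schema names, and feature layout are illustrative placeholders, not a production design.

```python
# Minimal model-serving API sketch (endpoint and artifact names are assumptions).
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the trained model once at startup rather than on every request.
with open("model.pkl", "rb") as f:  # hypothetical artifact path
    model = pickle.load(f)

class PredictRequest(BaseModel):
    features: list[float]

class PredictResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # scikit-learn estimators expect a 2-D array: one row per example.
    y = model.predict([req.features])
    return PredictResponse(prediction=float(y[0]))
```

Loading the model at startup keeps request latency low; a real deployment layers health checks, input validation, and monitoring hooks on top of this skeleton.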
We develop models to solve hard problems, from unsupervised anomaly detection in multivariate time series to dynamic system identification using deep learning. We take a heterodox approach to data science: we start from first principles with the mathematical formulation of the problem and then experiment with the relevant methods, from modern hierarchical Bayesian methods to Gaussian processes to deep learning, and sometimes to more classical techniques like Kalman filters. As with all good science, we start simple, experiment a lot, and iterate our way to the best possible solution.
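As one example of starting simple, a Gaussian process regression baseline takes only a few lines with scikit-learn; the kernel choice and synthetic data below are illustrative assumptions, not a recipe.

```python
# Gaussian process regression sketch: a simple baseline with built-in uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))                  # synthetic inputs
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)  # noisy observations

# RBF captures the smooth signal; WhiteKernel absorbs observation noise.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

X_new = np.linspace(0, 10, 100).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)  # posterior mean and std dev
```

The posterior standard deviation gives calibrated uncertainty essentially for free, which is often the reason to reach for a GP before a deep model.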
We engineer modern data infrastructure, including purpose-built enterprise data platforms, complex batch and streaming data pipelines, sensitive data handling, serverless technologies, and more. In data science and machine learning, more and better data usually beats better algorithms.
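One concrete habit behind "better data" is validating records at pipeline boundaries. Below is a minimal sketch using pydantic; the SensorReading schema and the dead-letter routing are hypothetical examples, not a specific platform's design.

```python
# Schema validation at a pipeline boundary (illustrative schema).
from datetime import datetime

from pydantic import BaseModel, ValidationError

class SensorReading(BaseModel):  # hypothetical record schema
    sensor_id: str
    ts: datetime
    value: float

def validate_batch(records: list[dict]) -> tuple[list[SensorReading], list[dict]]:
    """Split a raw batch into clean rows and quarantined rows."""
    clean, quarantined = [], []
    for rec in records:
        try:
            clean.append(SensorReading(**rec))
        except ValidationError:
            quarantined.append(rec)  # route to a dead-letter store for review
    return clean, quarantined
```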
We develop sophisticated computer vision models to solve problems with techniques such as object recognition, segmentation, real-time gesture recognition, and 3D computer vision using LIDAR and imagery, in some cases devising novel methods for data augmentation.
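For context, a standard image-augmentation baseline (the common starting point, not the novel methods referenced above) looks like this in torchvision; the specific transforms and parameters are illustrative.

```python
# Standard training-time image augmentation pipeline (parameters are illustrative).
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                      # scale/crop jitter
    transforms.RandomHorizontalFlip(p=0.5),                 # mirror invariance
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # lighting variation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],        # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
```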
We develop NLP pipelines to perform entity resolution across multiple large datasets, extract structured data from unstructured text, and optimize search and information retrieval using modern embedding techniques.
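A minimal sketch of embedding-based retrieval, assuming the sentence-transformers library; the model name, documents, and query are placeholders chosen for illustration.

```python
# Embedding-based retrieval sketch (model name is one common public choice).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "invoice for consulting services",
    "sensor calibration report",
    "quarterly revenue summary",
]
doc_emb = model.encode(docs, normalize_embeddings=True)  # unit vectors

query_emb = model.encode(["billing documents"], normalize_embeddings=True)
scores = doc_emb @ query_emb.T            # dot product of unit vectors = cosine
ranking = np.argsort(-scores.ravel())     # best match first
```

With normalized embeddings, ranking reduces to a matrix multiply; at scale this step is usually backed by an approximate nearest-neighbor index.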
We combine structural causal models with observational data to answer causal questions about both average treatment effects and heterogeneous treatment effects. Though the field is still in its infancy, causal inference has already been called one of the biggest advances in statistics in the last 50 years.
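To illustrate an average-treatment-effect estimate, here is an inverse propensity weighting sketch on synthetic data; IPW is one standard estimator, not necessarily the right choice for a given problem, and the data-generating process is made up so the true effect is known.

```python
# Average treatment effect via inverse propensity weighting (IPW), synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                     # confounder
p_treat = 1 / (1 + np.exp(-x))             # treatment assignment depends on x
t = rng.binomial(1, p_treat)
y = 2.0 * t + x + rng.normal(size=n)       # true ATE = 2.0

# Model treatment assignment, then reweight to break the x -> t link.
ps = LogisticRegression().fit(x.reshape(-1, 1), t).predict_proba(x.reshape(-1, 1))[:, 1]
ate = np.mean(t * y / ps) - np.mean((1 - t) * y / (1 - ps))
print(f"IPW ATE estimate: {ate:.2f}")      # close to the true effect of 2.0
```

A naive difference in means would be biased here because the confounder drives both treatment and outcome; the propensity weights undo that selection.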