Code

I try to make useful code available on GitHub where possible:

Starcoder is the primary code base for my current research on unsupervised machine learning models for traditional scholarship in the humanities.

Scheduler is a Haskell tool that uses Microsoft’s Z3 solver (via the SBV library) together with Google Sheets to generate interview schedules that satisfy complex constraints.

Graphmod is a C++ and Python toolkit I wrote as part of my dissertation work applying graphical models to the task of unsupervised learning for verb syntax.

Steamroller is a Python library that uses the SCons build system to run large numbers of classification experiments.

Concrete Haskell, and the associated auto-generated API, provide interfaces to the Concrete ecosystem.

Vivisect is a tool for monitoring, during training, how useful various parts of a deep neural model are for other tasks, like classification and clustering. Similar in some ways to TensorBoard, its motivating use is to track the sort of linguistic information encoded by different neural layers. It now uses Celery as its backend.

NGram is (or aims to be) a fast, unified library of sequence-modeling algorithms based on counting finite-length context windows. In natural language processing this includes traditional language models and character-based language identification, while in data compression it includes prediction by partial matching, numeric encoding, and the like.
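
As a rough illustration of the counting idea (a minimal sketch, not NGram's actual API; the function names `count_contexts` and `probability` are hypothetical), a fixed-length context window can be mapped to counts of the symbols that follow it, and those counts turned into a maximum-likelihood estimate:

```python
from collections import Counter, defaultdict


def count_contexts(sequence, order):
    """Map each context of length `order` to a Counter of the symbols following it.

    Hypothetical helper for illustration only; not taken from the NGram library.
    """
    counts = defaultdict(Counter)
    for i in range(order, len(sequence)):
        context = tuple(sequence[i - order:i])
        counts[context][sequence[i]] += 1
    return counts


def probability(counts, context, symbol):
    """Maximum-likelihood estimate of P(symbol | context); 0.0 for unseen contexts."""
    following = counts.get(tuple(context))
    if not following:
        return 0.0
    return following[symbol] / sum(following.values())


# Character-level counts with a one-symbol context window.
counts = count_contexts("abracadabra", order=1)
print(probability(counts, "a", "b"))
```

The same table works for words or bytes, since any sequence of hashable symbols can be passed in. Real language models and compressors add smoothing or backoff for unseen contexts, which this sketch leaves out.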

HaskSeg