Tutorial: 13:00–14:30 (English)

Reproducible data science with Clojure

Despite sophisticated tooling, data scientists still battle fundamental challenges every day, such as reproducibility, maintainability, and sharing their work. Traditional notebooks offer interactivity and quick feedback, but they are plagued by hidden state dependencies, version control complexity, and production deployment hurdles. Converting a notebook-based analysis into production-ready code often requires extensive refactoring, untangling implicit dependencies, debugging hidden state issues, and deciphering sparse documentation. And that is before tackling today's reality of excessively large, unstructured data dumps that typically lack any metadata or explanation, which makes it difficult to find useful data in the first place.

Clojure's data science ecosystem has been maturing rapidly in recent years. With its stable toolkit, immutable data structures, and functional paradigm, Clojure offers a compelling alternative to traditional data science workflows. Imagine knowing exactly which version of your code produced which dataset. Or seamlessly deploying the same code you used in an exploratory analysis to production. And imagine that same code running both in a state-of-the-art literate programming environment and in your own familiar IDE.

This hands-on workshop will introduce a new way of thinking about working with data, demonstrating how Clojure’s libraries and tooling solve many pain points in current data science workflows.

Kira Howe

@kirahowe.com

Kira has been writing software since 2015 and has focused on Clojure for the last 6 years. Aiming to pave the way for broader recognition and adoption of Clojure in the data science community, she actively develops tools and guides that showcase the strengths of Clojure's data science toolkit and improve the usability and effectiveness of its core libraries. She spent most of 2024 working exclusively on open source contributions to support and grow the Clojure data science ecosystem, supported by Clojurists Together and other generous sponsors.