- We have released Weld v0.4.0. See the release notes here.
- We’re trying out Spectrum as a way to provide a forum for users and contributors of Weld!
- We have released Weld v0.3.1. See the release notes here.
- We have released Weld v0.3.0, with a completely redesigned LLVM backend, several performance improvements, new operators, and a better memory management API. We will also begin tracking releases on the Release Notes page, so read more about what has changed there.
- We presented a paper on how Weld optimizes data science workloads at VLDB 2019 in Rio de Janeiro, Brazil!
- We have released Weld v0.2.0, Grizzly v0.0.5, and weldnumpy, a Weld-enabled version of NumPy! This release of Weld contains exciting new features such as data serialization, richer builder support, and improved runtime performance.
- The Weld paper is now on Arxiv!
- We will be at Strata Data Conference NY Sept 26-28! Come see our talk at 4:30PM EDT on 09/27!
- Slides from our talk at Strata + Hadoop World are now up!
- We presented a short position paper describing Weld at CIDR 2017!
Weld is a runtime for improving the performance of data-intensive applications. It optimizes across libraries and functions by expressing the core computations in libraries using a small common intermediate representation, similar to CUDA and OpenCL.
Modern analytics applications combine multiple functions from different libraries and frameworks to build complex workflows. Even though individual functions can achieve high performance in isolation, the performance of the combined workflow is often an order of magnitude below hardware limits due to extensive data movement across the functions. Weld’s take on solving this problem is to lazily build up a computation for the entire workflow, optimizing and evaluating it only when a result is needed.
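The lazy-evaluation strategy above can be illustrated with a minimal sketch in plain Python (this is illustrative only, not the actual Weld API): operations record a computation tree instead of executing immediately, and the whole tree is evaluated once, on demand, which is the point where a runtime like Weld can optimize across function boundaries.

```python
# Minimal sketch of lazy evaluation as Weld-integrated libraries use it.
# Illustrative only; class and method names here are invented, not Weld's API.

class Lazy:
    """Records an operation tree instead of computing eagerly."""

    def __init__(self, op, *args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Lazy("add", self, other)

    def __mul__(self, other):
        return Lazy("mul", self, other)

    def evaluate(self):
        # A real runtime would optimize the whole tree here
        # (e.g. fuse loops, eliminate intermediates) before running it once.
        if self.op == "const":
            return self.args[0]
        lhs = self.args[0].evaluate()
        rhs = self.args[1].evaluate()
        return lhs + rhs if self.op == "add" else lhs * rhs


def const(x):
    return Lazy("const", x)


# Two "library calls" compose without materializing intermediate results:
expr = (const(2) + const(3)) * const(4)  # nothing computed yet
print(expr.evaluate())                   # evaluated once, on demand -> 20
```

In a real integration, the library's existing operators build such a tree behind the scenes, so user code does not change; only the final `evaluate` step hands the combined computation to the Weld runtime.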
Weld can increase the performance of existing data analytics frameworks with little integration effort. For example, in Spark, NumPy, and TensorFlow, porting just a few operators to Weld can increase performance by up to 30x, even on some simple workloads!
Prototype integrations of Weld with Spark (top left), NumPy (top right), and TensorFlow (bottom left) show up to 30x improvements over the native framework implementations, with no changes to users' application code. Cross library optimizations between Pandas and NumPy (bottom right) can improve performance by up to two orders of magnitude.
Weld is developed in the Stanford InfoLab.
Grizzly: Pandas on Weld
Grizzly is a subset of the Pandas data analytics framework integrated with Weld. Read more about it here.
WeldNumpy: NumPy on Weld
WeldNumpy is a subset of the NumPy numerical computing framework integrated with Weld. Read more about it here.
Weld is a work in progress! For support, join our Spectrum channel or subscribe to the Google Group. You can contact the developers at [email protected].