Fast parallel code generation for data analytics frameworks. Developed at Stanford University.
Note: Most of the Weld developers have graduated or are currently busy with other projects. We are still happy to answer questions or review patches on GitHub, however!
Weld is a compiler and runtime for improving the performance of data-intensive applications. It enables powerful compiler optimizations and automatic parallelization across functions by expressing the core computations in libraries using a small common intermediate representation and a lazy runtime API.
Weld can improve the performance of workflows such as SQL with Spark SQL, logistic regression with TensorFlow, and data cleaning in NumPy and Pandas.
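To give a feel for what a lazy runtime API buys you, here is a minimal Python sketch. It does not use Weld or its intermediate representation: `LazyVector`, `scale`, and `shift` are hypothetical names made up for illustration. The sketch only shows the general idea that independently written library functions can record work instead of running it, so that a runtime can later execute the whole chain in a single pass over the data.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LazyVector:
    """Hypothetical stand-in for a lazily evaluated library result (not Weld's API)."""
    data: List[float]
    ops: List[Callable[[float], float]] = field(default_factory=list)

    def map(self, fn: Callable[[float], float]) -> "LazyVector":
        # Record the operation instead of running it immediately.
        return LazyVector(self.data, self.ops + [fn])

    def evaluate(self) -> List[float]:
        # One pass over the data: every recorded operation is applied
        # per element, so no intermediate vectors are materialized.
        out = []
        for x in self.data:
            for fn in self.ops:
                x = fn(x)
            out.append(x)
        return out


# Two "library functions" written independently of each other.
def scale(v: LazyVector, k: float) -> LazyVector:
    return v.map(lambda x: x * k)


def shift(v: LazyVector, c: float) -> LazyVector:
    return v.map(lambda x: x + c)


# Nothing is computed until evaluate() is called.
result = shift(scale(LazyVector([1.0, 2.0, 3.0]), 2.0), 1.0)
values = result.evaluate()
print(values)
```

A real system would compile the recorded operations to optimized parallel code rather than interpret them in a Python loop; the sketch only shows why deferring evaluation exposes optimization opportunities across function boundaries.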
The easiest way to use Weld is through one of our library integrations. Grizzly is a Weld-enabled version of Pandas, and WeldNumPy is a Weld-enabled version of NumPy. Get them both on PyPI:
# Install Grizzly
pip install pygrizzly
# Install WeldNumPy
pip install weldnumpy
You can also build the Weld compiler and runtime from source for use with C, C++, Python, or Rust programs, or to use in your own projects.
Weld started as a research project at Stanford University, and continues to drive new research projects both at Stanford and elsewhere! Below are a few research papers and technical reports on Weld:
Some other research papers in this space by our group at Stanford:
We are working on other follow-up projects related to Weld, and other groups are also using Weld to do exciting research in the data space! Find code and more details about these projects on our research page.
For support, join our Spectrum channel or subscribe to the Google Group. You can contact the developers at [email protected].