Changelog ========= v0.4.3 ------ **Performance: torch.compile for all algorithms + cleanup** * Enabled ``torch.compile`` for CRBA and ABA on CUDA (previously skipped when Triton kernels were available). Compilation now optimizes surrounding PyTorch operations while Triton handles the bottleneck 6x6 double-matmul. Result: CRBA 2.1x faster, ABA 1.7x faster at B=1 with compile enabled. * Moved Triton kernel import to module level, eliminating repeated import overhead in hot loops. ``_use_triton_kernels`` now properly gates on ``HAS_TRITON`` instead of being hardcoded to ``True``. * Rewrote ``benchmarks/speed_benchmark.py`` for fair comparison: each bard algorithm timing now includes ``update_kinematics`` (end-to-end), matching ADAM's self-contained calls. Speedup tables use ADAM as the primary baseline. Pinocchio C++ is included as a serial CPU reference. * Added autograd documentation to Quick Start guide with examples for ``d(M)/d(q)`` through CRBA, ``d(qdd)/d(tau)`` through ABA. * Removed obsolete files: ``quick_bench.py``, ``basic_test.py``, ``bard_basic_test.py``, and root-level profiling/debug scripts. v0.4.2 ------ **Performance: Inline spatial cross products, eliminate per-node allocations** * Inline spatial cross products to eliminate per-node GPU allocations. * Revert JIT functions to ``torch.zeros`` pattern for ``torch.compile`` compatibility. * Eliminate hidden tensor allocations in core transform functions. v0.3 ---- **Unified Model/Data API** * Introduced ``Model`` and ``Data`` classes following the Pinocchio/MuJoCo pattern. All computations are now accessed through top-level free functions (``bard.forward_kinematics()``, ``bard.jacobian()``, ``bard.rnea()``, etc.) operating on a ``model`` + ``data`` pair. * Added ``bard.build_model_from_urdf()`` as the primary entry point for loading robots. Returns a ``Model`` object that holds the robot's topology, inertias, and joint parameters. * Added ``bard.create_data()`` for creating pre-allocated computation workspaces. One ``Model`` can be used with multiple ``Data`` instances. * ``bard.update_kinematics(model, data, q, qd)`` performs a single tree traversal, caching transforms, spatial adjoints, and velocities. All subsequent algorithm calls reuse the cached data with zero redundant computation. * **Breaking change:** Removed ``RobotDynamics``, ``KinematicsState``, ``ForwardKinematics``, ``SpatialAcceleration``, ``Jacobian``, ``RNEA``, and ``CRBA`` classes. The ``build_chain_from_urdf()`` function is no longer part of the public API. See the Quick Start guide for migration examples. v0.2 ---- * Benchmarks and performance documentation. * Fixed ``lxml`` dependency version. * Removed Python 3.14 support from CI. v0.1 ---- * Initial release with batched Forward Kinematics, Jacobian, RNEA, CRBA, and Spatial Acceleration. * URDF parsing support. * ``torch.compile`` compatibility. * Fixed-base and floating-base robot support.