On the one hand, this is pretty cool: the API is Pythonic and makes quite a lot of sense.
On the other hand, I can't stop myself from thinking about "Greenspun's tenth rule":
> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
This doesn't apply directly here, as the features are intentional and it seems they are not bug-ridden at all. But I get a nagging feeling of wanting to shout 'just use Lisp!' when reading this.
https://wiki.c2.com/?MultiMethods
It's actually more like "just use Haskell".
Having written one of these[1] a decade ago and inflicting it (with the best of intentions) upon production code in anger, I can tell you this often leads to completely unmaintainable code. It is impossible to predict the effect of changing a method, tracing a method, debugging a method (where do I put the breakpoint??).
The code reads beautifully though. Pray you never have to change it.
The reason I say "just use Haskell" instead of Lisp is because Lisp generics suffer from this same problem.
Btw if anyone has a solution to this "generic/multidispatch maintainability in a dynamically typed language" problem I would love to hear it.
[1]: https://github.com/jjtolton/naga/blob/master/naga/tools.py
I’ve been doing the once-every-few-years deep dive into Smalltalk that I tend to do out of curiosity and this kind of question is one that always comes up for me. The answer there seems to be “the whole environment and philosophy is geared to work that way”; a dynamic language needs a lot of tooling support and interactivity.
> this kind of question
Which kind of question? "where do I put the breakpoint??"
No, specifically the question of how you write maintainable code with generic multi-dispatch in highly dynamic languages.
I like Python, but I like static typing too because there’s just less to think about and when I have to learn a new codebase there’s a lot of assumptions I can lean on about how things work; this saves time.
I like the idea of Smalltalk and when you watch Alan Kay or Dan Ingalls talk about it, they make total sense and you have Pharo and Squeak to back it up as in “yes, you can build large systems with this idea”.
But I don’t think you could program Smalltalk and have it be maintainable without everything else the environment brings: being inside the environment with your objects, the totally different approach of sending a message an object doesn’t understand, having the debugger pop up, and then just implementing that message right there. That’s just an utterly different workflow.
I like the idea of ‘late binding of all things’ and I think the approach of writing a DSL for your problem and then having to write far less code to solve it is great. But the objection is then always “okay, but what about when someone else has to work with that code?”
I guess what I’m trying to say is: the more dynamic your language is, the more support you need from your tooling to ease the cognitive load while you program, simply because the state-space of things you can do is bigger when you’re not restricted by types, etc.
Tangentially, you might find the abandoned Nice programming language interesting.
https://gallium.inria.fr/~remy/poly/mot/10/nice/web/language...
> … but what about when someone else has to work with that code.

Someone else has had to work with that code since before Smalltalk escaped Xerox PARC.
1984 "Smalltalk-80 The Interactive Programming Environment" page 500
"At the outset of a project involving two or more programmers: Do assign a member of the team to be the version manager. … The responsibilities of the version manager consist of collecting and cataloging code files submitted by all members of the team, periodically building a new system image incorporating all submitted code files, and releasing the image for use by the team. The version manager stores the current release and all code files for that release in a central place, allowing team members read access, and disallowing write access for anyone except the version manager."
https://rmod-files.lille.inria.fr/FreeBooks/TheInteractivePr...
Later "ENVY/Developer":
https://archive.esug.org/HistoricalDocuments/TheSmalltalkRep...
I am thinking more about Julia here, which I would use if Python were not so common in several communities.
Is it common in Julia to use multiple-dispatch on 3 or more arguments, or just double-dispatch?
Julia definitely made the right choice to implement operators in terms of double-dispatch - it’s straightforward to know what happens when you write `a + b`. Whereas in Python, the addition is turned into a complex set of rules to determine whether to call `a.__add__(b)` or `b.__radd__(a)` - and it can still get it wrong in some fairly simple cases, e.g. when `type(a)` and `type(b)` are sibling classes.
I wonder whether Python would have been better off implementing double-dispatch natively (especially for operators) - could it get most of the elegance of Julia without the complexity of full multiple-dispatch?
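To make the asymmetry concrete, here is a small self-contained demo (my own illustration, not from the article) of when Python tries the reflected method first for `a + b`:

    class Base:
        def __add__(self, other):
            return "Base.__add__"
        def __radd__(self, other):
            return "Base.__radd__"

    class Child(Base):
        def __radd__(self, other):
            return "Child.__radd__"

    class Sibling(Base):
        pass

    print(Base() + Child())    # Child.__radd__: the right operand is a subclass that overrides __radd__
    print(Base() + Sibling())  # Base.__add__: no override, so the left operand's method goes first

Neither operand is consulted symmetrically, which is exactly the kind of negotiation that double (or multiple) dispatch makes explicit.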
It's not uncommon to dispatch on 3 or more arguments. Linear algebra specializations are one case where I tend to do this a lot: for example, when specializing structured matrix types (block banded matrices) against non-standard vectors (GPU arrays), you often also need to specialize on the output vector to resolve ambiguities.
The paper "Julia: Dynamism and Performance Reconciled by Design" [1] (work largely by Jan Vitek's group at North Eastern, with collaboration from Julia co-creators, myself included), has a really interesting section on multiple dispatch, comparing how different languages with support for it make use of it in practice. The takeaway is that Julia has a much higher "dispatch ratio" and "degree of dispatch" than other systems—it really does lean into multiple dispatch harder than any other language. As to why this is the case: in Julia, multiple dispatch is not opt-in, it's always-on, and it has no runtime cost, so there's no reason not to use it. Anecdotally, once you get used to using multiple dispatch everywhere, when you go back to a language without it, it feels like programming in a straight jacket.
Double dispatch feels like kind of a hack, tbh, but it is easier to implement and would certainly be an improvement over Python's awkward `__add__` and `__radd__` methods.
[1] https://janvitek.org/pubs/oopsla18b.pdf
I will forever think of this talk by Peter Seibel (author of Practical Common Lisp) whenever I hear about multiple dispatch: https://youtu.be/VeAdryYZ7ak?si=wY3RmcRnW96jxQQm
That's a very long video you've linked to :P could you talk about why multiple dispatch reminds you of that talk?
Seibel spends a fair amount of time talking about how powerful and flexible CL's dynamic dispatch mechanism is and how/why it was still a novel feature at the time (~15 years ago).
I'm curious about what makes this implementation faster than the alternatives.
Also, if you care about function call performance, I guess you'd use PyPy. Have you tried to run the benchmarks (with appropriate warmup) on PyPy to see if the results carry over?
Mainly codegen. Most other libraries do something like `return dispatch_dictionary[tuple(map(type, args))](*args)`, whereas ovld generates a function specialized to the set of actual signatures of the overloads (still with a dictionary, but you remove a surprising amount of overhead just from removing varargs). The code is registered in the linecache, so you can step into it with pdb if you want to look at it. After that I became a bit obsessive about pushing it further and further (because it's fun). So... when dependent types are involved, it will actually generate custom dispatch code: if you define an @ovld with x: Literal[0], for example, it will generate an `if x == 0` in the int dispatch path (and you can define custom codegens for new types like Regexp).
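For reference, a minimal self-contained sketch of that dictionary-lookup style (my own illustration, not any particular library's code) — every call pays for tuple(map(type, ...)) and *args repacking:

    # naive multiple dispatch: look up the tuple of argument types on every call
    registry = {}

    def register(*types):
        def deco(fn):
            registry[types] = fn
            return fn
        return deco

    def f(*args):
        return registry[tuple(map(type, args))](*args)

    @register(int, int)
    def _(x, y):
        return x + y

    @register(str, str)
    def _(x, y):
        return x + " " + y

    print(f(1, 2), f("a", "b"))  # 3 a b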
Regarding PyPy, I did run some benchmarks recently (I use pytest-benchmark). The advantage over other libraries remains and the magnitude is similar, but when I compare with custom if/isinstance code, that code is optimized a lot more aggressively and it gains a significant edge (let's say ~5x). Now, since I'm into codegen, part of me feels like meeting the challenge and figuring out a way to generate optimal code from a set of signatures, but I think I've spent enough time as it is haha.
Very interesting, thanks for the explanation!
This looks really cool. I get multiple dispatch, multimethods, and CLOS method specialization confused in my head. I can understand what this does... but when do you use it?
Can people list some times when they actually used multimethods to solve a real problem? How did it work out? Would you do it again?
My company has used Julia for about 5 years now. 90% of the time in application code you only need single dispatch, same as OOP.
One case where I actually rely on multiple dispatch is conversion to more or less structured data. Inspired by Clojure, I have a function `conform(OutputDataType, input_data::SomeType)` and specific implementations depend on both types involved.
Multiple dispatch is also really helpful when two libraries don't quite do the same thing. In Python, pandas, numpy, and pytorch all have slightly different implementations of x.std() (standard deviation), with slightly different APIs. This means you can't write code that's generic over numpy arrays and pytorch tensors. In Python this could only easily be fixed if the library authors coordinate, but with multiple dispatch you can just fix it in your own code.
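As a rough Python sketch of that kind of fix (assuming ovld's basic @ovld decorator; the ddof/unbiased defaults shown are the ones each library documents):

    import numpy as np
    import pandas as pd
    import torch
    from ovld import ovld

    @ovld
    def std(x: np.ndarray):
        return float(x.std(ddof=1))         # numpy defaults to ddof=0, so override it

    @ovld
    def std(x: pd.Series):
        return float(x.std())               # pandas already defaults to ddof=1

    @ovld
    def std(x: torch.Tensor):
        return float(x.std(unbiased=True))  # torch's default, spelled out for clarity

One generic std() now gives the same sample standard deviation regardless of which container a caller hands you, without any of the three libraries having to coordinate.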
That makes sense for writing your own generic libraries. Do you frequently have to convert from pytorch to numpy? I would think that for a given execution unit (pipeline step, single program, model running on a server) the data would stay the same type after some initial conversion.
I'd say we have about 60% library code and 40% application code internally, so making it easier to write libraries is really nice. E.g. you don't want to have to write different statistical calculations for every type you use.
In particular, the fact that these types would silently give you different answers if you called `.std()` was a big headache.
It was very common for us to want to be generic over pandas series and numpy arrays. A bit less so with pytorch tensors, but that was because we just aggressively converted them to numpy arrays. Fundamentally these three are all very similar types so it's frustrating to have to treat them differently.
which implementation of `std()` did you go with?
I was writing unit tests once against histograms. That code is super finicky, and I couldn't get the pandas and polars numbers to tie out. I wasn't super concerned with the exact output for my application, just that the number of buckets was the same and they were roughly the same size. Just bumping to a new version of numpy would result in test breakages because of floating point rounding errors. I'm sure there are much more robust numerical testing approaches I could have used.
Yeah this kind of stuff drove us crazy. Numpy uses things like SIMD everywhere with no option to turn it off, which is good for performance but makes life really hard when you want consistently reproducible results across different machines.
Before switching to Julia we eventually standardized on numpy.std with ddof=1, and with some error tolerances in our tests.
I use it quite a lot (having made it), most often when I need to recursively process complex heterogeneous data structures. Merging two data structures, treemap, processing a Python AST, etc. Also in any situation where you want to be able to register new behavior.
One case where I'm finding it extremely useful is that I'm currently using ovld to implement a serialization library (https://github.com/breuleux/serieux, but I haven't documented anything yet). The arguments for deserialization are (declared_type, value, context). With multimethods, I can define
* deserialize(type[int], str, Any) to deserialize a string into an int
* deserialize(type[str], Regexp[r"^\$"], Environment) to deserialize $XYZ as an environment variable lookup but only if the context has the Environment type
* deserialize(type[Any], Path, WorkingDirectory) to deserialize from a file
That's one situation where ovld's performance and codegen capabilities come in handy: the overhead is low enough to remain in the same performance ballpark as Pydantic v2.
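A rough, self-contained sketch of the shape of those overloads (illustrative only — the context classes below are stand-ins named after the comment, serieux's real signatures differ, and the Regexp dependent check is omitted):

    import os
    from pathlib import Path
    from typing import Any
    from ovld import ovld

    class Environment: ...        # stand-in context types
    class WorkingDirectory: ...

    @ovld
    def deserialize(t: type[int], value: str, ctx: Any):
        return int(value)         # plain string -> int

    @ovld
    def deserialize(t: type[str], value: str, ctx: Environment):
        # "$XYZ" -> environment variable lookup, only when an Environment context is passed
        return os.environ.get(value[1:], "") if value.startswith("$") else value

    @ovld
    def deserialize(t: type[Any], value: Path, ctx: WorkingDirectory):
        # read the file, then deserialize its contents
        return deserialize(t, value.read_text(), ctx)

    print(deserialize(int, "42", None))  # 42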
A use case is when a sub or method needs to do different things based on the data type of the argument.
Example: turning a data structure into a JSON string, solved with multi dispatch: https://github.com/moritz/json/blob/master/lib/JSON/Tiny.pm#...
Another common useful example is constructors. A Date class might have constructors that accept

    Date(Int, Int where 1..12, Int where 1..31)   # Y, M, D
    Date(Str where /^\d{4}-\d{2}-\d{2}$/)         # YYYY-MM-DD
    Date(DateTime)                                # extract the date of a DateTime object

etc., to make it very intuitive to use.

Also, dependent types, i.e. value-based dispatch, which can be incredibly useful when dealing with enums. That alone is enough to make me curious to try it!
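For reference, a tiny sketch of what that value-based dispatch looks like with ovld's Literal support (the form mentioned elsewhere in the thread; assuming the basic @ovld decorator — this is my own example, not from the docs):

    from typing import Literal
    from ovld import ovld

    @ovld
    def fact(n: Literal[0]):
        return 1                  # dispatched on the value 0, not just the type int

    @ovld
    def fact(n: int):
        return n * fact(n - 1)

    print(fact(5))  # 120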
Value-based dispatch is a much better name for it than Dependent Types. I have seen the term dependent types and just glossed over it because I thought it was a more complex topic.
Yeah, I suppose. I wrote "dependent" because I think that's the term of art for it, but you're making me think I should probably change the header to something more intuitive to people unfamiliar with type theory.
I was just commenting that I had glossed over that in multiple readings about different typing systems. The parent of my original comment explained it nicely.
I know that dependent type is the term of art, and you should probably keep it. You could say something along the lines of
"ovld supports dependent types (an additionally specific name for a type that is based on its value, ie > 0)" the first time you use the term dependent types."
Makes naming functions easier, e.g. multiply(mat3, vec3), multiply(mat3, mat3), etc.
If the language is dynamically typed, you can use it for polymorphism.
Lord, have mercy on the souls of those developers that inherit such overload-infested codebases.
AI has no soul
Why don’t mainstream languages like Python just support multimethods directly?
Probably because typing in Python is optional and kinda bolted-on after the fact, and you really need a good type system to get the most out of multimethods.
In particular, type declarations, as they were first introduced in Python, did not have a run-time effect, and they very much have a run-time effect with multimethods.
(Dataclasses are kind of an exception to this rule, but one that feels far less fundamental than multi dispatch would).
Many of them do, e.g. Java. Why do _some_ mainstream languages like Python not support it? Entirely design preference, usually because they want to put less emphasis on the type system.
Java supports it? Are you conflating method overload (which is statically determined) with multimethods?
Why is it made to look like it’s called by the nickname of a prominent Perl hacker? :D
It also looks similar to multi dispatch in Raku, formerly Perl 6.
See for example https://docs.raku.org/language/functions#Multi-dispatch
Not just a prominent perl hacker, but one who's been working on a major new OO system for the language.
Yea, I know right. Before, I had a joke about Perl like this: Perl is like nineties music, it’s simply the best of all times. :) But a few days ago I made a new one up: Perl is a Schrödinger's programming language. :D
> Perl is like nineties music, it’s simply the best of all times.
You called that a joke, but I think it's just the truth.
I assume this does not play ball with typehints? I don't think a decorator can "output" an @overload?
Or maybe it does: https://github.com/breuleux/ovld/blob/f254dcc6215d50d5f9c3d1...
That's both genius and a massive hack, love it.
For those who are wondering: they alias @typing.overload as @ovld only for type checkers.
Normally, @typing.overload is a way to define multiple function signatures with empty bodies so your type checker can work nicely with functions that, for example, only return a certain type when called with a certain signature. The actual implementation is left to you and usually involves a bunch of runtime type checks to branch your logic the right way.
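For anyone who hasn't seen it, the standard pattern looks roughly like this (plain stdlib typing, nothing ovld-specific; the function name is just an example):

    from typing import overload

    @overload
    def parse(x: int) -> str: ...
    @overload
    def parse(x: str) -> int: ...

    def parse(x):
        # the single real implementation: runtime type checks do the branching
        if isinstance(x, int):
            return str(x)
        return int(x)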
What is the return type of an overloaded function? I don't see that in any of the examples.
I think it would make sense to have each function return either the exact same type, or a generic.
    @ovld(int)  # all of these overloads should return an int
    len(a: list) -> int
    len(a: str) -> int

    # the effective type of len is
    len(a: Union[int, str]) -> int

and then a generic example. In the second example, I don't know how you would connect the input params to the generic T. If you put concrete types in, I see how ovld works. Am I missing something?
It works, ish. Type checkers have a rule that you must have a non-stub, non-decorated overload, so if using ty for example, you need to --ignore invalid-overload. mypy doesn't seem to recognize the hack, although I could've sworn it did? Anyway, it'd be nice if we could mark a decorator as functioning like @overload but without the need for stubs.
how about dispatch for both sync and async?
I did this: https://news.ycombinator.com/item?id=43982570