In most applications, you need to hold the CTRL key while rotating the scrollwheel up/down to do this (simply using the scrollwheel on its own scrolls the document vertically, falling back to horizontal scrolling if no vertical scrolling is possible). This is quite a straightforward operation once discovered, and something you'll probably never pay attention to again once you figure out how it works.
So what, if anything, is the problem with it?
Let's pause for a moment, and do a little experiment:
1. Make sure there's a web browser open on some page where there is some zoomable content (which for older browsers will be a page of text only).
2. Without thinking too much about it, try to zoom in by holding down the CTRL key and moving the scrollwheel. You're only allowed to do this once. Has the view now been zoomed in (i.e. do items appear larger than they were before)?
3. Repeat this process with a few people, preferably those who do not touch computers that often or who haven't had very much experience using computers extensively over long periods of time. Elderly folk and really young kids might be your only options here. How often did they get it right?
Have you done this yet? If not, go out and do it! Come back and read the rest when you have :)
For me, I never really thought much about this until I came across a (UserPref) option in Blender called 'Invert Zoom Direction' (which is one of the settings I always change on any instance of Blender when using it for the first time). What does it do? Exactly as its name implies of course: it switches the mappings for how scrollwheel zoom events are interpreted.
Referring back to the experiment above: before I discovered this option in Blender, I'd frequently find myself not getting the zoom response I expected. Every time I hadn't changed the zoom for a while (probably a few days or weeks between each time that I'd need to do that in some document/spreadsheet or webpage), my initial attempts at zooming would end up zooming out. Seeing this, I'd remember/figure out that I needed to scroll in the opposite direction to actually zoom in (now having to do 2 'clicks' of scrolling to get to the point I originally wanted to get to).
This is why, in step 2 of the experiment, I specified doing this only once: we're interested in the 'open-loop impulse' action of a user, where the action performed is (almost) purely driven by natural instinct. If you considered more than the initial action, you would find 'closed-loop impulses' as well: the user responding to the system's feedback on their initial action, and then correcting/altering their behaviour to align more with the image the system projects. In practical terms, if the user didn't immediately see a zoom in after telling the computer to, they'd naturally reverse the direction of scrolling (since ultimately they want to zoom in). This temporary (and subconscious) change in their model of the system means that, in short-term memory, they now associate zooming in with what their recent experience with the system showed them.
Over time, with repeated/constant enough exposure, this becomes second nature as muscle memory takes over, which is why I recommended testing on people less familiar with computers. Such test subjects are less likely to have been 'tainted' by having already adapted themselves to the system (i.e. built up muscle memory of this action); with subjects who have already adapted, it is not possible to get a true measure of whether the mapping is natural or not. Unfortunately, I believe that it will become increasingly difficult to measure such phenomena in future, as practically everyone will be familiar with computer systems already. This has implications for anyone trying to design any 'newer' GUI paradigms, such as the MS Ribbon (though arguably they shot themselves in the foot before getting out of the starting blocks by botching their implementation in terms of organisation of data).
Now, let us look more at what is going on with this scrollwheel zooming.
Firstly, the 'standard way':
Scrollwheel Up = Zoom In
Scrollwheel Down = Zoom Out
Now, the situation with 'invert zoom':
Scrollwheel Up = Zoom Out
Scrollwheel Down = Zoom In
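For the programmers in the audience, the two mappings can be sketched in a few lines of Python. This is only an illustration, not how Blender (or any other application) actually implements it: the function name, the `invert` flag, the sign convention (positive steps = scrollwheel up, i.e. rolling away from you) and the 1.1x-per-notch ratio are all assumptions made up for the example.

```python
def zoom_after_scroll(zoom, wheel_steps, invert=False, step_ratio=1.1):
    """Return the new zoom factor after rotating the wheel by `wheel_steps` notches.

    Assumed convention (hypothetical, for illustration only):
      wheel_steps > 0  -> scrollwheel up (rolled away from you)
      wheel_steps < 0  -> scrollwheel down (rolled towards you)
    """
    # 'Standard' mapping: up = zoom in. 'Invert zoom': up = zoom out.
    direction = -wheel_steps if invert else wheel_steps
    # Multiplicative steps, so one notch in followed by one notch out cancels.
    return zoom * (step_ratio ** direction)


# One notch of scrollwheel up under each mapping, starting from 1.0x
standard = zoom_after_scroll(1.0, +1)               # larger than 1.0 (zoomed in)
inverted = zoom_after_scroll(1.0, +1, invert=True)  # smaller than 1.0 (zoomed out)
```

Note that the only difference between the two mappings is a single sign flip, which is presumably why it is cheap for an application to offer this as a user preference.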
On screen (and probably on paper), when laid out like this, the 'standard' way looks like the obvious choice. An increase in the zoom factor is controlled by moving the wheel in the upwards direction, and a decrease in the zoom factor is controlled by moving the wheel in the downwards direction. Thus, the wheel here acts like one of those 'spinner' controls (see Figure 1), acting on the zoom-factor setting. Logically, this is going to be intuitive: move the scrollwheel in the direction which changes the zoom factor in the way that you want it to change. What could go wrong with it?
Personally, I believe that this very assumption is flawed. 'Zoom factor' is more of a technical concept (actually, I could probably go as far as saying it's more of a system/implementation detail). People aren't going to stop and think: "If I want to see the facial features of Auntie Ann, I'm going to have to adjust the zoom factor from 1.0x to 5.7x". Instead, a more natural approach would be: "I need to bring this closer so that I can look at it more closely". So, zooming in is more of a "pulling in/closer" operation as opposed to a "let's blow things up" operation.
So, if we're going to be pulling things closer to zoom in, we'll want to grab the subject and move it towards us. This is what the scrollwheel down action nicely corresponds to, as you roll the wheel from a position away from you towards yourself. Conversely, to zoom out, we have to push things away, so we'll want to grab the subject and push it away from us. The scrollwheel up action corresponds to this, as you roll the wheel from a position close to yourself away to the top of the mouse.
Another way to view this is as pulling yourself towards, or pushing yourself away from, a desk while sitting on an office chair and looking at something on the desk. It's quite a natural metaphor.
"Ah...", you say, "but this is just a case of you seeing this in the wrong way". But, don't run away just yet. This is not the only reason, but rather just the tip of the iceberg.
Let's look at this from another perspective: the "Visual Information-Seeking Mantra" by Ben Shneiderman. This can be summed up as "Overview first, zoom and filter, then details on demand", which is very similar to Blender's interaction paradigm of "View, Select, Edit" (and/or the "Model View Controller" and derivative structures used by Software Engineers).
Shneiderman's mantra is especially true for Zooming User Interfaces (ZUIs), more so than for other interface paradigms, as your navigation through a data space is effectively based on going from a rough overview to a closer view of a portion of that artifact, which leads on to even finer details/views of that topic ad infinitum (or as deep as the nested data available for investigation goes).
Sure, it could be said that ZUIs still have other sets of problems to overcome (the "infinitely blank plane" problem, where you don't know if you're zoomed in too far, zoomed out too far, or just looking at a hole where nothing exists), and may be merely a passing fad that never really takes off (as there is very little active research on this, according to Wikipedia at least at the time of writing). But, I do think that ZUIs still offer an interesting paradigm for information visualisation that capitalises on the ways in which people make sense of things by forming abstractions (overview), and from those abstractions being able to identify aspects to investigate in more detail (selection of details for further investigation). This can allow identifying traits within a structured framework which would otherwise have gone unnoticed due to overwhelming complexity being present.
Getting back on topic. If we consider that this is how you'll often work with your documents/etc. under more conventional conditions (which I argue that you probably should, if you're not already), then zooming in will be a more important and often-performed action than zooming out will be. Let's first stop and look at a related operation on the scrollwheel: (vertical) scrolling. Arguably, this is what the scrollwheel was designed for. In this case, the primary action will be scrolling downwards to read more of the document, and that involves performing 'scrollwheel down'.
This is quite a natural mapping (Down = Down), but it also happens to be a very comfortable and easy action to perform. If you don't believe me, try flicking the scrollwheel both ways rapidly for a reasonable period of time (say 1-2 minutes :P), resting adequately before each action. Take note of which action you can do relatively quicker than the other, and also take note of which one starts giving you a sore finger near the end of this exercise.
Unless you're superhuman, flicking away from yourself (scrollwheel up) should get tiring quite quickly, and will not really be that fast to perform. For most people, the muscles on the underside of the hand (used for pulling the finger 'down', as opposed to pushing it away) will be stronger and less prone to fatigue. Thus, scrollwheel up is not such a natural action to perform.
Now, if we connect the dots, we'll see that the standard zooming method requires an unnatural action to be performed for what is ultimately a primary action. If we connect some more dots to what I said about the "pulling" metaphor for zooming-in, we can see that scrollwheel down for zooming in is actually quite a natural and more intuitive mapping than it would logically seem on paper. It's a bit like writing music: despite what music theory often tells you, what looks good on paper may not necessarily sound that brilliant when played out (though sometimes it does still sound ok).
So in conclusion, I think the software pioneers got this one wrong. The better, more natural, and more ergonomically sound method is:
Scrollwheel Up = Zoom Out
Scrollwheel Down = Zoom In
However, because they set this trend, many people have blindly followed it. This has consequently led to cognitive dissonance: many people have now been so brainwashed into performing these actions via muscle memory that getting them to change their ways is not going to be easy (or even possible).