Having no idea what exactly this "significantly higher resolution" means, I will try to comment in very simple terms:
Anything standing in the path of the light reaching the “wells” on the sensor, transparent materials included, causes a loss in resolution as well as in the sensitivity (quantum efficiency) of the sensor as a whole. An IR filter, a Color Filter Array (CFA), an antialiasing (AA) filter, a wave plate or any optical glass means a certain loss of resolution as well as a certain drop in sensitivity (for most of us, the number of photons collected in the “well” in a given time).
The red, green and blue color filters cause up to a 30% drop in the intensity of the light passing through them, which on its own corresponds to roughly half a stop less ISO sensitivity in conventional terms. (Trick here: green sits around the middle of the visible spectrum, so our eyes see sharper in green than they do in the “side” colors like red and blue. That’s why Dr. Bayer employed two greens with one blue and one red in his pattern; never mind luminance or chrominance sensitivities here.)
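For reference, a fractional light loss converts to stops as log2(1 / (1 - loss)). Here is a minimal Python sketch of that conversion; the function names are mine and purely illustrative, and it only does the arithmetic, it says nothing about what any particular filter actually transmits:

```python
import math

def loss_to_stops(loss_fraction):
    """Convert a fractional light loss (0 <= loss < 1) into stops."""
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be in [0, 1)")
    # Each stop halves the light, so stops = log2(light_in / light_out).
    return math.log2(1.0 / (1.0 - loss_fraction))

def stops_to_loss(stops):
    """Convert a loss in stops back into a fractional light loss."""
    return 1.0 - 2.0 ** (-stops)

print(round(loss_to_stops(0.30), 2))  # 0.51, i.e. about half a stop
print(round(stops_to_loss(4), 4))     # 0.9375, i.e. 4 stops is a ~94% loss
```

So a 30% loss is about half a stop, while a full 4-stop loss would mean only about 6% of the light gets through.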
Assuming the usual demosaicing algorithms are used, eliminating the antialiasing filter is generally said to improve the resolving power of a sensor by around 10%. This is the increase in actual resolving power; increases in contrast, crispness, acuity, etc. are different subjects. As you note, using the AA filter can be likened to stopping a lens down to the point where diffraction starts to appear.
Eliminating the CFA, while raising the sensitivity considerably, can also help the resolving power of the sensor reach its “native” value. What we are doing here is really nothing but trying to get back the sensor’s original resolving power and original sensitivity by removing the losses caused by what we placed on it to get color pictures and to stay away from moire.
How far can eliminating the CFA help a sensor’s resolution? It’s usually around 25%, depending on the algorithm used. To get an idea, first check the sample comparisons between the D800 and D800E; they are about 10% apart due to the elimination of the AA filter (a difference some Nikonians admit they cannot see, or can compensate for with the PP sharpness slider alone). With crude calculations one can state that the resolution of the M9M could be equivalent to that of a 28MP color sensor.
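The crude calculation can be sketched as follows, under two assumptions of mine: that the 25% figure is a linear (per-axis) resolution gain, so the equivalent pixel count scales with its square, and that the M9M's native pixel count is 18MP. The function name is hypothetical, just for illustration:

```python
def equivalent_megapixels(native_mp, linear_gain):
    """Pixel count a color sensor would need to match a given linear gain.

    linear_gain is the fractional gain in linear resolution (0.25 = 25%).
    Pixel count grows with the square of linear resolution.
    """
    return native_mp * (1.0 + linear_gain) ** 2

M9M_MP = 18.0  # assumed native pixel count of the M9M, in megapixels

print(round(equivalent_megapixels(M9M_MP, 0.25), 1))  # 28.1
```

That is where the 28MP figure comes from: 18 x 1.25 x 1.25 is about 28.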
The following chart illustrates a typical example of the same sensor’s resolving power, with and without the CFA.