You’re right. I think it's because of the bilinear filter and the higher resolution of the emulated image. As a result, more Y (vertical) pixels of the emulated image (4,7) are mixed into each pixel at the actual resolution.
If the contrast between two pixels is very high, as with text, the effect is more visible.
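To illustrate the mixing, here is a minimal sketch of the vertical half of a bilinear filter: a column of emulated rows resampled to fewer display rows with linear interpolation. It is not the actual shader, and `resample_linear` is a hypothetical helper. A hard black-to-white edge (like the border of a glyph) stays crisp when the ratio is an integer and the edge lines up with the display rows, but with a non-integer ratio a sample lands between a dark and a bright row and comes out gray.

```python
def resample_linear(column, out_len):
    """Resample a 1D column of intensities to out_len samples using linear
    interpolation (the vertical part of a bilinear filter)."""
    in_len = len(column)
    scale = in_len / out_len  # emulated rows per display row
    out = []
    for i in range(out_len):
        # Map the centre of display row i into emulated-row coordinates.
        src = (i + 0.5) * scale - 0.5
        src = min(max(src, 0.0), in_len - 1.0)  # clamp to the edge rows
        lo = int(src)
        hi = min(lo + 1, in_len - 1)
        t = src - lo
        # Blend the two neighbouring emulated rows.
        out.append(column[lo] * (1.0 - t) + column[hi] * t)
    return out

# Integer ratio (8 rows -> 2): the edge stays sharp.
print(resample_linear([0.0] * 4 + [1.0] * 4, 2))  # → [0.0, 1.0]

# Non-integer ratio (10 rows -> 3): the middle sample is about 0.5 (gray).
print(resample_linear([0.0] * 5 + [1.0] * 5, 3))
```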
With my method it is much less noticeable, because I use a smaller emulated resolution (4x) and each pixel of the scanlines has a different opacity.
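A rough sketch of the idea of per-row scanline opacity at an integer 4x factor, under stated assumptions: `apply_scanlines` and the opacity profile `OPACITIES_4X` are hypothetical, and the opacity is modeled as a dark overlay that scales each row by `1 - alpha`. Because the factor is an integer, every source line gets exactly the same profile, and the graded opacities make neighboring emulated rows differ less abruptly than a hard on/off scanline, so filtering produces less visible artifacts.

```python
def apply_scanlines(source_column, opacities):
    """Expand each source line into len(opacities) emulated rows, darkening
    each row by its own scanline opacity (0 = untouched, 1 = black)."""
    rows = []
    for value in source_column:
        for alpha in opacities:
            rows.append(value * (1.0 - alpha))
    return rows

# Hypothetical 4x profile: darker at the top and bottom of each line.
OPACITIES_4X = [0.5, 0.0, 0.0, 0.5]

print(apply_scanlines([1.0, 0.0], OPACITIES_4X))
# → [0.5, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
```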