I definitely agree that as a general principle, having more opportunities to hear the pronunciation is good. After doing some more reviews, I have a couple of concrete examples of instances when it bothered me and why:
(1) I have just answered a reading question correctly and have previously answered the meaning question correctly too. However, in the time since answering the meaning question, I have forgotten the meaning and want to check it again by opening the infobox. In this case, hearing the reading distracts me from my goal of looking up the meaning. Furthermore, because I just answered the reading question correctly, I heard the reading audio a few seconds ago, and hearing it again so soon is repetitive. I think this scenario could be fixed by disabling the infobox autoplay when the reading has already been answered correctly.
(2) I have just answered a vocab reading question incorrectly because I mixed up one of the kanji with another, visually similar kanji. Here, my goal in opening the infobox is to figure out what the correct kanji actually is. Hearing the correct reading before seeing the correct kanji could lead me to associate that reading with the incorrect, visually similar kanji instead of the actual one. This issue is harder to fix, since there isn’t an easy way for WaniKani to tell whether I am opening the infobox for a reason like this or because I know the kanji but forgot the reading.
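To make the fix suggested in (1) concrete, here is a rough sketch of the rule as a single check. This is only an illustration: `ReviewItemState`, `readingAnsweredCorrectly`, and `shouldAutoplayOnInfoboxOpen` are names I made up, not anything from WaniKani's actual code or API.

```typescript
// Hypothetical per-item state during a review session (illustrative names only).
interface ReviewItemState {
  // True once the reading question for this item has been answered correctly.
  readingAnsweredCorrectly: boolean;
}

// Decide whether opening the infobox should autoplay the reading audio.
// autoplayEnabled stands in for the user's autoplay setting (an assumption).
function shouldAutoplayOnInfoboxOpen(
  state: ReviewItemState,
  autoplayEnabled: boolean
): boolean {
  if (!autoplayEnabled) {
    return false;
  }
  // Skip autoplay when the reading is already correct: the audio was
  // just heard, and the infobox is probably being opened for the meaning.
  return !state.readingAnsweredCorrectly;
}
```

Note that this only covers scenario (1). It does nothing for scenario (2), where the reading was answered incorrectly and the check would still allow autoplay.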
I should also mention that my perspective is influenced by the fact that I use the Review Audio Tweak 2 script, which plays the reading audio after both reading and meaning questions (but only if you have already gotten the reading right). So I am already exposed to the reading audio about 50% more than in the default WaniKani experience, and I don’t feel much urgency to hear it even more beyond that.