Not all methods work well for everyone. Shadowing is an absolute nightmare for me. I cannot filter out the sounds of my own voice, so it’s impossible for me to hear the person a beat or two ahead of me. And in a group setting? Forget it. It’s just a wad of unintelligible sounds from start to finish.
What’s worked for me is listening in smaller chunks, repeating each chunk once the model audio has finished, then working towards stringing them all together in one go once I’m comfortable with the chunks. It helps if what I’m attempting to imitate is composed entirely, or mostly, of words I already know. If I’m having to remember a seemingly random sequence of syllables (words I don’t know), that’s brain power taken away from processing the qualities (pitch, timing, etc.) of the sounds.
@Beyond_Sleepy's suggestion of using recordings and repeating as necessary is how I go about it. I also use resources like Tadoku and YomuJP, which have graded readers with (native speaker!) recordings. They make a great resource for both listening and imitation practice, especially since the narrators use a speaking pace appropriate for the level (e.g., N5 material is slower-paced and has clear breaks between verbal/adjectival/etc. phrases).
Also, while it seems weirdly popular in some Japanese learning circles to be dismissive of pitch accent for words, knowing the pitch of words goes a long way towards being more understandable (and helps with recognizing words when listening!). Plus, if you already have an idea of the pitch for the words in the sentence you’re imitating, you can focus on the bigger picture, as well as things like how pitch interacts with particles. I use the WaniKani Pitch Info userscript and listen to the WK-provided sample audio (then repeat it!) several times every time I learn a word. I also have my settings configured so that the sample audio plays every time I get the reading correct on a vocab review question, to make sure I remember the pitch (or one of multiple possible pitches).
Edit: Oh! I almost forgot. I also find it helpful to record myself, then compare my recording with my source material. I think folks might be surprised by what they think they’re doing with their mouths during shadowing vs. what’s actually happening.