Graphical virtual environments are currently far from accessible to blind users as their content is mostly visual. This is especially unfortunate as these environments hold great potential for this population for purposes such as safe orientation, education, and entertainment. Previous tools have increased accessibility but there is still a long way to go. Visual-to-audio Sensory-Substitution-Devices (SSDs) can increase accessibility generically by sonifying on-screen content regardless of the specific environment and offer increased accessibility without the use of expensive dedicated peripherals like electrode/vibrator arrays. Using SSDs virtually utilizes similar skills as when using them in the real world, enabling both training on the device and training on environments virtually before real-world visits. This could enable more complex, standardized and autonomous SSD training and new insights into multisensory interaction and the visually-deprived brain. However, whether congenitally blind users, who have never experienced virtual environments, will be able to use this information for successful perception and interaction within them is currently unclear.We tested this using the EyeMusic SSD, which conveys whole-scene visual information, to perform virtual tasks otherwise impossible without vision. Congenitally blind users had to navigate virtual environments and find doors, differentiate between them based on their features (Experiment1:task1) and surroundings (Experiment1:task2) and walk through them; these tasks were accomplished with a 95% and 97% success rate, respectively. We further explored the reactions of congenitally blind users during their first interaction with a more complex virtual environment than in the previous tasks-walking down a virtual street, recognizing different features of houses and trees, navigating to cross-walks, etc. Users reacted enthusiastically and reported feeling immersed within the environment. They highlighted the potential usefulness of such environments for understanding what visual scenes are supposed to look like and their potential for complex training and suggested many future environments they wished to experience.