PERSPEKTIF GLOBAL TREN DAN PERKEMBANGAN INOVASI PENELITIAN  VIDEO TO MUSIC GENERATION

Ade Bastian; Ardi Mardiana; Muhammad Fahmi Ajiz; Satria Winata

doi:10.31949/infotech.v11i1.13830

Authors

Ade Bastian Universitas Majalengka
Ardi Mardiana Universitas Majalengka
Muhammad Fahmi Ajiz Universitas Majalengka
Satria Winata Universitas Majalengka

DOI:

https://doi.org/10.31949/infotech.v11i1.13830

Abstract

Penelitian ini bertujuan memetakan evolusi generasi musik berbasis AI, khususnya generasi musik dari video. Melalui analisis bibliometrik terhadap 999 publikasi ilmiah (1997-2025), kami menganalisis tren dan struktur konseptual menggunakan VOSviewer. Metode meliputi ekstraksi metadata, konstruksi jaringan ko-kepengarangan, dan identifikasi kluster dominan. Hasil mengungkapkan lima kluster tematik utama: model generatif berbasis teks, generasi musik simbolik, musik video game, integrasi multimedia, dan komposisi otomatis. Studi terbaru menunjukkan pergeseran ke arsitektur generatif multimodal, mengintegrasikan transformer dan model difusi untuk mengatasi tantangan penyelarasan semantik-temporal antara video dan musik. Penelitian mengidentifikasi kesenjangan utama: kelangkaan dataset berpasangan skala besar, kurangnya metrik evaluasi standar, dan terbatasnya sistem generasi real-time. Kebaruan penelitian ini adalah pemetaan bibliometrik pertama yang fokus eksklusif pada generasi musik dari video, memberikan fondasi bagi komunitas akademik dan industri untuk memahami lintasan dan arah masa depan bidang ini.

Keywords:

Artificial Intelligence, AI-Driven Music Generation, Bibliometric Analysis, Generative Models, Video to Music Generation

Downloads

Download data is not yet available.

References

Božić, M., & Horvat, M. (2024). A Survey of Deep Learning Audio Generation Methods. http://arxiv.org/abs/2406.00146

Briot, J.-P., Hadjeres, G., & Pachet, F.-D. (2019). Deep Learning Techniques for Music Generation - A Survey. http://arxiv.org/abs/1709.01620

Dannenberg, R. B., & Neuendorffer, T. (2003). Sound Synthesis from Real-Time Video Images.

DI, S., Jiang, Z., Liu, S., Wang, Z., Zhu, L., He, Z., Liu, H., & Yan, S. (2021). Video Background Music Generation with Controllable Music Transformer. MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia, 2037–2045. https://doi.org/10.1145/3474085.3475195

Donthu, N., Kumar, S., Mukherjee, D., Pandey, N., & Lim, W. M. (2021). How to conduct a bibliometric analysis: An overview and guidelines. Journal of Business Research, 133, 285–296. https://doi.org/10.1016/j.jbusres.2021.04.070

Gan, C., Huang, D., Chen, P., Tenenbaum, J. B., & Torralba, A. (2020). Foley Music: Learning to Generate Music from Videos. http://arxiv.org/abs/2007.10984

Gu, X., Shen, Y., & Lv, C. (2023). A Dual-Path Cross-Modal Network for Video-Music Retrieval. Sensors, 23(2). https://doi.org/10.3390/s23020805

Ji, S., Wu, S., Wang, Z., Li, S., & Zhang, K. (2025). A Comprehensive Survey on Generative AI for Video-to-Music Generation.

Jiang, Y.-G., Wu, Z., Tang, J., Li, Z., Xue, X., & Chang, S.-F. (2017). Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification. http://arxiv.org/abs/1706.04508

Kang, J., Poria, S., & Herremans, D. (2024). Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model. https://doi.org/10.1016/j.eswa.2024.123640

Li, R., Zheng, S., Cheng, X., Zhang, Z., Ji, S., & Zhao, Z. (2024). MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization. http://arxiv.org/abs/2410.12957

Li, S., Qin, Y., Zheng, M., Jin, X., & Liu, Y. (2024). Diff-BGM: A Diffusion Model for Video Background Music Generation. http://arxiv.org/abs/2405.11913

Li, S., Yang, B., Yin, C., Sun, C., Zhang, Y., Dong, W., & Li, C. (2024). VidMusician: Video-to-Music Generation with Semantic-Rhythmic Alignment via Hierarchical Visual Features. http://arxiv.org/abs/2412.06296

Lin, J.-C., Wei, W.-L., & Wang, H.-M. (2016). Automatic Music Video Generation Based on Emotion- Oriented Pseudo Song Prediction and Matching. MM 2016 - Proceedings of the 2016 ACM Multimedia Conference, 372–376. https://doi.org/10.1145/2964284.2967245

Lin, Y.-B., Tian, Y., Yang, L., Bertasius, G., & Wang, H. (2024). VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos. http://arxiv.org/abs/2409.07450

Liu, X., Tu, T., Ma, Y., & Chua, T.-S. (2025). Extending Visual Dynamics for Video-to-Music Generation. IEEE International Conference on Program Comprehension, 2022-March, 36–47. https://doi.org/10.1145/nnnnnnn.nnnnnnn

Mao, Z., Zhao, M., Wu, Q., Zhong, Z., Liao, W.-H., Wakaki, H., & Mitsufuji, Y. (2025). Cross-Modal Learning for Music-to-Music-Video Description Generation. http://arxiv.org/abs/2503.11190

Mehri, S., Kumar, K., Gulrajani, I., Kumar, R., Jain, S., Sotelo, J., Courville, A., & Bengio, Y. (2017). SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. http://arxiv.org/abs/1612.07837

Prétet, L. (2022). Metric learning for video to music recommendation. https://theses.hal.science/tel-03638477v1

Rai, A., & Sridhar, S. (2024). EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos. http://arxiv.org/abs/2407.20592

Stewart, S., KV, G., Lu, L., & Fanelli, A. (2024). Semi-Supervised Contrastive Learning for Controllable Video-to-Music Retrieval. http://arxiv.org/abs/2412.05831

Su, K., Li, J. Y., Huang, Q., Kuzmin, D., Lee, J., Donahue, C., Fei, S., Jansen, A., Wang, Y., Verzetti, M., & Denk, T. (2024). V2Meow: Meowing to the Visual Beat via Video-to-Music Generation. www.aaai.org

Suríš, D., Vondrick, C., Russell, B., Research, A., & Salamon, J. (2022). It’s Time for Artistic Correspondence in Music and Video.

Tian, Z., Liu, Z., Yuan, R., Pan, J., Liu, Q., Tan, X., Chen, Q., Xue, W., & Guo, Y. (2024). VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling. http://arxiv.org/abs/2406.04321

Wang, Z., Bao, C., Zhuo, L., Han, J., Yue, Y., Tang, Y., Huang, V. S.-J., & Liao, Y. (2025). Vision-to-Music Generation: A Survey. http://arxiv.org/abs/2503.21254

Yu, J., Wang, Y., Chen, X., Sun, X., & Qiao, Y. (2023). Long-Term Rhythmic Video Soundtracker. http://arxiv.org/abs/2305.01319

Zhang, L., & Fuentes, M. (2024). SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data. http://arxiv.org/abs/2410.03879

Zhou, Y., Wang, Z., Fang, C., Bui, T., & Berg, T. L. (2018). Visual to Sound: Generating Natural Sound for Videos in the Wild. http://arxiv.org/abs/1712.01393

Zhuo, L., Wang, Z., Wang, B., Liao, Y., Bao, C., Peng, S., Han, S., Zhang, A., Fang, F., & Liu, S. (2023). Video Background Music Generation: Dataset, Method and Evaluation. http://arxiv.org/abs/2211.11248

Zuo, H., You, W., Wu, J., Ren, S., Chen, P., Zhou, M., Lu, Y., & Sun, L. (2025). GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions. http://arxiv.org/abs/2501.09972

PERSPEKTIF GLOBAL TREN DAN PERKEMBANGAN INOVASI PENELITIAN VIDEO TO MUSIC GENERATION

Authors

DOI:

Abstract

Keywords:

Downloads

References

Downloads

Published

How to Cite

Issue

Section

License

Most read articles by the same author(s)

Similar Articles

FOR AUTHOR

JOURNAL INFORMATION

INDEXING

VISITOR