Challenges and Opportunities in Open Generative Models
IRB 4105 or https://umd.zoom.us/s/2203990027
Open-source generative models like OpenGPT2, BLOOM, and others have been pivotal in advancing AI technology. These models leverage extensive text data to achieve advanced linguistic capabilities. However, the trend toward proprietary tools and closed large language models is growing, posing unique challenges for open-source AI development. This talk will explore the challenges and opportunities in training these foundation models, the hurdles in dataset governance, and downstream AI-for-science applications. We will also discuss algorithmic improvements for training these models, such as discrete diffusion, learned adaptive noise schedules, and modern GAN baselines, and cover the challenges generative models face across several modalities: text, images, and biological sequence data.