Curriculum Vitae
I am a Ph.D. candidate focusing on generative modeling. I have experience building Generative Adversarial Networks (GANs), Diffusion Models, Normalizing Flows, LLMs, ViTs, and other models, with applications in generation, classification, segmentation, and detection. My particular focus is on explicit density models, i.e. invertible generative models. I am deeply interested in data efficiency: making models utilize data more efficiently, model distillation, and generative data. I have experience training large models from scratch, working in high performance computing (HPC), multi-node training, and writing CUDA kernels. I have a strong background in mathematics and apply that knowledge to improving machine learning models.
I am available for full-time opportunities and collaboration. Please reach out if you are interested.
Education
University of Oregon
Ph.D. in Computer Science
Eugene, OR
Present
University of Oregon
M.A. in Computer Science
Eugene, OR
June 2023
Embry-Riddle Aeronautical University
B.S. in Space Physics
Prescott, AZ
Dec 2014
Experience
Graduate Researcher
Sep 2018 - Present
⚬ Researched and developed state-of-the-art (SOTA) Generative Models, including Generative Adversarial Networks (GANs), Vision Transformers (ViTs), Diffusion Models, and Normalizing Flows.
⚬ Performed research that led to SOTA generative models, increasing performance metrics while reducing the number of model parameters and training costs.
⚬ Developed novel attention mechanisms to efficiently incorporate local and global information, which led to reduced computational burdens while increasing performance.
⚬ Developed small and efficient ViTs that could outperform models with nearly 100x more parameters on both vision and language (NLP) tasks.
⚬ Innovated new analysis techniques for understanding model behavior and training dynamics, leading to more robust and computationally efficient AI models.
Metropolis Intern
Sep 2023 - Mar 2024
⚬ Significantly improved the generalizability of ReIdentificationNet by implementing advanced training techniques and architectural modifications, enhancing performance on customer data.
⚬ Led analysis of Identification models to uncover low performance regions and utilized this information to vastly improve robustness for both small and large ReIdentification models.
⚬ Conducted profile analysis of models and implemented optimizations to reduce computational requirements for customers, significantly increasing model throughput.
⚬ Optimized deep learning models through pruning and quantization, improving efficiency and reducing inference latency while adapting them for seamless deployment via TensorRT compilation.
⚬ Developed models for synthetic data generation, enhancing model robustness and generalization, through the use of both GANs and Diffusion Models.
⚬ Developed tooling to improve team productivity while training models and onboarding new machines, leveraging my experience in Linux, Bash Scripting, and familiarity with HPC environments.
Research Intern
Jun 2021 - Nov 2022
⚬ Researched and developed methods for style-based transfer of text, enhancing creativity for the application's commercial use.
⚬ Developed advanced distillation techniques for optimizing model inference without sacrificing performance.
⚬ Investigated the integration of generative models into existing software pipelines, extending the capabilities of Picsart's early AI-driven tools.
⚬ Administered compute infrastructure by developing and implementing tooling such as SLURM scripts, and established operational standards to ensure reliability and efficient machine utilization.
⚬ Collaborated in procurement discussions for new machine acquisitions, leveraging my HPC experience to assess technical requirements for the business's newly founded AI division, directly engaging with vendors.
Lawrence Livermore National Laboratory
Computational Scholar
Jun 2020 - Sep 2020
⚬ Developed machine learning software to analyze noisy X-ray images, contributing to projects focused on identifying geometry and material composition of varying objects.
⚬ Collaborated closely with imaging scientists (the customer), utilizing my physics background to enhance model accuracy, interpret complex data, and develop robust analysis frameworks.
⚬ Engaged in cross-functional meetings and discussions, developing suitable proxy experiments to ensure mission objectives were achieved despite the sensitive nature of the project.
Lawrence Livermore National Laboratory
Computational Scholar
Jun 2019 - Sep 2019
⚬ Investigated the integration of machine learning techniques into the ALPINE Ascent HPC software suite, optimizing in situ data processing for AI-driven data interpolation.
⚬ Researched solutions to overcome HPC data constraints within highly parallel supercomputing environments.
⚬ Provided critical input in technical presentations, leading to subsequent research opportunities and funding within the lab.
ASTRO Intern
Jun 2018 - Aug 2018
⚬ Integrated ADIOS2's data management framework into ALPINE Ascent HPC software, enabling efficient data handling in computational environments.
⚬ Integrated streaming into ALPINE Ascent, enabling users to perform visualizations and analysis tasks in situ, reducing locking operations in highly parallel environments and enabling higher machine utilization.
⚬ Research and development led to the 'Visualization as a Service' (VaaS) paradigm, allowing visualization and analysis to be performed in real time and off-node over arbitrary network connections (e.g. InfiniBand, Ethernet, wireless; over LAN or WAN), extending the scope and flexibility of data processing.
Engineer and Lead Scientist
Jun 2015 - May 2018
⚬ Secured and led a NASA STTR Phase I contract (valued at $150k) that resulted in Phase II funding (valued at $750k), providing essential funding for the company's growth.
⚬ Developed cutting-edge radiation shielding methods capable of generating auxiliary power for satellite operations, minimizing mass expenditures and extending mission capabilities.
⚬ Developed AI optimization algorithms leading to the innovation of novel material and geometric designs for energy producing radiation shielding.
⚬ Spearheaded the conversion of acoustic dynamic simulation models, achieving over a 100x reduction in computation time and enabling more accurate, complex simulations to be run faster and at lower cost.
⚬ Built and designed computational and physical testing frameworks, including HPC clusters, to facilitate computation critical for day-to-day operations.
⚬ Built open source libraries and documentation necessary for management of scientific data.
Publications
Efficient Image Generation with Variadic Attention Heads (StyleNAT)
Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, Humphrey Shi
Bibtex
@InProceedings{WaltonStyleNAT2025CVPR, author = {Walton, Steven and Hassani, Ali and Xu, Xingqian and Wang, Zhangyang and Shi, Humphrey}, title = {Efficient Image Generation with Variadic Attention Heads}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month = {June}, year = {2025}, } % Previously: @misc{walton2023stylenatgivingheadnew, title={StyleNAT: Giving Each Head a New Perspective}, author={Steven Walton and Ali Hassani and Xingqian Xu and Zhangyang Wang and Humphrey Shi}, year={2023}, eprint={2211.05770}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2211.05770}, }
Distilling Normalizing Flows
Steven Walton, Valeriy Klyukin, Maksim Artemev, Denis Derkach, Nikita Orlov, Humphrey Shi
Bibtex
@InProceedings{WaltonDNF2025CVPR, author = {Walton, Steven and Klyukin, Valeriy and Artemev, Maksim and Derkach, Denis and Orlov, Nikita and Shi, Humphrey}, title = {Distilling Normalizing Flows}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month = {June}, year = {2025}, }
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
Jonathan Roberts, Mohammad Reza Taesiri, Ansh Sharma, Akash Gupta, Samuel Roberts, Ioana Croitoru, Simion-Vlad Bogolin, Jialu Tang, Florian Langer, Vyas Raina, Vatsal Raina, Hanyi Xiong, Vishaal Udandarao, Jingyi Lu, Shiyang Chen, Sam Purkis, Tianshuo Yan, Wenye Lin, Gyungin Shin, Qiaochu Yang, Anh Totti Nguyen, David I. Atkinson, Aaditya Baranwal, Alexandru Coca, Mikah Dang, Sebastian Dziadzio, Jakob D. Kunz, Kaiqu Liang, Alexander Lo, Brian Pulfer, Steven Walton, Charig Yang, Kai Han, Samuel Albanie
Bibtex
@misc{roberts2025zerobenchimpossiblevisualbenchmark, title={ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models}, author={Jonathan Roberts and Mohammad Reza Taesiri and Ansh Sharma and Akash Gupta and Samuel Roberts and Ioana Croitoru and Simion-Vlad Bogolin and Jialu Tang and Florian Langer and Vyas Raina and Vatsal Raina and Hanyi Xiong and Vishaal Udandarao and Jingyi Lu and Shiyang Chen and Sam Purkis and Tianshuo Yan and Wenye Lin and Gyungin Shin and Qiaochu Yang and Anh Totti Nguyen and David I. Atkinson and Aaditya Baranwal and Alexandru Coca and Mikah Dang and Sebastian Dziadzio and Jakob D. Kunz and Kaiqu Liang and Alexander Lo and Brian Pulfer and Steven Walton and Charig Yang and Kai Han and Samuel Albanie}, year={2025}, eprint={2502.09696}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2502.09696}, }
Design Amortization for Bayesian Optimal Experimental Design
Noble Kennamer, Steven Walton, Alexander Ihler
Bibtex
@article{Kennamer_Walton_Ihler_2023, title={Design Amortization for Bayesian Optimal Experimental Design}, volume={37}, url={https://ojs.aaai.org/index.php/AAAI/article/view/25992}, DOI={10.1609/aaai.v37i7.25992}, abstractNote={Bayesian optimal experimental design is a sub-field of statistics focused on developing methods to make efficient use of experimental resources. Any potential design is evaluated in terms of a utility function, such as the (theoretically well-justified) expected information gain (EIG); unfortunately however, under most circumstances the EIG is intractable to evaluate. In this work we build off of successful variational approaches, which optimize a parameterized variational model with respect to bounds on the EIG. Past work focused on learning a new variational model from scratch for each new design considered. Here we present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs. To further improve computational efficiency, we also propose to train the variational model on a significantly cheaper-to-evaluate lower bound, and show empirically that the resulting model provides an excellent guide for more accurate, but expensive to evaluate bounds on the EIG. We demonstrate the effectiveness of our technique on generalized linear models, a class of statistical models that is widely used in the analysis of controlled experiments. Experiments show that our method is able to greatly improve accuracy over existing approximation strategies, and achieve these results with far better sample efficiency.}, number={7}, journal={Proceedings of the AAAI Conference on Artificial Intelligence}, author={Kennamer, Noble and Walton, Steven and Ihler, Alexander}, year={2023}, month={Jun.}, pages={8220-8227} }
Neighborhood Attention Transformer
Ali Hassani, Steven Walton, Jiachen Li, Shen Li, Humphrey Shi
Bibtex
@InProceedings{Hassani_2023_CVPR, author = {Hassani, Ali and Walton, Steven and Li, Jiachen and Li, Shen and Shi, Humphrey}, title = {Neighborhood Attention Transformer}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, pages = {6185-6194} }
Isomorphism, Normalizing Flows, and Density Estimation: Preserving Relationships Between Data
Steven Walton
Bibtex
@article{walton2023isomorphism, title={Isomorphism, Normalizing Flows, and Density Estimation: Preserving Relationships Between Data}, author={Walton, Steven} }
SeMask: Semantically Masked Transformers for Semantic Segmentation
Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi
Bibtex
@InProceedings{Jain_2023_ICCV, author = {Jain, Jitesh and Singh, Anukriti and Orlov, Nikita and Huang, Zilong and Li, Jiachen and Walton, Steven and Shi, Humphrey}, title = {SeMask: Semantically Masked Transformers for Semantic Segmentation}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops}, month = {October}, year = {2023}, pages = {752-761} }
ConvMLP: Hierarchical Convolutional MLPs for Vision
Jiachen Li, Ali Hassani, Steven Walton, Humphrey Shi
Bibtex
@InProceedings{Li_2023_CVPR, author = {Li, Jiachen and Hassani, Ali and Walton, Steven and Shi, Humphrey}, title = {ConvMLP: Hierarchical Convolutional MLPs for Vision}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month = {June}, year = {2023}, pages = {6307-6316} }
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani, Steven Walton, Nikhil Shah, Abulikemu Abuduweili, Jiachen Li, Humphrey Shi
Bibtex
@misc{hassani2022escapingbigdataparadigm, title={Escaping the Big Data Paradigm with Compact Transformers}, author={Ali Hassani and Steven Walton and Nikhil Shah and Abulikemu Abuduweili and Jiachen Li and Humphrey Shi}, year={2022}, eprint={2104.05704}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2104.05704}, }
Visualization as a Service for Scientific Data
David Pugmire, James Kress, Jieyang Chen, Hank Childs, Jong Choi, Dmitry Ganyushin, Berk Geveci, Mark Kim, Scott Klasky, Xin Liang, Jeremy Logan, Nicole Marsaglia, Kshitij Mehta, Norbert Podhorszki, Caitlin Ross, Eric Suchyta, Nick Thompson, Steven Walton, Lipeng Wan, Matthew Wolf
Bibtex
@InProceedings{pugmire_vaas, author="Pugmire, David and Kress, James and Chen, Jieyang and Childs, Hank and Choi, Jong and Ganyushin, Dmitry and Geveci, Berk and Kim, Mark and Klasky, Scott and Liang, Xin and Logan, Jeremy and Marsaglia, Nicole and Mehta, Kshitij and Podhorszki, Norbert and Ross, Caitlin and Suchyta, Eric and Thompson, Nick and Walton, Steven and Wan, Lipeng and Wolf, Matthew", editor="Nichols, Jeffrey and Verastegui, Becky and Maccabe, Arthur `Barney' and Hernandez, Oscar and Parete-Koon, Suzanne and Ahearn, Theresa", title="Visualization as a Service for Scientific Data", booktitle="Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI", year="2020", publisher="Springer International Publishing", address="Cham", pages="157--174", isbn="978-3-030-63393-6" }
DATUM: Dotted Attention Temporal Upscaling Method
Steven Walton
Bibtex
@article{waltondatum, title={DATUM: Dotted Attention Temporal Upscaling Method}, author={Walton, Steven} }
Teaching/TA
Modeling and Simulation
Course: CS 445/545
Winter 2024
Course: CS 451/551
Fall 2024
Course: CS 472/572
Spring 2024
Course: CS 472/572
Winter 2023
Course: CS 472/572
Winter 2022
Course: CS 413/513
Winter 2021
Course: CS 314
Fall 2020
Course: CS 322
Fall 2018