Curriculum Vitae
I am a Ph.D. candidate focusing on generative modeling. I have experience building Generative Adversarial Networks (GANs), Diffusion Models, Normalizing Flows, LLMs, ViTs, and other models, with applications in generation, classification, segmentation, and detection. My particular focus is on explicit density models, i.e. invertible generative models. I am deeply interested in data efficiency: making models utilize data more efficiently, model distillation, and generative data. I have experience training large models from scratch, working in high performance computing (HPC), multi-node training, and writing CUDA kernels. I have a strong background in mathematics and apply that knowledge to improving machine learning models.
I am available for full-time opportunities and collaboration. Please reach out if you are interested.
Education
University of Oregon
Ph.D. in Computer Science
Eugene, OR
Present
University of Oregon
M.A. in Computer Science
Eugene, OR
June 2023
Embry-Riddle Aeronautical University
B.S. in Space Physics
Prescott, AZ
Dec 2014
Experience
Graduate Researcher
Sep 2018 - Present
⚬ Researched and developed state-of-the-art (SOTA) Generative Models, including Generative Adversarial Networks (GANs), Vision Transformers (ViTs), Diffusion Models, and Normalizing Flows.
⚬ Performed research that led to SOTA generative models, increasing performance metrics while reducing the number of model parameters and training costs.
⚬ Developed novel attention mechanisms to efficiently incorporate local and global information, which led to reduced computational burdens while increasing performance.
⚬ Developed small and efficient ViTs that could outperform models with nearly 100x more parameters on both vision and language (NLP) tasks.
⚬ Innovated new analysis techniques for understanding model behavior and training dynamics, leading to more robust and computationally efficient AI models.
Metropolis Intern
Sep 2023 - Mar 2024
⚬ Significantly improved the generalizability of ReIdentificationNet by implementing advanced training techniques and architectural modifications, enhancing performance on customer data.
⚬ Led analysis of Identification models to uncover low performance regions and utilized this information to vastly improve robustness for both small and large ReIdentification models.
⚬ Conducted profile analysis of models and implemented optimizations to reduce computational requirements for customers, significantly increasing model throughput.
⚬ Optimized deep learning models through pruning and quantization, improving efficiency and reducing inference latency while adapting them for seamless deployment via TensorRT compilation.
⚬ Developed models for synthetic data generation, enhancing model robustness and generalization, through the use of both GANs and Diffusion Models.
⚬ Developed tooling to improve team productivity while training models and onboarding new machines, leveraging my experience in Linux, Bash Scripting, and familiarity with HPC environments.
Research Intern
Jun 2021 - Nov 2022
⚬ Researched and developed methods for style-based transfer of text, enhancing creativity for the application's commercial use.
⚬ Developed advanced distillation techniques for optimizing model inference without sacrificing performance.
⚬ Investigated the integration of generative models into existing software pipelines, extending the capabilities of Picsart's early AI-driven tools.
⚬ Administered compute infrastructure by developing and implementing tooling such as SLURM scripts, and established operational standards to ensure reliability and efficient machine utilization.
⚬ Collaborated in procurement discussions for new machine acquisitions, leveraging my HPC experience to assess technical requirements for the business's newly founded AI division, directly engaging with vendors.
Lawrence Livermore National Laboratory
Computational Scholar
Jun 2020 - Sep 2020
⚬ Developed machine learning software to analyze noisy X-ray images, contributing to projects focused on identifying geometry and material composition of varying objects.
⚬ Collaborated closely with imaging scientists (the customer), utilizing my physics background to enhance model accuracy, interpret complex data, and develop robust analysis frameworks.
⚬ Engaged in cross-functional meetings and discussions, developing suitable proxy experiments to ensure mission objectives were achieved despite the sensitive nature of the project.
Lawrence Livermore National Laboratory
Computational Scholar
Jun 2019 - Sep 2019
⚬ Investigated the integration of machine learning techniques into the ALPINE Ascent HPC software suite, optimizing in situ data processing for AI-driven data interpolation.
⚬ Researched solutions to overcome HPC data constraints within highly parallel supercomputing environments.
⚬ Provided critical input in technical presentations, leading to subsequent research opportunities and funding within the lab.
ASTRO Intern
Jun 2018 - Aug 2018
⚬ Integrated ADIOS2's data management framework into ALPINE Ascent HPC software, enabling efficient data handling in computational environments.
⚬ Integrated streaming into ALPINE Ascent, enabling users to perform visualizations and analysis tasks in situ, reducing locking operations in highly parallel environments and enabling higher machine utilization.
⚬ Research and development led to the 'Visualization as a Service' (VaaS) paradigm, allowing visualization and analysis to be performed in real time and off-node over arbitrary network connections (e.g. InfiniBand, Ethernet, wireless; over LAN or WAN), extending the scope and flexibility of data processing.
Engineer and Lead Scientist
Jun 2015 - May 2018
⚬ Secured and led a NASA STTR Phase I contract (valued at $150k) that resulted in Phase II funding (valued at $750k), providing essential funding for the company's growth.
⚬ Developed cutting-edge radiation shielding methods capable of generating auxiliary power for satellite operations, minimizing mass expenditures and extending mission capabilities.
⚬ Developed AI optimization algorithms leading to the innovation of novel material and geometric designs for energy producing radiation shielding.
⚬ Spearheaded the conversion of acoustic dynamic simulation models, achieving over a 100x reduction in computation time and enabling more accurate, complex simulations to be run faster and at lower cost.
⚬ Built and designed computational and physical testing frameworks, including HPC clusters, to facilitate computation critical for day-to-day operations.
⚬ Built open source libraries and documentation necessary for management of scientific data.
Publications
Efficient Image Generation with Variadic Attention Heads (StyleNAT)
Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, Humphrey Shi
Bibtex
@InProceedings{WaltonStyleNAT2025CVPR, author = {Walton, Steven and Hassani, Ali and Xu, Xingqian and Wang, Zhangyang and Shi, Humphrey}, title = {Efficient Image Generation with Variadic Attention Heads}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month = {June}, year = {2025}, } % Previously: @misc{walton2023stylenatgivingheadnew, title={StyleNAT: Giving Each Head a New Perspective}, author={Steven Walton and Ali Hassani and Xingqian Xu and Zhangyang Wang and Humphrey Shi}, year={2023}, eprint={2211.05770}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2211.05770}, }
Distilling Normalizing Flows
Steven Walton, Valeriy Klyukin, Maksim Artemev, Denis Derkach, Nikita Orlov, Humphrey Shi
Bibtex
@InProceedings{WaltonDNF2025CVPR, author = {Walton, Steven and Klyukin, Valeriy and Artemev, Maksim and Derkach, Denis and Orlov, Nikita and Shi, Humphrey}, title = {Distilling Normalizing Flows}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month = {June}, year = {2025}, }
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
Jonathan Roberts, Mohammad Reza Taesiri, Ansh Sharma, Akash Gupta, Samuel Roberts, Ioana Croitoru, Simion-Vlad Bogolin, Jialu Tang, Florian Langer, Vyas Raina, Vatsal Raina, Hanyi Xiong, Vishaal Udandarao, Jingyi Lu, Shiyang Chen, Sam Purkis, Tianshuo Yan, Wenye Lin, Gyungin Shin, Qiaochu Yang, Anh Totti Nguyen, David I. Atkinson, Aaditya Baranwal, Alexandru Coca, Mikah Dang, Sebastian Dziadzio, Jakob D. Kunz, Kaiqu Liang, Alexander Lo, Brian Pulfer, Steven Walton, Charig Yang, Kai Han, Samuel Albanie
Bibtex
@misc{roberts2025zerobenchimpossiblevisualbenchmark, title={ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models}, author={Jonathan Roberts and Mohammad Reza Taesiri and Ansh Sharma and Akash Gupta and Samuel Roberts and Ioana Croitoru and Simion-Vlad Bogolin and Jialu Tang and Florian Langer and Vyas Raina and Vatsal Raina and Hanyi Xiong and Vishaal Udandarao and Jingyi Lu and Shiyang Chen and Sam Purkis and Tianshuo Yan and Wenye Lin and Gyungin Shin and Qiaochu Yang and Anh Totti Nguyen and David I. Atkinson and Aaditya Baranwal and Alexandru Coca and Mikah Dang and Sebastian Dziadzio and Jakob D. Kunz and Kaiqu Liang and Alexander Lo and Brian Pulfer and Steven Walton and Charig Yang and Kai Han and Samuel Albanie}, year={2025}, eprint={2502.09696}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2502.09696}, }
Design Amortization for Bayesian Optimal Experimental Design
Noble Kennamer, Steven Walton, Alexander Ihler
Bibtex
@article{Kennamer_Walton_Ihler_2023, title={Design Amortization for Bayesian Optimal Experimental Design}, volume={37}, url={https://ojs.aaai.org/index.php/AAAI/article/view/25992}, DOI={10.1609/aaai.v37i7.25992}, abstractNote={Bayesian optimal experimental design is a sub-field of statistics focused on developing methods to make efficient use of experimental resources. Any potential design is evaluated in terms of a utility function, such as the (theoretically well-justified) expected information gain (EIG); unfortunately however, under most circumstances the EIG is intractable to evaluate. In this work we build off of successful variational approaches, which optimize a parameterized variational model with respect to bounds on the EIG. Past work focused on learning a new variational model from scratch for each new design considered. Here we present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs. To further improve computational efficiency, we also propose to train the variational model on a significantly cheaper-to-evaluate lower bound, and show empirically that the resulting model provides an excellent guide for more accurate, but expensive to evaluate bounds on the EIG. We demonstrate the effectiveness of our technique on generalized linear models, a class of statistical models that is widely used in the analysis of controlled experiments. Experiments show that our method is able to greatly improve accuracy over existing approximation strategies, and achieve these results with far better sample efficiency.}, number={7}, journal={Proceedings of the AAAI Conference on Artificial Intelligence}, author={Kennamer, Noble and Walton, Steven and Ihler, Alexander}, year={2023}, month={Jun.}, pages={8220-8227} }
Neighborhood Attention Transformer
Ali Hassani, Steven Walton, Jiachen Li, Shen Li, Humphrey Shi
Bibtex
@InProceedings{Hassani_2023_CVPR, author = {Hassani, Ali and Walton, Steven and Li, Jiachen and Li, Shen and Shi, Humphrey}, title = {Neighborhood Attention Transformer}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, pages = {6185-6194} }
Isomorphism, Normalizing Flows, and Density Estimation: Preserving Relationships Between Data
Steven Walton
Bibtex
@article{walton2023isomorphism, title={Isomorphism, Normalizing Flows, and Density Estimation: Preserving Relationships Between Data}, author={Walton, Steven} }
SeMask: Semantically Masked Transformers for Semantic Segmentation
Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi
Bibtex
@InProceedings{Jain_2023_ICCV, author = {Jain, Jitesh and Singh, Anukriti and Orlov, Nikita and Huang, Zilong and Li, Jiachen and Walton, Steven and Shi, Humphrey}, title = {SeMask: Semantically Masked Transformers for Semantic Segmentation}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops}, month = {October}, year = {2023}, pages = {752-761} }
ConvMLP: Hierarchical Convolutional MLPs for Vision
Jiachen Li, Ali Hassani, Steven Walton, Humphrey Shi
Bibtex
@InProceedings{Li_2023_CVPR, author = {Li, Jiachen and Hassani, Ali and Walton, Steven and Shi, Humphrey}, title = {ConvMLP: Hierarchical Convolutional MLPs for Vision}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month = {June}, year = {2023}, pages = {6307-6316} }
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani, Steven Walton, Nikhil Shah, Abulikemu Abuduweili, Jiachen Li, Humphrey Shi
Bibtex
@misc{hassani2022escapingbigdataparadigm, title={Escaping the Big Data Paradigm with Compact Transformers}, author={Ali Hassani and Steven Walton and Nikhil Shah and Abulikemu Abuduweili and Jiachen Li and Humphrey Shi}, year={2022}, eprint={2104.05704}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2104.05704}, }
Visualization as a Service for Scientific Data
David Pugmire, James Kress, Jieyang Chen, Hank Childs, Jong Choi, Dmitry Ganyushin, Berk Geveci, Mark Kim, Scott Klasky, Xin Liang, Jeremy Logan, Nicole Marsaglia, Kshitij Mehta, Norbert Podhorszki, Caitlin Ross, Eric Suchyta, Nick Thompson, Steven Walton, Lipeng Wan, Matthew Wolf
Bibtex
@InProceedings{pugmire_vaas, author="Pugmire, David and Kress, James and Chen, Jieyang and Childs, Hank and Choi, Jong and Ganyushin, Dmitry and Geveci, Berk and Kim, Mark and Klasky, Scott and Liang, Xin and Logan, Jeremy and Marsaglia, Nicole and Mehta, Kshitij and Podhorszki, Norbert and Ross, Caitlin and Suchyta, Eric and Thompson, Nick and Walton, Steven and Wan, Lipeng and Wolf, Matthew", editor="Nichols, Jeffrey and Verastegui, Becky and Maccabe, Arthur `Barney' and Hernandez, Oscar and Parete-Koon, Suzanne and Ahearn, Theresa", title="Visualization as a Service for Scientific Data", booktitle="Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI", year="2020", publisher="Springer International Publishing", address="Cham", pages="157--174", isbn="978-3-030-63393-6" }
DATUM: Dotted Attention Temporal Upscaling Method
Steven Walton
Bibtex
@article{waltondatum, title={DATUM: Dotted Attention Temporal Upscaling Method}, author={Walton, Steven} }
Teaching/TA
Modeling and Simulation
Course: CS 445/545
Winter 2024
Course: CS 451/551
Fall 2024
Course: CS 472/572
Spring 2024
Course: CS 472/572
Winter 2023
Course: CS 472/572
Winter 2022
Course: CS 413/513
Winter 2021
Course: CS 314
Fall 2020
Course: CS 322
Fall 2018