I currently work at Lawrence Livermore National Laboratory (LLNL) as a Computer Scientist and workflow expert. My work focuses on HPC simulation, simulation management, software engineering, and software workflow automation. My primary roles are:
- Project Lead and Author of the Maestro Workflow Conductor: Maestro is a Python command-line tool and library for specifying, automating, and monitoring software workflows. Users write YAML specifications called “study descriptions” that define a software workflow, which Maestro then expands into concrete steps based on static variables and parameters. See our README for how to build a basic study, or browse our sample studies.
  - Maestro GitHub page
  - Maestro on PyPI
  - Maestro on Read the Docs
  - Maestro is part of the RADIUSS effort at LLNL
- Workflow Expert and Researcher for the RAS protein pilot project (one of three pilots of JDACS4C). Our mission is to scale molecular dynamics (MD) simulations to timescales long enough to better understand the RAS protein and its role in the development of cancer. Our team has successfully run large-scale MD simulation campaigns on Sierra, the second-fastest supercomputer on the TOP500 list (as of November 2018). As part of an interdisciplinary team of scientists, I manage the project’s code repositories, provide guidance on workflow and code design, and implement new features.
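The expansion of a study description into concrete steps, mentioned in the Maestro entry above, can be illustrated with a small Python sketch. It is not Maestro's implementation or API: the functions `substitute` and `expand_step` are hypothetical, and the sketch assumes `$(NAME)`-style tokens and index-wise combination of parameter values (the i-th combination takes the i-th value of every parameter, so all value lists must have the same length).

```python
import re
from typing import Dict, List

# Hypothetical token syntax: $(NAME), where NAME is letters, digits, or underscores.
TOKEN = re.compile(r"\$\((\w+)\)")


def substitute(cmd: str, values: Dict[str, str]) -> str:
    """Replace every $(NAME) token that has a value; leave unknown tokens intact."""
    return TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), cmd)


def expand_step(cmd: str, variables: Dict[str, str],
                parameters: Dict[str, List[str]]) -> List[str]:
    """Expand one abstract step command into concrete commands.

    Static variables are substituted first. Parameters are then combined
    index-wise (an assumption of this sketch), so every value list must
    have the same length.
    """
    cmd = substitute(cmd, variables)
    lengths = {len(vals) for vals in parameters.values()}
    if len(lengths) > 1:
        raise ValueError("all parameter value lists must have the same length")
    # With no parameters, the step expands to exactly one concrete command.
    count = lengths.pop() if lengths else 1
    concrete = []
    for i in range(count):
        combo = {name: vals[i] for name, vals in parameters.items()}
        concrete.append(substitute(cmd, combo))
    return concrete


if __name__ == "__main__":
    commands = expand_step(
        "echo '$(GREETING), $(NAME)!' > $(OUTPUT_PATH)/$(NAME).txt",
        variables={"OUTPUT_PATH": "./sample_output"},
        parameters={"GREETING": ["Hello", "Ciao"], "NAME": ["Pam", "Jim"]},
    )
    for command in commands:
        print(command)
```

Under these assumptions, the single abstract step above becomes two concrete commands, one per parameter combination, each writing to its own file.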
Interests and Past Experience
My technical interests include (but are not limited to) Software Engineering, Software Design, Computer Architecture, Simulation, Simulation and Software Automation, Object-Oriented Design, Python, and HPC. My initial interest in Computer Architecture led me to become a Performance Architect at Intel Corporation. Maintaining simulators at Intel, running large numbers of simulations, and using the results to make architectural assessments taught me to appreciate the software systems required to perform computational studies. Through these experiences I broadened my interests to simulation software workflows, co-designing and implementing Study Launcher, a workflow launcher driven by an XML specification. In May 2016, I joined Lawrence Livermore National Laboratory as a Computer Scientist and workflow expert to continue learning about computational workflows and automation.
Skills and Proficiencies
Excellent (Go-to tools) ★ ★ ★
- Python, Git, GitHub, Bash, LaTeX, SLURM, LSF, Unix
Proficient (Competent and comfortable) ★ ★ ☆
- C++, C, C#, CSS, HTML, Linux
Basic (Essential foundation and basics) ★ ☆ ☆
- Ruby, Java, SQL
Domain Knowledge (Fundamental Concepts)
- Software Engineering, Python, Agile/Scrum Development Methods, Software System Design, Object-Oriented Design, Algorithms, Simulation, Workflow Automation and Tools, HPC, Advanced Computer Architecture
Awards and Recognition
- Lawrence Livermore National Laboratory Science & Technology Excellence in Publication Award (August 2020)
- Best Paper at Supercomputing '19 for “A Massively Parallel Infrastructure for Adaptive Multiscale Simulations: Modeling RAS Initiation Pathway for Cancer”
- Multiple ASQ Division Awards at LLNL
- Two Intel Team Awards
- Most Outstanding Graduate Award by USF’s CSE Department
- Member of Tau Beta Pi engineering honor society (FL Gamma Chapter)
Publications
Ferreira da Silva, R., Casanova, H., Chard, K., Laney, D., Ahn, D., Jha, S., … Wozniak, J. (2021). Workflows Community Summit: Bringing the Scientific Workflows Community Together. Zenodo. https://doi.org/10.5281/zenodo.4606958
Peterson, J. L., Anirudh, R., Athey, K., Bay, B., Bremer, P.-T., Castillo, V., … Yeom, J.-S. (2019). Merlin: Enabling Machine Learning-Ready HPC Ensembles.
Di Natale, F., Bhatia, H., Carpenter, T. S., Neale, C., Schumacher, S. K., Oppelstrup, T., … Ingólfsson, H. I. (2019). A Massively Parallel Infrastructure for Adaptive Multiscale Simulations: Modeling RAS Initiation Pathway for Cancer. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 57:1–57:16). New York, NY, USA: ACM. https://doi.org/10.1145/3295500.3356197
Computational models can define the functional dynamics of complex systems in exceptional detail. However, many modeling studies face seemingly incommensurate requirements: to gain meaningful insights into some phenomena requires models with high resolution (microscopic) detail that must nevertheless evolve over large (macroscopic) length- and time-scales. Multiscale modeling has become increasingly important to bridge this gap. Executing complex multiscale models on current petascale computers with high levels of parallelism and heterogeneous architectures is challenging. Many distinct types of resources need to be simultaneously managed, such as GPUs and CPUs, memory size and latencies, communication bottlenecks, and filesystem bandwidth. In addition, robustness to failure of compute nodes, network, and filesystems is critical. We introduce a first-of-its-kind, massively parallel Multiscale Machine-Learned Modeling Infrastructure (MuMMI), which couples a macro scale model spanning micrometer length- and millisecond time-scales with a micro scale model employing high-fidelity molecular dynamics (MD) simulations. MuMMI is a cohesive and transferable infrastructure designed for scalability and efficient execution on heterogeneous resources. A central workflow manager simultaneously allocates GPUs and CPUs while robustly handling failures in compute nodes, communication networks, and filesystems. A hierarchical scheduler controls GPU-accelerated MD simulations and in situ analysis. We present the various MuMMI components, including the macro model, GPU-accelerated MD, in situ analysis of MD data, machine learning selection module, a highly scalable hierarchical scheduler, and detail the central workflow manager that ties these modules together. 
In addition, we present performance data from our runs on Sierra, in which we validated MuMMI by investigating an experimentally intractable biological system: the dynamic interaction between RAS proteins and a plasma membrane. We used up to 4000 nodes of the Sierra supercomputer, concurrently utilizing over 16,000 GPUs and 176,000 CPU cores, and running up to 36,000 different tasks. This multiscale simulation includes about 120,000 MD simulations aggregating over 200 milliseconds, which is orders of magnitude greater than comparable studies.
Patki, T., Frye, Z., Bhatia, H., Di Natale, F., Glosli, J., Ingolfsson, H., & Rountree, B. (2019). Comparing GPU Power and Frequency Capping: A Case Study with the MuMMI Workflow. In 2019 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS) (pp. 31–39). https://doi.org/10.1109/WORKS49585.2019.00009
Accomplishing the goal of exascale computing under a potential power limit requires HPC clusters to maximize both parallel efficiency and power efficiency. As modern HPC systems embark on a trend toward extreme heterogeneity leveraging multiple GPUs per node, power management becomes even more challenging, especially when catering to scientific workflows with co-scheduled components. The impact of managing GPU power on workflow performance and run-to-run reproducibility has not been adequately studied. In this paper, we present first-of-its-kind research on the impact of the two power management knobs that are available on NVIDIA Volta GPUs: frequency capping and power capping. We analyzed performance and power metrics of GPUs on a top-10 supercomputer by tuning these knobs for more than 5,300 runs in a scientific workflow. Our data show that GPU power capping in a scientific workflow is an effective way of improving power efficiency while preserving performance, whereas GPU frequency capping is a demonstrably unpredictable way of reducing power consumption. Additionally, we identified that frequency capping results in higher variation and anomalous behavior on GPUs, which is counterintuitive to what has been observed in the research conducted on CPUs.
Pauli, E. T., Aschwanden, P. D., Laney, D. E., Dahlgren, T., Semler, J. A., Di Natale, F., … U.S. Department of Energy National Nuclear Security Administration. (2018). Simulation INsight and Analysis. https://doi.org/10.11578/dc.20190715.10
Carpenter, T. S., López, C. A., Neale, C., Montour, C., Ingólfsson, H. I., Di Natale, F., … Gnanakaran, S. (2018). Capturing Phase Behavior of Ternary Lipid Mixtures with a Refined Martini Coarse-Grained Force Field. Journal of Chemical Theory and Computation, 14(11), 6050–6062. https://doi.org/10.1021/acs.jctc.8b00496
Di Natale, F. (2017). Maestro Workflow Conductor. Retrieved from https://github.com/LLNL/maestrowf
MaestroWF is a Python tool and software package for loading YAML study specifications that represent a simulation campaign. The package can parameterize a study, pull dependencies automatically, format output directories, and manage the flow and execution of the campaign. MaestroWF also provides a set of abstracted objects that can be used to develop user-specific scripts for launching simulation campaigns.
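Managing the flow of a campaign means, at minimum, running each step only after the steps it depends on have finished. The Python sketch below groups steps into stages using a level-by-level variant of Kahn's topological sort. It is an illustration under stated assumptions, not MaestroWF's scheduler: `execution_stages` and the example step names are hypothetical, and the input is assumed to be a plain mapping from each step to the steps it depends on.

```python
from typing import Dict, List


def execution_stages(depends: Dict[str, List[str]]) -> List[List[str]]:
    """Group steps into stages so every prerequisite runs in an earlier stage.

    `depends` is a hypothetical mapping from step name to the names of the
    steps it depends on. Raises ValueError on unknown steps or cycles.
    """
    for step, prereqs in depends.items():
        for prereq in prereqs:
            if prereq not in depends:
                raise ValueError(f"{step!r} depends on unknown step {prereq!r}")
    # Copy prerequisites into sets so the caller's mapping is not mutated.
    remaining = {step: set(prereqs) for step, prereqs in depends.items()}
    stages: List[List[str]] = []
    while remaining:
        # Steps with no unfinished prerequisites can all run in this stage.
        ready = sorted(step for step, prereqs in remaining.items() if not prereqs)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        stages.append(ready)
        for step in ready:
            del remaining[step]
        # Mark this stage's steps as finished for everything still waiting.
        for prereqs in remaining.values():
            prereqs.difference_update(ready)
    return stages


if __name__ == "__main__":
    steps = {
        "setup": [],
        "simulate": ["setup"],
        "analyze": ["simulate"],
        "plot": ["simulate"],
        "report": ["analyze", "plot"],
    }
    for number, stage in enumerate(execution_stages(steps), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

For the example steps this prints four stages, with `analyze` and `plot` sharing the third. Grouping by stage makes the available concurrency easy to see; an event-driven conductor could instead start each step as soon as its own prerequisites finish, without waiting for the rest of its stage.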
For the most up-to-date publications, please see my Google Scholar profile.
Invited Talks
Di Natale, F., Bhatia, H., Carpenter, T. S., Neale, C., Schumacher, S. K., Oppelstrup, T., … Ingólfsson, H. I. (2019). MuMMI: Massively Parallel Multiscale Simulation for Modeling RAS Protein and ML Workflow Challenges. In Supercomputing '19.
Di Natale, F., Bhatia, H., Ingólfsson, H. I., Streitz, F., & Nissley, D. V. (2019). MuMMI: Massively Parallel Multiscale Simulation for Modeling RAS Protein and ML Workflow Challenges. Data Science Workshop.
This session will cover the challenges that many applications of scientific machine learning face, both in large-scale workflows and in incorporating experimental data into scientific simulations. Each talk will cover different aspects of these challenges and how they relate to multiple scientific areas.
Chowdhury, F., Di Natale, F., Moody, A., Gonsiorowski, E., Mohror, K., & Yu, W. (2019). Understanding I/O Behavior in Scientific Workflows on High Performance Computing Systems. Supercomputing '19.
Community Involvement
- Review Panelist for the Department of Energy Early-Career Research Program (ECRP) 2021