Gateway Challenges and Evolution: Current Trends and Practices

Gateway Challenges and Evolution: Current Trends and Practices in US Science Gateways
Mark A. Miller, San Diego Supercomputer Center

Talk overview
Part 1. Overview of US Gateway development efforts
Part 2. Overview of issues faced by a growing Gateway

The set of functional Science Gateways is really, really large
The Nucleic Acids Research Bioinformatics Links Directory now contains 134 resources, 455 databases, and 1,205 web server tools, and the criteria for inclusion in that list are quite restrictive!

The set of US projects that explicitly call themselves Science Gateways is almost manageable.

8 Science Gateways sponsored by the National Energy Research Scientific Computing Center (NERSC), US Dept. of Energy
Name | Link | Focus
Deep Sky | http://deepskyproject.org/ | subject image querying, telescope data
The Materials Project | http://materialsproject.org/ | materials query
QCD | http://qcd.nersc.gov/ | lattice gauge theory
CXIDB | http://cxidb.org/ | coherent X-ray imaging data bank
20th Century Reanalysis | http://portal.nersc.gov/project/20C_Reanalysis/ | global weather trends
Daya Bay | http://portal.nersc.gov/project/dayabay/ | Daya Bay Neutrino Detector Gateway
Earth System Grid | http://esg.nersc.gov/esgf-web-fe/ | Earth System Grid climate gateway and data node
NOVA | https://portal-auth.nersc.gov/nova/login/?next=/nova/ | NERSC online Vienna Ab initio Simulation Package (VASP) jobs
Together these gateways cover data access, job submission, and computation.

35 or so XSEDE Science Gateways, sponsored by the US National Science Foundation

Computational portals and computational resource access
Project | Field of science
Asteroseismic Modeling Portal | Stellar Astronomy and Astrophysics
Center for Multiscale Modeling of Atmospheric Processes | Atmospheric Sciences
CIG Science Gateway for the Geodynamics Community | Geophysics
CIPRES Portal for inference of large phylogenetic trees | Systematic and Population Biology
Community Climate System Model (CCSM) TeraGrid Gateway | Atmospheric Sciences
Computational Chemistry Grid | Chemistry
CyberGIS Gateway | Geosciences
Cyberinfrastructure for End-to-End Environmental Exploration Portal | Earth Sciences
Developing Social Informatics Data Grid (SIDGrid) | Language, Cognition, and Social Behavior
EPSCoR Desktop to TeraGrid EcoSystem | Systematic and Population Biology
High-Resolution Modeling of Hydrodynamic Experiments with UltraScan | Biophysics
iPlant Agave Foundation API | Integrative Biology and Neuroscience
National Biomedical Computation Resource | Integrative Biology and Neuroscience
Network for Computational Nanotechnology and nanoHUB | Emerging Technologies Initiation
Network for Earthquake Engineering Simulation | Earthquake Hazard Mitigation
Neuroscience Gateway | Neurosciences
Neutron Science TeraGrid Gateway | Materials Research
OGCE Science Gateway Portal | Information Technology and Organizations
ROBETTA: Automated Prediction of Protein Structure and Interactions | Molecular Biosciences
SCEC Earthworks Project | Seismology
Social Science Gateway | Social and Economic Science
TeraGrid Geographic Information Science Gateway | Geography and Regional Science
VLab - Virtual Laboratory for Earth and Planetary Materials | Materials Research

Data portals
Project | Field of science
Indiana University Centralized Life Sciences Data | Genetics and Nucleic Acids
Dark Energy Survey Data Management | Extragalactic Astronomy and Cosmology
Biodrugscore: A portal for customized scoring and ranking of molecules docked to the human proteome | Biochemistry and Molecular Structure and Function
Globus Online | Engineering Infrastructure Development
High Resolution Daily Temperature and Precipitation Data for the Northeast United States | Atmospheric Sciences
Isoscapes modeling, analysis and prediction (IsoMAP) | Environmental Biology
Linked Environments for Atmospheric Discovery | Atmospheric Sciences
Massive Pulsar Surveys using the Arecibo L-band Feed Array (ALFA) | Astronomical Sciences
Purdue Environmental Data Portal | Earth Sciences
QuakeSim | Geophysics
Science Gateway for Diffraction Facilities, Data and Methods | Chemistry
The Earth System Grid | Global Atmospheric Research

The set of US Science Gateway efforts that attempt to advance the architecture and methodology of Gateway creation, or attempt to advance practices in Gateway scalability, is a set we can hope to cover in the context of this talk.

Generation 1 Gateway software
The archetypal Generation 1 Gateway is a three-tiered web application. It has an HTML/JSP front end and uses Globus for job submission (early examples: GridSpeed, GPDK).

Generation 1 Gateways
Many highly successful Science Gateways in the US today evolved from the Gen 1 concept.
Important evolutionary adaptations:
Addition of content management systems
Descriptions of command-line tool interfaces and workflows (JSON, XML, Rappture), so users can contribute new interfaces/tools
Support for data sharing
Support for social networking
Pluggable job submission tools that allow many choices for submission
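
The tool-description adaptation can be sketched as follows. This is a minimal illustration, assuming a hypothetical JSON schema: the keys `executable`, `parameters`, `flag`, `type`, `default`, and `required`, and the tool `example_tree_tool`, are invented for this example and are not taken from any particular gateway. The gateway reads the description, merges in the user's form values, and produces the argument list that it then hands to whichever job submission tool is plugged in.

```python
from __future__ import annotations

import json
import shlex

# Hypothetical tool description; the schema and tool name are illustrative only.
TOOL_JSON = """
{
  "executable": "example_tree_tool",
  "parameters": [
    {"name": "model", "flag": "--model", "type": "string", "default": "GTR"},
    {"name": "seed", "flag": "--seed", "type": "int", "default": 12345},
    {"name": "bootstraps", "flag": "--bootstraps", "type": "int"},
    {"name": "input", "flag": "--input", "type": "string", "required": true}
  ]
}
"""


def build_command(description: dict, user_values: dict) -> list[str]:
    """Turn a tool description plus user-supplied values into an argv list."""
    argv = [description["executable"]]
    for param in description["parameters"]:
        name = param["name"]
        value = user_values.get(name, param.get("default"))
        if value is None:
            if param.get("required", False):
                raise ValueError(f"missing required parameter: {name}")
            continue  # optional and unset: leave the flag out entirely
        if param["type"] == "int":
            value = int(value)  # reject non-integer input early
        argv.extend([param["flag"], str(value)])
    return argv


tool = json.loads(TOOL_JSON)
argv = build_command(tool, {"input": "alignment.phy", "bootstraps": "100"})
print(shlex.join(argv))
# example_tree_tool --model GTR --seed 12345 --bootstraps 100 --input alignment.phy
```

Because the description is data rather than code, adding a tool means adding a description file, which is what makes user-contributed interfaces practical.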

Generation 1 Gateways: Platform: HUBzero
Highly developed tools for sharing and collaborating (Joomla! CMS)
Provides the user with a VM instance, allowing a lot of flexibility in interaction
The Rappture toolkit allows users to create tools with fluid, interactive interfaces
Emphasis on modeling and demonstrations
Relatively lightweight computing within the VM; some groups have connected the VM to NFS/job submission tools
[Chart: HUBzero usage statistics, courtesy of M. McLennan]
HUBzero instances can be purchased.

Generation 1 Gateways: Platform: Galaxy
The project has historically encouraged interaction between developers and users.
The distributable Galaxy package was not originally designed for submission to remote resources, but this capability has been developed.
Cloud instances are available.
Supports flexible addition of command-line tools.
The main server has 30,000 registered users and 160,000 jobs per month.
There are many, many local Galaxy installations around the world.
Galaxy is highly customizable, but is presented only as a genomics application.

Generation 1 Gateways: Platform: Workbench Framework

Emphasis on submission of long-running command-line jobs to remote HPC resources.
Supports flexible addition of command-line tools.
The main server has 6,000 registered users and 8,000 jobs per month, and has supported 600+ publications.
It brings 29% of all XSEDE users each quarter, who use 0.7% of all XSEDE resources.
3 other Gateways now use the Workbench Framework.

Generation 2 Gateway software
The archetypal Generation 2 Gateway is a distributable portlet container in which:
the essential login/user management functions are provided via portlets out of the box
new portlets can be added scalably and can be shared between Gateways
job submission methods are flexible
the GUI appearance is customizable by the user
(early examples: GridSphere, Jetspeed, IBM WebSphere)

Generation 2 Gateways
Many US Science Gateways were built from the Gen 2 concept using GridSphere. These include LEAD, NCBR, TGIS, VLab, IsoMAP, QuakeSim, PEDP, and CCSM.

The GridSphere project has ended, but some of these Gateways are still functioning.

Generation 2 Gateways: Important evolutionary events
The Open Grid Computing Environment (OGCE) software implements access to infrastructure web services via portlets. The goal is to make it possible to centralize infrastructure web services.
OGCE established a repository of web service portlets.
[Screenshot: the VLab portal, built on GridSphere]

Generation 3 Gateway software
The archetypal Generation 3 Gateway uses a GUI/presentation layer created by the Gateway developer to consume infrastructure services made available via a public API (current examples: OGCE, iPlant Foundational API, NEWT).
[Diagram: several domain gateways, each with its own browser interface, web server, and application manager]
All middleware services are provided from a single production location.

Generation 3 Gateways: Platform: OGCE
Tools for creating portals using centralized infrastructure web services
OGCE services currently support several production Gateways.
The OGCE project has committed to open governance: Apache Rave for the interface, Apache Airavata for middleware.
Home page: http://www.collab-ogce.org/ogce/index.php/Main_Page

Generation 3 Gateways: Platform: iPlant AGAVE Foundational API
Provides centralized public web services for access to XSEDE data and compute resources
https://foundation.iplantcollaborative.org/

Generation 3 Gateways: Platform: NERSC Web Toolkit (NEWT)
Provides centralized public web services for access to NERSC data and compute resources
https://newt.nersc.gov/

Generation 3 Gateways: The Gen 3 concept is still evolving rapidly, with results to be determined.

Another important project
Are you building websites that serve your science discipline? Do you wish you could connect with and learn from others who are doing the same thing? We are building an institute to serve you, and others like you, with resources, services, experts, and ideas for creating and sustaining science gateways. Sign up to join the conversation: http://sciencegateways.org/volunteer/

science gateway (noun):
1. an online community space for science and engineering research and education.
2. a Web-based resource for accessing data, software, computing services, and equipment specific to the needs of a science or engineering discipline.

Folks from the OGCE, iPlant, and HUBzero projects have partnered with Nancy Wilkins-Diehr of SDSC to create a Science Gateways Institute. They have received preliminary funding to prepare a proposal for submission in 2014.

Assist with the entire lifecycle of a gateway:
Business plan development and review
Development environment, consulting, documentation, and software recommendations
Software repositories
Software engineering facilities
Software assessment services like the Open Source Software Advisory Service, the Apache assessment service, and the Software Sustainability Institute (UK)
Build-and-test facilities
Hosting service
Gateway expertise in the following areas: usability assessment, licensing, sustainability, project management, security

Summary
The most successful US Gateways in production today are evolved versions of Gen 1 architectures.

The Gen 2 concept of the portlet container did not get sufficient traction in the US. It was hampered by high overhead and unmet expectations, leading to low adoption.
The Gen 3 concept is still evolving rapidly in several projects. Centralized infrastructure web services are used by several production Gateways, though this is not yet a generic solution. Success of the Gen 3 concept will depend on its rate of adoption and its ability to recruit and engage a critical mass of developers.
The ScienceGateways.org project may help bring focus to gateway development and sustainability efforts.

Part 2. Overview of issues faced by a growing Gateway

Phylogenetics is the study of the diversification of life on the planet Earth, both past and present, and of the relationships among living things through time.
Evolutionary relationships can (for the most part) be represented as a directed acyclic graph.
Evolutionary relationships can be inferred from DNA sequence comparisons:
1. Align sequences to determine evolutionary equivalence.
2. Infer evolutionary relationships based on some set of assumptions.
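
A worked count shows why the inference step is expensive. For n taxa, the number of distinct unrooted, fully bifurcating tree topologies is the standard double factorial (2n - 5)!!, for n >= 3. The sketch below only evaluates that formula; it is not a tree inference method.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies on n_taxa labeled leaves: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("at least 3 taxa are required")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


for n in (4, 10, 20, 50):
    print(n, f"{num_unrooted_binary_trees(n):.2e}")
# num_unrooted_binary_trees(10) == 2027025
```

Already at 20 taxa there are more than 10^20 candidate trees, so exhaustive search is out of reach and even heuristic searches need substantial compute.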

Tree inference is NP-hard. Even with heuristics, the codes are compute-intensive, and desktop computing is no longer adequate.

Workflow for the CIPRES Gateway:
Assemble sequences → Upload to portal → Store → Run alignment → Run tree inference → Post-tree analysis → Download
Make all command-line options available.
Make parallel codes available.

What if you build it and too many people come?
[Chart: core hours / SUs (thousands) used per month, with the initial allocation marked]

To optimize resource use:
1. Ensure resource use is as efficient as possible.
2. Make sure resources are used effectively.

Efficiency of resource use:
All codes are benchmarked and configured for good efficiency automatically, based on user input.
Make the system robust to outages, so running jobs are not lost when communication between the server and the compute resource is severed (this saved 7% of long jobs).
Monitor resource use for anomalous spikes in consumption per job (e.g., to identify and eliminate file system incompatibilities with a code).
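
The talk does not say how anomalous spikes are detected. One common approach, shown here only as a sketch under that caveat, is a robust outlier test: compare each job's core-hour use with the median job, in units of the median absolute deviation (MAD), applied for example to runs of one code with similar inputs. The job IDs, data, and cutoff `k = 5` are hypothetical.

```python
from __future__ import annotations

from statistics import median


def flag_anomalous_jobs(core_hours_by_job: dict[str, float], k: float = 5.0) -> list[str]:
    """Return IDs of jobs whose core-hour use lies far above the typical job.

    A job is flagged when (usage - median) / MAD > k. Median and MAD are
    robust: a few runaway jobs cannot hide by inflating the statistics.
    """
    values = list(core_hours_by_job.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the jobs used exactly the median amount;
        # treat any job above that as unusual.
        return [job for job, v in core_hours_by_job.items() if v > med]
    return [job for job, v in core_hours_by_job.items() if (v - med) / mad > k]


# Hypothetical per-job core hours for runs of one code with similar inputs.
jobs = {"job1": 10.0, "job2": 12.0, "job3": 11.0, "job4": 9.5, "job5": 250.0}
print(flag_anomalous_jobs(jobs))  # ['job5']
```

The mean and standard deviation would work less well here, because the runaway job itself inflates both and can mask how extreme it is.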

Effective resource use: monitor resource distribution
Step 1. Identify usage patterns.

Usage in the reporting period Sept 2010 to May 2011
Core hours used | % of users | % of total SUs
0 to 30K | 97 | 45
30K to 300K | 3 | 55

We need to monitor individual users, because we want all XSEDE users to be subject to the same level of peer review.
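
Producing a table like this one from per-user accounting records is a simple aggregation. A sketch, assuming usage is available as a mapping from user to core hours for the reporting period; the user names and numbers below are made up and are not CIPRES data. The 30,000 core-hour boundary is the one used in the table.

```python
from __future__ import annotations


def usage_split(core_hours_by_user: dict[str, float], boundary: float = 30_000):
    """Return {bucket: (% of users, % of total core hours)} split at `boundary`."""
    total_users = len(core_hours_by_user)
    total_hours = sum(core_hours_by_user.values())
    buckets = {"below boundary": [], "at or above boundary": []}
    for hours in core_hours_by_user.values():
        key = "below boundary" if hours < boundary else "at or above boundary"
        buckets[key].append(hours)

    def pct(part: float, whole: float) -> float:
        return 100.0 * part / whole if whole else 0.0

    return {
        name: (pct(len(hours), total_users), pct(sum(hours), total_hours))
        for name, hours in buckets.items()
    }


# Illustrative data only: three light users and one heavy user.
usage = {"user_a": 1_000, "user_b": 5_000, "user_c": 4_000, "user_d": 90_000}
print(usage_split(usage))
# {'below boundary': (75.0, 10.0), 'at or above boundary': (25.0, 90.0)}
```

The per-user view is what exposes the pattern in the table: a small fraction of users can account for a large share of core hours.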

Step 2. Establish a Fair Use policy.
Users in the US are permitted to use 50,000 core hours from the community allocation annually. Users at non-US institutions can use up to 30,000 core hours annually. Users at US institutions can apply for a personal XSEDE allocation if they require more core hours.

Step 3. Create tools to enforce the Fair Use policy:
Tools to track usage by each user
Tools to disable submission from over-active accounts
Tools to notify users when they reach thresholds of use
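
The three tools can share one small piece of accounting logic. A minimal sketch, assuming usage is charged when each job finishes. The annual limits come from the Fair Use policy (50,000 core hours for users in the US, 30,000 for users at non-US institutions); the notification thresholds (50%, 80%, 90%) and the `Account` and `record_usage` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Annual community-allocation limits from the Fair Use policy (core hours).
ANNUAL_LIMIT = {"US": 50_000, "non-US": 30_000}
# Hypothetical notification thresholds, as fractions of the annual limit.
NOTIFY_AT = (0.5, 0.8, 0.9)


@dataclass
class Account:
    user: str
    region: str                                  # "US" or "non-US"
    used: float = 0.0                            # core hours charged this year
    notified: set = field(default_factory=set)   # thresholds already reported

    @property
    def limit(self) -> int:
        return ANNUAL_LIMIT[self.region]

    def can_submit(self) -> bool:
        """Submission is disabled once the annual limit has been reached."""
        return self.used < self.limit


def record_usage(account: Account, core_hours: float) -> list[str]:
    """Charge a finished job to the account and return any new notices."""
    account.used += core_hours
    notices = []
    for fraction in NOTIFY_AT:
        if account.used >= fraction * account.limit and fraction not in account.notified:
            account.notified.add(fraction)  # notify once per threshold
            notices.append(f"{account.user}: {fraction:.0%} of {account.limit:,} core hours used")
    if not account.can_submit():
        notices.append(f"{account.user}: annual limit reached; submission disabled")
    return notices


acct = Account("alice", "non-US")
print(record_usage(acct, 16_000))  # ['alice: 50% of 30,000 core hours used']
print(record_usage(acct, 14_000))
print(acct.can_submit())  # False
```

Recording which thresholds have already fired keeps users from receiving the same notice after every job.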

Step 4. Engage users in monitoring their own usage.
Help users track their resource consumption: notify users of their usage level, and create a conditional warning element in the interface XML.
[Chart: core hours consumed; impact of new policies/tools on user demographics, 2010/11 vs. 2011/12]
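
The talk does not show the Workbench Framework's interface XML, so the element and attribute names below (`warning`, `field`, `op`, `value`, `ref`) are a hypothetical schema. The sketch illustrates the idea of a conditional warning element in step 4: the interface description carries conditions, and the gateway evaluates them against the user's form values and current usage before submission. The 64-core figure comes from the talk; the other numbers are illustrative.

```python
from __future__ import annotations

import operator
import xml.etree.ElementTree as ET

# Hypothetical interface fragment; not the Workbench Framework's real schema.
INTERFACE_XML = """
<tool name="example_tree_tool">
  <warning field="cores" op="gt" value="64">
    Most tree inference codes do not scale beyond 64 cores.
  </warning>
  <warning field="runtime_hours" op="gt" value="168">
    Requested runtime exceeds one week (168 hours).
  </warning>
  <warning field="job_core_hours" op="gt" ref="remaining_core_hours">
    This job may use more core hours than remain in your annual allowance.
  </warning>
</tool>
"""

OPS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def active_warnings(xml_text: str, values: dict[str, float]) -> list[str]:
    """Return the text of each <warning> whose condition holds for `values`."""
    messages = []
    for warning in ET.fromstring(xml_text.strip()).iter("warning"):
        field, ref = warning.get("field"), warning.get("ref")
        if field not in values or (ref is not None and ref not in values):
            continue  # condition refers to something the user has not set
        # Compare against another named value if `ref` is given, else a literal.
        limit = values[ref] if ref is not None else float(warning.get("value"))
        if OPS[warning.get("op")](values[field], limit):
            messages.append(" ".join(warning.text.split()))  # tidy whitespace
    return messages


request = {"cores": 32, "runtime_hours": 200}
request["job_core_hours"] = request["cores"] * request["runtime_hours"]  # 6,400
request["remaining_core_hours"] = 5_000  # would come from the usage tracker
for message in active_warnings(INTERFACE_XML, request):
    print("Warning:", message)
```

Keeping the conditions in the interface description, rather than in gateway code, lets each tool carry warnings suited to its own scaling and runtime behavior.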

[Charts: cumulative core hours by year; SUs per month (thousands) and jobs submitted per month, Dec 2009 to Feb 2013]
Impact of new policies/tools: users submit 160 more jobs each month, and request 29,000 more core hours each month. Projected use for 2013-2014 is 20 million core hours.

Impact of policy on usage, Dec 2009 to Feb 2013
[Chart: users per month (total, repeat, and new users) by year]
Growth in resource usage is driven primarily by new users, not by waste or high use by a few users.

What if you build it and too many people come?
At 14 million core hours/year, the CSG in 2012/2013 brings 29% of all XSEDE users, and consumes only about 0.7% of allocatable XSEDE resources.
But the CIPRES use case is different from the typical XSEDE resource request:
Most tree inference codes scale to no more than 64 cores.

20% of CSG users are students in classes, so queue time matters.
88% of CSG jobs complete within 12 hours, so queue time matters.
3% of CSG jobs run for more than 1 week, and most codes have no restart capability, so run times of up to 334 hours are required.
These jobs are not a good fit for the intent of the large XSEDE machines.

Important policy moment
Based (in part) on our use case, the US NSF created the Trestles cluster to provide on-demand computing (Thanks, NSF!):
Trestles is managed and allocated to keep queue depth near zero.
Administrators allow CSG to run jobs for 334 hours.
The machine is significant in size, but small jobs (64 cores or less) are welcomed.

CIPRES usage now amounts to 21% of the entire allocatable Trestles machine. What if there were 4 Gateways as successful as CIPRES? Or, for that matter, what about CIPRES in 2017?
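
The question can be put in numbers using only figures quoted in this talk. This is a back-of-envelope sketch; the last step assumes, purely for illustration, that CIPRES's share of Trestles grows in proportion to its total projected usage, which the talk does not claim.

```python
# Figures quoted in the talk.
cipres_hours_2012_13 = 14_000_000     # core hours per year
share_of_xsede = 0.007                # "about 0.7%" of allocatable XSEDE resources
share_of_trestles = 0.21              # 21% of allocatable Trestles
projected_hours_2013_14 = 20_000_000  # projected core hours

# Implied allocatable XSEDE pool (approximate, because 0.7% is rounded).
implied_xsede_pool = cipres_hours_2012_13 / share_of_xsede

# Four gateways as successful as CIPRES, each taking 21% of Trestles.
four_gateways_on_trestles = 4 * share_of_trestles

# Illustrative assumption: Trestles share scales with CIPRES's total usage.
projected_trestles_share = share_of_trestles * projected_hours_2013_14 / cipres_hours_2012_13

print(f"Implied XSEDE pool: {implied_xsede_pool:,.0f} core hours")  # about 2 billion
print(f"Four CIPRES-sized gateways: {four_gateways_on_trestles:.0%} of Trestles")  # 84%
print(f"CIPRES at 20M core hours: {projected_trestles_share:.0%} of Trestles")  # 30%
```

Under that assumption, CIPRES alone would approach a third of Trestles, and four such gateways would leave little room for anyone else.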

If Gateways are valued, we will need supportive policy decisions at the national/international level about HPC resource allocation:
More investment by the US in on-demand HPC computing?
Accommodation of jobs that scale to a small number of cores?
Investment by other countries in on-demand HPC as a fraction of their total HPC resources, perhaps as a consortium?
Otherwise:
Decrease the user base by eliminating non-US users (about half of the total user base)?
Require fee-for-service for non-US users, high-end users, or all users?

How to keep the CIPRES Gateway operating?
Annual operating budget: ~$200,000 per year to keep the server functioning
20 million core hours of compute time for 2013-2014 (generously provided by NSF)
Served 2,800+ scientists in the 2012/2013 allocation year
250+ publications enabled in 2012/2013
58+ instructors supported in 2012/2013

There is a strong dynamic tension between innovation and infrastructure
"I think what your Gateway has accomplished is impressive, but unless your proposal describes a plan to create new capabilities that do not exist anywhere, you will not get the scores required to win an award."
(Project Officer, US NSF Division of Biological Infrastructure)
In other words, we're having big impact, but NSF isn't going to pay us to make the CIPRES software better for users (or even to continue operations). Now what?

Survival Strategy 1. Innovate!
Workflow for the CIPRES Gateway: assemble sequences → upload to portal → store → run alignment → run tree inference → post-tree analysis → download.
raxmlGUI and Influenza DB are highly evolved desktop/browser applications that have no tree inference tools, or whose tools are underpowered.
These projects offer powerful and distinct user experiences, and are interested in incorporating powerful tree inference tools into an existing application.
RESTful services will put CIPRES in many environments, with a variety of new user interfaces!
[Diagram: raxmlGUI and other clients reach the CSG and its parallel codes on XSEDE through RESTful services]
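
The talk does not describe the planned CIPRES REST interface, so the following is only a sketch of the general pattern: a desktop client such as raxmlGUI builds an authenticated HTTP POST naming the tool and its parameters, and the gateway runs the job on its parallel codes. The base URL, the `/job/{username}` path, the form field names, and the use of HTTP Basic authentication are all hypothetical. The request is built but not sent.

```python
from __future__ import annotations

import base64
import urllib.parse
import urllib.request

# Hypothetical service location and endpoint layout, for illustration only.
BASE_URL = "https://gateway.example.org/rest/v1"


def build_submit_request(username: str, password: str, tool: str,
                         params: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a POST asking the gateway to run `tool`."""
    body = urllib.parse.urlencode({"tool": tool, **params}).encode("utf-8")
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/job/{urllib.parse.quote(username)}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = build_submit_request("alice", "secret", "example_tree_tool",
                           {"input_file": "alignment.phy", "bootstraps": "100"})
print(req.get_method(), req.full_url)  # POST https://gateway.example.org/rest/v1/job/alice
# Sending it would be urllib.request.urlopen(req), which needs a real server.
```

Because the interface is plain HTTP, any client, desktop or browser, can reuse the same back end; in practice credentials like these must only travel over HTTPS.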

We will be adding complexity, with significant risk and significant potential benefit. Stay tuned.

There is a strong dynamic tension between innovation and infrastructure
"Developers may address new research topics in the course of gateway design in order to further their academic goals. Resulting gateways may be more complex than necessary, less reliable, and may not meet the goals of the domain science community for whom they were designed. Focus group participants noted that sometimes simple tools are all that is needed to enable cutting edge science, but [Gateway developers] make the easy things hard."
(Wilkins-Diehr, N., and Lawrence, K. A. (2010), in Gateway Computing Environments Workshop (GCE), 2010)
To stay federally funded we must continually innovate. This is not necessarily what users want or need.

Survival Strategy 2. Identify new funding models

Let's start by clarifying the value proposition.
Random user feedback: "It is hard for me to imagine how I could work at a reasonable pace without this resource, especially when things like MS or grant submission deadlines loom."
Gateways add value to universities by making their professors more competitive.

Random user feedback: "It is an easy-to-use cluster to run BEAST analyses in a short time. This allows students to run analyses that actually converge in a single class. I found it is important to be able to let the student explore the analysis 'all the way', i.e. not just show the principle but actually let them run an entire Markov chain and let them evaluate the results. For that I found that having access to the CIPRES Science Gateway to be crucial."
Gateways add value to universities by making their classroom instruction better.

The CSG has supported researchers funded by awards from:
In the US: 14 governmental agencies, 26 non-governmental organizations, 25 universities
On 5 other continents: 63 governmental agencies, 10 non-governmental organizations, 30 universities
Gateways add value to many other organizations as well.

To preserve the spirit of Science Gateways, we must find a way to pitch the value proposition above the level of individual investigators. Above-campus models have been used successfully by others; see the Kuali Foundation (http://www.kuali.org/) as an example. Can such models be crafted for Gateways? To be continued.

Acknowledgements
CIPRES Science Gateway: Terri Schwartz
Hybrid Code Development: Wayne Pfeiffer, Alexandros Stamatakis
XSEDE Implementation Support: Nancy Wilkins-Diehr, Doru Marcusiu, Leo Carson, Mahidhar Tatineni
Workbench Framework: Terri Schwartz, Paul Hoover, Lucie Chan, Jeremy Carver
