Gateway Challenges and Evolution: Current Trends and Practices in US Science Gateways
Mark A. Miller
San Diego Supercomputer Center

Talk overview
Part 1. Overview of US Gateway development efforts
Part 2. Overview of Issues faced by a growing Gateway

The Set of Functional Science Gateways is really, really large
The Nucleic Acids Research Bioinformatics Links Directory now contains:
134 resources
455 databases
1205 web server tools
And criteria for inclusion in that list are quite restrictive!

The Set of US projects that explicitly call themselves Science Gateways is almost manageable.

8 Science Gateways sponsored by the National Energy Research Scientific Computing Center, US Dept of Energy
name | link | subject
Deep Sky | http://deepskyproject.org/ | image querying, telescope data
The Materials Project | http://materialsproject.org/ | materials query
QCD | http://qcd.nersc.gov/ | lattice gauge theory
CXIDB | http://cxidb.org/ | coherent X-ray imaging data bank
20th Century Reanalysis | http://portal.nersc.gov/project/20C_Reanalysis/ | global weather trends
Daya Bay | http://portal.nersc.gov/project/dayabay/ | Dayabay Neutrino Detector Gateway
Earth System Grid Configuration | http://esg.nersc.gov/esgf-web-fe/ | Earth System Grid Climate Gateway and Data-node
NOVA | https://portal-auth.nersc.gov/nova/login/?next=/nova/ | NERSC online (Vienna ab initio Simulation Package) jobs
Each gateway is also marked for data or computation (data: 3 gateways; computation: 5 gateways).

35 or so XSEDE Science Gateways, sponsored by the US National Science Foundation

Current Project: Computational Portal / Computational Resource Access
Asteroseismic Modeling Portal | Stellar Astronomy and Astrophysics
Center for Multiscale Modeling of Atmospheric Processes | Atmospheric Sciences
CIG Science Gateway for the Geodynamics Community | Geophysics
CIPRES Portal for inference of large phylogenetic trees | Systematic and Population Biology
Community Climate System Model (CCSM) TeraGrid Gateway | Atmospheric Sciences
Computational Chemistry Grid | Chemistry
CyberGIS Gateway | Geosciences
Cyberinfrastructure for End-to-End Environmental Exploration Portal | Earth Sciences
Developing Social Informatics Data Grid (SIDGrid) | Language, Cognition, and Social Behavior
EPSCoR Desktop to TeraGrid EcoSystem | Systematic and Population Biology
High-Resolution Modeling of Hydrodynamic Experiments with UltraScan | Biophysics
iPlant Agave Foundation API | Integrative Biology and Neuroscience
National Biomedical Computation Resource | Integrative Biology and Neuroscience
Network for Computational Nanotechnology and nanoHUB | Emerging Technologies Initiation
Network for Earthquake Engineering Simulation | Earthquake Hazard Mitigation
Neuroscience Gateway | Neurosciences
Neutron Science TeraGrid Gateway | Materials Research
OGCE Science Gateway Portal | Information Technology and Organizations
ROBETTA: Automated Prediction of Protein Structure and Interactions | Molecular Biosciences
SCEC Earthworks Project (Tera 3D) | Seismology
Social Science Gateway | Social and Economic Science
TeraGrid Geographic Information Science Gateway | Geography and Regional Science
VLab - Virtual Laboratory for Earth and Planetary Materials | Materials Research

Current Project: Data Portals
Indiana University Centralized Life Sciences Data | Genetics and Nucleic Acids
Dark Energy Survey Data Management | Extragalactic Astronomy and Cosmology
Biodrugscore: A portal for customized scoring and ranking of molecules docked to the human proteome | Biochemistry and Molecular Structure and Function
Globus Online | Engineering Infrastructure Development
High Resolution Daily Temperature and Precipitation Data for the Northeast United States | Atmospheric Sciences
Isoscapes modeling, analysis and prediction (IsoMAP) | Environmental Biology
Linked Environments for Atmospheric Discovery | Atmospheric Sciences
Massive Pulsar Surveys using the Arecibo L-band Feed Array (ALFA) | Astronomical Sciences
Purdue Environmental Data Portal | Earth Sciences
QuakeSim | Geophysics
Science Gateway for Diffraction Facilities, Data and Methods | Chemistry
The Earth System Grid | Global Atmospheric Research

The set of US Science Gateway efforts that:
attempt to advance the architecture and methodology of Gateway creation
attempt to advance practices in Gateway scalability
is a set we can hope to cover in the context of this talk.

Generation 1 Gateway software:
The archetypal Generation 1 Gateway is a three-tiered web application. It has an HTML/JSP front end and uses Globus for job submission.
(early examples: GridSpeed, GPDK)

Generation 1 Gateways:
Many highly successful Science Gateways in the US today evolved from the Gen 1 concept.

Generation 1 Gateways: Important evolutionary adaptations:
Addition of Content Management Systems
Provide descriptions of a command line tool's interface and workflow (JSON, XML, Rappture) so users can contribute new interfaces/tools
Add support for data sharing
Provide for Social Networking
Deploy pluggable job submission tools that allow many choices for submission
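
As a rough illustration of the "description of a command line tool's interface" adaptation listed above, the sketch below turns a small JSON tool description plus user-supplied values into a command line. The schema (the field names "tool", "parameters", "flag", "type", "required", "default") and the tool name "mytool" are hypothetical and chosen only for this example; they are not the format used by Rappture, Galaxy, or the Workbench Framework.

```python
import json
import shlex

# Hypothetical tool description; the field names are illustrative only.
TOOL_DESCRIPTION = json.loads("""
{
  "tool": "mytool",
  "parameters": [
    {"name": "input",   "flag": "-i", "type": "file",   "required": true},
    {"name": "threads", "flag": "-t", "type": "int",    "default": 4},
    {"name": "verbose", "flag": "-v", "type": "switch", "default": false}
  ]
}
""")


def build_command(description, user_values):
    """Turn a tool description plus user-supplied values into an argv list."""
    argv = [description["tool"]]
    for param in description["parameters"]:
        name = param["name"]
        # Fall back to the declared default when the user gave no value.
        value = user_values.get(name, param.get("default"))
        if value is None:
            if param.get("required"):
                raise ValueError(f"missing required parameter: {name}")
            continue
        if param["type"] == "switch":
            # Switches carry no argument; emit the flag only when enabled.
            if value:
                argv.append(param["flag"])
        elif param["type"] == "int":
            argv.extend([param["flag"], str(int(value))])
        else:
            argv.extend([param["flag"], str(value)])
    return argv


command = build_command(TOOL_DESCRIPTION, {"input": "seqs.fasta", "verbose": True})
print(shlex.join(command))  # prints: mytool -i seqs.fasta -t 4 -v
```

Because the interface is data rather than code, a user can contribute a new tool by writing a description like the one above, without touching the gateway's submission logic.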
Generation 1 Gateways: Platform: HubZero
Highly developed tools for sharing and collaborating
Joomla! CMS
Provides user with a VM instance, which gives lots of flexibility in interaction.
Rappture toolkit allows user creation of tools with fluid, interactive interfaces
Emphasis on modelling and demonstrations
Relatively lightweight computing within the VM
Some groups have connected the VM to NFS/job submission tools
[Figure: HubZero usage statistics; credit: M. McLennan]
HubZero instances can be purchased.

Generation 1 Gateways: Platform: Galaxy
Project historically encourages interaction between developers and users.
Distributable Galaxy package was not originally designed for submission to remote resources, but this capability has been developed.
Cloud instances available.
Supports flexible addition of command line tools.
Main server has 30,000 registered users / 160,000 jobs per month.
Many, many local Galaxy installations around the world.
Galaxy is highly customizable, but is presented only as a Genomics Application

Generation 1 Gateways: Platform: Workbench Framework
Emphasis on submission of long-running command line jobs to remote HPC resources.
Supports flexible addition of command line tools.
Main server has 6,000 registered users / 8,000 jobs per month / supported 600+ publications.
Brings 29% of all XSEDE users each quarter, who use 0.7% of all XSEDE resources.
3 other Gateways now use the Workbench Framework.

Generation 2 Gateway software:
The archetypal Generation 2 Gateway:
a distributable portlet container provides the essential login/user management functions via portlets out of the box.
new portlets can be added scalably and can be shared between Gateways
flexible methods for job submission
the GUI appearance is customizable by the user
(early examples: GridSphere; JetSpeed, IBM WebSphere)

Generation 2 Gateways:
Many US Science Gateways were built from the Gen 2 concept using GridSphere. These include: LEAD, NBCR, TGIS, VLab, IsoMAP, QuakeSim, PEDP, CCSM.
The GridSphere project has ended, but some of these Gateways are still functioning.

Generation 2 Gateways: Important evolutionary events:
The Open Grid Computing Environment software (OGCE) implements access to infrastructure Web services via portlets. The goal is to make it possible to centralize infrastructure web services.
OGCE established a repository of web service portlets.

Generation 2 Gateways:
[Figure: The VLab portal, built on GridSphere]

Generation 3 Gateway software:
The archetypal Generation 3 Gateway uses a GUI/presentation layer created by the Gateway developer to consume infrastructure services made available via a public API.
(current examples: OGCE; iPlant Foundational API; NEWT)
[Diagram: Domain Gateway Browser Interface; Domain Gateway Web Server; Domain Gateway Application Manager]
All middleware services are provided from a single production location

Generation 3 Gateways: Platform: OGCE
Tools for creating portals using centralized infrastructure web services
OGCE Services currently support several production Gateways.
OGCE Project has committed to open governance:
Apache RAVE for Interface
Apache Airavata for middleware
Home page: http://www.collab-ogce.org/ogce/index.php/Main_Page
Generation 3 Gateways: Platform: iPlant AGAVE Foundational API
Provides centralized public web services for access to XSEDE data and compute resources
https://foundation.iplantcollaborative.org/

Generation 3 Gateways: Platform: NERSC Web Toolkit (NEWT)
Provides centralized public web services for access to NERSC data and compute resources
https://newt.nersc.gov/
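
To make the Generation 3 pattern concrete, here is a minimal sketch of a gateway back end that consumes centralized infrastructure services over HTTP instead of running its own middleware. Everything service-specific is an assumption made for illustration: the base URL, the "/jobs" resource, the payload fields, and the bearer-token header are hypothetical and do not describe the actual OGCE, iPlant Foundation API, or NEWT interfaces. The requests are built but not sent, so the sketch runs without network access.

```python
import json
import urllib.request

# Hypothetical service root; a real gateway would use its provider's documented URL.
BASE_URL = "https://services.example.org/api/v1"


def build_job_request(token, application, input_url, cores):
    """Build (but do not send) a POST request that submits a job."""
    payload = {"application": application, "input": input_url, "cores": cores}
    return urllib.request.Request(
        url=f"{BASE_URL}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_status_request(token, job_id):
    """Build (but do not send) a GET request that polls a job's status."""
    return urllib.request.Request(
        url=f"{BASE_URL}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


request = build_job_request(
    "TOKEN", "tree-inference", "https://data.example.org/seqs.fasta", 64
)
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
```

The gateway developer owns only the presentation layer and these thin request builders; authentication, data movement, and job management live behind the public API.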
Generation 3 Gateways:
The Gen 3 concept is still in rapid evolution, with results to be determined.

Another Important Project.
"Are you building websites that serve your science discipline? Do you wish you could connect with and learn from others who are doing the same thing? We are building an institute to serve you, and others like you, with resources, services, experts, and ideas for creating and sustaining science gateways. Sign up to join the conversation: http://sciencegateways.org/volunteer/"
science gateway, n. 1. an online community space for science and engineering research and education. 2. a Web-based resource for accessing data, software, computing services, and equipment specific to the needs of a science or engineering discipline.

Folks from the OGCE, iPlant, and HubZero projects have partnered with Nancy Wilkins-Diehr of SDSC to create a Science Gateways Institute. They have received preliminary funding to prepare a proposal for submission in 2014.
Assist with the entire lifecycle of a gateway:
Business plan development and review
Development environment, consulting, documentation and software recommendations
Software repositories
Software engineering facilities
Software assessment services like Open Source Software Advisory Service, Apache assessment service, Software Sustainability Institute (UK)
Build-and-test facilities
Hosting service
Offering gateways expertise in the following areas:
Usability assessment
Licensing
Sustainability
Project management
Security

Summary:
The most successful US Gateways in production today are evolved versions of Gen 1 architectures.
The Gen 2 concept of the portlet container did not get sufficient traction in the US. It was hampered by high overhead and unmet expectations, leading to low adoption.
The Gen 3 concept is still evolving rapidly in several projects. Centralized infrastructure web services are used by several production Gateways, though this is not yet a generic solution.
Success of the Gen 3 concept will depend on its rate of adoption and the ability to recruit/engage a critical mass of developers.
The ScienceGateways.org project may help bring focus to gateway development and sustainability efforts.

Talk overview
Part 1. Overview of US Gateway development efforts
Part 2. Overview of Issues faced by a growing Gateway

Phylogenetics is the study of diversification of life on the planet Earth, both past and present, and the relationships among living things through time.
Evolutionary relationships can (for the most part) be represented as a directed acyclic graph.

Evolutionary relationships can be inferred from DNA sequence comparisons:
Align sequences to determine evolutionary equivalence
Infer evolutionary relationships based on some set of assumptions
Tree inference is NP-hard; even with heuristics, the codes are compute-intensive, and desktop computing is no longer adequate.

Workflow for the CIPRES Gateway:
Assemble Sequences
Upload to Portal
Store
Run Alignment
Run Tree Inference
Post-Tree Analysis
Download
Make all command line options available
Make parallel codes available

What if you build it and too many people come?
[Figure: core hours (SUs, in thousands) by month]

To optimize resource use:
Ensure resource use is as efficient as possible
Make sure resources are used effectively

Efficiency of resource use:
All codes are benchmarked and configured for good efficiency automatically, based on user input
Make the system robust to system outages, so running jobs are not lost when communication between the server and the compute resource is severed (saved 7% of long jobs).
Monitor resource use for anomalous spikes in resource consumption per job (e.g. identify and eliminate file system incompatibilities with code).

Monitor resource distribution:
Identify usage patterns

Usage in the Reporting Period Sept 2010 to May 2011
Core hours used | % of Users | % total SU
0 to 30 K | 97 | 45
30 K to 300,000 K | 3 | 55

We need to monitor individual users, because we want all XSEDE users to be subject to the same level of peer review.

Monitor resource distribution:
Identify usage patterns
Establish a Fair Use policy
Create tools to enforce the Fair Use policy
Engage users in monitoring their own usage

Establish a Fair Use Policy
Users in the US are permitted to use 50,000 core hours from the community allocation annually.
Users at non-US institutions can use up to 30,000 core hours annually.
Users at US institutions can apply for a personal XSEDE allocation if they require more core hours.

Create tools to Enforce Fair Use Policy
Tools to track usage by each user
Tools to disable submission from over-active accounts
Tools to notify users when they reach thresholds of use

Help users track their resource consumption:
Notify users of their usage level
Create a conditional warning element in the interface XML
[Screenshot: Core Hours Consumed]

Impact of new policies/tools on user demographics:
[Table comparing 2010/11 and 2011/12]

Impact of new policies/tools:
[Figure: Usage, Dec 2009 to Feb 2013; SUs/month (in thousands) and jobs submitted/month]
Users submit 160 more jobs each month
29,000 more core hours requested each month
Projected use for 2013-2014 is 20 million core hours
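
The Fair Use policy and the three enforcement tools described above (track usage, notify at thresholds, disable over-active accounts) can be sketched as follows. The annual limits of 50,000 and 30,000 core hours come from the policy; the 50% and 80% notification thresholds, the record layout, and the action names are assumptions made for illustration, not the CIPRES implementation.

```python
from dataclasses import dataclass

# Annual limits on the community allocation, in core hours (from the Fair Use Policy).
US_LIMIT = 50_000
NON_US_LIMIT = 30_000
# Assumed notification thresholds, as a percentage of the annual limit.
WARNING_PERCENTS = (50, 80)


@dataclass
class UserUsage:
    username: str
    us_institution: bool
    core_hours_used: float


def annual_limit(user):
    """Return the user's annual core-hour limit under the policy."""
    return US_LIMIT if user.us_institution else NON_US_LIMIT


def evaluate(user):
    """Return the action a (hypothetical) enforcement tool would take."""
    percent_used = 100 * user.core_hours_used / annual_limit(user)
    if percent_used >= 100:
        return "disable_submission"
    # Report the highest warning threshold the user has crossed.
    for threshold in sorted(WARNING_PERCENTS, reverse=True):
        if percent_used >= threshold:
            return f"notify_{threshold}_percent"
    return "ok"


users = [
    UserUsage("alice", True, 10_000),   # 20% of 50,000
    UserUsage("bob", False, 24_000),    # 80% of 30,000
    UserUsage("carol", True, 50_000),   # at the limit
]
for user in users:
    print(user.username, evaluate(user))
```

A US user who hits "disable_submission" would, under the policy, be directed to apply for a personal XSEDE allocation.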
Impact of Policy on Usage, Dec 2009 to Feb 2013
[Figure: users per month (total, repeat, and new users) by year, 2010 to 2013]
Growth in resource usage is driven primarily by new users, not by waste or high use by a few users.

What if you build it and too many people come?
[Figure: core hours (SUs, in thousands) by month, compared with the initial allocation]
At 14 million core hours/year, the 2012/2013 CIPRES Science Gateway (CSG) brings 29% of all XSEDE users, and consumes only about 0.7% of allocatable XSEDE resources.

BUT... The CIPRES use case is different from the typical XSEDE resource request:
Most tree inference codes scale to no more than 64 cores.
20% of CSG users are students in classes, so queue time matters
88% of CSG jobs complete within 12 hours, so queue time matters
3% of CSG jobs run for more than 1 week and most codes have no restart capability, so run times of up to 334 hours are required.
These jobs are not a good fit for the intent of the large XSEDE machines.

Important Policy Moment:
Based (in part) on our use case, the US NSF created the Trestles cluster to provide on-demand computing (Thanks, NSF!):
Trestles is managed and allocated to keep queue depth near zero
Administrators allow CSG to run jobs for 334 hours
The machine is significant in size, but small jobs (64 cores or less) are welcomed

CIPRES usage now amounts to 21% of the entire allocatable Trestles machine. What if there were 4 Gateways as successful as CIPRES? Or, for that matter, what about CIPRES in 2017?
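
The arithmetic behind the "4 Gateways" question can be worked through directly from the figures quoted in the talk (14 million core hours/year, about 0.7% of allocatable XSEDE resources, 21% of allocatable Trestles). The sketch below is only a back-of-the-envelope check; it assumes the quoted percentages are exact and uses integer arithmetic to avoid rounding noise.

```python
# Figures quoted in the talk.
CIPRES_CORE_HOURS_PER_YEAR = 14_000_000
XSEDE_SHARE_PER_MILLE = 7        # about 0.7% of allocatable XSEDE resources
TRESTLES_SHARE_PERCENT = 21      # 21% of the allocatable Trestles machine

# Total allocatable XSEDE core hours implied by "14 million is 0.7%".
implied_xsede_core_hours = CIPRES_CORE_HOURS_PER_YEAR * 1000 // XSEDE_SHARE_PER_MILLE

# Share of Trestles consumed by four gateways the size of CIPRES.
four_gateway_share_percent = 4 * TRESTLES_SHARE_PERCENT

# Largest number of CIPRES-sized gateways that fit on Trestles at all.
max_gateways_on_trestles = 100 // TRESTLES_SHARE_PERCENT

print(implied_xsede_core_hours)    # prints: 2000000000
print(four_gateway_share_percent)  # prints: 84
print(max_gateways_on_trestles)    # prints: 4
```

Under these assumptions, four gateways as successful as CIPRES would consume 84% of Trestles, and a fifth would not fit, even though each one is a rounding error against XSEDE as a whole.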
If Gateways are valued, we will need supportive policy decisions at the National/International level about HPC resource allocation.
More investment by the US in on-demand HPC computing?
Accommodation of jobs that scale to a small number of cores
Investment by other countries in on-demand HPC as a fraction of their total HPC resources, perhaps as a consortium?
Otherwise:
Decrease user base by eliminating non-US users? (about half of the total user base)
Require fee-for-service for non-US users, high-end users, all users?
How to keep the CIPRES Gateway operating?
Annual operating budget ~ $200,000 per year to keep the server functioning
20 million core hours of compute time for 2013-2014 (generously provided by NSF)
Served 2,800+ scientists in 2012/2013 allocation year
250+ publications enabled in 2012/2013
58+ instructors supported in 2012/2013

There is a strong dynamic tension between innovation and infrastructure
"I think what your Gateway has accomplished is impressive, but unless your proposal describes a plan to create new capabilities that do not exist anywhere, you will not get the scores required to win an award."
Project Officer, US NSF Division of Biological Infrastructure
In other words, we're having big impact, but NSF isn't going to pay us to make the CIPRES software better for users (or even to continue operations). Now what?
Survival Strategy 1. Innovate!

Workflow for the CIPRES Gateway: Assemble Sequences, Upload to Portal, Store, Run Alignment, Run Tree Inference, Post-Tree Analysis, Download

These are highly-evolved desktop/browser applications that have no tree inference tools or are underpowered:
raxmlGUI
Influenza DB
These projects offer powerful and distinct user experiences, and are interested in incorporating powerful tree inference tools into an existing application.

RESTful Services will put CIPRES in many environments
[Diagram: XSEDE; CSG; parallel codes; raxmlGUI]
With a variety of new user interfaces!
We will be adding complexity, with significant risk, and significant potential benefit. Stay tuned.

There is a strong dynamic tension between innovation and infrastructure
"Developers may address new research topics in the course of gateway design in order to further their academic goals. Resulting gateways may be more complex than necessary, less reliable, and may not meet the goals of the domain science community for whom they were designed."
"Focus group participants noted that sometimes simple tools are all that is needed to enable cutting edge science, but [Gateway developers] 'make the easy things hard.'"
Wilkins-Diehr, N., and Lawrence, K. A. (2010) in Gateway Computing Environments Workshop (GCE), 2010

There is a strong dynamic tension between innovation and infrastructure
To stay federally funded we must continually innovate. This is not necessarily what users want or need.

Survival Strategy 2. Identify new funding models
Let's start by clarifying the value proposition:

Random user feedback: "It is hard for me to imagine how I could work at a reasonable pace without this resource, especially when things like MS or grant submission deadlines loom."
Gateways add value to Universities by making their professors more competitive

Random user feedback: "It is an easy-to-use cluster to run BEAST analyses in a short time. This allows students to run analyses that actually converge in a single class. I found it is important to be able to let the student explore the analysis 'all the way', i.e. not just show the principle but actually let them run an entire Markov chain and let them evaluate the results. For that I found that having access to the CIPRES Science Gateway to be crucial."
Gateways add value to Universities by making their classroom instruction better

The CSG has supported researchers funded by awards from:
In the US: 14 governmental agencies, 26 non-governmental organizations, 25 Universities
On 5 other continents: 63 governmental agencies, 10 non-governmental organizations, 30 Universities
Gateways add value to many other organizations as well.
To preserve the spirit of Science Gateways, we must find a way to pitch the value proposition above the level of individual investigators.
Above-campus models have been used successfully by others: see the Kuali Foundation (http://www.kuali.org/) as an example.
Can such models be crafted for Gateways?
To be continued.

Acknowledgements:
CIPRES Science Gateway: Terri Schwartz
Hybrid Code Development: Wayne Pfeiffer, Alexandros Stamatakis
XSEDE Implementation Support: Nancy Wilkins-Diehr, Doru Marcusiu, Leo Carson, Mahidhar Tatineni
Workbench Framework: Terri Schwartz, Paul Hoover