Data Capture, Quality Management, and Storage Tools for

Data Capture, Quality Management, and Storage Tools for

Data Capture, Quality Management, and Storage Tools for Citizen Monitoring Groups Revital Katznelson CA State Water Resources Control Board 1 2 Today: The story of the four Functions Basic spreadsheet formats and database building block Examples of error calculation functions Advantages and disadvantages of Excel and Access Data flow 3 I wanted a data management system that has Tangible and user-friendly tools Stratified or tiered structure for different levels of detail Linkage between components Information retrieval and display tools Linkage to GIS, mapping options Compatibility with systems used by others at the Watershed, City, County, State, and Nation level Linkage to existing systems www Accessibility 4 I took a close look at available systems STORET, CCAMP, SFEI,

KRIS, CERES, CALWATER, SINC, SWIM, SWAMP 5 and discovered that we need to cater for four separate functions of a data management system (1) documentation & QA/QC; (2) storage & sharing (3) retrieval, and (4) interpretation & presentation. 6 Function (1) - Documentation & QA/QC -most is done at the monitoring Project level by folks who know about the project, - need a platform for data entry & documentation, error calculation, data verification and validation, etc., - it is easier to separate field measurements from lab analyses, - need placeholders for all essential metadata and inventories, and - can be done in MS Excel by most people, or in a combination of MS Access and Excel, if Access expertise is available. 7 Function (2) Data Storage - storage is very easy if all the information is already captured and can be stored as is, at the Project level, - sharing data with others must be selective, - only a sub-set of essential information will be uploaded onto the Project website or exported into a central database.

8 Function (3) Retrieval - requires that information is organized and interlinked in a way that allows any data user to sort, filter, group, and do any other query activity using anything from basic Excel tools to sophisticated Access or Oracle tools. - good idea to implement basic database structure (I.e., parse information into atomic bits, have only one data type in a column, and avoid mixing of apples and oranges in drop-down menus). It is also good to provide for effective linkage between data tables - if applied, any search engine and query tool can be used to retrieve your data from just about any relational database 9 Function (4) - Data Interpretation & Presentation - this can be done ONLY after the retrieval tools have extracted the desired information from the database tables effectively, - you will need additional tools for plotting, mapping, or running statistical comparisons - if you have some programming-endowed folks who like to automate it in sync with the retrieval - the sky is your limit. 10 When you plan a monitoring effort you need to know -- what needs to be done (tasks), -- who will do it (which role), -- what will they use to do it (tools and platforms), -- how much will it cost, and -- can the Project afford it. 11

Building blocks of a database. Start with Entities with Unique IDs Station ID Sample ID or Activity ID Instrument ID Project ID Trip ID Station-Visit ID Unique IDs are used for tracking, sorting, grouping, filtering 12 What do we need to capture about the Station? Waterbody/sub-watershed/watershed Hydrologic unit (CalWater, HUCS, etc) Lat-Long Position AND datum Driving directions Nearest milepost Access to Station Verbal Description of Landmarks etc. USGS gauge # (if present) Pictures! (plus many other bits of information)

13 Sample ID and Activity ID Activity can be an Observation (with verbal result), a Field Measurement (numeric result, done in Station), or a Sample (jar shipped elsewhere for analysis) For a Sample, capture the following Sampling Log information: Activity [or Sample] ID (helps tracking!) Station ID Date, Time Sampling Device Types and Number of containers Preservatives 14 The project What do we need to capture about the Project and the Project team? 1. 2. 3. 4. 5. 6. Organization Name Teams (Field Crews) People and roles

Contact Person Contact information (address, email, phone, etc) Project Duration (for STORET) 15 Instrument ID and Standard ID What do we need to capture about the Instrument? Instrument ID, Serial number, or other unique identifier Model; Type, features; Range; Resolution; Service records, etc. Standards have unique LOT numbers that can be tracked, or you can create a Standard ID . It DOES matter which one! 16 More building blocks of a database What the users of your data want to know A. How good is your data: What is the accuracy and precision of your measurements and analyses? B. What do your data represent in the environment? 17 b. When you plan a monitoring effort you also need to know what the Results will represent in the Environment Spatial descriptors Station Type : Creek, Outfall, Ditch Station Selection Intent: Impact assessment, Source ID

Reach Selection Design: Systematic, Directed, Random, Station Selection or Non-Deliberate (Anecdotal) Design: (same options) Temporal descriptors Flow Conditions: Storm runoff flows (wet) or base flow (dry) weather Sample Timing Intent: Worst case, Snapshot, Routine Monitoring Seasonal Sampling Design: Systematic, Directed, Random, etc. Diurnal Sampling Design: (same options) Season of interest: Summer, Fall 18 And let your monitoring data speak for themselves! I am the worst case scenario DO=5.6 I have been collected in a stagnant ditch at 14:00 pH=8.7 19 Case Study: Field Measurements Focus: Checking, recording, calculating, and communicating the accuracy and the precision of field measurements with probes and meters 20

(I am walking into murky waters with thorny issues here) Are you committed to deliver data of known accuracy and precision? 21 If you are Here is what it takes 1. If you calibrated an instrument, collected data, and now you are ready to calibrate again, do an accuracy check first and record the reading before any calibration adjustments. [this is the same as post-calibration check]. 2. Run periodic accuracy checks to all your non-adjustable instruments 3. Repeat discrete field measurements with each Instrument at least twice on every Trip 4. Write it all down, preferably with Instrument ID. In other words -- Assign a unique Instrument ID to every measurement device -- Link every Result with the Instrument that was used to measure it -- Link every batch of Results with Instrument calibration and accuracy checks records, and Instrument repeated measurement records, for a given period of time 22 Formats for packaging information in tables See handout: Spreadsheet formats Redundancy happens! It is inevitable, so you might as well put it where it looks into the future Go Vertical! But put in a manageable amount of records Not all bits are needed in the database, but For the number of information bits used at the project operations level (i.e., on the ground), the sky is the limit

23 Option 1: What was the actual accuracy and precision Instrument ID Characteristic (Parameter) TTP-STB01 Temperature, water ECP-STB01 Specific conductivity PHST-STB03j pH PHP-STB01 pH Results Result Units C uS/cm pH pH 14.57 758.7 8 8.34 Accuracy (Percent) -1.4 % -0.14 % 0.5 Res.

0.7% Precision 0.06 %, RPD 0.40 %, RPD 0.5 Resolution 0.12 %, RPD tion 2: What MQOs for accuracy and precision were m Instrument Result Accuracy Precision TTP-STB01 Temperature, water Units C 14.57 MQO 5% MQO 5 %, RPD ECP-STB01 Specific conductivity uS/cm 758.7 2%

ID Characteristic (Parameter) Results PHST-STB03j pH pH 8 0.5 1 %, RPD 20 %, RPD PHP-STB01 pH pH 8.34 5% 5 %, RPD 24 How is the % accuracy generated? From Post-event accuracy check (a.k.a. postcalibration) records: Reading of the instrument in Standard (before calibration adjustment), and the true value of the Standard. This data quality indicator has to be calculated for both options, and compared to MQOs for Option 2 25

Essential post-event accuracy check records Instrument Characteri Units Standard ID sitc (Paramete r) DOP-STB01 DO % sat humid air DOP-STB01 DO % sat ECP-STB01 Sp.Cond PHP-STB01 pH PHP-STB01 pH TTP-STB01 Temp TTP-STB01 Temp uS pH pH C C saturated water STB-EC10y STB-PH20f STB-PH29b TR-STB43 TR-STB43 "True" Value

Reading in Standard Drift Percent Accura cy 100 100 97.3 95 -2.7 -5 -2.7 -5.0 1412 7 9 21.5 21 1410 7.05 8.98 21.19 21.21 -2 0.05 -0.02 -0.31 0.21

-0.1 0.7 -0.2 -1.4 1.0 Differential = (Reading in Standard) (True value) Percent accuracy = ((Reading in Standard) (True value)) x 100 (True value) 26 How is the % RPD generated? From pairs of Repeated field measurements: The difference between the two values expressed as a percentage of their average. This data quality indicator has to be calculated for both options, and compared to MQOs for Option 2 27 Essential Precision Worksheet columns Instrument ID Characteristic Results (Parameter) Units mg/l DOP-STB01 DO mg/l DOP-STB01 DO % sat DOP-STB01 DO uS/cm ECP-STB01 Sp.cond. uS/cm

ECP-STB01 Sp.cond. pH PHP-STB01 pH pH PHP-STB01 pH C TTP-STB01 Temp. C TTP-STB01 Temp. * RPD is the Relative Percent Difference Result Repeated reproduci Max Result bility RPD* (RPD*) 2.84 11.96 121.5 746.9 648.4 8.61 8.55 15.97 16.19 2.65 11.68 121.5 746.7 651 8.62 8.55 15.97 16.2

RPD = ((Result) (Repeated Result Value)) x 100 ((Result) + (Repeated Result Value))/2 6.92 2.37 0.00 0.03 0.40 0.12 0.00 0.00 0.06 6.92 0.40 0.12 0.06 28 I am no less than 600 uS, no more than 700 uS I come with a cumulative error range of 50% to 100% EC=650 uS NH3=5 mg/l 29 Examples: Projects and Programs

Field data sheets in drawer (too many folks) Excel spreadsheets home made Excel spreadsheet templates and data transfer tools Excel regional database with web and data transfer interfaces Access database for Project home made Regional Access database Program central Access or Oracle centralized database 30 Models of data management systems Field operators: Partial documentation Field and other Project operators: Field and other Project operators: Data capture and entry Function (1) (2) Central program Functions (1), (2), possibly (3) and (4) National Database Functions (3) and (4) On-line Database

Functions (1) (2) (3) and (4) National Database Functions (3) and (4) 31 Functions: (1) documentation & QA/QC; (2) storage & sharing; (3) retrieval; (4) presentation. Web hosting If you want to create your own web-based database, even just for for function 3 (retrieval), check out Web Hosting opportunities: For $10-20 per month you can have Your own domain MySQL database with several GB of storage Periodic backup of your data But you will be the one designing the database with all its tools, setting it up, uploading data, and updating the data. 32 ONE Field Measurement and Recording TWO Date Entry (Direct or via Form) Generic Field Operations SOPs Instrumentspecific SOPs Field Data Sheet - HARDCOPY

"Dale" Data Entry Form Data Entry SOPs "Chris" Dale with aor DEV PDA THREE Error Assessment Data Validation RESULT electronic Table (with repeated measurements) RESULT Table Resolution FIVE Data Upload Precision Worksheet formulae for imprecision Calibration and Accuracy Checks Table - formulae for percent inaccuracy RESULT Table - Accuracy & Precision (or Resolution Uncertainty); Validity Status

dBasespecific standards & formats IT Specialist A SIX Calibration and Accuracy Checks electronic Table Data Validation SOPs "Pat" Crosswalks Data Entry Form Error Calculation SOPs "Chris" FOUR Calibration and Accuracy Checks Field Data Sheet -HARDCOPY dBase uploading protocols batch file

dBase A IT Specialist B batch file dBase B batch file dBase C 33 Excel versus Access: Advantages Advantages of Excel: Small files, easy to e-mail, easy to exchange Intuitive, easy to learn, transparent, easy to see your data Supports drop-down menus to reduce data entry errors Easy to sort and filter data Good for calculations and graphing Advantages of Access: More practical for large databases Supports sophisticated queries and security features Can generate data reports & tables in various formats Controlled data entry, and less hands-on manipulation of data Modified from a summary by Ellie Ely, November 2004 34 Excel versus Access: Disadvantages Disadvantages of Excel: File gets cumbersome with large data sets Requires a lot of hands-on manipulation

Potential for human error when manipulating data No capabilities for complex queries Disadvantages of Access: Harder to learn, takes dedication and experience Large files, 20 or 30MB -- harder to exchange Cannot do calculations or graphs (but data are easily exported to Excel for that) Modified from a summary by Ellie Ely, November 2004 35 Use capture tools for all Water Quality Data Elements (WQDEs) What Station Collection Collection Sampling Position in Instrument Characteristic visit ID Date Time Device Water ID (Parameter) Column (STORET none surface TTP-STB01 Temperature, water V1 6/22/2003 11:23:41 none surface ECP-STB01 Specific conductivity V1 6/22/2003 11:23:41 none surface DOP-STB01 Dissolved oxygen (DO) V1

6/22/2003 11:23:41 none surface PHP-STB01 pH V1 6/22/2003 11:23:41 Results Units Result C uS/cm mg/l pH 14.74 926.8 2.84 7.59 Team Name CAWCM Berkeley Ecology Center RK Crew CAWCM Wildcat Creek Monitors RK Crew Organizatio Organizational Organizationa

nal Entity Entity Type l Entity Category Contact person ID Watersehd Not-Profit Organizatio Resource n Center Watersehd Volunteer Organizatio Group n Contact Last name Contact First name Contact Role Leff Penny Trainer Contact Address Status Line 1 (volunteer or staff) Volunteer

Address Line 2 Depth (From Surface) 929 2.65 7.59 Sequencer Project Name 1 Technical Volunteer Leader Project ID Hydrologic Unit (Calwater) DQM- Protocol/S Field Operator SOP ID OP Name Reference 0.1 Operator's QA/QC Specified Error Review Date Range QA/QC Combined

Review inaccuracy Person and 10/24/2003 R. Katznelsonimprecision 1.51 10/24/2003 R. Katznelson 0.54 10/24/2003 R. Katznelson 11.92 10/24/2003 R. Katznelson 0.83 Resolution Uncertainty Factor 0.07 0.01 0.35 0.13 Document Validity Error Fidelity of Data Use ation Qualifier Range Data Potential Level Category Entry Code Adequate Valid 0 to 2% nap any use Adequate Valid 0 to 2%

nap any use Adequate Valid 10 to 20% nap any use Adequate Valid 0 to 2% nap any use 0.01 When How Why City SubWatershed Waterbody Station Name Station ID Station Type Activity or (Station Facility primary type) (Station secondary type) Alvarado WIL070a River/Strea

Station City Station County Station State Station Location Description Richmon Contra d Costa CA Alvarado Park, 61 m downstream of footbridge on path leading from McBryde Ave at junction with Park Ave. Alvarado Park, 58 m downstream of footbridge Wildcat W IL03 Variability Wildcat Creek Wildcat Creek Wildcat Creek

2 Wildcat W IL03 Variability Wildcat Creek Wildcat Creek Wildcat Creek Alvarado WIL070b River/Strea m Richmon Contra d Costa CA 3 Wildcat W IL03 Variability Wildcat W IL03 Variability Wildcat Creek Wildcat Creek

Wildcat Creek Wildcat Creek Wildcat Creek Wildcat Creek Alvarado WIL070c River/Strea m Alvarado WIL070d River/Strea m Richmon Contra d Costa Richmon Contra d Costa CA 4 Depth Interval 0.01 Param&Meth Domain od Code Code Agency Inventory #

Serial # Common Name Characteristic (Parameter) Type /Method Features 1 DOP-STB1 DOP STB nap 03A0616 (?) 2 TR-STB43 TR STB Dissolved Oxygen probe Thermometer Dissolved Oxygen Temperature

Polarographic, Rapid Pulse mercury bulb thermometer 3 TTP-STB1 TTP STB nap 01J0429 Temperature probe Temperature Thermistor 7 cm long, 1 cm 6552 diameter NIST calibrated (to be used as standard) 1 cm long extension 6560 from probe Berkeley Watershed Depth

Unit R. Katznelson R. Katznelson R. Katznelson R. Katznelson Sequencer Instrument ID Katznelso Revital n (worksheets) Replicate Duplicate Bracket or Measureme Measure Instrument nt Result ment Resolution Result 0.01 14.74 Who Wher e STORET Organizational Organizati Entity Name n ID How Good? m CA Alvarado Park, 52 m downstream of footbridge Alvarado Park, 46 m

downstream of footbridge Project ID Dataset ID Scenario or Question Station Type WIL03 what is the interRiver/Stre urban habitat variability in am Wildcat Creek during summer? WILD01 Land Use Setting Model Calibration Mode (manual or auto & standard values) automatic automatic Activity or Facility Station Selection Intent

Sample Reach Station Seasonal Timing Intent Selection Selection Sampling Design Design Design recreational park not applicable characterizati directed on directed directed Season of Interest summer Diurnal Sampling Design directed What does it represent? 36 Total Number Date of of StationStation Visits

Visit Tally 14 10/24/2003 You can package it all in the Project File "Dataset" DATASET ID "Location" STATION ID "RESULTS" . "Calibration and Accuracy Checks" INSTRUMENT ID STANDARD ID "Standards" STANDARD ID "Instrumeth" INSTRUMENT ID DATASET ID STATION ID "Sampling Log" STATION ID SAMPLE ID "Project Organization" PROJECT ID ORGANIZATION ID INSTRUMENT ID SAMPLE ID PROJECT ID

ORGANIZATION ID 37 You can have Project personnel document and manipulate the data; Then transfer only selected elements to the Central Database Project Personnel Field measuremen ts Instrument ID Repeated Measureme nts Calibration 200 Data records Elements Standard ID Calculatio n of Data Quality Indicators (e.g., accuracy, precision) and attachme nt to Results IT

guru Data transfer (selecte d Fields) Internal spreadsheets left behind (Instruments, Standards, Calibration, etc.) Data user Centra l Database 50 Data Element s 38 The Central Database can be STORET SIM 39 .Or a node in the California Cooperative Data Management System SWAM PAB 1747

Compliant EMP Node MLML Multiple Clients Other Standards Other Node Node SM ND Node Multiple Clients Other Standards Exchange Network Node BDAT Bonderson Node

Node Web Services EDSC Standards Cal/EPA CDX Node USGS CSUS Node Client AB 1747 Compliant Client AB 1747 Compliant California Environmental Data Exchange Network Source: Karl Jacob, DWR 40 . or the National data exchange network! A. Nearer-Term Vision for the Data Flow Data Access Application Washington Idaho

Network Node Data Source .xml .xml Network Node Data Source INTERNET .xml Oregon Network Node Data Source California .xml Network Node .xml Data Source .xml

Exchange Host Network Node Host Database Data Catalog EPA .xml CDX Source: Curtis Cude, Pacific Northwest Water Quality Data Exchange Data Access Application WQ Monitoring Data Warehouse 41 B. Long-Term Vision for the Data Flow Data Access Application Washington Idaho Network Node Data

Source .xml .xml Network Node Data Source INTERNET .xml Oregon Network Node Data Source California .xml Network Node .xml Data Source .xml Exchange Host EPA

Network Node Host Database CDX Data Catalog WQ Monitoring Data Warehouse Source: Curtis Cude, Pacific Northwest Water Quality Data Exchange 42 Ready to transfer your data? Find out the about the restrictions (business rules, formats, permitted values), Identify the data flow pathways, and Decide if you want to use the updatable or the non-updatable mode in your target central database. 43 XML Schema Source: Curtis Cude, Pacific Northwest Water Quality Data Exchange 44 Summary Actions for capture, quality management, and storage of monitoring data involve many

tasks, employs many roles, and require many tools The two extremes are a totally centralized system (Region or State) versus a local database at the Project level Centralized data management options require lots of resources and IT support The choice of tools and platforms are not always yours, but when it is plan ahead 45

Recently Viewed Presentations

  • Welcome Evening Monday 14th September

    Welcome Evening Monday 14th September

    Welcome Evening RACE to the top! Respectful Achieving Creative Enthusiastic Welcome to Primary 4 Mrs Lyall #Grow! Create! Shine! Aims of Meeting Informal- not specific for each child Face to name Report Format Routines and Expectations Subject Areas Homework Issues...
  • PC Computers or Desktop Computers: Usually a computer

    PC Computers or Desktop Computers: Usually a computer

    Games (solitaire, first person games like Sims, or Minecraft), ) (Bigger video Cards, with more speed for faster frame rates) Spreadsheets, inventory, accounting, etc. (MS Excel) Database Software (MS Access) Music Creation, or Player (Audio) Video Creation, or Player (video)...
  • Welcome! On green post it, please answer the following ...

    Welcome! On green post it, please answer the following ...

    The first reason Northville Public Schools should have automatic flush toilets and touchless sinks is because they are better for the environment. According to sustainability-certification.com, "If a sink drips three drops a minute it wastes about a liter a day...
  • CS310 The Linux Command Line in a Nutshell

    CS310 The Linux Command Line in a Nutshell

    Times New Roman Arial Courier New Default Design CS310 Connect to zeus.cs.pacificu.edu Connected to zeus Creating a file To create or open a file using emacs Build the executable Build a tar.gz file Makefiles Alternate Plan Write a lex and...
  • Anthology Poems at a glance: Meaning (M), Context (C), Quotes ...

    Anthology Poems at a glance: Meaning (M), Context (C), Quotes ...

    Act 2:2 -metaphor -Juliet questioning Romeo's family names importance. "Methinks I see thee now, thou art so low, as are dead in the bottom of the tomb" Act 3:5 -Juliet has a vision of Romeo lying dead. "Proud I can...
  • Fourth Degree Faithful Comptroller Financial Duties

    Fourth Degree Faithful Comptroller Financial Duties

    Overview: Collect all financial transactions from members and other sources. Record, keep and maintain an account of all monies received. Report at the regular meeting of the Grand Chapter all monies received from all sources at or between meetings.
  • Protecting Yourself from Fraud including Identity Theft

    Protecting Yourself from Fraud including Identity Theft

    Prevent business practices that are anticompetitive, deceptive or unfair to consumers. Federal Trade Commission. Consumer Financial Protection Bureau. Federal Drug Administration. Federal Communications Commission. Most Common Government Protection Agencies .
  • The Prompt

    The Prompt

    Write an essay An essay that compares AND contrasts C&C the short story and the novel And linking them through their thematic idea Using setting, character development and conflict S. of an H H.B Thematic Idea Individuality through freedom Individuality...