Biomedical Ontologies How to make and use them


Nigam Shah, Post-doctoral Fellow, SMI
Barry Smith, Professor of Philosophy, University at Buffalo

Data explosion in the life sciences

Sequence information
- The first data type to be available in large amounts
- Has had the maximum time to be standardized
- FASTA format is the most popular
Expression information
- Recent rise in abundance
Transcription factor binding information
- Relatively recent rise in availability
- ChIP, array based
Protein-protein interaction information
- High throughput available in yeast

Past knowledge, traditional experiments, published papers.

So many biological databases, so little time
More than 1000 different databases!
Some biological databases: AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, Genbank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-lycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD, YPM, etc.

More data is good, what's the problem?
Too unstructured:
- from a variety of incompatible sources

- no standard naming convention
- each with a custom browsing and querying mechanism
- and poor interaction with other data sources
Difficult to use and understand the available data, information and knowledge

Ontologies to the rescue
- Ontologies provide a formal specification of how to represent objects, concepts and the relationships among them
- Ontologies provide a shared understanding [language] for communicating biological information
- Ontologies overcome the semantic heterogeneity commonly encountered in biomedical databases
- Ontologies are interpretable by humans and by computer programs

Copyright Stanford University 2006

[Figure: tutorial outline, Parts 1 to 5]

Uses of ontologies
1. Naming things
   - Reference ontologies
   - Controlled terms for annotating things
2. As a data exchange format
3. Define a knowledgebase schema
4. Computer reasoning over data
5. Driving NLP
6. Information integration

The Gene Ontology
www.geneontology.org
The Gene Ontology (GO) project is an effort to provide consistent descriptions of gene products.
The project began as a collaboration between three model organism databases:
- FlyBase (Drosophila)
- Saccharomyces Genome Database (SGD)
- Mouse Genome Database (MGD)

GO creates terms for:
- Biological Process
- Molecular Function
- Cellular Component

[Figure: Biological Process]
Nat Genet. 2000 May;25(1):25-9.

Use of GO for analysis: Shared GO terms

MeSH = Medical Subject Headings
www.nlm.nih.gov/mesh
- Controlled vocabulary for indexing biomedical articles
- 19,000 main headings organized hierarchically
- Implicit semantics of parent-child relationships
- Multiple inheritance
- List of subheadings attached to main headings as modifiers

MeSH Subtrees
1. Anatomy [A]
   Body Regions [A01] +
   Musculoskeletal System [A02]
   Digestive System [A03] +
   Respiratory System [A04] +
   Urogenital System [A05] +
   Endocrine System [A06] +
   Cardiovascular System [A07] +
   Nervous System [A08] +
   Sense Organs [A09] +
   Tissues [A10] +
   Cells [A11] +
   Fluids and Secretions [A12] +
   Animal Structures [A13] +
   Stomatognathic System [A14]
   (...)

Body Regions [A01]
   Abdomen [A01.047]
      Groin [A01.047.365]
      Inguinal Canal [A01.047.412]
      Peritoneum [A01.047.596] +
      Umbilicus [A01.047.849]
   Axilla [A01.133]
   Back [A01.176] +
   Breast [A01.236] +
   Buttocks [A01.258]
   Extremities [A01.378] +
   Head [A01.456] +
   Neck [A01.598]
   (...)

MeSH Headings in an article
[Slide callouts: main headings, supplementary heading, major heading, minor heading, qualifier]
MH - Adult
MH - Antipsychotic Agents/pharmacology/*therapeutic use
MH - Comparative Study
MH - Dose-Response Relationship, Drug
MH - Female
MH - Genotype
MH - Human
MH - Male
MH - Pharmacogenetics
MH - Polymorphism (Genetics)/*genetics
MH - Prognosis
MH - Psychiatric Status Rating Scales
MH - Receptors, Serotonin/drug effects/*genetics
MH - Risperidone/pharmacology/*therapeutic use
MH - Schizophrenia/diagnosis/*drug therapy/genetics
MH - Schizophrenic Psychology
MH - Support, Non-U.S. Gov't
MH - Treatment Outcome

Use of MeSH for Information Retrieval
Computational Biology [MH] AND Medical Informatics [MH]

Foundational Model of Anatomy

sig.biostr.washington.edu/projects/fm/
- Long-term project at University of Washington to create a comprehensive ontology of human anatomy
- 72K concepts, 1.9M relationships
- Rich semantics

Structure of FMA
[Figure: is_a and part_of hierarchy. Types shown: Anatomical Structure, Organ, Organ Part, Organ Component, Organ Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Organ Cavity, Organ Cavity Subdivision, Anatomical Space. Example entities: Pleural Sac, Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Pleura (Wall of Sac), Mesothelium of Pleura]

Use of FMA: Image annotation
[Figure: heart images annotated with the labels LA, RA, LV, RV, RAA]
- Images possess no knowledge of their contents
- FMA-based image annotation provides that knowledge

Uses of ontologies
1. Naming things
2. As a data exchange format
3. Define a knowledgebase schema
4. Computer reasoning over data
5. Driving NLP
6. Information integration

MGED Ontology
www.mged.org

- Provides standard terms for annotation of microarray experiments
- Enables unambiguous descriptions of how the experiment was performed
- Enables structured queries of elements of the experiments

MGED Ontology Browser
http://nciterms.nci.nih.gov/priv_mged_o/Connect.do

Use of MGED Ontology: ArrayExpress query form

Uses of ontologies
1. Naming things
2. As a data exchange format
3. Define a knowledgebase schema
4. Computer reasoning over data
5. Driving NLP
6. Information integration

Ontologies support reasoning
- Reasoning = inferring new knowledge from existing assertions
- Reasoning is often of two types:
  - Closed world
  - Open world

Virtual Soldier Project
Our task: use geometric models to predict expected organ damage from penetrating injury
- Given: 3-D volumetric imaging data
- Given: injury trajectory
- Predict: organ damage and extent of injuries
This task requires anatomic reasoning

Defining anatomic structures in terms of vascular supply
[Screenshot: FMA OWL concept definitions. Callout: "This organ's arterial supply is defined here"]

An example injury
[Figure: bullet trajectory (hitting coronary artery)]
A bullet path is described, and predicted primary injuries are displayed

Inferring Injury Propagation
[Figure labels: totally ischemic myocardium, partially ischemic myocardium]
A computer reasoning service deduces parts of the myocardium that are at risk consequent to injury of a coronary artery, shown as highlighted structures in the ontology (above) and as shaded parts of the image of the heart (right).
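The kind of inference described on this slide can be illustrated with a small sketch. It is not the Virtual Soldier reasoning service: the artery and region names, the two dictionaries and both functions are hypothetical and anatomically simplified. The rule assumed here is that a region is totally ischemic when every artery supplying it is compromised, and partially ischemic when only some are.

```python
# Hypothetical, simplified data: which arteries branch from which,
# and which arteries supply which region of the myocardium.
ARTERY_BRANCHES = {
    "left coronary artery": ["anterior interventricular artery", "circumflex artery"],
    "anterior interventricular artery": [],
    "circumflex artery": [],
    "right coronary artery": [],
}

ARTERIAL_SUPPLY = {
    "anterior wall of left ventricle": {"anterior interventricular artery"},
    "lateral wall of left ventricle": {"circumflex artery"},
    "interventricular septum": {"anterior interventricular artery", "right coronary artery"},
    "wall of right ventricle": {"right coronary artery"},
}


def compromised_arteries(injured, branches):
    """Return the injured artery together with every artery downstream of it."""
    seen = set()
    stack = [injured]
    while stack:
        artery = stack.pop()
        if artery not in seen:
            seen.add(artery)
            stack.extend(branches.get(artery, []))
    return seen


def classify_ischemia(injured, branches, supply):
    """Label each region whose arterial supply is fully or partly lost."""
    lost = compromised_arteries(injured, branches)
    report = {}
    for region, arteries in supply.items():
        if arteries <= lost:        # every supplying artery is compromised
            report[region] = "totally ischemic"
        elif arteries & lost:       # only some supplying arteries are compromised
            report[region] = "partially ischemic"
    return report


report = classify_ischemia("anterior interventricular artery", ARTERY_BRANCHES, ARTERIAL_SUPPLY)
```

The point of the sketch is that the conclusion follows from the vascular-supply definitions alone; no injury-specific rule is written for any region.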

Various meanings of Ontology
- Philosophy: Ontology is the study of what entities and what types of entities exist in reality.
- AI: An ontology is an explicit specification of concepts & relationships that can exist in a domain of discourse
- IT: An ontology is a data model that represents a domain and is used to reason about the objects in that domain and the relations between them

The common ground
Ontology = A specification of entities (or concepts), relations, instances and axioms in an area of study.

ENTITIES

Representing entities
1. Physical Reality
   A. The reality on the side of the patient
2. Psychological Reality = our knowledge and beliefs about 1.
   B. Cognitive representations of this reality on the part of clinicians
3. Propositions, Theories, Texts = formalizations of those ideas and beliefs
   C. Publicly accessible concretizations of these cognitive representations in textual, graphical and digital artifacts

Definitions
- Entity = anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software (Levels 1, 2 and 3)
- Domain = a portion of reality that forms the subject-matter of a single science or technology or mode of study
- Representation = an image, idea, map, picture, name or description ... of some entity or entities
- Representational Units = terms, icons, alphanumeric identifiers ... which refer, or are intended to refer, to entities

A representation is not the same as the entity it represents
Brain of Mr. X
Ontology

CT Scan of the Brain of Mr. X

Ontologies do not represent concepts in people's heads

So, an Ontology
Ontology = a representational artifact whose representational units (drawn from a natural or formalized language) are intended to represent
- types [of entities] in reality
- those relations between these types which are true universally (= for all instances)
  lung is_a anatomical structure
  lobe of lung part_of lung

Results in
A tension between computer scientists and philosophers.
- Philosophers' view: if the ontology is built to represent reality, then the exchange formats and data models based on it always remain valid, allowing interoperability
- Computer scientists' view: KISS

Results in the need to distinguish
- Ontologies, terminologies, catalogs: represent what is general in reality = types [classes]
- Databases, inventories: represent what is particular in reality = instances

Types
[Figure: hierarchy of types (Substance, Organism, Animal, Mammal, Cat, Frog), with a leaf node marked and instances shown beneath the types]

Classes (Types) & Defined classes (Fiat types)
- Class = a maximal collection of particulars determined by a general term ('cell', 'oophorectomy', 'VA Hospital', 'breast cancer patients in VA Hospital')
- the class A = the collection of all particulars x for which 'x is A' is true
- Defined Class = a class defined by a general term which does not designate a type in reality, e.g. pathways

types < defined classes < 'concepts'
- Not all of those things which people like to call 'concepts' correspond to defined classes
- 'Surgical or other procedure not carried out because of patient's decision' is a concept in SNOMED

Ontologies that represent concepts tend to make mistakes
1. congenital absent nipple is_a nipple
2. failure to introduce or to remove other tube or instrument is_a disease
3. bacteria causes experimental model of disease
Concepts do not stand in part_of, connectedness, causes, treats ... relations to each other

A Terminology is
A representational artifact whose representational units are natural language terms (with IDs, synonyms, comments, etc.) which are intended to represent defined classes.
Most medical ontologies are terminologies

The International Classification of Diseases
724      Unspecified disorders of the back
724.0    Spinal stenosis, other than cervical
724.00   Spinal stenosis, unspecified region
724.01   Spinal stenosis, thoracic region
724.02   Spinal stenosis, lumbar region
724.09   Spinal stenosis, other
724.1    Pain in thoracic spine
724.2    Lumbago
724.3    Sciatica
724.4    Thoracic or lumbosacral neuritis
724.5    Backache, unspecified
724.6    Disorders of sacrum
724.7    Disorders of coccyx
724.70   Unspecified disorder of coccyx
724.71   Hypermobility of coccyx
724.79   Coccygodynia
724.8    Other symptoms referable to back
724.9    Other unspecified back disorders

ICD9 (1977): A Handful of Codes for Traffic Accidents

ICD10 (1999): 587 codes for such accidents
- V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
- W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
- X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

RELATIONSHIPS

The is_a relation
What does 'A is_a B' mean?
For all x, if x instance_of A then x instance_of some B
cell division is_a biological process

ALL-SOME STRUCTURE

The part_of (vs. has_part) relation
- Human being has_part testis?
- human testis part_of human being?
- Human being has_part heart?
- human heart part_of human being?
A part_of B = all instances of A are instance-level parts of some instance of B
- human testis part_of human being
- human heart part_of human being

Two kinds of parthood
- between instances:
  Mary's heart part_of Mary
  this nucleus part_of this cell
- between types:
  human heart part_of human
  cell nucleus part_of cell

The part_of relation
What does 'A part_of B' mean?
For all x, if x instance_of A then there is some y, y instance_of B and x part_of y, where part_of is the instance-level part relation
cell nucleus part_of cell
ALL-SOME STRUCTURE

A part_of B, B part_of C ...
The all-some structure of the definitions allows cascading of inferences
1. within ontologies
2. between ontologies
3. between ontologies and EHR repositories of instance-data
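The all-some reading of part_of can be checked mechanically against instance data, and type-level assertions can then be chained. The sketch below is illustrative only: the instance names, the two data tables and the functions `type_part_of` and `cascade` are hypothetical, and the chaining step assumes that instance-level part_of is transitive.

```python
# Hypothetical instance data: each particular is an instance of one type.
INSTANCE_OF = {
    "nucleus_1": "cell nucleus",
    "nucleus_2": "cell nucleus",
    "cell_1": "cell",
    "cell_2": "cell",
}

# Instance-level parthood: (part, whole) pairs between particulars.
INSTANCE_PART_OF = {("nucleus_1", "cell_1"), ("nucleus_2", "cell_2")}


def type_part_of(a, b, instance_of, instance_part_of):
    """A part_of B: every instance of A is part of some instance of B.

    Note: this is vacuously true when A has no recorded instances.
    """
    a_instances = [x for x, t in instance_of.items() if t == a]
    return all(
        any((x, y) in instance_part_of and instance_of[y] == b for y in instance_of)
        for x in a_instances
    )


def cascade(assertions):
    """Chain type-level assertions: A part_of B and B part_of C give A part_of C."""
    closure = set(assertions)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


nucleus_in_cell = type_part_of("cell nucleus", "cell", INSTANCE_OF, INSTANCE_PART_OF)
cell_in_nucleus = type_part_of("cell", "cell nucleus", INSTANCE_OF, INSTANCE_PART_OF)
chained = cascade({("A", "B"), ("B", "C")})
```

The second check fails because no cell is part of a nucleus, which is the asymmetry between part_of and has_part that the slides point at.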

Mathematical properties matter

Properties of relations
1. Transitivity
2. Symmetry
3. Reflexivity
4. Anti-symmetry

Expectations of symmetry may hold only at the instance level
- if A interacts with B, it does not follow that B interacts with A
- if A is expressed simultaneously with B, it does not follow that B is expressed simultaneously with A
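For a relation given as a finite set of pairs, the four properties listed above can be tested directly. This is a minimal sketch; the function names and the example relations are hypothetical and chosen only to mirror the slide.

```python
def is_transitive(rel):
    """(a, b) and (b, c) in rel imply (a, c) in rel."""
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def is_symmetric(rel):
    """(a, b) in rel implies (b, a) in rel."""
    return all((b, a) in rel for (a, b) in rel)


def is_reflexive(rel, domain):
    """Every element of the domain is related to itself."""
    return all((x, x) in rel for x in domain)


def is_antisymmetric(rel):
    """(a, b) and (b, a) both in rel only when a == b."""
    return all(a == b for (a, b) in rel if (b, a) in rel)


# Hypothetical instance-level parthood, including the implied pair.
part_of = {("nucleus", "cell"), ("cell", "tissue"), ("nucleus", "tissue")}

# Hypothetical type-level assertion: A interacts_with B, recorded one way only.
interacts_with = {("A", "B")}

checks = {
    "part_of transitive": is_transitive(part_of),
    "part_of symmetric": is_symmetric(part_of),
    "part_of antisymmetric": is_antisymmetric(part_of),
    "part_of reflexive": is_reflexive(part_of, {"nucleus", "cell", "tissue"}),
    "interacts_with symmetric": is_symmetric(interacts_with),
}
```

Declaring such properties explicitly matters because a reasoner will only add the pair (B, A) for interacts_with if the relation is stated to be symmetric.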

Other Ontology-like things
- Controlled vocabulary = a list of explicitly enumerated, unambiguous terms, controlled by a central registration authority
- Taxonomy = a collection of controlled vocabulary terms organized into a hierarchy
- Thesaurus = a collection of controlled vocabulary terms organized into a specialized network

Increasing formality
[Figure: spectrum of increasing formality. Originally by Michael Uschold, with permission]

Application vs. Reference Ontologies
A reference ontology is analogous to a scientific theory:
- it consists of representations of biological reality which are correct according to our current understanding.
An application ontology is a software artifact:
- for structuring data according to some hierarchy of classes,
- for the purpose of managing and manipulating that data,
- supporting interoperability of various resources.
As far as possible, we should develop [scientific] information models, data models, process models etc. so that they stay as close as possible to, and refer to, reference ontologies.

Languages [formalisms] for Ontologies
- There are numerous ways of declaring both reference and application ontologies
- Almost all ontology languages give you the ability [and syntax] for declaring entities and relationships
- The main differences are in the ability [and mechanism] of describing the attributes of the entities and the mathematical properties of the relationships.
  http://xml.coverpages.org/OntologyExchange.html
- Another major difference is the level of tool support available for writing in that language.
  http://xml.com/2002/11/06/Ontology_Editor_Survey.html

A partial list of ontology languages
1. KIF = Knowledge Interchange Format
2. OKBC = Open Knowledge Base Connectivity
   - The Generic Frame Protocol is the implicit formalism underlying OKBC.
3. OBO = Open Biomedical Ontology
4. OWL = Web Ontology Language
   - Will be discussed in today's tutorial
   - Subsumes XML, RDF(S), DAML+OIL

What an Ontology is NOT
- An ontology is not the same as a knowledgebase
  - Ontology (types) + Instances = KB
- An ontology is not the same as a database schema
  - A database schema is designed to store the instances conforming to an ontology
- An ontology is not the same as an XSD
  - An XSD tells you how to store the information that describes the instances
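The equation "Ontology (types) + Instances = KB" can be made concrete with a small sketch. The class names `Ontology` and `KnowledgeBase`, their methods, the instance names and the placement of Frog directly under Animal are all assumptions made for illustration; they do not correspond to any particular ontology tool.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Types only: maps each type to its parent type (None for the root)."""
    is_a: dict

    def ancestors(self, type_name):
        """Return all supertypes of type_name, nearest first."""
        out = []
        while self.is_a.get(type_name) is not None:
            type_name = self.is_a[type_name]
            out.append(type_name)
        return out


@dataclass
class KnowledgeBase:
    """Ontology (types) + instances."""
    ontology: Ontology
    instances: dict = field(default_factory=dict)  # instance name -> direct type

    def add_instance(self, name, type_name):
        # Instances must conform to the ontology: the type has to exist.
        if type_name not in self.ontology.is_a:
            raise ValueError(f"unknown type: {type_name}")
        self.instances[name] = type_name

    def instances_of(self, type_name):
        """Direct and indirect instances of a type, sorted by name."""
        return sorted(
            name
            for name, direct in self.instances.items()
            if direct == type_name or type_name in self.ontology.ancestors(direct)
        )


ontology = Ontology(is_a={
    "Substance": None,
    "Organism": "Substance",
    "Animal": "Organism",
    "Mammal": "Animal",
    "Cat": "Mammal",
    "Frog": "Animal",  # assumed placement
})
kb = KnowledgeBase(ontology)
kb.add_instance("cat_1", "Cat")
kb.add_instance("frog_1", "Frog")
```

The ontology object never changes when instances are added, which is the separation the slide draws between an ontology and a knowledgebase.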

Overview of OWL
Nigam Shah

OWL
- Web Ontology Language
- Recommended by W3C since Feb 2004
- Based on predecessors (DAML+OIL)
- A Web language: based on RDF(S)
- An ontology language: based on logic
- Three varieties: OWL-full, OWL-DL (OWL), OWL-Lite

The Three Sublanguages of OWL
- OWL Full: maximum expressiveness with the syntactic freedom of RDF, with no computational guarantees
- OWL DL: highly expressive while retaining computational completeness
- OWL Lite: classification hierarchy and simple constraints

Working with OWL syntax is not easy
- Tools are being developed for OWL
- Even with nice XML tools, RDF syntax is not very nice to work with

Basic Protégé-OWL usage
Nigam Shah

Protégé OWL: a GUI environment for OWL
- Robust OWL environment within the Protégé framework
- Most widely used tool for editing and managing OWL ontologies

Protégé OWL features
- Loading and saving OWL files & databases
- Graphical editors for class expressions
- Access to description logics (DL) reasoners via the Protégé GUI
- Ontology visualization components
- Built on the Protégé platform
  - Can hook in custom-tailored components
  - API for new applications

PROJECTS

Loading OWL files
1. If you only have an OWL file:
   - File > New Project
   - Select OWL Files as the type
   - Tick "Create from existing sources"
   - Next to select the .owl file
2. If you've got a valid project file*:
   - File > Open Project
   - select the .pprj file
* i.e. one created on this version of Protégé. The software gets updated once every few days, so don't count on it unless you've created it recently; it is safest to build from the .owl file if in doubt.

(Create or load an OWL project)
File > New Project OR File > Open Project

Protégé OWL Overview
OWL for data exchange:
- Classes
  - Subclass relationships
  - Disjoint classes
- Properties
  - Characteristics (transitive, inverse)
  - Range and Domain
  - ObjectProperties (references)
  - DatatypeProperties (simple values)
- Individuals
  - Property values
OWL for classification and reasoning:
- Class Descriptions
  - Restrictions
  - Logical expressions

Ontology Development Process
determine scope > consider reuse > enumerate terms > define classes > define properties > define constraints > create instances
In reality, an iterative process:
[Figure: the same steps revisited repeatedly and in varying order]

Establish Purpose

What will the ontology be used for?
Classification of Pneumonia:
- Bacterial Pneumonia (caused by bacteria)
- Pneumococcal Pneumonia (caused by a particular kind of bacteria)
- Viral Pneumonia (caused by viruses)
- Mixed Pneumonia (caused by both bacteria and viruses)

Enumerate Important Concepts
- What are the terms we need to talk about?
  Pneumonias, infectious organisms.
- What are the properties of these terms?
  hasRadiologyFinding, hasLocus, hasCause.
- What do we want to say about the terms?
  Pneumonias cause radiology opacity findings
  Pneumonias are located in lung
  Mixed pneumonias are caused by bacteria and viruses.

CLASSES

Classes
- Sets of individuals with common characteristics
- Individuals are instances of at least one class
[Figure: classes Beach and City with individuals Sydney, Cairns, BondiBeach, CurrawongBeach]

Superclass Relationships
- Classes are organized in a hierarchy; this implies subsumption
- Direct instances of a subclass are also (indirect) instances of its superclasses
[Figure: individuals Cairns, Sydney, Canberra, Coonabarabran]

Class Relationships
- Classes can overlap arbitrarily
- Classes are assumed non-disjoint by default (i.e., they may share instances)
[Figure: overlapping classes RetireeDestination and City with individuals Cairns, Sydney, BondiBeach]

Class Disjointness
- All classes could potentially overlap
- Specify disjointness to make sure they don't share instances
[Figure: UrbanArea disjointWith RuralArea; classes City and Destination; individuals Sydney, Woomera, CapeYork]

Class Editor
[Screenshot callouts: class annotations (for class metadata), class name and documentation, properties "available" to class, disjoints widget, conditions widget, class-specific tools (find usage etc.)]

Define classes and the class hierarchy
Identify classes (from the previous term list)
- If something can have a "kind" then it is a class
  - "Kind of Pneumonia": Pneumonia is a class
  - "Kind of Samson" (X): Samson is an individual
  - "Kind of Bacteria": Bacteria is a class
Arrange classes in a hierarchy
- PneumococcalPneumonia is a subclass of Pneumonia: every PneumococcalPneumonia is a Pneumonia
- Pneumococcus is a subclass of Bacteria: every Pneumococcus is a Bacteria
- MixedPneumonia is a subclass of Pneumonia: every MixedPneumonia is a Pneumonia

Create classes: create Pneumonia class

Class Disjoints
Note that BacterialPneumonia
- has superclass Pneumonia as a necessary condition (necessary parent)
- is asserted to be disjoint from its siblings (disjoint classes)
What it means
- All BacterialPneumonias are Pneumonias
- No BacterialPneumonia is not a Pneumonia
- Nothing is both:
  - a BacterialPneumonia and a ViralPneumonia
  - a BacterialPneumonia and a MixedPneumonia
NB: In OWL classes can overlap unless declared disjoint!

Add Annotations on Classes

Another Way to Create Classes
- A class can be the union of two classes
  An InfectiousPneumonia is either a BacterialPneumonia or a ViralPneumonia
- A class can be the intersection of two classes
  A MixedPneumonia is any Pneumonia that is caused by both Bacteria and Viruses
- A class can be the complement of another class
  Noninfectious pneumonia is any pneumonia that is not caused by an infectious agent (bacteria or virus)

Create a class by composition
An InfectiousPneumonia is a Pneumonia that is either a BacterialPneumonia or a ViralPneumonia

PROPERTIES

OWL Properties

- Datatype Property: relates individuals to data (int, string, float etc.)
  Pneumonia hasRadiologyFinding xsd:string
- Object Property: relates individuals to other individuals
  BacterialPneumonia hasCause Bacterium
- Annotation Property: for attaching metadata to classes, individuals or properties
  OntologyClass hasAuthor Natasha

Datatype Properties
- Link individuals to primitive values (integers, floats, strings, booleans etc.)
- Often: AnnotationProperties without formal "meaning"
[Figure: Sydney with hasSize = 4,500,000, isCapital = true, rdfs:comment = "Don't miss the opera house"]

Object Properties
- Link two individuals together
- Relationships (0..n, n..m)
[Figure: Sydney hasPart BondiBeach; Sydney hasAccomodation FourSeasons]

Annotation Properties
- To annotate classes, properties, and individuals
- Usually used for documentation
[Figure: Sydney rdfs:comment "My comment"; Sydney hasAuthor Kaustubh Supekar]

Properties of an OWL property
- Functional: Person has_Mother Mother
- Transitive: A hasPart B, B hasPart C ==> A hasPart C
- InverseFunctional: Person has_SSN SSN
- Symmetric: A worksWith B ==> B worksWith A

Define Properties of Classes
- Properties in a class definition describe attributes of instances of the class and relations to other instances
- Each Pneumonia will have radiology findings and a cause
- Each cause for pneumonia will have a causative organism.

Create object property has_part
- Click on the properties tab
- Click on the "Create Object property" icon and create has_part
Object property hasLocus (already present)
Create new datatype property hasRadiologyFinding; datatype = string
Create annotation property hasAuthor

RESTRICTIONS

Restrictions (Overview)
- Define a condition for property values:
  allValuesFrom, someValuesFrom, hasValue, minCardinality, maxCardinality, cardinality
- An anonymous class consisting of all individuals that fulfill the condition

Define Constraints: OWL Restrictions

- Quantifier restrictions: how to represent the fact that every pneumonia must be located in a lung?
- Cardinality restrictions: how to represent that a Hand must have 5 fingers as parts?
- hasValue restrictions: how to define the value of a relation for a class? (relationship between a class and an individual)

Quantifier Restrictions
Restrictions are of the form: all members of class C have as values for property p
- some things of class D (∃)
- only things of class D (∀)
- at least | at most | exactly n things
Examples
- some (someValuesFrom) (∃) (Existential)
  Cheesy_Pizza has_topping someValuesFrom Cheese_Topping.
  Implies: all cheesy pizzas have some (at least 1) topping that is a cheese topping
- only (allValuesFrom) (∀) (Universal)
  VegetarianPizza has_topping allValuesFrom Vegetarian_Topping.
  Implies: all vegetarian pizzas have only toppings that are vegetarian toppings

Creating Restrictions
[Screenshot callouts: restricted property, restriction type, filler expression, expression construct palette, syntax check]

Create a restriction: Add a datatype property
All pneumonias are disorders that have a radiological finding of opacification
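The difference between the existential (some) and universal (only) restrictions introduced under Quantifier Restrictions can be seen by evaluating both over a small, fully known data set. This is a sketch under stated assumptions: the pizza individuals, the `TYPES` and `VALUES` tables and both functions are hypothetical, and the check is closed-world over the listed values, whereas an OWL reasoner works under the open-world assumption.

```python
# Hypothetical data: the classes each topping individual belongs to.
TYPES = {
    "mozzarella": {"Cheese_Topping", "Vegetarian_Topping"},
    "tomato": {"Vegetarian_Topping"},
    "pepperoni": {"Meat_Topping"},
}

# Hypothetical property values: (individual, property) -> list of values.
VALUES = {
    ("margherita", "has_topping"): ["mozzarella", "tomato"],
    ("pepperoni_pizza", "has_topping"): ["mozzarella", "pepperoni"],
    ("plain_base", "has_topping"): [],
}


def satisfies_some(individual, prop, filler, values, types):
    """someValuesFrom: at least one value of prop is an instance of filler."""
    return any(filler in types[v] for v in values.get((individual, prop), []))


def satisfies_only(individual, prop, filler, values, types):
    """allValuesFrom: every value of prop is an instance of filler.

    Vacuously true when the individual has no values for prop.
    """
    return all(filler in types[v] for v in values.get((individual, prop), []))


results = {
    pizza: (
        satisfies_some(pizza, "has_topping", "Cheese_Topping", VALUES, TYPES),
        satisfies_only(pizza, "has_topping", "Vegetarian_Topping", VALUES, TYPES),
    )
    for pizza in ("margherita", "pepperoni_pizza", "plain_base")
}
```

The untopped base is the case worth noticing: it satisfies the universal restriction without having any topping at all, which is why "only" restrictions are often paired with a "some" restriction.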

Add an Object Property
- All pneumonias are located in some lung
- All pneumonias are disorders that are located in some lung and have a radiological finding of opacification

Add more object properties
- BacterialPneumonia is caused by some bacteria
  BacterialPneumonia causedBy some Bacteria
  BacterialPneumonia: ∃ causedBy.Bacteria
- ViralPneumonia is caused by some virus
  ViralPneumonia causedBy some Virus
- MixedPneumonia is caused by some bacteria and by some virus
  MixedPneumonia (causedBy some Bacteria) and (causedBy some Virus)

Using expression editor
All MixedPneumonias are Pneumonias caused by Bacteria and by Viruses

Class Descriptions
- Define the "meaning" of classes
- Description Logic expressions (anonymous class expressions) are used:
  - "All national parks have campgrounds."
  - "A backpackers destination is a destination that has budget accommodation and offers sports or adventure activities."
- Expressions usually restrict property values
- Reasoners can perform inference/classification

Defined/Primitive Classes
- Necessary conditions (primitive / partial classes): "If we know that something is an X, then it must fulfill the conditions..."
- Necessary & sufficient conditions (defined / complete classes): "If something fulfills the conditions..., then it is an X."
[Figure: example classes NationalPark and QuietDestination]

Defined/Primitive Classes
- Necessary conditions (primitive classes)
  - Describe a subclass
  - If something is a Class_X, then it must fulfill the conditions...
  - The converse may NOT be true: "If something fulfills the conditions..., then it is a Class_X."
- Necessary & sufficient conditions (defined classes)
  - If something fulfills the conditions..., then it is a Class_X.

e.g., Disorder is a necessary condition on Pneumonia
- If something is a Pneumonia, then it is a Disorder
- BUT if something is a Disorder, it may not be a Pneumonia

Necessary & sufficient conditions on BacterialPneumonia
- If something fulfills the N&S conditions, then it is a BacterialPneumonia
- AND if something is a BacterialPneumonia, then it fulfills the N&S conditions

INDIVIDUALS

Individuals
- Represent objects in the domain
- Specific things
- Two names could represent the same "real-world" individual
[Figure: individuals Sydney, BondiBeach, SydneysOlympicBeach]

Create instances
- Create an instance of a class
- The class becomes a direct type of the instance
- Any superclass of the direct type is a type of the instance
- Generally, you create instances if you have a type-of something

Classification

Reasoners
- Reasoners (classifiers) infer information that is not explicitly contained within the ontology
- Standard reasoner services are:
  - Consistency checking (i.e., satisfiability: can a class have any instances?)
  - Subsumption checking (finding subclasses: is A a subclass of B?)
  - Equivalence checking
  - Instantiation checking (which classes does an individual belong to?)
- For Protégé we recommend RACER or Fact++ (but other tools with DIG support work too)
- Reasoners can be used at runtime in applications as a querying mechanism
- Used during development as an ontology "compiler": ontologies can be compiled to check if the meaning is what was intended

Run a DL Reasoner with Protégé OWL
- Protégé OWL can work with multiple reasoners
  - Racer (http://www.racer-systems.com/)
  - Pellet (http://www.mindswap.org/2003/pellet/)
  - Fact++ (http://owl.man.ac.uk/factplusplus/)
- Need to install, configure, and run at least one reasoner as a separate process
- Protégé OWL and reasoner exchange information through inter-process communication
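Two of the reasoner services listed above, subsumption checking and consistency (satisfiability) checking, can be imitated on a toy scale. The sketch below is not the DIG interface and not how RACER, Pellet or Fact++ work internally; it only follows asserted subclass links and disjointness axioms. The data tables, the class `ImpossiblePneumonia` and the function names are hypothetical.

```python
# Hypothetical asserted hierarchy: class -> set of direct superclasses.
SUBCLASS_OF = {
    "Disorder": set(),
    "Pneumonia": {"Disorder"},
    "BacterialPneumonia": {"Pneumonia"},
    "ViralPneumonia": {"Pneumonia"},
    # Deliberately broken class, placed under two disjoint siblings.
    "ImpossiblePneumonia": {"BacterialPneumonia", "ViralPneumonia"},
}

# Hypothetical disjointness axioms, each a pair of classes.
DISJOINT = {frozenset({"BacterialPneumonia", "ViralPneumonia"})}


def superclasses(cls, subclass_of):
    """All direct and indirect superclasses of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subclass(a, b, subclass_of):
    """Subsumption check: is A a subclass of B?"""
    return a == b or b in superclasses(a, subclass_of)


def is_satisfiable(cls, subclass_of, disjoint):
    """Consistency check: a class under two disjoint classes can have no instances."""
    lineage = superclasses(cls, subclass_of) | {cls}
    return not any(pair <= lineage for pair in disjoint)


subsumed = is_subclass("BacterialPneumonia", "Disorder", SUBCLASS_OF)
broken = is_satisfiable("ImpossiblePneumonia", SUBCLASS_OF, DISJOINT)
```

This is the "compiler" use of a reasoner in miniature: the unsatisfiable class is reported before any instance data exists.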

Make InfectiousPneumonia a defined class
An infectious pneumonia is either a bacterial or viral pneumonia

Now classify
BacterialPneumonia & ViralPneumonia are now subclasses of InfectiousPneumonia

Visualization

Further reading/exploration
- Protégé: http://protege.stanford.edu
- Protégé OWL: http://protege.stanford.edu/plugins/owl/
- Protégé OWL discussion list
- Protégé Workshops (early 2006)
- Protégé International Conference
- OWL tutorial materials from the CO-ODE project site (University of Manchester): http://www.co-ode.org/resources/tutorials/
- NCBO (http://bioontology.org)

More about Protégé OWL
- Documentation on http://protege.stanford.edu/plugins/owl/documentation.html
- Excellent tutorial by Matthew Horridge: http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf
- Other resources at http://www.co-ode.org/resources/

Exercise
Goals: create an ontology of plants and animals
Steps
1. Identify classes, properties, and instances
2. Identify definable & primitive classes
3. Organize primitive classes into a hierarchy
4. Create relations between primitive classes using properties.
5. Set domain and range constraints for the properties
6. Define the definable things using primitives, properties and OWL axioms
7. Check with Classifier

Initial Terms
Plant, Lassie, Animal, Dog, Cat, Eats, Cow, Person, Grass, Herbivore, Carnivore, Gender, Omnivore, Buddha

Common mistakes

Too much trust in natural language
- Too much trust in natural language leads to ambiguities. E.g. 'ontology' is used in a systematically ambiguous way in natural language to refer (a) to a field of scientific research and (b) to a type of artifact created by researchers. These are quite different entities that have to be treated as distinct.
- People tend to trust natural language naively and assume the following correspondence: one natural language expression corresponds to one entity.

Naive conceptualizations

Most computer scientists embrace naive conceptualizations: they declare things like 'Fake diamond is_a diamond' or 'Absent leg is_a leg'. Besides being nonsense, this is wrong, because now 'Absent leg' will inherit all properties from 'leg'.

Logical ambiguity
Different readings of "part_of":
cell nucleus part_of cell: all Xs are part of some Ys (All-Some structure)
carrot part_of vomitus: some Xs are part of some Ys (Some-Some structure)

Confusion caused by "is_a"
"is_a" is used for both instance_of and subtype.
Correct: red is_a color, dictionary is_a book
Incorrect: this flower is_a red, this dictionary is_a book
Correct: the color of this book instance_of red

Inheritance
We use is_a for inheritance. All properties of the parent node should be inherited by the child node: everything which holds of color holds of red.
part_of does not support inheritance: not everything which holds of cell holds of cell nucleus.
Something similar to inheritance holds for instance_of.

Too much information in one ontology
Most ontologies are is_a hierarchies of substance types (examples are the taxonomy of biological species or anatomical ontologies). People often make the mistake of including information that belongs in another ontology, e.g. information about developmental state or pathology.
Correct: animal, mammal, dog
Incorrect: animal, dog, brown dog, 6-year-old brown dog
The right solution is to keep the ontology of substance particulars and the ontology of attributes distinct.
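The separation recommended above can be pictured with a small sketch. It is only an illustration, not the presenters' method: the `IS_A` table, the `Particular` class, and the individual "Rex" with its attribute values are hypothetical. The type hierarchy contains only animal, mammal, and dog; colour and age are recorded on the individual instead of becoming new classes such as 'brown dog'.

```python
from dataclasses import dataclass, field

# Hypothetical type hierarchy: child type -> its is_a parent type.
IS_A = {"dog": "mammal", "mammal": "animal"}


@dataclass
class Particular:
    """An individual animal; attributes live here, not in the type hierarchy."""
    name: str
    type_name: str
    attributes: dict = field(default_factory=dict)


def ancestors(type_name):
    """Return every type reachable from type_name by following is_a links."""
    found = []
    while type_name in IS_A:
        type_name = IS_A[type_name]
        found.append(type_name)
    return found


# Made-up individual: colour and age are attributes, not subclasses of dog.
rex = Particular("Rex", "dog", {"color": "brown", "age_years": 6})
print(ancestors(rex.type_name))  # ['mammal', 'animal']
print(rex.attributes["color"])   # brown
```

Changing Rex's age or colour then touches only the attribute record and leaves the type hierarchy untouched.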

ICD10 (1999): 587 codes for such accidents
V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

Dos and Don'ts while creating your own ontology
Barry Smith

Why do we need [a higher] guidance?
1. Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking)
2. Unintuitive rules for classification lead to entry errors (problematic links)
3. Facilitate training of curators
4. Overcome obstacles to mapping with other ontology and terminology systems
5. Enhance harvesting of content through automatic reasoning systems

First Commandment: Univocity
Terms (including those describing relations) should have the same meaning on every occasion of use. In other words, they should refer to the same kinds of entities in reality.
Problem example: 'chromosome' means different things in the Sequence Ontology and in the Cell Component Ontology.

Example of a univocity problem: (Old) Gene Ontology
part_of = may be part of: flagellum part_of cell
part_of = is at times part of: replication fork part_of the nucleoplasm
part_of = is included as a sub-list in

Second Commandment: Positivity
Complements of classes are not themselves classes. Terms such as 'non-mammal' or 'non-membrane' do not designate genuine classes.
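A curator could screen a term list for complement-style names before review. The sketch below is a rough heuristic under a stated assumption: complement terms are taken to be recognisable by a leading "non" or "non-", which will also flag innocent words that merely start with "non", so hits are candidates for human inspection and nothing more. The function name `complement_terms` is hypothetical.

```python
import re

# Assumption: a complement term starts with "non", optionally followed by
# a hyphen, underscore, or space. This is a heuristic, not a definition.
COMPLEMENT_PATTERN = re.compile(r"^non[-_ ]?\w+", re.IGNORECASE)


def complement_terms(terms):
    """Return the terms that look like complements of other classes."""
    return [term for term in terms if COMPLEMENT_PATTERN.match(term)]


candidates = ["mammal", "non-mammal", "membrane", "nonmembrane"]
print(complement_terms(candidates))  # ['non-mammal', 'nonmembrane']
```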

Third Commandment: Objectivity
Which classes exist is not a function of our biological knowledge. Terms such as 'unknown', 'unclassified', or 'unlocalized':
do not designate biological natural kinds
do not designate differentiating characteristics [differentia] of biological natural kinds

Fourth Commandment: Single Inheritance
No diamonds: no class in a classification hierarchy should have more than one is_a parent on the immediate higher level.
[Figure: nodes A, B, and C joined by links labelled is_a1 and is_a2]
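The single-inheritance rule above lends itself to a mechanical check. The following is a minimal sketch, assuming the hierarchy is available as a list of (child, parent) is_a pairs; the function name `multiple_parents` and the sample links are hypothetical. It counts direct parents only, which is a simplification of "on the immediate higher level".

```python
from collections import defaultdict


def multiple_parents(is_a_links):
    """Map each class with more than one direct is_a parent to its parents."""
    parents = defaultdict(set)
    for child, parent in is_a_links:
        parents[child].add(parent)
    return {
        child: sorted(found)
        for child, found in parents.items()
        if len(found) > 1
    }


# Made-up hierarchy containing one diamond: D has parents B and C.
links = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C")]
print(multiple_parents(links))  # {'D': ['B', 'C']}
```

An empty result means no class violates the rule in the links supplied.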

Problems with multiple inheritance
is_a has two meanings, which breaks the rule of univocity
the multiple meanings make coherent integration across ontologies difficult
Benefit: keeps the ontology simple by having multiple sorts of partitions brought together within the same framework
[Figure: nodes A, B, and C joined by links labelled is_a1 and is_a2]
Copyright Stanford University 2006

Fifth Commandment: Intelligibility of Definitions
The terms used in a definition should be simpler (more intelligible) than the term to be defined; otherwise the definition provides no assistance to human understanding or for machine processing.

Sixth Commandment: Basis in Reality

When building or maintaining an ontology, always think carefully about how classes (types, kinds, species) relate to instances in reality.
If the ontology is built to represent things that exist, then the exchange format, data model, XSD, etc. (application ontology) based on it remains valid even if our interpretation changes (B.P. hypertension).

Seventh Commandment: Distinguish Universals and Instances
A good ontology must distinguish clearly between universals (types, kinds, classes) and instances (tokens, individuals, particulars).

The Seven Commandments
1. Univocity: Terms should have the same meanings on every occasion of use
2. Positivity: Terms such as 'non-mammal' or 'non-membrane' do not designate genuine classes
3. Objectivity: Terms such as 'unknown', 'unclassified', or 'unlocalized' do not designate biological natural kinds

4. Single Inheritance: No class in a classification hierarchy should have more than one is_a parent on the immediate higher level
5. Intelligibility of Definitions: The terms used in a definition should be simpler (more intelligible) than the term to be defined
6. Basis in Reality: When building or maintaining an ontology, always think carefully about how classes relate to instances in reality
7. Distinguish Universals and Instances

Not everyone is a believer
The world of biomedical research is a world of difficult trade-offs. The benefits of formal (logical and ontological) rigor need to be balanced
against the constraints of computer tractability,
against the needs of biomedical practitioners.
BUT alignment and integration of biomedical information resources will be achieved only to the degree that these principles of classification and definition are followed.

Definitions should be intelligible to both machines and humans
Machines can cope with the full formal representation
Humans need to use modularity
Plasma membrane is a cell part [immediate parent] that surrounds the cytoplasm [differentia]

Principle of Compositionality
The meanings of compound terms should be determined by the meanings of component terms together with the rules governing syntax.
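The 'immediate parent plus differentia' pattern in the plasma membrane example can be treated as a small record from which the human-readable definition is composed by one fixed rule. This is a sketch under assumptions: the `Definition` class and its field names are hypothetical, and the sentence template is just one possible syntax rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str
    parent: str       # immediate is_a parent
    differentia: str  # what sets the term apart within that parent

    def text(self):
        """Compose the definition from its parts using a fixed template."""
        return f"{self.term} is a {self.parent} that {self.differentia}"


plasma_membrane = Definition(
    term="Plasma membrane",
    parent="cell part",
    differentia="surrounds the cytoplasm",
)
print(plasma_membrane.text())
# Plasma membrane is a cell part that surrounds the cytoplasm
```

Keeping the parent and the differentia as separate fields lets a machine use the structured form while a human reads the composed sentence.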

Principle of Syntactic Separateness
Do not confuse sentences with ontology terms. If you want to say "No As are Bs", do not invent a new class of non-Bs and say "A is_a non-B".

Keep Epistemology Separate
If you want to say that we do not know where As are located, do not invent a new class of As with unknown locations.
Example: Holliday junction helicase complex is_a unlocalized
A well-constructed ontology should grow linearly [monotonically]; it should not need to delete classes or relations because of increases in knowledge.

Some other rules of thumb
1. Don't confuse entities with concepts
2. Don't confuse entities with ways of getting to know entities: a brain is not the same as its CT scan
3. Don't confuse entities with ways of talking about entities: a person's medical record is not the person himself
4. Don't confuse entities with artifacts of your database representation, e.g. the multiple dosing event in PharmGKB
5. An ontology should not change when the ontology language changes: the process of driving a car doesn't change whether you describe it in English or Spanish

Guidelines for instances
Every class has at least one instance
Each child class has a smaller set of instances than its parent class
Distinct classes on the same level never share instances
Distinct leaf classes within a classification never share instances

Principles for Relations in Ontologies
Barry Smith

Benefits of well-defined relationships
If the relations in an ontology are well defined [All-Some structure], then reasoning can cascade from one relational assertion (A R1 B) to the next (B R2 C). Relations used in ontologies thus far have not been well defined in this sense.

A query to "find all DNA binding proteins" should also find all transcription factor proteins, because transcription factor is_a DNA binding protein.

How to define the is_a relation
What does A is_a B mean?
For all x, if x instance_of A then x instance_of some B
Example: cell division is_a biological process
(All-Some structure)

How to define A part_of B
What does A part_of B mean?
For all x, if x instance_of A then there is some y such that y instance_of B and x part_of y, where part_of is the instance-level part relation
Example: cell nucleus part_of cell
(All-Some structure)

Kinds of relations
Between classes: is_a, part_of, ...

Between an instance and a class: this explosion instance_of the class explosion
Between instances: Mary's heart part_of Mary

How many relations do we need?
Properties of relations: transitivity, symmetry, reflexivity, anti-symmetry
Avoid putting _ between arbitrary characters and calling it a relation: is_somehow_related_to is the worst kind of relation to create!

Don't forget instances when defining relations
part_of as a relation between classes versus part_of as a relation between instances:
nucleus part_of cell
your heart part_of you
What holds on the level of instances may not hold on the level of universals:
nucleus adjacent_to cytoplasm. Not: cytoplasm adjacent_to nucleus
seminal vesicle adjacent_to urinary bladder. Not: urinary bladder adjacent_to seminal vesicle

Time matters
e.g. derives_from
[Figure: classes C, C1, and C' with instances c at t, c1 at t1, and c' at t, arranged along a time axis; labels: ovum, sperm, zygote, derives_from]
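The All-Some reading of class-level relations, and the adjacent_to asymmetry noted above, can be made concrete with a small check over instance data. This is a sketch only: the function name `all_some` and the instances `n1`, `c1`, and `c2` are hypothetical, with `c2` standing for the cytoplasm of a cell that has no nucleus.

```python
def all_some(instances_of, related, class_a, class_b):
    """True if every instance of class_a is related to some instance of class_b."""
    return all(
        any((x, y) in related for y in instances_of[class_b])
        for x in instances_of[class_a]
    )


# Made-up instance data: c2 is cytoplasm with no nucleus next to it.
instances_of = {
    "nucleus": {"n1"},
    "cytoplasm": {"c1", "c2"},
}
# Instance-level adjacency is symmetric, so both directions are recorded.
adjacent = {("n1", "c1"), ("c1", "n1")}

print(all_some(instances_of, adjacent, "nucleus", "cytoplasm"))  # True
print(all_some(instances_of, adjacent, "cytoplasm", "nucleus"))  # False
```

The instance-level relation is symmetric here, yet the class-level statement holds in one direction only, which is the point of distinguishing the two levels.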

The take home
Follow a methodology which enforces clear, coherent definitions for entities and relationships
This promotes quality assurance
Intent is not hard-coded into software
Meaning of relationships is defined, not inferred
Enables automated reasoning across ontologies and across data at different granularities

Acknowledgements
NCBO is funded by the NIH Roadmap initiative
Protégé and Protégé-OWL are supported by grants and contracts from the NIH
Daniel Rubin and Andrew Spear for contributing to slides and handout

End
