Armadillo: Data Extraction Across Multiple Text Datasets for Arts and Humanities Research
Mark Greengrass, University of Sheffield
15 July 2007 (c) M.Greengrass

Response to the RePAH questionnaire (2005-6), aggregate of all Arts and Humanities respondents (RePAH: A User Requirements Analysis Report (2006), p. 102).

RePAH: A User Requirements Analysis

Some Distinctive Features of Historians' Approach to their Evidence
- Promiscuous range of sources consulted
- Firm distinction between primary and secondary sources
- Complex dialogue between existing historiography and constitutive source materials
- Reiterative process of open interrogation of source materials
- A coherent narrative is (generally) composed from more than one source

Historians' Database Challenge
- A growing number of (mainly text-based) historical datasets in electronic media, furnished by a wide variety of providers
- These datasets utilise a variety of different historical sources
- They contain varying amounts of encoded information (dependent on the historical question being asked by the PI, and on the constraints of the particular source being used)
- The information is encoded in different ways
- The delivery formats used also vary widely

Sources
- The Marine Society Registers
- Prerogative Court of Canterbury Wills
- St. Martin's Settlement Exams Index
- WESTCAT
- Metropolitan London in the 1690s
- IHR
- The Westminster Historical Database
- The Proceedings of the Old Bailey
- Eighteenth Century Fire Insurance Policies
- Collage image database
- Guildhall Library
- Harben's Dictionary of London
- John Strype's Survey
- Selected Criminal Records
- TNA
- http://www.motco.com
- House of Lords Journals
- BOPCRIS
- AHDS Deposits

The Old Bailey Proceedings: XML

William Mawn was Tryed for stealing a Bay Gelding price 20 l. from one Thomas Lane out of Berkshire on the 25th of April. The Witness swore that the Horse was found in the Prisoner's custody in Smithfield, which the Prosecutor owned to be his. The Prisoner could not produce any Evidence to prove that he came honestly by the Horse only produc'd a Felonious person, that was no stranger to Newgate, who went under the Notion of his Man, he declared that the Prisoner bought the Horse upon the Road beyond Uxbridge. The Prisoners being found in several faultering stories, he was found Guilty.

[Death. See summary.]
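The point of an XML edition is that entities such as the defendant, the victim and places are marked up, so a program rather than a reader's eye can pick them out. A minimal Python sketch using the standard-library xml.etree.ElementTree module; the element and attribute names (trial, persName, placeName, type="defendantName") are hypothetical stand-ins, not the actual Old Bailey Proceedings schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: element and attribute names are
# illustrative assumptions, not the real Old Bailey Proceedings schema.
SAMPLE = """<trial id="t-example">
  <p><persName type="defendantName">William Mawn</persName> was Tryed for
  stealing a Bay Gelding price 20 l. from one
  <persName type="victimName">Thomas Lane</persName> out of
  <placeName>Berkshire</placeName> on the 25th of April.</p>
</trial>"""


def tagged_entities(xml_text: str) -> dict[str, list[str]]:
    """Group the text of every persName/placeName element by its role."""
    root = ET.fromstring(xml_text)
    found: dict[str, list[str]] = {}
    for elem in root.iter():  # document order, root included
        if elem.tag == "persName":
            role = elem.get("type", "person")
        elif elem.tag == "placeName":
            role = "place"
        else:
            continue
        # itertext() collects any nested text inside the element
        found.setdefault(role, []).append("".join(elem.itertext()).strip())
    return found


if __name__ == "__main__":
    print(tagged_entities(SAMPLE))
```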


Canterbury Wills: Delimited Text

[Extract from a delimited file; the column layout is lost in this copy. The values 2530553 and W recur throughout. Other field values, in the order they appear: Agnes Kervill or Kervytt; Andrew Bridham; London; Andrew; London; Austin Hawkyns; Cecilia Foster; Christian Chepman; Christian Cust; David Syadine; Bristol; Edmund Bybbesworth; Edward Wellys; Hadley; Ellen Lacy; Widow; Saint Pe; Gerard Heshull; Guy Shuldham; Helmingus Leget; Henry Porter; Henry Warlegh; Keynesha; Henry Wellis; Hugh Caundyssh; Hugh Geynesburgh; Rector; Isabelle Woodhill; Pykeman.]

The Issues
Can the technologies developed for the semantic web help us:
- to structure the differently encoded information across varying sources in a way that the user community will find fruitful for research?
- to understand the way in which these different sources relate to one another, so that they can be used intelligently?
- to bootstrap relevant historical/semantic information from one source by using another?

Data Sharing and Data Reuse
- Sharing is when different applications use the same resources
- Reuse means to build new applications, assembling components already built
(c) Oscar Corcho (with acknowledgement)

Ontologies
- Describe domain knowledge in a generic way and provide an agreed understanding of a domain
Problem Solving Methods
- Describe the reasoning process of a dataset (Knowledge-Based System) in a domain-independent manner
Interaction Problem
- "Representing knowledge for the purpose of solving some problem is strongly affected by the nature of the problem and the inference strategy to be applied to the problem." Bylander T, Chandrasekaran B (1988) Generic Tasks in Knowledge-Based Reasoning: The Right Level of Abstraction for Knowledge Acquisition. In: Gaines BR, Boose JH (eds) Knowledge Acquisition for Knowledge-Based Systems. Academic Press, London, pp 65-77

Definitions of an Ontology
1. An ontology defines the basic terms and relations comprising the vocabulary of a topic area, as well as the rules for combining terms and relations to define extensions to the vocabulary. Neches R, Fikes RE, Finin T, Gruber TR, Senator T, Swartout WR (1991) Enabling technology for knowledge sharing. AI Magazine 12(3):36-56
2. An ontology is an explicit specification of a conceptualization. Gruber TR (1993a) A translation approach to portable ontology specifications. Knowledge Acquisition 5(2):199-220
3. An ontology is a formal, explicit specification of a shared conceptualization. Studer R, Benjamins VR, Fensel D (1998) Knowledge Engineering: Principles and Methods. Data & Knowledge Engineering 25(1-2):161-197
4. A logical theory which gives an explicit, partial account of a conceptualization. Guarino N, Giaretta P (1995) Ontologies and Knowledge Bases: Towards a Terminological Clarification. In: Mars N (ed) Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing (KBKS'95). University of Twente, Enschede, The Netherlands. IOS Press, Amsterdam, The Netherlands, pp 25-32
5. A set of logical axioms designed to account for the intended meaning of a vocabulary. Guarino N (1998) Formal Ontology in Information Systems. In: Guarino N (ed) 1st International Conference on Formal Ontology in Information Systems (FOIS'98). Trento, Italy. IOS Press, Amsterdam, pp 3-15

Key Components of an Ontology
- Concepts, organized in taxonomies
- Relations R: C1 x C2 x ... x Cn-1 x Cn (e.g. Subclass-of: Concept1 x Concept2; Connected-to: Component1 x Component2)
- Functions F: C1 x C2 x ... x Cn-1 --> Cn (e.g. Mother-of: Person --> Woman; Price of a used car: Model x Year x Kilometers --> Price)
- Instances: elements
- Axioms: sentences which are always true

Semantic Continuum and Formality (M.Greengrass, after Corcho)
- Implicit (e.g. language): shared human consensus
- Informal [explicit] (e.g. dictionaries): text descriptions
- Formal (for humans) (e.g. library catalogues): semantics hardwired; used at runtime
- Formal [for machines] (e.g. see below): semantics processed and used at runtime
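The key components of an ontology can be made concrete, at the "formal, for machines" end of the continuum, with a toy Python sketch: a taxonomy held as the Subclass-of relation, instances assigned to concepts, the function Mother-of: Person --> Woman, and one axiom checked against the instances. All concept and instance names are hypothetical, and plain dictionaries stand in for a real ontology language such as OWL.

```python
from dataclasses import dataclass, field
from typing import Optional

# Concepts organized in a taxonomy, stored as the binary relation
# Subclass-of: Concept1 x Concept2 (child -> parent). Names are hypothetical.
SUBCLASS_OF = {"Woman": "Person", "Man": "Person", "Person": "Agent"}


def is_subclass(concept: str, ancestor: str) -> bool:
    """True if concept equals ancestor or reaches it via Subclass-of links."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


@dataclass
class KnowledgeBase:
    # Instances: named elements, each assigned to one concept.
    instances: dict[str, str] = field(default_factory=dict)
    # Function Mother-of: Person --> Woman (one value per argument).
    mother_of: dict[str, str] = field(default_factory=dict)

    def add(self, name: str, concept: str) -> None:
        self.instances[name] = concept

    def check_axioms(self) -> list[str]:
        """Axiom that must always hold: Mother-of maps a Person to a Woman."""
        violations = []
        for child, mother in self.mother_of.items():
            if not is_subclass(self.instances[child], "Person"):
                violations.append(f"{child} is not a Person")
            if not is_subclass(self.instances[mother], "Woman"):
                violations.append(f"{mother} cannot be Mother-of {child}")
        return violations


if __name__ == "__main__":
    kb = KnowledgeBase()
    kb.add("anne", "Woman")  # hypothetical instances
    kb.add("john", "Man")
    kb.mother_of["john"] = "anne"
    print(kb.check_axioms())
    kb.mother_of["anne"] = "john"  # deliberately breaks the axiom
    print(kb.check_axioms())
```

Because the axiom is checked by the program rather than by a reader, an inconsistent assertion is reported as soon as it is added, which is what distinguishes machine-processed semantics from the informal end of the continuum.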

http://www.vicodi.org

Web-based secondary historical writing
Primary sources (historical documents; images; artefacts) in electronic form

- top-down ontologies (generated from discipline-accepted taxonomies)
- middle-out ontologies (generated by intelligent iteration)
- bottom-up ontologies (generated from a representative sample of canonical data)

John Wilkins, An Essay towards a Real Character and a Philosophical Language (1668)


Armadillo: a Semantic Agent
- Retrieves information according to pre-agreed ontologies
- Takes account of deviations in spelling, typographic formatting and contextual information
- Makes use of delimited fields and tagged data as oracles, providing firm instantiations of elements in an ontology that can then be applied to electronic materials which have no such structure
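The oracle idea can be illustrated with a small Python sketch. It is not Armadillo's own code, whose internals are not described here: names read from a delimited record act as trusted seeds, and each is sought in unstructured text while tolerating spelling and case variation. The tab-delimited layout, the sample passage, the function names and the 0.75 threshold are all assumptions, and difflib.SequenceMatcher is a generic string-similarity measure swapped in for whatever matcher Armadillo actually uses.

```python
import csv
import io
import re
from difflib import SequenceMatcher

# Hypothetical tab-delimited records (reference, type, name, place); the
# field layout is an assumption, not the actual Canterbury Wills format.
DELIMITED = "2530553\tW\tAgnes Kervill\tLondon\n2530553\tW\tHenry Wellis\tLondon\n"

# Hypothetical unstructured passage using variant spellings of the same names.
FREE_TEXT = "Item to Agnes Kervytt my sister, and to Henry Wellys a gown."


def oracle_names(delimited: str) -> list[str]:
    """Take the name field (third column) of each record as a trusted oracle."""
    reader = csv.reader(io.StringIO(delimited), delimiter="\t")
    return [row[2] for row in reader if len(row) > 2]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1], so capitalisation is ignored."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def annotate(
    text: str, names: list[str], threshold: float = 0.75
) -> list[tuple[str, str, float]]:
    """For each oracle name, return the closest run of words of equal length."""
    tokens = re.findall(r"[A-Za-z']+", text)
    matches = []
    for name in names:
        n = len(name.split())
        # Every run of n consecutive words is a candidate mention.
        candidates = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        if not candidates:
            continue
        best = max(candidates, key=lambda c: similarity(name, c))
        score = similarity(name, best)
        if score >= threshold:
            matches.append((name, best, round(score, 2)))
    return matches


if __name__ == "__main__":
    for name, found, score in annotate(FREE_TEXT, oracle_names(DELIMITED)):
        print(f"{found!r} instantiates {name!r} (score {score})")
```

Keeping only the best-scoring run per name under a fixed threshold keeps the sketch short; it ignores the contextual information that the slide also lists.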


Automated Text-Mining, used for tagging purposes in Central Criminal Court records

Held on Monday, December 17th, 1866, and following days,

BEFORE THE RIGHT HON. THOMAS GABRIEL, LORD MAYOR of the City of London; Sir JOHN MELLOR, Knt., one of the Justices of Her Majesty's Court of Queen's Bench; WILLIAM TAYLOR COPELAND, Esq., THOMAS CHALLIS, Esq., THOMAS QUESTED FINNIS, Esq., Sir ROBERT WALTER CARDEN, Knt., and WILLIAM

Automated Text-Mining, used for tagging purposes in Central Criminal Court records, with less success!

Held on Monday, July 22nd, 1912, and following days.

Before the Right Hon. Sir THOMAS BOOR CROSBY, M.D., LORD MAYOR of the said City of London; the Right Hon. Lord COLERIDGE, one of the Justices of His Majesty's High Court; Sir HENRY KNIGHT, Knight; Sir HORATIO DAVIES, K.C.M.G.; Sir JOHN POUND, Bart.; Sir GEORGE W. TRUSCOTT, Bart.; Sir CHARLES JOHNSTON, Knight; and Sir HORACE B. MARSHALL, Knight, LL.D., Aldermen of the said City;

Sir FORREST FULTON [not identified], Knight, K.C., Recorder of the said City; Sir FK. ALBERT BOSANQUET [not identified], K.C., Common Serjeant of the said City;
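A toy Python rule shows how the typographic conventions of these session headers (personal names set in capitals) can drive automatic tagging, and where such rules break down. It is not the tagger used on the Central Criminal Court records: the pattern, the stop list of titles and the function name tag_names are assumptions. On the 1912 excerpt it finds "THOMAS BOOR CROSBY" but misses the surname-only "Lord COLERIDGE" and truncates "Sir FK. ALBERT BOSANQUET" to its last two words.

```python
import re

# Hypothetical rule: names in these headers are printed as runs of two or
# more fully capitalised words, e.g. "THOMAS BOOR CROSBY".
CAPS_RUN = re.compile(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})+\b")

# Hypothetical stop list: capitalised words that are titles, not names.
NOT_NAMES = {"BEFORE", "THE", "RIGHT", "HON", "LORD", "MAYOR"}


def tag_names(header: str) -> list[str]:
    """Return candidate personal names found by the capitalisation rule."""
    names = []
    for match in CAPS_RUN.finditer(header):
        words = [w for w in match.group().split() if w not in NOT_NAMES]
        if len(words) >= 2:  # require at least a forename and a surname
            names.append(" ".join(words))
    return names


HEADER_1866 = (
    "BEFORE THE RIGHT HON. THOMAS GABRIEL, LORD MAYOR of the City of London; "
    "Sir JOHN MELLOR, Knt., one of the Justices of Her Majesty's Court of Queen's Bench;"
)
HEADER_1912 = (
    "Before the Right Hon. Sir THOMAS BOOR CROSBY, M.D., LORD MAYOR of the said "
    "City of London; the Right Hon. Lord COLERIDGE, one of the Justices of His "
    "Majesty's High Court; Sir FK. ALBERT BOSANQUET, K.C., Common Serjeant of the said City;"
)

if __name__ == "__main__":
    print(tag_names(HEADER_1866))
    print(tag_names(HEADER_1912))
```

Each fix to such a rule (a longer stop list, allowance for abbreviated forenames, single-word surnames after a title) tends to introduce new false positives, which is one reason layout-driven tagging succeeds on some volumes and not on others.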
