The Geometry of IR Keith van Rijsbergen (lost

The Geometry of IR Keith van Rijsbergen (lost

The Geometry of IR Keith van Rijsbergen (lost in Hilbert space!) Tampere 15th August, 2002 CvR 1 Unscripted comments I States

Observables Measurement => Reality? Projection Postulates Cognitive State Changes CvR 2 Unscripted comments II (quoting John von Neumann) However, all quantum mechanical probabilities are defined by

inner products of vectors. Essentially if a state of a system is given by one vector, the transition probability in another state is the inner product of the two which is the square of the angle between them. In other words, probability correspond precisely to introducing the angles geometrically. Furthermore, there is only one way to introduce it. The more so because in the quantum mechanical machinery the negation of a statement, so the negation of a statement which represented by a linear set of vectors, correponds to the orthogonal complement of this linear space. Unsolved problems in mathematics, typescript, September, 1954 CvR 3

What is this talk about? Not about quantum computation. see Nielsen and Chuang, CUP, 2000 Not about Logic see Engesser and Gabbay, AI, 2002 History (von Neumann, Dirac, Schroedinger) Motivation (complementarity) Duality (Syntax/Semantics) Measurement (Incompatibility) Projections (subspaces) Probability (inner products) IR application (feedback, clusters, ostension) CvR

4 CvR 5 CvR 6 Images not Text: how might that make a difference?

no visual keywords (yet) - tf/idf issue aboutness revisable (eg Maron) relevance revisable (eg Goffman) feedback requires salience aboutness -> relevance -> aboutness CvR 7 This is not new! Goffman, 1969: ..that the relevance of the information from one document depends upon what is already known

about the subject, and in turn affects the relevance of other documents subsequently examined. Maron, : Just because a document is about the subject sought by a patron, that fact does not imply that he would judge it relevant. CvR 8 Marons theory of indexing ..in the case where the query consists of single

term, call it B, the probability that a given document will be judged relevant by a patron submitting B is simply the ratio of the number of patrons who submit B as their query and judge that document as relevant, to the number of patrons, who submit B as their search query CvR 9 In 1949 D.M Mackay wrote a paper Quantal aspects of scientific information, SER, vol 41, no.314,

in which he alluded to using the quantum mechanics paradigm to IR JPEG Image CvR 10 Expectation Catalogue It (-function) is now the means for predicting probability offunction) is now the means for predicting probability of measurement results. In it is embodied the momentarily-function) is now the means for predicting probability ofattained sum of theoretically based

future expectation, somewhat as laid down in a catalogue. It is the relation-and-determinacy-bridge between measurements and measurements...... It is, in principle, determined by a finite number of suitably chosen measurement on the object.....Thus the catalogue of expectations is initially compiled. Schrdinger, 1935 &1980 CvR 11 Hypotheses Cluster Hypothesis: closely associated documents tend to be

relevant to the same requests. (1971) [co-ordination is positively correlated with external relevance, Jackson, 1969] Association Hypothesis: If an index term is good at discriminating relevant from non-relevant documents then any closely associated index term is also likely to be good at this. (1979) [co-occurrence of terms within documents is a suitable measure of similarity between terms, Jackson,1971] CvR 12 Navigation - Browsing

T-space D-space CvR 13 DUALITY Direct file/Inverted file Statespace/Space of Projections d = (x,y,z,u,v,w)d =(u,v,w,k,l,m) [[u]] = {d,d}; [[x]] = {d}; [[m]] = {d} Boolean Logic: [[ux]] = {d}; [[xm]] ={d,d}

Quantum Logic: [[ux]] = same; [[xm]] = different CvR 14 The mathematics you need Hilbert space (complex!!!) CvR inner product norms

||x||2 = operator (linear) Hermitian A*=A tr(A) =aii trace eigenvalues Ax = x 15 Crash course on Dirac notation

|x> : vector (called ket) *: functional (bra) = (row vector)(column vector)= xi*yi |x> I = |i>

P0 = Pn = I P1 = |1><1| P2 = |1><1| +|2><2| . . . Pn = |1><1| ++|n>

Relevance/Aboutness Observables Queries Operators State function Documents Operators can be applied to state function; and operators can be decomposed into projectors.

A= CvR aP i i 18 That is the relevance or irrelevance of a given retrieved document may affect the users current state of knowledge

resulting in a change of the users information need, which may lead to a change of the users perception/ interpretation of the subsequent retrieved documents. Borlund, 2000 CvR 19 Relevance/Aboutnes is Interaction/User dependent

N Y N T CvR Y R T

Y N T 20 probability as inner product |t> |t>|2 |t>

CvR 21 |t=0> |r=0> x |r=1> |t=1>

CvR 22 An operator T is of trace-class provided that T is positive ( 0, x) and trace of T is finite () T is a density operator if T is trace-class and tr(T) = 1 T = aiPi is a density operator if 0 ai and ai = 1 CvR

23 Theorem Letbe any measure on the closed subspaces of a separable (real or complex) Hilbert space H of dimension at least 3. There exists a positive self-adjoint operator T of trace class such that, for all closed subspace L of H, (L) = Tr(TPL) If is to be a probability measure, thus requiring that (H) = 1, then Tr(T) =1, that is, T is a density operator. CvR

24 Conditional Probability P(LA|LB) = tr(PBDPBPA) / tr(DPB) Note that PA could be E -> F CvR 25 What is T? without blinding you with science

-Relevance Feedback ( a mixture with log weights) -Pseudo relevance feedback (a mixture with similarity weights) -Clustering (superposition of members?) -Ostension (a history) CvR 26 Conclusions? Is it worth it? Does it matter? - images

- logic/probability/information/vectors - language CvR 27 Useful References Readings in Information Retrieval,Morgan Kaufman, Edited by Sparck Jones and Willett Advances in Information Retrieval: Recent Research from CIIR, Edited by Bruce Croft. Information Retrieval: Uncertainty and

Logics,Advanced Models for the Representation and Retrieval of Information, Edited by Crestani, Lalmas, Van Rijsbergen. Finding out about, Richard Belew. CvR 28

Recently Viewed Presentations

  • Human Communication - srsiwok

    Human Communication - srsiwok

    Human Communication: The Basic Course Twelfth Edition PowerPointâ„¢ Presentations prepared by: Naomi Young University of California, San Diego Joseph A. DeVito Hunter College of the City University of New York ... Naomi Young University of California, San Diego Members and...
  • Abstract Art & Photography 1 Objectives  Learn about

    Abstract Art & Photography 1 Objectives Learn about

    What might the artist be trying to say about former Prime Minister Tony Blair? * David Hockney, Mother 1, 1885 Pablo Picasso, Dora Maar Seated, 1941 Artist David Hockney used photographs to create photo montages which he has termed joiners....
  • Lojban - MIT ESP

    Lojban - MIT ESP

    The words fa, fe, fi, fo, fu can be placed right before a sumti to indicate the place in the selbri's place structure that the sumti fills. sumti that are not preceded by a case tag fill the lowest numbered...
  • Antidepressant Use in Children and Adolescents

    Antidepressant Use in Children and Adolescents

    The book takes place as Conrad tries to readjust to life back at home after a psychiatric hospitalization for a suicide attempt. He sees a psychiatrist, Dr. Berger, who tries to address Conrad's chief desire upon presenting (e.g., he wants...
  • + Monitoring Access Tools and Accommodations in a

    + Monitoring Access Tools and Accommodations in a

    Accoms must be available - however, this does not mean a student MUST use them. our current system has no way of tracking if a student uses or not - and this will be addressed later in presentation. Show next...
  • Indian Festivals There are lots and lots of

    Indian Festivals There are lots and lots of

    Deepawali or Diwali The Indian New Year festival is called Diwali. It is on the day of New Moon. All houses and streets are decorated with thousands of lights There are lots of fireworks in every house to celebrate it....
  • On coding for networks with errors

    On coding for networks with errors

    Non-multicast demands, multiple sources, rateless codes, non-uniform link capacities . New capacity bounding and coding techniques . Applications. Streaming (non-multicast nested network) Distribution of keys/decentralized data (multi-source network) Computationally limited networks (rateless codes)
  • MAN THE BLOCK CITYWIDE - MAY 16, 2016 - Fatherhood

    MAN THE BLOCK CITYWIDE - MAY 16, 2016 - Fatherhood

    In 2015, after having implemented the Man The Block Richard Wright Safe Passage Program with success through before and after school posts, as well as bike and pedestrian escorts teams, the Man The Block initiative was expanded through the combined...