Artificial intelligence

Artificial intelligence

Artificial Intelligence The different levels of language analysis Fall 2008 professor: Luigi Ceccaroni The different levels of language analysis A NL system must use considerable knowledge about the structure of the language itself, including: What the words are

How words combine to form sentences What the words mean How word meanings contribute to sentence meaning 2 The different levels of language analysis We cannot completely account for linguistic behavior without also taking into account: humans general world knowledge their reasoning abilities For example, to participate in a conversation it is necessary: Knowledge about the structure of the language,

but also Knowledge about the world in general Knowledge about the conversational setting in 3 particular The different levels of language analysis The following are some of the different forms of knowledge relevant for NL understanding:

Phonetic and phonological knowledge Morphological knowledge Syntactic knowledge Semantic knowledge Pragmatic knowledge Discourse knowledge World knowledge 4 Phonetic and phonological knowledge It concerns how words are related to the sounds that realize them Such knowledge is crucial for speechbased systems

5 Morphological knowledge Use of lexicons Reference books containing an alphabetical list of words with information about them Lexeme Information cant- cantar V / Infinitive -o/-es/-a/-em/-eu/-en 6

Morphological knowledge Versi simple: utilitzaci de formaris (llista de formes amb informaci morfolgica i els lexemes corresponents) Morfema = lexema (o arrel) o gramema Lexema Gramema cant o es a em en Morphological knowledge Analitzadors morfolgics:

Diccionaris de morfemes: diccionari darrels (lexemes), de sufixes, prefixes, infixes Morfotctica: regles de combinaci de morfemes Variacions fonolgiques: canvis al combinar els morfemes (ex., ploure, plovisquejar) Tipus danalitzadors FSA (finite state automaton) FST (finite state transducer) cascada de FSTs Syntactic knowledge Detection of tractable units: paragraphs and sentences:

Location of marks of punctuation: ., ?, !, Problems: acronyms, initials Machine learning and classification techniques They take into account contextual information 9 Syntactic knowledge Detection of units of meaning Recognition and adequate grouping of words /Parlar/ /sens dubte/ /de/ /les/ /reestructuracions/ /urbanes/ /a/ /Sant Cugat/ Assignment of grammatical categories

Addition of semantic information to lexical units (use of ontologies and dictionaries) Acknowledgment and classification of 10 proper names and entities Syntactic knowledge Correspondence between orthographic and grammatical words dna-mho, dmelo (1 orthographic word, 3 grammatical words) sens dubte, sin embargo (2 orthographic words, 1 p. grammatical word) Homonymy Same form, different grammatical categories:

wheel (rueda, ruleta): noun wheel (empujar): verb 11 Syntactic knowledge Polysemy Same form and category, different meanings: bank (grupo): data bank, electronic resource bank bank (banco): bank account, bank balance bank (ribera): river bank, river + burst its banks Acronyms The cells DNA sample was identified by PRC, a process approved by the official UBI. 12

Syntactic knowledge Abbreviations For example, the word "abbreviation" can itself be represented by the abbreviation "abbr." or "abbrev. Formulae and units of measurements One of many famous formulae is Albert Einstein's E = mc. In the US the inch is now defined as 0.0254 m, and the avoirdupois pound is now defined as 453.59237 g. 13 Syntactic knowledge Syntactic categories: adjective (ADJ)

adjective phrase (ADJP) adverb (ADV) adverbial phrase (ADVP) article (ART) auxiliary verb (AUX) determiner (DET) noun (N) noun phrase (NP) preposition (P) prepositional phrase (PP) pronoun (PRO) relative clause (REL) relative pronoun (RELPRO) quantifying determiner (QDET) sentence (S)

verb (V) verb phrase (VP) 14 Syntactic knowledge Problema de la granularitat (verb -> transitiu/intransitiu) Propietats sintctiques de concordana gnere (mascul/femen) nombre (singular/plural) persona (primera, segona...) cas (acusatiu,datiu..) Representation Altres propietats sintctiques: Tipus de complement del verb Preposicions que accepta una paraula

Categoria semntica Informaci morfolgica Derivaci: prefixos/infixos/sufixos plov + -isquej- + ar re- + estructura + -cio + -ns prefix arrel sufix sufix Representation Informaci lxica

repetici nom plura l re- + estructura + -cio + -ns prefix arrel sufix sufix

Representation Informaci lxica diminutiu infinitiu plov + -isquej- + ar arrel infix sufix Syntax and semantics Examples: John sold the book to Mary. The book was sold to Mary by John.

*After it fell in the river, John sold the book to Mary. After it fell in the river, the book was sold to Mary by John. *John are in the corner. *John put the book. Flying planes are dangerous. Flying planes is dangerous. Collaboration between parsers Anlisi Sintctica sense sintaxi sense semntica procs en cascada (1) sintaxi | semntica Interpretaci Semntica

procs en cascada (2) {sintaxi + filtre semntic} | semntica procs en parallel {sintaxi, semntica} Pre-process Segmentaci Localitzaci dunitats (paraules) Lematitzaci, anlisi morfolgica

Desambiguaci morfosintctica (POStagging) Etiquetat semntic Desambiguaci semntica (WSD) Detecci i classificaci dentitats amb nom (named entity recognition, NER) Quina es la capital de Frana? resultat de l'anlisi morfolgica quina quin s sser la el capital NCMS000 de Frana

? ? DT0FS00 quina VMIP3S0 TDFS0 ell capital AQPCS00 de SPS00 frana NP00000-loc Fit resultat del POS-tagging quina quin

s sser a el capital de de Frana ? ? DT0FS00 VMIP3S0 TDFS0 capital NCFS000 SPS00 frana NP00000-loc

Fit Example NCFS000 PP3FSO00 la capital NCFS000 I capital Post-process Anlisi semntica - pragmtica Anlisi illocutiva Reconeixement dintencions

The component steps of communication A typical communication episode, in which speaker S wants to inform hearer H about proposition P using words W, is composed of 7 processes: Intention. Speakers S decides that there is some proposition P that is worth saying to hearer H. Generation. The speaker plans how to turn the proposition P into an utterance that makes it likely that the hearer can infer the meaning P (or something close to it). 24 The component steps of communication Synthesis. The speaker produces the

physical realization W of the words W. This can be via ink on paper, vibrations in air, or some other medium. Perception. H perceives W as W2 and decodes it as the words W2. When the medium is speech, the perception is called speech recognition; when it is printing, it is called optical character recognition (OCR). Both are now commonplace. 25 The component steps of communication Analysis. We divide it into 3 main parts: Syntactic interpretation (or parsing) is the process of building a parse tree for an input string. The interior nodes of the parse tree represent phrases and the leaf nodes

represent words. Semantic interpretation is the process of extracting the meaning of an utterance. Utterances with several possible interpretations are said to be ambiguous. Pragmatic interpretation takes into account the fact that the same words can have different meanings in different situations. 26 The component steps of communication Disambiguation. H infers that S intended to convey Pi (where ideally Pi = P). Communication works because H does the work of figuring out which interpretation is the one S probably meant to convey. Disambiguation is a process that depends heavily on

uncertain reasoning. Incorporation. H decides to believe Pi (or not). A totally naive agent might believe everything it hears. A sophisticated agent treats the speech act as evidence for Pi, not confirmation of it. 27 Syntactic analysis Objectives: Determining if a sentence is syntactically correct Creating a structure with information which can be used during the semantic analysis Syntactic analysis Alphabet (vocabulary):

Concatenation operations * (free monoid): set of all strings that can be formed with symbols of Language: L * * Given a string w1n of *: w1n = w1, , wn wi We have to determine if w1n L Ways to define membership Grammar G L(G) w1n L(G) ? Language model P(w1n) si P(w1n) > 0 w1n L

Corpora (sentences, patterns), which define correct sentences: Syntactic dictionaries Composition rules Most usual way: grammar Non-terminal vocabulary (set of variables) Initial variable Production set Terminal vocabulary (alphabet) V=

V = vocabulary S V Grammars and parsing To examine how the syntactic structure of a sentence can be computed: Grammar, a formal specification of the structures allowable in the language Parsing technique, the method of analyzing a sentence to determine its structure according to the grammar The most common way of representing how a sentence is broken into its major subparts (constituents), and how those subparts are broken up in turn, is a tree.

Grammars and sentence structure Tree representation for the sentence Adri menja el bacall: S VP NP NAME NP V ART Adri

menja el N bacall Grammars and sentence structure The sentence (S) consists of an initial noun phrase (NP) and a verb phrase (VP). The initial noun phrase is made of the simple NAME Adri. The verb phrase is composed of a verb (V) menja and an NP, which consists of an article (ART) el and a common noun (N) bacall. In list notation this same structure could be represented

as: (S (NP (NAME Adri)) (VP (V menja) (NP (ART el) (N bacall) ))) Grammars and sentence structure To construct a tree structure for a sentence, you must know what structures are legal. A set of rewrite rules describes what tree structures

are allowable. These rules say that a certain symbol may be expanded in the tree by a sequence of other symbols. A set of rules constitutes a grammar: 1. 2. 3. 4. 5. 6. 7. 8. S NP VP VP V NP NP NAME NP ART N

NAME Adri V menja ART el N bacall Grammars and sentence structure Rule 1 says that an S may consist of an NP followed by a VP. Rule 2 says that a VP may consist of a V followed by an NP. Rules 3 and 4 say that an NP may consist of a NAME or may consist of an ART followed by an N. Rules 5 - 8 define possible words for the categories. Grammars consisting entirely of rules with a single symbol on the left-hand side, called the mother, are called context-free grammars (CFGs).

Grammars and sentence structure Context-free grammars (CFGs) are a very important class of grammars because: the formalism is powerful enough to describe most of the structure in natural languages, yet is restricted enough so that efficient parsers can be built to analyze sentences.

Symbols that cannot be further decomposed in a grammar (the words Adri, menja) are called terminal symbols. The other symbols, such as NP and VP, are called nonterminal symbols. The grammatical symbols such as N and V that describe word categories are called lexical symbols. Many words will be listed under multiple categories. For example, poder would be listed under V (can) and N (power). Grammars have a special symbol called the start symbol. Usually, the start symbol is S (also meaning sentence). Grammars and sentence structure A grammar is said to derive a sentence if there is a sequence of rules that allow you to rewrite the start symbol into the sentence, for instance, Adri menja el

bacall. This can be seen by showing the sequence of rewrites starting from the S symbol, as follows: S => NP VP => NAME VP => Adri VP => Adri V NP => Adri menja NP => Adri menja ART N => Adri menja el N => Adri menja el bacall (rewriting S) (rewriting NP)

(rewriting NAME) (rewriting VP) (rewriting V) (rewriting NP) (rewriting ART) (rewriting N) Grammars and sentence structure Two important processes are based on derivations: The first is sentence generation, which uses derivations to construct legal sentences. A simple generator could be implemented by randomly choosing rewrite rules, starting from the S symbol, until you have a sequence of words. The preceding example shows that the sentence Adri menja el

bacall can be generated from the grammar. The second process based on derivations is parsing, which identifies the structure of sentences given a grammar. Parsing as a search procedure In derivations, there are two basic methods of searching: A top-down strategy starts with the S symbol and then searches through different ways to rewrite the symbols until the input sentence is generated, or until all possibilities have been explored. The preceding example demonstrates that Adri menja el bacall is a legal sentence by showing the derivation that could be found by this process. Parsing as a search procedure

In a bottom-up strategy, you start with the words in the sentence and use the rewrite rules backward to reduce the sequence of symbols until it consists solely of S. The left-hand side of each rule is used to rewrite the symbol on the right-hand side. A possible bottom-up parse of the sentence Adri menja el bacall is: => NAME menja el bacall (rewriting Adri) => NAME V el bacall (rewriting menja) => NAME V ART bacall (rewriting el) => NAME V ART N (rewriting bacall) => NP V ART N (rewriting NAME) => NP V NP (rewriting ART N) => NP VP (rewriting V NP) => S (rewriting NP VP) A tree representation can be viewed as a record of the CFG rules that account for the structure of the sentence.

What makes a good grammar In constructing a grammar for a language, you are interested in: generality, the range of sentences the grammar analyzes correctly; selectivity, the range of non-sentences it identifies as problematic; understandability, the simplicity of the grammar itself. Generative capacity Grammatical formalisms based on rewrite rules can be compared according to their generative capacity, which is the range of languages that each formalism can describe. It turns out that no natural language can be characterized precisely enough to define the

generative capacity. Formal languages, however, allow a precise mathematical characterization. Generative capacity Consider a formal language consisting of the symbols a, b, c and d (think of these as words). Then consider a language L1 that allows sequences of letters in alphabetical order. For example, abd, ad, bcd and abcd are all legal sentences. To describe this language, we can write a grammar in which the righthand side of every rule consists of one terminal symbol possibly followed by one nonterminal. Such a grammar is called a regular grammar. For L1 the grammar would be: S -> a S1 S -> b S2 S -> c S3

S -> d S1 -> b S2 S1 -> c S3 S1 -> d S2 -> c S3 S2 -> d S3 -> d Generative capacity Consider another language, L2, that consists only of sentences that have a sequence of as followed by an equal number of bsthat is, ab, aabb, aaabbb, and so on. You cannot write a regular

grammar that can generate L2 exactly. A context-free grammar to generate L2, however, is simple: S -> a b S -> a S b Generative capacity Some languages cannot be generated by a CFG. One example is the language that consists of a sequence of as, followed by the same number of bs, followed by the same number of c's - that is, abc, aabbcc, aaabbbccc, and so on.

Similarly, no context-free grammar can generate the language that consists of any sequence of letters repeated in the same order twice, such as abab, abcabc, acdabacdab, and so on. There are more general grammatical systems that can generate such sequences, however. One important class is the contextsensitive grammar, which consists of rules of the form: A where A is a symbol, and are (possibly empty) sequences of symbols, and is a nonempty sequence of symbols. Generative capacity Even more general are the type 0 grammars, which allow arbitrary rewrite rules. Work in formal language theory began with Chomsky (1956). Since the languages generated by regular grammars are a subset of those generated by context-free grammars, which in turn are a subset of those generated by

context-sensitive grammars, which in turn are a subset of those generated by type 0 languages, they form a hierarchy of languages (called the Chomsky Hierarchy). Languages associated to Chomsky-hierarchy grammars Grammar Languages Automaton Type-0 Recursively enumerable

Turing machine Type-1 Linear-bounded Context-sensitive non-deterministic Turing machine Type-2 Context-free Non-deterministic pushdown automaton

Type-3 Regular Finite state automaton Grammaticality condition A sentence w (a string of *) pertains to the language generated by grammar G, if grammar G can derive w starting from S, using production rules. Obtaining the grammar Definition of the terminal tagset () Definition of the non-terminal tagset (V) Definition of grammar rules (P):

Manual construction Automatic construction Grammatical inference (induction) Semiautomatic construction Grammars for language processing Minimum: context-free grammars (CFGs) Is NL a context-free language? No. Are CFGs sufficient? No. Solutions: CFG + {procedural addition of context} Logical grammars (unification) Grammars enriched with statistical information (SCFGs)

CFG parsers and extensions Lets start from a simplification of CFGs CFGs RGs Finite state automata (FSAs) Extension of FSAs TNs (Transition Networks) RTNs (Recursive Transition Networks) ATNs (Augmented Transition Networks) W. A. Woods (1970) in "Transition Network Grammars for Natural Language Analysis" claims that by adding a recursive mechanism to a finite state model, parsing can be achieved much more efficiently. Charts (M. Kay, 1980) WFSTs (well-formed substring tables)

Transition networks (TNs) Autmat finit Estats associats a parts de la frase Transicions Etiquetes que fan referncia a categories morfosintctiques Una transici s acceptable si la paraula t la mateixa categoria que apareix etiquetada a larc No determinisme Ms dun estat inicial Una paraula amb ms duna categoria possible Ms dun arc per la mateixa categoria TNs: example

DET ADJ q1 q0 DET q 5 N q2 V q4

N V NAME q3 El gat menja bacall DET N V N N q6 ADJ

TNs: limitations Limitat a llenguatges regulars No es pot dir que analitzi Reconeix No-determinisme backtracking Ineficincia No separaci entre gramtica i analitzador gramtica descripci del model sintctic analitzador (parser) control Recurrent TNs (RTNs) Collecci de xarxes de transici (TNs) etiquetades amb un nom

Arcs Etiquetats amb categories com xarxes normals Etiquetes terminals Etiquetats amb identificadors de xarxes de transici (TNs) Etiquetes no terminals: els estats finals de les TNs causen el retorn a lestat dest de la transici que ha causat la crida Les RTN son dbilment equivalents a les CFG RTNs: examples NP S

2 VP 3 1 NP N DET 1 2

PP 3 N ADJ NAME RTNs: examples V 1 VP NP 2 3

V P PP 1 NP 2 3 RTNs: limitations Transicions noms depenen de les categories (poc expressiu) Llenguatge de context lliure Reconeixen per no analitzen

Ineficincia inherent al backtracking Charts Intenten eliminar redundncies en lanlisi (alleugeriment del cost del backtracking) memoritzant estructures parcials ja construdes. No afecten lestratgia de lanlisi Inconvenients: espai, temps de construcci, noms guarden components ben formats Charts Chart = graf dirigit que es construeix de manera dinmica i incremental a mesura que es realitza lanlisi. Els nodes corresponen al principi i final de la frase i a les separacions entre paraules (N+1

nodes) 1 2 La frase aquesta 3 a 4 5 analitzar

6 s 7 Charts Els arcs es creen dinmicament. Un arc de la posici i a la j (j i) engloba totes les paraules que estan entre la posici i i la j. Els arcs poden ser actius = objectius o hiptesis per completar inactius = components completament analitzades La frase 1 aquesta

2 3 a 4 analitzar 5 s 6 7 Charts: notation Regla puntejada (DR, dotted rule): producci de la gramtica que cont

algun punt en la seva part dreta. Per exemple, de la regla A > BCD es poden derivar les segents regles puntejades: A > . B C D A > B . C D A > B C . D A > B C D . (corresponent a un arc actiu) (corresponent a un arc inactiu) Charts: notation Arc dun chart: < i , j , X a.b >

i,j: X ab X a.b nodes origen i dest producci de la gramtica DR < 3, 3 , VP .V NP > < 1, 3 , S NP.VP > < 3, 5 , VP V NP.> El 1 gat 2

menja 3 4 salm 5 Basic rule of combination Arc actiu: Arc inactiu: Resultat:

Ascendant strategy Regla bsica: Cada vegada que safegeix un arc inactiu al Chart , aleshores sha dafegir al seu extrem esquerre un arc actiu per cada regla B Ab de la gramtica Inicialitzaci: afegir els arcs inactius que corresponen a les categories lxiques (terminals). Ex: <1, 2, Det el.> Descendant strategy Regla bsica: Cada vegada que safegeix un arc actiu al Chart < i, j, A a.Bb >, aleshores, per cada regla B b de la gramtica, sha dafegir un arc actiu al seu extrem dret < j, j, B .b > Inicialitzaci: Igual que abans per a ms cal afegir larc

actiu que correspon a lobjectiu dobtenir una frase. Ex: <1, 1, oraci .SN SV> La regla bsica de combinaci amb lestratgia ascendent o descendent (o una combinaci de les dues) s el que ens proporciona el mtode danlisi Charts: example [Oracio GN GV ] [Oracio GN GV ] [Oracio GN GV] [GN n ] [GV vi] [Oracio GN GV] [GV vt GN]

[GN det n] [GN det n] [GN n] [GN det n] [GN det n ] [det] el 0 1 [GV vt GN] [GV vi ][GN n]

[n] [vt] [vi] gat menja 2 3 [n] peix 4 Features (rasgos)

Context-free grammars provide the basis for most of the computational parsing mechanisms developed to date. As they have been described, they would be inconvenient for capturing natural language. The basic context-free mechanism can be extended defining constituents by a set of features. This extension allows aspects of natural language such as agreement and subcategorization to be handled in an intuitive and concise way. Feature systems and augmented grammars In natural language there are often agreement restrictions between words and

between phrases. For example, the noun phrase (NP) un hombres is not correct because the article un indicates a single object while the noun hombres indicates a plural object. The NP does not satisfy the number agreement restriction. Feature systems and augmented grammars There are many other forms of agreement, including subject-verb agreement gender agreement for pronouns restrictions between the head of the phrase and the form of its complement. Features are introduced to handle such phenomena.

A feature NUMBER might be defined that takes a value of either s (for singular) or p (for plural) and we then might write an augmented CFG rule such as NP ART N only when NUMBER1 agrees with NUMBER2 This rule says that a legal NP consists of an article (ART) followed by a noun (N), but only when the number feature of the first word agrees with the number feature of the second. Feature systems and augmented grammars NP ART N only when NUMBER1 agrees with NUMBER2 This one rule is equivalent to two CFG rules that would use different terminal symbols for encoding singular and plural forms of all NPs, such as

NP-SING ART-SING N-SING NP-PLURAL ART-PLURAL N-PLURAL The two approaches seem similar in ease-of-use in this one example. Though, in the second case, all rules in the grammar that use an NP on the right-hand side would need to be duplicated to include a rule for NP-SING and a rule for NP-PLURAL, effectively doubling the size of the grammar. Feature systems and augmented grammars Handling additional features, such as person agreement, would double the size of the grammar again, and so on. Using features, the size of the augmented

grammar remains the same as the original one, yet accounts for agreement constraints. A constituent is defined as a feature structure (FS), a mapping from features to values that defines the relevant properties of the constituent. Feature systems and augmented grammars Example: a FS for a constituent ART1 that represents a particular use of the word un: ART1: (CAT ART ROOT un NUMBER s) It is a constituent in the category ART that has its root in the word un and is singular.

Usually an abbreviation is used that gives the CAT value more prominence: ART1: (ART ROOT un NUMBER s) Feature systems and augmented grammars FSs can be used to represent larger constituents as well. FSs themselves can occur as values. Special features based on the integers (1, 2, 3) stand for the subconstituents (first, second). The representation of the NP constituent for the phrase un pez could be: NP1: (NP NUMBER s 1 (ART ROOT un

NUMBER s) 2 (N ROOT pez NUMBER s)) Feature systems and augmented grammars The previous one can also be viewed as a representation of a parse tree: Subconstituent features 1 and 2 correspond to the subconstituent links in the tree. Feature systems and augmented grammars The rules in an augmented grammar are stated in terms of FSs rather than simple categories. Variables are allowed as feature values so that

a rule can apply to a wide range of situations. For example, a rule for simple NPs would be as follows: (NP NUMBER ?n): (ART NUMBER ?n) (N NUMBER ?n) This says that an NP constituent can consist of two subconstituents, the first being an ART and the second being an N, in which the NUMBER feature in all three constituents is identical. Feature systems and augmented grammars (NP NUMBER ?n): (ART NUMBER ?n) (N NUMBER ?n) According to this rule, constituent NP1 given previously is a legal

constituent. The constituent *(NP 1 (ART NUMBER s) 2 (N NUMBER s)) is not allowed by this rule because there is no NUMBER feature in the NP. The constituent *(NP NUMBER s 1 (ART NUMBER s) 2 (N NUMBER p)) is not allowed because the NUMBER feature of the N constituent is not identical to the other two NUMBER features. Feature systems and augmented grammars Variables are also useful in specifying ambiguity in a constituent.

The word crisis is ambiguous between a singular and a plural reading. The word might have two entries in the lexicon that differ only by the value of the NUMBER feature. Alternatively, we could define a single entry that uses a variable as the value of the NUMBER feature: (N ROOT crisis NUMBER ?n) This works because any value of the NUMBER feature is allowed for the word crisis. Feature systems and augmented grammars In many cases not just any value would work, but a range of values is possible. We introduce constrained variables, which are variables that can only take a value out of a specified list.

For example, the variable ?n{s p} would be a variable that can take the value s or the value p. When we write such variables, we will drop the variable name altogether and just list the possible values. The word crisis might be represented by the constituent: (N ROOT crisis NUMBER ?n{s p}) or more simply as (N ROOT crisis NUMBER {s p}) Feature systems and augmented grammars There is an interesting issue of whether an augmented CFG can describe languages that cannot be described by a simple CFG. The answer depends on the constraints on what can be a feature value. If the set of feature values is finite, then it would always

be possible to create new constituent categories for every combination of features. Thus it is expressively equivalent to a CFG. If the set of feature values is unconstrained then such grammars have arbitrary computational power. In practice, even when the set of values is not explicitly restricted, this power is not used, and the standard parsing algorithms can be used on grammars that include features. ATN: example TRETS: Number: Singular, Plural Default: empty Person: 1st, 2nd, 3rd Default: 3rd 6:Proper

ROL: Nucli-Subjecte 5:Pronoun 1:det f g 4:Noun 8: Send h 7:pp 2: Jump

3: Adjective Xarxa per a reconixer grups nominals (NP) ATN: example Inicialitzacions, condicions i accions NP-1: fDeterminerg A: Set Number to the number of * NP-4: gNounh C: Number is empty or number is the number of * A: Set Number to the number of * Set Nucli-Subjecte to * NP-5: fPronounh A: Set Number to the number of * Set Person to the Person of * Set Nucli-Subjecte to *

NP-6: fProperh A: Set Number to the number of * Set Nucli-Subjecte to * ATN: limitations Sn adequades per lanlisi descendent, per no resulta fcil implementar una anlisi ascendent o hbrida. Redundncia de les operacions de backtracking: ineficincia Problemes dexpressivitat notacional: la gramtica es barreja amb les accions CFG recognition using difference lists

An efficient implementation of CFGs can be obtained by making use of difference lists: a sophisticated Prolog technique. The key idea underlying difference lists is to represent the information about grammatical categories not as a single list, but as the difference between two lists. For example, instead of representing a woman shoots a man as [a,woman,shoots,a,man] we might represent it as the pair of lists [a,woman,shoots,a,man] [ ]. CFG recognition using difference lists Think of the first list as what needs to be

consumed (or if you prefer: the input list), and the second list as what we should leave behind (or: the output list). Viewed from this (rather procedural) perspective the difference list [a,woman,shoots,a,man] [ ]. represents the sentence a woman shoots a man because it says: If I consume all the symbols on the left, and leave behind the symbols on the right, I have the sentence I am interested in. CFG recognition using difference lists The sentence we are interested in is the difference between the contents of the two lists. Difference representations are not unique.

In fact, we could represent a woman shoots a man in infinitely many ways. For example, we could also represent it as [a,woman,shoots,a,man,ploggle,woggle] [ploggle,woggle]. Again the point is: if we consume all the symbols on the left, and leave behind the symbols on the right, we have the sentence we are interested in. CFG recognition using difference lists If we bear the idea of consuming something, and leaving something behind in mind, we obtain the following recognizer (Prolog notation): s(X,Z) :- np(X,Y), vp(Y,Z). np(X,Z) :- det(X,Y), n(Y,Z). vp(X,Z) :- v(X,Y), np(Y,Z). vp(X,Z) :- v(X,Z).

det([the|W],W). det([a|W],W). n([woman|W],W). n([man|W],W). v([shoots|W],W). The s rule says: I know that the pair of lists X and Z represents a sentence if (1) I can consume X and leave behind a Y, and the pair X and Y represents a noun phrase, and (2) I can then go on to consume Y leaving Z behind, and the pair Y Z represents a verb phrase. CFG recognition using difference lists The idea underlying the way we handle the words is similar. The code

n([man|W],W). means we are handling man as the difference between [man|W] and W. Intuitively, the difference between what I consume and what I leave behind is precisely the word man. CFG recognition using difference lists How do we use such grammars? Here's how to recognize sentences: s([a,woman,shoots,a,man],[]). yes This asks whether we can get an s by consuming the symbols in [a,woman,shoots,a,man], leaving nothing

behind. CFG recognition using difference lists Similarly, to generate all the sentences in the grammar, we ask s(X,[]). This asks: what values can you give to X, such that we get an s by consuming the symbols in X, leaving nothing behind? The queries for other grammatical categories also work the same way. For example, to find out if a woman is a noun phrase we ask: np([a,woman],[]). CFG recognition using

difference lists And we generate all the noun phrases in the grammar as follows: np(X,[]). It has to be admitted that this recognizer is not as easy to understand, at least at first, and it's a pain having to keep track of all those difference list variables. This is where DCGs come in.

Recently Viewed Presentations

  • William Stallings, Cryptography and Network Security 5/e

    William Stallings, Cryptography and Network Security 5/e

    * * * This photo of an Allied Hagelin machine was taken by Lawrie Brown at Eurocrypt'93 in Norway. Note pen for scale, and the rotating cipher wheels near the front. * * The basic principle of the rotor machine...
  • Selling Individual Disability Insurance to Small Business Owners

    Selling Individual Disability Insurance to Small Business Owners

    Gender-Neutral Pricing. 5% Discount. 1. Goal varies by case. Discount options for Associations. There is a 10% discount for eligible members of an approved association, depending on the product. discounts are available for The Protector, The Protector+, and The Business...
  • Austin Comerton Manager, Business Development 1 877

    Austin Comerton Manager, Business Development [email protected] 1 877

    Several satellites pick up a call, and this "path diversity" reduces the possibility of blocked or dropped calls. If buildings or terrain obstruct the phone's line-of-sight to a satellite, a "soft hand-off" takes place switching the call to an alternate...


    I graduated from Salisbury University in 1999 and obtained my Masters in Reading Instruction from Wilmington University in 2005. I was born and raised in Queen Anne's County and am a proud product of the school system. I currently live...
  • Development of Oxygen Electrodes for High Temperature and

    Development of Oxygen Electrodes for High Temperature and

    ECerS conference 2017, Budapest. Motivation for the work. Selection of materials. Electrochemical activity towards the OER. Chemical stability of materials at high temperature. Processing porous LaNi. 0.6. Fe. 0.4. O. 3. electrodes. 2017-07-13. Motivation and past results (high current density...
  • Multiplication and Division of Integers

    Multiplication and Division of Integers

    Multiplication and Division of Integers * Here's a way I can Remember! To remember whether your answer will be positive or negative when MULTIPLYING or DIVIDING, we'll use: Hi I'm BOB! When multiplying Integers, cover the Two signs you are...
  • America&#x27;s First Government(s) - Cabarrus County Schools

    America's First Government(s) - Cabarrus County Schools

    The first constitution of the US. Proposed by the Continental Congress November 17, 1777. Ratified by all the states on March 1, 1781. The . Confederation. created a "league of friendship" coming together for one purpose. ... America's First Government(s)
  • Experimental Evaluation of Co-existent LTE-U and Wi-Fi on

    Experimental Evaluation of Co-existent LTE-U and Wi-Fi on

    Proposed operation of LTE in unlicensed band (LTE-U) : possible coexistence with Wi-Fi networks. Coexistence leads to interference. Objectives: Experimental evaluation of Wi-Fi in the presence of LTE-U. To characterize the Wi-Fi throughput. To help to construct a solution for...