Authorship Attribution - Stanford NLP Group


Authorship Attribution

Erik Goldman & Abel Allison

Problem Definition: Identification of the author of an anonymously written document given a set of candidate authors.

Applications
  • Historical Scholarship
  • Investigative Forensic Identification
  • Example: Fake Steve Jobs

Related Work
  • Support Vector Machine methods [Diederich et al. (2003)]
  • Document prototypes (interesting documents, or salient parts extracted from texts) matched against a document database [Visa et al. (2001)]
  • Numerical method of fractional counts [Burrel and Rousseau (1995)]

Approach
  1. For each work in the training set, count various feature data (the features are listed under Metrics) and store the counts as histograms.
  2. Take the unknown document as input and make the same counts.
  3. Compare the histograms of each author with those of the unknown document. Each feature contributes a weighted vote.
  4. Choose the author with the highest comparison score.

Metrics
  • Limit Word Frequency: words frequently used by the author across multiple works.

  • Grapheme Frequency: counts of alphanumeric and symbol characters.
  • Part-of-speech Bigram Frequency
  • Preterminal Tag Bigram Model

Histogram Comparisons
Two methods were used:
  • Chi-Squared
  • Metric Difference: a formula similar to the Chi-Squared formula, except that it accounts for the sparsity of bigram counts by normalizing them with respect to the average counts.

Tests
We used the power set of our set of authors. For each element of the power set, we ran our tests using each of the authors as the unknown and recorded the results.

Results
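As an illustration of the steps listed under Approach, the following is a minimal sketch and not the authors' implementation. It covers only two of the metrics (word and grapheme frequency) and assumes a symmetric chi-squared distance between normalized histograms, since the slides do not give the exact formula. The function names, the equal feature weights, and the use of negated distance as the "comparison score" are all hypothetical choices made for this example.

```python
import string
from collections import Counter


def word_histogram(text):
    """Relative frequency of each lowercased word (punctuation stripped)."""
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    counts = Counter(w for w in words if w)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def grapheme_histogram(text):
    """Relative frequency of each non-whitespace character."""
    counts = Counter(ch for ch in text.lower() if not ch.isspace())
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()} if total else {}


def chi_squared_distance(h1, h2):
    """Symmetric chi-squared distance: sum of (a - b)^2 / (a + b) over all bins.

    This particular form is an assumption; the slides only name "Chi-Squared".
    """
    dist = 0.0
    for key in set(h1) | set(h2):
        a, b = h1.get(key, 0.0), h2.get(key, 0.0)
        if a + b > 0:
            dist += (a - b) ** 2 / (a + b)
    return dist


# Hypothetical feature set: (histogram function, vote weight).
FEATURES = {
    "word": (word_histogram, 1.0),
    "grapheme": (grapheme_histogram, 1.0),
}


def attribute(unknown_text, training_texts):
    """Return (best_author, scores) for a dict mapping author -> list of texts."""
    scores = {}
    for author, texts in training_texts.items():
        combined = " ".join(texts)
        score = 0.0
        for extract, weight in FEATURES.values():
            # Smaller distance means more similar, so each feature votes
            # with its negated distance, scaled by its weight.
            score -= weight * chi_squared_distance(
                extract(combined), extract(unknown_text)
            )
        scores[author] = score
    best = max(scores, key=scores.get)
    return best, scores
```

A call such as `attribute(unknown, {"author_a": [...], "author_b": [...]})` returns the candidate whose histograms are closest to those of the unknown document. The part-of-speech and preterminal tag bigram features would need a tagger or parser and are left out of this sketch.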
