Authorship Attribution
Erik Goldman & Abel Allison

Problem Definition: Identification of the author of an anonymously written document, given a set of candidate authors.

Applications:
Historical scholarship
Investigative forensic identification
Example: Fake Steve Jobs

Related Work
Support Vector Machine methods [Diederich et al. (2003)]
Document prototypes (interesting documents or parts of extracted, salient texts) to match against a document database [Visa et al. (2001)]
Numerical method of fractional counts [Burrell and Rousseau (1995)]

Approach
1. For each work in the training set, count various feature data (more on features next slide) and store the counts as histograms.
2. Input the unknown document and make the same counts.
3. Compare the histograms of each author with those of the unknown. Each feature contributes a weighted vote.
4. Choose the author with the highest comparison score.

Metrics
Limit Word Frequency - Words frequently used by the author across multiple works.
Grapheme Frequency - Counts of alphanumeric and symbol characters.
Part-of-speech Bigram Frequency - Preterminal tag bigram model.

Histogram Comparisons
Two methods used:
Chi-Squared Metric
Difference Formula - Similar to the Chi-Squared formula, except that it accounts for the sparsity of bigram counts by normalizing them with respect to the average counts.

Tests
Used the power set of our set of authors. For each element in the power set, we ran our tests using each of the authors as the unknown and recorded the results.

Results
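
As a rough illustration of steps 1 to 4 of the Approach, the sketch below builds two histograms per author (graphemes and words), compares them with the unknown document using a symmetric chi-squared distance, and picks the author with the highest weighted score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the equal feature weights, the use of relative frequencies, and the reading of "weighted vote" as a weighted sum of negated distances are all hypothetical. The part-of-speech bigram feature and the sparsity-normalized Difference Formula are not attempted here.

```python
import re
from collections import Counter


def _normalize(counts):
    """Turn raw counts into relative frequencies (empty dict if no counts)."""
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()} if total else {}


def grapheme_histogram(text):
    """Relative frequency of each non-whitespace character."""
    return _normalize(Counter(ch for ch in text.lower() if not ch.isspace()))


def word_histogram(text):
    """Relative frequency of each lowercase word."""
    return _normalize(Counter(re.findall(r"[a-z']+", text.lower())))


def chi_squared_distance(h1, h2):
    """Symmetric chi-squared distance between two histograms (0 = identical)."""
    total = 0.0
    for key in set(h1) | set(h2):
        p, q = h1.get(key, 0.0), h2.get(key, 0.0)
        total += (p - q) ** 2 / (p + q)  # p + q > 0 for every key in the union
    return total


# Hypothetical feature set: (extractor, weight). Equal weights are an assumption.
FEATURES = [(grapheme_histogram, 1.0), (word_histogram, 1.0)]


def attribute(unknown_text, training):
    """Return (best_author, scores) for a dict mapping author -> list of texts."""
    scores = {}
    for author, works in training.items():
        joined = " ".join(works)
        # Smaller distance means a better match, so distances are negated.
        scores[author] = -sum(
            weight * chi_squared_distance(extract(joined), extract(unknown_text))
            for extract, weight in FEATURES
        )
    return max(scores, key=scores.get), scores
```

A call such as `attribute(unknown_text, {"Author A": [...], "Author B": [...]})` returns the best-scoring candidate together with the per-author scores, which makes it straightforward to repeat the comparison for every subset of authors as described under Tests.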