Open Mind General

Open Mind Initiative
  David G. Stork
  Ricoh Silicon Valley
  [email protected]

Outline
  One-sentence description
  Background
  Open Mind Initiative
  Sample projects
  Relation to Open Source and to Data mining
  Related efforts elsewhere
  What do we do next Monday?

Open Mind Initiative
  A collaborative framework (based on Open Source methodology) for developing intelligent software, where...
    domain experts provide algorithms,
    tool developers provide software infrastructure and tools, and
    non-expert e-citizens provide raw data.

Background: Market need
  Speech recognition
  OCR
  Web searching
  ...
  Some software (e.g., common sense) is too costly for a single company to build

Background: E-community & Open Source Waves
  GNU
  Sendmail
  Linux: 10M lines; 10M seats; doubling time 6 months; 10^5 contributors
  Apache: half of all web servers
  Beowulf: supercomputer power from networked PCs
  Newhoo! (dmoz.org): open web directory (527,991 sites, 10,943 editors, 82,003 categories)
  Infomedia: Open Source encyclopedia

Growth of new software methods
  1990: 10^5 programmers
  1995: Linux
  1995: 10^6 web authors
  1999: Newhoo!
  1999: 10^9 e-citizens
  2003: Open Mind
  New communication allows communities and collaboration, and thus new software methods
  Opportunities expand to less-skilled users

Background: Pattern recognition/intelligent systems
  Recognizer = Theory + Model + Data
  Theory: excellent
  Models: depend on problem
  Data: never enough; the group with the most data wins (e.g., OCR ...)

Background: Tools
  Tools for customization/experimentation
    CSLU (Open Source)
    Nuance
    HTK
    S+
    ...
  Non-experts can use these!

Background: Infrastructure
  Collaborative software
    Animals (Shapiro 75, Lo & Stork 99)
    Answer Garden (Ackerman 90)
    BBN UNIPEN data collection software (Schwartz 97)

Infrastructure: Relevance rating
  DirectHit, Inc.: improved web indexing by monitoring users' selections
  FireFly: target advertisements based on user profile
  Amazon.com: book recommendations
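The Animals game listed under collaborative software is a guessing game built on a tree of yes/no questions: the program walks the tree, makes a guess, and when the guess is wrong it asks the contributor for a question that separates the new animal from the wrong guess. The following is a minimal sketch of that idea, not the code of any of the cited systems. The names `Node`, `guess`, and `learn` and the tiny starting tree are assumptions made for illustration.

```python
class Node:
    """A yes/no question (internal node) or an animal name (leaf)."""

    def __init__(self, text, yes=None, no=None):
        self.text = text
        self.yes = yes
        self.no = no

    def is_leaf(self):
        return self.yes is None and self.no is None


def guess(root, answer):
    """Walk the tree using answer(question) -> bool.

    Returns the leaf that was reached and the number of questions asked.
    """
    node, queries = root, 0
    while not node.is_leaf():
        node = node.yes if answer(node.text) else node.no
        queries += 1
    return node, queries


def learn(leaf, new_animal, question, new_animal_answer):
    """Replace a wrongly guessed leaf with a question that separates
    the old animal from the new one (hypothetical helper)."""
    old = Node(leaf.text)
    new = Node(new_animal)
    leaf.text = question
    leaf.yes, leaf.no = (new, old) if new_animal_answer else (old, new)


# Illustrative starting tree; not the tree from any real deployment.
root = Node("2 legs?", yes=Node("human"), no=Node("dog"))

# A contributor is thinking of a parrot.
parrot = {"2 legs?": True, "feathers?": True}
wrong_leaf, _ = guess(root, parrot.get)        # the program guesses "human"
learn(wrong_leaf, "parrot", "feathers?", new_animal_answer=True)

found, queries = guess(root, parrot.get)
print(found.text, queries)                     # parrot 2
```

Each contribution adds exactly one question and one animal, so the tree grows only where the program was wrong. The number of questions asked per animal is the path length returned by `guess`.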

Open Mind Initiative
  Three main functions provided by
    Domain experts: fundamental algorithms, process control, education/proselytizing, ...
    Tool developers: software infrastructure, tools, ...
    e-citizens: raw data, low-level bug reports, ...

Domain Experts
  Provide algorithms (e.g., OCR, ...)
  Provide general algorithms (e.g., Bayes nets, ...)
  Process control, algorithm development and truthing
    detect outliers for review/rejection
    data voting
    catch trials
    signal detection theory (d')
    method of limits
    two-alternative forced-choice
    hidden staircase
    bias avoidance
  Trend to publish data and algorithms on the web
  More university work will be done with Linux

Tool/infrastructure developers
  Get maximum information for minimum e-citizen effort (e.g., informative patterns)
  Make it easy (fast) for contributors
  Web infrastructure
  Collaborative software (version control)
  Reward contributors

e-citizens
  Incentives
    benefits in used system
    fun (games: Marathon, MUDD, ...)
    recognition (post names by amount of info. accepted)
    general interest (note progress: data and performance)
    altruism/philanthropy (cf. OED, SETI, ...)
    education (linguistics in schools, ...)
    lottery money
    frequent flyer miles
  1.5M inmates, 1M in nursing homes, ...

Sample Projects (1)
  Handwritten isolated character OCR
  Recognizer: simple neural net, decision tree, nearest-neighbor, ...
  Patterns presented on contributors' browsers, cached, ...
  Synthetic data (rotate, skew, line thicken/thin)
  Learning with queries (ask informative patterns); each pattern more valuable than a sampled one
  Cooperative improvement (submit characters over internet, download improved OCR the next day)
  Improved OCR

OCR example
  [Figure: the Open Mind host, e-citizens, and handwritten 4 and 9 patterns]

Sample Projects (2)
  Handwritten word recognition
  Recognizer: off the shelf
  Words scanned from handwritten docs
  Three alternatives shown, best selected by naive contributor (as in commercial speech recognizers)
  Improved handwritten OCR

Sample Projects (3)
  Open Mind chatbot game
  MUDD-like game
  Goal: find the route through the castle to the human
  choose the most natural paragraph
  Linguistic information learned in background
  More natural interfaces
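The Domain Experts slide names "data voting" and "detect outliers for review/rejection" as truthing steps, and the OCR example has e-citizens labeling handwritten 4s and 9s. One way these could fit together is sketched below, under stated assumptions: a label is accepted only when enough contributors agree, and everything else is forwarded for expert review. The function name `vote_on_labels`, the thresholds, and the pattern ids are hypothetical and do not come from the slides.

```python
from collections import Counter


def vote_on_labels(votes, min_votes=3, min_agreement=0.75):
    """Accept a label for each pattern only when enough contributors agree.

    votes: dict mapping pattern id -> list of labels from contributors.
    Returns (accepted, needs_review), where accepted maps pattern id -> label
    and needs_review lists pattern ids to forward to a domain expert.
    The thresholds are illustrative assumptions.
    """
    accepted = {}
    needs_review = []
    for pattern_id, labels in votes.items():
        if len(labels) < min_votes:
            # Too few votes to trust any label yet.
            needs_review.append(pattern_id)
            continue
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            accepted[pattern_id] = label
        else:
            # Contributors disagree: ambiguous pattern or unreliable data.
            needs_review.append(pattern_id)
    return accepted, needs_review


# Hypothetical votes on three scanned characters.
votes = {
    "img-001": ["4", "4", "4", "4"],
    "img-002": ["9", "4", "9", "4"],
    "img-003": ["9", "9"],
}
accepted, needs_review = vote_on_labels(votes)
print(accepted)      # {'img-001': '4'}
print(needs_review)  # ['img-002', 'img-003']
```

A real system would probably also weight votes by each contributor's measured reliability (for example from catch trials), which this sketch leaves out.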

Sample Projects (4)
  Common sense about computers
  Facts
    programs compiled, interpreted, run, ...
    a mouse is a peripheral
    early versions of code are generally buggy
    COBOL is a programming language
  More natural text interfaces

Sample Projects (5)
  Open Mind chess/go
  Chess/go = fast search + board scoring
  Allow contributors to score positions
    weighted by FIDE chess rating/go dan
    weighted by score on on-line test
    weighted by confidence
  port to multiple PCs (Beowulf) for speed
  Improved beam search via improved scoring (more humanlike style?)

Sample Projects (6)
  Open Mind Animals (Lam & Stork 99)
  [Figure: tree of yes/no questions (2 legs?, can fly?, mane?, feathers?, can swim?) with animals at the leaves (dog, human, bat, horse, parrot, elephant)]
  challenges:
    truthing
    forwarding errors to domain experts
    crediting contributors: ordered by amount contributed; avoid ID clashes; allow anonymity
    query simplification: reduce average number of queries/new animal
    tree simplification: better taxonomy; tree reflects the structure of domain
    generalizable to other domains
    other forms of queries
    human-machine interface
    bug reporting
    arbitrary branching factor
    ensure valid animals: name/synonym check
    ensure data quality (voting, accept if used)
    natural, show current query set (selectable)
    display progress: number of animals, contributors, show tree

Sample Projects (7)
  Open Mind Investment Assistant (Lo & Stork 99)
  [Figure: stock tickers DOL, AMD, TOY, BTFD, K, XLNX, MAT, ALTR, ATT, BRDCY, GM, AAPL, F, DELL, MSFT, IBM]

Problems in Machine learning
  Relative value of learning with queries vs. iid samples

  Data truthing/outlier detection
  Optimal learning strategies given...
    Bayes error
    probability of hostile data
    probability of data error
  Learn reliability of e-citizens, individually and as a group

Relation to Open Source
  Open Source
    no e-citizens
    expert knowledge (C++filt, gdbm)
    machine learning irrelevant
    web infrastructure useful
    most work is directly on the final software
    hacker culture (10^5)
  Open Mind
    e-citizens crucial
    informal knowledge (read, hear)
    machine learning essential
    web infrastructure essential
    most work is on the infrastructure
    e-citizen and business culture (10^9)

Relation to Data Mining
  Data Mining
    type of data may not be available for the project desired (e.g., OCR)
    no interactive queries
    slower learning
    ambiguities not resolved
    relatively fixed amount of data
    little or no e-citizen support
  Open Mind
    data tailored to the project desired (e.g., OCR)
    interactive queries
    faster learning
    ambiguities resolved
    new data encouraged
    e-citizen support

Open Mind project Taxonomy
  [Table: projects (OCR, chess/go, common sense, comp c-s, dialog, speech, grammar, Animals), each rated H, M, or L on benefit to the world, benefit to OpenMind, ease/simplicity, and use of e-citizens]

Related efforts elsewhere

  Speech
    Macrophone
    Human phoneme project
    Linguistic Data Consortium
    VoiceControl (Open Source speech for Linux)
    CSLU (Center for Spoken Language Understanding) Open Source speech tools
  OCR
    NIST, CEDAR, ARPA, UNIPEN
  GNU dictionary
  Newhoo!

It is inevitable
  Need is here
  Web is here
  Theory/Machine learning is here
  [Figure: Intelligent systems, Open Mind, e-citizens knowledge]
  This collaboration is going to happen!
  Less radical than Richard Stallman or Linus Torvalds...

Possible value to corporations
  Most companies could never develop most of this software, nor preserve a competitive advantage through proprietary software
  Expand functionality/niches for all
  Low-cost, possibly high-payoff research
  Leverage university work

Technical Specifications
  Language: Java
    Portable
  Operating System: Linux
    Open Source, portable, multiprocessor version (Beowulf)
  Data representation: Resource Description Framework (RDF)
    Source: www.w3.org/RDF/
    Code: lxr.mozilla.org/mozilla/source/rdf/base/
    Docs: www.mozilla.org/rdf/doc/

Licenses
  No license choice will satisfy everyone
  GNU: any linked code must include source and follow FSF copyright -- copyleft
  FreeBSD: do whatever you like (can charge)
  But... you cannot link GNU & FreeBSD!
  Practical (not moral) decision
  Open Source will benefit from competitive commercialization
  BSD license best for Open Mind

What do we do next Monday?
  Put up OpenMind.org
  Demonstration projects: Open Mind Animals
  Limited seeding (proselytizing)
  Solicit projects; introduce domain experts to tool developers
  Get corporate donations (e.g., books, CDs, ...)

Summary
  Open Mind
    Collaborative framework for developing intelligent systems
    Experts, tool developers, e-citizens
    Projects
    Vision of the future

Questions/Comments...
  Contact: [email protected]
