Open Mind Initiative
  David G. Stork
  Ricoh Silicon Valley
  [email protected]

Outline
  One-sentence description
  Background
  Open Mind Initiative
  Sample projects
  Relation to Open Source and to Data mining
  Related efforts elsewhere
  What do we do next Monday?

Open Mind Initiative
  A collaborative framework (based on Open Source methodology) for developing intelligent software, where...
    domain experts provide algorithms,
    tool developers provide software infrastructure and tools, and
    non-expert e-citizens provide raw data.

Background: Market need
  Speech recognition
  OCR
  Web searching
  ...
  Some software (e.g., common sense) too costly for a single company to build

Background: E-community & Open Source
  Waves
  GNU
  SendMail
  Linux
    10M lines; 10M seats; dbl. time 6 mo., 10^5 contributors
  Apache
    Half of all web servers
  Beowulf
    Supercomputer power from networked PCs
  Newhoo! dmoz.org
    Open web directory (527,991 sites, 10,943 editors, 82,003 categories)
  Infomedia
    Open source encyclopedia

Growth of new software methods
  [Figure: timeline. 1990: 10^5 programmers; 1995: Linux; 1995: 10^6 web authors; 1999: Newhoo!; 1999: 10^9 e-citizens; 2003: Open Mind]
  New communication allows communities and collaboration, and thus new software methods
  Opportunities expand to less-skilled users

Background: Pattern recognition/intelligent systems
  Recognizer = Theory + Model + Data
    Theory excellent
    Models depend on problem
    Never enough data
      the group with the most data wins (e.g., OCR ...)
Background: Tools
  Tools for customization/experimentation
    CSLU (Open Source)
    Nuance
    HTK
    S+
    ...
  Non-experts can use these!

Background: Infrastructure
  Collaborative software
    Animals (Shapiro 75, Lo & Stork 99)
    Answer Garden (Ackerman 90)
    BBN UNIPEN data collection software (Schwartz 97)
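
The Animals entry above refers to a guessing game in which the program grows a tree of yes/no questions from contributors' answers. The following is a minimal sketch of that idea, not the Open Mind implementation: the names Node, guess, and learn and the seed tree are hypothetical, and only the questions "2 legs?" and "can fly?" come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A question node (both children set) or an animal leaf (no children)."""
    text: str
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.yes is None and self.no is None


def guess(root: Node, answers: dict) -> Node:
    """Follow a contributor's yes/no answers down the tree; return the leaf reached."""
    node = root
    while not node.is_leaf():
        node = node.yes if answers[node.text] else node.no
    return node


def learn(leaf: Node, new_animal: str, question: str, yes_for_new: bool) -> None:
    """Turn a wrong leaf into a question that separates it from the new animal."""
    old, new = Node(leaf.text), Node(new_animal)
    leaf.text = question
    leaf.yes, leaf.no = (new, old) if yes_for_new else (old, new)


# Hypothetical seed tree: one question, two animals.
tree = Node("2 legs?", yes=Node("human"), no=Node("dog"))

# A contributor thinks of a bat; the tree wrongly ends at "human" ...
wrong = guess(tree, {"2 legs?": True})
# ... so the contributor supplies a distinguishing question.
learn(wrong, "bat", "can fly?", yes_for_new=True)

result = guess(tree, {"2 legs?": True, "can fly?": True})
print(result.text)
```

Each wrong guess adds exactly one question and one animal, so the tree grows only from contributor input; checking that contributed animals and questions are valid is a separate problem.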
Infrastructure: Relevance rating
  DirectHit, Inc.: improved web indexing by monitoring users' selections
  FireFly: targeted advertisements based on user profile
  Amazon.com: book recommendations
Open Mind Initiative
  Three main functions provided by
    Domain experts: fundamental algorithms, process control, education/proselytizing, ...
    Tool developers: software infrastructure, tools, ...
    e-citizens: raw data, low-level bug reports, ...

Domain Experts
  Provide algorithms (e.g., OCR, ...)
  Provide general algorithms (e.g., Bayes nets, ...)
  Process control, algorithm development and truthing
    detect outliers for review/rejection
    data voting
    catch trials
    signal detection theory (d')
    method of limits
    two-alternative forced-choice
    hidden staircase
    bias avoidance
  Trend to publish data and algorithms on the web
  More university work will be done with Linux

Tool/infrastructure developers
  Get maximum information for minimum e-citizen effort (e.g., informative patterns)
  Make it easy (fast) for contributors
  Web infrastructure
  Collaborative software (version control)
  Reward contributors

e-citizens
  Incentives
    benefits in used system
    fun (games: Marathon, MUDD, ...)
    recognition (post names by amount of info. accepted)
    general interest (note progress: data and performance)
    altruism/philanthropy (cf. OED, SETI, ...)
    education (linguistics in schools, ...)
    lottery
    money
    frequent flyer miles
  1.5M inmates, 1M in nursing homes, ...

Sample Projects (1)
  Handwritten isolated character OCR
    Recognizer: simple neural net, decision tree, nearest-neighbor, ...
    Patterns presented on contributors' browsers, cached, ...
    Synthetic data (rotate, skew, line thicken/thin)
    Learning with queries (ask informative patterns); each pattern more valuable than a sampled one
    Cooperative improvement (submit characters over internet, download improved OCR the next day)
  Improved OCR

OCR example
  [Figure: Open Mind host and e-citizens, with example handwritten characters 4 and 9]

Sample Projects (2)
  Handwritten word recognition
    Recognizer: off the shelf
    Words scanned from handwritten docs
    Three alternatives shown, best selected by naive contributor (as in commercial speech recognizers)
  Improved handwritten OCR

Sample Projects (3)
  Open Mind chatbot game
    MUDD-like game
    Goal: find the route through the castle to the human
    choose the most natural paragraph
    Linguistic information learned in background
  More natural interfaces
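
Sample Projects (1) proposes "learning with queries": asking contributors about informative patterns rather than random ones. One common way to pick such patterns is to choose the ones the current recognizer is least sure about. The sketch below assumes that approach; the slides do not say which selection rule Open Mind would use, and the names margin and pick_queries and the probabilities in pool are hypothetical.

```python
def margin(probs):
    """Gap between the two most probable classes; a small gap means an ambiguous pattern."""
    top, second = sorted(probs.values(), reverse=True)[:2]
    return top - second


def pick_queries(pool, k=1):
    """Return the ids of the k most ambiguous patterns to show to contributors."""
    ranked = sorted(pool, key=lambda pid: margin(pool[pid]))
    return ranked[:k]


# Hypothetical recognizer outputs for three unlabeled handwritten characters.
pool = {
    "img_a": {"4": 0.95, "9": 0.05},  # confident: little to learn from a label
    "img_b": {"4": 0.52, "9": 0.48},  # ambiguous: most worth asking about
    "img_c": {"4": 0.20, "9": 0.80},
}

queries = pick_queries(pool, k=2)
print(queries)
```

Here img_b is queried first because the recognizer can barely separate 4 from 9 on it, which is the sense in which a queried pattern can be worth more than a randomly sampled one.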
Sample Projects (4)
  Common sense about computers
    Facts
      programs compiled, interpreted, run, ...
      a mouse is a peripheral
      early versions of code are generally buggy
      COBOL is a programming language
  More natural text interfaces

Sample Projects (5)
  Open Mind chess/go
    Chess/go = fast search + board scoring
    Allow contributors to score positions
      weighted by FIDE chess rating/go dan
      weighted by score on on-line test
      weighted by confidence
    port to multiple PCs (Beowulf) for speed
  Improved beam search via improved scoring (more humanlike style?)

Sample Projects (6)
  Open Mind Animals (Lam & Stork 99)
  [Figure: yes/no question tree with questions "2 legs?", "can fly?", "mane?", "feathers?", "can swim?" and animals dog, human, bat, horse, parrot, elephant]
  challenges:
    truthing
    forwarding errors to domain experts
    crediting contributors
      ordered by amount contributed
      avoid ID clashes; allow anonymity
    query simplification
      reduce average number of queries/new animal
    tree simplification
      better taxonomy
      tree reflects the structure of domain
    generalizable to other domains
      other forms of queries
    human-machine interface
      natural, show current query set (selectable)
    bug reporting
    arbitrary branching factor
    ensure valid animals
      name/synonym check
    ensure data quality (voting, accept if used)
    display progress
      number of animals, contributors, show tree

Sample Projects (7)
  Open Mind Investment Assistant (Lo & Stork 99)
  [Figure: ticker symbols DOL, AMD, TOY, BTFD, K, XLNX, MAT, ALTR, ATT, BRDCY, GM, AAPL, F, DELL, MSFT, IBM]

Problems in Machine learning
  Relative value of learning with queries vs. iid samples
  Data truthing/outlier detection
  Optimal learning strategies given...
    Bayes error
    probability of hostile data
    probability of data error
  Learn reliability of e-citizens, individually and as a group

Relation to Open Source
  Open Source
    no e-citizens
    expert knowledge (C++filt, gdbm)
    machine learning irrelevant
    web infrastructure useful
    most work is directly on the final software
    hacker culture (10^5)
  Open Mind
    e-citizens crucial
    informal knowledge (read, hear)
    machine learning essential
    web infrastructure essential
    most work is on the infrastructure
    e-citizen and business culture (10^9)

Relation to Data Mining
  Data Mining
    type of data may not be available for the project desired (e.g., OCR)
    no interactive queries
    slower learning
    ambiguities not resolved
    relatively fixed amount of data
    little or no e-citizen support
  Open Mind
    data tailored to the project desired (e.g., OCR)
    interactive queries
    faster learning
    ambiguities resolved
    new data encouraged
    e-citizen support

Open Mind project Taxonomy
  [Table: projects (OCR, chess/go, common sense, comp c-s, dialog, speech, grammar, Animals) rated H/M/L on Benefit (World, OpenMind), ease/simplicity, and Use of e-citizens]

Related efforts elsewhere
  Speech
    Macrophone
    Human phoneme project
    Linguistic Data Consortium
    VoiceControl (Open Source speech for Linux)
    CSLU (Center for Spoken Language Understanding) Open Source speech tools
  OCR
    NIST, CEDAR, ARPA, UNIPEN
  GNU dictionary
  Newhoo!

It is inevitable
  Need is here
  Web is here
  Theory/Machine learning is here
  [Figure: Intelligent systems, Open Mind, e-citizens, knowledge]
  This collaboration is going to happen!
  Less radical than Richard Stallman or Linus Torvalds...

Possible value to corporations
  Most companies could never develop most of this software, nor preserve a competitive advantage through proprietary software
  Expand functionality/niches for all
  Low-cost, possibly high-payoff research
  Leverage university work

Technical Specifications
  Language: Java
    Portable
  Operating System: Linux
    Open Source, portable, multiprocessor version (Beowulf)
  Data representation: Resource Description Framework (RDF)
    Source: www.w3.org/RDF/
    Code: lxr.mozilla.org/mozilla/source/rdf/base/
    Docs: www.mozilla.org/rdf/doc/

Licenses
  No license choice will satisfy everyone
  GNU: any linked code must include source and follow FSF copyright (copyleft)
  FreeBSD: do whatever you like (can charge)
  But... you cannot link GNU & FreeBSD!
  Practical (not moral) decision
  Open Source will benefit from competitive commercialization
  BSD license best for Open Mind
What do we do next Monday?
  Put up OpenMind.org
  Demonstration projects: Open Mind Animals
  Limited seeding (proselytizing)
  Solicit projects; introduce domain experts to tool developers
  Get corporate donations (e.g., books, CDs, ...)

Summary
  Open Mind
    Collaborative framework for developing intelligent systems
    Experts, tool developers, e-citizens
    Projects
    Vision of the future

Questions/Comments...
  Contact: [email protected]