Utilizing Social Media: Text Mining Analysis for Government ...

Utilizing Social Media: Text Mining Analysis for Government Financial Data

Zamil S. Alzamil, Ph.D. Candidate, Rutgers University
Deniz Appelbaum, Assistant Professor, Feliciano School of Business, Montclair State University
Robert Nehmer, Professor, Oakland University

Abstract

In this paper we use a natural language processing implementation of the Financial Industry Business Ontology (FIBO) to extract financial and budget information about the public sector from the social media platform Twitter. We focus on two public-private agencies: the Port Authority of NY and NJ (PANYNJ) and the NY Metropolitan Transportation Authority (MTA). FIBO is part of the Enterprise Data Management Council (EDMC) and Object Management Group (OMG) family of specifications. FIBO provides standards for defining the facts, terms, and relationships associated with financial concepts.

The study follows the Design Science Research (DSR) methodology. We apply a frame and slot approach from the artificial intelligence and natural language processing literature to operationalize the FIBO ontology in a public sector/municipalities business context.

One contribution of this paper is that it is the first to recognize that the FIBO structure provides a grammar of financial concepts. We show that this grammar can be used to mine semantic meaning from unstructured textual data. Twitter streams will be monitored and analyzed with frames derived from FIBO and with key words. The ability of the FIBO frames to detect semantic meaning in tweets is compared with naive key word analysis.

Using FIBO frames, constituent semantic structures can be uncovered to predict reactions to policies and programs more quickly than by following the feeds manually.

Construct an Efficient Methodology to Extract BI from Twitter Feeds Using FIBO

• Collect the feed
• Extract relevant FIBO concepts into slots and frames
• Validation of the overall use of FIBO: naive keywords test
• Collect descriptive statistics on the slots
• Use the descriptive statistics to construct efficient frame(s)
• Eliminate empty or dominated slots
• Use AND operators between slots that identify separate subpopulations
• Test frame against new feed population

Design Hypothesis

We plan to perform multiple tests aimed at finding what best constitutes a frame.

H0: Searches using both FIBO terms and FIBO synonyms will find more tweets than searches using only FIBO terms.
H1: Searches using FIBO terms, synonyms, and role-performing items will find more tweets than searches using either FIBO terms only or FIBO terms and synonyms.

Test of H0. Here $S_{i,j}$ is the term in slot $i$ at level $j$: level 1 is the FIBO term, level 2 the FIBO synonym, and level 3 the role-performing item.

Text terms 1: $T_1 = S_{1,1} \lor S_{2,1} \lor S_{3,1} \lor S_{4,1} \lor \dots \lor S_{18,1}$, where $\lor$ refers to the logical function OR.

Text terms 2: $T_2 = S_{1,1} \lor S_{1,2} \lor S_{2,1} \lor S_{2,2} \lor S_{3,1} \lor S_{3,2} \lor S_{4,1} \lor S_{4,2} \lor \dots \lor S_{18,1} \lor S_{18,2}$

Text terms 3: $T_3 = S_{1,1} \lor S_{1,2} \lor S_{1,3} \lor S_{2,1} \lor S_{2,2} \lor S_{2,3} \lor S_{3,1} \lor S_{3,2} \lor S_{3,3} \lor S_{4,1} \lor S_{4,2} \lor S_{4,3} \lor \dots \lor S_{18,1} \lor S_{18,2} \lor S_{18,3}$

H0 will be tested by comparing the results of text terms 1 and text terms 2. H1 will be tested by comparing the results of text terms 3 first against text terms 1 and then against text terms 2.

The multiple tests above are part of our design hypothesis for frame construction. In these tests the slots are combined with the logical condition OR ($\lor$). The question is what constitutes, or best constitutes, a frame. To explore whether all slots need to be included in the search, we plan to develop descriptive statistics about the frequency of occurrence of each term.
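The three term sets can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the three-slot `SLOTS` excerpt, the helper names (`term_set`, `matches_any`, `count_hits`), and the sample tweets are all hypothetical, and whole-phrase, case-insensitive matching is an assumed matching rule that the slides do not specify.

```python
import re

# Hypothetical three-slot excerpt. Level 1 = FIBO term, level 2 = FIBO synonym,
# level 3 = role-performing item; the study's full frame has 18 slots.
SLOTS = [
    {1: ["municipal trustee"], 2: ["muni trustee"], 3: ["trustee"]},
    {1: ["municipal debt issuer"], 2: ["muni issuer"], 3: ["issuer", "municipality"]},
    {1: ["funds usage"], 2: ["funds purpose"], 3: ["loan purpose"]},
]


def term_set(slots, max_level):
    """Return every term S(i, j) with j <= max_level, across all slots i."""
    return [term
            for slot in slots
            for level in range(1, max_level + 1)
            for term in slot[level]]


def matches_any(text, terms):
    """Logical OR over the terms: True if at least one occurs as a whole phrase."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms)


def count_hits(tweets, slots, max_level):
    """Number of tweets matched by the term set for the given maximum level."""
    terms = term_set(slots, max_level)
    return sum(1 for tweet in tweets if matches_any(tweet, terms))


# Made-up tweets, for illustration only.
TWEETS = [
    "The municipal trustee approved the plan.",
    "Muni issuer announces a new series",
    "The trustee resigned",
    "Subway delays again",
]

hits = {level: count_hits(TWEETS, SLOTS, level) for level in (1, 2, 3)}
print(hits)  # {1: 1, 2: 2, 3: 3}
```

Because each term set contains the previous one, the hit counts can only stay equal or grow from T1 to T3. H0 and H1 therefore ask whether the growth is large enough to matter on the real feed.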

We will then also develop statistics on the frequency of the slots being filled. Based on these descriptive statistics, we hope to develop an efficient method for extracting frames from the Twitter stream. After running the experiment, we can conclude how many slots we need to better represent a frame.

Introduction

• PANYNJ and the MTA: New York and northern New Jersey's transportation infrastructure
• Public benefit corporations: quasi-private corporations that serve the public good
• Funded by self-issued debt (bonds) and the tolls that they collect
• Chronic huge operating deficits: state subsidies and frequent fare increases

Often, interested stakeholders will seek or contribute information at various social media outlets, such as Twitter (Syed et al 2013). The financial problems of the MTA and PANYNJ are becoming likely subjects for social media feeds such as Twitter, which makes those feeds potentially meaningful to analysts and other stakeholders. We provide structure to this task by applying FIBO ontology rules to Twitter data feeds about the quasi-public PANYNJ and MTA funds.

Literature Review

Twitter: Problem identification

• Twitter is an online news and social networking service where users post and interact with posts called tweets.
• Twitter data is usually classified as unstructured big data (Warren et al 2015).
• It is analyzed by businesses, governments, stock market analysts, and journalists.

• Twitter data has been found to be relevant for predictive sentiment analysis (Pak and Paroubek 2010).
• Structuring unstructured data such as tweets to obtain high quality accounting and financial information is challenging, as this type of big data is unfamiliar to the profession.
• Standardized semantic understanding and natural language processing are required to differentiate words and phrases.

Ontology based accounting research applied to Twitter: Define the objectives of the solution

• Although previous research discusses data standards for analysis of the softer qualitative data in financial statements (Warren et al 2015), we found no research that discusses formalizing financial textual information about municipal bonds in social media sources such as Twitter.
• This paper applies a frame and slot methodology from the artificial intelligence and natural language processing literature to operationalize the FIBO ontology in a public sector/municipalities business context.
• FIBO provides standards for defining the facts, terms, and relationships associated with financial concepts.
• FIBO concepts are vetted by subject matter experts (SMEs), so they should reflect high quality financial concepts.

Derivation of the slot and frame structure: Design and development of an artifact which meets some of the objectives

Frames:
• Useful for simulating commonsense knowledge, which is a very difficult area for computers to master.
• They represent related knowledge about a narrow subject that has much default knowledge.
• A frame system would be a good choice for describing a mechanical device, for example a car.
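To make the car example concrete, a frame can be sketched as a record whose fields are slots and whose values are fillers. The snippet below is only an illustration under stated assumptions: the `Frame` class and its `instantiate` method are hypothetical names, and the slot names and fillers are borrowed from the generic car frame (Table 2) and its instance (Table 3) in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A generic frame: a named record whose slots list their allowed fillers."""
    name: str
    specialization_of: str
    slots: dict = field(default_factory=dict)  # slot name -> tuple of allowed fillers

    def instantiate(self, instance_name, **fillers):
        """Build an instance, rejecting unknown slots and disallowed fillers."""
        for slot, value in fillers.items():
            if slot not in self.slots:
                raise KeyError(f"unknown slot: {slot}")
            if value not in self.slots[slot]:
                raise ValueError(f"{value!r} is not an allowed filler for {slot}")
        instance = {"name": instance_name, "specialization_of": f"is-a {self.name}"}
        instance.update(fillers)
        return instance


# Generic car frame: each slot carries its set of possible fillers.
CAR = Frame(
    name="car",
    specialization_of="a-kind-of property",
    slots={
        "type": ("SUV", "compact", "luxury"),
        "maker": ("Honda", "Ford", "Subaru"),
        "engine": ("gasoline", "hybrid", "diesel", "electric"),
        "transmission": ("manual", "automatic"),
    },
)

# One instance of the frame: every slot is filled with a single value.
zamils_car = CAR.instantiate(
    "Zamil's Car", type="luxury", maker="Subaru", engine="hybrid", transmission="automatic"
)
print(zamils_car)
```

The generic frame holds the default knowledge (which fillers are possible), while an instance picks one filler per slot. The same split carries over to a bond frame whose slots are filled from tweet text.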

• The frame contrasts with the semantic net, which is generally used for broad knowledge representation.
• There are no standards for defining frame-based systems.
• A frame is analogous to a record structure: the fields and values of a record correspond to the slots and slot fillers of a frame.
• A frame is basically a group of slots and fillers that defines a stereotypical object.

Car frame: a generic subframe of property

Table 2 Generic Car Frame (broad meaning)

Slot | Filler
Name | Car
Specialization-of | a-kind-of property
Types | (SUV, compact, luxury)
Maker | (Honda, Ford, Subaru)
Engine | (gasoline, hybrid, diesel, electric)
Transmission | (manual, automatic)

An instance of a car frame:

Table 3 An Instance of a Car Frame

Slot | Filler
Name | Zamil's Car
Specialization-of | is-a car
Type | luxury
Maker | Subaru
Engine | hybrid
Transmission | automatic

Slot = Primary Key!

Table 4 Government Issued Bond Frame from FIBO

Slot | Filler
Municipal Security |
Municipal Debt Issuer |
Municipal Bond Debt Obligor |
Funds Usage |
Municipal Bond Capital Type |
Municipal Bond Refund Terms |
Municipal Trustee |
Ad valorem tax provision |
Municipal Bond Type | (Build America, Tax Allocation, Special Tax, Special Obligation, General Obligation, Revenue, Special Assessment, Consolidated Bond)

Demonstration of the system: Implementation of the twitter feed

Seven Components of the proposed FIBO-Twitter Framework:

1) Targeted Tweets: PANYNJ and the MTA, e.g., (NYCT Subway, #MTA, #MTATransparency, NYCTSubway, NYCTBus, @MTA, LIRR, NYC Subway, #nycsubway).
2) Twitter API: Twitter is a micro-blogging social media platform.
3) Data Collection and Assembly: By accessing the Twitter Application Programming Interface (API), we wrote Python 2.7 code to fetch every tweet in the Twitter stream that contains at least one of the targeted keys listed in Step One.
4) Data Aggregation and Preprocessing: After collecting the raw data, we cleaned and aggregated six fields from each tweet and stored them in our database.
5) The Financial Industry Business Ontology (FIBO): After data collection, aggregation, and preprocessing, we search the databases for data structures which fill the slots of the frame for government bonds developed from the FIBO ontology.
6) Comparison to Naive Key Word Search: After collecting all the tweets related to the bond information of the two agencies, we plan to compare the results to a naive key word search of the database.
7) Evaluation of Final Results.
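Components 3 and 4 can be sketched as a filter followed by a field projection. This is not the authors' collector (theirs used Python 2.7 against the live API); it is an offline Python 3 illustration that runs on already-downloaded tweet dictionaries. The six output fields are assumed to be the six columns that appear in the deck's tweet table, and the JSON keys (`created_at`, `text`, `favorite_count`, `user.id`, `user.followers_count`, `user.statuses_count`) assume the classic v1.1 tweet object layout. The function names and sample tweets are hypothetical.

```python
# Targeted keys from Step One of the framework.
TARGET_KEYS = [
    "NYCT Subway", "#MTA", "#MTATransparency", "NYCTSubway", "NYCTBus",
    "@MTA", "LIRR", "NYC Subway", "#nycsubway",
]


def contains_target(text, keys=TARGET_KEYS):
    """True if the tweet text contains at least one targeted key (case-insensitive)."""
    lowered = text.lower()
    return any(key.lower() in lowered for key in keys)


def extract_fields(tweet):
    """Project a raw tweet dict onto six fields (assumed mapping, see lead-in)."""
    user = tweet.get("user", {})
    return {
        "DateTime": tweet.get("created_at"),
        "Tweet": tweet.get("text"),
        "User_id": user.get("id"),
        "Likes": tweet.get("favorite_count"),
        "Followers": user.get("followers_count"),
        "Posts": user.get("statuses_count"),
    }


def preprocess(raw_tweets):
    """Keep tweets that mention a targeted key and reduce each to six fields."""
    return [extract_fields(t) for t in raw_tweets if contains_target(t.get("text", ""))]


# Made-up raw tweets, for illustration only.
RAW = [
    {"created_at": "Mon Jan 29 20:08:00 +0000 2018",
     "text": "Delays on the LIRR again #MTA",
     "favorite_count": 3,
     "user": {"id": 1, "followers_count": 10, "statuses_count": 100}},
    {"text": "Nice weather today", "user": {"id": 2}},
]

rows = preprocess(RAW)
print(rows)
```

Plain substring matching is the simplest reading of "contains at least one of the targeted keys". It will also match keys embedded in longer tokens, so a production filter might prefer token-level matching.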

Demonstration of the validation: Implementation of the twitter feed system

After initial data aggregation and preprocessing over the period from January 29, 2018 to August 27, 2018, the intermediate datasets consist of the following:

Table Name | # of Records | Date
PANYNJ | 101,416 | 1/29/2018 to 8/27/2018
MTA | 432,519 | 1/29/2018 to 8/27/2018
PATH | 87,419 | 1/29/2018 to 8/27/2018
Total | 621,354 |

Initial validation of the implemented system on a real twitter feed

Naive key word search of our implementation (from PANYNJ and MTA tables), searching for the terms: Bond. Funds. Trustee.

Initial validation of the system on a real twitter feed: Illustration of some of the findings, naive keywords

Table columns: DateTime, Tweet, User_id, # of Likes, followers, # of Posts.

Tweet | User_id
The Port Authority of New York and New Jersey Consolidated Bonds Two Hundred Seventh Series and Tw... https://t.co/JD4Jg2TpLF | 272517207
@NYGovCuomo Fix the Subway the MTA the Port Authority & hand over those CFE funds to the Cities that are supposed to get them. | 24289752
The state-funded @MTA has paid the state over $328 million dollars in bond-issuance fees over the last 15 years. | 6311942
We are very proud to announce that Charles Bolden Jr. has rejoined the Board of Trustees; coming from @NASA and his… https://t.co/i7Wu1MHU7Y | 19809471

DateTime values, as extracted: 1/29/2018 20:08; 1/29/2018 22:06; 1/29/2018 20:33; 2/8/2018 0:43.
Remaining numeric cells, as extracted: 212, 0, 670, 6758, 3300, 867, 1280, 2273, 136161, 7197, 1988, 2236.

FIBO synonyms for the system on a real twitter feed

FIBO Concept | FIBO Synonym | Contextualized Synonym or Role-performing Item
Government Issued Bond | Sovereign Bond, Treasury Bond | Government Debt
Municipal Security | Municipal Debt Instrument | Muni
Municipal Debt Issuer | Muni Issuer | Issuer, Municipality
Municipal Bond Debt Obligor | Muni Bond, Muni Owing Party, Borrower | Obligor
Funds Usage | Funds Purpose, Disbursement Purpose | Loan Purpose, Credit Facility Purpose, Credit Purpose
Municipal Bond Capital Type | Muni Capital Type | Capital Type
Municipal Bond Refund Terms | Muni Refund Terms | Refund Terms
Municipal Trustee | Muni Trustee | Trustee
Ad valorem tax provision | N/A | Property Tax Provision, Real Property Tax Provision, Sales Tax Provision
Municipal Bond Type | |
Build America | Build America Bond |
Tax Allocation | Tax Allocation Bond |
Special Tax | Special Tax Bond |
Special Obligation | Special Obligation Bond |
General Obligation | General Obligation Bond |
Revenue | Revenue Bond |
Special Assessment | Special Assessment Bond |
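Given the synonym table above, slot filling and the slot-level descriptive statistics can be sketched as follows. This is one possible reading, not the authors' implementation: `BOND_FRAME` is a hypothetical five-slot excerpt of the table, the function names are made up for illustration, whole-phrase matching is an assumed rule, and the tweets are invented.

```python
import re
from collections import Counter

# Hypothetical excerpt of the bond frame: slot -> FIBO term, synonyms, role items.
BOND_FRAME = {
    "Municipal Trustee": ["municipal trustee", "muni trustee", "trustee"],
    "Municipal Debt Issuer": ["municipal debt issuer", "muni issuer", "issuer", "municipality"],
    "Funds Usage": ["funds usage", "funds purpose", "disbursement purpose", "loan purpose"],
    "General Obligation": ["general obligation", "general obligation bond"],
    "Revenue": ["revenue", "revenue bond"],
}


def fill_frame(text, frame=BOND_FRAME):
    """Return {slot: [matched terms]} for every slot that the text fills."""
    lowered = text.lower()
    filled = {}
    for slot, terms in frame.items():
        found = [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)]
        if found:
            filled[slot] = found
    return filled


def slot_fill_counts(tweets, frame=BOND_FRAME):
    """Count, for each slot, how many tweets fill it (zero counts are kept)."""
    counts = Counter({slot: 0 for slot in frame})
    for text in tweets:
        counts.update(fill_frame(text, frame).keys())
    return counts


def drop_empty_slots(frame, counts):
    """Remove slots that no tweet ever filled."""
    return {slot: terms for slot, terms in frame.items() if counts[slot] > 0}


# Made-up tweets, for illustration only.
TWEETS = [
    "The trustee approved a general obligation bond.",
    "Revenue bond issued by the municipality",
    "Subway delays again",
]

counts = slot_fill_counts(TWEETS)
reduced_frame = drop_empty_slots(BOND_FRAME, counts)
print(dict(counts))
print(list(reduced_frame))
```

Here "Funds Usage" is never filled, so it is dropped from the reduced frame. Detecting dominated slots (slots that are only filled when another slot is) would need co-occurrence counts on top of this.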

Initial validation of the system on a real twitter feed: Illustration of some of the findings, naive keywords vs. FIBO Terms Search

1. FIBO Terms Search: # of records = 253 records
2. Naive keywords Search: # of records = 71 records
Retweets = 1; # of false positives = 37
Retweets = 87; # of false positives = 5

Construction of the Frame-Based System

• A frame consists of a set of slots which are filled by values, procedures, or links to other frames.
• Formalize the municipal bond-type frames taken from FIBO.
• Slots Representation: We assume the slots are represented as they appear in the table below:

Conclusions, future research, and take-aways

• Currently in process: requires an extensive data set that is rich over time.
• Captures tweets about past and pending economic events regarding PANYNJ and the MTA.
• Recently, PANYNJ bond series were graded by Moody's, and the Authority announced that it would be seeking funding to upgrade one of its airports, indicating that there may be another bond series issued in the near future.
• Useful to other government bodies and analysts who might want to measure finances and performance.

• There will be an issue of the units of analysis: individual tweet vs. thread.
• We may run into thesaurus problems.
• Theoretical interest: exploring the usefulness of ontologies in leveraging the conceptual knowledge in big data, such as Twitter feeds.

Although this study has been carefully grounded in DSR and accounting ontology theory, we should mention several assumptions made in our study that are typical for most examinations of social media. First, there is an underlying assumption that Twitter feeds represent the true population.

• Twitter feeds only represent the tweets of those who choose to tweet.
• There are many retweets and passive readers (non-tweeters).
• Most Twitter studies do not capture the tweets of the broad population.

Another assumption is that tweets represent the participants' actual meanings (semantic state). Twitter posts only display what participants elect to post, and as such could be abbreviated and/or modified. Some participants may feel comfortable posting in a manner similar to an unstructured stream of consciousness, and others might post in a more measured and structured manner. The latter points to the potential benefits of using a structured ontology for understanding tweets of a more financial nature.

Thank you!
