Using R for Business Analytics

Matthew A. Lanham
Doctoral Candidate/Merchandise Data Scientist*
MatthewALanham.com
Department of Business Information Technology
*Advance Auto Parts, Inc.
Outline
State of Business Analytics
• What is Business Analytics?
• Putting the BA domains together
• INFORMS CAP framework
What is R and RStudio
• Brief history of R
• Installing R & RStudio
I. Business Problem (Question) Framing
II. Analytical Platform Framing
III. Data
IV. Methodology (Approach) Selection
V. Model Building
VI. Deployment
VII. Model Life Cycle Management
Conclusions
Resources & References
Appendix A: Free R Programming course
SEDSI 2015
Copyright © 2015, MatthewALanham.com
15 mins
45 mins
Fun on your own
Using R for Business Analytics
Business Analytics
R & RStudio
Business Framing
Analytical Framing
Data
Methodology
Model Building
Deployment
Business Analytics
What is Analytics?
• “…the extensive use of data, statistical and quantitative analysis, exploratory and predictive models,
and fact-based management to drive decisions and actions (Davenport & Harris, 2007).”
• “Refers to the skills, technologies, applications, and practices for continuous iterative exploration and
investigation of past business performance to gain insight and drive business planning” (Wikipedia,
2014).
Descriptive Analytics: What has happened?
• Exploratory Data Analysis (EDA)
• Uni- and Multivariate Summaries
• Clustering - Segmentation - Profiling
Predictive Analytics: What will happen?
• Forecasting/Pattern recognition
• Classification
• Ensemble Modeling
Prescriptive Analytics: What is the best course of action?
• Optimization/Heuristics
• Simulation
• Computational Stochastic Optimization
Integrating the BA Domains Together
Big Picture Idea
• Example based on my dissertation research and industry work for a Fortune 500 retailer
Growth in BA Programs
Growth in Business Analytics Programs
• Since 2010, there have been 35 analytics/business analytics programs established in schools
of business, and another 29 analytics/data science programs established in other colleges
within universities (Rappa, 2014).
[Figure: Number of New Analytics/Data Science Programs per Year; x-axis: Year (2002 to 2015); y-axis: Number of Academic Programs (0 to 35)]
• Note: These numbers do not include the analytics/data science concentrations or tracks
added to legacy programs such as engineering, information systems, and statistics.
INFORMS Certified Analytics Professional (CAP)
Analytics is a Process
• “Analytics doesn’t take anything away from Operations Research (OR). Outside our
community, OR seen as a toolkit but Analytics seen as a process.” – Anne Robinson,
Verizon (President of INFORMS)
• Similar to CRISP-DM
INFORMS 7 CAP Domain Areas
I. Business Problem (Question) Framing
II. Analytics Platform Framing
III. Data
IV. Methodology (Approach) Selection
V. Model Building
VI. Deployment
VII. Model Life Cycle Management
CRISP-DM
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation/Deployment
https://www.informs.org/Certification-Continuing-Ed/Analytics-Certification/Candidate-Handbook
INFORMS Certified Analytics Professional (CAP)
INFORMS 7 Domain Area Framework
Created via Job Task Analysis
1. Domain = 7 different domains (Industry independent)
2. Tasks = things that analytics professionals need to do
3. Knowledge = things that professionals need to know to do those tasks
Idea
• Create T-shaped professionals (1990s) rather than I-shaped professionals
• The “T” represents
1. Breadth of skills across the top
2. Depth in one area represented by the vertical bar
[Figure: T-shaped professional, labeled “Analytics & Technologies” and “Assortment Planning (Dissertation Topic)”]
T-shaped professionals can
1. work more easily in interdisciplinary teams than those with less breadth
2. be more effective than those without depth
About the R Language
What is R?
• R was developed by New Zealand Professors Robert Gentleman and Ross Ihaka who wanted
a better statistical software platform for their students (Ihaka & Gentleman, 1996).
• It’s more than just statistics today!!
Can my students use it?
• R is an open-source (i.e. FREE!!) and freely accessible software language under the GNU General Public
License, version 2 (Free Software Foundation, 1991)
• R works with Windows, Macintosh, Unix, and Linux operating systems.
• It has a nice balance of object-oriented and functional programming constructs, and unlike most
commercial software, the majority of packages contain many knobs to allow for tuning and
customization of a procedure (Hornick & Plunkett, 2013).
What makes it better than the others?
• As of 2014 there were 5800 available user-developed packages (also referred to as libraries) (Cortez).
• There are 72 packages with functions for nearly any machine learning methodology. You will not find
many of these techniques in the commercial packages.
• There is also a growing community of research on prescriptive (optimization) analytics (Cortez). As of
2014, there are more than 60 available packages for optimization and mathematical programming
(Theussl, 2014)
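Since so much of R's value sits in its packages, here is a minimal sketch of the usual install-if-missing-then-load pattern. The package name is only a placeholder ("stats" ships with R, so the sketch runs offline); swap in any CRAN package you need.

```r
# Install a package only if it is missing, then load it.
# "stats" is a placeholder that ships with R; replace with any CRAN package name.
pkg <- "stats"
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)   # downloads from CRAN (needs an internet connection)
}
library(pkg, character.only = TRUE)

# Count the packages installed in the local library.
n_installed <- nrow(installed.packages())
print(n_installed)
```

Students typically run install.packages() once per machine and library() once per session.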
R Popularity
TIOBE Programming Community index
• an indicator of the popularity of programming languages.
• “The TIOBE index lists various of these statistical programming languages available, e.g. Julia
(position #126), LabView (#63), Mathematica (#80), MATLAB (#24), S (#84), SAS (#21), SPSS
(#104) and Stata (#110). Most of these languages are getting more popular every month.
The clear winner of the pack is the open source programming language R. This month it
jumped to position 12, while being at position 15 last month.”
Source: TIOBE Index for November 2014
http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
R Popularity
2013 KDNuggets.com Survey
• “What programming/statistics languages you used for an analytics / data mining / data
science work during the past year?”
• R (60.9%), Python (38.8%), and SQL (36.6%) were the top three languages used by
practitioners.
[Figure: Languages Used Over Past Year by Analytics Professionals; y-axis: Percentage of Respondents (0.00 to 0.70); annotated groups: “hybrids”, “BDA”]
Technologies in Demand
O’Reilly’s 2013 Data Science Salary Survey
• “What commercial languages and software have you used for an analytics, big data, data
mining, or data science during the past 12 months for a real project”
[Figure: Language/Software Used for Real Projects Over Past Year. Bars by role (Data role, Non-data role); x-axis: Respondent Percentage (0% to 80%); y-axis: Language/Software Used (SAS/SPSS, Ruby, Mahout, D3, Tableau, JavaScript, Network/Graph, Java, Hadoop, Excel, Python, R, SQL)]
• Similar to the KDnuggets annual survey on analytics software, SQL, R, and Python were the
three most popular (source: O’Reilly).
Download and Install R and RStudio
1. Download and install the latest version of R from the CRAN website (http://cran.r-project.org/).
• Video for Windows Users (https://www.youtube.com/watch?v=LII6of-5Odw)
• Video for Mac Users (https://www.youtube.com/watch?v=xokJUwn0mis)
2. Next, download the popular R IDE, RStudio
(http://www.rstudio.com/products/rstudio/download/).
• Video for Windows/Mac (https://www.youtube.com/watch?v=7rFMLnm3sAE)
We will be using the Windows version, but the functionality on Mac is expected to be the
same. This may not be the case for Linux users.
Brief Intro to R Programming
• Show the basics of R in RStudio
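For readers following along without the live demo, a minimal sketch of the kind of basics one might show is below. The SKU ids, prices, and variable names are made up for illustration.

```r
# Vectors and vectorized arithmetic
units_sold <- c(3, 0, 1, 5)
price <- 19.99                       # hypothetical unit price
revenue <- units_sold * price        # element-wise multiplication

# A data frame is R's table structure
sales <- data.frame(
  sku = c("A1", "B2", "C3", "D4"),   # hypothetical SKU ids
  units_sold = units_sold,
  stringsAsFactors = FALSE
)
sales$revenue <- revenue

# Subsetting rows with a logical condition
sellers <- sales[sales$units_sold > 0, ]

# Writing a small function
mean_units <- function(x) sum(x) / length(x)
avg <- mean_units(sales$units_sold)

print(sellers)
print(avg)
```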
Understanding and Discussing the Problem
Business Problem (Question) Framing
• Try to make the assignments or projects big picture focused
• Taught well in TQM, Software Engineering, and Project Mgmt. courses
• I spend a lot of time here and revisit this domain area regularly to make sure we are
creating a solution for the real problem(s)
• Recently interviewed several graduates from statistics and computer science to work with
me and most had a difficult time with this
Understanding and Discussing the Problem
Retail Example
Every period a category manager (CM, a.k.a. merchant) for an auto parts retailer reviews her
product category, removing SKUs that are not selling as expected and adding SKUs that have
higher potential to sell. Historically the CM has used a forecast generated from the company’s
commercial inventory and planning system, which was designed for grocery and clothing
retailers, to gauge selling potential for all products.
[Figure: New Inventory Added; Unproductive Inventory Removed]
The CM believes the forecast is too basic, does not account for unique aspects of the auto
parts business, and thus is not providing her adequate decision support. She believes she
could improve her category performance if she could analyze SKU demand in other ways, but
is at a loss as to where to begin.
Understanding and Discussing the Problem
Retail Example
Some unique aspects about auto parts retailers as compared to other retail segments such as
grocery stores and clothing stores
• the products do not expire,
• there are few seasonal trends among products,
• product lines do not have major adjustments over seasons, and most interestingly
• it is common for many products stocked within a store to sell only one or two units per
year, especially for products stocked in the backroom of the store.
Tasks 1/2 – Requirements elicitation/Stakeholders
• She would like to have another measure that gauges the selling potential of a SKU within a
particular store and more easily differentiates one SKU from another.
• She is the project owner, as this is her category and we are only focusing on her “Wigit”
category. However, stores and commercial customers are also impacted by this project.
Task 3 – Decisions? / Data?
• She has control over all SKUs in the Wigit category and which stores each Wigit will be
stocked in.
• She does not control how many SKUs are stocked in a store. That is a firm-level decision
supported by the firm’s commercial inventory and replenishment system.
• Measures about a store’s demographics, a SKU’s characteristics, previous POS behavior, etc.
are available and are analyzed by her (the domain expert) to help her make stocking
decisions.
Understanding and Discussing the Problem
Tasks 4/5 – General problem specification
• Decisions – Which SKUs should be added (yes/no) to a particular store?
• Objectives – She wants to increase the Wigit category’s sales
• Constraints/Parameters
• She does not want to add SKUs to a store that have “low” potential to sell
• She has a fixed initial budget ($) for wigits
• Stores have a finite amount of shelf space dedicated to wigits
• Business KPIs/Benefits – Improve the wigit assortment by reducing non-working
inventory (NWI) and increasing category store sales
Task 6 – Stakeholder agreement
• We review the stated tasks as we understand them, and she agrees we should proceed to
developing an analytical solution
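One way to make the specification above concrete is to write it as a small binary program. This is only an illustrative sketch, not the formulation used in the project, and every symbol is an assumption introduced here: x_{ij} = 1 if SKU j is added to store i, p_{ij} is an estimated probability that the SKU sells there, r_j and c_j are its revenue and cost, s_j its shelf footprint, B the budget, S_i the store's shelf space, and \tau the cutoff for “low” potential.

```latex
\begin{align}
\max_{x} \quad & \sum_{i} \sum_{j} r_j \, p_{ij} \, x_{ij}
  && \text{(expected category sales)} \\
\text{s.t.} \quad & \sum_{i} \sum_{j} c_j \, x_{ij} \le B
  && \text{(fixed initial budget)} \\
& \sum_{j} s_j \, x_{ij} \le S_i \quad \forall i
  && \text{(finite shelf space per store)} \\
& x_{ij} = 0 \ \text{ if } \ p_{ij} < \tau \quad \forall i, j
  && \text{(exclude ``low'' potential SKUs)} \\
& x_{ij} \in \{0, 1\} \quad \forall i, j
  && \text{(yes/no stocking decision)}
\end{align}
```

Writing it this way shows students how the decisions, objective, and constraints map onto one another, even if the solution ends up being a ranking by probability rather than a solved optimization.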
Analytical Platform Framing
Task 1 – Analytics problem formulation
• Generate probability estimates for store-SKU combinations
Task 2 – Drivers?
• Higher probability predictions will lead to a greater chance of being incorporated
in the wigit assortment for a store
Task 3 – Key assumptions?
• Store-SKU probability estimates generated will be similar to historical
proportions that actually sold
• Not all store-SKU combinations exist historically, but we can estimate them based
on a validated model of actual historical store-SKU combinations that were
stocked for at least one year
Task 4 – Success metrics?
• Higher probability predictions will lead to a greater chance of being incorporated
in the wigit assortment for a store
Task 5 – Stakeholder agreement
• She believes having a store-SKU probability is exactly what would help her with
her assortment planning decision and wants us to proceed
Data
Task 1 – Data requirements
• Obtain variables believed to be important to the probability of selling, based on her domain knowledge
• Examples: Application counts, demographics
Task 2 – Acquire data
Task 3 – Get data ready for analysis
• Store-SKU probability estimates generated will be similar to historical
Task 4 – Document & Report findings
• Higher probability predictions will lead to a greater chance of being incorporated in the wigit
assortment for a store
Task 5 – Refine business problem
• She believes having a store-SKU probability is exactly what would help her with her assortment
planning decision and wants us to proceed
Data
Task 3 – Get data ready for analysis
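A minimal sketch of getting a flat file into R is below. The file contents are made up so the example is self-contained; DC and STORE_NUMBER appear in the dataset discussed later, while APP_COUNT and SOLD are hypothetical column names.

```r
# Write a tiny made-up extract to a temporary CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "DC,STORE_NUMBER,APP_COUNT,SOLD",
  "10,1001,250,1",
  "10,1002,,0",
  "20,2001,75,0"
), csv_path)

# Read the file into a data frame; blank numeric fields are read as NA.
mydata <- read.csv(csv_path, stringsAsFactors = FALSE)
print(dim(mydata))   # rows, columns
print(mydata)
```

In practice you would replace csv_path with the path to your own extract.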
Data
In RStudio
• Double-click the data object and RStudio will show you the first 1000 records in a table
Data
In RStudio
• This is handy for a quick visual inspection of the data without calling the head() function.
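For comparison, the console equivalents look roughly like this. The data frame is made up for illustration; only DC and STORE_NUMBER are names taken from the talk.

```r
# A small made-up data frame standing in for the store-SKU dataset.
mydata <- data.frame(
  DC = c(10, 10, 20, 20, 30, 30, 10, 20),
  STORE_NUMBER = 1001:1008,
  APP_COUNT = c(250, 40, 75, 310, 12, 98, 150, 60),   # hypothetical column
  SOLD = c(1, 0, 0, 1, 0, 1, 1, 0)                    # hypothetical column
)

head(mydata)                    # first six rows by default
first3 <- head(mydata, n = 3)   # or ask for a specific number of rows
str(mydata)                     # column names, types, and a preview of values
# View(mydata)                  # opens the RStudio table viewer (interactive sessions only)
```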
Data
In RStudio
• Generic summary() function
• summary() for a specific variable (the five-number summary plus the mean)
• Generating a quick basic plot
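The three bullets above can be sketched as follows, using simulated data and hypothetical column names rather than the dataset from the talk.

```r
set.seed(2015)
# Simulated stand-in data; APP_COUNT and SOLD are hypothetical column names.
mydata <- data.frame(
  APP_COUNT = round(rexp(200, rate = 1 / 100)),
  SOLD = rbinom(200, size = 1, prob = 0.3)
)

summary(mydata)                            # every column at once

app_summary <- summary(mydata$APP_COUNT)   # min, quartiles, median, mean, max
app_five <- fivenum(mydata$APP_COUNT)      # Tukey's five-number summary
print(app_summary)
print(app_five)

# A quick basic plot (drawn to a null device when run as a script).
if (!interactive()) pdf(NULL)
hist(mydata$APP_COUNT, main = "Application counts", xlab = "APP_COUNT")
```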
Data
Reproducible Research in R
Idea - make analytic data and code available so that others may reproduce findings
1. Make data available
2. Make computations/code available
These allow others to validate and come to the same conclusions you reached
• This can be difficult in Excel unless you script everything using VBA and do not point-and-click on anything
• Golden rule of reproducible research: Script everything!!
• Think about your own research process… or working with multiple people (Excel is a nightmare to me
in these situations)
knitr
• Developed by Yihui Xie (http://yihui.name/knitr/)
Pros
• A variety of documentation languages can be used (R Markdown, LaTeX, HTML)
• Uses the R programming language, but others are allowed
• Can export to PDF, HTML, Word
• Can be used to create manuals, short/medium-length technical documents, tutorials, reports
(especially if generated periodically), and data preprocessing documents/summaries
Cons: not a good fit for
• Long research articles
• Complex, time-consuming computations
• Documents that require precise formatting
Data
knitr
• File -> New -> R Markdown
What’s going to happen here
1. You write the R Markdown document (.Rmd)
2. knitr produces a Markdown document (.md)
3. knitr converts the Markdown document into HTML (by default)
4. .Rmd -> .md -> .html
* Note: You only change the .Rmd file
• All we are going to do is paste our code chunks within these things
```{r}
#our code here
```
Data
knitr
• Here we can see the .html document of our analysis so far.
• Setting echo=FALSE hides the R code in the output file but still runs the code and
returns the results. echo=TRUE is the default
• Your code will not be “knitted” if it has an error, which helps ensure reproducibility
• Also, you can customize your knitr file and make it look really nice for documentation and presentation
purposes
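To see the echo option in action outside the RStudio menus, here is a minimal sketch that writes a two-chunk .Rmd file and knits it to Markdown. The file name, title, and chunk labels are made up, and the knitting step only runs if the knitr package is installed.

```r
# Lines of a tiny R Markdown document with one hidden and one visible chunk.
rmd_lines <- c(
  "---",
  "title: \"Wigit data summary\"",
  "output: html_document",
  "---",
  "",
  "```{r load, echo=FALSE}",
  "x <- c(2, 4, 6)   # this code runs, but is hidden in the output",
  "```",
  "",
  "```{r show}",
  "mean(x)           # this code and its result are both shown",
  "```"
)

rmd_path <- file.path(tempdir(), "analysis.Rmd")
writeLines(rmd_lines, rmd_path)

# Knit .Rmd -> .md when knitr is available.
if (requireNamespace("knitr", quietly = TRUE)) {
  md_path <- knitr::knit(rmd_path,
                         output = file.path(tempdir(), "analysis.md"),
                         quiet = TRUE)
  print(md_path)
}
```

Inside RStudio the Knit button performs the same step and continues on to HTML.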
Data
RPubs
• A FREE place to publish your analysis (https://rpubs.com/)
• This might be a great place for you to check your students’ work or projects; they can just send you a
URL
• Also, this is a good venue for students to begin a portfolio of analytics projects to show to employers
Data
Back to our dataset
• You can create your own functions in R. I created one called DataQualityReport.R and will use it to
check our dataset quality
• Once you load your function, it is available to you during your current R session. It will show up in
RStudio in your global environment
• We can see there are some issues so far in our dataset. We need to make sure R knows what the
types of our variables are. DC is really a factor/categorical variable. Its numeric value means nothing.
The same goes for STORE_NUMBER
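The DataQualityReport.R function itself is not reproduced in these slides, so the following is only a hypothetical sketch of what such a function might report: type, missing count, percent complete, and number of distinct values per column. The function name and the sample data are assumptions.

```r
# Hypothetical data quality report: one row of diagnostics per column.
data_quality_report <- function(df) {
  data.frame(
    variable = names(df),
    type = vapply(df, function(col) class(col)[1], character(1)),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    pct_complete = vapply(df, function(col) round(100 * mean(!is.na(col)), 1), numeric(1)),
    n_unique = vapply(df, function(col) length(unique(col[!is.na(col)])), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up data with a few missing values.
mydata <- data.frame(
  DC = c(10, 10, 20, NA),
  STORE_NUMBER = c(1001, 1002, 2001, 2002),
  APP_COUNT = c(250, NA, 75, 40)   # hypothetical column
)

report <- data_quality_report(mydata)
print(report)
```

Note that DC and STORE_NUMBER show up as numeric here, which is exactly the kind of issue the report is meant to surface.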
Data
Defining the correct data type
• We only had to change a handful of these. Other conversion functions you might use include as.integer(), as.ordered(),
as.character(), and as.numeric()
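A minimal sketch of the conversions, on made-up data (APP_COUNT is a hypothetical column that was read in as text):

```r
mydata <- data.frame(
  DC = c(10, 10, 20),
  STORE_NUMBER = c(1001, 1002, 2001),
  APP_COUNT = c("250", "40", "75"),   # hypothetical column stored as text
  stringsAsFactors = FALSE
)

# Identifiers are categories, not quantities.
mydata$DC <- as.factor(mydata$DC)
mydata$STORE_NUMBER <- as.factor(mydata$STORE_NUMBER)

# Text that really holds numbers.
mydata$APP_COUNT <- as.numeric(mydata$APP_COUNT)

str(mydata)   # confirm the new types
```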
Data
Creating a couple custom functions that use R packages
• Some variables have missing data, so let’s impute these values
• Quickly walk them through the basics of the Impute.R code
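The Impute.R code is not shown here and relies on R packages. As a stand-in, the sketch below uses a much simpler base R rule (median for numeric columns, most frequent value otherwise); the function name and data are assumptions.

```r
# Simple stand-in imputation: median for numeric columns, mode for the rest.
impute_simple <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next                          # nothing to fill in this column
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else {
      counts <- table(x)                         # frequencies of observed values
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}

mydata <- data.frame(
  DC = factor(c("10", "10", NA, "20")),
  APP_COUNT = c(250, NA, 75, 40)   # hypothetical column
)

imputed <- impute_simple(mydata)
print(imputed)
```

Model-based imputation from a dedicated package is usually preferable for real work; this only shows the mechanics.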
Data
Task 4 – Document & Report findings
• Higher probability predictions will lead to a greater chance of being incorporated in the wigit
assortment for a store
• Using knitr
1. Create a data definitions table and review this with the domain expert or other stakeholders to
make sure the data is measuring what you expect
2. Impute values, remove oddities, delete or correct observations as needed
3. Summarize the dataset with tables, statistics, and graphical summaries
Task 5 – Refine business problem
• She believes having a store-SKU probability is exactly what would help her with her assortment
planning decision and wants us to proceed
Methodology Selection
Task 1 – Identify possible methods
• Here we are doing predictive modeling, so we want a method from that domain
• Several possible classification algorithms are available
Task 2 – Select software tools
• Here we chose R because it provides all the functionality we need to give the CM the solution she
needs
• We will take advantage of the caret package in R
Task 3 – Test approaches
• We will validate the models using an 80/20 holdout split (80% to train, 20%
to test)
Task 4 – Select approaches
• Let’s pick a couple of basic classification methods such as logistic regression and a classification tree
• We will choose the model that has the best test accuracy and use the probabilities from that model
Model Building
Task 1 – Identify model structures
• Here we are doing predictive modeling, so we want a method from that domain
• Several possible classification algorithms are available
Task 2 – Run and evaluate models
• Here we chose R because it provides all the functionality we need to give the CM the
solution she needs
Task 3 – Integrate the models
• We will validate the models using a 60/40 holdout split (60% to
train, 40% to test)
Task 4 – Document and communicate findings with stakeholders
• Let’s pick a couple of basic classification methods such as logistic regression and a
classification tree
• We will choose the model that has the best test accuracy and use the probabilities from
that model
Model Building
Caret package
• The caret package provides users one unifying framework, using just one function to build predictive
models without having to specify all the possible options.
• Currently 180 different R modeling packages have been integrated with caret
• Check out http://topepo.github.io/caret/index.html for more information
Model Building
Caret package
• Partition data into training and testing sets
• As an example, I next rebalance the training data. There are various ways one
might want to rebalance.
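The code from the slide is not reproduced here, so below is a base R sketch of the two steps on simulated data: a stratified 80/20 split, then down-sampling the majority class in the training set. The caret package offers createDataPartition() and downSample() for the same purposes; base R is used here only so the sketch runs without extra packages. All column names are hypothetical.

```r
set.seed(2015)
n <- 1000
# Simulated, imbalanced data; column names are hypothetical.
mydata <- data.frame(
  APP_COUNT = round(rexp(n, rate = 1 / 100)),
  POPULATION = round(runif(n, 5000, 50000)),
  SOLD = factor(rbinom(n, 1, 0.2), levels = c(0, 1), labels = c("no", "yes"))
)

# Stratified 80/20 split: sample 80% of the row indices within each class.
train_idx <- unlist(lapply(split(seq_len(n), mydata$SOLD), function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}))
train <- mydata[train_idx, ]
test <- mydata[-train_idx, ]

# Rebalance by down-sampling each class to the size of the smallest class.
n_min <- min(table(train$SOLD))
balanced_idx <- unlist(lapply(split(seq_len(nrow(train)), train$SOLD), function(idx) {
  idx[sample.int(length(idx), size = n_min)]
}))
train_bal <- train[balanced_idx, ]

print(table(train$SOLD))
print(table(train_bal$SOLD))
```

Only the training data is rebalanced; the test set keeps the original class mix so accuracy estimates stay honest.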
Model Building
Caret package
• Now let’s fit a logit model using the training data
• Fit a classification tree
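A minimal sketch of both fits on simulated data follows. It calls glm() and rpart() directly, which are the engines caret's train() wraps with method = "glm" and method = "rpart"; the data, the made-up relationship, and the 0.5 cutoff are all assumptions for illustration.

```r
set.seed(2015)
n <- 1000
app_count <- round(rexp(n, rate = 1 / 100))
# Made-up relationship: more applications -> higher chance of a sale.
p_sell <- plogis(-2 + 0.01 * app_count)
mydata <- data.frame(
  APP_COUNT = app_count,
  SOLD = factor(rbinom(n, 1, p_sell), levels = c(0, 1), labels = c("no", "yes"))
)

# 80/20 holdout split.
train_rows <- sample.int(n, size = 0.8 * n)
train <- mydata[train_rows, ]
test <- mydata[-train_rows, ]

# Logistic regression: predicted probability that SOLD == "yes".
logit_fit <- glm(SOLD ~ APP_COUNT, data = train, family = binomial)
logit_prob <- predict(logit_fit, newdata = test, type = "response")
logit_pred <- ifelse(logit_prob > 0.5, "yes", "no")
logit_acc <- mean(logit_pred == test$SOLD)

# Classification tree (rpart is a recommended package that usually ships with R).
tree_acc <- NA
if (requireNamespace("rpart", quietly = TRUE)) {
  tree_fit <- rpart::rpart(SOLD ~ APP_COUNT, data = train, method = "class")
  tree_pred <- predict(tree_fit, newdata = test, type = "class")
  tree_acc <- mean(tree_pred == test$SOLD)
}

print(c(logit = logit_acc, tree = tree_acc))
```

The store-SKU probabilities the CM asked for are the logit_prob values (or the tree's class probabilities), taken from whichever model tests better.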
Model Building
Caret package
Pretty horrible results, but this is just for demonstration purposes. The data isn’t real.
[Figures: Logit results; CART results]
Deployment
Task 1 – Perform business validation of model
• This does not mean that we provide false answers or a simpler model, but we might need charts,
tables, words, etc. that allow stakeholders to process the information we are trying to convey
• May require revisiting the business problem framing and analytics problem framing
• Ensure answer is tied to the question
• Requires knowledge of client & their business
• Provide explanations in their language/context
Task 2 – Deliver report with findings
• Ensure the findings refer back to the problem
• State/display findings so as to be understood – not our language but the stakeholder’s language
• Use appropriate visual techniques to reinforce
Task 3 – Create production requirements
Task 4 – Create production model/system
• We will use the Shiny package here
Deployment
Task 5 – Support deployment tasks
• The job isn’t done here - How has the model helped them or hindered them?
• Stakeholders may request a laundry list of things they want here; negotiate what is necessary
• Provide them a roadmap showing them that in month 1 you’ll get A, month 2 you’ll get B and C, and so
on
• Make allies, not adversaries
• Present strategy
• Ensure understanding of how this relates to original problem statement
• Make sure the answers you continue to provide them are useful to them
• Identify those affected by deployment
• Periodically review with client
Deployment
Shiny package in R
• Shiny allows one to create web applications without having to know the typical web
development technologies such as JavaScript, CSS, etc.
• Shiny uses a reactive programming paradigm: reactive expressions
update their values whenever the reactive values they depend on change
• The default style is based on Twitter’s Bootstrap, but it can be customized
• Check out the examples to get an idea of what your students could do
Deployment
Walk through one basic example
Shiny applications need only 2 components:
1) A user-interface script (ui.R)
• Controls the layout and appearance of your app
• Specifies the types of inputs and outputs as well as their positions
2) A server script (server.R)
• Contains the instructions that your computer needs to build your app
• Holds the data processing functions, the outputs, and any reactive objects
[Screenshots: ui.R; server.R]
Deployment
Basic Shiny Example
Moving the value will dynamically change the histogram
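The slide's ui.R and server.R are not reproduced here, so the following is a sketch in the same spirit, written as a single script for brevity: a slider controls the number of histogram bins. The use of the built-in faithful dataset, the input and output ids, and the bin_breaks() helper are assumptions; the app is only defined if Shiny is installed and only launched in an interactive session.

```r
# Helper: histogram break points for a requested number of bins.
bin_breaks <- function(x, bins) seq(min(x), max(x), length.out = bins + 1)

if (requireNamespace("shiny", quietly = TRUE)) {
  library(shiny)

  # What would go in ui.R: layout, inputs, and outputs.
  ui <- fluidPage(
    titlePanel("Waiting time between eruptions"),
    sidebarLayout(
      sidebarPanel(
        sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30)
      ),
      mainPanel(plotOutput("distPlot"))
    )
  )

  # What would go in server.R: how each output is built from the inputs.
  server <- function(input, output) {
    output$distPlot <- renderPlot({
      x <- faithful$waiting
      # Re-runs automatically whenever input$bins changes.
      hist(x, breaks = bin_breaks(x, input$bins),
           col = "darkgray", border = "white", main = "")
    })
  }

  if (interactive()) shinyApp(ui = ui, server = server)
}
```

Because renderPlot() reads input$bins, Shiny redraws the histogram every time the slider moves, which is the reactive behavior described earlier.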
Deployment
Let’s create our platform
Steps
1. Create a folder for your web app called “webapp”
2. Put the server.R and ui.R files in this folder
3. In RStudio, set R’s working directory to where this folder is located via
setwd("C:/URBA/")
4. Load the Shiny package in R via
library(shiny)
5. Run the web app using the runApp() function
runApp("webapp")
Deployment
Check these out!!
• http://shiny.rstudio.com/gallery/
Life Cycle Mgmt.
Conclusions
Resources
A: R Programming
B: Find packages
Life Cycle Management
Task 1 – Document initial structure
• Very important to client & to future business to document initial structure
• Prepare client instructions
• Ensure client understanding
Task 2 – Track model quality
• Follow-up, review to ensure model is returning same answer
• Review the question as well
• Investigate and make changes if answer is
Task 3 – Recalibrate and maintain the model
• Set a schedule to review and refit model(s) as new data become available, new attributes are found
that have predictive power, etc.
Task 4 – Support training activities
• Deliver documented model
• Ensure understanding of all affected
• Don’t just ask people if they understand – most people will say yes even if they don’t really
• Ask open ended questions and get them to say what their understanding is
• Develop training/informational presentations (something more detailed than this presentation)
Life Cycle Management
Task 5 – Evaluate the business benefits
• Follow-up to ensure that model is consistent
• Investigate any changes in model response
• Ensure that changes are included in documentation & training
• Say specifically what has changed and what they can expect to see
• Over-communicate and expect to repeat yourself (just like talking to students)
Life Cycle Management
Task 1 – Document initial structure
• Here we are doing predictive modeling, so we want a method from that domain
• Several possible classification algorithms are available
Task 2 – Track model quality
• Here we chose R because it provides all the functionality we need to give the CM the
solution she needs
Task 3 – Recalibrate and maintain the model
• We will validate the models using an 80/20 holdout split (80%
to train, 20% to test)
Task 4 – Support training activities
• Let’s pick a couple of basic classification methods such as logistic regression and a
classification tree
Task 5 – Evaluate the business benefits
• Let’s pick a couple of basic classification methods such as logistic regression and a
classification tree
Conclusions
Pros
1. Many resources
• http://www.r-bloggers.com/
• Springer’s Use R! series
2. Great graphics capabilities
• ggplot2 (not Tableau or SAS Visual Analytics, but pretty nice)
3. Descriptive-Predictive-Prescriptive-BDA
Cons
1. Memory
• R works in memory, so if you’re working with large data sets (> 5 GB) you’ll probably
need more than 8 GB of RAM
• I have 24 GB of RAM and rarely have issues
2. Package Quality
• ~85 percent of the packages I use are high quality
• Some authors code inefficiently which may cause the user to run out of memory or
increase run time even on small data sets
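To get a feel for the memory point, a small sketch of checking how much RAM an object occupies is below; the vector is just a placeholder for a real dataset.

```r
# One million doubles as a stand-in for a column of real data.
x <- numeric(1e6)

size_bytes <- as.numeric(object.size(x))   # size in bytes (about 8 per value)
print(format(object.size(x), units = "Mb"))

# Free the memory once the object is no longer needed.
rm(x)
invisible(gc())
```

Running object.size() on a sample of a large file is a quick way to estimate whether the full dataset will fit in memory.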
References & Resources
The books outlined in red are free online.
• Descriptive & Predictive Analytics
• Prescriptive Analytics
• R Programming
• R books on anything from SpringerLink’s Use R! series (free .pdf downloads through university)
References & Resources
R blogs
• http://www.r-bloggers.com/
R books
• So many, Google “Springer UseR!”
A: Some packages
B: R Programming
C: Project Ideas
R Programming
• If you like video instruction, check out Coursera’s (free) R programming course
https://www.coursera.org/course/rprog
Data Structures
Subset Data
• Partial matching
• Remove missing values
Vector Operations
• Partial matching
• Remove missing values
Control Structures
• If
• For
• While
• Repeat
• Next
• Return
Functions
• If
• For
Scoping Rules
• If
• For
Dates and Times
Loop function
• Partial matching
• Remove missing values
Debugging
• Partial matching
• Remove missing values
Profiling
• Partial matching
• Remove missing values
Simulation
• Partial matching
• Remove missing values
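As a taste of several topics in this outline (removing missing values, control structures, functions, and loop functions), here is a short sketch. The data and the labeling rule are made up for illustration.

```r
units_sold <- c(3, NA, 0, 5, 1)

# Remove missing values before working with the data.
clean <- units_sold[!is.na(units_sold)]

# if / else inside a function, with explicit return() calls.
label_sku <- function(units) {
  if (units == 0) {
    return("non-working")
  } else if (units <= 2) {
    return("slow")
  }
  "productive"   # the last evaluated value is returned
}

# for loop, using next to skip an element.
labels <- character(0)
for (u in clean) {
  if (u == 1) next
  labels <- c(labels, label_sku(u))
}

# while loop: accumulate units until the total reaches 5.
total <- 0
i <- 1
while (i <= length(clean) && total < 5) {
  total <- total + clean[i]
  i <- i + 1
}

# repeat loop, ended with break.
k <- 0
repeat {
  k <- k + 1
  if (k >= 3) break
}

# Loop function: apply label_sku to every element.
all_labels <- sapply(clean, label_sku)
print(all_labels)
```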