I-COM Data Science Board: Output
With the rise of Data Science in Marketing and in the I-COM community, we created a Data Science Board in August 2014 to address key industry issues and challenges. The overall goal of the I-COM Data Science Board is to provide a community through which its members can collaborate to advance their industry sector. As part of this work, the Board will publish agreed concepts that support that goal. This page features its publications.
Data Science Definition
Data Science is the theoretical and practical approach to extracting value, in the form of knowledge and insights, from data. Data Science builds on techniques and methods from various fields, including mathematics, statistics, economics, behavioral sciences, computer science, information technology and engineering.
The data scientist, a practitioner of data science, combines scientific experimental techniques (the testing of hypotheses) with various technologies, methodologies and theories to reveal otherwise hidden patterns in data with the aim of arriving at solutions to business problems.
A primary output of data scientists is key business decisions that are based on relevant data and tested hypotheses.
Data scientists should be equally strong at managing data, conducting analyses and communicating the findings to stakeholders across the organisation who have varying levels of expertise in managing and understanding data.
The Data Science Education Matrix is a reference document created by the I-COM Data Science Board to clarify what qualifications a data scientist needs. The Matrix will be used by the I-COM Data Science Hackathon Jury as a point of reference for the qualitative review of the entries.
|DATA SCIENCE DOMAIN||FORMAL EDUCATION||TECHNICAL SKILLS - TOOLS||TECHNICAL SKILLS - METHODS / CONCEPTS||EXPERIENCE|
|MATH / STATS||Statistics, Social Sciences - Economics, Psychology, Natural Sciences - Physics, Mathematics, Quantum Mechanics||R, SAS, Julia, SPSS, Wolfram, Mathematica, MATLAB, C++ (with Blitz++, Boost libraries)||Regression, Modelling, Probability, Neural Networks, Bayesian Inference, Sampling, Cross Validation||Jobs in Analytics, Research, Insights, Data Science|
|PROGRAMMING / HACKING||Computer Science, Engineering||Languages: R, VBA, Julia, Python, PHP, C/C++/C#/etc, Java, Ruby. Several other languages specific to databases (Pig / Hive/||Data visualization, RESTful API interfacing, Multi-core/node programming, Machine learning, Natural language processing||Programmer, Hobbyist|
|PROGRAMMING / HACKING||Computer Science, Engineering||Databases: Relational (SQL: MySQL, PostgreSQL, SQL Server, Netezza, Impala, SQLite, etc.), NoSQL (MongoDB, Cassandra, HBase, MarkLogic, etc.)||Relational and non-relational database architecture, indexing optimization, distributed filesystems, database administration||Database Administrator, Systems admin|
|SUBJECT / BUSINESS KNOWLEDGE||MBA, Business School Courses||Microsoft Office Suite, Tableau, SharePoint, etc.||SWOT Analysis, Business Plans||Marketing, Controlling, Accounting, Product Mgmt, IT, C-Level|
The purpose of the Data Science Skills Matrix is to allow various stakeholders to agree on an effective model to facilitate data science decision making, specifically in relation to hiring, skill development and team performance.
| || ||beginner / low||advanced / ok||mastery / excellent|
|BUSINESS||work experience||1-3 years||3-10 years||10+ years|
|BUSINESS||time management||compromises both scope and time||sometimes compromises scope||delivers on scope and time|
|BUSINESS||communication||only able to communicate with peers||able to communicate with management||able to communicate with clients|
|BUSINESS||commitment to excellence||follows instructions||sometimes surprises with results||always exceeds expectations|
|BUSINESS||ability to drive business|| ||able to optimise existing business||able to create new business opps|
|BUSINESS||responsibilities||execute established offerings||conduct customized offerings||create new offerings|
|BUSINESS||result delivery||able to contribute content to presentation||able to present findings to client||able to create and deliver presentation|
|RESEARCH||view||may confuse correlation with causality||understanding of gross causality||profound understanding of causality|
|RESEARCH||scientific method||able to implement method||able to clearly understand caveats||able to extend current method|
|RESEARCH||general skills||avoids common sampling violations||able to communicate caveats clearly||able to bring in unique theory|
|RESEARCH||data modelling||probability, stats and CS 101||seeks out the optimal model||comes up with proprietary models|
|RESEARCH||statistical packages||comfortable with at least one tool||can work across tools||creates proprietary tools|
|COMPUTATION||programming skills||stat package level scripting||proficiency in high level programming||proficiency in low level programming|
|COMPUTATION||system monitoring||overall system level||program > system components||code > cpu threads|
|COMPUTATION||I/O style||disc spinning common||everything runs from memory||cache computing|
|COMPUTATION||computational efficiency||poor awareness||performance aware||(energy) efficiency aware|
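As a minimal sketch of how the matrix could support such decisions, a few of its rows can be encoded as a lookup table and used to describe an assessment and list skill gaps. The names `LEVELS`, `SKILLS_MATRIX`, `describe`, `gaps` and the example candidate are hypothetical and not part of the Board's publication; only the cell text is taken from the matrix, and only three rows are shown.

```python
# Hypothetical encoding of three rows of the Skills Matrix (illustrative only).
LEVELS = ("beginner / low", "advanced / ok", "mastery / excellent")

SKILLS_MATRIX = {
    ("BUSINESS", "communication"): (
        "only able to communicate with peers",
        "able to communicate with management",
        "able to communicate with clients",
    ),
    ("RESEARCH", "general skills"): (
        "avoids common sampling violations",
        "able to communicate caveats clearly",
        "able to bring in unique theory",
    ),
    ("COMPUTATION", "computational efficiency"): (
        "poor awareness",
        "performance aware",
        "(energy) efficiency aware",
    ),
}


def describe(assessment):
    """Map {(area, skill): level index 0-2} to readable matrix descriptions."""
    lines = []
    for (area, skill), level in sorted(assessment.items()):
        text = SKILLS_MATRIX[(area, skill)][level]
        lines.append(f"{area} / {skill}: {LEVELS[level]} ({text})")
    return lines


def gaps(assessment, target_level=2):
    """Return the skills whose assessed level is below the target level."""
    return sorted(key for key, level in assessment.items() if level < target_level)


# Example assessment for an imaginary candidate (assumed values).
candidate = {
    ("BUSINESS", "communication"): 1,
    ("RESEARCH", "general skills"): 2,
    ("COMPUTATION", "computational efficiency"): 0,
}

for line in describe(candidate):
    print(line)
print(gaps(candidate))
```

Keying each row by area and skill keeps the encoding close to the layout of the matrix, so stakeholders can compare an assessment against the same wording they agreed on.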
With the surge of interest in data over the past two years (see illustration below), it is not surprising that more people are researching potential career paths into the field. Unfortunately, identifying a clear path can be more challenging than we might assume. This article attempts to simplify the process.
Sources of the Challenge
Career paths for data scientists vary more than those in most fields for three reasons:
- Data science is a very young field, with the term itself (“data science”) being less than two decades old.
- Advanced analytics and computer programming form two of the three pillars of data science (domain expertise being the third), and it is rare to find a career path that blends both seamlessly.
- Human Resources professionals, the traditional “gatekeepers” to positions on the career ladder, as well as many hiring managers, find it challenging to define clear criteria for selection and job postings. The result is a somewhat jumbled process for anyone entering a new organization in a data science role.
Basic research tools do not do a great deal to simplify the issue, either. Very little has been written in traditional career path venues, such as human resources sites, making it more difficult to find adequate research.
Rather than create yet another philosophical discussion on the career path for data scientists, this article examines actual job postings and profiles to substantiate what appears in the “real world.” It reviews approximately 30,000 job postings that appeared on LinkedIn during the second week of June 2015. While this is certainly not a complete research project, it takes steps in that direction.
Here are some of our observations and conclusions:
- Outside of the traditional sources for Data Science career paths, it is important to remember:
  - Data Scientists are not always given the title “Data Scientist.” Search more widely, including titles such as Advanced Data Analytics and Predictive Modeler.
  - Incubator and boot camp programs often serve as untapped resources for candidates, positions, and inside knowledge of Data Science.
- The data science career path involves three primary skill sets:
  - Advanced analytics / machine learning / data mining
  - Coding in a number of languages and on multiple platforms
  - Domain expertise, meaning depth and breadth of expertise in the employer’s field of endeavor
- Popular requirements
  - Job postings regularly require a Master's degree and frequently request a doctorate
  - Software expertise
    - For Advanced Analytics: R and Python lead the pack, and within Python, scikit-learn is frequently specified
    - Big data is almost universally referenced, and therefore Hadoop is an almost universal request, along with MapReduce, Pig, and Hive
    - Perhaps most importantly, the sheer number of platforms, languages and applications seems to matter
    - SAS, once the ubiquitous requirement, has all but vanished
- Steps in the career path ladder
  - Data science positions have a skewed distribution in terms of number of positions and level, with mid-senior positions accounting for almost two-thirds of all postings.
  - Prospects for entry-level openings are fairly bleak. The lowest rungs on the ladder account for some of the smallest shares of postings:
    - Internships (1%)
    - Associate (4%)
    - Other entry-level (18%)
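To put these shares in perspective, the arithmetic can be sketched as follows. The total of 30,000 is the approximate sample size quoted earlier in the article, so the resulting counts are rough estimates rather than reported results; the variable names are illustrative.

```python
# Approximate sample size quoted in the article (an estimate, not an exact count).
TOTAL_POSTINGS = 30_000

# Shares of postings for the lowest rungs, as whole percentages from the list above.
entry_level_pct = {
    "Internships": 1,
    "Associate": 4,
    "Other entry-level": 18,
}

# Convert each percentage into an approximate number of postings.
approx_counts = {
    level: TOTAL_POSTINGS * pct // 100 for level, pct in entry_level_pct.items()
}

# Combined share of the three lowest rungs.
combined_pct = sum(entry_level_pct.values())

for level, count in approx_counts.items():
    print(f"{level}: about {count:,} postings")
print(f"All three combined: {combined_pct}% of postings")
```

Taken together, the three categories make up 23% of postings, compared with almost two-thirds for mid-senior positions.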
This leads to two final questions:
- How can the organisation manage and cultivate data scientist talent?
Organisations need to assist data scientists in:
- Becoming a contributing member of their departments and the larger organisation. This can be achieved by bringing them into strategy discussions, encouraging interactions with others, and having them “shadow” other team members for a day.
- Continually updating their skills through training. Many applications considered commonplace today did not exist 10 years ago, and the rate at which new tools and technologies appear is accelerating.
- Testing new approaches to problem-solving; otherwise their skill sets will atrophy.
- Where do data science and data scientists sit in the organisation?
While data scientists may belong to their own departments, most successful companies are taking Facebook’s approach, where these experts physically sit in the departments they serve. This improves communication and lets them become part of the team they serve day to day, rather than being isolated.
From an article authored and published by I-COM Data Science Board Member T. Scott Clendaniel - VP Analytics, Morgan Stanley.
For any comments, ideas and discussion on the article's contents, you can contact the author directly via LinkedIn