Data Science

About the Data Science Course

Course Code: Data1001

Duration: 3 days

Course Outline

Module 1 – Introduction to Data Science

  • Origins of Data Science and a brief history of the Big Data revolution.
  • The Big Data landscape.
  • How much data is there really, and does it matter?
  • Un-siloing data: use paradigms for organisational data and public data.
  • Descriptive, predictive and prescriptive analysis.
  • From recommendations to insights: black-box and white-box analytics.

Module 2 – Data as an Asset

  • The V’s of Big Data: Volume, Velocity, Variability, Veracity.
  • Data business strategies.
  • Data sources, synergies and differentiation.

Module 3 – Data Life Cycle

  • The analytic value chain.
  • Overview of the data analysis cycle: connecting data science to business problems.
  • Work cycle of a data scientist: wrangling, modelling and validation.
  • Managing research.

Module 4 – Privacy and Ethics in Big Data

  • The societal impacts of Big Data.
  • Privacy in Australia and global perspectives.
  • Big Data ethics: history and current thinking.
  • Opportunities and risks for organisations and individuals.

Module 5 – Data Engineering for Analysis

  • Data Science engineering and its drivers for change.
  • Data volumes, data structures, and how they vary.
  • Data Science architectures: the common stages.
  • The usual suspects: distributed file systems, MapReduce, Spark.

Module 6 – Visualisation

  • Practical and effective visualisation: beyond bar charts.
  • Finding the unexpected: the role of visualisation in exploratory analysis.
  • Communicating findings: the role of visualisation in communicating Data Science outputs.
  • Standard tools: R, Tableau, D3.

Module 7 – Data Wrangling and Exploratory Analysis

  • Determining data quality. Data cleansing.
  • Entity matching.
  • Imputation.
  • Background modelling.
  • Exploratory analysis.

Module 8 – Fundamentals of Statistics

  • Types of data: numerical, categorical, ordinal.
  • Statistical summaries: mean, standard deviation, quantiles, correlation.
  • Simple data visualisation: histograms, boxplots, time plots and scatterplots.
  • Cross-tabulations.
  • Causality vs. association, independence.
  • Randomisation and random sampling.
  • Statistical inference using bootstrapping.
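
As an illustration of the bootstrapping topic listed above, here is a minimal Python sketch of a percentile bootstrap confidence interval for a mean. It is not taken from the course material: the function name bootstrap_mean_ci, its parameters and the sample values are all hypothetical, chosen only to show the idea of resampling with replacement.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read the interval end points off the sorted resample means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Made-up measurements, used only to exercise the function.
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.9, 12.5, 9.5, 11.0]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the simplest to read.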

Module 9 – Model Creation and Validation

  • Prediction: linear regression, nonparametric regression, k-NN.
  • Forecasting: auto.arima and Error-Trend-Seasonal exponential smoothing algorithms.
  • Hold-out sets, cross-validation, AIC.
  • Classification: logistic regression, classification trees, SVM.
  • Clustering: k-means, hierarchical clustering.
  • Supervised vs. unsupervised vs. semi-supervised learning.
  • Dimension reduction: principal components.
  • Languages and environments (e.g. R, Python, MATLAB or even Excel) and standards (PMML).
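
To make the validation topics above more concrete, the following Python sketch estimates the out-of-sample error of a simple linear regression using k-fold cross-validation. It is an illustrative example, not course code: the helper names fit_line and k_fold_mse, the choice of five folds and the synthetic data are all assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    fold_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold.
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


# Synthetic data: y = 2x + 1 plus Gaussian noise with unit variance.
rng = random.Random(1)
xs = [float(x) for x in range(30)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
cv_mse = k_fold_mse(xs, ys)
print(f"5-fold cross-validated MSE: {cv_mse:.3f}")
```

A single hold-out set corresponds to scoring only one of these folds; averaging over all k folds uses every observation for testing exactly once.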

Module 10 – Operationalisation and the Model Life Cycle

  • Determining the needs: how much data must decisions be based on, how often and how quickly must they be made, and how often must models be refreshed?
  • Plugging into existing data paths and choosing appropriate technologies.
  • Stale models and model refreshing.
  • Operationalisation from a business perspective: determining value and making Data Science outputs part of standard business and decision-making processes.

Module 11 – Panel: Building a Data-Driven Enterprise

  • Data Science as a process, rather than as a point event.
  • The role of high-level management in enabling data-driven decisions.
  • The role of direct management: on the un-Gantt-ability of research.

Module 12 – Case Study

  • Operational efficiency through predictive analytics.
  • Architectural choices for integration and efficacy.

