# Data science reference guide

## 1 Data science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. https://en.wikipedia.org/wiki/Data_science

### 1.1 Machine learning

#### 1.1.4 Causal inference

Machine learning concepts

#### 1.1.6 Clustering, feature selection

1. Curse of dimensionality
2. Bias-variance tradeoff, neural networks, SVM, etc.
3. Statistical languages such as R or Python
4. Scripting languages such as Python, sh, PHP, Perl

### 1.2 Techniques - Algorithms

#### 1.2.1 Linear regression

Linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables.
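As a minimal sketch of the idea, the snippet below fits a line to toy data with NumPy's least-squares solver. The data values and the variable names (`x`, `y`, `A`, `slope`, `intercept`) are assumptions made for illustration only.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (illustrative values, not a real dataset).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: one column for the explanatory variable,
# one column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find coef minimising ||A @ coef - y||^2.
coef, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coef

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Because the toy data lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1 up to floating-point error. With more explanatory variables, each one becomes an extra column of `A`.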

#### 1.2.3 Linear SVM and Kernel SVM

Linear Support Vector Machine (SVM) and Kernel SVM
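The two variants differ in how similarity between samples is measured: a linear SVM uses the plain dot product, while a kernel SVM replaces it with a kernel function. The sketch below shows only these two similarity functions, not a trained classifier. The RBF kernel is one common choice, and `gamma=0.5` and the sample points are arbitrary values chosen for illustration.

```python
import math

def linear_kernel(x, z):
    """Dot product between two samples (what a linear SVM uses)."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: similarity decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x = (1.0, 2.0)
z = (3.0, 0.5)

print(linear_kernel(x, z))  # 1*3 + 2*0.5 = 4.0
print(rbf_kernel(x, x))     # identical points give 1.0
print(rbf_kernel(x, z))     # a value between 0 and 1
```

In practice a library such as scikit-learn or LIBSVM handles both the kernel and the training.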

#### 1.2.9 PCA, SVD and LDA

1. PCA - Unsupervised method for understanding the global properties of a dataset

Use SciPy, scikit-learn

2. Least squares and polynomial fitting for datasets with low dimensions

Use NumPy, SciPy

3. Constrained linear regression, which keeps the weights from misbehaving

Use scikit-learn

4. k-means, unsupervised clustering algorithm, expectation maximization algorithm

Use scikit-learn

5. Logistic regression, nonlinearity (sigmoid function), classification

Use scikit-learn

6. SVM, support vector machines, linear models->Loss function

Use scikit-learn
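As an illustration of item 1 above, PCA can be computed from the singular value decomposition (SVD) of the centred data matrix. The sketch below is a minimal NumPy version, not the scikit-learn implementation. The synthetic data, the random seed and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data stretched along the direction (1, 1); illustrative only.
t = rng.normal(size=200)
noise = 0.1 * rng.normal(size=200)
X = np.column_stack([t + noise, t - noise])

# Step 1: centre each feature.
Xc = X - X.mean(axis=0)

# Step 2: SVD of the centred data; the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Step 3: variance explained by each component.
explained_variance = S**2 / (len(X) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Step 4: project onto the first principal component (reduce to 1-D).
Z = Xc @ Vt[:1].T

print("first direction:", Vt[0])
print("explained variance ratio:", explained_ratio)
```

Most of the variance should land on the first component, whose direction is close to (1, 1)/√2 up to sign.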

#### 1.2.10 Feedforward neural networks

Multilayered logistic regression classifiers: many layers separated by non-linearities

Use scikit-learn (neural networks), Keras
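The "stacked logistic regressions" view can be sketched as a forward pass in plain NumPy: each layer is an affine map followed by a sigmoid. This is only an illustration of the structure. The layer sizes, the random untrained weights and the helper names `sigmoid` and `forward` are assumptions, and no training step is shown.

```python
import numpy as np

def sigmoid(z):
    """Logistic non-linearity, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Apply each (weights, bias) pair followed by a sigmoid."""
    a = x
    for W, b in layers:
        a = sigmoid(a @ W + b)
    return a

rng = np.random.default_rng(0)

# Two layers: 3 inputs -> 4 hidden units -> 1 output (untrained, random weights).
layers = [
    (rng.normal(size=(3, 4)), np.zeros(4)),
    (rng.normal(size=(4, 1)), np.zeros(1)),
]

x = rng.normal(size=(5, 3))  # batch of 5 samples with 3 features each
probs = forward(x, layers)

print(probs.shape)
```

With a single layer, `forward` reduces to the prediction step of ordinary logistic regression. Libraries such as scikit-learn and Keras add the training loop on top of this structure.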

## 2 Tools

### 2.1 NumPy

Scientific tools for Python. https://numpy.org/

### 2.2 SciPy

Open-source software for mathematics, science, and engineering. https://www.scipy.org

### 2.3 Scikit-learn

A set of Python modules for machine learning and data mining. https://sklearn.org/

#### 2.3.1 ScikitLearn.jl

Implements the popular scikit-learn interface and algorithms in Julia. It supports models from both the scikit-learn library (via PyCall) and the Julia ecosystem. https://github.com/cstjean/ScikitLearn.jl

### 2.4 Keras

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. https://keras.io/

### 2.5 TensorFlow

Library for computation using data flow graphs for scalable machine learning https://www.tensorflow.org

### 2.6 JuliaML

One-stop-shop for learning models from data. It provides general abstractions and algorithms for modeling and optimization, implementations of common models, tools for working with datasets, and much more https://juliaml.github.io/

### 2.7 LIBSVM

A Library for Support Vector Machines, https://www.csie.ntu.edu.tw/~cjlin/libsvm/

Interfaces and extensions to LIBSVM:

### 2.8 MLJ

A Machine Learning Framework for Julia https://github.com/alan-turing-institute/MLJ.jl

1. Mining