Introduction to Data Science


Data science is the practice of solving problems with data. The problem could be a decision-making task, such as identifying which emails are spam and which are not. Data science blends many tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data.

Data science is primarily used to make decisions and predictions by means of predictive causal analytics, prescriptive analytics (predictive analytics plus decision science), and machine learning.

  • Predictive causal analytics: If we want a model that can predict the likelihood of a particular event in the future, we need to apply predictive causal analytics.
  • Prescriptive analytics: If we want a model that can make its own decisions and modify them as parameters change, we need prescriptive analytics. This relatively new field is all about providing advice.
  • Machine learning for making predictions: If we have transactional data from a finance company and need to build a model to determine the future trend, machine learning algorithms are the best bet.
  • Machine learning for pattern discovery: If we don’t have the parameters on which to base predictions, we need to find the hidden patterns within the dataset before we can make meaningful predictions.
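As a rough illustration of the "future trend" case, here is a minimal sketch that fits a straight line to a short series by ordinary least squares, one of the simplest predictive models. The monthly totals and the names fit_line, months and totals are made up for this example and do not come from any real finance dataset.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly transaction totals (illustrative values only).
months = [1, 2, 3, 4, 5, 6]
totals = [102.0, 108.0, 121.0, 131.0, 138.0, 152.0]

slope, intercept = fit_line(months, totals)

# Extrapolate the fitted trend one month ahead.
next_month_estimate = slope * 7 + intercept
print(f"trend: {slope:.2f} per month, month 7 estimate: {next_month_estimate:.2f}")
```

A real project would normally use a library model and check the fit on held-out data. The point here is only that the model learns the trend from past values.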

The most common technique used for pattern discovery is clustering.
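To make clustering concrete, here is a minimal sketch of k-means, one widely used clustering algorithm, written in plain Python. The function name k_means, the sample points and the choice of the first k points as starting centroids are assumptions made for this example. Production code would normally rely on a library implementation with better initialization.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Group points into k clusters with basic k-means (Lloyd's algorithm)."""
    # Assumption: start from the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D observations forming two loose groups.
points = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.2, 0.8), (4.8, 5.0)]
labels, centroids = k_means(points, k=2)
print(labels)
```

No labels are supplied to the algorithm. The grouping comes only from the distances between points, which is what makes this pattern discovery rather than prediction.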

Lifecycle of Data Science

Discovery: Before we begin the project, it is important to understand the various specifications, requirements, priorities and required budget. 

Data preparation: In this phase, we require an analytical sandbox in which we can perform analytics for the entire duration of the project. 

Model planning: In this step, we determine the methods and techniques for drawing out the relationships between variables. These relationships set the base for the algorithms that we will implement in the next phase. We apply Exploratory Data Analysis (EDA) using statistical formulas and visualization tools.
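One small EDA step of this kind is checking how strongly two variables move together. The sketch below computes summary statistics and the Pearson correlation coefficient in plain Python. The variables hours and scores and their values are invented for illustration.

```python
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance_sum / (spread_x * spread_y)


# Hypothetical variables: hours of study and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

# Basic summary statistics for one variable.
print(f"mean score: {mean(scores):.1f}, sample std dev: {stdev(scores):.2f}")

# A value close to 1 suggests a strong positive linear relationship.
r = pearson(hours, scores)
print(f"correlation between hours and scores: {r:.3f}")
```

A high correlation like this is a hint about which variables to feed into a model, not proof of cause and effect.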

Operationalize: In this phase, we deliver final reports, briefings, code and technical documents. In addition, a pilot project is sometimes implemented in a real-time production environment.

Communicate results: In this last phase, we evaluate whether we have achieved the goal planned in the first phase. We identify all the key findings, communicate them to the stakeholders and determine whether the project is a success or a failure based on the criteria developed in the Discovery phase.

According to Udacity, the essential skills of a data scientist are:

  • Programming
  • Machine learning
  • Statistics
  • Data wrangling: data processing, formatting and transformation
  • Data visualization and communication
  • Data intuition 

Bioinformatics uses statistics, machine learning and other informatics techniques, while data science is used in fields like bioinformatics and biostatistics, and in other areas as well. Bioinformatics has become an essential interdisciplinary science for the life sciences and biomedical sciences.

At present, a large number of bioinformatics tools and software packages based on machine learning and data science are available. Computational biology, the twin of bioinformatics, has moved largely into the development of software and applications that use machine learning and deep learning techniques for biological image data analysis.
