Monday, November 6, 2017

What is data engineering?

Data engineering is the foundation for all of the recent, current, and future data hypes: machine learning, deep learning, big data, data science, and so on. The success and adoption of these trends depend on data being properly structured and available. However, when customers (internal and external) are unclear about their expectations, or about the big picture of what they want to use the data for, data engineers often take the blame.

Machine learning is a method used to devise complex models and algorithms that lend themselves to prediction. These analytical models allow researchers, data scientists, engineers, and analysts to produce reliable, repeatable decisions and results.

Communication is key! A couple of points. First, "big data" has always been there; just ask statisticians. As for the data science hype, now everyone calls themselves a data scientist, as if knowing SQL somehow makes a person a data scientist. Data science programs graduate new people every day, and many of them lack basic analytical skills and believe data science and data analysis are all about having basic end-user knowledge of some fancy new software. It is a hype, for a fact.
We can have the best people using the data for analytics or modeling, but if we don't have people who know how to build the systems that make our data available in a consistent, reliable manner, then we will just be part of the hype. Having multiple data science teams also leads to friction between them. One team, mainly concerned with product delivery, has an open-door policy across the enterprise; the other, concerned with who knows what, treats everyone else with a sense of superiority and seems to think its work is somehow top secret.

The reason it's crashing is most likely that the software you are using is not server software or a service. For example, Excel processes everything in memory (as Microsoft desktop products tend to do), whereas a database would page to disk once its assigned memory fills up (a shortcut explanation).
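As a rough illustration of the difference between holding everything in memory and streaming through data, here is a minimal sketch (not how Excel or any particular database works internally). It reads a CSV one row at a time with the standard library's `csv.DictReader` and keeps only a running total, so memory use stays roughly flat regardless of file size. The `streaming_total` helper, the `sales` column, and the sample data are hypothetical.

```python
import csv
import io
from typing import Iterable

def streaming_total(lines: Iterable[str], column: str) -> float:
    """Sum one numeric column while holding only the current row in memory."""
    total = 0.0
    # DictReader yields one row at a time instead of loading the whole file
    for row in csv.DictReader(lines):
        total += float(row[column])
    return total

# Hypothetical sample data; for a real file you would pass open("data.csv", newline="")
sample = io.StringIO("region,sales\nnorth,10.5\nsouth,4.5\neast,5\n")
print(streaming_total(sample, "sales"))
```

Loading the same file into a list first (for example `list(csv.DictReader(f))`) would make memory grow with the number of rows, which is closer to the in-memory behavior described above.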

My understanding of data scientists is that they try to hard-code or program different scenarios to get an answer. If you know something about philosophy, you can start there: you ask what if condition 1 = 1, and then they add different conditions with different variables. It's not actually that complicated, from my understanding.

The data science hype is now focused on AI. After the dust settles, people will realize that not all big data and data science looks like what's going on at Google or Facebook. Corporations need a different kind of big data and data science: they need to make sense of their own data and solve their own problems using whatever techniques fit. I believe optimization and statistical models are more important than AI for most corporations. They can buy the hard stuff, like speech and text analytics, from Google, Amazon, or anyone else of that scale and capacity. Yet they still have to retain a team to solve their specific problems using science.

The industry does not want to reveal the pressing questions it wants answered, obviously due to competition and conflicts of interest. There are some good concepts behind data science, and modeling (both mathematical and statistical) lies at the heart of obtaining directed insights. But yes, academia-industry partnership is very much lacking, and that partnership is what could decide whether data science becomes a hope or a hype.
