The story of how data scientists became sexy is mostly the story of the coupling of the mature discipline of statistics with a very young one, computer science. The term “Data Science” has emerged only recently to specifically designate a new profession that is expected to make sense of the vast stores of big data. But making sense of data has a long history and has been discussed by scientists, statisticians, librarians, computer scientists, and others for years. The following timeline traces the evolution of the term “Data Science” and its use, attempts to define it, and related terms.
1962 John W. Tukey writes in “The Future of Data Analysis”: “For a long time I thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and doubt… I have come to feel that my central interest is in data analysis… Data analysis, and the parts of statistics which adhere to it, must… take on the characteristics of science rather than those of mathematics… data analysis is intrinsically an empirical science… How vital and how important… is the rise of the stored-program electronic computer? In many instances the answer may surprise many by being ‘important but not vital,’ although in others there is no doubt but what the computer has been ‘vital.’” In 1947, Tukey coined the term “bit,” which Claude Shannon used in his 1948 paper “A Mathematical Theory of Communication.” In 1977, Tukey published Exploratory Data Analysis, arguing that more emphasis needed to be placed on using data to suggest hypotheses to test and that Exploratory Data Analysis and Confirmatory Data Analysis “can – and should – proceed side by side.”
1974 Peter Naur publishes Concise Survey of Computer Methods in Sweden and the United States. The book is a survey of contemporary data processing methods that are used in a wide range of applications. It is organized around the concept of data as defined in the IFIP Guide to Concepts and Terms in Data Processing: “[Data is] a representation of facts or ideas in a formalized manner capable of being communicated or manipulated by some process.” The Preface to the book tells the reader that a course plan was presented at the IFIP Congress in 1968, titled “Datalogy, the science of data and of data processes and its place in education,” and that in the text of the book, “the term ‘data science’ has been used freely.” Naur offers the following definition of data science: “The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.”
1977 The International Association for Statistical Computing (IASC) is established as a Section of the ISI. “It is the mission of the IASC to link traditional statistical methodology, modern computer technology, and the knowledge of domain experts in order to convert data into information and knowledge.”
1989 Gregory Piatetsky-Shapiro organizes and chairs the first Knowledge Discovery in Databases (KDD) workshop. In 1995, it became the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
September 1994 BusinessWeek publishes a cover story on “Database Marketing”: “Companies are collecting mountains of information about you, crunching it to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to get you to do so… An earlier flush of enthusiasm prompted by the spread of checkout scanners in the 1980s ended in widespread disappointment: Many companies were too overwhelmed by the sheer quantity of data to do anything useful with the information… Still, many companies believe they have no choice but to brave the database-marketing frontier.”
1996 Members of the International Federation of Classification Societies (IFCS) meet in Kobe, Japan, for their biennial conference. For the first time, the term “data science” is included in the title of the conference (“Data science, classification, and related methods”). The IFCS was founded in 1985 by six country- and language-specific classification societies, one of which, The Classification Society, was founded in 1964. The classification societies have variously used the terms data analysis, data mining, and data science in their publications.
1996 Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth publish “From Data Mining to Knowledge Discovery in Databases.” They write: “Historically, the notion of finding useful patterns in data has been given a variety of names, including data mining, knowledge extraction, information discovery, information harvesting, data archeology, and data pattern processing… In our view, KDD [Knowledge Discovery in Databases] refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data… the additional steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, are essential to ensure that useful knowledge is derived from the data. Blind application of data-mining methods (rightly criticized as data dredging in the statistical literature) can be a dangerous activity, easily leading to the discovery of meaningless and invalid patterns.”
1997 In his inaugural lecture for the H. C. Carver Chair in Statistics at the University of Michigan, Professor C. F. Jeff Wu (currently at the Georgia Institute of Technology), calls for statistics to be renamed data science and statisticians to be renamed data scientists.
1997 The journal Data Mining and Knowledge Discovery is launched; the reversal of the order of the two terms in its title reflecting the ascendance of “data mining” as the more popular way to designate “extracting information from large databases.”
December 1999 Jacob Zahavi is quoted in “Mining Data for Nuggets of Knowledge” in Knowledge@Wharton: “Conventional statistical methods work well with small data sets. Today’s databases, however, can involve millions of rows and scores of columns of data… Scalability is a huge issue in data mining. Another technical challenge is developing models that can do a better job analyzing data, detecting non-linear relationships and interaction between elements… Special data mining tools may have to be developed to address web-site decisions.”
2001 William S. Cleveland publishes “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” It is a plan “to enlarge the major areas of technical work of the field of statistics. Because the plan is ambitious and implies substantial change, the altered field will be called ‘data science.’” Cleveland puts the proposed new discipline in the context of computer science and the contemporary work in data mining: “…the benefit to the data analyst has been limited, because the knowledge among computer scientists about how to think of and approach the analysis of data is limited, just as the knowledge of computing environments by statisticians is limited. A merger of knowledge bases would produce a powerful force for innovation. This suggests that statisticians should look to computing for knowledge today just as data science looked to mathematics in the past. … departments of data science should contain faculty members who devote their careers to advances in computing with data and who form partnership with computer scientists.”
2001 Leo Breiman publishes “Statistical Modeling: The Two Cultures”: “There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.”
April 2002 Launch of Data Science Journal, publishing papers on “the management of data and databases in Science and Technology. The scope of the Journal includes descriptions of data systems, their publication on the internet, applications and legal issues.” The journal is published by the Committee on Data for Science and Technology (CODATA) of the International Council for Science (ICSU).
January 2003 Launch of Journal of Data Science: “By ‘Data Science’ we mean almost everything that has something to do with data: Collecting, analyzing, modeling… yet the most important part is its applications – all sorts of applications. This journal is devoted to applications of statistical methods at large…. The Journal of Data Science will provide a platform for all data workers to present their views and exchange ideas.”
May 2005 Thomas H. Davenport, Don Cohen, and Al Jacobson publish “Competing on Analytics,” a Babson College Working Knowledge Research Center report, describing “the emergence of a new form of competition based on the extensive use of analytics, data, and fact-based decision making… Instead of competing on traditional factors, companies are beginning to employ statistical and quantitative analysis and predictive modeling as primary elements of competition.” The research is later published by Davenport in the Harvard Business Review (January 2006) and is expanded (with Jeanne G. Harris) into the book Competing on Analytics: The New Science of Winning (March 2007).
September 2005 The National Science Board publishes “Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century.” One of the recommendations of the report reads: “The NSF, working in partnership with collection managers and the community at large, should act to develop and mature the career path for data scientists and to ensure that the research enterprise includes a sufficient number of high-quality data scientists.” The report defines data scientists as “the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection.”
2007 The Research Center for Dataology and Data Science is established at Fudan University, Shanghai, China. In 2009, two of the center’s researchers, Yangyong Zhu and Yun Xiong, publish “Introduction to Dataology and Data Science,” in which they state “Different from natural science and social science, Dataology and Data Science takes data in cyberspace as its research object. It is a new science.” The center holds annual symposiums on Dataology and Data Science.
July 2008 The JISC publishes the final report of a study it commissioned to “examine and make recommendations on the role and career development of data scientists and the associated supply of specialist data curation skills to the research community.” The study’s final report, “The Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practice & Future Needs,” defines data scientists as “people who work where the research is carried out–or, in the case of data centre personnel, in close collaboration with the creators of the data–and may be involved in creative enquiry and analysis, enabling others to work with digital data, and developments in data base technology.”
January 2009 Harnessing the Power of Digital Data for Science and Society is published. This report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council states that “The nation needs to identify and promote the emergence of new disciplines and specialists expert in addressing the complex and dynamic challenges of digital preservation, sustained access, reuse and repurposing of data. Many disciplines are seeing the emergence of a new type of data science and management expert, accomplished in the computer, information, and data sciences arenas and in another domain science. These individuals are key to the current and future success of the scientific enterprise. However, these individuals often receive little recognition for their contributions and have limited career paths.”
January 2009 Hal Varian, Google’s Chief Economist, tells the McKinsey Quarterly: “I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades… Because now we really do have essentially free and ubiquitous data. So the complimentary scarce factor is the ability to understand that data and extract value from it… I do think those skills – of being able to access, understand, and communicate the insights you get from data analysis – are going to be extremely important. Managers need to be able to access and understand the data themselves.”
March 2009 Kirk D. Borne and other astrophysicists submit to the Astro2010 Decadal Survey a paper titled “The Revolution in Astronomy Education: Data Science for the Masses”: “Training the next generation in the fine art of deriving intelligent understanding from data is needed for the success of sciences, communities, projects, agencies, businesses, and economies. This is true for both specialists (scientists) and non-specialists (everyone else: the public, educators and students, workforce). Specialists must learn and apply new data science research techniques in order to advance our understanding of the Universe. Non-specialists require information literacy skills as productive members of the 21st century workforce, integrating foundational skills for lifelong learning in a world increasingly dominated by data.”
May 2009 Mike Driscoll writes in “The Three Sexy Skills of Data Geeks”: “…with the Age of Data upon us, those who can model, munge, and visually communicate data—call us statisticians or data geeks—are a hot commodity.” [Driscoll will follow up with The Seven Secrets of Successful Data Scientists in August 2010]
June 2009 Nathan Yau writes in “Rise of the Data Scientist”: “As we’ve all read by now, Google’s chief economist Hal Varian commented in January that the next sexy job in the next 10 years would be statisticians. Obviously, I whole-heartedly agree. Heck, I’d go a step further and say they’re sexy now – mentally and physically. However, if you went on to read the rest of Varian’s interview, you’d know that by statisticians, he actually meant it as a general title for someone who is able to extract information from large datasets and then present something of use to non-data experts… [Ben] Fry… argues for an entirely new field that combines the skills and talents from often disjoint areas of expertise… [computer science; mathematics, statistics, and data mining; graphic design; infovis and human-computer interaction]. And after two years of highlighting visualization on FlowingData, it seems collaborations between the fields are growing more common, but more importantly, computational information design edges closer to reality. We’re seeing data scientists – people who can do it all – emerge from the rest of the pack.”
February 2010 Kenneth Cukier writes in The Economist Special Report “Data, Data Everywhere”: “… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.”
June 2010 Mike Loukides writes in “What is Data Science?”: “Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: ‘here’s a lot of data, what can you make from it?’”
September 2010 Hilary Mason and Chris Wiggins write in “A Taxonomy of Data Science”: “…we thought it would be useful to propose one possible taxonomy… of what a data scientist does, in roughly chronological order: Obtain, Scrub, Explore, Model, and iNterpret…. Data science is clearly a blend of the hackers’ arts… statistics and machine learning… and the expertise in mathematics and the domain of the data for the analysis to be interpretable… It requires creative decisions and open-mindedness in a scientific context.”
September 2010 Drew Conway writes in “The Data Science Venn Diagram”: “…one needs to learn a lot as they aspire to become a fully competent data scientist. Unfortunately, simply enumerating texts and tutorials does not untangle the knots. Therefore, in an effort to simplify the discussion, and add my own thoughts to what is already a crowded market of ideas, I present the Data Science Venn Diagram… hacking skills, math and stats knowledge, and substantive expertise.”
May 2011 Pete Warden writes in “Why the term ‘data science’ is flawed but useful”: “There is no widely accepted boundary for what’s inside and outside of data science’s scope. Is it just a faddish rebranding of statistics? I don’t think so, but I also don’t have a full definition. I believe that the recent abundance of data has sparked something new in the world, and when I look around I see people with shared characteristics who don’t fit into traditional categories. These people tend to work beyond the narrow specialties that dominate the corporate and institutional world, handling everything from finding the data, processing it at scale, visualizing it and writing it up as a story. They also seem to start by looking at what the data can tell them, and then picking interesting threads to follow, rather than the traditional scientist’s approach of choosing the problem first and then finding data to shed light on it.”
May 2011 David Smith writes in “‘Data Science’: What’s in a name?”: “The terms ‘Data Science’ and ‘Data Scientist’ have only been in common usage for a little over a year, but they’ve really taken off since then: many companies are now hiring for ‘data scientists’, and entire conferences are run under the name of ‘data science’. But despite the widespread adoption, some have resisted the change from the more traditional terms like ‘statistician’ or ‘quant’ or ‘data analyst’…. I think ‘Data Science’ better describes what we actually do: a combination of computer hacking, data analysis, and problem solving.”
June 2011 Matthew J. Graham talks at the Astrostatistics and Data Mining in Large Astronomical Databases workshop about “The Art of Data Science.” He says: “To flourish in the new data-intensive environment of 21st century science, we need to evolve new skills… We need to understand what rules [data] obeys, how it is symbolized and communicated and what its relationship to physical space and time is.”
September 2011 Harlan Harris writes in “Data Science, Moore’s Law, and Moneyball”: “‘Data Science’ is defined as what ‘Data Scientists’ do. What Data Scientists do has been very well covered, and it runs the gamut from data collection and munging, through application of statistics and machine learning and related techniques, to interpretation, communication, and visualization of the results. Who Data Scientists are may be the more fundamental question… I tend to like the idea that Data Science is defined by its practitioners, that it’s a career path rather than a category of activities. In my conversations with people, it seems that people who consider themselves Data Scientists typically have eclectic career paths, that might in some ways seem not to make much sense.”
September 2011 D.J. Patil writes in “Building Data Science Teams”: “Starting in 2008, Jeff Hammerbacher (@hackingdata) and I sat down to share our experiences building the data and analytics groups at Facebook and LinkedIn. In many ways, that meeting was the start of data science as a distinct professional specialization… we realized that as our organizations grew, we both had to figure out what to call the people on our teams. ‘Business analyst’ seemed too limiting. ‘Data analyst’ was a contender, but we felt that title might limit what people could do. After all, many of the people on our teams had deep engineering expertise. ‘Research scientist’ was a reasonable job title used by companies like Sun, HP, Xerox, Yahoo, and IBM. However, we felt that most research scientists worked on projects that were futuristic and abstract, and the work was done in labs that were isolated from the product development teams. It might take years for lab research to affect key products, if it ever did. Instead, the focus of our teams was to work on data applications that would have an immediate and massive impact on the business. The term that seemed to fit best was data scientist: those who use both data and science to create something new.”
September 2012 Tom Davenport and D.J. Patil publish “Data Scientist: The Sexiest Job of the 21st Century” in the Harvard Business Review.
Source: Gil Press (forbes.com)