Michael L. Brodie
The Emerging Discipline of Data Science - Principles and Techniques for Data-Intensive Analysis
Abstract – The Scientific Revolution (1550-1700) led to the increasing significance, potential, and risks of empiricism – 17th Century knowledge discovery – which in turn led, over 400 years, to the Scientific Method – a body of principles and techniques for investigating phenomena, acquiring new knowledge, and correcting and integrating previous knowledge.
The Computing Revolution (1940-1970) led to the increasing significance, potential, and risks of software – 20th Century knowledge work – including the software crisis (1968), which in turn led, over 40 years, to Software Engineering – "a body of principles and techniques for the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software" (Wikipedia).
The Digital Revolution (1970-), with the emerging Digital Universe and the Big Data Revolution (2000-), is leading to the increasing significance, potential, and risks of data-intensive analysis – 21st Century knowledge discovery – and hence to the need for Data Science, an emerging discipline currently in its infancy, analogous to the Scientific Method and Software Engineering in their respective revolutions. The importance of Data Science can be seen in the potential impact on quality of life of the US Government’s Precision Medicine Initiative, whose goal is "Delivering the right treatments, at the right time, every time to the right person".
This exploratory talk examines Data Science from data analysis to data-intensive analysis. Data analysis, with roots in Babylonia (1700-1200 BCE) and India (1200 BCE), is applied in most human endeavors following well-established principles, e.g., statistics, and guidelines, e.g., the Cross-Industry Standard Process for Data Mining (CRISP-DM). The roots of data-intensive analysis are in Big Data (~2000), which, though just emerging, is opening the door to profound change: to new ways of thinking, problem solving, and processing that in turn bring new opportunities and challenges. Since 2007, this Fourth Paradigm of science has been applied to evidence- and data-based analysis in most human endeavors.
The talk presents an emerging data-intensive analysis workflow that augments the previously dominant data analysis phase with an equally important and substantial data management phase, and thereby broadens the scope of Data Science. Through use cases, we identify opportunities and challenges across the data-intensive analysis workflow, together with the principles and techniques they require to measure and improve the correctness, completeness, and efficiency of data-intensive analysis.
Biography – Michael L. Brodie has served as Chief Scientist of a Fortune 20 company, as an Advisory Board member of leading national and international research organizations, and as an invited speaker and lecturer. In his role as Chief Scientist, Dr. Brodie has researched and analyzed challenges and opportunities in advanced technology, architecture, and methodologies for Information Technology strategies. He has guided advanced deployments of emergent technologies at industrial scale, most recently Cloud Computing and Big Data. In his Advisory Board roles, Dr. Brodie addresses current and emergent strategic challenges and opportunities that are central to the charter and success of the organizations. As an invited speaker, Dr. Brodie has presented compelling visions, challenges, and strategies for our emerging Digital Universe in over 100 keynote speeches in over 30 countries and in over 100 books and articles.
Throughout his career, Dr. Brodie has been active in both advanced academic research and large-scale industrial practice, seeking mutual benefit from the industrial deployment of innovative technologies while helping researchers understand industrial requirements and constraints. He has contributed to multi-disciplinary problem solving at scale in contexts such as Terrorism and Individual Privacy, and Information Technology Challenges in Healthcare Reform.
Dr. Brodie holds a Doctor of Science (honoris causa) from the National University of Ireland and a PhD in Databases from the University of Toronto. He is concerned with the Big Picture, including business, economic, social, application, and technical aspects of information ecosystems, core technologies, and integration. He currently holds positions as a research scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL) of the Massachusetts Institute of Technology (MIT) and as an adjunct professor at the National University of Ireland, Galway.