On November 18-19, the US National Institute of Standards and Technology (NIST) will host an interesting Data Science Symposium focused on benchmarking, measurement, reference datasets, and related issues. Many of the goals of this symposium echo the ideas that have led NIST to play such a key role in advancing information retrieval research through programs like TREC over the years.
Full information on the symposium is below.
Clifford Lynch
Director, CNI
_______________
NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY (NIST)
DATA SCIENCE SYMPOSIUM
www.nist.gov/itl/iad/data-science-symposium-2013.cfm
NOVEMBER 18-19, 2013
(CO-LOCATED WITH TREC, TAC)
· Registration for the inaugural NIST Data Science Symposium is now open!
· For those wishing to give presentations, participate as symposium panelists, or present posters at the symposium, NIST is accepting technical abstracts until Oct 4, 2013 (see details below).
SUMMARY:
Given the explosion of data production, storage capabilities, communications technologies, computational power, and supporting infrastructure, data science is now recognized as a highly critical growth area with impact across many sectors, including science, government, finance, health care, manufacturing, advertising, retail, and others. Because data science technologies are being leveraged to drive crucial decision making, it is of paramount importance to be able to measure the performance of these technologies and to correctly interpret their output. The NIST Information Technology Laboratory is forming a cross-cutting data science program focused on driving advancements in data science through system benchmarking and rigorous measurement science.
BACKGROUND:
A variety of tools and methods are emerging that process, analyze, and derive knowledge from large amounts of complex data in order to provide new insights that underpin key decisions. This has spawned the creation of Big Data technologies and an emerging data science discipline spanning new large-scale analytic tools and methods. Several approaches have emerged that combine many component technologies in multi-stage flows, which include machine-driven data transformation and processing, as well as human interactions and decision points. These approaches often lack the measures needed to understand: 1) the quality and context of the analyzed data, 2) the rigor of the analytic process and tools employed, 3) the impact of the human in the analytic process, and 4) the strength of the conclusions derived, questions answered, hypotheses tested, and discoveries made that emerge from the analytic process. The NIST Data Science program seeks to engage in benchmarking and the development of measurement methods to help advance the performance and efficiency (resource utilization, speed, etc.) of Big Data analytic components, both independently and in the context of end-to-end systems and workflows.
SYMPOSIUM DESCRIPTION:
The inaugural NIST Data Science Symposium will convene a diverse multi-disciplinary community of stakeholders to promote the design, development, and adoption of novel measurement science in order to foster advances in Big Data processing, analytics, visualization, interaction, and lifecycle management. It is set apart from related symposia by its emphasis on advancing data science technologies through:
· Benchmarking of complex data-intensive analytic systems and subcomponents
· Developing general, extensible performance metrics and measurement methods
· Creating reference datasets & challenge problems grounded in rigorous measurement science
· Coordinating open, community-driven evaluations that focus on domains of general interest
Why You Should Attend:
This event will be of interest to data science researchers, technologists, and data providers, as well as data science stakeholders in industry, government, and academia. The symposium will:
· Establish a broad multi-sector community of interest including researchers, end-users, and solution providers focused on advancing data science and Big Data technologies
· Contribute to the formulation of challenge problems to advance research and tools in data science
· Facilitate availability of reusable common reference datasets necessary to systematically compare approaches and measure performance improvements at all levels in Big Data analytic systems
· Foster advances in data science by formulating new measurement methods and benchmarks (e.g., accuracy, generalization, resource usage, cost, speed, etc.)
· Foster sharing of knowledge in a collaborative community-based forum with the goal of accelerating progress and eliminating gaps in data science methods and tools
REGISTRATION:
· Registration to attend the NIST Data Science Symposium is now open
· Registration is free, but it is necessary to register in order to attend
· The registration deadline is Monday, November 11. Registration may close earlier once the capacity of the venue is reached. Please note that only registered participants will be permitted to enter the NIST campus to attend the workshop.
To register, please go to: https://www-s.nist.gov/CRS/conf_disclosure.cfm?conf_id=6631
CALL FOR ABSTRACTS:
Participants who wish to present their technical perspectives or present posters (potentially with technical demonstrations) that address symposium topics should submit a one-page abstract and a brief one-paragraph bio to datascience by October 4th, 2013. Submitters will be notified by October 18th whether their perspectives have been selected for plenary or poster presentation.
Speakers, panelists, and poster presenters will be selected by the organizers based on relevance to symposium objectives and workshop balance. Due to the technical nature of the workshop, no marketing will be permitted.
SYMPOSIUM TOPICS:
Below is a summary of the topics that will be addressed at the symposium. For a more complete list, please visit: http://www.nist.gov/itl/iad/data-science-symposium-2013.cfm
· Measurement methodologies, benchmarking, and common reference datasets needed to accelerate data science research and improve performance of Big Data analytic systems.
· Primary challenges in and technical approaches to complex workflow components of Big Data systems, including ETL, lifecycle management, analytics, visualization & human-system interaction.
· Generation of ground truth for large datasets and performance measurement with limited or no ground truth.
POINTS OF CONTACT:
Ashit Talukder (NIST/ITL; Chief, Information Access Division), Craig Greenberg (NIST/ITL)
In case of questions or if you would like to be added to our mailing list, please send email to datascience.