A need for data management courses in higher education
Submitted by dylan on Thu, 2008-01-17 00:00.
Working with large piles of complex data can be a difficult task, even for seasoned experts. What happens when a non-specialist is tasked with collecting, managing, and ultimately warehousing large amounts of painstakingly gathered data? What happens when multiple non-specialists are working on these data concurrently? How can a revision history be maintained when a small set of files is being passed around via email, or appended to a "master" document?
These are some of the first questions that come to mind when watching most people deal with data storage. Researchers and applied science technicians commonly collect and manage a lot of data. The resulting flurry of spreadsheets doesn't usually cause problems until an error is discovered, or until someone applies an incorrect formula to a range of cells. These are common mistakes with simple solutions: version control systems, a separation of data and computation, and constraints imposed by an RDBMS. Why, then, are these strategies not actively pursued outside of computer science and mathematics?
It has everything to do with training. All students working toward a career in science are required to take several technical writing classes. Writing is an essential part of research, and any student lacking in this respect would be expected to improve their writing skills, or else. Concepts of data management would therefore be a natural extension to technical writing courses, especially for those interested in research or applied science. Although it might take some arm twisting, I think that a well-designed, quarter-long data management course would be time well spent for the majority of new students out there.
A data management class might consist of several data- and example-driven modules:
The above outline represents about 15 minutes of thought, and is neither comprehensive nor suitably generalized. However, I think that with a small amount of training it would be possible to educate a critical mass of individuals on the finer points of managing data. Getting past corporate and government agency habits, which are in many cases propped up by hacked-together Excel-Access-VBA applications, would take considerably more effort.
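To make one of the "simple solutions" mentioned earlier a little more concrete, here is a minimal sketch of what constraints imposed by an RDBMS can look like. It uses Python's built-in sqlite3 module; the table name, column names, and valid ranges are all hypothetical, chosen only for illustration. The point is that a value outside the declared range is rejected when it is entered, rather than being discovered months later in a spreadsheet.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical measurement table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE measurement (
            measurement_id INTEGER PRIMARY KEY,
            site           TEXT NOT NULL,                      -- every row must name a site
            depth_cm       REAL NOT NULL CHECK (depth_cm >= 0),
            ph             REAL CHECK (ph BETWEEN 0 AND 14)    -- assumed valid range
        )
        """
    )
    return conn


def try_insert(conn, site, depth_cm, ph):
    """Attempt one insert; return True if the database accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO measurement (site, depth_cm, ph) VALUES (?, ?, ?)",
                (site, depth_cm, ph),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when a NOT NULL or CHECK constraint is violated.
        return False


conn = make_db()
results = [
    try_insert(conn, "plot-A", 10.0, 6.5),   # valid row
    try_insert(conn, "plot-A", 25.0, 65.0),  # misplaced decimal: pH out of range
    try_insert(conn, None, 10.0, 7.0),       # missing site name
]
stored = conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]
print(results, stored)
```

Only the first row is stored; the other two are refused by the database itself, with no reliance on whoever is typing to notice the mistake. A spreadsheet can approximate this with validation rules, but those are easy to bypass and rarely survive a copy-and-paste into a new file.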