Dean’s Blog: Education Online, A Data Science Opportunity

In 2007 we started a company, SciVee.tv, to communicate science in the still-early days of streaming video. Our belief that YouTube would only be the realm of cat videos and that serious science would appear on SciVee.tv proved to be wrong. 500 hours of video, including scientific content, are uploaded to YouTube every minute and over 1 billion hours of video are watched each day. Notwithstanding, I would contend that SciVee had a vision and tools to support that vision. For example, Pubcasts and Postercasts integrated traditional scientific discourse, papers and posters respectively, with the then-new video medium to provide an experience that (we claimed) deepened the understanding of the material being presented. Fast forward 13 years.

All that we envisaged with SciVee.tv has either come to pass, or failed to find an audience. If we envision what comes next in online education, as we did 13 years ago, what is the vision today and what does it have to do with the School of Data Science? These questions are particularly pertinent in the time of a pandemic, when online video content is a part of the lives of all our stakeholders, whether they like it or not.

As the educational output of an institution goes digital, think about that output in two ways – the content itself and the usage of that content. How content is used takes us to the realm of precision education. Vendors dabble with this through dashboards and other display tools, but little drives change at the level of the individual scholar in providing a tailored educational experience. So much can be said about this as regards pedagogy and how we learn, much of which is beyond my expertise and is the domain of our School of Education and Human Development and others. Here, on behalf of data science, I will focus on the content as data, as I believe this opens up an as yet largely unexplored opportunity, and I hope it sparks a broader conversation regarding the educational opportunities.

A course is made up of a series of lectures. Each previously had digital components (online reading, PowerPoints, etc.) and analog components (the lecturer’s handwritten notes and a series of in-person presentations, given but not preserved). Now we are preserving everything in digital form. For educational institutions this is an opportunity that may define their future, or lack thereof. Stating the obvious, an online lecture is an asset owned by the lecturer or their institution that can be reused, repackaged, mixed and matched and more. Thinking of educational content this way – as a commodity to be repackaged and reused – will likely upset some, but it can also be seen as an opportunity to showcase what is best about what we do. If the fear is that we showcase the worst, then, over time, online delivery will lead to an improvement in the quality of content and how we teach. It’s hard to argue with that.

At the risk of aggravating readers who are not data scientists even further, as data scientists we can think of an online lecture as data – a time series. It is one of many time series being produced every day, related by belonging to the same course, but with so much more potential for being explored and used. That potential grows as the corpus grows. Let’s look at some of that potential through five simple use cases, each from a different perspective, to illustrate what I am getting at.

The instructor’s perspective: I am about to cover convolutional neural networks (CNNs) {your topic here} in the machine learning course I am developing. It used to be I would work from one or more textbooks, or if one did not exist, write the book based on the lectures I was giving and use that in later years. Now that activity is supplemented by looking online for high-quality video support material that is pertinent and drives the key points home. I can spend a lot of time doing that. How do I know I am getting quality material until I have spent time looking at it? Moreover, none of that maps to what my colleagues, teaching the same students, may have already touched upon. The result: a lot of time spent and potentially sub-standard and repetitive material being recommended. What if the content from my colleagues was indexed and I could quickly home in on a review of what they have already taught? That could be done at different levels of granularity, from the lecture to finer elements of what was delivered. Without getting too technical, a simple indexing of individual lectures applied consistently across the enterprise would be a great asset to the instructor. More specifically, an index of the closed-captioned content would allow the lecturer to home in on relevant material – in minutes x-to-y of my colleague’s lecture z she mentions CNNs xx times, let me take a look.
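
To make the idea of a caption index concrete, here is a minimal sketch in Python. It assumes captions arrive as (lecture, start time in seconds, text) rows; that format, the sample rows, the five-minute window and the function names build_index and find_mentions are all invented for illustration and are not drawn from any existing system.

```python
import re
from collections import defaultdict

# Hypothetical caption rows: (lecture_id, start_seconds, caption_text).
# Invented for illustration; not taken from any real course.
CAPTIONS = [
    ("lecture-z", 0, "Today we review linear models."),
    ("lecture-z", 610, "A CNN applies learned filters to an image."),
    ("lecture-z", 655, "Each CNN layer is followed by pooling in a typical CNN."),
    ("lecture-q", 1205, "We will meet the CNN again next week."),
]


def build_index(captions, window_seconds=300):
    """Count how often each word occurs per (lecture, time window)."""
    index = defaultdict(lambda: defaultdict(int))
    for lecture_id, start, text in captions:
        window = start // window_seconds  # which window the caption starts in
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word][(lecture_id, window)] += 1
    return index


def find_mentions(index, term, window_seconds=300):
    """Return (lecture, start_minute, end_minute, count), most mentions first.

    window_seconds must match the value used in build_index.
    """
    hits = []
    for (lecture_id, window), count in index.get(term.lower(), {}).items():
        start_min = window * window_seconds // 60
        end_min = (window + 1) * window_seconds // 60
        hits.append((lecture_id, start_min, end_min, count))
    return sorted(hits, key=lambda h: (-h[3], h[0], h[1]))


index = build_index(CAPTIONS)
for lecture_id, start_min, end_min, count in find_mentions(index, "CNN"):
    print(f"{lecture_id}: minutes {start_min}-{end_min}, {count} mention(s)")
```

This sketch matches exact words only, so "CNN" and "CNNs" would be counted separately; a real index would likely add stemming, phrase matching and a controlled vocabulary.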

The student’s perspective: Just as the instructor can enrich their own teaching through easily accessible indexed content (their own or that of others), so can the student enrich their understanding. Delivered material, either current or from past semesters, is easily found and reviewed. That the School of Data Science has created an index of all their lecture material is great. That they also allow students to confidentially rate that material leads to a ranking in a referral system, something that becomes important as the online corpus builds and there are multiple instructors and multiple versions of, for example, CNN course material.

The program director’s perspective: As the director of, say, the Master’s in Data Science program, the analysis of the program in digital form offers a more informed view of what is being taught beyond what might occur by reviewing syllabi and talking to the instructor. There is a wealth of natural language processing approaches that could be applied to the voice-to-text of a complete course as well as the controlled vocabulary put in place to describe course content. Such tools can assist the program director in determining overcoverage, undercoverage, or no coverage (by comparison to other similar programs).
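
As a rough illustration of such a coverage check, the Python sketch below counts controlled-vocabulary terms in a course transcript and compares them with counts from a reference program. The vocabulary, the two transcripts, the factor-of-two threshold and the function names are assumptions made up for this example; raw counts are also not normalized for transcript length, which a real analysis would need to address.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary (single-word terms only in this sketch).
VOCAB = ["regression", "clustering", "ethics"]

# Invented stand-ins for voice-to-text transcripts of two programs.
OURS = ("Regression again. Regression, regression, regression "
        "and more regression. Clustering once.")
REFERENCE = ("Regression and clustering. Clustering, clustering, clustering. "
             "Ethics matters; ethics always.")


def term_counts(transcript, vocabulary):
    """Count occurrences of each vocabulary term in a transcript."""
    words = Counter(re.findall(r"[a-z]+", transcript.lower()))
    return {term: words[term] for term in vocabulary}


def coverage_report(counts, reference_counts, ratio=2.0):
    """Label each term relative to the reference program.

    The ratio threshold is an arbitrary assumption, not an established norm.
    """
    report = {}
    for term, ref in reference_counts.items():
        ours = counts.get(term, 0)
        if ours == 0 and ref > 0:
            report[term] = "no coverage"
        elif ours * ratio < ref:
            report[term] = "undercoverage"
        elif ours > ref * ratio:
            report[term] = "overcoverage"
        else:
            report[term] = "comparable"
    return report


report = coverage_report(term_counts(OURS, VOCAB), term_counts(REFERENCE, VOCAB))
for term, label in report.items():
    print(f"{term}: {label}")
```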

The educator’s perspective: Utilizing the complete corpus of all that is spoken in the “classroom” for what is spoken is one thing. Analyzing that same corpus for how it is spoken is another. Analyses such as sentiment analysis, degree of interaction in the online classroom, and student retention and performance inform pedagogy. Students are already telling us we need to make adjustments as we settle into an online-first academic environment. Let us be better informed about what adjustments to make.

The administrator’s perspective: The quality of education content and delivery goes a long way to defining the reputation of an institution, a reputation which builds as its successful alumni base builds. UVA has that strong reputation, built over 200 years, particularly for the undergraduate experience in both education and student life. The School of Data Science is at the point of beginning to establish that reputation. Neither the old nor the new will be immune to the post-COVID world of education, which will be different. Hybrid education models combining traditional classroom settings with online will play a larger part. More remote lifelong learners desiring specialized online training are expected. In short, I predict that the quality and extent of the institution’s online content will play a much larger role in defining the quality and value of that institution. To not consider an institution’s online content as an asset to be carefully packaged and marketed will place an institution at a competitive disadvantage a few short years from now.

At the School of Data Science we consider this as one example of eating our own dog food – operating on our own increasing corpus of data and subsequently fulfilling our mission – using data science for societal benefit. In this instance taking emerging digital online content and using data science to use it more effectively and to shape future content. Where to start?

Obviously we are not alone in this thinking, nor is this opportunity new; it is simply more urgent as a result of COVID and of the stress on higher education that existed pre-COVID and is on steroids post-COVID. Technologies have existed for some time to support much of what we envisage in the form of learning management systems (LMS). LMS evolution has been hampered less by the technology than by the cultural shift needed to take them to the next step in their evolution. In evolutionary parlance, COVID represents the Cambrian explosion, with fast adoption of new technologies and of new ways of using educational materials being critical for survival.

Such a vision is not without challenges, challenges that must be considered in any data science outcome: buy-in, particularly by instructors; cultural change as regards reuse of material; the potential risk to privacy; and the flagrant misuse, or at least unintended consequences, of such use.

We must also identify past efforts that have not fulfilled their promise. I would suggest that while clunky technology might play a role in past failures, the main reason for failure is cultural. A sense of ownership of content, the content of a course itself, the preservation of a style of teaching etc. are all past impediments and reasons for failure. Then again we have never had an incentive like COVID before to drive a change in culture.

We must be vigilant and identify with what has come before as we explore this new frontier; but explore we must.

About Phil Bourne

Stephenson Dean of the School of Data Science and Professor of Biomedical Engineering, University of Virginia