When I was a medical student at McGill University, one of my professors told me that the difference between a teaching hospital [i.e., one associated with a university] and a community hospital [which can deliver excellent care but is independent of an academic institution] is that in the teaching hospital, every patient should be part of a study. In other words, there is an almost unspoken agreement between patient and doctor in a university research hospital: the patient will receive the most up-to-date care but, in return, will in some way contribute to the education of individual doctors and/or the entire medical community.

In many cases, this patient contribution can simply be the data generated by day-to-day testing and treatment in the hospital. In other words, nothing changes in the way the patient is managed. Nevertheless, the success [or possibly failure] of the diagnostic process or treatment can still teach us a great deal about medical care.

I want to be clear at this point that no study is ever done without the full and proper clearance of an institutional review board, or IRB. The purpose of this board is to review every proposal for a research study and to make sure that patients' rights and privacy are respected. Each patient must be formally asked whether their data can be used for various studies.

There is a distinction between types of studies. If I simply wish to review the lab results of patients after surgery, and these lab tests are done as part of the surgical regimen [not specifically for any research], then the IRB will require a less stringent consent from the patients. In this specific case, the data is generated in any event as part of the standard of care; no additional testing is done for the research study. Understandably, this kind of research study is fundamentally different from one in which the patient must undergo a new and unproven procedure in order to study its effect on, for example, the patient's cancer. In such a case, the patient will be extensively informed of the reason for the study and the possible benefits and side effects of the treatment, and will have to sign a more detailed consent form.

The need for all of these review boards, which validate the ethics of any proposed study, is directly related to the human experimentation done by the Nazis in World War II. Members of an IRB are fanatical, as they should be, about protecting patients from any questionable and/or inappropriate testing and treatments. Getting IRB clearance can be a long and drawn-out process. But that is the acceptable price we all pay to make sure that no patient is mistreated for the purpose of studying disease.

The following article discusses a major problem in cancer research. Only a small percentage of cancer patients are formal members of a research trial. Well over 90% of cancer patients undergo extensive testing and receive considerable treatment without their data being formally studied (formal study would also mean that their data is tracked and centralized for easier and more detailed review). The data recorded about these non-study patients may be spread across doctors' notes from around the world. Even if you limit yourself to the United States, there is still a tremendous amount of information sitting unused in oncologists' offices. If the treating physicians of a patient with cancer are at least using an EMR, then it is easier to extract information in an automated fashion. But if the EMR is focused primarily on bureaucratic, administrative and financial data collection, much of the clinical information may be extremely difficult to review via a computerized system.

The two young people discussed in the article I linked to above are trying desperately to create a mechanism for accessing as much of the data about cancer patients as possible. Their hope is that this kind of data will add a whole new perspective to present cancer research. Basically, their hope is that by using advanced data analysis tools (that do not infringe on patients' rights), it will be possible to find trends and nuggets of clinical gold amongst the vast amounts of previously unreviewed data about these patients.

Such data analysis can be very helpful even for the individual oncologist. Across the thousands of patients that an oncologist will assess during their career, such computerized data review could be used to compare the individual oncologist's success with that of the general medical community. So an oncologist working in the community, and not associated with a research hospital, could still contribute to the general collection of data on cancer patients. In return, this particular oncologist could find out if his or her success with, for example, breast cancer is similar, or at least close, to the success rates seen in major cancer centers. The hope would be that community hospital oncologists would adjust their care if there were a significant discrepancy between their personal success rates and those of major centers.
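As a rough illustration of what such a comparison might look like in code, here is a minimal sketch using an exact binomial test. All numbers are hypothetical: an oncologist with 70 good outcomes out of 100 patients, measured against an assumed 80% benchmark rate from major centers. A real comparison would need risk adjustment for patient mix, which this sketch ignores.

```python
import math

def binomial_two_sided_p(successes, n, benchmark_rate):
    """Exact two-sided binomial test: the probability of seeing an outcome
    at least as unlikely as `successes` out of `n`, if the clinician's true
    success rate equaled `benchmark_rate`."""
    p_obs = math.comb(n, successes) * benchmark_rate**successes \
            * (1 - benchmark_rate)**(n - successes)
    total = 0.0
    for k in range(n + 1):
        p_k = math.comb(n, k) * benchmark_rate**k * (1 - benchmark_rate)**(n - k)
        if p_k <= p_obs + 1e-12:   # sum all outcomes as rare as the observed one
            total += p_k
    return min(total, 1.0)

# Hypothetical numbers: 70 good outcomes in 100 patients vs. an 80% benchmark.
p = binomial_two_sided_p(70, 100, 0.80)
print(f"p-value: {p:.4f}")
if p < 0.05:
    print("Success rate differs significantly from the benchmark; worth reviewing.")
```

A small p-value here would not prove inferior care; it would simply flag a discrepancy large enough to warrant a closer look, which is exactly the feedback loop described above.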

Today, we have the ability to collect data 24 hours a day. With nothing more than a smartwatch, there can be a constant feed of a patient's pulse, temperature, blood pressure, blood oxygen levels and more to a centralized database in the cloud. With the patient's permission, this data could be added to their doctor's EMR and then be used to study cancer treatments in a whole different way. For example, when patients are undergoing chemotherapy, they are much more susceptible to infections. Knowing a patient's temperature automatically, 24/7, may help identify infections before they become severe.

While collecting such vital-sign data can already generate overwhelming masses of information, the truth is that there really is no such thing as too much data. The key is to use or create new analysis tools that can ingest these masses of data and output easy-to-understand graphs and tables that summarize the important correlations. Going back to our example of the cancer patient and measuring temperature, the hope is that advanced data analysis could automatically identify specific trends that are correlated with early infection. So the computer might find that a temperature above 38°C first thing in the morning is a key indicator of an oncoming infection. Only a computer with specialized algorithms can sift through all of the collected data to find such correlations. In time, the hope is that many more physiological measurements will be made 24/7. Eventually, it is hoped that with enough testing, it will be possible not only to better treat existing cancer but even to predict an oncoming cancer before it causes any harm.
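The morning-temperature example above can be sketched very simply. The 38°C cutoff comes from the text; the morning window, the requirement of three consecutive high readings, and the simulated wearable feed are all assumptions made up for illustration. A real system would learn such rules from data rather than hard-code them.

```python
from datetime import datetime, timedelta

FEVER_THRESHOLD_C = 38.0      # cutoff from the example in the text
MORNING_WINDOW = (6, 10)      # hypothetical "first thing in the morning" hours
SUSTAINED_READINGS = 3        # require consecutive highs to filter out sensor noise

def flag_morning_fever(readings):
    """readings: list of (datetime, temperature_C) tuples from a wearable feed.
    Returns the timestamps at which a sustained morning fever pattern is detected."""
    alerts = []
    streak = 0
    for ts, temp in readings:
        in_morning = MORNING_WINDOW[0] <= ts.hour < MORNING_WINDOW[1]
        if in_morning and temp >= FEVER_THRESHOLD_C:
            streak += 1
            if streak == SUSTAINED_READINGS:
                alerts.append(ts)
        else:
            streak = 0
    return alerts

# Simulated feed: one reading every 20 minutes from 06:00, fever emerging at 07:00.
start = datetime(2024, 5, 1, 6, 0)
feed = [(start + timedelta(minutes=20 * i),
         37.0 if i < 3 else 38.4) for i in range(9)]
print(flag_morning_fever(feed))
```

Even a toy rule like this shows the point of the paragraph: the raw 24/7 stream is unreadable by a human, but a small amount of automated analysis reduces it to a handful of actionable alerts.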

There are physicians and researchers who already feel overwhelmed by the amount of data being generated. These professionals are also concerned that data collected outside of a formal study will confuse the situation. A formal study has strict rules for how the data is collected and what specific data can be collected. This is NOT the case with non-study data.

But before ruling out any value in non-study data, one first has to analyze the unstructured data that is collected 24/7. Then one can decide whether this non-study, unstructured data is more problematic than helpful. As data collection and analysis technologies and software tools become more widespread and easier to use, I believe that these concerns will fade. Instead, researchers will be desperately looking for new markers to track and then analyze in order to better understand cancer and every other disease. New things that we poorly understand, and that are massive in quantity, are understandably frightening. But we need to overcome this fear and learn how to benefit from every technology and data point available. It is our hope and our prayer that such an approach will yield even greater and faster advances in the world of medicine.

Thanks for listening.