Precision medicine is an approach to disease treatment and prevention that takes into account each person's individual variability in genes, environment, and lifestyle (Adams & Petersen, 2016). It focuses on individualized treatment across the continuum of care and can be divided into three care categories according to when care is delivered (Prosperi, Min, Bian, & Modave, 2018):

  • Disease prevention: Prediction of disease risk before the disease symptoms manifest
  • Differential diagnosis: Timely or instantaneous identification of an illness
  • Disease treatment: Strategies to cure or optimally treat once disease has been identified

In the context of disease prevention, the goal of precision medicine is the accurate prediction of disease risk, even years in advance. This requires powerful predictive models that leverage not only bio-static markers such as genes, ethnicity, and gender but also modifiable risk factors: dietary habits, lifestyle attributes such as physical activity and sleeping patterns, and environmental factors such as exposure to pollution and radiation. There is a gap in the current understanding of how such non-genomic, non-bio-static, modifiable risk factors affect the ability to predict disease risk before symptoms manifest (Prosperi, Min, Bian, & Modave, 2018).

With regard to differential diagnosis, the goal of precision medicine is to identify an illness or disease rapidly, reducing the timeframe for care to a matter of days or hours. This requires precision models that can not only ingest biogenetic markers and markers from rapid high-sensitivity tests but also rapidly correlate those markers with disease characteristics, or phenotypes. The current challenge is that descriptions of disease phenotypes are typically sloppy or imprecise and often fail to capture the diverse manifestations of common diseases or to define subclasses of those diseases that predict the outcome or response to treatment (Delude, 2015; Robinson, 2012). There is a need both to describe fully the phenotypes of diseases and their subclasses and to research the correlations between those phenotypes and the markers from biogenetic and rapid high-sensitivity tests (Prosperi, Min, Bian, & Modave, 2018).

For disease treatment, the goal of precision medicine is to identify the optimal treatment for the diagnosed disease. Precision models are needed that predict the best treatment options by leveraging not only a patient’s genomic markers but also markers from the patient’s metabolic profile (such as glucose, electrolyte, and fluid levels), drug exposures, and treatment attributes such as medicines, diet, and physical activity. However, current treatments are derived mostly from the outcomes of clinical trials conducted on patient populations (Schork, 2015). There is a need to study how well the success of a treatment can be predicted from a patient’s genomic markers together with markers from the patient’s metabolic profile, drug exposures, and treatment attributes (Prosperi, Min, Bian, & Modave, 2018).

The Research Problem

Although precision medicine aims at prevention, diagnosis, and treatment, the main efforts have centered on precision genomics, specifically pharmacogenomics and the delivery of drugs based on patients’ specific genetic markers. The results have been underwhelming and have not yet delivered on their promise, because the predictive ability and ensuing clinical utility of risk assessment from genetic variations have been found to be modest for many diseases (Krier, Barfield, Green, & Kraft, 2016; Paternoster, Tilling, & Davey Smith, 2017). Three gaps in current understanding remain (Prosperi, Min, Bian, & Modave, 2018). The first concerns the impact of non-genomic and non-bio-static factors, such as dietary habits, lifestyle attributes, and environmental factors, on disease prevention. The second concerns the correlations between well-described disease phenotypes and the markers from biogenetic and rapid high-sensitivity tests in the context of differential diagnosis. The third concerns how well a treatment's success can be predicted from a patient’s genomic markers together with markers from the patient’s metabolic profile, drug exposures, and treatment attributes such as medicines, diet, and physical activity.

The Research Questions

The following are the key research questions pertaining to the research problem as stated above:

  1. Do an individual's non-genomic, modifiable factors, such as dietary habits, lifestyle attributes (physical activity and sleeping patterns), and environmental factors (exposure to pollution and radiation), affect that person's risk of disease?
  2. Is there a correlation between a patient’s biogenetic and rapid high-sensitivity test markers and well-described disease phenotypes?
  3. Do a patient’s genomic markers, metabolic profile markers (such as glucose, electrolyte, and fluid levels), drug exposures, and treatment attributes (such as medicines, diet, and physical activity) affect the optimal treatment for that patient?

The Data and Analysis Requirements

Several sources of data are needed to conduct a comprehensive analysis. The following types of patient data will be gathered through personal interviews and from various digital sources. All such data must be captured in strict adherence to informed consent and patient approval. Every patient included in the data collection must be given exact details of the purpose of the research, the types of information that will be captured for hypothesis testing, and how that information will be anonymized and kept private and confidential.

  • Data pertaining to the individual, such as behavior, demographics, genetics, health, and lifestyle, from sources such as electronic health records (EHR), mobile apps, social media posts, wearable devices, personal interviews, and questionnaires
  • Data pertaining to relationships, such as family, school, work, and other activity data, from sources such as family history, school records, employment records, and social media posts
  • Data pertaining to the community, such as hospitals, libraries, neighborhood, and parks, from sources such as the area deprivation index, crime rates, food deserts, green areas, income, and pollution levels
  • Data pertaining to society, such as social security and unemployment wages, from sources such as city councils, welfare services, and state services

Essentially, a database will need to be built that captures a ‘health avatar’ for each patient: a 360-degree view of the individual’s comprehensive health information, not just genetic and genomic information (Prosperi, Min, Bian, & Modave, 2018).
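
As a minimal sketch of what one ‘health avatar’ record might look like, the following groups a patient's data under the four levels listed above. The class name, field names, and sample values are all hypothetical and are not taken from the cited work; a real schema would be far richer.

```python
from dataclasses import dataclass, field


@dataclass
class HealthAvatar:
    """Hypothetical per-patient record spanning the four data levels."""
    patient_id: str  # pseudonymous identifier, never a name
    individual: dict = field(default_factory=dict)     # behavior, genetics, lifestyle
    relationships: dict = field(default_factory=dict)  # family, school, work
    community: dict = field(default_factory=dict)      # neighborhood indicators
    society: dict = field(default_factory=dict)        # welfare and state services

    def missing_levels(self):
        """Return the names of the levels that hold no data yet."""
        levels = ("individual", "relationships", "community", "society")
        return [name for name in levels if not getattr(self, name)]


# Made-up values, for illustration only.
avatar = HealthAvatar(
    patient_id="p1",
    individual={"sleep_hours": 6.5, "steps_per_day": 4200},
    community={"area_deprivation_index": 72},
)
print(avatar.missing_levels())
```

Tracking which levels are still empty is one simple way to see how far a given record is from the intended 360-degree view.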

However, integrating data across multiple domains and sources will be challenging for at least three known reasons (Prosperi, Min, Bian, & Modave, 2018):

  1. The heterogeneity in the syntax of the data such as the different file formats and access protocols used
  2. Multiple schemas or data structures
  3. The different or ambiguous semantics (e.g. meanings or interpretations)

As such, substantial effort is required to link and map different sources, because variables, measures, and constructs lack clear semantic definitions. One suggested solution is to adopt semantic interoperability, which allows data to be exchanged with unambiguous, shared meaning. Semantic interoperability is enabled through the use of ontologies. An ontology formally and computationally represents a domain of knowledge through a standardized, controlled vocabulary that describes data elements and the relationships between them. Many biomedical ontologies are already available and widely used in medicine, e.g. the International Classification of Diseases (ICD) and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). An ontology-driven data integration framework can be used to represent metadata, create global concept maps, automate data quality checks, and support high-level semantic queries. Such a framework can be applied at different levels of data interoperability (Prosperi, Min, Bian, & Modave, 2018):

  • The data level, integrating across EHR systems and other data sources
  • The concept level, mapping terminologies and ontologies
  • The study design level, enabling standard operating procedures and reproducibility on other sources
  • The inference level, identifying proper statistical learning methods upon study design, scaling analyses on high performance computing, and building up models
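
The concept level can be illustrated with a small sketch in which terms from two sources are mapped to shared concept identifiers, unmapped terms are flagged as a data quality check, and a query is then answered across sources. The source names (ehr_a, ehr_b), local terms, and patient identifiers are invented for illustration; only the two ICD-10 codes are real. A plain lookup table stands in for an ontology here and captures none of the relationships between concepts that a real ontology would.

```python
from dataclasses import dataclass

# Illustrative concept map: (source, local term) -> shared concept identifier.
# The local terms and source names are made up; E11 and I10 are ICD-10 codes.
CONCEPT_MAP = {
    ("ehr_a", "T2DM"): "ICD10:E11",
    ("ehr_b", "diabetes type II"): "ICD10:E11",
    ("ehr_a", "HTN"): "ICD10:I10",
    ("ehr_b", "high blood pressure"): "ICD10:I10",
}

CONCEPT_LABELS = {
    "ICD10:E11": "Type 2 diabetes mellitus",
    "ICD10:I10": "Essential (primary) hypertension",
}


@dataclass(frozen=True)
class Record:
    source: str
    patient_id: str
    local_term: str


def harmonize(records):
    """Split records into (mapped, unmapped) using the shared concept map."""
    mapped, unmapped = [], []
    for rec in records:
        concept = CONCEPT_MAP.get((rec.source, rec.local_term))
        if concept is None:
            unmapped.append(rec)  # quality check: term has no shared meaning yet
        else:
            mapped.append((rec.patient_id, concept))
    return mapped, unmapped


def patients_with(concept, mapped):
    """Semantic query: which patients have this concept, in any source?"""
    return sorted({pid for pid, c in mapped if c == concept})


records = [
    Record("ehr_a", "p1", "T2DM"),
    Record("ehr_b", "p2", "diabetes type II"),
    Record("ehr_b", "p3", "high blood pressure"),
    Record("ehr_b", "p4", "sugar problem"),
]
mapped, unmapped = harmonize(records)
print(CONCEPT_LABELS["ICD10:E11"], patients_with("ICD10:E11", mapped))
print("Unmapped terms:", [r.local_term for r in unmapped])
```

The query finds patients p1 and p2 even though their records use different local terms, which is the practical benefit of shared meaning. The unmapped term is surfaced for manual review rather than silently dropped.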

Statistical methods for categorical data, such as chi-square hypothesis testing, and methods for quantitative data, such as multivariate and correlational analysis, will need to be used. An operations research type of analysis, such as combinatorial optimization, may also be needed to explore which combination of highly correlated independent variables and treatment characteristics identifies the best treatment.
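
Two of the methods named above can be sketched with the Python standard library alone: a chi-square test of independence on a contingency table and a Pearson correlation coefficient. All counts and measurements below are made up for illustration, and the variable names are assumptions. A real analysis would use an established statistics package and report p-values.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up counts. Rows: low vs. high physical activity.
# Columns: disease present vs. disease absent.
activity_by_disease = [[30, 70], [10, 90]]
stat, dof = chi_square_statistic(activity_by_disease)

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DOF1 = 3.841
reject_independence = stat > CRITICAL_VALUE_DOF1
print(f"chi-square = {stat:.2f}, dof = {dof}, reject = {reject_independence}")

# Made-up paired measurements for five patients.
weekly_exercise_hours = [1, 2, 3, 4, 5]
wellbeing_score = [5, 6, 7, 8, 7]
r = pearson_r(weekly_exercise_hours, wellbeing_score)
print(f"Pearson r = {r:.3f}")
```

With these illustrative counts the statistic is 12.5 on 1 degree of freedom, which exceeds the 5% critical value, so independence between activity level and disease would be rejected for this made-up table.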