Research/Prototyping: Typed backend data-structures with SQL translation layer. #824

Open
opened 2026-05-07 17:57:31 +00:00 by f21marfo · 5 comments
Collaborator

Problem:

Current and proposed managing of back-end data structures is a little messy in defining, serializing and deserializing data-structures.

The proposed method of communicating SQL data-types is to include a string that contains the datatypes with the JSON.

sql_datatypes: database type info

An example of implementation in C# would be:

    new
     {
       sql_datatypes = "VARCHAR(20),TIMESTAMP,REAL",
       sensorId = "B-Building",
       timestamp = DateTime.UtcNow,
       temperature = Math.Round(rand.NextDouble() * 15 + 10, 2)
    }

There are a few issues with this, the most obvious being that the datatypes are only looslely tied to the variables, and a typo or improper ordering would obviously cause issues downstream. It also creates the problem of the recieving component/service being required to know what to do with the string. But there is a way to solve the the weaknesses of the solution.

The improvement: Reflection-based SQL mapping.
(Note: None of these represents completed solutions and are just illustrative concepts, final implementation may differ).

In a nutshell, this involves using C# class reflection (a method by which objects and their attributes can be analyzed, or even modify behavior at runtime) in order to map variables to correct SQL datatypes. This can either be done by converting the variable types directly. For example with:

public static string ToSqlType(Type t)
{
    if (t == typeof(string))
        return "VARCHAR";

    if (t == typeof(int))
        return "INT";

    if (t == typeof(DateTime))
        return "TIMESTAMP";

    if (t == typeof(double) || t == typeof(float))
        return "REAL";

    throw new NotSupportedException(t.Name);
}

or alternatively, be explicitly declared in the data class itself, using custom attributes, for example:

public class TempData
{
    [SqlType("VARCHAR(20)")]
    public string sensorId { get; set; }

    [SqlType("TIMESTAMP")]
    public DateTime timestamp { get; set; }

    [SqlType("REAL")]
    public double temperature { get; set; }
}

In order to practically use this conversion system, we can replace the sql_datatypes property with a Type property, which simply records the Class Name upon its creation, allowing for the ability to retrieve the relevant meta-data, directly from the authorative class-type itself. This eliminates the need of splitting and manipulating a messy datatypes-string.

Example of the class hierarchy and Type-discriminator:

  public abstract class SensorData
  {
    public string Type { get; }
    public SensorData()
    {
      Type = GetType().Name;
    }
  }

  public class TempData : SensorData
  {
    public string? sensorId { get; }
    public DateTime timestamp { get; }
    public double temperature { get; }

    public TempData(string id, DateTime time, double temp) : base()
    {
      sensorId = id;
      timestamp = time;
      temperature = temp;
    }
  }

In this example, the implementing class TempData invokes the base-class constructor upon instantiation, setting the value of the variable Type to TempData.

Reflection

I've been talking about reflection, but as of yet I haven't clearly explained how it actually works, here's a quick prototype program, to demonstrate the concept of what kind of data can be produced. In this case I am using the approach with a custom attribute called, as shown earlier, I.E. [SqlType("VARCHAR(20)")]

using System;
using System.Reflection;
using System.Reflection.Metadata;

class Program
{
    static void Main()
    {
        // Create an instance of TempData
        var sampleData = new TempData("Sensor-001", DateTime.UtcNow, 18.4);

        // Get the type of the object
        Type type = sampleData.GetType();

        // Iterate over all public instance properties
        foreach (PropertyInfo prop in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Get property name and value
            object value = prop.GetValue(sampleData);
            Console.WriteLine($"Property: {prop.Name}, Value: {value}");

            // Get custom attributes of type SqlType
            var attributes = prop.GetCustomAttributes(typeof(SqlType), false);
            foreach (SqlType attr in attributes)
            {
                Console.WriteLine($"  -> Attribute Description: {attr.GetName()}");
            }
        }
    }
}

This test program produces the following output.
image

Alternatively, we can also get all the metadata alone, without ever creating an instane of TempData, by just calling typeof(TempData) as seen in this modified version of the program below.

using System;
using System.Reflection;
using System.Reflection.Metadata;

class Program
{
    static void Main()
    {
        // Iterate over all public instance properties
        foreach (PropertyInfo prop in typeof(TempData).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Get property name
            Console.WriteLine($"Property: {prop.Name}");

            // Get custom attributes of type SqlType
            var attributes = prop.GetCustomAttributes(typeof(SqlType), false);
            foreach (SqlType attr in attributes)
            {
                Console.WriteLine($"  -> Attribute Description: {attr.GetName()}");
            }
        }
    }
}

Which results in the following output:
image

As one can see, the class contains all the data and metadata required, in order to construct a full SQL queries. The previously mentioned Type discriminator variable also creates a class reference that survives serialization and deserialization, which effectively eliminates the need to encode the metadata directly into the JSON, as with the sql_datatypes property approach

The actual SQL translation layer, could either be written as a standalone module that the middleware can refer to or integrated directly into the middleware itself, as a new set of helper functions, depending on what design approach would be preferrrable.

This is far from a complete and finalized solution, and there are details that need to be worked out, But i hope I have presented enough information for it to be possible to decide if this at all seems like a desirable path to take.

Thank you for your consideration!

**Problem:** Current and proposed managing of back-end data structures is a little messy in defining, serializing and deserializing data-structures. The proposed method of communicating SQL data-types is to include a string that contains the datatypes with the JSON. > sql_datatypes: database type info An example of implementation in C# would be: ``` new { sql_datatypes = "VARCHAR(20),TIMESTAMP,REAL", sensorId = "B-Building", timestamp = DateTime.UtcNow, temperature = Math.Round(rand.NextDouble() * 15 + 10, 2) } ``` There are a few issues with this, the most obvious being that the datatypes are only looslely tied to the variables, and a typo or improper ordering would obviously cause issues downstream. It also creates the problem of the recieving component/service being required to know what to do with the string. But there is a way to solve the the weaknesses of the solution. **The improvement: Reflection-based SQL mapping.** (Note: None of these represents completed solutions and are just illustrative concepts, final implementation may differ). In a nutshell, this involves using C# class reflection (a method by which objects and their attributes can be analyzed, or even modify behavior at runtime) in order to map variables to correct SQL datatypes. This can either be done by converting the variable types directly. For example with: ``` public static string ToSqlType(Type t) { if (t == typeof(string)) return "VARCHAR"; if (t == typeof(int)) return "INT"; if (t == typeof(DateTime)) return "TIMESTAMP"; if (t == typeof(double) || t == typeof(float)) return "REAL"; throw new NotSupportedException(t.Name); } ``` or alternatively, be explicitly declared in the data class itself, using _custom attributes_, for example: ``` public class TempData { [SqlType("VARCHAR(20)")] public string sensorId { get; set; } [SqlType("TIMESTAMP")] public DateTime timestamp { get; set; } [SqlType("REAL")] public double temperature { get; set; } } ``` In order to practically use this conversion system, we can replace the sql_datatypes property with a Type property, which simply records the Class Name upon its creation, allowing for the ability to retrieve the relevant meta-data, directly from the authorative class-type itself. This eliminates the need of splitting and manipulating a messy datatypes-string. Example of the class hierarchy and Type-discriminator: ``` public abstract class SensorData { public string Type { get; } public SensorData() { Type = GetType().Name; } } public class TempData : SensorData { public string? sensorId { get; } public DateTime timestamp { get; } public double temperature { get; } public TempData(string id, DateTime time, double temp) : base() { sensorId = id; timestamp = time; temperature = temp; } } ``` In this example, the implementing class TempData invokes the base-class constructor upon instantiation, setting the value of the variable _Type_ to _TempData_. **Reflection** I've been talking about reflection, but as of yet I haven't clearly explained how it actually works, here's a quick prototype program, to demonstrate the concept of what kind of data can be produced. In this case I am using the approach with a custom attribute called, as shown earlier, I.E. `[SqlType("VARCHAR(20)")]` ``` using System; using System.Reflection; using System.Reflection.Metadata; class Program { static void Main() { // Create an instance of TempData var sampleData = new TempData("Sensor-001", DateTime.UtcNow, 18.4); // Get the type of the object Type type = sampleData.GetType(); // Iterate over all public instance properties foreach (PropertyInfo prop in type.GetProperties(BindingFlags.Public | BindingFlags.Instance)) { // Get property name and value object value = prop.GetValue(sampleData); Console.WriteLine($"Property: {prop.Name}, Value: {value}"); // Get custom attributes of type SqlType var attributes = prop.GetCustomAttributes(typeof(SqlType), false); foreach (SqlType attr in attributes) { Console.WriteLine($" -> Attribute Description: {attr.GetName()}"); } } } } ``` This test program produces the following output. ![image](/attachments/d389ba12-0559-469d-b244-caf4c7cef2d2) Alternatively, we can also get all the metadata alone, without ever creating an instane of TempData, by just calling typeof(TempData) as seen in this modified version of the program below. ``` using System; using System.Reflection; using System.Reflection.Metadata; class Program { static void Main() { // Iterate over all public instance properties foreach (PropertyInfo prop in typeof(TempData).GetProperties(BindingFlags.Public | BindingFlags.Instance)) { // Get property name Console.WriteLine($"Property: {prop.Name}"); // Get custom attributes of type SqlType var attributes = prop.GetCustomAttributes(typeof(SqlType), false); foreach (SqlType attr in attributes) { Console.WriteLine($" -> Attribute Description: {attr.GetName()}"); } } } } ``` Which results in the following output: ![image](/attachments/2a214228-7d79-4d47-9eb3-5ddd71a9b263) As one can see, the class contains all the data and metadata required, in order to construct a full SQL queries. The previously mentioned Type discriminator variable also creates a class reference that survives serialization and deserialization, which effectively eliminates the need to encode the metadata directly into the JSON, as with the _sql_datatypes_ property approach The actual SQL translation layer, could either be written as a standalone module that the middleware can refer to or integrated directly into the middleware itself, as a new set of helper functions, depending on what design approach would be preferrrable. This is far from a complete and finalized solution, and there are details that need to be worked out, But i hope I have presented enough information for it to be possible to decide if this at all seems like a desirable path to take. Thank you for your consideration!
Author
Collaborator

Here's the complete test-project, that I ran above.

Here's the complete test-project, that I ran above.
Collaborator

This idée is good imo and the code is sound. We could probably extend this to a very generic class that can handle any backend API or sensor that might exist in the future. The only parts that might create a minor problem that i can see is the reader handling but the json object structure should be able to be made similar to what we have now so it shouldn't be any issue.

The only thing i would want with this if possible, if we go through with this refactoring, is to ensure that we create ONE generic parent class that generators call on to create their sensors or data generators. This to avoid stalling everyone once again needing to change major folder structure to each generator and reader. If we can make it so we just need to make modification to the generator program.cs and maybe the middleware.

This is not a end all be all if we have to do major refactoring if we go with this change that will just have to be what we do, but if possible lets try to make as few changes to implement this as possible to force it being as generic and usable as possible. It'll be extra work but would make creating future sensor or api backends easier simplifying workflow

This idée is good imo and the code is sound. We could probably extend this to a very generic class that can handle any backend API or sensor that might exist in the future. The only parts that might create a minor problem that i can see is the reader handling but the json object structure should be able to be made similar to what we have now so it shouldn't be any issue. The only thing i would want with this if possible, if we go through with this refactoring, is to ensure that we create ONE generic parent class that generators call on to create their sensors or data generators. This to avoid stalling everyone once again needing to change major folder structure to each generator and reader. If we can make it so we just need to make modification to the generator program.cs and maybe the middleware. This is not a end all be all if we have to do major refactoring if we go with this change that will just have to be what we do, but if possible lets try to make as few changes to implement this as possible to force it being as generic and usable as possible. It'll be extra work but would make creating future sensor or api backends easier simplifying workflow
Author
Collaborator

The readers should be minimally affected in an eventual refactor, as far as i can see. They already have their own datatype classes, which would simply be switched over to globally acessible datatype classes of the same basic structure, and the proposed Type-discriminator variable would simply replace the sql_datatypes variable.

Though I'm a little but uncertain on exactly what you mean with "ONE generic parent class that generators call on to create their sensors or data generators."? Because this proposal focuses specifically on how the data itself is processed and how its integrity can be ensured throughout the backend of the system.

For example. instead of:

  var data = new[]
  {
    new
    {
      sql_datatypes = "VARCHAR(15), TIMESTAMP, REAL",
      sensorid = "elec-sensor-001",
      timestamp = DateTime.UtcNow,
      kilowatt = Math.Round(rand.NextDouble() * 15 + 10, 2)
    },
   ....

The data being generated could look like something like this.

  var data = new[]
  {
    new EnergyData
    (
        "elec-sensor-001", 
        DateTime.UtcNow, 
        Math.Round(rand.NextDouble() * 15 + 10, 2)
    ),
    ....

This way it is strictly typed and we can guarantee that what goes in, comes out at the other end. Additionally, the Type-discriminator(which replaces sql_datatypes) never actually has to be set, as that is an internal function of the base-class constructor and occurs automatically. :)

**The readers** should be minimally affected in an eventual refactor, as far as i can see. They already have their own datatype classes, which would simply be switched over to globally acessible datatype classes of the same basic structure, and the proposed Type-discriminator variable would simply replace the sql_datatypes variable. Though I'm a little but uncertain on exactly what you mean with "ONE generic parent class that generators call on to create their sensors or data generators."? Because this proposal focuses specifically on how the data itself is processed and how its integrity can be ensured throughout the backend of the system. For example. instead of: ``` var data = new[] { new { sql_datatypes = "VARCHAR(15), TIMESTAMP, REAL", sensorid = "elec-sensor-001", timestamp = DateTime.UtcNow, kilowatt = Math.Round(rand.NextDouble() * 15 + 10, 2) }, .... ``` The data being generated could look like something like this. ``` var data = new[] { new EnergyData ( "elec-sensor-001", DateTime.UtcNow, Math.Round(rand.NextDouble() * 15 + 10, 2) ), .... ``` This way it is strictly typed and we can guarantee that what goes in, comes out at the other end. Additionally, the Type-discriminator(which replaces sql_datatypes) never actually has to be set, as that is an internal function of the base-class constructor and occurs automatically. :)
Collaborator

Good Insight and research! I definitely agree that the current implementation is flawed and just from reading the research this sound better. The fought i have is that with limited time in the project we need some stability to move forward and some things just get to be "Good enough" as this can effect other parts.

We can continue having a discussion but now i believe we document this and create a issue for coming years where we say how it is now and refer to this as how we would change it.

Good Insight and research! I definitely agree that the current implementation is flawed and just from reading the research this sound better. The fought i have is that with limited time in the project we need some stability to move forward and some things just get to be "Good enough" as this can effect other parts. We can continue having a discussion but now i believe we document this and create a issue for coming years where we say how it is now and refer to this as how we would change it.
Author
Collaborator

Final report:

In closing it should be noted that by default, the use of reflection does NOT guarantee ordering. This is especially problematic when inheritance is used, as the inherited properties are interleaved at random. The recommended approach to solve this would be adding an Order-Attribute to the data classes, as shown in the following example.

public abstract class Data
{
    [Order(0)] // Implementing classes MAY NOT use Order(0)!
    public string Type { get; }
    public Data()
    {
        // Note: This must be the FULL qualified name in order to be able to retrieve metadata and rebuild the original file.
        Type = GetType().FullName;
    }
}

public class DataTemp : Data
{
    [Order(1)]
    [SqlType("VARCHAR(20)")]
    public string sensorId { get; }

    [Order(2)]
    [SqlType("TIMESTAMP")]
    public DateTime timestamp { get; }

    [Order(3)]
    [SqlType("REAL")]
    public double temperature { get; }

    public DataTemp(string id, DateTime time, double temp) : base() //Alternatively an init-setter approach could be used.
    {
        // Properties are immutable and can only be set upon creation, a 
        sensorId = id;
        timestamp = time;
        temperature = temp;
    }
}

It is then possible, to use that attribute to sort and iterate over all the class properties in their intended order. This is essential for accurately constructing future SQL queries from the datatypes. This can be done by way of the following lambda .OrderBy(p => p.GetCustomAttribute<OrderAttribute>()?.Order ?? int.MaxValue) As seen below in the example of a wrapper-method.

    static void ProcessData(Data inputData, Action<PropertyInfo, Data> action)
    {
        Type type = inputData.GetType();

        foreach (PropertyInfo prop in type.GetProperties(BindingFlags.Public | BindingFlags.Instance).OrderBy(p => p.GetCustomAttribute<OrderAttribute>()?.Order ?? int.MaxValue)) //Can also use typeof(TempData) to avoid instantiation.
        {
            //Execute any desired method here.
            action(prop, inputData);
        }
    }

In the interest of completeness, here is also an example of how to use the above method.

   static void PrintDataInfo(PropertyInfo prop, Data sampleData)
    {
        // Get property name and value
        object value = prop.GetValue(sampleData);
        Console.WriteLine($"Property: {prop.Name}, Value: {value}");

        // Get custom attributes of type SqlType
        var attributes = prop.GetCustomAttributes(typeof(SqlType), false);
        foreach (SqlType attr in attributes)
        {
            Console.WriteLine($"  -> Attribute Description: {attr.Name}");
        }
    }

The two methods can then be used together like this.

        // Get the type of the object
        Type type = sampleData.GetType();

        // Iterate over all public instance properties
        ProcessData(sampleData, (x, y) => PrintDataInfo(x, y));

The above code will then give the following output:
image

Also including an updated Demo/Prototype project that showcases how an SQL query string can be derived from the metadata.
image

Hopefully this information will prove useful to future teams, Good Luck!

**Final report:** In closing it should be noted that by default, the use of reflection does NOT guarantee ordering. This is especially problematic when inheritance is used, as the inherited properties are interleaved at random. The recommended approach to solve this would be adding an Order-Attribute to the data classes, as shown in the following example. ``` public abstract class Data { [Order(0)] // Implementing classes MAY NOT use Order(0)! public string Type { get; } public Data() { // Note: This must be the FULL qualified name in order to be able to retrieve metadata and rebuild the original file. Type = GetType().FullName; } } public class DataTemp : Data { [Order(1)] [SqlType("VARCHAR(20)")] public string sensorId { get; } [Order(2)] [SqlType("TIMESTAMP")] public DateTime timestamp { get; } [Order(3)] [SqlType("REAL")] public double temperature { get; } public DataTemp(string id, DateTime time, double temp) : base() //Alternatively an init-setter approach could be used. { // Properties are immutable and can only be set upon creation, a sensorId = id; timestamp = time; temperature = temp; } } ``` It is then possible, to use that attribute to sort and iterate over all the class properties in their intended order. This is essential for accurately constructing future SQL queries from the datatypes. This can be done by way of the following lambda `.OrderBy(p => p.GetCustomAttribute<OrderAttribute>()?.Order ?? int.MaxValue)` As seen below in the example of a wrapper-method. ``` static void ProcessData(Data inputData, Action<PropertyInfo, Data> action) { Type type = inputData.GetType(); foreach (PropertyInfo prop in type.GetProperties(BindingFlags.Public | BindingFlags.Instance).OrderBy(p => p.GetCustomAttribute<OrderAttribute>()?.Order ?? int.MaxValue)) //Can also use typeof(TempData) to avoid instantiation. { //Execute any desired method here. action(prop, inputData); } } ``` In the interest of completeness, here is also an example of how to use the above method. ``` static void PrintDataInfo(PropertyInfo prop, Data sampleData) { // Get property name and value object value = prop.GetValue(sampleData); Console.WriteLine($"Property: {prop.Name}, Value: {value}"); // Get custom attributes of type SqlType var attributes = prop.GetCustomAttributes(typeof(SqlType), false); foreach (SqlType attr in attributes) { Console.WriteLine($" -> Attribute Description: {attr.Name}"); } } ``` The two methods can then be used together like this. ``` // Get the type of the object Type type = sampleData.GetType(); // Iterate over all public instance properties ProcessData(sampleData, (x, y) => PrintDataInfo(x, y)); ``` The above code will then give the following output: ![image](/attachments/f1a0ec0e-63c0-4c14-9a79-7f0fca68b65a) Also including an updated Demo/Prototype project that showcases how an SQL query string can be derived from the metadata. ![image](/attachments/4b64b260-f435-460b-8dba-41dd82d5d4b1) Hopefully this information will prove useful to future teams, Good Luck!
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
Andras/BoundlessFlowCampus2K#824
No description provided.