openGame Project Update Thread
Moderator: Moderators
- Admiral Ghat
- Board-General
- Posts: 106
- Joined: 26 Jan 2012 04:10
Re: openGame Project Update Thread
I think the general idea is that anyone would be able to host, without the GR quirks.
Re: openGame Project Update Thread
So quick update:
At some point last week, it really started irking me that I put so much effort into this to get it 95% working, and then haven't been around to finish the last mile, so to speak. The internals are actually reasonably impressive at this point (the time it takes for OG to pull a packet from one interface, handle it, and place it on another is about 3 ten-thousandths of a second), so it is a shame every week I let go by without finishing this. I did actually get on and vent a few times to no avail; fortunately, my brother did manage to help me run a full trace from both ends (this was a better setup, as I could call him via phone and keep the traces clear of VoIP traffic).
So, it seems one of my hunches a good while ago was correct - the culprit seems to be fragmentation during encapsulation (causing a lot of problems: late packets, and even some packet loss). Discovering the issue is the hardest part - now I just need to test how effective various solutions are. The future steps to FINALLY releasing this:
1) confirm with my bro by sending deliberately oversize packets, and tracing from both endpoints
2) see if simply lowering the MTU fixes this (in most cases it would, but I have a small fear GC may use a hard-coded constant instead of actually pulling the MTU of the adapter)
3) I implemented header reconstruction a long time ago (allowing OG to simply send packets without headers, and put them back together again at the endpoint, so no oversize packets), but I disabled it as OG didn't need it at the time (and it was buggy, etc). This may ultimately be the long term fix, but, at the moment, only certain packets can be reconstructed (which is fine, as we can reconstruct data packets, the most important), and the implementation is buggy (and actually should cause a crash if the internal identifier isn't delivered, etc).
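To make the fragmentation arithmetic in the steps above concrete, here is a minimal sketch. It assumes the tunnel wraps each inner packet in UDP over IPv4; the thread does not say which transport OG uses, so the header sizes, the `tunnel_overhead` parameter, and all function names are illustrative assumptions rather than OG's actual code.

```python
# Assumed overheads for a UDP-over-IPv4 tunnel (not confirmed for OG).
IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes


def encapsulated_size(inner_len, tunnel_overhead=0):
    """Size of the outer IP packet once an inner packet is wrapped."""
    return inner_len + UDP_HEADER + IPV4_HEADER + tunnel_overhead


def will_fragment(inner_len, path_mtu=1500, tunnel_overhead=0):
    """True if the outer packet is larger than the path MTU."""
    return encapsulated_size(inner_len, tunnel_overhead) > path_mtu


def max_inner_mtu(path_mtu=1500, tunnel_overhead=0):
    """Largest virtual-adapter MTU that avoids fragmentation."""
    return path_mtu - IPV4_HEADER - UDP_HEADER - tunnel_overhead


if __name__ == "__main__":
    print(will_fragment(1500))  # a full-size inner packet no longer fits
    print(max_inner_mtu())      # adapter MTU to try in step 2
```

Under these assumptions, an adapter left at the common 1500-byte MTU always produces oversize outer packets for full-size inner packets, which is why lowering the adapter MTU (step 2) or stripping headers (step 3) are the candidate fixes.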
Speaking of which, I need to fix the bug involving the internal identifier and improve its general reliability. The bug is common enough that Nox and others have run into it, though the program usually doesn't hit it. Also, I need to make the program actually reannounce to trackers, but that is mega-easy.
Just an update,
-Stat
Another GC archive:
Re: openGame Project Update Thread
Latest updates sound really promising. Thanks, man :0)
Shout if you need to test!
Re: openGame Project Update Thread
Partially because I had become extremely busy (and the updates were taking about an hour to commit to repo, write, etc), and partially because I wasn't sure how many people were monitoring this thread. Forum activity seemed to wane a bit, so I moved on in private.
There have been massive speed and reliability improvements, amongst others - I will come back to do a proper update sometime later this week. I managed a test with 3 people (Ghat and Nox) a little while ago - the first time testing with more than 2 peers. We ran into a few bugs, one that was easily fixed during our session (misreading IPs from the initial tracker response), and another that required a bit of work (intelligent IP allocation in the event of collision).
I haven't finished the protocol for the second bug; fortunately I expected this would eventually occur and began thinking of a solution months ago. Currently, if you have 3 peers, but not all of them can 'see' each other, the program acts as if any failed peers don't exist, and assigns internal IP addresses likewise. Ideal behavior would have the program check with successful peers and coordinate internal IPs, and the next natural extension would have a 'flexible topology' (meaning that peers could talk to failed peers through successful peers).
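One possible shape for the "check with successful peers and coordinate internal IPs" idea is sketched below. This is not the unfinished protocol itself, only an illustration. It assumes each peer has a stable, sortable identifier and that reachable peers can report which peers they see. The `10.0.0.0/24` subnet and both function names are hypothetical.

```python
import ipaddress


def merge_peer_views(own_view, reported_views):
    """Union of the peers this node saw and the peers its reachable peers reported."""
    merged = set(own_view)
    for view in reported_views:
        merged.update(view)
    return merged


def assign_internal_ips(known_peers, subnet="10.0.0.0/24"):
    """Map each peer id to an internal address.

    Every node that ends up with the same set of peer ids computes the same
    mapping, because the ids are sorted before addresses are handed out.
    Peers beyond the number of host addresses in the subnet are left out.
    """
    hosts = ipaddress.ip_network(subnet).hosts()
    return {peer: str(ip) for peer, ip in zip(sorted(set(known_peers)), hosts)}


if __name__ == "__main__":
    # A cannot see C directly, but B reports that C exists.
    view_at_a = merge_peer_views({"A", "B"}, [{"A", "B", "C"}])
    print(assign_internal_ips(view_at_a))
```

The design choice here is that no negotiation is needed once every node agrees on the peer set: sorting makes the allocation deterministic, so a peer that failed to connect to one node still keeps its address reserved there.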
I can update the github as well - I probably should be updating that regardless of whether or not anyone is viewing it.
Disregarding trouble with multiple peers and disconnections, GC still has trouble forming rooms over it, and at this point I am getting a bit discouraged. I have actually downloaded specialized point-to-point integrity testing tools, amongst other network tools, and the openGame tunnel passes the data without error, so I am looking more to the arcane now. I am starting to believe there may actually be something non-standard about the way that GC handles connections, yet we also know that network conditions affect this to some unknown degree. I am to the point where I am XORing the data to see if routers (the routers that form the internet, not your home router) may be tampering with or discarding data, or if setting special packet flags helps me track down the problem.
More info on this later though, way too tired right now.
-Statrill
Re: openGame Project Update Thread
Maybe a dumb question, but wouldn't it be possible to somehow check how GR handles passing that data which you are having trouble with?
Re: openGame Project Update Thread
Okay, so this update is rather large, as I hadn't even been updating the Github versions when I would make updates (oops)!
I am starting to get much more confident about this project. The most important change of the below is saving hard logs to disk in order of execution. Details on why below!
-Program now saves logs, uses queues to ensure log order is preserved
-Randomly assigned signal socket is now within private ports allocated for loopback
-MAC address polled from adapter, rather than from watching network traffic
-Improvements to asyncIO
-File handle to adapter now opened with no-buffering flag set (need to investigate benefit of this)
-Improvements to thread-safety: all thread variables now use threading.local
-Streamlining in adapter read/write threads
-Fixed bug with indexing multiple returned peers from tracker
-Various small bugfixes
Many of the above are old changes that I just hadn't committed yet, like most of the threadsafety, asyncIO, etc. Most of the trouble I have been having for a while is that I don't actually know what is going on in the internals of the program when it runs. I put a 'verbose' alert setting in, but I leave it off as the terminal can only hold so much data (plus it is a pita to get data out of the terminal), and writing that much data to terminal actually slows the program down a lot. Using wireshark to grab traces hasn't been helpful as there is no built-in way to compare two traces afaik, and I am not even sure wireshark records them in order of their arrival, or in their packet number order, etc. The new logs let me actually look into the program, and, boy, was this a good idea.
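The "logs saved through a queue" idea from the changelog above can be sketched roughly as follows. This is an illustration under assumptions, not OG's actual logger. The class name `OrderedLogger` and the one-message-per-line file format are made up for the example.

```python
import queue
import threading


class OrderedLogger:
    """Worker threads enqueue messages; a single writer thread writes them to disk.

    Lines land in the file in the order the put() calls completed, and the
    worker threads never block on disk or terminal I/O.
    """

    def __init__(self, path):
        self._path = path
        self._queue = queue.Queue()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def log(self, message):
        self._queue.put(message)

    def _drain(self):
        with open(self._path, "a", encoding="utf-8") as handle:
            while True:
                message = self._queue.get()
                if message is None:  # sentinel pushed by close()
                    break
                handle.write(message + "\n")

    def close(self):
        """Flush everything queued so far and stop the writer thread."""
        self._queue.put(None)
        self._writer.join()
```

Because only one thread ever touches the file, the log order is the queue order, which is what makes the logs usable for comparing packet sequences later.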
For starts, I can test my out-of-order hypothesis easily - I simply wrote another program to strip the logs to just data, compare the orders, and easily highlight the ones that are wrong.
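A comparator along those lines might look like the sketch below. It assumes the logs have already been stripped down to two lists of packet payloads (the real log format isn't shown in this thread). The function names and the out-of-order marker are illustrative.

```python
def compare_logs(remote_payloads, local_payloads):
    """Pair each remote packet index with the index of the first identical local packet.

    Returns a list of (remote_index, local_index) tuples, with local_index set
    to None when the payload never shows up in the local log.
    """
    first_seen = {}
    for index, payload in enumerate(local_payloads):
        first_seen.setdefault(payload, index)
    return [(i, first_seen.get(p)) for i, p in enumerate(remote_payloads)]


def format_report(pairs):
    """Render the pairs as 'remote: local' lines and flag packets that arrived late."""
    lines = []
    highest = -1
    for remote_index, local_index in pairs:
        if local_index is None:
            lines.append("{}: NO MATCH".format(remote_index))
            continue
        flag = "  <-- out of order" if local_index < highest else ""
        highest = max(highest, local_index)
        lines.append("{}: {}{}".format(remote_index, local_index, flag))
    return lines


if __name__ == "__main__":
    remote = [b"a", b"x", b"c", b"b"]
    local = [b"a", b"b", b"c"]
    print("\n".join(format_report(compare_logs(remote, local))))
```

Matching on the first identical payload means repeated packets all map to the same local index, so duplicates show up as repeated numbers in the right-hand column.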
Additionally, I have learned something else: I tried to write the program to be as thread safe as possible, where everything is atomic, etc. But I learned (from the logs) that, under high load, sometimes things can execute in the wrong order. The problem seems to stem from the interpreter only being able to execute one thread at a time, even though the program is multithreaded. If one thread takes longer than expected on a job, then things can become a bit mixed up as the interpreter abandons the thread to jump to another thread, etc. I think I may need to implement processes (instead of threads) on mission-critical components, and I can't wait to test this. This seems like the 'hardest' lead I have so far.
I have also switched focus a bit - I am working on getting upnp working (what a royal pita this is, story later), and I am preparing to start getting a GUI up. I am also planning to experiment with XORing data, and setting diffserv flags - these may or may not be helpful, but I can't wait to test! (XORing data could help with a strange case where some older routers tamper with data, and diffserv *MAY* help the wider internet as a whole handle our data packets better.)
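For the XOR experiment, a minimal sketch is below. The function name and the repeating-key scheme are assumptions for illustration. This only disguises the payload from middleboxes that pattern-match on content; it is not encryption.

```python
def xor_bytes(data, key):
    """XOR every byte of data with a repeating key.

    Applying the same function with the same key a second time restores the
    original bytes, so both endpoints can share one routine.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


if __name__ == "__main__":
    payload = b"example game packet"
    masked = xor_bytes(payload, b"\x5a\xa5")
    assert xor_bytes(masked, b"\x5a\xa5") == payload
```

If traces with and without the mask behave differently over the same path, that would point to something in between inspecting or altering the payload.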
The upnp story: I am trying to use a package called miniupnpc. It is a very streamlined package, with low overhead, fast execution, etc. The only catch is you have to compile it yourself. No problem right?
Well, the code writers originally wrote (and tested, and wrote documentation on) it on linux, using python version 2.7, in 32 bit. (This may be giving them too much credit, they originally wrote it for C++, python is barely supported). And, of course, I am using python, version 3.4, in 64 bits on windows. I finally got it to compile, but the linker that makes the python extension won't have any of this nonsense. After banging my head against a wall for nearly 3 hours, I am on the forums trying to get some assistance. What a nightmare.
-Stat
Re: openGame Project Update Thread
So I got in a quick test with my brother. The output of the comparator is below. The left column is the packet numbers from my brother, and the right column is corresponding packets from my log.
0: NO MATCH
[Truncated: I hadn't actually connected to him by this point]
63: NO MATCH
64: 1
65: NO MATCH
66: 0
67: 0
68: 4
69: 23
70: 24
71: 26
72: 28
73: 29
74: 35
75: 36
76: 37
77: 38
78: 39
79: 40
80: NO MATCH
81: 41
82: 42
83: NO MATCH
84: 43
85: 44
86: 45
87: NO MATCH
88: 46
89: 47
90: 48
91: 49
92: 50
93: 51
94: 52
95: 53
96: 54
97: 55
98: 56
99: 57
100: 59
101: 60
102: 58
103: 65
104: 67
105: 68
106: 61
107: 62
108: 69
109: 70
110: 63
111: 64
112: 71
113: 72
114: 66
115: 73
116: 74
117: 75
118: 76
119: 78
120: 79
121: 81
122: 77
123: 84
124: 80
125: 86
126: 87
127: 88
128: 82
129: 83
130: 89
131: 90
132: 91
133: 85
134: 92
135: 93
136: 94
137: 95
138: 96
139: 98
140: 99
141: 97
142: 104
143: 106
144: 107
145: 100
146: 101
147: 108
148: 109
149: 102
150: 103
151: 110
152: 111
153: 105
154: 112
155: 113
156: 114
157: 115
158: 116
159: NO MATCH
I shortened the log in places where I hadn't connected to him yet, etc. Curiously, we do see quite a few packets coming in out of order (the missing packets in the middle of the log weren't data), which led to me doing a bunch of research into how Python handles threads vs processes, etc.
Long story short, looks like the mission-critical sections (waiting on data from sockets or the adapter) will need to be rewritten with processes instead. This is advantageous as we can give the processes elevated priority as well, and see if we don't see an improvement. Yet another upside is that the program will get even faster!
I am also looking deeper into my use of FILE_FLAG_NO_BUFFERING, in addition to using raw sockets (since this may allow us to rule out one more source of possible issues, the TCP/IP stack).
I am excited to test the results of this,
-Stat