openGame Project Update Thread

Board-General
Posts: 106
Joined: 26 Jan 2012 04:10

Re: openGame Project Update Thread

Postby Admiral Ghat » 01 Jul 2015 01:23

I think the general idea is that anyone would be able to host, without the GR quirks.

Board-General
Posts: 303
Joined: 05 Mar 2012 06:32
Location: United States

Re: openGame Project Update Thread

Postby stAtrill » 01 Jul 2015 04:52

Marcos wrote: Statrill, can you make a "window mode" for the next GC version? GC only has "full screen mode".
Marcos, this is just a networking program, but there are tools (like DxWnd) that can do this. They don't work with GR, but they will with openGame.

Sarvik wrote: Would it maybe be a good idea to do a bit larger testing (4+ people) for GC connection issues? Replicating that 'connection order' issue with only 2 people probably won't work...
I was referring to the connection order in GR :P Though larger testing will be useful, soon.

I think I figured out what the problem was from going back and studying the traces again. I arbitrarily set the maximum buffer size at 1500 thinking that the largest packets would be 1492 bytes (I don't remember why now), but it turns out that GC tries to send packets that are 1512 bytes (literally the largest you can send on normal networks). I feel so silly, and I have fixed it, but haven't been able to test it yet.

In case anyone is curious as to how I noticed this (skip if you are not): in the trace Nox attempted to connect to my room, and I attempted to connect to his. When I would connect to his, I would send packets of 1512 size, but when I would be receiving the same packets from Nox, they would be truncated to 1500 bytes. Lol, oops. 12 bytes - these things matter! The buffer is now set to a more common 4096 bytes.
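
A minimal sketch of the truncation described above. It only simulates a fixed-size read with a slice; `read_into_buffer` is a hypothetical helper for illustration, not openGame's actual read path.

```python
def read_into_buffer(frame: bytes, buffer_size: int) -> bytes:
    """Simulate a fixed-size read: anything past buffer_size is lost."""
    return frame[:buffer_size]


frame = bytes(1512)                       # the 1512-byte packets GC was sending
old = read_into_buffer(frame, 1500)       # old, too-small buffer
new = read_into_buffer(frame, 4096)       # new buffer size

print(len(frame) - len(old))  # 12 bytes silently dropped
print(len(frame) - len(new))  # 0 bytes dropped
```

With the smaller buffer the tail of every full-size packet never reaches the other side, which matches the truncated packets seen in the trace.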

I can't wait to actually be playing on this with you guys!
-Cameron

Board-General
Posts: 303
Joined: 05 Mar 2012 06:32
Location: United States

Re: openGame Project Update Thread

Postby stAtrill » 24 Aug 2015 15:52

So quick update:

At some point last week, it really started irking me that I put so much effort into getting this 95% working and then haven't been around to finish the last mile, so to speak. The internals are actually reasonably impressive at this point: the time it takes for OG to pull a packet from one interface, handle it, and place it on another is about 3 ten-thousandths of a second (0.3 ms). It is a shame every week I let go by without finishing this. I did actually get on and vent a few times to no avail; fortunately, my brother did manage to help me run a full trace from both ends (this was a better setup, as I could call him via phone and keep the traces clear of VoIP traffic).

So, it seems one of my hunches a good while ago was correct - the culprit seems to be fragmentation during encapsulation (causing a lot of problems: late packets, and even some packet loss). Discovering the issue is the hardest part - now I just need to test how effective various solutions are. The future steps to FINALLY releasing this:

1) confirm with my bro by sending deliberately oversize packets, and tracing from both endpoints
2) see if simply lowering the MTU fixes this (in most cases it would, but I have a small fear GC may use a hard-coded constant instead of actually pulling the MTU of the adapter)
3) I implemented header reconstruction a long time ago (allowing OG to simply send packets without headers, and put them back together again at the endpoint, so no oversize packets), but I disabled it as OG didn't need it at the time (and it was buggy, etc). This may ultimately be the long term fix, but, at the moment, only certain packets can be reconstructed (which is fine, as we can reconstruct data packets, the most important), and the implementation is buggy (and actually should cause a crash if the internal identifier isn't delivered, etc).
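
To make the fragmentation problem concrete, here is a rough arithmetic sketch. It assumes the tunnel wraps each captured packet in UDP over IPv4 (the thread does not say which transport openGame uses) and a path MTU of 1500 bytes; the function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes
ASSUMED_OVERHEAD = IPV4_HEADER + UDP_HEADER  # assumption: UDP-over-IPv4 tunnel


def encapsulated_size(inner_len: int, overhead: int = ASSUMED_OVERHEAD) -> int:
    """Size of the outer packet once the inner packet is wrapped."""
    return inner_len + overhead


def needs_fragmentation(inner_len: int, path_mtu: int = 1500,
                        overhead: int = ASSUMED_OVERHEAD) -> bool:
    """True if the wrapped packet no longer fits in a single outer packet."""
    return encapsulated_size(inner_len, overhead) > path_mtu


def max_inner_size(path_mtu: int = 1500, overhead: int = ASSUMED_OVERHEAD) -> int:
    """Largest inner packet that can be tunnelled without fragmenting."""
    return path_mtu - overhead


print(needs_fragmentation(1500))  # True: a full-size inner packet overflows
print(max_inner_size())           # 1472
```

Under these assumptions, lowering the virtual adapter's MTU to `max_inner_size()` or below would avoid fragmentation, provided GC actually honours the adapter's MTU (step 2). Stripping headers before sending (step 3) attacks the same overflow from the other side.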

Speaking of which, I need to fix the bug involving the internal identifier and improve its general reliability. The bug is common enough that Nox and others have run into it, though the program usually doesn't hit it. Also, I need to make the program actually reannounce to trackers, but that is mega-easy.

Just an update,
-Stat

Board-General
Posts: 134
Joined: 24 Jan 2012 03:16

Re: openGame Project Update Thread

Postby _nOx_ » 24 Aug 2015 23:51

Latest updates sound really promising. Thanks, man :0)
Shout if you need to test!

Beginner
Posts: 5
Joined: 05 Jun 2015 18:01

Re: openGame Project Update Thread

Postby Vitor » 30 Sep 2015 16:01

What happened to the project? Why so much delay for an update? :roll:

Board-General
Posts: 303
Joined: 05 Mar 2012 06:32
Location: United States

Re: openGame Project Update Thread

Postby stAtrill » 01 Oct 2015 09:09

Partially because I had become extremely busy (and the updates were taking about an hour to commit to repo, write, etc), and partially because I wasn't sure how many people were monitoring this thread. Forum activity seemed to wane a bit, so I moved on in private.

There have been massive speed and reliability improvements, among other things - I will come back to do a proper update sometime later this week. I managed a test with 3 people (Ghat, Nox, and me) a little while ago - the first time testing with more than 2 peers. We ran into a few bugs: one that was easily fixed during our session (misreading IPs from the initial tracker response), and another that required a bit of work (intelligent IP allocation in the event of collision).

I haven't finished the protocol for the second bug; fortunately I expected this would eventually occur and began thinking of a solution months ago. Currently, if you have 3 peers, but not all of them can 'see' each other, the program acts as if any failed peers don't exist, and assigns internal IP addresses accordingly. Ideal behavior would have the program check with successful peers and coordinate internal IPs, and the next natural extension would have a 'flexible topology' (meaning that peers could talk to failed peers through successful peers).
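
One possible way to coordinate internal IPs, sketched under loud assumptions: this is not the unfinished protocol mentioned above, just an illustration of the idea that every peer derives the same table from the full peer list (including peers it failed to reach). The peer names, the 10.0.0.0/24 range, and `allocate_internal_ips` are all hypothetical.

```python
import ipaddress
import itertools


def allocate_internal_ips(peer_ids, network="10.0.0.0/24"):
    """Map each peer to an internal IP, identically on every machine.

    Sorting the full peer list (not just the peers this machine reached)
    means two peers can never pick the same address for different peers.
    """
    peers = sorted(set(peer_ids))
    net = ipaddress.ip_network(network)
    hosts = list(itertools.islice(net.hosts(), len(peers)))
    if len(hosts) < len(peers):
        raise ValueError("not enough addresses in %s for %d peers" % (network, len(peers)))
    return {peer: str(ip) for peer, ip in zip(peers, hosts)}


# The order in which peers were discovered does not matter.
print(allocate_internal_ips(["stat", "nox", "ghat"]))
print(allocate_internal_ips(["ghat", "stat", "nox"]))
```

Both calls produce the same mapping, so a peer that could not be reached still keeps its reserved address and can be slotted in later.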

I can update the github as well - I probably should be updating that regardless of whether or not anyone is viewing it.

Disregarding trouble with multiple peers and disconnections, GC still has trouble forming rooms over it, and at this point I am getting a bit discouraged. I have actually downloaded specialized point-to-point integrity testing tools, amongst other network tools, and the openGame tunnel passes the data without error, so I am looking more to the arcane now. I am starting to believe there may actually be something non-standard about the way that GC handles connections, yet we also know that network conditions affect this to some unknown degree. I am to the point where I am XORing the data to see if routers (the routers that form the internet, not your home router) may be tampering with or discarding data, or if setting special packet flags helps me track down the problem.
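
For anyone wondering what XORing the data looks like, a minimal sketch follows. The key and the function name are placeholders, and this is only meant to make payloads unrecognizable to middleboxes for testing; it is not encryption.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with a repeating key.

    Applying the same function twice with the same key restores the input.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


payload = b"map data chunk"
masked = xor_bytes(payload, b"\x5a\xa5")   # placeholder key
restored = xor_bytes(masked, b"\x5a\xa5")

print(masked != payload)    # True: bytes on the wire look different
print(restored == payload)  # True: the receiver recovers the original
```

If the masked traffic gets through cleanly where the plain traffic does not, that would point at something in the path reacting to the payload contents.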

More info on this later though, way too tired right now.
-Statrill

Board-General
Posts: 105
Joined: 27 Jan 2012 17:21

Re: openGame Project Update Thread

Postby Sarvik » 01 Oct 2015 14:33

Maybe a dumb question, but wouldn't it be possible to somehow check how GR handles passing that data which you are having trouble with?

Board-General
Posts: 303
Joined: 05 Mar 2012 06:32
Location: United States

Re: openGame Project Update Thread

Postby stAtrill » 01 Oct 2015 20:58

Sarvik wrote:Maybe a dumb question, but wouldn't it be possible to somehow check how GR handles passing that data which you are having trouble with?
Yes, and I have managed to capture a few traces of this with the help of my brother and random people I find on weekends. The catch is that the data is the same data GR is having trouble with - the room joining process.

The major issue (and what is majorly disappointing) is that I was originally writing openGame to solve the GR problems, and ended up running into them myself. When I compare traces, they are similar. The host sends map data to the client, and then at some point the host decides to terminate the connection. It doesn't terminate in the same place every time, and sometimes it doesn't terminate and you join the room.

I originally was wondering if it was an openGame problem, but with some comparison testing I realized that I can't connect to anyone over GR either (which is part of why you guys don't see me as much anymore). That said, I am wondering what network (read: internet) conditions can cause some people to have better 'connections' than others, and what can cause certain people to be completely unable to connect with certain other people. I have been doing extensive research and testing with regards to network conditions and how GC handles them over a tunnel, etc, but I had to give it a break for life reasons earlier - this is the part that is actually incredibly time consuming.

As an aside, I find it strange that the host terminates the connection with the client while sending the client data - that seems almost like a bug of some sort. The only thing the client sends back are acknowledgement packets. It is actually kind of funny - it is almost like the host doesn't want the ACK packets. When joining a room fails, the host usually sends a FIN packet (a polite way to request that the connection be terminated) and asks for an ACK packet, and when the client sends it, the host sends a RST packet (a very harsh way to terminate a connection - about equivalent to telling the client to shut up).
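
A small sketch of how that FIN, ACK, RST sequence could be spotted in a trace. The flag bit values are the standard TCP ones; the `(sender, flags)` event format and the helper names are assumptions made for this example, not something taken from the actual traces.

```python
# Standard TCP flag bits (byte 13 of the TCP header).
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10
_NAMES = [(FIN, "FIN"), (SYN, "SYN"), (RST, "RST"), (PSH, "PSH"), (ACK, "ACK")]


def tcp_flags(segment: bytes) -> int:
    """Return the flag byte of a raw TCP segment (header starts at offset 0)."""
    if len(segment) < 14:
        raise ValueError("segment too short to contain TCP flags")
    return segment[13]


def flag_names(flags: int) -> list:
    """Human-readable names of the flags that are set."""
    return [name for bit, name in _NAMES if flags & bit]


def is_fin_ack_rst(events) -> bool:
    """True if the last three (sender, flags) events are host FIN,
    client ACK, host RST: the failed-join pattern described above."""
    if len(events) < 3:
        return False
    (s1, f1), (s2, f2), (s3, f3) = events[-3:]
    return bool(
        s1 == "host" and f1 & FIN
        and s2 == "client" and f2 & ACK and not f2 & (FIN | RST)
        and s3 == "host" and f3 & RST
    )


trace_tail = [("host", FIN | ACK), ("client", ACK), ("host", RST)]
print(flag_names(FIN | ACK))       # ['FIN', 'ACK']
print(is_fin_ack_rst(trace_tail))  # True
```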

-Stat

Board-General
Posts: 303
Joined: 05 Mar 2012 06:32
Location: United States

Re: openGame Project Update Thread

Postby stAtrill » 24 Oct 2015 18:07

Okay, so this update is rather large, as I hadn't even been updating the Github versions when I would make updates (oops)!

I am starting to get much more confident about this project. The most important change of the below is saving hard logs to disk in order of execution. Details on why below!

Changelog, from the Github commit: <-(This is a link, btw, to the changelog history)
-Program now saves logs, uses queues to ensure log order is preserved
-Randomly assigned signal socket is now within private ports allocated for loopback
-MAC address polled from adapter, rather than from watching network traffic
-Improvements to asyncIO
-File handle to adapter now opened with no-buffering flag set (need to investigate benefit of this)
-Improvements to thread-safety: all thread variables now use threading.local
-Streamlining in adapter read/write threads
-Fixed bug with indexing multiple returned peers from tracker
-Various small bugfixes
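
For the threading.local item, a minimal sketch of what per-thread variables buy you. The thread names and the `handled` counter are invented for the example; the real program's variables are not shown in this thread.

```python
import threading

_state = threading.local()  # each thread sees its own attributes on this object


def handle_packets(name, count, results):
    """Count packets in a per-thread variable; other threads cannot touch it."""
    _state.handled = 0
    for _ in range(count):
        _state.handled += 1
    results[name] = _state.handled


def run_demo():
    results = {}
    threads = [
        threading.Thread(target=handle_packets, args=(name, count, results))
        for name, count in (("reader", 3), ("writer", 5))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_demo())  # each thread reports only its own count
```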


Many of the above are old changes that I just hadn't committed yet, like most of the threadsafety, asyncIO, etc. Most of the trouble I have been having for a while is that I don't actually know what is going on in the internals of the program when it runs. I put a 'verbose' alert setting in, but I leave it off as the terminal can only hold so much data (plus it is a pita to get data out of the terminal), and writing that much data to terminal actually slows the program down a lot. Using wireshark to grab traces hasn't been helpful as there is no built-in way to compare two traces afaik, and I am not even sure wireshark records them in order of their arrival, or in their packet number order, etc. The new logs let me actually look into the program, and, boy, was this a good idea.
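
A rough sketch of queue-backed logging using the standard library's `QueueHandler` and `QueueListener`. This is one way to get ordered logs on disk without slowing the worker threads; whether openGame does it this way is an assumption, and the logger name and format string are placeholders.

```python
import logging
import logging.handlers
import queue


def start_ordered_logging(path, name="opengame_sketch"):
    """Return (logger, listener). Threads log into a queue; a single
    listener thread writes records to disk in the order they were queued."""
    log_queue = queue.Queue()

    file_handler = logging.FileHandler(path)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(threadName)s %(message)s")
    )

    # The listener owns the slow file I/O; callers only pay for a queue put.
    listener = logging.handlers.QueueListener(log_queue, file_handler)
    listener.start()

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.propagate = False  # keep the output off the terminal
    return logger, listener
```

Typical use would be `logger, listener = start_ordered_logging("og.log")`, then `logger.debug(...)` from any thread, and `listener.stop()` on shutdown so the queue is drained before exit.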

For starters, I can test my out-of-order hypothesis easily - I simply wrote another program to strip the logs down to just the data, compare the orders, and highlight the entries that are wrong.
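
The comparator itself isn't posted here, so the following is only a guess at the idea: match each packet payload from one log against the other log and report where (or whether) it shows up. The function names and the first-match rule are assumptions.

```python
def compare_logs(remote, local):
    """For each payload in the remote log, find the index of the first
    identical payload in the local log (None if it never arrived)."""
    first_seen = {}
    for index, payload in enumerate(local):
        first_seen.setdefault(payload, index)
    return [(i, first_seen.get(payload)) for i, payload in enumerate(remote)]


def format_matches(matches):
    """Render matches as 'remote: local' lines, or 'NO MATCH'."""
    return [
        "%d: %s" % (i, j if j is not None else "NO MATCH")
        for i, j in matches
    ]


remote_log = [b"a", b"x", b"b"]   # payloads in the order the sender logged them
local_log = [b"b", b"a"]          # payloads in the order the receiver logged them
for line in format_matches(compare_logs(remote_log, local_log)):
    print(line)
# 0: 1
# 1: NO MATCH
# 2: 0
```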

Additionally, I have learned something else: I tried to write the program to be as thread-safe as possible, where everything is atomic, etc. But the logs showed that, under high load, sometimes things can execute in the wrong order. The problem seems to stem from the interpreter only being able to execute one thread at a time, even though the program is multithreaded. If one thread takes longer than expected on a job, then things can become a bit mixed up as the interpreter abandons the thread to jump to another thread, etc. I think I may need to implement processes (instead of threads) on mission-critical components, and I can't wait to test this. This seems like the 'hardest' lead I have so far.

I have also switched focus a bit - I am working on getting UPnP working (what a royal pita this is, story later), and I am preparing to start getting a GUI up. I am also planning to experiment with XORing data and setting diffserv flags - these may or may not be helpful, but I can't wait to test! (XORing data could help with a strange case where some older routers tamper with data, and diffserv *MAY* help the wider internet as a whole handle our data packets better.)
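
A sketch of what setting diffserv flags could look like on a UDP socket. The choice of the Expedited Forwarding code point (46) and of a UDP socket are assumptions for illustration, and some platforms (Windows in particular) may ignore `IP_TOS` set this way.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding; an assumed choice, not from the thread


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper 6 bits of the old TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp << 2


def make_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing packets request the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(dscp_to_tos(DSCP_EF))  # 184 (0xB8)
```

Routers along the path are free to ignore or rewrite the marking, so this is something to measure rather than rely on.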


The upnp story: I am trying to use a package called miniupnpc. It is a very streamlined package, with low overhead, fast execution, etc. The only catch is you have to compile it yourself. No problem right?

Well, the code writers originally wrote (and tested, and wrote documentation on) it on Linux, using Python version 2.7, in 32 bit. (This may be giving them too much credit: they originally wrote it for C++, and Python is barely supported.) And, of course, I am using Python version 3.4, in 64 bit, on Windows. I finally got it to compile, but the linker that makes the Python extension won't have any of this nonsense. After banging my head against a wall for nearly 3 hours, I am on the forums trying to get some assistance. What a nightmare.

-Stat

Board-General
Posts: 303
Joined: 05 Mar 2012 06:32
Location: United States

Re: openGame Project Update Thread

Postby stAtrill » 25 Oct 2015 07:21

So I got in a quick test with my brother. The output of the comparator is below. The left column is the packet numbers from my brother, and the right column is corresponding packets from my log.

0: NO MATCH
[Truncated: I hadn't actually connected to him by this point]
63: NO MATCH
64: 1
65: NO MATCH
66: 0
67: 0
68: 4
69: 23
70: 24
71: 26
72: 28
73: 29
74: 35
75: 36
76: 37
77: 38
78: 39
79: 40
80: NO MATCH
81: 41
82: 42
83: NO MATCH
84: 43
85: 44
86: 45
87: NO MATCH
88: 46
89: 47
90: 48
91: 49
92: 50
93: 51
94: 52
95: 53
96: 54
97: 55
98: 56
99: 57
100: 59
101: 60
102: 58
103: 65
104: 67
105: 68
106: 61
107: 62
108: 69
109: 70
110: 63
111: 64
112: 71
113: 72
114: 66
115: 73
116: 74
117: 75
118: 76
119: 78
120: 79
121: 81
122: 77
123: 84
124: 80
125: 86
126: 87
127: 88
128: 82
129: 83
130: 89
131: 90
132: 91
133: 85
134: 92
135: 93
136: 94
137: 95
138: 96
139: 98
140: 99
141: 97
142: 104
143: 106
144: 107
145: 100
146: 101
147: 108
148: 109
149: 102
150: 103
151: 110
152: 111
153: 105
154: 112
155: 113
156: 114
157: 115
158: 116
159: NO MATCH


I shortened the log in places where I hadn't connected to him yet, etc. Curiously, we do see quite a few packets arriving out of order (the missing packets in the middle of the log weren't data), which led to me doing a bunch of research into how Python handles threads vs processes, etc.
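
To pick the out-of-order entries out of a listing like the one above automatically, something along these lines would work. The rule used here (an entry is late if its local index is lower than one already seen) is an assumption about what counts as "wrong", and the sample pairs are rows 99 to 107 copied from the listing.

```python
def out_of_order(matches):
    """Return the remote packet numbers whose local index is lower than a
    local index already seen, i.e. packets logged later than they should be.

    matches: list of (remote_index, local_index or None).
    """
    highest = -1
    late = []
    for remote_idx, local_idx in matches:
        if local_idx is None:      # NO MATCH rows are skipped
            continue
        if local_idx < highest:
            late.append(remote_idx)
        else:
            highest = local_idx
    return late


excerpt = [(99, 57), (100, 59), (101, 60), (102, 58), (103, 65),
           (104, 67), (105, 68), (106, 61), (107, 62)]
print(out_of_order(excerpt))  # [102, 106, 107]
```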

Long story short, looks like the mission-critical sections (waiting on data from sockets or the adapter) will need to be rewritten with processes instead. This is advantageous as we can give the processes elevated priority as well, and see if we don't see an improvement. Yet another upside is that the program will get even faster!
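
A bare-bones sketch of moving a blocking reader into its own process with `multiprocessing`. In the real program the loop would block on a socket or the adapter; here it just walks a list so the example stays self-contained. All names are hypothetical, and raising the process priority (which is platform-specific) is not shown.

```python
import multiprocessing


def pump_frames(frames, out_queue):
    """Worker body: forward each frame in order, then a None sentinel."""
    for frame in frames:
        out_queue.put(frame)
    out_queue.put(None)


def start_pump_process(frames):
    """Run pump_frames in a separate process with its own interpreter,
    so it is not competing with other threads for the same interpreter lock."""
    out_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(
        target=pump_frames, args=(frames, out_queue), daemon=True
    )
    proc.start()
    return proc, out_queue


def drain(out_queue):
    """Collect frames from the queue until the sentinel arrives."""
    frames = []
    while True:
        item = out_queue.get()
        if item is None:
            return frames
        frames.append(item)
```

When run as a script, the call to `start_pump_process(...)` should sit under an `if __name__ == "__main__":` guard, which Windows requires for `multiprocessing`. A single producer putting onto a `multiprocessing.Queue` keeps its frames in the order they were put.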

I am also looking deeper into my use of FILE_FLAG_NO_BUFFERING, in addition to using raw sockets (since this may allow us to rule out one more source of possible issues, the TCP/IP stack).

I am excited to test the results of this,
-Stat
