[OpenVPN home] [Date Prev] [Date Index] [Date Next]
[OpenVPN mailing lists] [Thread Prev] [Thread Index] [Thread Next]
Google
 
Web openvpn.net

Re: [Openvpn-devel] [BUG] sometimes --ping 30 stops: select() timeout=31536000 seconds (exactly one year)!


  • Subject: Re: [Openvpn-devel] [BUG] sometimes --ping 30 stops: select() timeout=31536000 seconds (exactly one year)!
  • From: Denis Vlasenko <vda@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
  • Date: Sat, 29 May 2004 01:04:43 +0300

On Saturday 29 May 2004 00:15, James Yonan wrote:
> Denis,
>
> That looks like a possible bug in the coarse timer update logic.  The
> coarse timer deals with events scheduled at a resolution denoted by an
> integer number of seconds, such as pings.  A timeout of one year is used as
> kind of "effectively infinite" time interval.  If you see this large
> timeout, it means that the ping event is losing its place in the coarse
> timer event queue -- probably a bug.

Yes. It clearly happened here:

static void
check_coarse_timers_dowork (struct context *c)
{
  const struct timeval save = c->c2.timeval;
  c->c2.timeval.tv_sec = BIG_TIMEOUT;
  c->c2.timeval.tv_usec = 0;
  process_coarse_timers (c);
  c->c2.coarse_timer_wakeup = now + c->c2.timeval.tv_sec;

  msg (D_INTERVAL, "TIMER: coarse timer wakeup %d seconds", (int) c->c2.timeval.tv_sec);

  /* Is the coarse timeout NOT the earliest one? */
  if (c->c2.timeval.tv_sec > save.tv_sec)
    c->c2.timeval = save;
}

I can see that TIMER: ... message with 1 year worth of seconds printed.
Obviosly, process_coarse_timers (c) did not do it's job.
I instrumented it this way now:


process_coarse_timers (struct context *c)
{
#ifdef USE_CRYPTO
  /* flush current packet-id to file once per 60
     seconds if --replay-persist was specified */
  check_packet_id_persist_flush (c);
  msg(D_PING, "PING: process_coarse_timers 1 tv=t=%ld", (long)c->c2.timeval.tv_sec);
#endif

  /* process connection establishment items */
  check_connection_established (c);
  msg(D_PING, "PING: process_coarse_timers 2 tv=t=%ld", (long)c->c2.timeval.tv_sec);

#if P2MP
  /* see if we should send a push_request in response to --pull */
  check_push_request (c);
  msg(D_PING, "PING: process_coarse_timers 3 tv=t=%ld", (long)c->c2.timeval.tv_sec);
#endif

  /* process --route options */
  check_add_routes (c);
  msg(D_PING, "PING: process_coarse_timers 4 tv=t=%ld", (long)c->c2.timeval.tv_sec);

  /* possibly exit due to --inactive */
  check_inactivity_timeout (c);
  msg(D_PING, "PING: process_coarse_timers 5 tv=t=%ld", (long)c->c2.timeval.tv_sec);
  if (c->sig->signal_received)
    return;

  /* restart if ping not received */
  check_ping_restart (c);
  msg(D_PING, "PING: process_coarse_timers 6 tv=t=%ld", (long)c->c2.timeval.tv_sec);
  if (c->sig->signal_received)
    return;

  /* Should we send an OCC_REQUEST message? */
  check_send_occ_req (c);
  msg(D_PING, "PING: process_coarse_timers 7 tv=t=%ld", (long)c->c2.timeval.tv_sec);

  /* Should we send an MTU load test? */
  check_send_occ_load_test (c);
  msg(D_PING, "PING: process_coarse_timers 8 tv=t=%ld", (long)c->c2.timeval.tv_sec);

  /* Should we ping the remote? */
  check_ping_send (c);
  msg(D_PING, "PING: process_coarse_timers 9 tv=t=%ld", (long)c->c2.timeval.tv_sec);
}


Output:


Sat May 29 00:54:51 2004 RANDOM USEC=412366
Sat May 29 00:55:01 2004 PING: process_coarse_timers 1 tv=t=31536000
Sat May 29 00:55:01 2004 PING: process_coarse_timers 2 tv=t=31536000
Sat May 29 00:55:01 2004 PING: process_coarse_timers 3 tv=t=31536000
Sat May 29 00:55:01 2004 PING: process_coarse_timers 4 tv=t=31536000
Sat May 29 00:55:01 2004 PING: process_coarse_timers 5 tv=t=31536000
Sat May 29 00:55:01 2004 PING: process_coarse_timers 6 tv=t=31536000
Sat May 29 00:55:01 2004 PING: process_coarse_timers 7 tv=t=31536000
Sat May 29 00:55:01 2004 PING: process_coarse_timers 8 tv=t=31536000
Sat May 29 00:55:01 2004 EVENT event_timeout_wakeup (20/30)
Sat May 29 00:55:01 2004 PING: !event_timeout_trigger(1085781291,20)
Sat May 29 00:55:01 2004 PING: process_coarse_timers 9 tv=t=20

	okay, check_ping_send (c) have done it's job.

Sat May 29 00:55:01 2004 TIMER: coarse timer wakeup 20 seconds
Sat May 29 00:55:01 2004 RANDOM USEC=113731
Sat May 29 00:55:18 2004 RANDOM USEC=994739
Sat May 29 00:55:22 2004 PING: process_coarse_timers 1 tv=t=31536000
Sat May 29 00:55:22 2004 PING: process_coarse_timers 2 tv=t=31536000
Sat May 29 00:55:22 2004 PING: process_coarse_timers 3 tv=t=31536000
Sat May 29 00:55:22 2004 PING: process_coarse_timers 4 tv=t=31536000
Sat May 29 00:55:22 2004 PING: process_coarse_timers 5 tv=t=31536000
Sat May 29 00:55:22 2004 PING: process_coarse_timers 6 tv=t=31536000
Sat May 29 00:55:22 2004 PING: process_coarse_timers 7 tv=t=31536000
Sat May 29 00:55:22 2004 PING: process_coarse_timers 8 tv=t=31536000
Sat May 29 00:55:22 2004 PING: c->c2.to_link.len <> 0
Sat May 29 00:55:22 2004 PING: process_coarse_timers 9 tv=t=31536000
Sat May 29 00:55:22 2004 TIMER: coarse timer wakeup 31536000 seconds

	BOOM!

Sat May 29 00:55:22 2004 PING: select timeout=31536000
Sat May 29 00:55:22 2004 PING: select timeout=31536000
Sat May 29 00:55:22 2004 PING: select timeout=31536000
Sat May 29 00:55:22 2004 PING: select timeout=31536000
Sat May 29 00:55:22 2004 PING: select timeout=31536000

For your reference:

static inline void
check_ping_send (struct context *c)
{
  void check_ping_send_dowork (struct context *c);
  if (!c->options.ping_send_timeout) { msg (D_PING, "PING: !c->options.ping_send_timeout"); return; }
  if (c->c2.to_link.len) { msg (D_PING, "PING: c->c2.to_link.len <> 0"); return; }
  if (!event_timeout_trigger (&c->c2.ping_send_interval, &c->c2.timeval))
    { msg (D_PING, "PING: !event_timeout_trigger(%ld,%ld)",(long)(c->c2.ping_send_interval.last), (long)c->c2.timeval.tv_sec); return; }
    check_ping_send_dowork (c);
}

"if (c->c2.to_link.len)" case does not set timeout as it ought to do.

Is my analysis right?
--
vda