Merge lp:~chipaca/ubuntuone-client/lean-action-queue-commands into lp:ubuntuone-client

Proposed by John Lenton
Status: Merged
Approved by: Eric Casteleijn
Approved revision: 133
Merged at revision: not available
Proposed branch: lp:~chipaca/ubuntuone-client/lean-action-queue-commands
Merge into: lp:ubuntuone-client
Diff against target: None lines
To merge this branch: bzr merge lp:~chipaca/ubuntuone-client/lean-action-queue-commands
Reviewer Review Type Date Requested Status
Eric Casteleijn (community) Approve
John O'Brien (community) Approve
Review via email: mp+9931@code.launchpad.net

Commit message

improve worst-case memory use, mostly via __slots__

Revision history for this message
John Lenton (chipaca) wrote :

This makes a lot of classes of which we have many instances start using __slots__, as well as some minor refactoring and compressing some large strings we store for debugging purposes. The end result is bringing down memory usage by ∼⅓ on a test run of disconnected creation of 10k files (of what's left, ∼⅓ is twisted, ∼⅓ is the syncdaemon itself, and ∼⅓ are the 10k operations).
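The two tricks the description mentions can be sketched roughly like this (a simplified stand-in, not the branch's actual mklog helper; shown in modern Python 3, so the string is encoded before compression):

```python
import zlib

class CompressedDesc:
    """Store a long, repetitive debug description compressed,
    decompressing it only when it is actually read."""
    __slots__ = ('zipped_desc',)  # no per-instance __dict__

    def __init__(self, desc):
        # Level 9: maximum compression; these strings are mostly padding.
        self.zipped_desc = zlib.compress(desc.encode('utf-8'), 9)

    @property
    def desc(self):
        # Decompress lazily, only when someone asks for the description.
        return zlib.decompress(self.zipped_desc).decode('utf-8')

d = CompressedDesc("make_file  share:%-40r node:%-40r" % ('share', 'node'))
assert d.desc.startswith("make_file")
assert len(d.zipped_desc) < len(d.desc)  # the %-40r padding compresses well
```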

Also included is a small fix for the .note and .trace loggers, which were broken.
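For context, the breakage was the classic pitfall of storing functools.partial objects as class attributes: partials are not descriptors, so attribute access never binds self. A toy illustration (not the syncdaemon's code; NOTE is a hypothetical custom level):

```python
import functools
import logging

NOTE = logging.INFO + 5  # a custom log level, for illustration

class BrokenLogger(logging.Logger):
    # partial objects are not descriptors, so 'self' is never bound:
    # logger.note(msg) actually calls logging.Logger.log(NOTE, msg),
    # passing the integer NOTE where 'self' belongs.
    note = functools.partial(logging.Logger.log, NOTE)

class FixedLogger(logging.Logger):
    def note(self, msg, *args, **kwargs):
        """Log at NOTE level via a real method, which does bind 'self'."""
        if self.isEnabledFor(NOTE):
            self._log(NOTE, msg, args, **kwargs)

broken = BrokenLogger('broken')
try:
    broken.note('hello')    # raises TypeError: 'msg' is missing
    failed = False
except TypeError:
    failed = True

fixed = FixedLogger('fixed')
fixed.note('hello')         # works fine
```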

129. By John Lenton

no longer needing functools (thanks jdo!)

130. By John Lenton

merged with trunk

Revision history for this message
John O'Brien (jdobrien) wrote :

Sadly I can't approve these changes as they break trunk.

Lots of errors: https://pastebin.canonical.com/20978/

review: Needs Fixing
Revision history for this message
Elliot Murphy (statik) wrote :

> Sadly I can't approve these changes as they break trunk.
>
> Lots of errors: https://pastebin.canonical.com/20978/

I discussed with John Lenton, these changes expose some broken tests on trunk, so we'll need to wait to land this branch until there is a corresponding trunk branch, and then coordinate the landing to minimize PQM interruption. This branch won't be included in the Aug 11 client rollout. Yay for huge memory use reduction though!

131. By John Lenton

merged with trunk

132. By John Lenton

adding new attributes to __slots__

133. By John Lenton

when refactoring a chain of callbacks, it pays to follow the chain of errbacks too.

Revision history for this message
John Lenton (chipaca) wrote :

It turns out the tests weren't broken; they were exposing brokenness in the client! wow, imagine that!

I believe I've fixed it, now.

Revision history for this message
John O'Brien (jdobrien) wrote :

This no longer breaks trunk! w00t!

review: Approve
Revision history for this message
Eric Casteleijn (thisfred) wrote :

Tests pass, code looks good. Out of curiosity: I've never used __slots__ before, or seen it used much, so for educational purposes: what does it do here? I understand it can be used to save memory in cases where you have a huge number of class instances. Is that the use case here, or is there another reason to use it? (please ignore if you don't have time, I'm really just curious, and this is not a polite/devious way to question the code ;)

review: Approve
Revision history for this message
Nicola Larosa (teknico) wrote :

> Out of curiosity: I've never used __slots__
> before, or seen it used much, so for educational purposes: what does it do
> here? I understand it can be used to save memory in cases where you have a
> huge number of class instances. Is that the use case here, or is there another
> reason to use it?

__slots__ forces an object's attributes to be statically predefined instead of dynamic: it does away with __dict__, which has the double effect of conserving memory and blocking dynamic addition/deletion of attributes.
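In code, that means (a toy example, not from the branch):

```python
class Plain:
    pass

class Slotted:
    # Attributes are fixed at class-creation time; instances get
    # compact slot storage instead of a per-instance __dict__.
    __slots__ = ('x', 'y')

p = Plain()
p.anything = 1              # fine: stored in p.__dict__

s = Slotted()
s.x = 1                     # fine: 'x' is a declared slot
assert not hasattr(s, '__dict__')
try:
    s.z = 2                 # blocked: no __dict__ to fall back on
    blocked = False
except AttributeError:
    blocked = True
assert blocked
```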

Revision history for this message
John Lenton (chipaca) wrote :

On Wed, Aug 12, 2009 at 01:44:55PM -0000, Eric Casteleijn wrote:
> Review: Approve
> Tests pass, code looks good. Out of curiosity: I've never used __slots__ before, or seen it used much, so for educational purposes: what does it do here? I understand it can be used to save memory in cases where you have a huge number of class instances. Is that the use case here, or is there another reason to use it? (please ignore if you don't have time, I'm really just curious, and this is not a polite/devious way to question the code ;)

in worst-case scenarios there can easily be tens of thousands of
instances of these classes. The test case which showed a 33%
improvement in memory use with this change was based on creating 10k
files while disconnected; memory usage is pretty much linear in that
case (you end up with 10k MakeFile and 10k Upload objects). For
reference, the 'drivers' directory of the linux kernel is at ~9k
files; a checkout of the bazaar trunk is only ~1k files, so the
impact of this branch on those would be less.

Still, the worst case is the one where we want to avoid killing your
computer, right?
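The effect John describes can be measured with a rough sketch like this (the class names are made up, and exact savings vary by Python version):

```python
import tracemalloc

class DictCommand:
    def __init__(self, share_id, node_id, parent_id):
        self.share_id = share_id
        self.node_id = node_id
        self.parent_id = parent_id

class SlottedCommand:
    __slots__ = ('share_id', 'node_id', 'parent_id')
    def __init__(self, share_id, node_id, parent_id):
        self.share_id = share_id
        self.node_id = node_id
        self.parent_id = parent_id

def traced_size(cls, n=10000):
    """Return bytes allocated while creating n instances of cls."""
    tracemalloc.start()
    objs = [cls('share', 'node-%d' % i, 'parent') for i in range(n)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objs
    return current

# Slotted instances skip the per-instance __dict__, so 10k of them
# should allocate noticeably less than the plain version.
assert traced_size(SlottedCommand) < traced_size(DictCommand)
```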

Revision history for this message
Eric Casteleijn (thisfred) wrote :

Thanks Nicola and John. I see now that the merge proposal already answered my question, sorry for being oblivious. But thanks for expanding!

Preview Diff

=== modified file 'ubuntuone/syncdaemon/action_queue.py'
--- ubuntuone/syncdaemon/action_queue.py 2009-07-30 18:23:53 +0000
+++ ubuntuone/syncdaemon/action_queue.py 2009-08-08 19:03:32 +0000
@@ -445,6 +445,7 @@
     we can keep track of the number of bytes that have been written to
     the store.
     """
+    __slots__ = ('fd', 'data_dict', 'n_bytes_read')
     def __init__(self, fd, data_dict):
         """
         fd is the file-like object used for uploads. data_dict is the
@@ -741,7 +742,11 @@
     retryable_errors = set(['Cleaned up', 'TRY_AGAIN',
                             'Connection was closed cleanly.'])
 
-    start_done = False
+    __slots__ = ('action_queue', 'start_done', 'running')
+    def __init__(self, action_queue):
+        self.action_queue = action_queue
+        self.start_done = False
+        self.running = False
 
     def demark(self, *maybe_markers):
         """
@@ -848,11 +853,22 @@
         else:
             d = self._start()
             d.addCallback(self.store_marker_result)
-            d.addCallback(lambda _: setattr(self, 'start_done', True))
-            d.addCallback(lambda _: self.log.debug('running'))
-            d.addCallback(lambda _: self.running and self._run())
+            d.addCallback(self._ready_to_run)
+            d.addCallback(self._done_running)
+        return d
+
+    def _done_running(self, x):
+        self.running = False
+        return x
+
+    def _ready_to_run(self, _):
+        self.log.debug('running')
+        if self.running:
+            d = self._run()
+        else:
+            d = defer.succeed(None)
         d.addCallbacks(self.end_callback, self.end_errback)
-        d.addBoth(passit(lambda _: setattr(self, 'running', False)))
+        d.addBoth(self._done_running)
         return d
 
     def handle_success(self, success):
@@ -890,6 +906,7 @@
     Base of metadata-related commands (commands that are queued in the
     meta queue)
     """
+    __slots__ = ()
     @property
     def _queue(self):
         """
@@ -903,6 +920,7 @@
     Base of content-related commands (commands that are queued in the
     content queue)
     """
+    __slots__ = ()
     @property
     def _queue(self):
         """
@@ -915,8 +933,10 @@
     """
     Base of MakeFile and MakeDir
     """
+    __slots__ = ('share_id', 'parent_id', 'name', 'marker', 'log')
+
     def __init__(self, action_queue, share_id, parent_id, name, marker):
-        self.action_queue = action_queue
+        super(MakeThing, self).__init__(action_queue)
         self.share_id = share_id
         self.parent_id = parent_id
         # Unicode boundary! the name is Unicode in protocol and server, but
@@ -939,6 +959,7 @@
         """
         self.share_id = share_id
         self.parent_id = parent_id
+        self.start_done = True
 
     def _run(self):
         """
@@ -979,6 +1000,7 @@
     """
     Make a file
     """
+    __slots__ = ()
     ok_event_name = 'AQ_FILE_NEW_OK'
     error_event_name = 'AQ_FILE_NEW_ERROR'
     client_method = 'make_file'
@@ -988,6 +1010,7 @@
     """
     Make a directory
     """
+    __slots__ = ()
     ok_event_name = 'AQ_DIR_NEW_OK'
     error_event_name = 'AQ_DIR_NEW_ERROR'
     client_method = 'make_dir'
@@ -997,9 +1020,11 @@
     """
     Move a file or directory
     """
+    __slots__ = ('share_id', 'node_id', 'old_parent_id',
+                 'new_parent_id', 'new_name', 'log')
     def __init__(self, action_queue, share_id, node_id, old_parent_id,
                  new_parent_id, new_name):
-        self.action_queue = action_queue
+        super(Move, self).__init__(action_queue)
         self.share_id = share_id
         self.node_id = node_id
         self.old_parent_id = old_parent_id
@@ -1025,6 +1050,7 @@
         self.share_id = share_id
         self.node_id = node_id
         self.new_parent_id = new_parent_id
+        self.start_done = True
 
     def _run(self):
         """
@@ -1060,8 +1086,9 @@
     """
     Unlink a file or dir
     """
+    __slots__ = ('share_id', 'node_id', 'parent_id', 'log')
     def __init__(self, action_queue, share_id, parent_id, node_id):
-        self.action_queue = action_queue
+        super(Unlink, self).__init__(action_queue)
         self.share_id = share_id
         self.node_id = node_id
         self.parent_id = parent_id
@@ -1081,6 +1108,7 @@
         self.share_id = share_id
         self.node_id = node_id
         self.parent_id = parent_id
+        self.start_done = True
 
     def _run(self):
         """
@@ -1113,12 +1141,13 @@
     """
     Ask about the freshness of server hashes
     """
+    __slots__ = ('log', 'items')
     def __init__(self, action_queue, items):
+        super(Query, self).__init__(action_queue)
         self.log = MultiProxy(
             [mklog(logger, '(unrolled) query', share, node,
                    share=share, node=node, hash=hash, index=i)
              for (i, (share, node, hash)) in enumerate(items)])
-        self.action_queue = action_queue
         self.items = items
 
     def store_marker_result(self, items):
@@ -1126,6 +1155,7 @@
         Called when all the markers are realized.
         """
         self.items = items
+        self.start_done = True
 
     def _start(self):
         """
@@ -1169,8 +1199,9 @@
     """
     List shares shared to me
     """
+    __slots__ = ('log',)
     def __init__(self, action_queue):
-        self.action_queue = action_queue
+        super(ListShares, self).__init__(action_queue)
         self.log = mklog(logger, 'list_shares', UNKNOWN, UNKNOWN)
 
     def _run(self):
@@ -1198,8 +1229,9 @@
     """
     Answer a share offer
     """
+    __slots__ = ('share_id', 'answer', 'log')
     def __init__(self, action_queue, share_id, answer):
-        self.action_queue = action_queue
+        super(AnswerShare, self).__init__(action_queue)
         self.share_id = share_id
         self.answer = answer
         self.log = mklog(logger, 'answer_share', share_id, UNKNOWN)
@@ -1234,9 +1266,11 @@
     """
     Offer a share to somebody
    """
+    __slots__ = ('node_id', 'share_to', 'name', 'access_level',
+                 'marker', 'log', 'use_http')
     def __init__(self, action_queue, node_id, share_to, name, access_level,
                  marker):
-        self.action_queue = action_queue
+        super(CreateShare, self).__init__(action_queue)
         self.node_id = node_id
         self.share_to = share_to
         self.name = name
@@ -1253,6 +1287,7 @@
         Called when all the markers are realized.
         """
         self.node_id = node_id
+        self.start_done = True
 
     def _start(self):
         """
@@ -1325,14 +1360,16 @@
               error=failure.getErrorMessage())
 
 
-class GetContentMixin(object):
+class GetContentMixin(ActionQueueCommand):
     """
     Base for ListDir and Download. It's a mixin (ugh) because
     otherwise things would be even more confusing
     """
+    __slots__ = ('share_id', 'node_id', 'server_hash', 'fileobj_factory',
+                 'fileobj', 'gunzip', 'log')
     def __init__(self, action_queue, share_id, node_id, server_hash,
                  fileobj_factory):
-        self.action_queue = action_queue
+        super(GetContentMixin, self).__init__(action_queue)
         self.share_id = share_id
         self.node_id = node_id
         self.server_hash = server_hash
@@ -1356,6 +1393,7 @@
         Called when all the markers are realized.
         """
         self.node_id = node_id
+        self.start_done = True
 
     def _run(self):
         """
@@ -1449,22 +1487,27 @@
     """
     Get a listing of a directory's contents
     """
+    __slots__ = ()
 
 class Download(GetContentMixin, ActionQueueContentCommand):
     """
     Get the contents of a file.
     """
+    __slots__ = ()
 
 class Upload(ActionQueueContentCommand):
     """
     Upload stuff to a file
     """
+    __slots__ = ('share_id', 'node_id', 'previous_hash', 'hash', 'crc32',
+                 'size', 'fileobj_factory', 'tempfile_factory',
+                 'deflated_size', 'tempfile', 'cancelled', 'upload_req', 'log')
     retryable_errors = (ActionQueueContentCommand.retryable_errors
                         | set(['UPLOAD_IN_PROGRESS']))
 
     def __init__(self, action_queue, share_id, node_id, previous_hash, hash,
                  crc32, size, fileobj_factory, tempfile_factory):
-        self.action_queue = action_queue
+        super(Upload, self).__init__(action_queue)
         self.share_id = share_id
         self.node_id = node_id
         self.previous_hash = previous_hash
@@ -1521,7 +1564,7 @@
                           self.node_id))
         self.node_id = node_id
         self.action_queue.uploading[self.share_id, node_id] = uploading
-
+        self.start_done = True
 
     def _run(self):
         """

=== modified file 'ubuntuone/syncdaemon/fsm/fsm.py'
--- ubuntuone/syncdaemon/fsm/fsm.py 2009-07-02 20:22:51 +0000
+++ ubuntuone/syncdaemon/fsm/fsm.py 2009-08-10 16:20:52 +0000
@@ -469,6 +469,9 @@
     get one of these. with the corresponding attributes for all sections
     and event name.
     """
+    __slots__ = ('event', 'line', 'source',
+                 'target', 'parameters', 'action_func')
+
     def __init__(self, event, line):
         """Create a transition for event event from line.
 

=== modified file 'ubuntuone/syncdaemon/logger.py'
--- ubuntuone/syncdaemon/logger.py 2009-08-06 02:03:53 +0000
+++ ubuntuone/syncdaemon/logger.py 2009-08-10 16:48:13 +0000
@@ -23,6 +23,7 @@
 import sys
 import os
 import re
+import zlib
 import xdg.BaseDirectory
 
 from logging.handlers import TimedRotatingFileHandler
@@ -40,9 +41,15 @@
 
 class Logger(logging.Logger):
     """Logger that support out custom levels."""
+    def note(self, msg, *args, **kwargs):
+        """log at NOTE level"""
+        if self.isEnabledFor(NOTE):
+            self._log(NOTE, msg, args, **kwargs)
 
-    note = functools.partial(logging.Logger.log, NOTE)
-    trace = functools.partial(logging.Logger.log, TRACE)
+    def trace(self, msg, *args, **kwargs):
+        """log at TRACE level"""
+        if self.isEnabledFor(TRACE):
+            self._log(TRACE, msg, args, **kwargs)
 
 # use our logger as the default Logger class
 logging.setLoggerClass(Logger)
@@ -72,6 +79,7 @@
     Create a logger that keeps track of the method where it's being
     called from, in order to make more informative messages.
     """
+    __slots__ = ('logger', 'zipped_desc')
     def __init__(self, _logger, _method, _share, _uid, *args, **kwargs):
         # args are _-prepended to lower the chances of them
         # conflicting with kwargs
@@ -86,9 +94,15 @@
             all_args.append("%s=%r" % (k, v))
         args = ", ".join(all_args)
 
-        self.desc = "%-28s share:%-40r node:%-40r %s(%s) " \
-            % (_method, _share, _uid, _method, args)
+        desc = "%-28s share:%-40r node:%-40r %s(%s) " % (_method, _share,
+                                                         _uid, _method, args)
+        self.zipped_desc = zlib.compress(desc, 9)
         self.logger = _logger
+
+    @property
+    def desc(self):
+        return zlib.decompress(self.zipped_desc)
+
     def _log(self, logger_func, *args):
         """
         Generalized form of the different logging functions.
 

=== modified file 'ubuntuone/syncdaemon/main.py'
--- ubuntuone/syncdaemon/main.py 2009-08-06 04:24:04 +0000
+++ ubuntuone/syncdaemon/main.py 2009-08-08 19:03:32 +0000
@@ -96,7 +96,7 @@
 
     def log_mark(self):
         """ log a "mark" that includes the current AQ state and queue size"""
-        self.logger.info("%s %s (state: %s; queues: metadata: %d; content: %d;"
+        self.logger.note("%s %s (state: %s; queues: metadata: %d; content: %d;"
                          " hash: %d, fsm-cache: hit=%d miss=%d) %s" % ('-'*4,
                          'MARK', self.state.name,
                          len(self.action_q.meta_queue),

=== modified file 'ubuntuone/syncdaemon/sync.py'
--- ubuntuone/syncdaemon/sync.py 2009-08-04 20:00:44 +0000
+++ ubuntuone/syncdaemon/sync.py 2009-08-08 19:03:32 +0000
@@ -37,6 +37,7 @@
 class FSKey(object):
     """This encapsulates the problem of getting the metadata with different
     keys."""
+    __slots__ = ('fs', 'keys')
 
     def __init__(self, fs, **keys):
         """create"""
@@ -204,6 +205,7 @@
 
 class FileLogger(object):
     """A logger that knows about the file and its state."""
+    __slots__ = ('logger', 'key')
 
     def __init__(self, logger, key):
         """Create a logger for this guy"""

Subscribers

People subscribed via source and target branches